----------
From: Stefano Mazzocchi <stefano@apache.org>
To: Cocoon <cocoon@list.working-dogs.com>
Subject: [INFO] Some light on Cocoon's technologies
Date: Wed, Jun 23, 1999, 5:59 PM
Hi,
since many people do not seem to understand the global picture about the
technologies used by Cocoon, I will try to explain my vision of these
technologies as well as some information that might be useful to you to
jump in and help with its development.
XML
---
XML (eXtensible Markup Language) is an SGML (Standard Generalized Markup
Language) syntax (or you may call it a meta-language) that allows common
formatting of data. SGML is the father of all markup languages and it's
an ISO standard (ISO 8879, published in 1986) for creating languages.
XML is a lighter version of SGML. So XML is NOT a language, but a
syntax, unlike HTML.
XML is usually referred to as "portable data" in the sense that its
parsing is "application independent": one XML parser can read every
possible XML document, one describing your bank account, another
describing your favorite Italian meal, etc. This is, as you all know,
impossible with other file formats, whether text based or binary. Some
sort of equivalent in the old days were CSV (comma separated values)
files, which use a very simple syntax (a comma to separate values and
the first row to name the columns) and are portable to every
implementation. XML, unlike CSV, is much more flexible and structured,
even if it's much simpler than SGML.
A particular XML language is defined by its Document Type Definition
(DTD), whose syntax is described inside the XML specification. An XML
document may be validated against a DTD (if present) and, if the
validation is successful, the document is said to be "valid XML based on
the particular DTD". If a DTD is not present and the parser does not
encounter syntax errors while parsing the file, the XML document is said
to be "well-formed". If errors are found, the document is not XML
compliant.
So, any valid XML document is well-formed, and an XML document valid for
one particular DTD may not necessarily be valid for another DTD.
For example, HTML is not an XML language because the <br> tag is not XML
compliant. XHTML (where <br> is replaced by <br/>) is XML compliant.
While HTML pages are not always XML documents (some pages might be),
XHTML pages are always well-formed and valid if matched against the
right XHTML DTD.
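The well-formedness check described above can be sketched in a few lines
of Python. This is only an illustration: it assumes the standard
library's xml.etree.ElementTree parser, which checks syntax but does not
validate against a DTD, and the helper name is_well_formed is made up
for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML without syntax errors.

    Note: this only checks well-formedness. ElementTree does not
    validate a document against a DTD.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# HTML-style line break: the <br> element is never closed.
html_style = "<p>first line<br>second line</p>"
# XHTML-style line break: <br/> is an empty element, so the tags balance.
xhtml_style = "<p>first line<br/>second line</p>"

print(is_well_formed(html_style))
print(is_well_formed(xhtml_style))
```

The same generic parser accepts or rejects both fragments without
knowing anything about what a paragraph or a line break means.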
So much for the technical differences, but why was HTML not good enough?
HTML is a language for describing graphics, behavior and hyperlinks on
web pages. HTML is NOT able to "contextualize" (meaning "give meaning to
some text"): if you look for the "title" of your page, a nice HTML tag
gives you that, but if you look for the author or version or something
more specific like the author's mail address, even if this information
is present in the text you don't have a way to "isolate" it
(contextualize it) from the surrounding information.
In some XHTML like this
<center>
  <h1>This is my article</h1>
  <h3>by Stefano Mazzocchi &lt;stefano@apache.org&gt;</h3>
</center>
you don't have a secure way to extract the mail address, while in the
following
<page>
  <title>This is my article</title>
  <author>
    <name>Stefano Mazzocchi</name>
    <mail>stefano@apache.org</mail>
  </author>
  ...
</page>
it's trivial and algorithmically certain.
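As a rough illustration of how trivial the extraction becomes, here is a
small Python sketch. It assumes the standard library's
xml.etree.ElementTree and a complete copy of the page document (without
the elided part); the function name author_mail is invented for the
example.

```python
import xml.etree.ElementTree as ET

# A complete copy of the structured page, for illustration only.
PAGE = """\
<page>
  <title>This is my article</title>
  <author>
    <name>Stefano Mazzocchi</name>
    <mail>stefano@apache.org</mail>
  </author>
</page>
"""


def author_mail(page_xml: str) -> str:
    """Return the text of the <mail> element nested inside <author>."""
    root = ET.fromstring(page_xml)
    # The path is relative to the root <page> element.
    return root.findtext("author/mail")


print(author_mail(PAGE))
```

No guessing or pattern matching on the text is involved: the markup
itself says which piece of text is the mail address.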
I don't picture XML taking over from HTML in web publishing, since HTML
is great for small needs. As I usually say when people ask me about
this, HTML is a DTD for homepages, since it was designed just for that.
HTML was NOT designed for publishing and processing large quantities of
data and complex dynamic information systems, but only to parallelize
and simplify the deployment and management of personal information.
The <img> tag created all this mess I'm (very modestly) trying to clean
up :)
XSL
---
As you see, XML alone is useless without some defined semantics: even if
an application is able to parse a document in memory, it must be able to
"understand" what the markup means. This is why XML browsers are
meaningless and no more useful than text editors, XML-wise.
This is why XSL (eXtensible Stylesheet Language) was created.
XSL, as of the latest working draft (this technology is not yet stable
so beware!), is divided into two parts: transformation (XSLT) and
formatting objects (sometimes referred to as FO, XSL:FO or simply XSL).
Both are XML DTDs that define a particular XML syntax. So every XSL and
XSLT document is a well-formed XML document.
XSLT
----
XSLT is a language for transforming one well-formed XML document into
another (some might disagree on this, but I'll skip this discussion).
This means that you may go from one DTD to another in a procedural way
that is defined inside your XSLT document. Even if the name suggests the
opposite, this language has very little to do with styling:
transformations may be applied to one particular XML DTD and come up
with some "graphical description" of the original content. This is
called "styling", but, as you can see, this is just one of the possible
uses of the technology.
For example, the above XHTML may be created from the second XML file
given a particular transformation-sheet (which in this case becomes a
stylesheet). As you can see, the data is all there: we just have to tell
the transformer how to come up with the XHTML document once all the data
is parsed in memory.
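To make the idea concrete, here is a sketch of that transformation
written by hand in Python. Please note that this is NOT XSLT: it is a
plain function, using the standard library's xml.etree.ElementTree, that
stands in for what a stylesheet would declare. The function name
page_to_xhtml and the sample document are assumptions made for the
example.

```python
import xml.etree.ElementTree as ET

PAGE = """\
<page>
  <title>This is my article</title>
  <author>
    <name>Stefano Mazzocchi</name>
    <mail>stefano@apache.org</mail>
  </author>
</page>
"""


def page_to_xhtml(page_xml: str) -> str:
    """Build the XHTML fragment from the data found in the <page> document."""
    page = ET.fromstring(page_xml)

    center = ET.Element("center")
    heading = ET.SubElement(center, "h1")
    heading.text = page.findtext("title")

    byline = ET.SubElement(center, "h3")
    byline.text = "by {} <{}>".format(
        page.findtext("author/name"), page.findtext("author/mail")
    )

    # Serializing escapes the angle brackets around the mail address,
    # so the result stays well-formed.
    return ET.tostring(center, encoding="unicode")


print(page_to_xhtml(PAGE))
```

Nothing is added to the content here: every piece of text in the result
comes from the source document, and only its arrangement changes.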
Usually, transformation sheets work from one DTD to another and for this
reason may be chained: transformA goes from DTD1 to DTD2 and transformB
from DTD2 to DTD3, or graphically
DTD1 --- (transformA) --> DTD2 --- (transformB) --> DTD3
I call DTD1 the "original DTD" (because of the "origin"), DTD2 some
"intermediate DTD", and DTD3 the "final DTD". It can be shown that a
transformation can always be created to go directly from DTD1 to DTD3,
but this might be much more complicated and less human
readable/manageable.
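The chaining can be sketched as ordinary function composition. Again,
this is plain Python standing in for XSLT, and everything named here
(transform_a, transform_b, chain, and the intermediate <article>
vocabulary) is hypothetical and invented only to show the shape of the
idea.

```python
import xml.etree.ElementTree as ET
from functools import reduce


def transform_a(page):
    """DTD1 -> DTD2: turn <page> into a hypothetical <article> vocabulary."""
    article = ET.Element("article")
    ET.SubElement(article, "heading").text = page.findtext("title")
    ET.SubElement(article, "byline").text = "by " + page.findtext("author/name")
    return article


def transform_b(article):
    """DTD2 -> DTD3: render the hypothetical <article> as XHTML."""
    center = ET.Element("center")
    ET.SubElement(center, "h1").text = article.findtext("heading")
    ET.SubElement(center, "h3").text = article.findtext("byline")
    return center


def chain(*transforms):
    """Compose transforms left to right into a single callable."""
    return lambda doc: reduce(lambda d, t: t(d), transforms, doc)


page = ET.fromstring(
    "<page><title>This is my article</title>"
    "<author><name>Stefano Mazzocchi</name></author></page>"
)

# The composed function goes from DTD1 to DTD3 in one call.
dtd1_to_dtd3 = chain(transform_a, transform_b)
print(ET.tostring(dtd1_to_dtd3(page), encoding="unicode"))
```

Each step stays small and readable, while the composed function behaves
like the single direct transformation from DTD1 to DTD3.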
XSL:FO
------
FO is a language (an XML DTD) for describing the 2D layout of text in
both printed and digital media. I will not concentrate on the graphical
abilities that FO gives you, but rather on the fact that FO is most of
the time a "final DTD", meaning that a transformation is used to
generate a FO document starting from a general XML file.
The example above would lead to:
<fo:block
    font-size="36pt"
    text-align-last="centered">
  This is my article
</fo:block>
<fo:block
    font-size="24pt"
    text-align-last="centered">
  Stefano Mazzocchi &lt;stefano@apache.org&gt;
</fo:block>
which tells the FO formatter (the rendering engine) how to "draw" and
place the text on the screen or on paper. The FO formatter is not
generally XSLT based, since the result could be a file format which is
not XML compatible (say PDF or PostScript) or some on-screen operations
(such as for FO-aware browsers).
FO and XSLT are created by the same working group and show very high
synergies. I implemented Cocoon to show that the use of XSLT might well
be greater than just styling.
XSP
---
And to do this, considering XSL a very nice way of finally separating
content and style (something that is NEVER entirely possible with HTML),
I moved on and defined a new XML DTD for separating content and logic
for compiled server pages.
XSP (eXtensible Server Pages) is, exactly like XSL:FO, a "final DTD" in
the sense that it is the result of one or more transformation steps and,
exactly like FO, must be rendered by some formatter that is generally
not XSLT based, since the results are not XML documents but source code
that is then compiled into machine code.
In every dynamic page, there is a mix of static content and logic that
work together to create the final result, usually using run-time or
time-dependent input. In dynamic content generation technology, content
and logic are mixed in the same page. XSP is no exception.
XSP defines a syntax to mix static content and programmatic logic in a
way that is independent of the programming language used and of the
binary results that the final source rendering produces.
XSP is just a piece of the framework: exactly like FO mixes style and
content, XSP mixes logic and content. On the other hand, both being XML
DTDs, XSLT can be used to move from pure content to these final DTDs,
placing the style and logic on the transformation layers.
Cocoon 2
--------
So far so good (I hope), but how do I mix those things?
Here is a simple drawing that shows how the Cocoon 2 publishing
framework will solve this problem:
TemplateXML --(Logicsheet)--> XSP --> SourceCode --> Bytecode +
                                                              |
+-------------------(run-time execution)----------------------+
|
+--> DynamicXML --(Stylesheet)--> FO -> FinalDocument
The final document could be PDF or HTML, or the stylesheet may generate
XHTML directly or even plain HTML. In such simple cases, FO formatting
is skipped and the output comes straight from the styling
transformation.
I hope to have answered many questions and to have triggered new ones.
As always, I'm here to hear your feedback. :)
--
Stefano Mazzocchi      A language that doesn't affect the way you
                       think about programming, is not worth knowing.
stefano@apache.org                                     Alan J. Perlis
---------------------------------------------------------------------