XML won. Did we win?
Is there a reason to be sad about XML? There cannot be a reason:
Is there a reason for the user to be happy about XML? There are two:
Happiness everywhere?
The user's paradise will get tools and support. And we have been promised that the complicated SGML things are now as easy as HTML, only more powerful. I do not believe that.
The user will not be any better off, neither in her architectural needs nor in her editorial requirements. Good information webs need effort and design, for weaving the link net and for qualifying the information modules; here there is no difference between XML and SGML. The more information is stored, the more information management systems are needed and the more interesting workflow management becomes.
So the situation will not change dramatically. Hard work there, hard work here. No wizardry.
XML is good and important for the software industry only. Implementing structure becomes easier. Five years too late.
And what have we lost?
The XML revolution was a reaction to the filigree tissues of SGML. A considerable part of SGML is historical. The standard tried to compensate for the times without SGML editors, when capture had to be simplified. And the standard tried to explore unknown land by offering some concepts that were neither thought through nor ever implemented (e.g., the CONCUR and LINK features never made it). When SGML editors evolved, the need for optimized markup vanished.
Some other parts are simply designed poorly. It is, for example, no problem to get rid of the NUMBER attribute type, since this type has never been defined in a consistent and international way. Some other parts of SGML are hard to implement and contradict the requirements of modular documentation. The exceptions (exclusions and inclusions) become an insoluble problem when we divide a document into parts and thereby lose the context.
So it seems as if SGML would have needed a reworking. And because the standard itself does not allow for such deep changes, olá, let us make a new one. The power of the facts ...
We lost something. In terms of syntactic richness, not enough to be against XML, but enough to stick with SGML, if possible, for authoring in more complex, i.e. normal, environments. The list of the most missed features will differ from one SGML expert to another. My list is not too long.
The first report of loss has to do with conversion. It is less serious.
In SGML it is allowed to omit tags at the beginning or at the end of an element when the tags can be deduced from the context. This possibility is declared by the omitted-tag indicators after the element name. So "<!ELEMENT chapter O O (title, a+)>" means that something like "<title> ... </title><a> ... </a> <title> ... </title> ..." will parse correctly in SGML even when the markup for <chapter> is missing. XML requires full markup.
This is not important for editing, since the majority of SGML editors automatically create the full markup. It becomes a life saver for conversions up to SGML.
When using a professional, SGML-aware conversion tool (there are not many; I am using OmniMark), the pattern matching part and the parser can interact. In the vast majority of cases, where non-nested formats have to be transformed into nested ones, it is a big advantage to have the help of the parser for the generation of nesting tags. It may save days of work and testing not to be forced to use stacks and counters in order to find the correct nesting information for the tagging. It has even paid off to generate an intermediate DTD just for the purpose of letting the parser do half of the work.
There might be people out there (e.g. in the Ottawa area) who do such things in an even more sophisticated way using the SHORTREF and USEMAP features. Their complaints might be even louder.
There is hope that tools like OmniMark will get an additional XML mode, so it may be expected that they will not lose their SGML capabilities. New XML-based tools, however, will necessarily be poorer.
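The work that an SGML parser takes over in such a conversion can be illustrated with a small sketch. It is not OmniMark code and not a real SGML parser; it only hard-codes the content model of the `chapter` example, `(title, a+)`, and the function name `infer_chapters` is made up for this illustration. It shows the kind of bookkeeping a converter has to write by hand when the parser cannot infer the omitted `<chapter>` tags.

```python
def infer_chapters(flat):
    """Wrap a flat list of (element name, text) pairs into <chapter> elements.

    Assumes the content model (title, a+): a chapter can only start at a
    <title>, so every <title> closes the open chapter and opens a new one.
    """
    out = []
    open_chapter = False
    paragraphs = 0  # number of <a> elements seen in the current chapter
    for name, text in flat:
        if name == "title":
            if open_chapter:
                if paragraphs == 0:
                    raise ValueError("chapter needs at least one <a>")
                out.append("</chapter>")
            out.append("<chapter>")
            open_chapter, paragraphs = True, 0
        elif name == "a":
            if not open_chapter:
                raise ValueError("<a> before any <title>")
            paragraphs += 1
        else:
            raise ValueError(f"unexpected element: {name}")
        out.append(f"<{name}>{text}</{name}>")
    if open_chapter:
        if paragraphs == 0:
            raise ValueError("chapter needs at least one <a>")
        out.append("</chapter>")
    return "".join(out)


if __name__ == "__main__":
    flat = [("title", "One"), ("a", "first"), ("title", "Two"), ("a", "second")]
    print(infer_chapters(flat))
```

Even this toy needs a flag and a counter for a single level of nesting. With several nested levels the stacks and counters grow quickly, which is the effort the parser saves.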
XML complies with ISO 10646. The 2-byte subset of this standard defines the Basic Multilingual Plane, which is published as Unicode 1.1. XML defines Unicode as the base character set.
XML no longer requires the character entity scheme that was used in SGML. In SGML, special characters have names. There was a lot of freedom for the user, but most stuck to the character sets as published by the ISO. So it was well known that a public identifier "ISO 8879-1986//ENTITIES Added Latin 2//EN" addressed the needs of the Eastern European languages and, consequently, that a character like ž was allowed and defined from then on.
There are three huge advantages with this method:
These advantages are lost with XML.
The complaint is not about Unicode. The complaint is about the fact that XML abolished the layer for defining a set of possible characters. This is my reason number 1 for SGML.
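What such a layer amounts to can be sketched in a few lines. The table below is a hypothetical miniature: it lists only two characters per entity set, whereas the real ISO sets are much larger, and the names `ENTITY_SETS`, `allowed_characters` and `undeclared_characters` are invented for this illustration. The idea is simply that the declared sets define which special characters a document may contain.

```python
# Hypothetical excerpt: public identifier -> {entity name: character}.
LATIN1 = "ISO 8879-1986//ENTITIES Added Latin 1//EN"
LATIN2 = "ISO 8879-1986//ENTITIES Added Latin 2//EN"

ENTITY_SETS = {
    LATIN1: {"auml": "\u00e4", "eacute": "\u00e9"},
    LATIN2: {"zcaron": "\u017e", "scaron": "\u0161"},
}


def allowed_characters(declared_sets):
    """Collect every special character defined by the declared entity sets."""
    allowed = set()
    for public_id in declared_sets:
        allowed.update(ENTITY_SETS[public_id].values())
    return allowed


def undeclared_characters(text, declared_sets):
    """Return the non-ASCII characters in text that no declared set covers."""
    allowed = allowed_characters(declared_sets)
    return sorted({ch for ch in text if ord(ch) > 127 and ch not in allowed})


if __name__ == "__main__":
    # With only Added Latin 2 declared, the e-acute is reported as undeclared.
    print(undeclared_characters("\u017e and \u00e9", [LATIN2]))
```

With Unicode as the only base character set there is no such declaration to check against; any restriction of this kind has to be enforced outside the markup language.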
SGML allows for exceptions to an element's content model.
Inclusions allow the listed elements anywhere within the element's content, including the content of any of its subelements.
Inclusions count as a mortal sin because they hurt any structural rule, so let us be happy that they are out, even if they were useful from time to time.
Exclusions work the other way around: they forbid the use of an element in the element's content and in the content of any of its subelements. They serve two very important purposes:
<!ENTITY % geog-sp "mountain | river | city" >
<!ENTITY % bio-sp "death | birth | profession" >
<!ENTITY % body-content "#PCDATA | em | %geog-sp; | %bio-sp;" >
<!ELEMENT bio-art - - ( keyword, body ) -( %geog-sp; ) >
<!ELEMENT geog-art - - ( keyword, body ) -( %bio-sp; ) >
<!ELEMENT body - - ( %body-content; )* >
This makes it possible to maintain only one %body-content and yet at the same time to increase the quality of the documents and to decrease the number of choices for the authors (and thereby the probability of wrong markup). In reality there are not 2 but 12 different article types, and it is easy to see that XML will force us to model 12 different body elements with different names. This will affect not only the complexity of the DTD but also, and even more, the complexity of all other processes, e.g. rendering, transformations and typesetting.
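The effect of the exclusions on the DTD fragment above can be spelled out with plain sets. This is only a sketch under the assumption that the article types and element names are exactly those of the fragment; the function names are invented for the illustration. It computes what an author may actually use inside `body` for each article type, and what has to be written out explicitly when exclusions are not available.

```python
GEOG_SP = {"mountain", "river", "city"}
BIO_SP = {"death", "birth", "profession"}
BODY_CONTENT = {"#PCDATA", "em"} | GEOG_SP | BIO_SP

# Exclusions declared per article type, following the DTD fragment in the text.
EXCLUSIONS = {"bio-art": GEOG_SP, "geog-art": BIO_SP}


def effective_body_content(article_type):
    """One shared body model, narrowed by the exclusions of the article type."""
    return BODY_CONTENT - EXCLUSIONS.get(article_type, set())


def models_without_exclusions(article_types):
    """Without exclusions, every article type needs its own body element."""
    return {
        f"{article_type}-body": sorted(effective_body_content(article_type))
        for article_type in article_types
    }


if __name__ == "__main__":
    for name, model in models_without_exclusions(["bio-art", "geog-art"]).items():
        print(name, model)
```

With 12 article types the second function returns 12 separately named body models, and every stylesheet and transformation has to know all of them.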
XML has, however, good reasons for not allowing exceptions. They require extensive and difficult tracing of the context. And in times of document-less information modules such contexts simply do not exist anymore, so we will have to give up this feature sooner or later.
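Why the context matters can be shown with a toy checker. It is a sketch, not a validator: it knows only a single hypothetical exclusion table modelled on the `bio-art` example, and it uses Python's `xml.etree.ElementTree` merely as a convenient tree structure. The point is that the set of forbidden elements must be carried down from every ancestor, so a fragment checked on its own gives a different answer than the same fragment checked in place.

```python
import xml.etree.ElementTree as ET

# Hypothetical exclusion table, after the bio-art example in the text.
EXCLUSIONS = {"bio-art": {"mountain", "river", "city"}}


def violations(element, inherited=frozenset()):
    """Return the tags of all elements forbidden by an ancestor's exclusion."""
    found = [element.tag] if element.tag in inherited else []
    # Exclusions accumulate on the way down the tree.
    active = inherited | EXCLUSIONS.get(element.tag, set())
    for child in element:
        found.extend(violations(child, active))
    return found


if __name__ == "__main__":
    doc = ET.fromstring(
        "<bio-art><keyword>k</keyword>"
        "<body>Born near the <river>Rhine</river>.</body></bio-art>"
    )
    print(violations(doc))               # checked in context
    print(violations(doc.find("body")))  # checked as an isolated module
```

Checked in context, the `river` element is reported; checked as an isolated `body` module, nothing is reported, because the excluding ancestor is no longer there.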
Nonetheless and due to the practical impact: this is my reason number 2 for SGML.
There might be more reasons for others to miss SGML which are beyond my professional experience or of minor importance. Some examples:
So one or the other might have additional reasons to stick with SGML.
The most important reason however currently and hopefully temporarily is the following.
DTDs are the base of the new information age. When modeling information for customers there is no way to do this in a professional way without a DTD. And there are no document types, except generated ones, where we would not need a DTD for data capture and data maintenance. So we need DTD based editing environments.
As long as the XML-based applications are so awfully poor, there is no way to avoid outstanding tools such as Arbortext's ADEPT editor. And there is a huge distance from these established tools to the ones coming out of the garages. So we need either a huge investment (just guess who could make it!) or we need to wait quite a long time until the user friendliness, robustness and compliance of the current SGML tools are reached. It is not only mastering the complexity of SGML that makes a good editor; it is much more, and the tool makers overestimate the complexity of SGML and underestimate the complexity of a good structure editor.
The lack of tools is my reason number 3 for SGML. It is the most important one, but hopefully one which will be overcome before long.
Want to know what development will get my first prize?
The editor that manages to interpret the XML Data DTDs (see the proposal on the W3C server). If conditional content were added, and if the mapping ideas were augmented by concepts for element in context and generation of text, then we would get a completely new SGML or XML. And there is already another proposal: "Schema for object-oriented XML", SOX for short, which takes a similar approach to this topic. My god, what shall we call such a Super Markup Language?
SML?
An XML editor will not be enough for this language, no illusions, please. In the disguise of XML we will get a new language much more powerful than SGML, much more complicated than XML, and exactly the one we need.
Then XML will have been a step towards popularizing such great ideas. In this sense we already know that we needed XML after SGML in order to reach this SML.
So: no reason to be sad about XML.
© Organon Knowledge Architectures 1998
Publication: 1998