Hi:
On Oct 27, 2005, at 09:47 , Rees, NP (Nick) wrote:
> .... With all the changes to the
> V4 db file syntax, have we considered adopting XML?
I thought the same when rewriting the archiver toolset,
and now it uses XML formats for all its configuration files.
> 1. The conversion between the V3 .db and .dbd file format and XML is
> trivial (The same cannot be said of the V4 format as it currently
> exists).
Indeed, it was fairly simple to write a Perl script that converts
the "old" archive config file format into the new one.
Since nobody at the SNS generates the config files
directly in XML (e.g. from an RDB),
and everybody dislikes editing the XML files by hand,
I think everybody still uses the "old" format plus the converter.
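To show why such a conversion is mostly mechanical, here is a minimal sketch in Python, whose standard library includes ElementTree (it is not the Perl converter mentioned above). It assumes a simplified subset of V3 record syntax, record(type, "name") { field(NAME, "value") }, with no nested braces or comments, and emits the <database>/<record> layout from the example further down. The regular expressions, the helper name db_to_xml() and the mapping of field names to lowercase elements are illustrative assumptions, not the real V3 grammar or any agreed schema.

```python
import re
import xml.etree.ElementTree as ET

# Assumed, simplified V3-style syntax: record(type, "name") { field(NAME, "value") ... }
RECORD_RE = re.compile(r'record\(\s*(\w+)\s*,\s*"([^"]*)"\s*\)\s*\{(.*?)\}', re.S)
FIELD_RE = re.compile(r'field\(\s*(\w+)\s*,\s*"([^"]*)"\s*\)')


def db_to_xml(text):
    """Convert the simplified record syntax into <database><record>...</record></database>."""
    database = ET.Element("database")
    for rtype, name, body in RECORD_RE.findall(text):
        record = ET.SubElement(database, "record")
        ET.SubElement(record, "type").text = rtype
        ET.SubElement(record, "name").text = name
        for field, value in FIELD_RE.findall(body):
            # Hypothetical mapping: field SCAN becomes element <scan>.
            ET.SubElement(record, field.lower()).text = value
    # ElementTree escapes &, < and > in values while serializing.
    return ET.tostring(database, encoding="unicode")


if __name__ == "__main__":
    src = 'record(ai, "fred") { field(SCAN, "1") }\nrecord(bo, "jane") { }'
    print(db_to_xml(src))
```

A real .db converter would need a proper tokenizer for quoting, comments and the rest of the grammar, but the mapping itself stays this direct.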
> 2. Parsers exist for all languages and platforms.
True. The portions of the archiver toolset written in C++
support both expat (simple, no validation) and Xerces (big, DTD validation).
Compilation on the various Unix variants works fine.
> 3. Validation definition languages are available and so we can formally
> define the file in a way that it can be automatically validated (at
> least to a large degree). This removes the need for a lot of bespoke
> validation.
This is where I would have hoped for more.
Assume your DTD should enforce something like this:
<database>
  <record>
    <type>ai</type>
    <name>fred</name>
    <scan>1</scan>
  </record>
  <record>
    <type>bo</type>
    <name>jane</name>
  </record>
</database>
A simple parser based on expat only checks that the tags are properly nested.
For example, <A> <B> </B> </A> is OK,
while <A> <B> </A> </B> gives an error message.
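Python's standard-library binding to expat makes this easy to observe. The sketch below registers the start-tag, end-tag and character-data callbacks and records what expat reports; the helper name trace() is hypothetical. Expat rejects the mis-nested input but happily accepts any text inside an element, including a non-numeric "scan".

```python
import xml.parsers.expat


def trace(xml_text):
    """Parse with expat; return the list of callback events, or an error string."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # expat only reports tags and text as it reads them; it never checks what they mean.
    parser.StartElementHandler = lambda name, attrs: events.append("start " + name)
    parser.EndElementHandler = lambda name: events.append("end " + name)
    parser.CharacterDataHandler = lambda data: events.append("text " + repr(data))
    try:
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as err:
        return "error: " + xml.parsers.expat.ErrorString(err.code)
    return events


if __name__ == "__main__":
    print(trace("<A> <B> </B> </A>"))
    print(trace("<A> <B> </A> </B>"))
    print(trace("<record><scan>soon</scan></record>"))
```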
With Xerces and a DTD, I managed to validate things like:
- that each file contains a "database"
- that each "database" has zero or more "record" entries
- that each "record" contains a "type" and a "name"
- that "type" contains "ai" or "bo" or ... (some fixed list of options)
- that a "record" may, but need not, contain a "scan" entry
What it didn't do:
- assert that "scan" contains a number.
  All I found was "#PCDATA", which allows any text.
What XML doesn't seem to handle in general:
- Include files.
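One existing candidate, offered only as a pointer: the W3C XInclude specification defines an <xi:include> element, and Python's standard library has a basic implementation in xml.etree.ElementInclude. The sketch below resolves includes from an in-memory dictionary so it runs without files; the file names, the FILES dictionary and the loader function are hypothetical, and the sketch says nothing about what expat or Xerces offer.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical "files", kept in memory so the example is self-contained.
FILES = {
    "fred.xml": "<record><type>ai</type><name>fred</name><scan>1</scan></record>",
    "jane.xml": "<record><type>bo</type><name>jane</name></record>",
}

MAIN = """<database xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="fred.xml"/>
  <xi:include href="jane.xml"/>
</database>"""


def loader(href, parse, encoding=None):
    # ElementInclude calls this once per xi:include; for parse="xml" it expects an Element.
    if parse != "xml":
        raise ValueError("only parse='xml' is handled in this sketch")
    return ET.fromstring(FILES[href])


def load_database():
    root = ET.fromstring(MAIN)
    ElementInclude.include(root, loader=loader)  # replaces each xi:include in place
    return root


if __name__ == "__main__":
    for record in load_database():
        print(record.findtext("type"), record.findtext("name"))
```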
Granted, the checking provided by Xerces is already helpful,
but there's still a lot of programming to do.
Either you get a callback for each opening and closing tag
as well as for the character data in between,
which is very close to parsing it all yourself,
or you get an in-memory tree of the file,
and then you still have to check whether your "record fred" subtree
has a "scan" child node, and whether it's numeric.
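Here is what that leftover work can look like with an in-memory tree, as a minimal sketch using Python's ElementTree rather than Xerces. It re-checks the structure of the example above and adds the numeric test on "scan" that #PCDATA cannot express; the function name check_database(), the ALLOWED_TYPES set and the decision to accept anything float() accepts are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed list of permitted record types; a real list would be longer.
ALLOWED_TYPES = {"ai", "bo"}


def check_database(xml_text):
    """Return a list of problems in a <database> document; an empty list means OK."""
    root = ET.fromstring(xml_text)  # raises ParseError only for malformed XML
    if root.tag != "database":
        return ["root element is <%s>, expected <database>" % root.tag]
    problems = []
    for record in root.findall("record"):
        name = record.findtext("name")
        if not name:
            problems.append("record without a name")
            continue
        rtype = record.findtext("type")
        if rtype not in ALLOWED_TYPES:
            problems.append("record %s: unknown type %r" % (name, rtype))
        scan = record.find("scan")  # optional child element
        if scan is not None:
            try:
                float(scan.text or "")  # the check that #PCDATA cannot express
            except ValueError:
                problems.append("record %s: scan %r is not a number" % (name, scan.text))
    return problems
```

Even with a validating parser doing the structural part, the numeric check and the error reporting remain hand-written code like this.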
Since type checking and include mechanisms are generally useful,
maybe they're there and I just didn't see it?
So I still have code that, e.g., reads an "engine configuration".
Before, it read the ASCII file line by line; now it walks the output
of the XML parser. The amount of code I had to write is
the same or even larger in the XML case, since I support both expat and
Xerces.
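For comparison, a minimal sketch of both kinds of reader for a hypothetical engine configuration with two numeric settings. The setting names write_period and get_threshold, the "key value" line format with "#" comments, and the <engineconfig> root element are assumptions for illustration, not the actual archive engine formats.

```python
import xml.etree.ElementTree as ET

# Hypothetical setting names; both readers return the same dict of floats.
KEYS = ("write_period", "get_threshold")


def read_ascii(text):
    """Old style: one 'key value' pair per line, '#' starts a comment."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, value = line.split(None, 1)
        if key in KEYS:
            settings[key] = float(value)
    return settings


def read_xml(text):
    """New style: the parser handles the syntax, but values still arrive as text."""
    settings = {}
    root = ET.fromstring(text)
    for key in KEYS:
        value = root.findtext(key)
        if value is not None:
            settings[key] = float(value)
    return settings
```

Both readers come out about the same size: the XML one swaps line splitting for tree lookups but still converts and checks each value itself.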
So what does this all mean?
I think XML is just a more restrictive file format than plain ASCII.
I don't see how it has made my life any easier.
But maybe that's because I used it only for one version of the archive
config files, and all will be much better once every EPICS tool has used
XML files across several generations of tools,
where convertibility and backwards compatibility matter more.
-Kay