Experimental Physics and
Industrial Control System

Claude Saunders <[email protected]> · Thu, 27 Oct 2005 09:54:45 -0500

I strongly support using XML over custom formats. I would also suggest
looking into XSD (XML Schema Definition) instead of DTD for
validation. XSD has much more functionality than DTD, and is more
readable as well. The downside, I suppose, is that XSD parsers/validators
may be less prevalent than those for DTD. But XSD has pretty much
completely superseded DTD in the corporate website world, so if there is
any way we could use it instead, I think we should. Even if you don't
use XSD to validate, it's a great format for specifying the syntax, even
if only as reference documentation.

Yes, XML is harder to hand edit properly (i.e. with vi), but shouldn't we
be creating some tools here to coordinate the creation of dbd and db
data? I would think a simple set of plugins for EPICS-Office would allow
for this.

  - Claude

Kay-Uwe Kasemir wrote:

Hi:

On Oct 27, 2005, at 09:47 , Rees, NP (Nick) wrote:

.... With all the changes to the
V4 db file syntax, have we considered adopting XML?

I thought the same when rewriting the archiver toolset,
and now it uses XML formats for all its configuration files.

1. The conversion between the V3 .db and .dbd file format and XML is
trivial (The same cannot be said of the V4 format as it currently
exists).

Indeed it was fairly simple to create a perl script that converts
the 'old' archive config file format into the new one.
Since nobody at the SNS is generating the config files
directly as XML from, e.g., an RDB,
and everybody dislikes manual editing of the XML file format,
I think everybody still uses the "old" format & the converter.

2. Parsers exist for all languages and platforms.

True. The portions of the archiver toolset written in C++
support expat (simple, no checking) and Xerces (big, DTD checking).
Compilation for the various variants of Unix works OK.

3. Validation definition languages are available and so we can formally
define the file in a way that it can be automatically validated (at
least to a large degree). This removes the need for a lot of bespoke
validation.

This is where I would have hoped for more.
Assume your DTD should enforce something like this:
<database>
    <record>
        <type>ai</type>
        <name>fred</name>
        <scan>1</scan>
    </record>
    <record>
        <type>bo</type>
        <name>jane</name>
    </record>
</database>

A simple parser based on expat would only check the nesting of tags.
I.e.   <A> <B> </B> </A> is OK,
but    <A> <B> </A> </B> gives an error message.
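
To make the "nesting only" point concrete, here is a minimal sketch using
the expat binding from the Python standard library (not the C++ code of the
archiver toolset). The helper name is_well_formed is made up for this
illustration. It shows that a non-validating expat parse rejects crossed
tags but accepts any record type, since it knows nothing about the
intended file format.

```python
import xml.parsers.expat


def is_well_formed(text):
    """Return (True, None) if expat accepts the text, else (False, message).

    Hypothetical helper for illustration: it only reports well-formedness,
    which is all that a non-validating parser can check.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)  # True: this is the final chunk of input
    except xml.parsers.expat.ExpatError as err:
        return False, str(err)
    return True, None


# Properly nested tags are accepted.
print(is_well_formed("<A> <B> </B> </A>"))

# Crossed tags are rejected with a "mismatched tag" style error.
print(is_well_formed("<A> <B> </A> </B>"))

# A nonsensical record type is still well-formed XML, so expat accepts it.
print(is_well_formed("<database><record><type>bogus</type></record></database>"))
```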

With xerces, I managed to have it validate things like this:
- that each file contains a "database"
- that there's zero or more "record" entries per "database"
- that each "record" contains a "type" and "name".
- that "type" contains "ai" or "bo" or ... (some fixed list of options)
- there might be a "scan" entry, but doesn't have to be

What it didn't do:
- Assert that "scan" contains a number.
  All I found was "#PCDATA", which can be any text.

What XML doesn't seem to handle in general:
- Include files.
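
One include mechanism that does exist for XML is XInclude, a W3C
recommendation. Whether a given parser supports it has to be checked per
parser. As a hedged sketch, the Python standard library handles it through
xml.etree.ElementInclude. The file names, the in-memory FILES table and the
load/load_database helpers below are assumptions made up so that the
example is self-contained.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical file contents, kept in a dict instead of on disk.
FILES = {
    "main.xml": """<database xmlns:xi="http://www.w3.org/2001/XInclude">
    <record><type>ai</type><name>fred</name><scan>1</scan></record>
    <xi:include href="jane.xml"/>
</database>""",
    "jane.xml": "<record><type>bo</type><name>jane</name></record>",
}


def load(href, parse, encoding=None):
    """Loader callback for ElementInclude: resolve href in the FILES dict."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]  # parse == "text": return the raw text


def load_database(name):
    """Parse a document and expand its xi:include elements in place."""
    root = ET.fromstring(FILES[name])
    ElementInclude.include(root, loader=load)
    return root


# After expansion, the included record is an ordinary child of <database>.
names = [rec.findtext("name") for rec in load_database("main.xml").findall("record")]
print(names)
```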

Agreed, the checking provided by Xerces is already helpful,
but there's still a lot of programming to do:
You either get a callback for each "opening" and "closing" tag
as well as for the character data in between,
which is very close to parsing it all yourself,
or you get an in-memory tree structure of the file,
and then you still have to check if there's a "scan" child node
in your "record fred" subtree, and if it's numeric.
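
The kind of hand-written checking on an in-memory tree described above
might look roughly like the following. This is a sketch in Python with the
standard xml.etree.ElementTree module, not the archiver's C++ code. The
function name check_database and the KNOWN_TYPES set are assumptions for
illustration, and "numeric" is taken to mean "anything float() accepts".

```python
import xml.etree.ElementTree as ET

# Assumption: stand-in for the real, longer list of record types.
KNOWN_TYPES = {"ai", "bo"}


def check_database(text):
    """Return a list of problems found in a <database> document."""
    root = ET.fromstring(text)
    if root.tag != "database":
        return [f"root element is <{root.tag}>, expected <database>"]

    problems = []
    for index, record in enumerate(root.findall("record")):
        name = record.findtext("name")  # None if there is no <name> child
        label = name if name else f"record #{index}"
        if name is None:
            problems.append(f"{label}: missing <name>")

        rtype = record.findtext("type")
        if rtype is None:
            problems.append(f"{label}: missing <type>")
        elif rtype.strip() not in KNOWN_TYPES:
            problems.append(f"{label}: unknown type {rtype!r}")

        # <scan> is optional, but if present it has to be numeric.
        scan = record.findtext("scan")
        if scan is not None:
            try:
                float(scan)
            except ValueError:
                problems.append(f"{label}: scan {scan!r} is not a number")
    return problems


example = """<database>
    <record><type>ai</type><name>fred</name><scan>fast</scan></record>
    <record><type>bo</type><name>jane</name></record>
</database>"""
for problem in check_database(example):
    print(problem)
```

Even in this short form, most of the lines are format-specific checks
rather than parsing, which is the point being made about the remaining
programming effort.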

Since type checking and include mechanisms are generally useful,
maybe they're there and I just didn't see them?
So I still have code that e.g. reads an "engine configuration".

Before, it read the ASCII file line by line. Now it parses the result
of the XML parser. The amount of code that I had to write is
the same or even more in the XML case, since I support both expat and
Xerces.

So what does this all mean?
I think XML is just a more restrictive file format than plain ASCII.
I don't see how it has made my life any easier.

But maybe that's because I used it only for one version of archive
config files,
and all will be much better when every EPICS tool uses XML files
for several generations of tools,
so convertibility and backwards-compatibility become more important.

-Kay
