EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Help with streamDevice parsing HTML
From: Dirk Zimoch <[email protected]>
To: Rod Nussbaumer <[email protected]>
Cc: [email protected]
Date: Fri, 06 Mar 2009 12:15:28 +0100
Hi Rod

Rod Nussbaumer wrote:
Dirk Zimoch wrote:
There was a mistake in my previous mail. You must specify the field in the redirections (usually VAL).

    in "%*/Measure\(PPM\):/%f"
       "%*/Sample flow\(CC\):/%(\$1:FLOW.VAL)f"
       "%*/Scale:/%(\$1:SCALE.VAL)s"
       "%*/System status:/%(\$1:STATUS.VAL)s";

Dirk


Thanks for your help. I do now have a workable solution for this application. It turns out that the application actually does not require the use of regular expressions. However, I am still confused about the syntax used to extract data matching a regex, and especially where there are multiple components to be extracted.

In the example protocol file, there is an entry like:

in "%.1/<TITLE>(.+)<\/TITLE>/";

As I understand it, the notation "%.1" says to use the data matched by the first parenthesized component (i.e. the TITLE element data). Logically, it follows that one could match a second parenthesized regex component and assign the matching data to a specified record field. Is this true, and if so, what notation would be used?

No, you can read only one value per %. The .1 notation lets you select a parenthesized component other than the first one, but it does not let you assign more than one value.
Example:
in "%.2/<(TITLE|title|Title)>(.+)<\/(TITLE|title|Title)>/";
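To illustrate the sub-expression selection, here is a minimal Python sketch of the same idea. It assumes that Python's `re` module behaves like the regular expression engine used by StreamDevice for this simple pattern; the input line is hypothetical.

```python
import re

# Hypothetical input line; the pattern mirrors the protocol example above.
html = "<title>Gas Analyzer</title>"
pattern = re.compile(r"<(TITLE|title|Title)>(.+)</(TITLE|title|Title)>")

match = pattern.search(html)
# Group 1 is the opening tag name, group 2 the text between the tags,
# group 3 the closing tag name. Choosing ".2" corresponds to group 2.
title = match.group(2)
print(title)
```

Only one group is returned per conversion, which is why a second value needs a second conversion.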

An example of this, using HTML again, might be:

<FONT color="black" size="6">

One can devise a regex that matches the two quoted values of the attributes, 'color' and 'size'; something like

/<FONT\s+color="(.+)"\s+size="(.+)">/

which contains two parenthesized parts of the regular expression, and which I might like to assign to separate records. Can this be done, and if so, what notation would be required?

In that case you need 2 regular expressions:
in '%.1/<FONT\s+color="([^"]+)"/%(OtherRecord.VAL).1/^\s+size="([^"]+)">/';

You can use ^ in the second expression to make sure that it immediately follows the first one.
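The chaining of two expressions can be sketched in Python as follows. This is only an emulation under the assumption that each expression is applied to whatever input remains after the previous match; the helper `parse_in_sequence` is a hypothetical name, not part of StreamDevice. The captures are written as `[^"]+` so that each one stops at the next quote; in Python a greedy `.+` would run on to the last quote in the line.

```python
import re

def parse_in_sequence(text, patterns):
    """Apply each pattern to the input left over from the previous match.

    Returns the first captured group of every pattern, or None as soon as
    one pattern fails to match.
    """
    remaining = text
    values = []
    for pattern in patterns:
        m = re.search(pattern, remaining)
        if m is None:
            return None
        values.append(m.group(1))
        # Consume everything up to the end of this match.
        remaining = remaining[m.end():]
    return values

line = '<FONT color="black" size="6">'
patterns = [
    r'<FONT\s+color="([^"]+)"',   # first value
    r'^\s+size="([^"]+)">',       # ^ forces a match right after the first
]
print(parse_in_sequence(line, patterns))
```

With the greedy form `(.+)` in both patterns, the first capture swallows `black" size="6` in this emulation and the second pattern has nothing left to match.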

Unfortunately, a regexp conversion always yields a string. If you want to parse numbers, it is often better to use the regexp only to find the correct location and then to parse the number with one of the other formats:

in '%*/<FONT[^>]+size="/%f">';
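The "locate with a regexp, then parse the number" idea can be sketched in Python like this. The numeric pattern is a simplified stand-in for what `%f` accepts, and the input line is hypothetical.

```python
import re

line = '<FONT color="black" size="6">'

# Step 1: the regexp only finds the position; its match is discarded.
locator = re.compile(r'<FONT[^>]+size="')
m = locator.search(line)
rest = line[m.end():]

# Step 2: parse a number at that position (simplified stand-in for %f).
num = re.match(r'[-+]?\d+(?:\.\d*)?', rest)
size = float(num.group(0))

# Step 3: the literal text after the number must follow, as in '%f">'.
tail_ok = rest[num.end():].startswith('">')
print(size, tail_ok)
```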

When parsing text, keep in mind that StreamDevice always parses in one pass from start to end. Values must therefore always appear in the order defined by the protocol, so the same protocol cannot handle both of:
<FONT color="black" size="6">
<FONT size="6" color="black">

One can probably do that with an "I/O Intr" record. The master record fetches the page and parses one value with
in '%.1/<FONT.*?color="([^"]+)".*?>/';
The slave record is "I/O Intr" and parses the other value:
in '%.1/<FONT.*?size="([^"]+)".*?>/';

In this case, both records parse the same input from start to end. Thus the slave record should probably have more context in its expression to get the correct FONT value (it would get the first in this example).
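A rough Python emulation of the two-record approach: each expression scans the same input independently from the start, so attribute order no longer matters. The function `parse_both` and the sample inputs are hypothetical, and the captures use `[^"]+` so that each stops at the next quote.

```python
import re

COLOR_RX = re.compile(r'<FONT.*?color="([^"]+)".*?>')
SIZE_RX = re.compile(r'<FONT.*?size="([^"]+)".*?>')

def parse_both(page):
    """Run both expressions over the whole input, as two records would."""
    color = COLOR_RX.search(page)
    size = SIZE_RX.search(page)
    return (color.group(1) if color else None,
            size.group(1) if size else None)

print(parse_both('<FONT color="black" size="6">'))
print(parse_both('<FONT size="6" color="black">'))

# Without extra context the two values can come from different tags:
mixed = '<FONT color="red">x</FONT><FONT color="black" size="6">'
print(parse_both(mixed))
```

The last call returns the color of the first tag and the size of the second, which is the kind of mismatch that more context in the expression is meant to prevent.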

Of course, this method is quite inefficient for getting many values, because each value requires a separate record and each record has to parse the complete input. It is better to use a dedicated HTML/XML parser for that.

Dirk

--
Dr. Dirk Zimoch
Paul Scherrer Institut, WBGB/006
5232 Villigen PSI, Switzerland
Phone +41 56 310 5182
