Hi Rod
Rod Nussbaumer wrote:
Dirk Zimoch wrote:
There was a mistake in my previous mail. You must specify the field in
the redirections (usually VAL).
in "%*/Measure\(PPM\):/%f"
"%*/Sample flow\(CC\):/%(\$1:FLOW.VAL)f"
"%*/Scale:/%(\$1:SCALE.VAL)s"
"%*/System status:/%(\$1:STATUS.VAL)s";
Dirk
Thanks for your help. I do now have a workable solution for this
application. It turns out that the application actually does not require
the use of regular expressions.
However, I am still confused about the syntax used to extract data
matching a regex, and especially where there are multiple components to
be extracted.
In the example protocol file, there is an entry like:
in "%.1/<TITLE>(.+)<\/TITLE>/";
As I understand it, the notation "%.1" says to use the data matched by
the first parenthesized component (i.e. the TITLE element data).
Logically, it follows that one could match a second parenthesized regex
component, and assign the matching data to a specified record field. Is
this true, and if so, what notation would be used?
No, you can read only one value per % conversion. The .1 notation lets you
select a parenthesized component other than the first one, but it does not let
you assign more than one.
Example:
in "%.2/<(TITLE|title|Title)>(.+)<\/(TITLE|title|Title)>/";
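To illustrate which part ".2" picks out, here is a small Python sketch. It uses Python's re module as a stand-in for the regex engine in StreamDevice (an assumption; the two should agree for this pattern). The function name and the sample input are made up for the illustration.

```python
import re

# Same pattern as in the protocol line above: group 1 is the opening tag
# name, group 2 is the text between the tags, group 3 is the closing tag name.
TITLE_RE = re.compile(r"<(TITLE|title|Title)>(.+)</(TITLE|title|Title)>")

def select_group(text, index):
    """Return the text of capture group `index`, or None if nothing matches."""
    match = TITLE_RE.search(text)
    return match.group(index) if match else None

sample = "<html><title>Gas analyzer</title></html>"  # hypothetical input
print(select_group(sample, 1))  # the tag name
print(select_group(sample, 2))  # the title text, i.e. what ".2" selects
```

Only one group is returned per call, which mirrors the rule that one % conversion yields one value.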
An example of this, using HTML again, might be:
<FONT color="black" size="6">
One can devise a regex that matches the two quoted values of the
attributes, 'color' and 'size'; something like
/<FONT\s+color="(.+)"\s+size="(.+)">/
which contains two parenthesized parts of the regular expression, and
which I might like to assign to separate records. Can this be done, and
if so, what notation would be required?
In that case you need 2 regular expressions:
in '%.1/<FONT\s+color="([^"]+)"/%(OtherRecord.VAL).1/^\s+size="([^"]+)">/';
You can use ^ in the second expression to make sure that it immediately follows
the first one.
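The two-step idea can be tried out in Python. This is only a sketch under the assumption that the second expression is applied to whatever input the first one left over; Python's re module stands in for the StreamDevice regex engine, and parse_font is a made-up helper name. The captures are written as [^"]+ so that each one stops at its closing quote (a greedy .+ would run on to the last quote in the tag).

```python
import re

COLOR_RE = re.compile(r'<FONT\s+color="([^"]+)"')
# re.match() only matches at the start of the string it is given, which
# plays the role of the leading ^ in the second protocol expression.
SIZE_RE = re.compile(r'\s+size="([^"]+)">')

def parse_font(text):
    """Return (color, size), or None if either step fails."""
    first = COLOR_RE.search(text)
    if first is None:
        return None
    rest = text[first.end():]      # input left over after the first match
    second = SIZE_RE.match(rest)   # must start right where the first ended
    if second is None:
        return None
    return first.group(1), second.group(1)

print(parse_font('<FONT color="black" size="6">'))
print(parse_font('<FONT color="black" face="x" size="6">'))
```

The second call fails because another attribute sits between color and size, so the anchored expression does not match immediately after the first one.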
Unfortunately, a regexp is always a string. If you want to parse numbers, it is
often better to use the regexp only to find the correct location and then to
parse the number with one of the other formats:
in '%*/<FONT[^>]+size="/%f">';
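As a rough Python picture of "find the location with a regex, then parse the number separately": the sketch below skips to the size attribute with a regex and then converts the digits that follow. NUMBER_RE is a simplified stand-in for what %f accepts, and parse_size is a hypothetical helper, not anything from StreamDevice.

```python
import re

PREFIX_RE = re.compile(r'<FONT[^>]+size="')    # only locates the position
NUMBER_RE = re.compile(r'[-+]?\d+(?:\.\d*)?')  # simplified idea of %f

def parse_size(text):
    """Return the size attribute as a float, or None if parsing fails."""
    prefix = PREFIX_RE.search(text)
    if prefix is None:
        return None
    number = NUMBER_RE.match(text, prefix.end())  # parse right after the prefix
    if number is None:
        return None
    if not text.startswith('">', number.end()):   # literal text after the number
        return None
    return float(number.group())

print(parse_size('<FONT color="black" size="6">'))
print(parse_size('<FONT size="big">'))
```

The value comes back as a number rather than a string, which is the point of handing the conversion to a numeric format.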
When parsing text, keep in mind that StreamDevice always parses in one pass from
start to end. That means that values must always appear in the same order as
defined by the protocol, so the same protocol cannot handle both of:
<FONT color="black" size="6">
<FONT size="6" color="black">
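The order dependence can be reproduced with a short Python sketch. It assumes that each step continues on the input left over by the previous step; parse_in_order and PROTOCOL are made-up names, and Python's re module is only a stand-in for the real parser.

```python
import re

def parse_in_order(text, patterns):
    """Apply each pattern to the input left over by the previous one."""
    values = []
    pos = 0
    for pattern in patterns:
        match = re.compile(pattern).search(text, pos)
        if match is None:
            return None            # one failed step fails the whole parse
        values.append(match.group(1))
        pos = match.end()          # never look back at consumed input
    return values

# Hypothetical "protocol": color first, then size.
PROTOCOL = [r'color="([^"]+)"', r'size="([^"]+)"']

print(parse_in_order('<FONT color="black" size="6">', PROTOCOL))
print(parse_in_order('<FONT size="6" color="black">', PROTOCOL))
```

With the attributes swapped, the size has already been passed by the time the second pattern is tried, so the parse fails.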
One can probably do that with an "I/O Intr" record. The master record fetches
the page and parses one value with
in '%.1/<FONT.*?color="([^"]+)".*?>/';
The slave record is "I/O Intr" and parses the other value:
in '%.1/<FONT.*?size="([^"]+)".*?>/';
In this case, both records parse the same input from start to end. Thus the
slave record should probably have more context in its expression to get the
correct FONT value (it would get the first in this example).
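In Python terms, the two-record approach amounts to running two independent searches over the same complete input. This is a sketch only: parse_independently is a made-up name, and the patterns use [^>]* instead of .*? to stay inside one tag, which is my own choice here.

```python
import re

# Each "record" has its own expression and sees the whole page.
COLOR_RE = re.compile(r'<FONT[^>]*?color="([^"]+)"[^>]*>')
SIZE_RE = re.compile(r'<FONT[^>]*?size="([^"]+)"[^>]*>')

def parse_independently(page):
    """Return (color, size); each value is None if its pattern finds nothing."""
    color = COLOR_RE.search(page)
    size = SIZE_RE.search(page)
    return (color.group(1) if color else None,
            size.group(1) if size else None)

print(parse_independently('<FONT color="black" size="6">'))
print(parse_independently('<FONT size="6" color="black">'))
# With several FONT tags, each search stops at the first tag that matches.
print(parse_independently('<FONT color="red" size="1"><FONT color="blue" size="2">'))
```

Both attribute orders work, but as the last call shows, each expression needs extra context if the wanted tag is not the first match on the page.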
Of course this method is quite inefficient for getting many values, because each
value requires a separate record and each of them has to parse the complete
input. It is better to use a dedicated HTML/XML parser for that.
Dirk
--
Dr. Dirk Zimoch
Paul Scherrer Institut, WBGB/006
5232 Villigen PSI, Switzerland
Phone +41 56 310 5182