EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Help with streamDevice parsing HTML
From: Dirk Zimoch <[email protected]>
To: Rod Nussbaumer <[email protected]>
Cc: [email protected]
Date: Thu, 05 Mar 2009 14:37:26 +0100
Hi Rod,

First of all, your regexp ^MeasurePM*([0-9.]+) requires that the HTML input starts with MeasurePM, which is not the case. The ^ at the beginning is the problem. It does not anchor to the beginning of a line but to the beginning of the parsed input, that is, all input minus the text that earlier formats have already consumed. In your case that is the complete reply from the web server, including the HTTP header "HTTP/1.0 ..." which a browser does not show.
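
To see the effect of the leading ^ outside the IOC, here is a small Python sketch. It uses Python's re module, not PCRE or StreamDevice, but for this pattern the anchoring behaves the same way. The reply text is an abbreviated stand-in: only "HTTP/1.0 200 OK" is taken from the error message, the Content-Type header line is an assumption.

```python
import re

# Abbreviated stand-in for the server reply: HTTP header first, then the page.
# Only the status line comes from the log; the Content-Type line is assumed.
reply = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<HTML><BODY><PRE>\n"
    "Measure(PPM): 2.3\n"
    "Sample flow(CC): 50.0   Scale:0-10   PPM\n"
    "System status:OK  </PRE></BODY></HTML>"
)

# With ^ the pattern must match at the very start of the input.
anchored = re.search(r"^Measure\(PPM\):\s*([0-9.]+)", reply)

# Without ^ the search skips the leading non-matching text.
unanchored = re.search(r"Measure\(PPM\):\s*([0-9.]+)", reply)

print(anchored)             # None: the input starts with "HTTP/1.0", not "Measure"
print(unanchored.group(1))  # 2.3
```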

Parsing multiple values from one input is possible with the syntax %(otherrecord.VAL)... but only if the values always come in the same order.

What you can do is (assuming constant order):
* find "Measure(PPM):"
* read value
* find "Sample flow(CC):"
* read value
* find "Scale:"
* read value
* find "System status:"
* read value

Each find can be done with a regexp which is then ignored with %*:

read {
    extraInput=ignore;
    interminator = "</HTML>";  # terminators can have arbitrary length
    connect 1000;              # connect to server, 1 second timeout
    out "GET http://142.90.132.71/lcd.cgi";     # HTTP request
    in "%*/Measure\(PPM\):/%f"
       "%*/Sample flow\(CC\):/%(\$1:FLOW.VAL)f"
       "%*/Scale:/%(\$1:SCALE)s"
       "%*/System status:/%(\$1:STATUS)s";
    disconnect;                # the server closes after sending the page, so do we
}
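
The find-then-read sequence of this protocol can be tried out on the sample page with a short Python sketch. This is only an emulation under assumptions: skip() stands in for an ignored regexp format (%*/.../), read() stands in for a converting format such as %f or %s, and both helper names as well as the FLOAT and WORD patterns are made up for this illustration. It does not call StreamDevice.

```python
import re

# The <PRE> part of the instrument page from the original post.
page = (
    "Measure(PPM): 2.3\n"
    "Sample flow(CC): 50.0   Scale:0-10   PPM\n"
    "System status:OK                        "
)

FLOAT = r"[-+]?[0-9]*\.?[0-9]+"   # rough stand-in for what %f accepts
WORD = r"\S+"                     # rough stand-in for %s (stops at whitespace)

def skip(text, pos, pattern):
    """Find pattern at or after pos, discard it, return the position after it."""
    m = re.compile(pattern).search(text, pos)
    if m is None:
        raise ValueError(f"no match for {pattern!r}")
    return m.end()

def read(text, pos, pattern):
    """Read a value directly at pos (after optional whitespace)."""
    m = re.compile(r"\s*(" + pattern + ")").match(text, pos)
    if m is None:
        raise ValueError(f"no value matching {pattern!r} at position {pos}")
    return m.group(1), m.end()

def parse(text):
    # Each step continues where the previous one stopped, so the
    # fields must appear in this order.
    pos = skip(text, 0, r"Measure\(PPM\):")
    ppm, pos = read(text, pos, FLOAT)
    pos = skip(text, pos, r"Sample flow\(CC\):")
    flow, pos = read(text, pos, FLOAT)
    pos = skip(text, pos, r"Scale:")
    scale, pos = read(text, pos, WORD)
    pos = skip(text, pos, r"System status:")
    status, pos = read(text, pos, WORD)
    return float(ppm), float(flow), scale, status

print(parse(page))  # (2.3, 50.0, '0-10', 'OK')
```

Because every step starts at the position where the previous one ended, swapping two fields in the page would make the parse fail, which is why the constant order matters.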

The "master" record is PPM. This is the only one with StreamDevice support. The other records (FLOW, SCALE, STATUS) are passive soft records. Their common name prefix is passed to the protocol as argument #1 and replaces \$1 in the redirected formats.

record (ai, "ISAC2:N2ANAL1:PPM")
{
    field (DTYP, "stream")
    field (FLNK, "ISAC2:N2ANAL1:FLOW")
    field (INP,  "@regexp.proto read(ISAC2:N2ANAL1) n2anal1")
}

record (ai, "ISAC2:N2ANAL1:FLOW")
{
}

record (stringin, "ISAC2:N2ANAL1:SCALE")
{
}

record (stringin, "ISAC2:N2ANAL1:STATUS")
{
}

Maybe you want to parse STATUS with %{OK|...} and put it into an mbbi. The same for SCALE if there is a small set of scales only.
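
What such an enum format would do can be sketched in Python: it compares the input against a fixed list of strings and yields the index of the one that matches, which is the kind of value an mbbi holds. The function name enum_index is made up for this sketch, and "FAULT" is a purely hypothetical placeholder, since "OK" is the only status text that appears in the sample page.

```python
def enum_index(text, choices):
    """Return (index, rest) for the first choice that text starts with."""
    for index, choice in enumerate(choices):
        if text.startswith(choice):
            return index, text[len(choice):]
    raise ValueError(f"{text!r} starts with none of {choices!r}")

# "OK" is from the sample page; "FAULT" is a hypothetical second state.
STATUS_CHOICES = ["OK", "FAULT"]

index, rest = enum_index("OK                        ", STATUS_CHOICES)
print(index)  # 0
```

The list of real status strings would have to come from the instrument documentation or from watching the page over time.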

I hope that helps,
Dirk


Rod Nussbaumer wrote:
Greetings, EPICS users.

I am trying to use streamDevice with regular expression support to parse input from an instrument containing an embedded web server. So far, I've successfully built and run the streamApp sample IOC that gets built with the streamDevice distribution. I am able to parse a single HTML element (TITLE, with minor mods to the example protocol file) from a static web page served by an Apache web server. So, I think I've built everything properly (streamDevice 2.4, asyn 4-9, EPICS 3.14.10, Linux=Scientific Linux 4.x).

My difficulty is that the instrument produces a page containing three strings that I wish to extract into three EPICS records. I'm testing with string-in records for now, but eventually I would prefer to scan to analog-in records. I am able to extract the first element from the HTML, although I do get error messages from streamDevice:

"Input "HTTP/1.0 200 OK<0d><0a>Con..." does not match format %.1/^MeasurePM*([0-9.]+)/"

However, I am unsure about how to get multiple fields scanned from the HTML using a single HTTP GET. I would really like to avoid doing one whole HTTP fetch per record, since the instrument is fairly primitive, and can probably become overwhelmed by excessive traffic, and it may be the case that readings on a single fetch are synchronized in time. From my interpretation of the documentation, the scanner should stop, leaving unread text in the pipeline once it has made a successful match. However, I don't understand how to continue scanning the remainder of the HTML page to extract the remaining data.

I attach below, the streamDevice protocol file I'm presently using (the latest flavor that gets me closest to working), the relevant database, and the entire HTML that the instrument returns. Much of this is based on the examples from streamDevice.

#=========================<protocol file>=============================
# regular expression example
# extract the title from a web page

outterminator = NL;
interminator = "</HTML>";  # terminators can have arbitrary length

# Web servers close the connection after sending a page.
# Thus, we can't use autoconnect (see drvAsynIPPortConfigure)
# Handle connection manually in protocol.

readTitle {
    extraInput=ignore;
    interminator = "</HTML>";  # terminators can have arbitrary length
    connect 1000;            # connect to server, 1 second timeout
    out "GET http://isacwserv.triumf.ca";;    # HTTP request
    in "%.1/<TITLE>(.+)<\/TITLE>/";  # get string in <TITLE></TITLE>
    disconnect;
}

readPPM {
    extraInput=ignore;
    interminator = "</HTML>";  # terminators can have arbitrary length
    connect 1000;              # connect to server, 1 second timeout
    out "GET http://142.90.132.71/lcd.cgi";;    # HTTP request
    in "%.1/^Measure\(PPM\):\s*([0-9.]+)/"; # get string in Measure field
}

readFLOW {
    extraInput=ignore;
    in "%.1/^Sample flow\(CC\):\\s*([0-9.]+)/"; # get string in Flow field
}

readSCALE {
    extraInput=ignore;
    in "%.1/Scale:\s*(.+)\n/";       # get string in Scale field
    disconnect;                      # server closes, so do we.
}

#==================< EPICS db file >===========================
record (stringin, "ISAC:WSERV:TITLE")
{
    field (DTYP, "stream")
    field (INP,  "@regexp.proto readTITLE isacwserv")
}

record (stringin, "ISAC2:N2ANAL1:PPM")
{
    field (DTYP, "stream")
    field (FLNK, "ISAC2:N2ANAL1:FLOW")
    field (INP,  "@regexp.proto readPPM n2anal1")
}

record (stringin, "ISAC2:N2ANAL1:FLOW")
{
    field (DTYP, "stream")
    field (FLNK, "ISAC2:N2ANAL1:SCALE")
    field (INP,  "@regexp.proto readFLOW n2anal1")
}

record (stringin, "ISAC2:N2ANAL1:SCALE")
{
    field (DTYP, "stream")
    field (INP,  "@regexp.proto readSCALE n2anal1")
}


#==================< Instrument web page >=======================
# (additional line breaks inserted by e-mail; hopefully this gets
#  displayed as HTML text, and not as a web page)
#
<HTML><HEAD><meta http-equiv="refresh" content="1"></HEAD><BODY bgcolor="#BEF06E"><!-- Affiche le contenu de l'écran -->
<FONT color="black" size="6"><B><PRE>
              &lt;&lt;RUN MODE>>        F4:RET
Measure(PPM): 2.3
Sample flow(CC): 50.0   Scale:0-10   PPM
System status:OK                        </PRE></FONT></B></BODY></HTML>


===============================================================


I am trying to extract the values for 'Measure (PPM)', 'flow(CC)', & 'Scale:'. When the 'FLOW' and 'SCALE' records process, streamDevice reports:


2009/03/04 10:49:59.678 n2anal1 ISAC2:N2ANAL1:PPM: Input "HTTP/1.0 200 OK<0d><0a>Con..." does not match format %.1/^MeasurePM*([0-9.]+)/
2009/03/04 10:49:59.680 n2anal1 ISAC2:N2ANAL1:FLOW: asynError in read: 142.90.132.71:80 connection closed
2009/03/04 10:49:59.680 n2anal1 ISAC2:N2ANAL1:FLOW: I/O error after reading 0 bytes: ""
2009/03/04 10:49:59.680 n2anal1 ISAC2:N2ANAL1:FLOW: Protocol aborted
2009/03/04 10:50:00.677 timerQueue ISAC2:N2ANAL1:SCALE: I/O error after reading 0 bytes: ""
2009/03/04 10:50:00.677 timerQueue ISAC2:N2ANAL1:SCALE: Protocol aborted

If I add some more backslashes to the regex in front of the parentheses that are embedded in the HTML text, the error messages more closely match the actual HTML text. I can't seem to find a 'right' combination of back-slashes that makes streamDevice and PCRE happy.

Thanks for any insight.

Rod Nussbaumer
ISAC Controls, TRIUMF
Vancouver, Canada.





--
Dr. Dirk Zimoch
Paul Scherrer Institut, WBGB/006
5232 Villigen PSI, Switzerland
Phone +41 56 310 5182
