g+
g+ Communities
Argonne National Laboratory

Experimental Physics and
Industrial Control System

1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  <20122013  2014  Index 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  <20122013  2014 
<== Date ==> <== Thread ==>

Subject: RE: asynPortDriver
From: Mark Rivers <rivers@cars.uchicago.edu>
To: "'Szalata, Zenon M.'" <zms@slac.stanford.edu>
Cc: "tech-talk@aps.anl.gov" <tech-talk@aps.anl.gov>
Date: Thu, 26 Jul 2012 16:15:27 +0000
OK, good I have committed the fix to SVN and it will be in R4-20, which should be released soon.

Mark


-----Original Message-----
From: Szalata, Zenon M. [mailto:zms@slac.stanford.edu] 
Sent: Thursday, July 26, 2012 10:45 AM
To: Mark Rivers
Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
Subject: RE: asynPortDriver

Hi Mark,
After nearly 24 hours of running and about 8.2 million trigger still no errors.
Thank you for tracking this down.
Zen

> -----Original Message-----
> From: Mark Rivers [mailto:rivers@cars.uchicago.edu]
> Sent: Wednesday, July 25, 2012 12:47 PM
> To: Szalata, Zenon M.
> Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> Subject: RE: asynPortDriver
>
> Hi Zen,
>
> OK, good news.
>
> Sorry for the trouble.  I'm quite sure I introduced that error when I was adding
> support for additional array types (int8, int16, etc.) when initially developing
> areaDetector.  Previously the device support made a copy of the array in the
> callback function.  But that is not practical with megapixel arrays, so I wrote
> to the record directly, forgetting to protect the access.
>
> Mark
>
>
> -----Original Message-----
> From: Szalata, Zenon M. [mailto:zms@slac.stanford.edu]
> Sent: Wednesday, July 25, 2012 2:38 PM
> To: Mark Rivers
> Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> Subject: RE: asynPortDriver
>
> Hi Mark,
> Instead, I edited the patched devAsynXXXArray.h file, where I commented out
> the two lines that refer to timestamp.
> With this change, my IOC has processed nearly 1 million triggers without an
> error.  Perhaps it is too early to declare victory but it smells like this is it.
> Thanks for your help,
> Zen
>
> > -----Original Message-----
> > From: Mark Rivers [mailto:rivers@cars.uchicago.edu]
> > Sent: Wednesday, July 25, 2012 7:02 AM
> > To: Szalata, Zenon M.
> > Subject: RE: asynPortDriver
> >
> > Hi Zen,
> >
> > I realized you probably cannot just modify the patch, since the line
> > numbers won't match.
> >
> > What you want to do is just to bracket the two statements that access
> > the waveform record with pdf to be bracketed by a dbScanLock like this:
> >
> >     dbScanLock((dbCommon *)pwf); \
> >     if (len > pwf->nelm) len = pwf->nelm; \
> >     for (i=0; i<(int)len; i++) pData[i] = value[i]; \
> >     dbScanUnlock((dbCommon *)pwf); \
> >
> > Mark
> >
> > -----Original Message-----
> > From: Szalata, Zenon M. [mailto:zms@slac.stanford.edu]
> > Sent: Tuesday, July 24, 2012 10:41 PM
> > To: Mark Rivers
> > Subject: RE: asynPortDriver
> >
> > Hi Mark,
> > Asyn does not build after I patch the file.  I get:
> > ../../asyn/devEpics/devAsynInt8Array.c: In function 'callbackWf':
> > ../../asyn/devEpics/devAsynInt8Array.c:46: error: 'asynUser' has no
> > member named 'timestamp'
> > ../../asyn/devEpics/devAsynInt8Array.c: In function 'interruptCallbackInput':
> > ../../asyn/devEpics/devAsynInt8Array.c:46: error: 'asynUser' has no
> > member named 'timestamp'
> > make[2]: *** [devAsynInt8Array.o] Error 1
> >
> > Zen
> >
> >
> > > -----Original Message-----
> > > From: Mark Rivers [mailto:rivers@cars.uchicago.edu]
> > > Sent: Tuesday, July 24, 2012 3:35 PM
> > > To: Szalata, Zenon M.
> > > Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> > > Subject: RE: asynPortDriver
> > >
> > > Hi Zen,
> > >
> > > I think we were on the right track with the gotValue check, but the
> > > timing problem is a little different.
> > >
> > > Here is the problem I thought might be happening:
> > >
> > > - Your driver calls doCallbacksInt32Array.
> > > - This results in a call to interruptCallbackInput in
> > > asyn/devEpics/devAsynXXXArray.h
> > > - interruptCallbackInput copies the data to the waveform record VAL
> > > field and then calls scanIoRequest
> > > - Another interrupt occurs before the record processes and calls
> > > devAsynXXXArray::processCommon
> > > - gotValue will be 1 because the processCommon has not yet reset
> > > gotValue=0
> > >
> > > You showed that this problem can happen, but only at higher callback
> rates.
> > >
> > > Here is the problem I now think is happening:
> > > - Your driver calls doCallbacksInt32Array.
> > > - This results in a call to interruptCallbackInput in
> > > asyn/devEpics/devAsynXXXArray.h
> > > - interruptCallbackInput copies the data to the waveform record VAL
> > > field and then calls scanIoRequest
> > > - scanIoRequest calls processCommon, which sets gotValue=0
> > > - Another interrupt occurs before the record has finished posting
> > > monitors.  It overwrites the data in the record.  It calls scanIoRequest
> again.
> > > - The waveform record then processes twice with the data from the
> > > second callback.
> > >
> > > This appears to be what you are seeing.
> > >
> > > I just realized that the problem is that there no mutex preventing
> > > the callback from overwriting the waveform data at any time when the
> > > waveform record is processing, which is bad.  Note that is problem
> > > is confined to I/O Intr scanned waveform records, it does not happen
> > > on other records that use a ring buffer.
> > >
> > > I think the solution is to call dbScanLock during the time the array
> > > data is being modified in the callback.
> > >
> > > Here is a patch against asyn R4-19.  Can you try it?  It also sets
> > > the timestamp from the callback.
> > >
> > > corvette:asyn/asyn/devEpics>svn diff -r1995 devAsynXXXArray.h
> > > Index: devAsynXXXArray.h
> > >
> > =============================================================
> > ======
> > > --- devAsynXXXArray.h   (revision 1995)
> > > +++ devAsynXXXArray.h   (working copy)
> > > @@ -254,6 +254,7 @@
> > >          pPvt->pasynUser, pwf->bptr, pwf->nelm, &nread); \
> > >      asynPrint(pasynUser, ASYN_TRACEIO_DEVICE, \
> > >                "%s %s::callbackWf\n", pwf->name, driverName); \
> > > +    pwf->time = pasynUser->timestamp; \
> > >      if (status == asynSuccess) { \
> > >          pwf->udf=0; \
> > >          pwf->nord = nread; \
> > > @@ -280,8 +281,11 @@
> > >          (char *)value, len*sizeof(EPICS_TYPE), \
> > >          "%s %s::interruptCallbackInput\n", \
> > >          pwf->name, driverName); \
> > > +    dbScanLock((dbCommon *)pwf); \
> > >      if (len > pwf->nelm) len = pwf->nelm; \
> > >      for (i=0; i<(int)len; i++) pData[i] = value[i]; \
> > > +    pwf->time = pasynUser->timestamp; \
> > > +    dbScanUnlock((dbCommon *)pwf); \
> > >      pPvt->gotValue = 1; \
> > >      pPvt->nord = len; \
> > >      if (pPvt->status == asynSuccess) pPvt->status =
> > > pasynUser->auxStatus; \
> > >
> > >
> > >
> > > Mark
> > >
> > >
> > > -----Original Message-----
> > > From: Szalata, Zenon M. [mailto:zms@slac.stanford.edu]
> > > Sent: Tuesday, July 24, 2012 4:47 PM
> > > To: Mark Rivers
> > > Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> > > Subject: RE: asynPortDriver
> > >
> > > Hi Mark,
> > > I did couple of things to help understand what is happening.
> > > 1. I put a printf statement in interruptCallbackInput to print a
> > > message when gotValue not equals 0.
> > >     This message does not get printed when the error condition occurs.
> > > I tested that the message gets
> > >     printed by increasing the trigger rate until the IOC "fall apart"
> > > and those messages then come out abundantly.
> > >
> > > 2. I have simplified my diagnostic mechanism to remove as much
> > > ambiguity as possible to the point
> > >     that I think the problem happens between posting the data in the
> > > device driver and the waveform
> > >     record processing (I/O Intr).
> > >
> > > My diagnostic mechanism consists of two sets of EPICS records.
> > > The data are passed from the device driver as an array of 17 integers.
> > > The first is a gate counter, which is incremented just before it is
> > > put in the
> > array.
> > > This counter is followed by 16 ADC values.  The array is received in
> > > a waveform record.  The waveform record triggers processing of an
> > > aSub record and a chain of sub array records where the ADC values
> > > are extracted and put into ai records.
> > > The aSub record compares the value of the gate counter with the
> > > value of gate counter from a previous event.  If this value is not
> > > greater by 1 than the previous value, it prints a message, which
> > > shows this and
> > previous values.
> > >
> > > The second set of diagnostic records is processed when the gate
> > > counter is extracted from the waveform record and put into longin
> > > record.  there is a sub record which does similar checking, but on
> > > discrete data and prints a message if the counter value did not
> > > change by 1.  Printed outputs from both sets indicate the same problem.
> > >
> > > Here is a typical output:
> > >
> > > asubMiss: Something wrong: n=53385,lastn=53383
> > > asubMiss: Something wrong: n=53385,lastn=53385
> > > subASMissing: got a missing ++++++++++++++++
> > >
> > a=53385.000,b=0.000,c=4294967295.000,d=1.000,e=53383.000,f=1.000,g=1.
> > 0
> > > 00
> > > k=2.000,l=1.000,val=2.000
> > > subASMissing: trigger number this=53385.0, prev=53385.0
> > >
> > > The first two messages come from the aSub record, the next three
> > > come from the sub record.
> > > I am not sure how to interpret the presence of the second of the two
> > > messages from the aSub record and the third from the sub record.  It
> > > looks as if the waveform record processed twice when the problem
> > occurred.
> > >
> > > The sub record message prints the values of the fields a through l,
> > > where a is this counter and e is previous counter.
> > >
> > > Thank you Mark for looking at this,
> > > Zen
> > >
> > >
> > > > -----Original Message-----
> > > > From: Mark Rivers [mailto:rivers@cars.uchicago.edu]
> > > > Sent: Thursday, July 19, 2012 6:10 PM
> > > > To: Szalata, Zenon M.
> > > > Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> > > > Subject: RE: asynPortDriver
> > > >
> > > > Hi Zen,
> > > >
> > > > What is the diagnostic mechanism you are using to determine that
> > > > the counter has incremented by 2?  Is there a possibility that the
> > > > waveform record is actually processing every callback OK, but that
> > > > arrays are being dropped between the waveform record and whatever
> > > > diagnostic client you are running?
> > > >
> > > > Mark
> > > >
> > > > ________________________________________
> > > > From: Szalata, Zenon M. [zms@slac.stanford.edu]
> > > > Sent: Thursday, July 19, 2012 5:23 PM
> > > > To: Mark Rivers
> > > > Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> > > > Subject: RE: asynPortDriver
> > > >
> > > > Hi Mark,
> > > > I modified my version 2 asynPortDriver based device driver so that now
> it:
> > > > 1. in ISR (at interrupt level) it read data from all 16 channels
> > > > and sends the data by message queue.
> > > > 2. In a high priority thread the data is received from the message
> > > > queue, unpacked, and posted as an array.
> > > >     The unpacked data are put into an array where the first
> > > > element is a counter, which is incremented in the
> > > >     data unpacking thread, and elements 1 through 16 are the ADC values.
> > > > In the db layer I still see missing data.  Once in a while the
> > > > counter value, which is the first element in the data array, is
> > > > larger by 2 than the previous counter value.  The test is done at 100Hz.
> > > >
> > > > The version 3 of the device driver, which is not using asyn, does
> > > > not exhibit this behavior.  In fact I can run the system with a
> > > > trigger rate of up to 400Hz before it starts falling apart.  Since
> > > > I have a device driver that works (version 3) I could just use it and move
> on.
> > > > But at this point I am still reluctant to just abandon the
> > > > asynPortDriver version.  Can you think of some tests that could
> > > > help
> > > understand what is causing this problem?
> > > > Thanks,
> > > > Zen
> > > >
> > > > > -----Original Message-----
> > > > > From: Mark Rivers [mailto:rivers@cars.uchicago.edu]
> > > > > Sent: Sunday, July 15, 2012 6:59 PM
> > > > > To: Szalata, Zenon M.
> > > > > Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> > > > > Subject: RE: asynPortDriver
> > > > >
> > > > > Hi Zen,
> > > > >
> > > > > > In the first version of the device driver, I saw the effect
> > > > > > that Kevin Tsubota described in his reply to my email. This is what he
> said:
> > > > >
> > > > > > "At Keck we ran into a similar problem where we weren't seeing
> > > > > > all the events we expected with a different driver and it
> > > > > > turned out to be that ASYN
> > > > > only posts on change."
> > > > >
> > > > > OK, there are 3 parts to my reply about that:
> > > > >
> > > > > 1) This is not an issue with ASYN in general, it is specific to
> > > asynPortDriver.
> > > > >
> > > > > 2) A future version of asynPortDriver should have a variant of
> > > > > the setXXXParam that takes a "force" flag, i.e.
> > > > > setIntegerParam(addr, myParam, value, forceCallback) which will
> > > > > force a callback the next time callParamCallbacks(addr) is
> > > > > called, even if the value has not changed.
> > > > >
> > > > > 3) Meanwhile it is trivial to force such behavior in the current
> > > > > version by doing setIntegerParam(addr, myParam, value+1);
> > > > > setIntegerParam(addr, myParam, value);
> > > > >
> > > > > This will fool the parameter library into thinking that the
> > > > > value has changed so it will do the callback on "value".  The
> > > > > overhead of doing the additional call to setIntegerParam is
> insignificant.
> > > > >
> > > > > > In all three version of the device driver The data is read out
> > > > > > at interrupt level, which is needed to clear the interrupt.
> > > > > > Then I schedule a callback to continue processing the data.
> > > > > > This consists of
> > > > > unpacking the data  words into various components.
> > > > >
> > > > > I would have to see your code, but what you are doing sounds
> dangerous.
> > > > > What if a second interrupt occurs before the callback occurs to
> > > > > process the first?  If your data is not put in a queue, then you
> > > > > risk missing data.  In my Ip330 and ipUnidig drivers I send the
> > > > > data from the interrupt service routine to the thread that will
> > > > > do callbacks via an epicsMessageQueue.  Then even if the thread
> > > > > that does callbacks momentarily falls behind I will not lose
> > > > > data unless the message queue
> > > > overflows, and I have a flag to signal if that occurs.
> > > > >
> > > > > Mark
> > > > >
> > > > > ________________________________________
> > > > > From: Szalata, Zenon M. [zms@slac.stanford.edu]
> > > > > Sent: Sunday, July 15, 2012 8:20 PM
> > > > > To: Mark Rivers
> > > > > Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> > > > > Subject: RE: asynPortDriver
> > > > >
> > > > > Hi Mark,
> > > > > In the first version of the device driver, I saw the effect that
> > > > > Kevin Tsubota described in his reply to my email.  This is what he said:
> > > > >
> > > > > "At Keck we ran into a similar problem where we weren't seeing
> > > > > all the events we expected with a different driver and it turned
> > > > > out to be that ASYN only posts on change."
> > > > >
> > > > > In all three version of the device driver The data is read out
> > > > > at interrupt level, which is needed to clear the interrupt.
> > > > > Then I schedule a callback to continue processing the data.
> > > > > This consists of unpacking the data words into various components.
> > > > > In the case of the first version of the device driver, the
> > > > > callback requests to process the records are done in a for loop.
> > > > > Perhaps I have abandoned the first version too quickly.
> > > > > Anyway it was clear to me that it would be better to bundle the
> > > > > trigger number and data from all ADC channels and pass that to
> > > > > the db
> > > layer.
> > > > >
> > > > > I will modify the C subroutine of the subroutine record to also
> > > > > print a message when the current gate number is less than the
> > > > > previous
> > > value.
> > > > >
> > > > > Thanks,
> > > > > Zen
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Mark Rivers [mailto:rivers@cars.uchicago.edu]
> > > > > > Sent: Sunday, July 15, 2012 3:19 PM
> > > > > > To: Szalata, Zenon M.
> > > > > > Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> > > > > > Subject: RE: asynPortDriver
> > > > > >
> > > > > > How are you passing the data from your interrupt service
> > > > > > routine to the thread that is calling callParamCallbacks()?
> > > > > > Are you using a mechanism with a queue, like an
> epicsMessageQueue?
> > > > > >
> > > > > > I'd like to understand why "the order of record processing is
> > > > > > somewhat random" in method 1.  I would think that it would be
> > > > > > deterministic because there is only 1 callback thread that is
> > > > > > processing all the records in the order in which the callback
> > > > > > requests were issued, which I assume your driver does in a loop like:
> > > > > >
> > > > > > for (i=0; i<16; i++) {
> > > > > >   callParamCallbacks(i);
> > > > > > }
> > > > > >
> > > > > > Mark
> > > > > >
> > > > > > ________________________________
> > > > > > From: Szalata, Zenon M. [zms@slac.stanford.edu]
> > > > > > Sent: Sunday, July 15, 2012 12:15 PM
> > > > > > To: Mark Rivers
> > > > > > Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> > > > > > Subject: asynPortDriver
> > > > > >
> > > > > > Hi Mark,
> > > > > > I am trying to understand why my device driver for a CAEN VME
> > > > > > gated ADC module, which is based on asynPortDriver class works
> > > incorrectly.
> > > > > > The module has 16 ADC channels and is setup to generate an
> > > > > > interrupt for each gate signal.  The interrupt triggers data
> > > > > > readout and record processing.  For testing, the trigger/gate
> > > > > > rate is 100
> > > Hz.
> > > > > > The device driver maintains a gate counter, which gets
> > > > > > incremented each time the data
> > > > > is processed.
> > > > > > So far I have three versions of the device driver.
> > > > > >
> > > > > > 1. In this version I have 16 I/O Intr scanned ai records.  The
> > > > > > gate counter is pushed to a longin record.
> > > > > >     This version has the problem that the order of record
> > > > > > processing is somewhat random and the IOC
> > > > > >     using this VME module requires the processing to be deterministic.
> > > > > > To overcome this difficulty,
> > > > > >    I wrote the second version.
> > > > > >
> > > > > > 2. In this version I pack the gate counter and all 16 ADC
> > > > > > channel values into an array and this array gets
> > > > > >     pushed to a waveform record.  Then, the data gets
> > > > > > distributed to ai records using  sub array records.
> > > > > >    This part works fine.  I have subroutine record which is
> > > > > > processed each time the data waveform record
> > > > > >    is processed.  It checks for missing triggers, by expecting
> > > > > > that the gate counter value for this data array
> > > > > >    is larger by 1 than the previous one.  This is where the
> > > > > > device driver code seems to fail.
> > > > > >    I let the IOC run for a bit over 35 hours.  More precisely
> > > > > > the time corresponds to 12700898 triggers.
> > > > > >    During this time the subroutine record reported 42 missing
> > > > > > triggers.  They occurred spread out
> > > > > >    throughout the run nearly but not equally spaced in time. I
> > > > > > have a simple print statement
> > > > > >    in the C routine which prints a few numbers.  These numbers
> > > > > > tell me that the trigger incremented by 2.
> > > > > >    I just looked at the logic in the C routine and I see that
> > > > > > I am not printing messages when the new gate
> > > > > >    counter is less than the previous one.  So it is possible
> > > > > > that from time to time the waveform record gets
> > > > > >    the data out of order.  It is also possible, but I think
> > > > > > unlikely, that from time to time the data is lost.
> > > > > >
> > > > > > 3. This version of the device driver is non-asyn.  It consists
> > > > > > of two parts, device support and record support.
> > > > > >     I wrote it, by taking the second version of the device
> > > > > > driver and replaced the asynPortDriver part with
> > > > > >    device support.  Also, I created a new version of the IOC,
> > > > > > by taking the IOC which used the second version of
> > > > > >    the device driver and modifying the db files and the st.cmd
> > > > > > file as needed.  I ran this IOC version
> > > > > >   overnight, so far it has processed 4 and half million
> > > > > > triggers and not a single missed trigger.
> > > > > >
> > > > > > I wonder if the behavior observed using the second version of
> > > > > > the device driver is to be expected?
> > > > > > Does it mean, that for some applications the asynPortDriver
> > > > > > approach to device driver implementation might be not appropriate?
> > > > > > Zen



References:
asynPortDriver Szalata, Zenon M.
RE: asynPortDriver Mark Rivers
RE: asynPortDriver Szalata, Zenon M.
RE: asynPortDriver Mark Rivers
RE: asynPortDriver Szalata, Zenon M.
RE: asynPortDriver Mark Rivers
RE: asynPortDriver Szalata, Zenon M.
RE: asynPortDriver Mark Rivers
RE: asynPortDriver Szalata, Zenon M.
RE: asynPortDriver Mark Rivers
RE: asynPortDriver Szalata, Zenon M.

Navigate by Date:
Prev: waveform display John Dobbins
Next: RE: waveform display Chen, Xihui
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  <20122013  2014 
Navigate by Thread:
Prev: RE: asynPortDriver Szalata, Zenon M.
Next: SR620 Time Interval / Frequency Counter support? Bohler, Dorian K.
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  <20122013  2014 
ANJ, 18 Nov 2013 Valid HTML 4.01! · Home · News · About · Base · Modules · Extensions · Distributions · Download ·
· EPICSv4 · IRMIS · Talk · Bugs · Documents · Links · Licensing ·