g+
g+ Communities
Argonne National Laboratory

Experimental Physics and
Industrial Control System

1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  <20122013  2014  Index 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  <20122013  2014 
<== Date ==> <== Thread ==>

Subject: RE: asynPortDriver
From: Mark Rivers <rivers@cars.uchicago.edu>
To: "'Szalata, Zenon M.'" <zms@slac.stanford.edu>
Cc: "tech-talk@aps.anl.gov" <tech-talk@aps.anl.gov>
Date: Wed, 25 Jul 2012 19:46:39 +0000
Hi Zen,

OK, good news.

Sorry for the trouble.  I'm quite sure I introduced that error when I was adding support for additional array types (int8, int16, etc.) when initially developing areaDetector.  Previously the device support made a copy of the array in the callback function.  But that is not practical with megapixel arrays, so I wrote to the record directly, forgetting to protect the access.

Mark


-----Original Message-----
From: Szalata, Zenon M. [mailto:zms@slac.stanford.edu] 
Sent: Wednesday, July 25, 2012 2:38 PM
To: Mark Rivers
Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
Subject: RE: asynPortDriver

Hi Mark,
Instead, I edited the patched devAsynXXXArray.h file, where I commented out the two lines that refer to timestamp.
With this change, my IOC has processed nearly 1 million triggers without an error.  Perhaps it is too early to declare victory but it smells like this is it.
Thanks for your help,
Zen

> -----Original Message-----
> From: Mark Rivers [mailto:rivers@cars.uchicago.edu]
> Sent: Wednesday, July 25, 2012 7:02 AM
> To: Szalata, Zenon M.
> Subject: RE: asynPortDriver
>
> Hi Zen,
>
> I realized you probably cannot just modify the patch, since the line numbers
> won't match.
>
> What you want to do is just to bracket the two statements that access the
> waveform record with pdf to be bracketed by a dbScanLock like this:
>
>     dbScanLock((dbCommon *)pwf); \
>     if (len > pwf->nelm) len = pwf->nelm; \
>     for (i=0; i<(int)len; i++) pData[i] = value[i]; \
>     dbScanUnlock((dbCommon *)pwf); \
>
> Mark
>
> -----Original Message-----
> From: Szalata, Zenon M. [mailto:zms@slac.stanford.edu]
> Sent: Tuesday, July 24, 2012 10:41 PM
> To: Mark Rivers
> Subject: RE: asynPortDriver
>
> Hi Mark,
> Asyn does not build after I patch the file.  I get:
> ../../asyn/devEpics/devAsynInt8Array.c: In function 'callbackWf':
> ../../asyn/devEpics/devAsynInt8Array.c:46: error: 'asynUser' has no member
> named 'timestamp'
> ../../asyn/devEpics/devAsynInt8Array.c: In function 'interruptCallbackInput':
> ../../asyn/devEpics/devAsynInt8Array.c:46: error: 'asynUser' has no member
> named 'timestamp'
> make[2]: *** [devAsynInt8Array.o] Error 1
>
> Zen
>
>
> > -----Original Message-----
> > From: Mark Rivers [mailto:rivers@cars.uchicago.edu]
> > Sent: Tuesday, July 24, 2012 3:35 PM
> > To: Szalata, Zenon M.
> > Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> > Subject: RE: asynPortDriver
> >
> > Hi Zen,
> >
> > I think we were on the right track with the gotValue check, but the
> > timing problem is a little different.
> >
> > Here is the problem I thought might be happening:
> >
> > - Your driver calls doCallbacksInt32Array.
> > - This results in a call to interruptCallbackInput in
> > asyn/devEpics/devAsynXXXArray.h
> > - interruptCallbackInput copies the data to the waveform record VAL
> > field and then calls scanIoRequest
> > - Another interrupt occurs before the record processes and calls
> > devAsynXXXArray::processCommon
> > - gotValue will be 1 because the processCommon has not yet reset
> > gotValue=0
> >
> > You showed that this problem can happen, but only at higher callback rates.
> >
> > Here is the problem I now think is happening:
> > - Your driver calls doCallbacksInt32Array.
> > - This results in a call to interruptCallbackInput in
> > asyn/devEpics/devAsynXXXArray.h
> > - interruptCallbackInput copies the data to the waveform record VAL
> > field and then calls scanIoRequest
> > - scanIoRequest calls processCommon, which sets gotValue=0
> > - Another interrupt occurs before the record has finished posting
> > monitors.  It overwrites the data in the record.  It calls scanIoRequest again.
> > - The waveform record then processes twice with the data from the
> > second callback.
> >
> > This appears to be what you are seeing.
> >
> > I just realized that the problem is that there no mutex preventing the
> > callback from overwriting the waveform data at any time when the
> > waveform record is processing, which is bad.  Note that is problem is
> > confined to I/O Intr scanned waveform records, it does not happen on
> > other records that use a ring buffer.
> >
> > I think the solution is to call dbScanLock during the time the array
> > data is being modified in the callback.
> >
> > Here is a patch against asyn R4-19.  Can you try it?  It also sets the
> > timestamp from the callback.
> >
> > corvette:asyn/asyn/devEpics>svn diff -r1995 devAsynXXXArray.h
> > Index: devAsynXXXArray.h
> >
> =============================================================
> ======
> > --- devAsynXXXArray.h   (revision 1995)
> > +++ devAsynXXXArray.h   (working copy)
> > @@ -254,6 +254,7 @@
> >          pPvt->pasynUser, pwf->bptr, pwf->nelm, &nread); \
> >      asynPrint(pasynUser, ASYN_TRACEIO_DEVICE, \
> >                "%s %s::callbackWf\n", pwf->name, driverName); \
> > +    pwf->time = pasynUser->timestamp; \
> >      if (status == asynSuccess) { \
> >          pwf->udf=0; \
> >          pwf->nord = nread; \
> > @@ -280,8 +281,11 @@
> >          (char *)value, len*sizeof(EPICS_TYPE), \
> >          "%s %s::interruptCallbackInput\n", \
> >          pwf->name, driverName); \
> > +    dbScanLock((dbCommon *)pwf); \
> >      if (len > pwf->nelm) len = pwf->nelm; \
> >      for (i=0; i<(int)len; i++) pData[i] = value[i]; \
> > +    pwf->time = pasynUser->timestamp; \
> > +    dbScanUnlock((dbCommon *)pwf); \
> >      pPvt->gotValue = 1; \
> >      pPvt->nord = len; \
> >      if (pPvt->status == asynSuccess) pPvt->status =
> > pasynUser->auxStatus; \
> >
> >
> >
> > Mark
> >
> >
> > -----Original Message-----
> > From: Szalata, Zenon M. [mailto:zms@slac.stanford.edu]
> > Sent: Tuesday, July 24, 2012 4:47 PM
> > To: Mark Rivers
> > Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> > Subject: RE: asynPortDriver
> >
> > Hi Mark,
> > I did couple of things to help understand what is happening.
> > 1. I put a printf statement in interruptCallbackInput to print a
> > message when gotValue not equals 0.
> >     This message does not get printed when the error condition occurs.
> > I tested that the message gets
> >     printed by increasing the trigger rate until the IOC "fall apart"
> > and those messages then come out abundantly.
> >
> > 2. I have simplified my diagnostic mechanism to remove as much
> > ambiguity as possible to the point
> >     that I think the problem happens between posting the data in the
> > device driver and the waveform
> >     record processing (I/O Intr).
> >
> > My diagnostic mechanism consists of two sets of EPICS records.
> > The data are passed from the device driver as an array of 17 integers.
> > The first is a gate counter, which is incremented just before it is put in the
> array.
> > This counter is followed by 16 ADC values.  The array is received in a
> > waveform record.  The waveform record triggers processing of an aSub
> > record and a chain of sub array records where the ADC values are
> > extracted and put into ai records.
> > The aSub record compares the value of the gate counter with the value
> > of gate counter from a previous event.  If this value is not greater
> > by 1 than the previous value, it prints a message, which shows this and
> previous values.
> >
> > The second set of diagnostic records is processed when the gate
> > counter is extracted from the waveform record and put into longin
> > record.  there is a sub record which does similar checking, but on
> > discrete data and prints a message if the counter value did not change
> > by 1.  Printed outputs from both sets indicate the same problem.
> >
> > Here is a typical output:
> >
> > asubMiss: Something wrong: n=53385,lastn=53383
> > asubMiss: Something wrong: n=53385,lastn=53385
> > subASMissing: got a missing ++++++++++++++++
> >
> a=53385.000,b=0.000,c=4294967295.000,d=1.000,e=53383.000,f=1.000,g=1.
> 0
> > 00
> > k=2.000,l=1.000,val=2.000
> > subASMissing: trigger number this=53385.0, prev=53385.0
> >
> > The first two messages come from the aSub record, the next three come
> > from the sub record.
> > I am not sure how to interpret the presence of the second of the two
> > messages from the aSub record and the third from the sub record.  It
> > looks as if the waveform record processed twice when the problem
> occurred.
> >
> > The sub record message prints the values of the fields a through l,
> > where a is this counter and e is previous counter.
> >
> > Thank you Mark for looking at this,
> > Zen
> >
> >
> > > -----Original Message-----
> > > From: Mark Rivers [mailto:rivers@cars.uchicago.edu]
> > > Sent: Thursday, July 19, 2012 6:10 PM
> > > To: Szalata, Zenon M.
> > > Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> > > Subject: RE: asynPortDriver
> > >
> > > Hi Zen,
> > >
> > > What is the diagnostic mechanism you are using to determine that the
> > > counter has incremented by 2?  Is there a possibility that the
> > > waveform record is actually processing every callback OK, but that
> > > arrays are being dropped between the waveform record and whatever
> > > diagnostic client you are running?
> > >
> > > Mark
> > >
> > > ________________________________________
> > > From: Szalata, Zenon M. [zms@slac.stanford.edu]
> > > Sent: Thursday, July 19, 2012 5:23 PM
> > > To: Mark Rivers
> > > Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> > > Subject: RE: asynPortDriver
> > >
> > > Hi Mark,
> > > I modified my version 2 asynPortDriver based device driver so that now it:
> > > 1. in ISR (at interrupt level) it read data from all 16 channels and
> > > sends the data by message queue.
> > > 2. In a high priority thread the data is received from the message
> > > queue, unpacked, and posted as an array.
> > >     The unpacked data are put into an array where the first element
> > > is a counter, which is incremented in the
> > >     data unpacking thread, and elements 1 through 16 are the ADC values.
> > > In the db layer I still see missing data.  Once in a while the
> > > counter value, which is the first element in the data array, is
> > > larger by 2 than the previous counter value.  The test is done at 100Hz.
> > >
> > > The version 3 of the device driver, which is not using asyn, does
> > > not exhibit this behavior.  In fact I can run the system with a
> > > trigger rate of up to 400Hz before it starts falling apart.  Since I
> > > have a device driver that works (version 3) I could just use it and move on.
> > > But at this point I am still reluctant to just abandon the
> > > asynPortDriver version.  Can you think of some tests that could help
> > understand what is causing this problem?
> > > Thanks,
> > > Zen
> > >
> > > > -----Original Message-----
> > > > From: Mark Rivers [mailto:rivers@cars.uchicago.edu]
> > > > Sent: Sunday, July 15, 2012 6:59 PM
> > > > To: Szalata, Zenon M.
> > > > Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> > > > Subject: RE: asynPortDriver
> > > >
> > > > Hi Zen,
> > > >
> > > > > In the first version of the device driver, I saw the effect that
> > > > > Kevin Tsubota described in his reply to my email. This is what he said:
> > > >
> > > > > "At Keck we ran into a similar problem where we weren't seeing
> > > > > all the events we expected with a different driver and it turned
> > > > > out to be that ASYN
> > > > only posts on change."
> > > >
> > > > OK, there are 3 parts to my reply about that:
> > > >
> > > > 1) This is not an issue with ASYN in general, it is specific to
> > asynPortDriver.
> > > >
> > > > 2) A future version of asynPortDriver should have a variant of the
> > > > setXXXParam that takes a "force" flag, i.e.
> > > > setIntegerParam(addr, myParam, value, forceCallback) which will
> > > > force a callback the next time callParamCallbacks(addr) is called,
> > > > even if the value has not changed.
> > > >
> > > > 3) Meanwhile it is trivial to force such behavior in the current
> > > > version by doing setIntegerParam(addr, myParam, value+1);
> > > > setIntegerParam(addr, myParam, value);
> > > >
> > > > This will fool the parameter library into thinking that the value
> > > > has changed so it will do the callback on "value".  The overhead
> > > > of doing the additional call to setIntegerParam is insignificant.
> > > >
> > > > > In all three version of the device driver The data is read out
> > > > > at interrupt level, which is needed to clear the interrupt.
> > > > > Then I schedule a callback to continue processing the data.
> > > > > This consists of
> > > > unpacking the data  words into various components.
> > > >
> > > > I would have to see your code, but what you are doing sounds dangerous.
> > > > What if a second interrupt occurs before the callback occurs to
> > > > process the first?  If your data is not put in a queue, then you
> > > > risk missing data.  In my Ip330 and ipUnidig drivers I send the
> > > > data from the interrupt service routine to the thread that will do
> > > > callbacks via an epicsMessageQueue.  Then even if the thread that
> > > > does callbacks momentarily falls behind I will not lose data
> > > > unless the message queue
> > > overflows, and I have a flag to signal if that occurs.
> > > >
> > > > Mark
> > > >
> > > > ________________________________________
> > > > From: Szalata, Zenon M. [zms@slac.stanford.edu]
> > > > Sent: Sunday, July 15, 2012 8:20 PM
> > > > To: Mark Rivers
> > > > Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> > > > Subject: RE: asynPortDriver
> > > >
> > > > Hi Mark,
> > > > In the first version of the device driver, I saw the effect that
> > > > Kevin Tsubota described in his reply to my email.  This is what he said:
> > > >
> > > > "At Keck we ran into a similar problem where we weren't seeing all
> > > > the events we expected with a different driver and it turned out
> > > > to be that ASYN only posts on change."
> > > >
> > > > In all three version of the device driver The data is read out at
> > > > interrupt level, which is needed to clear the interrupt.  Then I
> > > > schedule a callback to continue processing the data.  This
> > > > consists of unpacking the data words into various components.
> > > > In the case of the first version of the device driver, the
> > > > callback requests to process the records are done in a for loop.
> > > > Perhaps I have abandoned the first version too quickly.
> > > > Anyway it was clear to me that it would be better to bundle the
> > > > trigger number and data from all ADC channels and pass that to the
> > > > db
> > layer.
> > > >
> > > > I will modify the C subroutine of the subroutine record to also
> > > > print a message when the current gate number is less than the
> > > > previous
> > value.
> > > >
> > > > Thanks,
> > > > Zen
> > > >
> > > > > -----Original Message-----
> > > > > From: Mark Rivers [mailto:rivers@cars.uchicago.edu]
> > > > > Sent: Sunday, July 15, 2012 3:19 PM
> > > > > To: Szalata, Zenon M.
> > > > > Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> > > > > Subject: RE: asynPortDriver
> > > > >
> > > > > How are you passing the data from your interrupt service routine
> > > > > to the thread that is calling callParamCallbacks()?  Are you
> > > > > using a mechanism with a queue, like an epicsMessageQueue?
> > > > >
> > > > > I'd like to understand why "the order of record processing is
> > > > > somewhat random" in method 1.  I would think that it would be
> > > > > deterministic because there is only 1 callback thread that is
> > > > > processing all the records in the order in which the callback
> > > > > requests were issued, which I assume your driver does in a loop like:
> > > > >
> > > > > for (i=0; i<16; i++) {
> > > > >   callParamCallbacks(i);
> > > > > }
> > > > >
> > > > > Mark
> > > > >
> > > > > ________________________________
> > > > > From: Szalata, Zenon M. [zms@slac.stanford.edu]
> > > > > Sent: Sunday, July 15, 2012 12:15 PM
> > > > > To: Mark Rivers
> > > > > Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> > > > > Subject: asynPortDriver
> > > > >
> > > > > Hi Mark,
> > > > > I am trying to understand why my device driver for a CAEN VME
> > > > > gated ADC module, which is based on asynPortDriver class works
> > incorrectly.
> > > > > The module has 16 ADC channels and is setup to generate an
> > > > > interrupt for each gate signal.  The interrupt triggers data
> > > > > readout and record processing.  For testing, the trigger/gate
> > > > > rate is 100
> > Hz.
> > > > > The device driver maintains a gate counter, which gets
> > > > > incremented each time the data
> > > > is processed.
> > > > > So far I have three versions of the device driver.
> > > > >
> > > > > 1. In this version I have 16 I/O Intr scanned ai records.  The
> > > > > gate counter is pushed to a longin record.
> > > > >     This version has the problem that the order of record
> > > > > processing is somewhat random and the IOC
> > > > >     using this VME module requires the processing to be deterministic.
> > > > > To overcome this difficulty,
> > > > >    I wrote the second version.
> > > > >
> > > > > 2. In this version I pack the gate counter and all 16 ADC
> > > > > channel values into an array and this array gets
> > > > >     pushed to a waveform record.  Then, the data gets
> > > > > distributed to ai records using  sub array records.
> > > > >    This part works fine.  I have subroutine record which is
> > > > > processed each time the data waveform record
> > > > >    is processed.  It checks for missing triggers, by expecting
> > > > > that the gate counter value for this data array
> > > > >    is larger by 1 than the previous one.  This is where the
> > > > > device driver code seems to fail.
> > > > >    I let the IOC run for a bit over 35 hours.  More precisely
> > > > > the time corresponds to 12700898 triggers.
> > > > >    During this time the subroutine record reported 42 missing
> > > > > triggers.  They occurred spread out
> > > > >    throughout the run nearly but not equally spaced in time. I
> > > > > have a simple print statement
> > > > >    in the C routine which prints a few numbers.  These numbers
> > > > > tell me that the trigger incremented by 2.
> > > > >    I just looked at the logic in the C routine and I see that I
> > > > > am not printing messages when the new gate
> > > > >    counter is less than the previous one.  So it is possible
> > > > > that from time to time the waveform record gets
> > > > >    the data out of order.  It is also possible, but I think
> > > > > unlikely, that from time to time the data is lost.
> > > > >
> > > > > 3. This version of the device driver is non-asyn.  It consists
> > > > > of two parts, device support and record support.
> > > > >     I wrote it, by taking the second version of the device
> > > > > driver and replaced the asynPortDriver part with
> > > > >    device support.  Also, I created a new version of the IOC, by
> > > > > taking the IOC which used the second version of
> > > > >    the device driver and modifying the db files and the st.cmd
> > > > > file as needed.  I ran this IOC version
> > > > >   overnight, so far it has processed 4 and half million triggers
> > > > > and not a single missed trigger.
> > > > >
> > > > > I wonder if the behavior observed using the second version of
> > > > > the device driver is to be expected?
> > > > > Does it mean, that for some applications the asynPortDriver
> > > > > approach to device driver implementation might be not appropriate?
> > > > > Zen



Replies:
RE: asynPortDriver Szalata, Zenon M.
References:
asynPortDriver Szalata, Zenon M.
RE: asynPortDriver Mark Rivers
RE: asynPortDriver Szalata, Zenon M.
RE: asynPortDriver Mark Rivers
RE: asynPortDriver Szalata, Zenon M.
RE: asynPortDriver Mark Rivers
RE: asynPortDriver Szalata, Zenon M.
RE: asynPortDriver Mark Rivers
RE: asynPortDriver Szalata, Zenon M.

Navigate by Date:
Prev: RE: asynPortDriver Szalata, Zenon M.
Next: More than 16 enumerations mbbi / mbbo Hinko Kocevar
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  <20122013  2014 
Navigate by Thread:
Prev: RE: asynPortDriver Szalata, Zenon M.
Next: RE: asynPortDriver Szalata, Zenon M.
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  <20122013  2014 
ANJ, 18 Nov 2013 Valid HTML 4.01! · Home · News · About · Base · Modules · Extensions · Distributions · Download ·
· EPICSv4 · IRMIS · Talk · Bugs · Documents · Links · Licensing ·