g+
g+ Communities
Argonne National Laboratory

Experimental Physics and
Industrial Control System

1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  <20122013  2014  Index 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  <20122013  2014 
<== Date ==> <== Thread ==>

Subject: RE: asynPortDriver
From: Mark Rivers <rivers@cars.uchicago.edu>
To: "'Szalata, Zenon M.'" <zms@slac.stanford.edu>
Cc: "tech-talk@aps.anl.gov" <tech-talk@aps.anl.gov>
Date: Tue, 24 Jul 2012 22:35:13 +0000
Hi Zen,

I think we were on the right track with the gotValue check, but the timing problem is a little different.

Here is the problem I thought might be happening:

- Your driver calls doCallbacksInt32Array.
- This results in a call to interruptCallbackInput in asyn/devEpics/devAsynXXXArray.h
- interruptCallbackInput copies the data to the waveform record VAL field and then calls scanIoRequest
- Another interrupt occurs before the record processes and calls devAsynXXXArray::processCommon
- gotValue will be 1 because the processCommon has not yet reset gotValue=0

You showed that this problem can happen, but only at higher callback rates.

Here is the problem I now think is happening:
- Your driver calls doCallbacksInt32Array.
- This results in a call to interruptCallbackInput in asyn/devEpics/devAsynXXXArray.h
- interruptCallbackInput copies the data to the waveform record VAL field and then calls scanIoRequest
- scanIoRequest calls processCommon, which sets gotValue=0
- Another interrupt occurs before the record has finished posting monitors.  It overwrites the data in the record.  It calls scanIoRequest again.
- The waveform record then processes twice with the data from the second callback.

This appears to be what you are seeing.

I just realized that the problem is that there no mutex preventing the callback from overwriting the waveform data at any time when the waveform record is processing, which is bad.  Note that is problem is confined to I/O Intr scanned waveform records, it does not happen on other records that use a ring buffer.

I think the solution is to call dbScanLock during the time the array data is being modified in the callback.

Here is a patch against asyn R4-19.  Can you try it?  It also sets the timestamp from the callback.

corvette:asyn/asyn/devEpics>svn diff -r1995 devAsynXXXArray.h
Index: devAsynXXXArray.h
===================================================================
--- devAsynXXXArray.h   (revision 1995)
+++ devAsynXXXArray.h   (working copy)
@@ -254,6 +254,7 @@
         pPvt->pasynUser, pwf->bptr, pwf->nelm, &nread); \
     asynPrint(pasynUser, ASYN_TRACEIO_DEVICE, \
               "%s %s::callbackWf\n", pwf->name, driverName); \
+    pwf->time = pasynUser->timestamp; \
     if (status == asynSuccess) { \
         pwf->udf=0; \
         pwf->nord = nread; \
@@ -280,8 +281,11 @@
         (char *)value, len*sizeof(EPICS_TYPE), \
         "%s %s::interruptCallbackInput\n", \
         pwf->name, driverName); \
+    dbScanLock((dbCommon *)pwf); \
     if (len > pwf->nelm) len = pwf->nelm; \
     for (i=0; i<(int)len; i++) pData[i] = value[i]; \
+    pwf->time = pasynUser->timestamp; \
+    dbScanUnlock((dbCommon *)pwf); \
     pPvt->gotValue = 1; \
     pPvt->nord = len; \
     if (pPvt->status == asynSuccess) pPvt->status = pasynUser->auxStatus; \



Mark


-----Original Message-----
From: Szalata, Zenon M. [mailto:zms@slac.stanford.edu] 
Sent: Tuesday, July 24, 2012 4:47 PM
To: Mark Rivers
Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
Subject: RE: asynPortDriver

Hi Mark,
I did couple of things to help understand what is happening.
1. I put a printf statement in interruptCallbackInput to print a message when gotValue not equals 0.
    This message does not get printed when the error condition occurs.  I tested that the message gets
    printed by increasing the trigger rate until the IOC "fall apart" and those messages then come out abundantly.

2. I have simplified my diagnostic mechanism to remove as much ambiguity as possible to the point
    that I think the problem happens between posting the data in the device driver and the waveform
    record processing (I/O Intr).

My diagnostic mechanism consists of two sets of EPICS records.
The data are passed from the device driver as an array of 17 integers.  The first is a gate counter, which is incremented just before it is put in the array.  This counter is followed by 16 ADC values.  The array is received in a waveform record.  The waveform record triggers processing of an aSub record and a chain of sub array records where the ADC values are extracted and put into ai records.
The aSub record compares the value of the gate counter with the value of gate counter from a previous event.  If this value is not greater by 1 than the previous value, it prints a message, which shows this and previous values.

The second set of diagnostic records is processed when the gate counter is extracted from the waveform record and put into longin record.  there is a sub record which does similar checking, but on discrete data and prints a message if the counter value did not change by 1.  Printed outputs from both sets indicate the same problem.

Here is a typical output:

asubMiss: Something wrong: n=53385,lastn=53383
asubMiss: Something wrong: n=53385,lastn=53385
subASMissing: got a missing ++++++++++++++++
a=53385.000,b=0.000,c=4294967295.000,d=1.000,e=53383.000,f=1.000,g=1.000
k=2.000,l=1.000,val=2.000
subASMissing: trigger number this=53385.0, prev=53385.0

The first two messages come from the aSub record, the next three come from the sub record.
I am not sure how to interpret the presence of the second of the two messages from the aSub record and the third from the sub record.  It looks as if the waveform record processed twice when the problem occurred.

The sub record message prints the values of the fields a through l, where a is this counter and e is previous counter.

Thank you Mark for looking at this,
Zen


> -----Original Message-----
> From: Mark Rivers [mailto:rivers@cars.uchicago.edu]
> Sent: Thursday, July 19, 2012 6:10 PM
> To: Szalata, Zenon M.
> Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> Subject: RE: asynPortDriver
> 
> Hi Zen,
> 
> What is the diagnostic mechanism you are using to determine that the counter
> has incremented by 2?  Is there a possibility that the waveform record is
> actually processing every callback OK, but that arrays are being dropped
> between the waveform record and whatever diagnostic client you are
> running?
> 
> Mark
> 
> ________________________________________
> From: Szalata, Zenon M. [zms@slac.stanford.edu]
> Sent: Thursday, July 19, 2012 5:23 PM
> To: Mark Rivers
> Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> Subject: RE: asynPortDriver
> 
> Hi Mark,
> I modified my version 2 asynPortDriver based device driver so that now it:
> 1. in ISR (at interrupt level) it read data from all 16 channels and sends the data
> by message queue.
> 2. In a high priority thread the data is received from the message queue,
> unpacked, and posted as an array.
>     The unpacked data are put into an array where the first element is a counter,
> which is incremented in the
>     data unpacking thread, and elements 1 through 16 are the ADC values.
> In the db layer I still see missing data.  Once in a while the counter value, which
> is the first element in the data array, is larger by 2 than the previous counter
> value.  The test is done at 100Hz.
> 
> The version 3 of the device driver, which is not using asyn, does not exhibit this
> behavior.  In fact I can run the system with a trigger rate of up to 400Hz before
> it starts falling apart.  Since I have a device driver that works (version 3) I could
> just use it and move on.  But at this point I am still reluctant to just abandon the
> asynPortDriver version.  Can you think of some tests that could help
> understand what is causing this problem?
> Thanks,
> Zen
> 
> > -----Original Message-----
> > From: Mark Rivers [mailto:rivers@cars.uchicago.edu]
> > Sent: Sunday, July 15, 2012 6:59 PM
> > To: Szalata, Zenon M.
> > Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> > Subject: RE: asynPortDriver
> >
> > Hi Zen,
> >
> > > In the first version of the device driver, I saw the effect that
> > > Kevin Tsubota described in his reply to my email. This is what he said:
> >
> > > "At Keck we ran into a similar problem where we weren't seeing all
> > > the events we expected with a different driver and it turned out to
> > > be that ASYN
> > only posts on change."
> >
> > OK, there are 3 parts to my reply about that:
> >
> > 1) This is not an issue with ASYN in general, it is specific to asynPortDriver.
> >
> > 2) A future version of asynPortDriver should have a variant of the
> > setXXXParam that takes a "force" flag, i.e.
> > setIntegerParam(addr, myParam, value, forceCallback) which will force
> > a callback the next time callParamCallbacks(addr) is called, even if
> > the value has not changed.
> >
> > 3) Meanwhile it is trivial to force such behavior in the current
> > version by doing setIntegerParam(addr, myParam, value+1);
> > setIntegerParam(addr, myParam, value);
> >
> > This will fool the parameter library into thinking that the value has
> > changed so it will do the callback on "value".  The overhead of doing
> > the additional call to setIntegerParam is insignificant.
> >
> > > In all three version of the device driver The data is read out at
> > > interrupt level, which is needed to clear the interrupt.  Then I
> > > schedule a callback to continue processing the data.  This consists
> > > of
> > unpacking the data  words into various components.
> >
> > I would have to see your code, but what you are doing sounds dangerous.
> > What if a second interrupt occurs before the callback occurs to
> > process the first?  If your data is not put in a queue, then you risk
> > missing data.  In my Ip330 and ipUnidig drivers I send the data from
> > the interrupt service routine to the thread that will do callbacks via
> > an epicsMessageQueue.  Then even if the thread that does callbacks
> > momentarily falls behind I will not lose data unless the message queue
> overflows, and I have a flag to signal if that occurs.
> >
> > Mark
> >
> > ________________________________________
> > From: Szalata, Zenon M. [zms@slac.stanford.edu]
> > Sent: Sunday, July 15, 2012 8:20 PM
> > To: Mark Rivers
> > Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> > Subject: RE: asynPortDriver
> >
> > Hi Mark,
> > In the first version of the device driver, I saw the effect that Kevin
> > Tsubota described in his reply to my email.  This is what he said:
> >
> > "At Keck we ran into a similar problem where we weren't seeing all the
> > events we expected with a different driver and it turned out to be
> > that ASYN only posts on change."
> >
> > In all three version of the device driver The data is read out at
> > interrupt level, which is needed to clear the interrupt.  Then I
> > schedule a callback to continue processing the data.  This consists of
> > unpacking the data words into various components.
> > In the case of the first version of the device driver, the callback
> > requests to process the records are done in a for loop.  Perhaps I
> > have abandoned the first version too quickly.
> > Anyway it was clear to me that it would be better to bundle the
> > trigger number and data from all ADC channels and pass that to the db layer.
> >
> > I will modify the C subroutine of the subroutine record to also print
> > a message when the current gate number is less than the previous value.
> >
> > Thanks,
> > Zen
> >
> > > -----Original Message-----
> > > From: Mark Rivers [mailto:rivers@cars.uchicago.edu]
> > > Sent: Sunday, July 15, 2012 3:19 PM
> > > To: Szalata, Zenon M.
> > > Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> > > Subject: RE: asynPortDriver
> > >
> > > How are you passing the data from your interrupt service routine to
> > > the thread that is calling callParamCallbacks()?  Are you using a
> > > mechanism with a queue, like an epicsMessageQueue?
> > >
> > > I'd like to understand why "the order of record processing is
> > > somewhat random" in method 1.  I would think that it would be
> > > deterministic because there is only 1 callback thread that is
> > > processing all the records in the order in which the callback
> > > requests were issued, which I assume your driver does in a loop like:
> > >
> > > for (i=0; i<16; i++) {
> > >   callParamCallbacks(i);
> > > }
> > >
> > > Mark
> > >
> > > ________________________________
> > > From: Szalata, Zenon M. [zms@slac.stanford.edu]
> > > Sent: Sunday, July 15, 2012 12:15 PM
> > > To: Mark Rivers
> > > Cc: tech-talk@aps.anl.gov; Williams Jr., Ernest L.
> > > Subject: asynPortDriver
> > >
> > > Hi Mark,
> > > I am trying to understand why my device driver for a CAEN VME gated
> > > ADC module, which is based on asynPortDriver class works incorrectly.
> > > The module has 16 ADC channels and is setup to generate an interrupt
> > > for each gate signal.  The interrupt triggers data readout and
> > > record processing.  For testing, the trigger/gate rate is 100 Hz.
> > > The device driver maintains a gate counter, which gets incremented
> > > each time the data
> > is processed.
> > > So far I have three versions of the device driver.
> > >
> > > 1. In this version I have 16 I/O Intr scanned ai records.  The gate
> > > counter is pushed to a longin record.
> > >     This version has the problem that the order of record processing
> > > is somewhat random and the IOC
> > >     using this VME module requires the processing to be deterministic.
> > > To overcome this difficulty,
> > >    I wrote the second version.
> > >
> > > 2. In this version I pack the gate counter and all 16 ADC channel
> > > values into an array and this array gets
> > >     pushed to a waveform record.  Then, the data gets distributed to
> > > ai records using  sub array records.
> > >    This part works fine.  I have subroutine record which is
> > > processed each time the data waveform record
> > >    is processed.  It checks for missing triggers, by expecting that
> > > the gate counter value for this data array
> > >    is larger by 1 than the previous one.  This is where the device
> > > driver code seems to fail.
> > >    I let the IOC run for a bit over 35 hours.  More precisely the
> > > time corresponds to 12700898 triggers.
> > >    During this time the subroutine record reported 42 missing
> > > triggers.  They occurred spread out
> > >    throughout the run nearly but not equally spaced in time. I have
> > > a simple print statement
> > >    in the C routine which prints a few numbers.  These numbers tell
> > > me that the trigger incremented by 2.
> > >    I just looked at the logic in the C routine and I see that I am
> > > not printing messages when the new gate
> > >    counter is less than the previous one.  So it is possible that
> > > from time to time the waveform record gets
> > >    the data out of order.  It is also possible, but I think
> > > unlikely, that from time to time the data is lost.
> > >
> > > 3. This version of the device driver is non-asyn.  It consists of
> > > two parts, device support and record support.
> > >     I wrote it, by taking the second version of the device driver
> > > and replaced the asynPortDriver part with
> > >    device support.  Also, I created a new version of the IOC, by
> > > taking the IOC which used the second version of
> > >    the device driver and modifying the db files and the st.cmd file
> > > as needed.  I ran this IOC version
> > >   overnight, so far it has processed 4 and half million triggers and
> > > not a single missed trigger.
> > >
> > > I wonder if the behavior observed using the second version of the
> > > device driver is to be expected?
> > > Does it mean, that for some applications the asynPortDriver approach
> > > to device driver implementation might be not appropriate?
> > > Zen



Replies:
Re: asynPortDriver Hinko Kocevar
Re: asynPortDriver Andrew Johnson
References:
asynPortDriver Szalata, Zenon M.
RE: asynPortDriver Mark Rivers
RE: asynPortDriver Szalata, Zenon M.
RE: asynPortDriver Mark Rivers
RE: asynPortDriver Szalata, Zenon M.
RE: asynPortDriver Mark Rivers
RE: asynPortDriver Szalata, Zenon M.

Navigate by Date:
Prev: RE: asynPortDriver Szalata, Zenon M.
Next: Re: asynPortDriver Hinko Kocevar
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  <20122013  2014 
Navigate by Thread:
Prev: RE: asynPortDriver Szalata, Zenon M.
Next: Re: asynPortDriver Hinko Kocevar
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  <20122013  2014 
ANJ, 18 Nov 2013 Valid HTML 4.01! · Home · News · About · Base · Modules · Extensions · Distributions · Download ·
· EPICSv4 · IRMIS · Talk · Bugs · Documents · Links · Licensing ·