EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System

1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  <20152016  2017  2018  2019  2020  2021  2022  2023  2024  Index 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  <20152016  2017  2018  2019  2020  2021  2022  2023  2024 
<== Date ==> <== Thread ==>

Subject: Re: asynPortDriver deadlock when using asynOctet
From: [email protected]
To: Mark Rivers <[email protected]>
Cc: [email protected]
Date: Wed, 03 Jun 2015 19:05:14 +0200 (CEST)
Hi Mark,

Thanks for looking into this.

My driver indeed is synchronous, in that it does not set ASYN_CANBLOCK. But it returns from all read/write methods immediately (does not wait for any I/O to complete).

Regarding your attempt to reproduce this, it seems generally OK, but:

- I suggest you double check that the two records you're testing with (the waveform and the NoiseAmplitude) are indeed in the same lock-set. I just realized that the example I put down is actually insufficient because IS_RUNNING and RESULT_STRING are not connected by internal DB links. I think that a dummy calc record with INPA and INPB set to the two records (normal links not CA) will put them in the same lock-set.

- I think that the probability of the deadlock would be very small, the way you're testing it. In my application it happens every time due to the temporal coupling; a write to START parameter triggers both a synchronous read of IS_RUNNING and the asynchronous update of RESULT_STRING. You may be able to reproduce it more easily if you put a sleep in one of the two possible places, when one lock is taken but before the other, e.g. in the driver where it's doing an asynchronous update of the waveform, before it does callParamCallbacks().

Best regards,
Ambroz

----- Original Message -----
From: "Mark Rivers" <[email protected]>
To: "[email protected]" <[email protected]>, [email protected]
Sent: Wednesday, June 3, 2015 6:22:37 PM
Subject: RE: asynPortDriver deadlock when using asynOctet

Hi Ambroz,

Thanks for reporting this issue. 

Your analysis seems correct.  Now I am trying to figure out why we have not seen this problem previously, and how to reproduce it so that it fails quickly.

The deadlock should occur under the following conditions:

1) A record with SCAN=I/O Intr calls dbScanLock with the asynPortDriver lock held.

2) Another record tries to do a write or read from the same driver.  The record processing will call dbScanLock, and then the device support calls the asynPortDriver driver which calls asynPortDriver::lock().

The deadlock should occur when:

- Thread 1) has the asynPortDriver::lock() held

- Thread 2) calls dbScanLock and then asynPortDriver::lock.  It blocks on asynPortDriver::lock because thread 1) has it.

- Thread 1) does an interrupt callback and calls dbScanLock in the device support interrupt callback function.  We now have a deadlock.

I think we have not seen this problem before for 3 reasons:

1) The standard asyn device support for asynInt32, asynFloat64, and asynUInt32Digital uses ring buffers, so it does not need to call dbScanLock.  The exception are the asynEnum callbacks that occur when the enum choices for an mbbi/mbbo or bi/bo record change.  But that is very infrequent.

2) The only standard asyn device support that does call dbScanLock is asynOctet and asynXXXArray.  They call dbScanLock if ring buffers are not being used, which is the default case.

3) I believe that in thread 2) above the record lock will only be held when the record is talking to the driver if the asynPortDriver is synchronous, i.e. the ASYN_CANBLOCK flag is not set when the driver is created.  If the driver is asynchronous then the call to asynPortDriver::lock happens in another thread, with the dbScanLock lock not held.

I think we have not seen this because it only happens with asynOctet or asynXXXArray callbacks from synchronous drivers, which are relatively uncommon.

So my hypothesis is that your driver is synchronous, i.e. it did not set the ASYN_CANBLOCK flag.  Can you tell me if this is the case?

I just did a quick test with the testAsynPortDriver application that comes with asyn.  It is the digital scope simulator.  It does SCAN=I/O Intr callbacks on the waveform record.  I created a CA client task to quickly write to another PV, the NoiseAmplitude record.  The driver is synchronous, ASYN_CANBLOCK is not set.  This should have reproduced the problem, but in 15 minutes of testing with 10 Hz interrupt callbacks and 100 Hz CA puts I could not reproduce the problem, it did not get into a deadlock.  

Do you think that this should have had the same problem you saw?

Thanks,
Mark


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
Sent: Monday, June 01, 2015 11:03 AM
To: [email protected]
Subject: asynPortDriver deadlock when using asynOctet

Hello,

We've come across a deadlock issue when using asynPortDriver with a string parameter combined with a waveform record in I/O Intr scan mode.

We have a custom thread which locks the asynPortDriver port before updating some parameters and calling callParamCallbacks (still within the lock), which then deadlocks. Some debugging revealed that it blocks when the devEpics "interrupt callback" tries to lock the record's lock-set. Here:
https://svn.aps.anl.gov/trac/epics/asyn/browser/trunk/asyn/devEpics/devAsynOctet.c?rev=2517#L390 

The deadlock seems to be because just at the right time a non-IO-Intr input record attached to the same asynPortDriver is being processed from a DB scan thread, as part of some database logic triggered by CA monitors. This involves the asyn device support trying to lock the asynPortDriver while the same thread has the lock-set locked. So we have two threads taking the same two locks (port lock, lock-set lock) in a different order, and if they meet at just the right time, they deadlock.

The issue was introduced by asyn revision 2454. https://svn.aps.anl.gov/trac/epics/asyn/changeset/2454
Interestingly essentially the same issue is mentioned in the release notes for R4-10: "Undid the change that was done in R4-9 with direct calls to dbScanLock and process in the interrupt callback functions. This could lead to deadlocks in some circumstances."

It seems that it can be worked around by using the new queue feature for strings in R4-26.

Here's a (partial) backtrace of our thread doing the asyn callParamCallbacks:

#0  0x000000310ba0e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000000310ba09523 in _L_lock_892 () from /lib64/libpthread.so.0
#2  0x000000310ba09407 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000004e4e46 in mutexLock (pmutex=0x284ad70) at ../../../src/libCom/osi/os/posix/osdMutex.c:44
#4  epicsMutexOsdLock (pmutex=0x284ad70) at ../../../src/libCom/osi/os/posix/osdMutex.c:116
#5  0x000000000047c1c3 in dbScanLock (precord=0x2796cf0) at ../dbLock.c:264
#6  0x0000000000422501 in interruptCallbackWaveform (drvPvt=0x283c120, pasynUser=0x2864af8, 
    data=0x7fac8404d280 "P4PADP_PpsSendCommandFailed", numchars=28, eomReason=<value optimized out>)
    at ../../asyn/devEpics/devAsynOctet.c:289
#7  0x0000000000420886 in paramList::octetCallback(int, int) ()
#8  0x000000000042102f in paramList::callCallbacks(int) ()
...
#11 0x00000000004e4067 in start_routine (arg=0x2721000) at ../../../src/libCom/osi/os/posix/osdThread.c:385
#12 0x000000310ba079d1 in start_thread () from /lib64/libpthread.so.0
#13 0x000000310b6e88fd in clone () from /lib64/libc.so.6

Here's a backtrace of a DB processing thread doing a synchronous read from some asyn parameter via devAsynInt32:

#0  0x000000310ba0e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000000310ba09523 in _L_lock_892 () from /lib64/libpthread.so.0
#2  0x000000310ba09407 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000004e4e46 in mutexLock (pmutex=0x271f260) at ../../../src/libCom/osi/os/posix/osdMutex.c:44
#4  epicsMutexOsdLock (pmutex=0x271f260) at ../../../src/libCom/osi/os/posix/osdMutex.c:116
#5  0x000000000041f8d0 in asynPortDriver::lock() ()
#6  0x000000000041c4d2 in readInt32 ()
#7  0x0000000000425dc5 in processCallbackInput (pasynUser=0x2817688) at ../../asyn/devEpics/devAsynInt32.c:425
#8  0x000000000043da18 in queueRequest (pasynUser=0x2817688, priority=asynQueuePriorityLow, timeout=0)
    at ../../asyn/asynDriver/asynManager.c:1483
#9  0x000000000042472d in processBi (pr=0x2794d70) at ../../asyn/devEpics/devAsynInt32.c:936
#10 0x0000000000450b7e in readValue (prec=0x2794d70) at ../biRecord.c:266
#11 process (prec=0x2794d70) at ../biRecord.c:131
#12 0x000000000047dc45 in dbProcess (precord=0x2794d70) at ../dbAccess.c:643
#13 0x000000000047da63 in dbGetLinkValue (plink=0x279e568, dbrType=8, pbuffer=0x279e908, poptions=0x0, 
    pnRequest=<value optimized out>) at ../dbAccess.c:850
#14 0x0000000000456635 in fetch_values (prec=0x279e2f0) at ../calcoutRecord.c:631
#15 process (prec=0x279e2f0) at ../calcoutRecord.c:220
#16 0x000000000047dc45 in dbProcess (precord=0x279e2f0) at ../dbAccess.c:643
#17 0x0000000000489e45 in onceTask (arg=<value optimized out>) at ../dbScan.c:513
#18 0x00000000004e4067 in start_routine (arg=0x284fd00) at ../../../src/libCom/osi/os/posix/osdThread.c:385

Additionally, I've put together some pseudo-code that may reproduce this, at least as far as I understand the issue. The port driver would be something like this.

asynStatus MyPort::writeInt32(...)
{
    asynStatus r = asynPortDriver::writeInt32(...);
    if (START parameter is being written) {
        setIntegerParam(IS_RUNNING, 1);
        set event object;
    }
    return r;
}

void MyPort::io_thread()
{
    while (1) {
        wait for event object;

        lock();
        setStringParam(RESULT_STRING, ...);
        setIntegerParam(IS_RUNNING, 0);
        callParamCallbacks();
        unlock();
    }
}

And the associated DB code...

record(calcout, "CHECK_RESULTS")
{
    ...
    field(INPA, "IS_RUNNING CPP")
    field(INPB, "IS_RUNNING_SYNC PP")
    ...
}

record(bi, "IS_RUNNING")
{
    field(DTYP, "asynInt32")
    field(SCAN, "I/O Intr")
    field(INP, "@asyn(...) IS_RUNNING")
}

record(bi, "IS_RUNNING_SYNC")
{
    field(DTYP, "asynInt32")
    field(INP, "@asyn(...) IS_RUNNING")     
}

record(waveform, "RESULT")
{
    field(DTYP, "asynOctetRead")
    field(SCAN, "I/O Intr")
    field(INP,  "@asyn(...) RESULT")
    ...
}

The intention here is for CHECK_RESULTS to process process whenever IS_RUNNING changes, but then synchronously read out the latest value of IS_RUNNING (and ignore the actual monitor value). It might seem like a strange thing to do, but it could as well be reading some other parameter in the same port. The scenario when this would lead to a deadlock is:
- CA client puts START=1. This changes IS_RUNNING parameter to 1, posting a monitor event for the CHECK_RESULTS record, and also wakes up the io_thread.
- (1) io_thread proceeds to lock the asynPortDriver, set parameters and callParamCallbacks(), which involves the interrupt callback for RESULT_STRING calling dbScanLock()...
- (2) At the same time the CHECK_RESULTS record is being processed due to the monitor event, which involves processing the IS_RUNNING_SYNC record, which involves reading the IS_RUNNING parameter, which involves locking the asynPortDriver...
- Threads for (1) and (2) deadlock when (1) holds the port lock and is wants to take the lock-set lock, and (2) holds the lock-set and wants to lock the port.

Best regards,
Ambroz

Replies:
RE: asynPortDriver deadlock when using asynOctet Mark Rivers
RE: asynPortDriver deadlock when using asynOctet Mark Rivers
References:
RE: asynPortDriver deadlock when using asynOctet Mark Rivers

Navigate by Date:
Prev: RE: asynPortDriver deadlock when using asynOctet Mark Rivers
Next: RE: asynPortDriver deadlock when using asynOctet Mark Rivers
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  <20152016  2017  2018  2019  2020  2021  2022  2023  2024 
Navigate by Thread:
Prev: RE: asynPortDriver deadlock when using asynOctet Mark Rivers
Next: RE: asynPortDriver deadlock when using asynOctet Mark Rivers
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  <20152016  2017  2018  2019  2020  2021  2022  2023  2024 
ANJ, 16 Dec 2015 Valid HTML 4.01! · Home · News · About · Base · Modules · Extensions · Distributions · Download ·
· Search · EPICS V4 · IRMIS · Talk · Bugs · Documents · Links · Licensing ·