EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System

1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  <20152016  2017  2018  2019  2020  2021  2022  2023  2024  Index 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  <20152016  2017  2018  2019  2020  2021  2022  2023  2024 
<== Date ==> <== Thread ==>

Subject: asynPortDriver deadlock when using asynOctet
From: [email protected]
To: [email protected]
Date: Mon, 01 Jun 2015 18:03:00 +0200 (CEST)
Hello,

We've come across a deadlock issue when using asynPortDriver with a string parameter combined with a waveform record in I/O Intr scan mode.

We have a custom thread which locks the asynPortDriver port before updating some parameters and calling callParamCallbacks (still within the lock), which then deadlocks. Some debugging revealed that it blocks when the devEpics "interrupt callback" tries to lock the record's lock-set. Here:
https://svn.aps.anl.gov/trac/epics/asyn/browser/trunk/asyn/devEpics/devAsynOctet.c?rev=2517#L390 

The deadlock seems to be because just at the right time a non-IO-Intr input record attached to the same asynPortDriver is being processed from a DB scan thread, as part of some database logic triggered by CA monitors. This involves the asyn device support trying to lock the asynPortDriver while the same thread has the lock-set locked. So we have two threads taking the same two locks (port lock, lock-set lock) in a different order, and if they meet at just the right time, they deadlock.

The issue was introduced by asyn revision 2454. https://svn.aps.anl.gov/trac/epics/asyn/changeset/2454
Interestingly essentially the same issue is mentioned in the release notes for R4-10: "Undid the change that was done in R4-9 with direct calls to dbScanLock and process in the interrupt callback functions. This could lead to deadlocks in some circumstances."

It seems that it can be worked around by using the new queue feature for strings in R4-26.

Here's a (partial) backtrace of our thread doing the asyn callParamCallbacks:

#0  0x000000310ba0e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000000310ba09523 in _L_lock_892 () from /lib64/libpthread.so.0
#2  0x000000310ba09407 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000004e4e46 in mutexLock (pmutex=0x284ad70) at ../../../src/libCom/osi/os/posix/osdMutex.c:44
#4  epicsMutexOsdLock (pmutex=0x284ad70) at ../../../src/libCom/osi/os/posix/osdMutex.c:116
#5  0x000000000047c1c3 in dbScanLock (precord=0x2796cf0) at ../dbLock.c:264
#6  0x0000000000422501 in interruptCallbackWaveform (drvPvt=0x283c120, pasynUser=0x2864af8, 
    data=0x7fac8404d280 "P4PADP_PpsSendCommandFailed", numchars=28, eomReason=<value optimized out>)
    at ../../asyn/devEpics/devAsynOctet.c:289
#7  0x0000000000420886 in paramList::octetCallback(int, int) ()
#8  0x000000000042102f in paramList::callCallbacks(int) ()
...
#11 0x00000000004e4067 in start_routine (arg=0x2721000) at ../../../src/libCom/osi/os/posix/osdThread.c:385
#12 0x000000310ba079d1 in start_thread () from /lib64/libpthread.so.0
#13 0x000000310b6e88fd in clone () from /lib64/libc.so.6

Here's a backtrace of a DB processing thread doing a synchronous read from some asyn parameter via devAsynInt32:

#0  0x000000310ba0e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000000310ba09523 in _L_lock_892 () from /lib64/libpthread.so.0
#2  0x000000310ba09407 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000004e4e46 in mutexLock (pmutex=0x271f260) at ../../../src/libCom/osi/os/posix/osdMutex.c:44
#4  epicsMutexOsdLock (pmutex=0x271f260) at ../../../src/libCom/osi/os/posix/osdMutex.c:116
#5  0x000000000041f8d0 in asynPortDriver::lock() ()
#6  0x000000000041c4d2 in readInt32 ()
#7  0x0000000000425dc5 in processCallbackInput (pasynUser=0x2817688) at ../../asyn/devEpics/devAsynInt32.c:425
#8  0x000000000043da18 in queueRequest (pasynUser=0x2817688, priority=asynQueuePriorityLow, timeout=0)
    at ../../asyn/asynDriver/asynManager.c:1483
#9  0x000000000042472d in processBi (pr=0x2794d70) at ../../asyn/devEpics/devAsynInt32.c:936
#10 0x0000000000450b7e in readValue (prec=0x2794d70) at ../biRecord.c:266
#11 process (prec=0x2794d70) at ../biRecord.c:131
#12 0x000000000047dc45 in dbProcess (precord=0x2794d70) at ../dbAccess.c:643
#13 0x000000000047da63 in dbGetLinkValue (plink=0x279e568, dbrType=8, pbuffer=0x279e908, poptions=0x0, 
    pnRequest=<value optimized out>) at ../dbAccess.c:850
#14 0x0000000000456635 in fetch_values (prec=0x279e2f0) at ../calcoutRecord.c:631
#15 process (prec=0x279e2f0) at ../calcoutRecord.c:220
#16 0x000000000047dc45 in dbProcess (precord=0x279e2f0) at ../dbAccess.c:643
#17 0x0000000000489e45 in onceTask (arg=<value optimized out>) at ../dbScan.c:513
#18 0x00000000004e4067 in start_routine (arg=0x284fd00) at ../../../src/libCom/osi/os/posix/osdThread.c:385

Additionally, I've put together some pseudo-code that may reproduce this, at least as far as I understand the issue. The port driver would be something like this.

asynStatus MyPort::writeInt32(...)
{
    asynStatus r = asynPortDriver::writeInt32(...);
    if (START parameter is being written) {
        setIntegerParam(IS_RUNNING, 1);
        set event object;
    }
    return r;
}

void MyPort::io_thread()
{
    while (1) {
        wait for event object;

        lock();
        setStringParam(RESULT_STRING, ...);
        setIntegerParam(IS_RUNNING, 0);
        callParamCallbacks();
        unlock();
    }
}

And the associated DB code...

record(calcout, "CHECK_RESULTS")
{
    ...
    field(INPA, "IS_RUNNING CPP")
    field(INPB, "IS_RUNNING_SYNC PP")
    ...
}

record(bi, "IS_RUNNING")
{
    field(DTYP, "asynInt32")
    field(SCAN, "I/O Intr")
    field(INP, "@asyn(...) IS_RUNNING")
}

record(bi, "IS_RUNNING_SYNC")
{
    field(DTYP, "asynInt32")
    field(INP, "@asyn(...) IS_RUNNING")     
}

record(waveform, "RESULT")
{
    field(DTYP, "asynOctetRead")
    field(SCAN, "I/O Intr")
    field(INP,  "@asyn(...) RESULT")
    ...
}

The intention here is for CHECK_RESULTS to process process whenever IS_RUNNING changes, but then synchronously read out the latest value of IS_RUNNING (and ignore the actual monitor value). It might seem like a strange thing to do, but it could as well be reading some other parameter in the same port. The scenario when this would lead to a deadlock is:
- CA client puts START=1. This changes IS_RUNNING parameter to 1, posting a monitor event for the CHECK_RESULTS record, and also wakes up the io_thread.
- (1) io_thread proceeds to lock the asynPortDriver, set parameters and callParamCallbacks(), which involves the interrupt callback for RESULT_STRING calling dbScanLock()...
- (2) At the same time the CHECK_RESULTS record is being processed due to the monitor event, which involves processing the IS_RUNNING_SYNC record, which involves reading the IS_RUNNING parameter, which involves locking the asynPortDriver...
- Threads for (1) and (2) deadlock when (1) holds the port lock and is wants to take the lock-set lock, and (2) holds the lock-set and wants to lock the port.

Best regards,
Ambroz

Replies:
RE: asynPortDriver deadlock when using asynOctet Mark Rivers

Navigate by Date:
Prev: MAXv controller limits Phil Brown
Next: Re: MAXv controller limits Ron Sluiter
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  <20152016  2017  2018  2019  2020  2021  2022  2023  2024 
Navigate by Thread:
Prev: Re: MAXv controller limits Ron Sluiter
Next: RE: asynPortDriver deadlock when using asynOctet Mark Rivers
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  <20152016  2017  2018  2019  2020  2021  2022  2023  2024 
ANJ, 16 Dec 2015 Valid HTML 4.01! · Home · News · About · Base · Modules · Extensions · Distributions · Download ·
· Search · EPICS V4 · IRMIS · Talk · Bugs · Documents · Links · Licensing ·