EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System

1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  <20132014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024  Index 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  <20132014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024 
<== Date ==> <== Thread ==>

Subject: RE: asyn+streamDevice hangs with synchronousLock:Yes
From: Mark Rivers <[email protected]>
To: "'Brian McAllister'" <[email protected]>
Cc: "'[email protected]'" <[email protected]>
Date: Mon, 8 Jul 2013 17:46:03 +0000
Brian,

> I had guessed that there was a deadlock involving another mutex created by StreamDevice, but that does not seem to be the case.

I spoke too soon there.  When you get the hang you are seeing these 2 mutexes always locked:

epicsMutexId 0x894b988 source ../../asyn/asynDriver/asynManager.c line 1868
epicsMutexId 0x89a2710 source ../../../src/libCom/osi/epicsMutex.cpp line 222

I did not see a epicsMutex being created in StreamDevice, but I was wrong.  When epicsMutexCreate() is called from C code it stores the file and line number where the mutex was created.  However, StreamDevice creates an epicsMutex using the C++ constructor for the epicsMutex class, rather than directly calling epicsMutexCreate().  The problem is that the epicsMutex::epicsMutex constructor (in base/src/libCom/osi/epicsMutex.cpp) itself calls epicsMutexCreate(), so the file and line number that are stored for that mutex are the file and line number of the C++ constructor, not the location where the constructor is called from, so it is always the same!  This should be fixed in a future version of EPICS base, since it would be more useful to know where the epicsMutex constructor was called from.

Fortunately StreamDevice appears to only create a single epicsMutex object, and this occurs in the constructor for the Stream class.

It looks to me like there is a subtle bug in the StreamDevice + asyn that is creating a classic deadlock.

This could occur under the following circumstances:

Thread #1 made calls in this order:
   pasynManager->lockPort();
   Stream::mutex.lock();

Thread #2 made calls in this order:
  Stream::mutex.lock();
  pasynManager->lockPort();

The deadlock occurs when Thread #1 calls pasynManager->lockPort() and then Thread #2 immediately calls Stream::mutex.lock().  Now each thread is blocked trying to take the mutex that the other thread has locked.

I don't think it's possible for the deadlock to be directly caused by code in asyn, because asyn does not know anything about the StreamDevice mutex.

My questions are:

- Why have we not seen this before if it occurs with a single I/O Intr scanned record using StreamDevice?  I think that many sites are doing that.

- Does this occur in a "stripped down" IOC that is only running this single StreamDevice record and nothing else?  Could it be caused by some other code in your IOC?

Mark




-----Original Message-----
From: Mark Rivers 
Sent: Monday, July 08, 2013 10:01 AM
To: 'Brian McAllister'
Cc: Dirk Zimoch; [email protected]
Subject: RE: asyn+streamDevice hangs with synchronousLock:Yes 

> The reference to asynManager.c is in registerPort().  It seems odd that
> this function would be called other than during iocInit(), and that it
> would get stuck trying to create what should be a new mutex.

registerPort() not getting called again.  epicsMutexShowAll reports the line in which the mutex was originally created, which in this case happened before iocInit when registerPort() was called.

I had guessed that there was a deadlock involving another mutex created by StreamDevice, but that does not seem to be the case.

It would be helpful if you could run the gdb debugger on the IOC application when it is hung up.

Do that as follows:

- Your IOC is running as a process on the Linux system.  Get its process ID using the "ps" command.

For example, here I will debug the testAsynPortDriver IOC application:
corvette:asyn/iocBoot/ioctestAsynPortDriver>ps ax | grep testAsyn
26058 pts/33   Sl+    0:00 ../../bin/linux-x86/testAsynPortDriver st.cmd
26188 pts/9    S+     0:00 grep --color=auto testAsyn

So the IOC process ID is 26058.

- Start gdb as follows:

gdb [path_to_IOC_application] [process_ID]

For my example above:
corvette:asyn/iocBoot/ioctestAsynPortDriver>gdb ../../bin/linux-x86/testAsynPortDriver 26058

Once the debugger starts issue the following command:

(gdb) thread apply all backtrace

For example, the beginning of the output for my example is:
Thread 22 (Thread 0xf73f7b70 (LWP 26059)):
#0  0xf775d430 in __kernel_vsyscall ()
#1  0x4d3b84f4 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x4d301d84 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3  0xf75b81b8 in condTimedwait (time=0xf73f7218, mutexId=0x97cfa58, condId=0x97cfa70) at ../../../src/libCom/osi/os/posix/osdEvent.c:65
#4  epicsEventWaitWithTimeout (pevent=0x97cfa58, timeout=1.7976931348623157e+308) at ../../../src/libCom/osi/os/posix/osdEvent.c:156
#5  0xf75b0bf5 in epicsEvent::wait(double) () from /usr/local/epics/base-3.14.12.3/lib/linux-x86/libCom.so.3.14
#6  0xf75be805 in timerQueueActive::run() () from /usr/local/epics/base-3.14.12.3/lib/linux-x86/libCom.so.3.14
#7  0xf75af312 in epicsThreadCallEntryPoint () from /usr/local/epics/base-3.14.12.3/lib/linux-x86/libCom.so.3.14
#8  0xf75b5cad in start_routine (arg=0x97cfbd0) at ../../../src/libCom/osi/os/posix/osdThread.c:385
#9  0x4d3b4a2e in start_thread () from /lib/libpthread.so.0
#10 0x4d2f381e in clone () from /lib/libc.so.6

Thread 21 (Thread 0xf7376b70 (LWP 26060)):
#0  0xf775d430 in __kernel_vsyscall ()
#1  0x4d3b84f4 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x4d301d84 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3  0xf75b81b8 in condTimedwait (time=0xf7376218, mutexId=0x97cff88, condId=0x97cffa0) at ../../../src/libCom/osi/os/posix/osdEvent.c:65
#4  epicsEventWaitWithTimeout (pevent=0x97cff88, timeout=1.7976931348623157e+308) at ../../../src/libCom/osi/os/posix/osdEvent.c:156
#5  0xf75b0bf5 in epicsEvent::wait(double) () from /usr/local/epics/base-3.14.12.3/lib/linux-x86/libCom.so.3.14
#6  0xf75be805 in timerQueueActive::run() () from /usr/local/epics/base-3.14.12.3/lib/linux-x86/libCom.so.3.14
#7  0xf75af312 in epicsThreadCallEntryPoint () from /usr/local/epics/base-3.14.12.3/lib/linux-x86/libCom.so.3.14
#8  0xf75b5cad in start_routine (arg=0x97d0100) at ../../../src/libCom/osi/os/posix/osdThread.c:385
#9  0x4d3b4a2e in start_thread () from /lib/libpthread.so.0
#10 0x4d2f381e in clone () from /lib/libc.so.6


That will give a stack trace of every thread, and we can see if there are multiple threads waiting on mutexes, and what code those threads are executing.

Mark


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Brian McAllister
Sent: Monday, July 08, 2013 9:48 AM
To: Mark Rivers
Cc: Dirk Zimoch; [email protected]
Subject: Re: asyn+streamDevice hangs with synchronousLock:Yes 

>>> On 7/8/2013 at 3:26:08 -0000, Mark Rivers wrote:

  > That is interesting.  When it is hung and you type

  > epicsMutexShowAll, 1

  > repeatedly, do you always see the same thing, i.e.

  >> ellCount(&mutexList) 117 ellCount(&freeList) 12
  >> epicsMutexId 0x894b988 source ../../asyn/asynDriver/asynManager.c line 1868
  >> epicsMutexId 0x89a2710 source ../../../src/libCom/osi/epicsMutex.cpp line 222

Yes.  Once it gets stuck, that report doesn't change.

The reference to asynManager.c is in registerPort().  It seems odd that
this function would be called other than during iocInit(), and that it
would get stuck trying to create what should be a new mutex.

- brian


References:
RE: asyn+streamDevice hangs with synchronousLock:Yes Mark Rivers
Re: asyn+streamDevice hangs with synchronousLock:Yes Brian McAllister
RE: asyn+streamDevice hangs with synchronousLock:Yes Mark Rivers

Navigate by Date:
Prev: RE: asyn+streamDevice hangs with synchronousLock:Yes Mark Rivers
Next: Re: asyn+streamDevice hangs with synchronousLock:Yes Brian McAllister
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  <20132014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024 
Navigate by Thread:
Prev: RE: asyn+streamDevice hangs with synchronousLock:Yes Mark Rivers
Next: Re: asyn+streamDevice hangs with synchronousLock:Yes Brian McAllister
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  <20132014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024 
ANJ, 20 Apr 2015 Valid HTML 4.01! · Home · News · About · Base · Modules · Extensions · Distributions · Download ·
· Search · EPICS V4 · IRMIS · Talk · Bugs · Documents · Links · Licensing ·