EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: channel access security deadlock from asInit()
From: "Jeff Hill" <[email protected]>
To: "'Kay-Uwe Kasemir'" <[email protected]>, "'tech talk'" <[email protected]>
Date: Wed, 18 Apr 2007 12:56:01 -0600

Kay,

It looks like you have half (or less) of the bug in the jar.

You have the thread that holds the AS lock and is calling into rsrv, where
it is trying to take one of rsrv's locks (presumably either eventqLock or
chanListLock, unless calls to db_post_single_event, db_event_disable,
db_event_enable, or db_post_extra_labor are on the stack).

If it's a classic deadlock then we need to find the other half of the bug.
This is the other thread that has rsrv's lock (eventqLock, chanListLock, or
maybe a lock in the event queue system) and is calling AS. Unfortunately, I
don't see this in the source code after looking at all of the Access
Security library calls in rsrv so we may have a more complicated chain of
locks in the deadlock. 

In fact, after looking at all of the AS function calls in rsrv (see
attached) it appears that rsrv never holds any of its locks when it calls
AS. Note that asCheckPut and asCheckGet can be ignored during the analysis
because they are macros that don't take the AS locks.

I am looking at the latest sources on the R3.14 branch.

The bottom line is that if you can identify the other thread, or threads if
the situation is more complex, that is (are) participating in the deadlock,
and send their stack trace(s) as well, then identifying the cause of the
problem will be easier. Note that it isn't sufficient to find one thread
that is waiting for a lock. Once the deadlock occurs there can be many of
these. To positively identify a deadlock one must find thread one holding
lock A and pending on lock B, and also thread two holding lock B and pending
on lock A (more complex situations are also possible if the deadlock has more
than two locks in it).
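
As a rough illustration of that check, here is a minimal sketch in plain C that follows "who holds the lock I am pending on" from one thread to the next and reports a deadlock only if the chain leads back to the starting thread. Everything in it is an assumption made for illustration: the thread_state type, the lock numbering, and the example table (including the second thread, which has not actually been found yet) are hypothetical, and nothing here is EPICS code.

```c
#include <assert.h>
#include <stddef.h>

#define NO_LOCK (-1)

/* Hypothetical snapshot of one thread, as read off its stack trace. */
typedef struct {
    const char *name;   /* label for the thread (illustrative only) */
    unsigned    holds;  /* bit i set => thread owns lock number i */
    int         waits_for; /* lock number it is pending on, or NO_LOCK */
} thread_state;

/* Return the index of the thread that owns `lock`, or -1 if nobody does. */
static int owner_of(const thread_state *t, int n, int lock)
{
    for (int i = 0; i < n; i++) {
        if (t[i].holds & (1u << lock))
            return i;
    }
    return -1;
}

/*
 * Return 1 if thread `start` is part of a cycle of threads that each
 * wait for a lock owned by the next one, 0 otherwise. A thread that is
 * merely blocked behind such a cycle returns 0.
 */
static int in_deadlock(const thread_state *t, int n, int start)
{
    assert(t != NULL && n > 0 && start >= 0 && start < n);

    int cur = start;
    for (int step = 0; step < n; step++) {
        if (t[cur].waits_for == NO_LOCK)
            return 0;               /* chain ends at a runnable thread */
        cur = owner_of(t, n, t[cur].waits_for);
        if (cur < 0)
            return 0;               /* lock is free, so no deadlock here */
        if (cur == start)
            return 1;               /* chain closed on itself */
    }
    return 0;                       /* some cycle exists, but not through start */
}

/* Hypothetical table: lock 0 stands for the AS lock, lock 1 for eventqLock. */
static const thread_state example[] = {
    { "asInit caller",  1u << 0, 1 },       /* holds AS, pending on eventqLock */
    { "unknown thread", 1u << 1, 0 },       /* holds eventqLock, pending on AS */
    { "bystander",      0u,      0 },       /* holds nothing, pending on AS */
};
```

With this table, in_deadlock(example, 3, 0) and in_deadlock(example, 3, 1) report a cycle, while the bystander at index 2 is only blocked behind it. That is why a single thread stuck in semTake() does not by itself prove a deadlock.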

I am willing to log into your IOC for a closer look if you would like to
have some assistance.

PS: also look at CPU consumption as infinite loops at higher priority can
also make low priority threads appear to be locked up.

Jeff

> -----Original Message-----
> From: Kay-Uwe Kasemir [mailto:[email protected]]
> Sent: Wednesday, April 18, 2007 10:06 AM
> To: tech talk
> Subject: channel access security deadlock from asInit()
> 
> Hello:
> 
> We're having problems with IOCs deadlocking when re-loading the
> Channel Access Security configuration files at runtime.
> Maybe somebody else has seen this, too?
> 
> When doing this via the subroutine record AS routines,
> a follow-up 'asDump' on the IOC shell will show that the file
> was indeed loaded OK, but no more connections are allowed.
> When instead running 'asInit' from the IOC shell, that hangs,
> and a CTRL-C shows a stack trace that's analyzed below.
> 
> This happens with both R3.14.7 and R3.14.8.2,
> on IOCs that do very different things.
> After an IOC reboot, 'asInit' calls work fine,
> and the access security files are of course OK as well.
> 
> By looking at the affected IOCs and their uptime, we assume
> that this is caused by something that happened on the SNS
> network between March 13 and April 4.
> In the past, we suspected the 'pure Java CA client' code
> to cause CA security related deadlocks. In principle,
> we don't expect any more pure Java CA client code on the
> SNS controls network, but we can't be 100% sure.
> On the other hand, this deadlock could be completely unrelated
> to Java CA, so forget all I said about Java CA.
> 
> Anyway, here's the stack trace with guesses based on
> about 2 hours of comparing the stack offsets with the vxWorks 'l'
> output and the 'gcc -S' PowerPC assembly code:
> 
> 1e1a54 yystart        +96c: asInit ()
> 1b5b1a8 asInit         +10 : 1b5ab3c () = asInitCommon()
> 1b5ac28 asTrapWriteRegisterListener+384: asInitFile ()
> 1b59564 asInitFile     +7c : asInitFP (34a7)
> 1b545dc asInitFP       +11c: asInitialize ()
> 1b542b0 asInitialize   +31c: 1b55ec0 ()
>    This is asAddMemberPvt()
> 
> 1b56018 asDumpMemFP    +49c: 1b561cc ()
>    This is asComputePvt()
> 
> 1b563c8 asDumpMemFP    +84c: 1b650dc ()
>    This must be the change-of-access-rights callback
>    at the end of asComputePvt():
>      if(pasgclient->pcallback && oldaccess!=access) {
>          (*pasgclient->pcallback)(pasgclient,asClientCOAR);
> 
> 1b6519c casUserNameInitiatingCurrentThread+1ebc: 1b64f5c ()
>    Based on a grep of the sources, the only code that registers for such
>    a callback is in rsrv/camessage.c, so this must be casAccessRightsCB(),
>    which calls epicsMutexMustLock(pclient->eventqLock)
> 
> 1b6502c casUserNameInitiatingCurrentThread+1d4c: epicsMutexLock ()
> 1a7a7c0 epicsMutexLock +24 : semTake ()
> 1f0c74 semTake        +13c: semMTake ()
>    Deadlock...
> 
> 
> -Kay

