Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: epicsEvent::invalidSemaphore exception in timerQueue
From: "Jeff Hill" <johill@lanl.gov>
To: "'Dirk Zimoch'" <dirk.zimoch@psi.ch>
Cc: "'EPICS core-talk'" <core-talk@aps.anl.gov>, "'ebner'" <simon.ebner@psi.ch>
Date: Thu, 6 May 2010 10:13:12 -0600
PS: One can attach to a timer queue by explicitly creating one, or by just
asking for a default. So many different codes can end up sharing a common
timer queue.
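To make the "explicit vs. default" distinction concrete, here is a minimal sketch, in standard C++, of a registry that hands out one shared queue per priority while still allowing a private queue on request. `TimerQueue`, `allocateQueue`, and keying the shared queues by priority are all assumptions for illustration; this is not the EPICS timer queue API.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a timer queue; only its priority matters here.
struct TimerQueue {
    explicit TimerQueue(unsigned prio) : priority(prio) {}
    unsigned priority;
};

// Hand out a timer queue. With okToShare set, callers asking for the same
// priority receive the same instance for as long as any of them holds it;
// otherwise each caller gets a private queue of its own.
std::shared_ptr<TimerQueue> allocateQueue(bool okToShare, unsigned priority) {
    static std::mutex registryLock;
    static std::map<unsigned, std::weak_ptr<TimerQueue>> registry;

    if (!okToShare) {
        return std::make_shared<TimerQueue>(priority);
    }
    std::lock_guard<std::mutex> guard(registryLock);
    std::weak_ptr<TimerQueue>& slot = registry[priority];
    if (std::shared_ptr<TimerQueue> existing = slot.lock()) {
        return existing;  // another client already created this queue
    }
    auto created = std::make_shared<TimerQueue>(priority);
    slot = created;
    return created;
}
```

The consequence Jeff points out follows directly: once sharing is allowed, unrelated clients can hold the same queue, so the name of the queue's thread alone does not identify who owns it.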

PPS: One of the recent fixes (after your version) added a try/catch block
around the invocation of the timer expiration callback, so that an
exception thrown by the callback will not cause the timer queue facility
to die. This kind of flexibility with respect to prioritized damage control
is one of the benefits of managing errors as exceptions (maybe no longer a
topic of discussion in the Java community).
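The kind of fix described in the PPS can be sketched as a dispatch loop that wraps each expiration callback in its own try/catch. The `ExpireCallback` type and `dispatchExpired` function are hypothetical; they only show the technique, not the code that went into EPICS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <exception>
#include <functional>
#include <vector>

// Hypothetical callback type standing in for a timer's expiration handler.
using ExpireCallback = std::function<void()>;

// Run each expired callback in turn. An exception escaping one callback is
// reported and swallowed, so it cannot unwind out of (and kill) the thread
// that services every other timer on the queue.
// Returns how many callbacks threw.
std::size_t dispatchExpired(const std::vector<ExpireCallback>& expired) {
    std::size_t failures = 0;
    for (const ExpireCallback& callback : expired) {
        try {
            callback();
        } catch (const std::exception& e) {
            ++failures;
            std::fprintf(stderr, "timer callback threw: %s\n", e.what());
        } catch (...) {
            ++failures;
            std::fprintf(stderr, "timer callback threw a non-std exception\n");
        }
    }
    return failures;
}
```

The design choice is the prioritized damage control Jeff mentions: a faulty callback is reported, but the queue's thread keeps servicing the other timers. Without such a handler, as in the version Dirk is running, an exception escaping a callback ends up in the thread's last chance handler and the task is suspended.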

PPPS: Maybe I am restating the obvious, but the throwing site is obviously
C++ code, which certainly points to the timer queue or the CA client library,
and maybe a few others. Nevertheless, if the C++ facility isn't being shut
down, suspect corruption, since C++ is pretty good about keeping track of
when the event id needs to be destroyed (only in the dtor).

Jeff
______________________________________________________
Jeffrey O. Hill           Email        johill@lanl.gov
LANL MS H820              Voice        505 665 1831
Los Alamos NM 87545 USA   FAX          505 665 5107

Message content: TSPA


> -----Original Message-----
> From: Jeff Hill [mailto:johill@lanl.gov]
> Sent: Thursday, May 06, 2010 9:48 AM
> To: 'Dirk Zimoch'; 'EPICS'
> Cc: 'ebner'
> Subject: RE: epicsEvent::invalidSemaphore exception in timerQueue
> 
> Dirk,
> 
> One typically attaches the Tornado debugger and requests, for a particular
> thread, that it suspend execution at the point where the exception gets
> thrown instead of where it gets caught (currently in the epicsThread's
> last chance exception handler). This will provide a more useful stack
> trace.
> 
> Are you aware of a precipitating set of events that leads to the demise
> of this timer queue? Is any facility being shut down at about the same
> time (just before this occurrence)? New device support? Does it happen
> shortly after the EPICS system starts?
> 
> Typically, erroneous use of an invalid semaphore id results from either
> A) memory corruption, or
> B) shutdown order issues, where an object is used after it was destroyed.
> 
> It _would_ be useful to determine which component of the IOC the timer
> queue belongs to. We should be able to make that determination if you
> send the output from the vxWorks "i" command (hopefully also identifying
> the task id of the culprit). The associated component can probably be
> inferred from the execution priority of the timer queue relative to that
> of the spawning component.
> 
> I am the author of the timer queue (a new, heavily used, feature in
> R3.14). The timer queue thread typically spends most of its time waiting,
> with timeout, on an event semaphore. The event semaphore gets posted only
> when the timer queue needs to reschedule. That particular event semaphore
> would only become invalid if the timer queue was being shut down, the
> timer queue data structure was corrupted, or if the vxWorks kernel data
> structures were corrupted.
> 
> The timer queue class has a show diagnostic member function which is
> typically called by the diagnostic member function of its owner. So if
> you can find out who the timer queue belongs to then you can invoke its
> show function, at increased interest level, to find out if there is
> generalized corruption in its data structures. The tornado debugger can
> also help with quick surveys for corruption.
> 
> Also, see Mantis entries 336, 332, and 320, which may be unrelated but
> nevertheless involve fixes to the timer queue facility made after your
> version.
> 
> Jeff
> ______________________________________________________
> Jeffrey O. Hill           Email        johill@lanl.gov
> LANL MS H820              Voice        505 665 1831
> Los Alamos NM 87545 USA   FAX          505 665 5107
> 
> Message content: TSPA
> 
> 
> > -----Original Message-----
> > From: tech-talk-bounces@aps.anl.gov [mailto:tech-talk-bounces@aps.anl.gov] On Behalf Of Dirk Zimoch
> > Sent: Thursday, May 06, 2010 7:51 AM
> > To: EPICS
> > Cc: ebner
> > Subject: epicsEvent::invalidSemaphore exception in timerQueue
> >
> > Hi all,
> >
> > We are having a strange problem where a TimerQueue task gets suspended
> > because of an invalidSemaphore exception, and we can't find out where.
> >
> > We are using EPICS 3.14.8 on vxWorks.
> >
> > Here is the error message:
> >
> > > 0x1fb87960 (timerQueue): Unhandled C++ exception resulted in call to terminate
> > > epicsThread: Unexpected C++ exception "epicsEvent::invalidSemaphore()" with type "Q210epicsEvent16invalidSemaphore" in thread "timerQueue" at THU MAY 06 2010 10:19:21.310783060
> >
> > The dead task:
> > > timerQueue a94c08       1fb87960 148 SUSPEND      23d410 1fb87470 3d0002     0
> >
> > And the stack trace:
> > >> tt 0x1fb87960
> > > 243c24 vxTaskEntry    +68 : a94c08 (&epicsThreadCallEntryPoint, 1f9d24fc)
> > > a94c84 epicsThreadOnceOsd+174: a7b230 ()
> > > a7b230 epicsThreadCallEntryPoint+5c8: __cp_exception_info ()
> > > 15e2fc __cp_exception_info+0  : __default_unexpected(void) ()
> > > 15e2b8 set_terminate(void (*)(void))+0  : terminate(void) ()
> > > 15e2a8 __default_unexpected(void)+0  : cplusTerminate(void) ()
> > > 15d068 cplusTerminate(void)+50 : taskSuspend ()
> >
> > I found out that epicsEvent::invalidSemaphore is thrown by
> > epicsEvent::wait() and its variants when semTake fails for reasons other
> > than a timeout (which can only mean an invalid SEM_ID).
> > Unfortunately the stack trace is not very helpful to find out where the
> > actual error happened (thanks to the C++ exception mechanism).
> >
> > The error happens when (or shortly after) the attached SNL program
> > finishes the entry block of state "active". We are using seq version 2.0.10.
> >
> > Any idea how to find out where the problem really is? Who might own the
> > timerQueue? What corrupted epicsEvent might the timerQueue be waiting for?
> > How might the epicsEvent semaphore have been corrupted?
> >
> > Dirk
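Dirk's reading of epicsEvent::wait() can be modeled with a small wrapper: a timeout is an ordinary result, and any other failure of the underlying wait becomes an exception. `WaitStatus`, `InvalidSemaphore`, and `waitWithTimeout` are hypothetical stand-ins, and the raw wait is passed in as a callable so the sketch does not depend on vxWorks; the EPICS source may differ in detail.

```cpp
#include <cassert>

// Hypothetical result codes for the underlying OS-level wait.
enum class WaitStatus { Ok, Timeout, Error };

// Hypothetical stand-in for epicsEvent::invalidSemaphore.
class InvalidSemaphore {};

// Returns true if the event was signaled and false on timeout; any other
// outcome is treated as an invalid semaphore and thrown as an exception.
template <typename RawWait>
bool waitWithTimeout(RawWait rawWait, double seconds) {
    assert(seconds >= 0.0);
    switch (rawWait(seconds)) {
    case WaitStatus::Ok:
        return true;
    case WaitStatus::Timeout:
        return false;
    default:
        throw InvalidSemaphore();
    }
}
```

Since nothing between such a wait and the thread's entry point catches the exception, only the last chance handler shows up in the stack trace Dirk posted, which is why Jeff suggests stopping in the debugger where the exception is thrown.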

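Jeff's description of the timer queue thread (it waits, with a timeout, on an event semaphore that is signaled only when the queue must reschedule) can be sketched with standard C++ primitives. `MiniTimerQueue` is hypothetical and simplified, with `std::condition_variable` standing in for the epicsEvent; it is not the EPICS implementation.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical, simplified timer queue. One service thread calls
// serviceOnce() in a loop; other threads call start().
class MiniTimerQueue {
public:
    using Clock = std::chrono::steady_clock;

    // Queue a callback. The service thread is woken only when the new timer
    // expires before every timer already queued, i.e. when it must reschedule.
    void start(Clock::time_point when, std::function<void()> callback) {
        std::lock_guard<std::mutex> guard(mutex_);
        const bool earliest = timers_.empty() || when < timers_.begin()->first;
        timers_.emplace(when, std::move(callback));
        if (earliest) {
            ++reschedules_;
            wake_.notify_one();
        }
    }

    // One pass of the service thread: wait (with a timeout equal to the
    // earliest expiration) for a reschedule signal, then run whatever is due.
    std::size_t serviceOnce() {
        std::vector<std::function<void()>> due;
        {
            std::unique_lock<std::mutex> guard(mutex_);
            if (timers_.empty()) {
                wake_.wait(guard, [this] { return !timers_.empty(); });
            } else {
                const Clock::time_point deadline = timers_.begin()->first;
                wake_.wait_until(guard, deadline);
            }
            const Clock::time_point now = Clock::now();
            while (!timers_.empty() && timers_.begin()->first <= now) {
                due.push_back(std::move(timers_.begin()->second));
                timers_.erase(timers_.begin());
            }
        }
        // Callbacks run without the lock held, so they may call start().
        for (auto& callback : due) {
            callback();
        }
        return due.size();
    }

    std::size_t reschedules() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return reschedules_;
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable wake_;
    std::multimap<Clock::time_point, std::function<void()>> timers_;
    std::size_t reschedules_ = 0;
};
```

Here the timeout is the earliest expiration time and the signal is reserved for a new timer that jumps ahead of it; adding a later timer never wakes the service thread.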

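A minimal sketch of the show pattern Jeff describes, where an owner's diagnostic function calls the timer queue's show function with an interest level. The class names, the output format, and the rule of forwarding `level - 1` are assumptions for illustration; the real EPICS show functions print different information.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical timer queue with a show() diagnostic. Level 0 prints a
// one-line summary; higher levels add per-timer detail.
class DiagTimerQueue {
public:
    void add(double delaySeconds) { delays_.push_back(delaySeconds); }

    void show(std::ostream& out, unsigned level) const {
        out << "timer queue: " << delays_.size() << " timer(s)\n";
        if (level >= 1) {
            for (std::size_t i = 0; i < delays_.size(); ++i) {
                out << "  timer " << i << " expires in " << delays_[i] << " s\n";
            }
        }
    }

private:
    std::vector<double> delays_;
};

// Hypothetical owner whose own diagnostic forwards to the queue it owns.
class QueueOwner {
public:
    explicit QueueOwner(std::string name) : name_(std::move(name)) {}
    DiagTimerQueue& queue() { return queue_; }

    void show(std::ostream& out, unsigned level) const {
        out << "owner: " << name_ << "\n";
        if (level > 0) {
            queue_.show(out, level - 1);  // assumed convention: one level less detail
        }
    }

private:
    std::string name_;
    DiagTimerQueue queue_;
};
```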
