EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: dbEvent.c line 665
From: "Jeff Hill" <[email protected]>
To: <[email protected]>
Date: Wed, 16 Nov 2005 11:16:44 -0700
Ok, I see what is occurring. This is the sequence of 
events. First a stack trace and then an explanation.

>	dbIoc.dll!db_close_events(void * ctx=0x00a4e138)  Line 337	C
 	dbIoc.dll!dbContext::~dbContext()  Line 85	C++
 	dbIoc.dll!dbContext::`scalar deleting destructor'()  + 0x16	C++
 	ca.dll!epics_auto_ptr<cacContext,0>::destroyTarget()  Line 52 + 0x22	C++
 	ca.dll!epics_auto_ptr<cacContext,0>::reset(cacContext * pIn=0x00000000)  Line 112	C++
 	ca.dll!ca_client_context::~ca_client_context()  Line 190	C++
 	ca.dll!ca_client_context::`scalar deleting destructor'()  + 0x16	C++
 	ca.dll!ca_context_destroy()  Line 251 + 0x20	C++
 	dbIoc.dll!dbCaTask()  Line 946 + 0x8	C
 	Com.dll!epicsWin32ThreadEntry(void * lpParameter=0x00a2fc78)  Line 497 + 0xf	C
 	msvcr71d.dll!_threadstartex(void * ptd=0x00a32488)  Line 241 + 0xd	C
 	KERNEL32.DLL!lstrcmpiW()  + 0xb7	

User types exit.
The EPICS exit handlers are called.
The DB CA EPICS exit handler calls ca_context_destroy.
The DB service is destroyed.
The "dbEvent" event queuing system is destroyed.

However, in the above sequence, the DBCA channels are *not* destroyed
and the scan tasks are not shut down prior to calling
ca_context_destroy.

Therefore, scan tasks are still running that are posting events for
subscriptions into an event system that has already been destroyed.

A crash is of course guaranteed. It is not possible for the event
system to cancel the subscriptions when it is destroyed because
it does not keep a list of them.

Given that the records don't have destructors in R3.14, the easiest
solution will be to create shutdown procedures for the database scan
tasks, to be run before the DBCA exit handler runs.
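
[Editor's sketch] The ordering problem described above can be modelled in a
few lines of C. This is a single-threaded model under stated assumptions, not
EPICS code: event_system, scan_task, scan_tick and late_posts are hypothetical
names, and a real scan task would be a separate thread, so its shutdown
procedure would also have to wait for that thread to finish. The model only
shows why the scan task has to be stopped before the event system goes away.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the dbEvent queuing system. */
typedef struct {
    bool alive;         /* false once the system has been destroyed */
    int  posted;        /* events accepted while alive */
    int  posted_after;  /* events that arrived after destruction */
} event_system;

/* Hypothetical stand-in for one periodic scan task. */
typedef struct {
    bool running;
    event_system *ev;
} scan_task;

/* One pass of the scan loop: post an event unless shut down. */
static void scan_tick(scan_task *t)
{
    if (!t->running)
        return;
    if (t->ev->alive)
        t->ev->posted++;
    else
        t->ev->posted_after++;  /* in a real IOC: use of a destroyed queue */
}

/* The proposed shutdown procedure for a scan task. */
static void scan_shutdown(scan_task *t)
{
    t->running = false;
}

static void event_system_destroy(event_system *ev)
{
    ev->alive = false;
}

/* Returns how many events were posted into the destroyed event system. */
int late_posts(bool stop_scan_first)
{
    event_system ev = { true, 0, 0 };
    scan_task scan = { true, &ev };

    scan_tick(&scan);             /* normal operation */
    if (stop_scan_first)
        scan_shutdown(&scan);     /* producer stops first */
    event_system_destroy(&ev);    /* then the event system goes away */
    scan_tick(&scan);             /* the scan period elapses once more */
    return ev.posted_after;
}
```

In this model, late_posts(true) reports no late events, while
late_posts(false) reports one, which corresponds to the crash seen here.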

This issue will also apply to the sequencer (or any other CA client
running inside the IOC): it must destroy its channels prior to calling
ca_context_destroy.

PS I have also attached a stack trace for the Windows crash, but of
course we expect to crash in many different ways depending on
timing now that we know the cause.

PPS presumably shutdown procedures for the CA server threads would 
also be needed eventually (mantis 125).

PPPS presumably we will run into problems shutting down also on 
vxWorks as long as threads are still running when someone types ^X.
This is mantis 204.

PPPPS we may need to take a second look at the justification for
running the EPICS exit handlers when the IOC shell exits. Certainly
we know that proper orderly shutdown is a good idea, but it
would appear that everything below CAC is clearly not written
to deal with this possibility at the moment. Once you pull
on this thread, then we will unravel the entire sweater, and
every thread will need a shutdown procedure. Nevertheless,
we will clearly be at the mercy of the OS-specific process
shutdown procedures if we don't do it properly ourselves.
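
[Editor's sketch] One common way to give every thread a shutdown procedure is
a registry that runs the procedures in reverse order of registration, the
convention C's atexit() follows. The sketch below is a hypothetical registry,
not the libCom API: register_shutdown, run_shutdown and the start-up order
used in the demo are all assumptions for illustration, and this message does
not say how the EPICS exit handlers are ordered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HANDLERS 8

typedef void (*shutdown_fn)(void *arg);

typedef struct {
    shutdown_fn fn;
    void *arg;
} handler;

static handler handlers[MAX_HANDLERS];
static size_t n_handlers;

/* Hypothetical registry call. Returns 0 on success, -1 if the table is full. */
int register_shutdown(shutdown_fn fn, void *arg)
{
    if (n_handlers == MAX_HANDLERS)
        return -1;
    handlers[n_handlers].fn = fn;
    handlers[n_handlers].arg = arg;
    n_handlers++;
    return 0;
}

/* Run the handlers last-registered-first, emptying the table. */
void run_shutdown(void)
{
    while (n_handlers > 0) {
        n_handlers--;
        handlers[n_handlers].fn(handlers[n_handlers].arg);
    }
}

/* Demo handler: append one letter to a log so the order is visible. */
static char log_buf[MAX_HANDLERS + 1];

static void note(void *arg)
{
    size_t len = strlen(log_buf);
    if (len < MAX_HANDLERS) {
        log_buf[len] = *(char *)arg;
        log_buf[len + 1] = '\0';
    }
}

/* Registers three subsystems in an assumed start-up order and shuts down. */
const char *shutdown_order_demo(void)
{
    static char ev = 'E', ca = 'C', scan = 'S';

    log_buf[0] = '\0';
    register_shutdown(note, &ev);    /* event system registered first */
    register_shutdown(note, &ca);    /* then DBCA */
    register_shutdown(note, &scan);  /* scan tasks registered last */
    run_shutdown();
    return log_buf;
}
```

With that assumed registration order, the scan tasks stop first, then DBCA,
and the event system is destroyed last, once nothing can post to it.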

Jeff

>	dbIoc.dll!db_post_events(void * pRecord=0x009a5cf0, void * pField=0x009a5dc0, unsigned int caEventMask=1)  Line 796 + 0x6	C
 	dbIoc.dll!recGblResetAlarms(void * precord=0x009a5cf0)  Line 235	C
 	recIoc.dll!monitor(aiRecord * pai=0x009a5cf0)  Line 369 + 0xc	C
 	recIoc.dll!process()  Line 177 + 0x9	C
 	dbIoc.dll!dbProcess(dbCommon * precord=0x009a5cf0)  Line 630 + 0xc	C
 	dbIoc.dll!dbScanPassive(dbCommon * pfrom=0x00a2a160, dbCommon * pto=0x009a5cf0)  Line 477 + 0x9	C
 	dbIoc.dll!dbScanFwdLink(link * plink=0x00a2a274)  Line 507	C
 	dbIoc.dll!recGblFwdLink(void * precord=0x00a2a160)  Line 257	C
 	recIoc.dll!process()  Line 131 + 0xc	C
 	dbIoc.dll!dbProcess(dbCommon * precord=0x00a2a160)  Line 630 + 0xc	C
 	dbIoc.dll!scanList(scan_list * psl=0x00a314f0)  Line 590	C
 	dbIoc.dll!periodicTask(void * arg=0x00a314f0)  Line 489 + 0x12	C
 	Com.dll!epicsWin32ThreadEntry(void * lpParameter=0x00a31ce0)  Line 497 + 0xf	C
 	msvcr71d.dll!_threadstartex(void * ptd=0x00a32358)  Line 241 + 0xd	C
 	KERNEL32.DLL!lstrcmpiW()  + 0xb7	

> -----Original Message-----
> From: Jeff Hill [mailto:[email protected]] 
> Sent: Wednesday, November 16, 2005 9:46 AM
> To: 'Jeff Hill'; 'Marty Kraimer'
> Cc: [email protected]
> Subject: RE: dbEvent.c line 665
> 
> 
> 
> Still unable to reproduce on our Linux
> 
> o when setting USE_POSIX_THREAD_PRIORITY_SCHEDULING = YES
> o when typing exit and a CA client is attached
> 
> Do I need to configure in an internal DB CA link within this IOC? 
> I am using the example application. Do I need to run an IOC in  
> mrkSoftTest to reproduce this?
> 
> PS: I see these two messages now whenever a client attaches 
> to a Linux IOC.
> 
> epicsThreadSetPriority called by non epics thread 
> epicsThreadSetPriority called by non epics thread
> 
> Jeff
> 
> > -----Original Message-----
> > From: Jeff Hill [mailto:[email protected]]
> > Sent: Wednesday, November 16, 2005 9:12 AM
> > To: 'Jeff Hill'; 'Marty Kraimer'
> > Cc: [email protected]
> > Subject: RE: dbEvent.c line 665
> > 
> > 
> > 
> > So far, unable to reproduce with an example app created by
> > makeBaseApp and a debug build. When I type exit it exits w/o crashing.
> > 
> > Do I need to turn on priority-based scheduling of libCom/osi
> > threads on posix?
> > 
> > ~/epicsR3.14/epics/appl/iocBoot/iocex$ cat /proc/version
> > Linux version 2.4.21-20.ELsmp
> > ([email protected]) (gcc version 3.2.3
> > 20030502 (Red Hat Linux 3.2.3-42)) #1 SMP Wed Aug 18 20:46:40 EDT 2004
> > 
> > > -----Original Message-----
> > > From: Jeff Hill [mailto:[email protected]]
> > > Sent: Wednesday, November 16, 2005 8:26 AM
> > > To: 'Marty Kraimer'
> > > Cc: [email protected]
> > > Subject: RE: dbEvent.c line 665
> > > 
> > > 
> > > 
> > > > What to do?
> > > 
> > > I will pursue it in the debugger.
> > > 
> > > Jeff
> > > 
> > > > -----Original Message-----
> > > > From: Marty Kraimer [mailto:[email protected]]
> > > > Sent: Wednesday, November 16, 2005 8:08 AM
> > > > To: Jeff Hill
> > > > Cc: [email protected]
> > > > Subject: Re: dbEvent.c line 665
> > > > 
> > > > 
> > > > Jeff,
> > > > 
> > > > I think we have a big problem.
> > > > I think this is happening because dbCaTask has called
> > > > ca_context_destroy.
> > > > 
> > > > At the end of dbCaTask I have
> > > > 
> > > >     printf("before ca_context_destroy\n");
> > > >     ca_context_destroy();
> > > >     printf("after ca_context_destroy\n");
> > > >     epicsEventSignal(exitEvent);
> > > > }
> > > > 
> > > > and I get
> > > > 
> > > > epics> exit
> > > > before ca_context_destroy
> > > > after ca_context_destroy
> > > > 
> > > > 
> > > > 
> > > > A call to "assert
> > > > ((epicsMutexLock(((ev_que)->writelock))==epicsMutexLockOK))"
> > > > failed in
> > > > ../dbEvent.c line 665.
> > > > EPICS Release EPICS R3.14.7 $$Name: R3-14-2_branch $$ $$Date: 
> > > > 2004/12/06 
> > > > 22:31:52 $$.
> > > > Current time Wed Nov 16 2005 09:00:02.474391200.
> > > > Please E-mail this message to the author or to 
> > > > [email protected] Calling epicsThreadSuspendSelf()
> > > > 
> > > > If I take out the call to ca_context_destroy(); then exit terminates
> > > > without error.
> > > > 
> > > > What to do?
> > > > 
> > > > Marty
> > > > 
> > > > Marty Kraimer wrote:
> > > > 
> > > > > Andrew Johnson wrote:
> > > > >
> > > > >> I just did a CVS update and rebuild on linux-x86, created an example
> > > > >> app and ran it, which ran fine.  However on pressing Ctrl-D to exit,
> > > > >> I got this:
> > > > >>
> > > > >> epics>
> > > > >>
> > > > >>
> > > > >> A call to "assert
> > > > >> ((epicsMutexLock(((ev_que)->writelock))==epicsMutexLockOK))" failed
> > > > >> in ../dbEvent.c line 665.
> > > > >> EPICS Release EPICS R3.14.7 $$Name: R3-14-2_branch $$ $$Date:
> > > > >> 2004/12/06 22:31:52 $$. Current time Tue Nov 15 2005
> > > > >> 16:20:33.533218000. Please E-mail this message to the author or
> > > > >> to [email protected]
> > > > >> Calling epicsThreadSuspendSelf()
> > > > >> filename="../../../src/libCom/taskwd/taskwd.c" line number=174 task
> > > > >> 0x80985b8 suspendedapsajnt%
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > I am looking into it.
> > > > >
> > > > > Marty
> > > > 
> > > > 
> > > 
> > 
> 


Replies:
epicsAtExit Marty Kraimer
References:
RE: dbEvent.c line 665 Jeff Hill
