EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: FW: error log facility
From: "Jeff Hill" <[email protected]>
To: <[email protected]>
Date: Wed, 16 Nov 2005 14:33:01 -0700

-----Original Message-----
From: Jeff Hill [mailto:[email protected]] 
Sent: Wednesday, April 13, 2005 4:46 PM
To: 'Marty Kraimer'
Cc: 'Liyu, Andrei'
Subject: RE: error log facility



Hi Marty,

I did fix this issue (by figuring out a way to increment the in use count
for the shareable library and thereby preventing the labView VI from yanking
the code out while threads using it are still running).

However, I notice now that the labView system can only go through about 200
edit/run cycles before it runs out of thread private variable
allocations (on windows). Further investigation revealed that this is
because epicsExit() was not being called by the test (and CAC now calls
epicsAtExit instead of atexit). 

Of course the solution is simply for the VI destructor to call
epicsExit() but that's a bit weird because it might end up getting called
multiple times during a single program execution. Also, multiple
applications must remember that this is required. 
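
As a rough illustration of why a missed call leaks resources, and of one way repeated calls could be made harmless, here is a minimal sketch of a private exit-handler list. The names sketchAtExit, sketchCallAtExits and countingHandler are hypothetical and are not the libCom API; the sketch only assumes that handlers are kept on a LIFO list that is run on explicit request rather than by the C runtime.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a private exit-handler facility: handlers
 * live on a LIFO list and run only when the application asks for it,
 * not when the C runtime runs its own atexit() handlers. */
typedef void (*exitFunc)(void *arg);

typedef struct exitNode {
    exitFunc func;
    void *arg;
    struct exitNode *next;
} exitNode;

static exitNode *exitList = NULL;

/* Register a handler; returns 0 on success, -1 if out of memory. */
int sketchAtExit(exitFunc func, void *arg)
{
    exitNode *node;

    assert(func != NULL);
    node = malloc(sizeof(*node));
    if (!node) return -1;
    node->func = func;
    node->arg = arg;
    node->next = exitList;      /* push on the front: last in, first out */
    exitList = node;
    return 0;
}

/* Run and discard every registered handler, most recent first.
 * A second call finds an empty list and does nothing. */
void sketchCallAtExits(void)
{
    while (exitList) {
        exitNode *node = exitList;
        exitList = node->next;
        node->func(node->arg);
        free(node);
    }
}

/* Example handler: counts how often it was invoked. */
void countingHandler(void *arg)
{
    ++*(int *)arg;
}
```

If nothing ever calls sketchCallAtExits(), the handlers (and whatever they were meant to release) simply stay allocated, which matches the symptom described above. Because the list is drained as it runs, calling it once per edit/run cycle does not run the same handler twice.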

When at SLAC perhaps we should discuss again the rationale behind
epicsAtExit. I notice that CA is the only code that uses epicsAtExit. Was
this done only to solve libCom shutdown problems on windows? If so, perhaps
we can just go back to using atexit which might work fine now to release the
thread private variable when ca.dll unloads (com.dll is locked in memory
until all threads using it exit, but that is not the case for ca.dll). On
windows, the atexit handlers run when the dll unloads. 

Was there also a Linux/Solaris reason for epicsAtExit?

Jeff

> -----Original Message-----
> From: Marty Kraimer [mailto:[email protected]]
> Sent: Tuesday, March 29, 2005 6:55 AM
> To: Jeff Hill
> Subject: Re: error log facility
> 
> Did you do anything about errlog?
> 
> Sorry if I was not cooperative last week but was trying hard to get
> ready for my three day vacation.
> I could not believe my Email this morning. When will I get back
> to V4???
> 
> Marty
> 
> Jeff Hill wrote:
> 
> >>I think we should do the minimum necessary to 3.14 but
> >>I have no idea what that minimum is.
> >>
> >>
> >
> >I agree. I fear that I must look again for a simple solution
> >involving starting the log thread on demand when the first message is
> >sent so that an errlogFlush() of an un-initialized log system will
> >not start the log thread. Unfortunately, given my funding situation I
> >am the last person that should be launching into projects fixing
> >someone else's code, but I don't see any other way to get Liyu
> >Andrei's (SNS diagnostics) problem, which has been festering now
> >since April, fixed.
> >
> >I have been debugging some more. Here is the sequence of events,
> >and why it seems to be a good idea to prevent the error log thread
> >from starting up in errlogFlush().
> >
> >o err log facility installs its atexit handler when its thread runs
> >o in ca's shutdown I call errlogFlush(), and this starts the errlog
> >thread (which probably does not immediately get a chance to run
> >because it runs at a lower priority on windows)
> >o the com.dll is unloaded by labview when it switches from execute to
> >edit mode
> >o windows appears to take a lock preventing the errlog thread that is
> >being initialized from getting to its entry point
> >o exit handlers run in correct order, but the errlog thread's
> >exit handler has not been installed yet and does not run
> >o win32 specific epics thread facility gives up waiting for the
> >last epicsThread to exit
> >o windows releases its lock and the error log thread runs
> >o purify seems to indicate that the errlog thread crashes because
> >it is calling epicsThread functions after the epicsThread system
> >shuts down, but the information here is poor because debugging
> >information tends to be imprecise after main exits.
> >
> >So I suspect that fixing errlogFlush() so that it will not start
> >the error log thread if it is not already running will greatly reduce
> >shutdown problems, but I don't know this for certain.
> >
> >Alternatively, the atexit() handler for the err log system could
> >be installed when the err log system is initialized, and the errlog
> >thread could see that it needs to exit immediately when its entry
> >point is run, but since that might be based on tests against data
> >structures that have already been unloaded, I am not holding my
> >breath that that approach will work.
> >
> >Actually, the best solution will probably be to install the atexit 
> >handler in the error log facility init, and to also not start the 
> >error log thread until the first message is sent.
> >
> >Jeff
> >
> >
> >
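
The combination proposed in the message above (exit handler installed at init, log thread started only on the first message, flush never starting the thread) can be sketched as follows. This is only an illustration under stated assumptions: every name is hypothetical, and thread creation is replaced by a stub that sets a flag so the control flow can be followed without a threading library. It is not the errlog implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* All names are hypothetical; this only models the control flow. */
typedef struct {
    bool initialized;     /* stage 1 done: locks made, exit handler set */
    bool exitHandlerSet;
    bool threadStarted;   /* stage 2 done: log thread running */
    int  threadStarts;    /* how many times the thread was started */
    int  queued;          /* messages waiting for the log thread */
} logState;

static logState state;

/* Stub standing in for real thread creation. */
static void startLogThread(void)
{
    assert(state.initialized);   /* stage 2 requires stage 1 */
    state.threadStarted = true;
    ++state.threadStarts;
}

/* Stage 1: cheap setup only. The exit handler is installed here, so it
 * exists even if the thread never gets a chance to run. */
void sketchLogInit(void)
{
    if (state.initialized) return;
    state.exitHandlerSet = true; /* a real version would register one */
    state.initialized = true;
}

/* Stage 2 happens lazily, when the first message is queued. */
void sketchLogMessage(const char *msg)
{
    (void)msg;                   /* message body is irrelevant here */
    sketchLogInit();
    if (!state.threadStarted) startLogThread();
    ++state.queued;
}

/* Flush never starts the thread: if no message was ever sent there is
 * nothing to flush. Returns the number of messages flushed. */
int sketchLogFlush(void)
{
    int flushed;

    if (!state.initialized || !state.threadStarted) return 0;
    flushed = state.queued;      /* a real version would wait here */
    state.queued = 0;
    return flushed;
}
```

With this shape, a flush issued during shutdown of a client that never logged anything returns immediately and leaves no half-started thread behind.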
> >>-----Original Message-----
> >>From: Marty Kraimer [mailto:[email protected]]
> >>Sent: Wednesday, March 23, 2005 7:01 AM
> >>To: Jeff Hill
> >>Subject: Re: error log facility
> >>
> >>
> >>
> >>Jeff Hill wrote:
> >>
> >>
> >>
> >>>Hello Marty,
> >>>
> >>>:-(
> >>>
> >>>There appear to be a number of issues in the error log code. There
> >>>are unsynchronized accesses to variables shared by multiple
> >>>threads, and other significant problems. For example, msgbufGetFree
> >>>(see below) seems to have a return path that leaks the lock.
> >>>
> >>>
> >>>
> >>Not true. I will not claim it is nice but I probably could not see a
> >>better way. Note that msgbufGetFree does not unlock if it succeeds.
> >>If it does succeed the caller must call msgbufSetSize after putting
> >>its message into the message queue. msgbufSetSize unlocks.
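
The hand-off described in this reply (the "get" call returns with the lock held on success, and the "set size" call releases it) can be sketched as below. The names are hypothetical, and a simple counter stands in for the mutex so the pairing can be checked without a threading library; the real code uses epicsMutex and a message queue.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; lockDepth stands in for a real mutex. */
static int    lockDepth = 0;
static char   buffer[64];
static int    freeNodes = 1;     /* one free message slot */
static size_t storedLength = 0;

static void lockQueue(void)   { ++lockDepth; }
static void unlockQueue(void) { --lockDepth; }

/* On success: returns the buffer and leaves the lock HELD.
 * On failure: releases the lock and returns NULL. */
char *sketchGetFree(void)
{
    lockQueue();
    if (freeNodes > 0) {
        --freeNodes;
        return buffer;           /* caller now owns the lock */
    }
    unlockQueue();
    return NULL;
}

/* Completes the transaction begun by sketchGetFree() and unlocks. */
void sketchSetSize(size_t size)
{
    assert(lockDepth > 0);       /* caller must hold the lock */
    storedLength = size;
    unlockQueue();
}

/* Caller pattern: every successful get is paired with a set-size. */
int sketchPost(const char *text)
{
    char *p = sketchGetFree();

    if (!p) return -1;           /* lock was already released */
    strncpy(p, text, sizeof(buffer) - 1);
    p[sizeof(buffer) - 1] = '\0';
    sketchSetSize(strlen(p) + 1);
    return 0;
}
```

The lock is balanced on both paths only as long as every caller follows the pairing, which is why the function can look like it leaks the lock when read on its own.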
> >>
> >>
> >>
> >>>See attached code. Furthermore, pvtData.atExit and
> >>>others seem to be unsynchronized yet there are multiple threads
> >>>reading and writing. Of course, even individual atomic flags must
> >>>be protected with locks if we are running on SMP systems.
> >>>
> >>>
> >>>
> >>>
> >>I will say again that if this is true then every part of iocCore 
> >>must be looked at carefully.
> >>
> >>
> >>
> >>>I am afraid to meddle in this code w/o taking a chance on being
> >>>responsible for many things breaking, or feeling compelled to
> >>>make many changes which probably wouldn't be appropriate
> >>>considering current ownership and the need for reducing entropy
> >>>with R3.14. So I don't know how to proceed.
> >>>
> >>>
> >>>
> >>>
> >>I think we should do the minimum necessary to 3.14 but I have no
> >>idea what that minimum is.
> >>
> >>
> >>
> >>>One solution to the problem of errlogFlush() starting up an
> >>>inactive error log facility would be to have a two stage init with
> >>>the 1st stage creating the locks, and the 2nd stage starting the
> >>>thread. The 2nd stage would be initiated only the first time that a
> >>>new message queue entry is created thereby avoiding creation of the
> >>>auxiliary thread during shutdown which is a cause of problems.
> >>>This is however a big change which shouldn't be embarked on
> >>>lightly.
> >>>
> >>>Open to suggestions.
> >>>
> >>>PS: As my memory serves I *was* the original author of a message
> >>>routing facility (message routing to the log server), but you
> >>>took over responsibility many moons ago, and extensively rewrote
> >>>the code. I can remember being somewhat put back at the time by
> >>>your aggressively assuming control w/o asking first, but that was
> >>>soon enough ok then - and now. Nevertheless, my perception is
> >>>that it would be quite ironic to assign any degree of ownership
> >>>of the current situation to me.
> >>>
> >>>
> >>>
> >>I don't remember the exact history but I remember that major issues
> >>arose shortly before we had to have a new release for APS and there
> >>was not much time to reflect. I had to get a release ASAP.
> >>I also think I remember that you started the message routing without
> >>telling anyone first. I did the same and then we discovered that we
> >>were solving the same problem. It was long enough ago that I may be
> >>remembering incorrectly.
> >>
> >>All the atExit code was added because of problems with terminating
> >>multi-threaded code that uses libCom, i.e. either IOCs or CA 3.14
> >>clients. With Eric's help I did the best I could. errlog is a big
> >>part of this problem. I don't have any magic answers.
> >>
> >>Marty
> >>
> >>
> >>
> >>>Jeff
> >>>
> >>>LOCAL char *msgbufGetFree(int noConsoleMessage)
> >>>{
> >>>    msgNode *pnextSend;
> >>>
> >>>    epicsMutexMustLock(pvtData.msgQueueLock);
> >>>    /* Queue drained: first report any messages dropped earlier. */
> >>>    if((ellCount(&pvtData.msgQueue) == 0) && pvtData.missedMessages) {
> >>>        int nchar;
> >>>
> >>>        pnextSend = msgbufGetNode();
> >>>        nchar = sprintf(pnextSend->message,
> >>>            "errlog = %d messages were discarded\n",
> >>>            pvtData.missedMessages);
> >>>        pnextSend->length = nchar + 1;
> >>>        pvtData.missedMessages = 0;
> >>>        ellAdd(&pvtData.msgQueue,&pnextSend->node);
> >>>    }
> >>>    pvtData.pnextSend = pnextSend = msgbufGetNode();
> >>>    if(pnextSend) {
> >>>        pnextSend->noConsoleMessage = noConsoleMessage;
> >>>        pnextSend->length = 0;
> >>>        /* Success path: returns with msgQueueLock still held. */
> >>>        return(pnextSend->message);
> >>>    }
> >>>    /* No free node: count the lost message and release the lock. */
> >>>    ++pvtData.missedMessages;
> >>>    epicsMutexUnlock(pvtData.msgQueueLock);
> >>>    return(0);
> >>>}
> >>>
> >>>__________________________________________________________
> >>>Jeffrey O. Hill               Mail         [email protected]
> >>>LANL MS H820                  Voice        505 665 1831
> >>>Los Alamos NM 87545 USA       Fax          505 665 5107
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >
> >
> >
> >



