EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System

Subject: RE: Mutex error on recent Redhat systems
From: "Jeff Hill" <[email protected]>
To: "'Andrew Johnson'" <[email protected]>, "'Sergey Stepanov'" <[email protected]>
Cc: "'EPICS core-talk'" <[email protected]>
Date: Fri, 18 May 2007 11:53:59 -0600
Probably the fastest way to identify the cause would be to build the code
for debugging, set a break point in code that detects that
pthread_mutex_lock has failed, and then print out the stacks of all of the
threads once the breakpoint is hit.

If it's a shutdown issue someone is going to need to look at the big picture
- at what all of the threads are doing when this happens.

It might also help to print out the status value that pthread_mutex_lock
returns on failure (it reports the error code as its return value rather
than through errno).
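
As a rough illustration of that kind of instrumentation, here is a minimal sketch (not the EPICS code). The helper names lock_and_report() and demo_relock_status() are hypothetical. It relies on the fact that the pthread functions return the error number directly, so the return value is what gets decoded with strerror(). To force a failure for demonstration it relocks an error-checking mutex from the same thread, which POSIX specifies fails with EDEADLK.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: lock a mutex and, on failure, print the
 * numeric status and its text. The pthread functions return the
 * error number directly, so the return value is what we decode. */
static int lock_and_report(pthread_mutex_t *m, const char *where)
{
    int status = pthread_mutex_lock(m);
    if (status != 0) {
        fprintf(stderr, "%s: pthread_mutex_lock failed, status %d (%s)\n",
                where, status, strerror(status));
    }
    return status;
}

/* Demonstration only: provoke a failure by locking an
 * error-checking mutex twice from the same thread. */
static int demo_relock_status(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int status;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&m);                 /* first lock succeeds   */
    status = lock_and_report(&m, "demo");   /* second lock must fail */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return status;
}
```

A debugger breakpoint placed on the fprintf() line in lock_and_report() would stop at the moment of failure, at which point the stacks of all threads can be dumped.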

Jeff

-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of Andrew Johnson
Sent: Friday, May 18, 2007 10:45 AM
To: Sergey Stepanov
Cc: EPICS core-talk
Subject: Re: Mutex error on recent Redhat systems

Hi Sergey,

Sergey Stepanov wrote:
> Hi Andrew,
> we wonder if you have seen this: we are often observing
> the following error when using caget and caput (the EPICS
> base programs) on Redhat EL5 and Fedora-6:

I have not seen this yet as I haven't had either of those OSs available 
before now -- I've been asking IT to upgrade my PC to Fedora-6 for some 
time, and it finally looks like they're able to do it so I will 
investigate this problem as soon as I can.

Has anyone else on core-talk seen this problem?

> gmca@bl1ws3:~ 43> caget 23i:GO:Om:Start.DESC
> 23i:GO:Om:Start.DESC           Gonio Omega
> epicsThreadOnceOsd epicsMutexLock failed.
> 
> 
> gmca@bl1ws3:~ 47> caput 23i:GO:Om:Start.DESC "Gonio Omega"
> Old : 23i:GO:Om:Start.DESC           Gonio Omega
> New : 23i:GO:Om:Start.DESC           Gonio Omega
> epicsThreadOnceOsd epicsMutexLock failed.
> 
> This is happening at least with EPICS BASE-3.14.9 and BASE-3.14.8.2.
> At the same time the error does not show up on Redhat EL3 and
> Redhat EL4 (with the same versions of EPICS BASE). The rate of
> the error is about 5-10%.
> 
> Does it ring a bell for you? We did not find any note on this
> problem in the EPICS Mantis Bug Reports list. Also we do not
> have an estimate of how harmful it is. In case of the error the caput
> program still does what it is asked to do (i.e. sets the new value),
> but we are not sure how the other EPICS software would behave.
> 
> We also tried caget and caput from EPICS extensions and got the
> same problem.

The error message is from src/libCom/osi/os/posix/osdThread.c at the top 
of the routine epicsThreadOnceOsd(), and occurs if mutexLock(&onceLock) 
returns a non-zero status.  The mutexLock() routine is one of ours that 
calls pthread_mutex_lock() and handles retrying in the event that it 
returns EINTR (which violates the SUSv3).
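
A retry wrapper of the kind described above might look roughly like the following. This is a sketch under assumptions, not a copy of osdThread.c: the names lock_retrying_eintr() and demo_lock_unlock() are made up for illustration. A conforming pthread_mutex_lock() never returns EINTR, so on such systems the loop body runs exactly once.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical wrapper: retry pthread_mutex_lock() for as long as
 * it reports EINTR, and hand any other status back to the caller. */
static int lock_retrying_eintr(pthread_mutex_t *m)
{
    int status;

    do {
        status = pthread_mutex_lock(m);
    } while (status == EINTR);

    return status;   /* 0 on success, otherwise an error number */
}

/* Demonstration only: lock and unlock a statically initialized
 * mutex through the wrapper; returns 0 if both steps succeed. */
static int demo_lock_unlock(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    int status = lock_retrying_eintr(&m);

    if (status == 0) {
        status = pthread_mutex_unlock(&m);
    }
    pthread_mutex_destroy(&m);
    return status;
}
```

The caller still has to check the returned status: anything non-zero that gets past the loop is a genuine failure, which is the case the error message reports.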

The result of a bad status (other than EINTR) from pthread_mutex_lock() 
is for epicsThreadOnceOsd() to call exit(1).  However, since Sergey's 
caput and caget operations seem to be completing normally, it looks like 
this may be some kind of shutdown issue.  The onceLock mutex does get 
destroyed after myAtExit() has run epicsExitCallAtExits() and called 
pthread_cancel() on all threads other than itself and main, and this 
might explain the issue -- if a thread other than main invokes exit() 
and is used to perform the atExit() cleanups itself, the main thread 
might still try to run something that checks the onceLock before the 
process breathes its very last breath.

I'll add some more instrumentation and try to replicate this on Fedora-6 
but any assistance is welcome...

- Andrew
-- 
The right to be heard does not automatically include
the right to be taken seriously. -- Hubert H. Humphrey
_______________________________________________
Core-talk mailing list
[email protected]
http://www.aps.anl.gov/mailman/listinfo/core-talk

Replies:
Re: Mutex error on recent Redhat systems Kay-Uwe Kasemir
References:
Re: Mutex error on recent Redhat systems Andrew Johnson
