EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Database hanging
From: "Rees, NP \(Nick\)" <[email protected]>
To: "EPICS tech-talk" <[email protected]>
Date: Fri, 3 Nov 2006 17:30:39 -0000
We are occasionally having Channel Access problems on some R3.14.8.2
vxWorks IOCs. It seems that one of the database semaphores isn't being
released for some reason, and this is screwing everything up. No task is
suspended, and there are no inverted priorities indicating a deadlock
(but that is no guarantee). Has anyone else seen this?

The details follow.

The simplest symptom is caget fails as follows:

[npr78@i06-ws002 ~]$ caget BL06I-AL-SLITS-01:YA
Read operation timed out: some PV data was not read.
BL06I-AL-SLITS-01:YA           0
CA.Client.Exception...............................................
    Warning: "Virtual circuit disconnect"
    Context: "op=0, channel=BL06I-AL-SLITS-01:YA, type=DBR_TIME_DOUBLE, count=1, ctx="BL06I-MO-IOC-01.diamond.ac.uk:5064""
    Source File: ../getCopy.cpp line 82
    Current Time: Fri Nov 03 2006 16:30:00.426411000
..................................................................

If I run dbgf on the IOC to read the same PV, it hangs, and the stack
trace printed after Ctrl-C is as follows:

BL06I-MO-IOC-01 -> dbgf "BL06I-AL-SLITS-01:YA"
231f7c vxTaskEntry    +68 : shell ()
1f81e0 shell          +190: 1f820c ()
1f840c shell          +3bc: execute ()
1f8590 execute        +d8 : yyparse ()
2122d0 yyparse        +71c: 210668 ()
2107ec yystart        +96c: dbgf ()
1e752e18 dbgf           +15c: dbGetField ()
1e73f3fc dbGetField     +68 : dbScanLock ()
1e73c70c dbScanLock     +1b4: epicsMutexLock ()
1e8089c0 epicsMutexLock +24 : semTake ()
2283ac semTake        +13c: semMTake ()
tShell restarted.

So the shell is hanging because it can't get a lock in dbScanLock.
Address 0x1e73c70c (dbScanLock +0x1b4) is somewhere in the middle of
dbScanLock; the next routine in memory is dbScanUnlock, and the
addresses of the two routines are:

BL06I-MO-IOC-01 -> lkup "dbScanLock"
dbScanLock                0x1e73c558 text     (BL06I-MO-IOC-01.munch)
BL06I-MO-IOC-01 -> lkup "dbScanUnlock"
dbScanUnlock              0x1e73c8b8 text     (BL06I-MO-IOC-01.munch)
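As a quick cross-check of that arithmetic, the offset of the return address from the start of dbScanLock can be computed directly from the lkup values. This is a minimal sketch; the helper names are hypothetical, and only the three addresses come from the output above.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of an address from the start of the routine containing it.
 * Hypothetical helper; caller must ensure addr >= routine_start. */
static uint32_t offset_in_routine(uint32_t addr, uint32_t routine_start)
{
    assert(addr >= routine_start);
    return addr - routine_start;
}

/* Nonzero if addr lies in [start, next_start), i.e. inside the routine
 * that begins at start and is followed in memory by the routine at
 * next_start (here dbScanLock followed by dbScanUnlock). */
static int addr_in_routine(uint32_t addr, uint32_t start, uint32_t next_start)
{
    return addr >= start && addr < next_start;
}
```

With the values above, offset_in_routine(0x1e73c70c, 0x1e73c558) gives 0x1b4, matching the "dbScanLock +1b4" frame in the trace, and the gap to dbScanUnlock means dbScanLock is at most 0x360 bytes long.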

In the middle of dbScanLock there are various statements of the form:
   epicsMutexMustLock(plockSet->lock);
   epicsMutexMustLock(lockSetModifyLock);

So the problem is presumably with one of these semaphores.
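To illustrate the suspected failure mode, here is a sketch that uses plain POSIX threads as a stand-in for the vxWorks mutex semaphores underneath epicsMutexLock. It is not EPICS or vxWorks code. One thread takes a lock and never releases it. A later caller using a plain blocking lock would hang just as dbgf did, whereas a timed lock gives up and reports the problem. The helper names are hypothetical, and pthread_mutex_timedlock (POSIX.1-2001) is assumed to be available.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper: try to take m, but give up after timeout_ms
 * instead of blocking forever. Returns 0 if the lock was taken, or
 * ETIMEDOUT if it was still held by someone else at the deadline. */
static int lock_with_timeout(pthread_mutex_t *m, long timeout_ms)
{
    struct timespec deadline;
    assert(timeout_ms >= 0);
    /* pthread_mutex_timedlock takes an absolute CLOCK_REALTIME time. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {   /* normalise nanoseconds */
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    return pthread_mutex_timedlock(m, &deadline);
}

/* Simulates the suspected bug: a thread takes the lock and returns
 * without ever unlocking it, so the mutex stays locked. */
static void *leaky_holder(void *arg)
{
    pthread_mutex_lock((pthread_mutex_t *)arg);
    return NULL;   /* deliberately missing pthread_mutex_unlock() */
}
```

After running leaky_holder in its own thread and joining it, lock_with_timeout on the same mutex returns ETIMEDOUT instead of hanging. Waiting with a timeout turns a silent hang into something that can be logged, but it does not identify which task failed to unlock.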

Has anyone seen something similar? Does anyone have suggestions for
diagnostics I should run next time?

Cheers

Nick Rees
Principal Software Engineer           Phone: +44 (0)1235-778430
Diamond Light Source                  Fax:   +44 (0)1235-446713


Replies:
Re: Database hanging Andrew Johnson
