Experimental Physics and
Industrial Control System

[email protected] (Ralph Lange) · Tue, 10 Jun 1997 15:42:58 +0200 (METDST)

Dear Gurus,

	since Friday we have been observing unpleasant behaviour on one of our
IOCs:
	Occasionally (approx. every 20 to 30 minutes under normal load) the
scanOnce task gets suspended due to an Access Fault. A few seconds later
the dbCaLink task follows, after writing "rngBufPut overflow in scanOnce"
to the log. From then on there is no more CA-based record processing,
i.e. no more CA links and no operator access.

	The console output looks like this:

Access Fault
Program Counter: 0x006ec252
Status Register: 0x3004
Access Address : 0xb4007745
Special Status : 0x0505
Task: 0x4550e0 "scanOnce"
task: 0X526b70 taskwd
task 4550e0 scanOnce suspended
task: 0X506370 EV dbCaLink
S_errno_ENOENT rngBufPut overflow in scanOnce
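	One plausible reading of this sequence: scanOnce() hands record
pointers to the scanOnce task through a fixed-size ring buffer, and that
task drains it. Once the task is suspended nothing drains the buffer, so
after a few seconds of CA activity every further put fails and callers
such as dbCaLink log the overflow. The sketch below is a simplified,
hypothetical model of that producer/consumer queue (the RequestQueue type,
function names and 4-slot size are illustrative assumptions), not the
EPICS or VxWorks rngLib code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SLOTS 4  /* illustrative size only, far smaller than a real queue */

typedef struct {
    void  *slot[QUEUE_SLOTS];
    size_t head;   /* next slot the producer writes */
    size_t tail;   /* next slot the consumer reads */
    size_t count;  /* entries currently queued */
} RequestQueue;

/* Producer side (callers of scanOnce() in this model). Returns false when
 * the queue is full: the condition reported as "rngBufPut overflow". */
bool queue_put(RequestQueue *q, void *precord)
{
    assert(q != NULL);
    if (q->count == QUEUE_SLOTS)
        return false;
    q->slot[q->head] = precord;
    q->head = (q->head + 1) % QUEUE_SLOTS;
    q->count++;
    return true;
}

/* Consumer side (the scanOnce task in this model). While that task is
 * suspended nobody calls this, so the queue can only fill up. */
void *queue_get(RequestQueue *q)
{
    assert(q != NULL);
    if (q->count == 0)
        return NULL;
    void *p = q->slot[q->tail];
    q->tail = (q->tail + 1) % QUEUE_SLOTS;
    q->count--;
    return p;
}

/* Queue n requests with no consumer running; return how many were rejected. */
int overflows_without_consumer(int n)
{
    RequestQueue q = {0};
    static int dummy_record;
    int rejected = 0;
    for (int i = 0; i < n; i++)
        if (!queue_put(&q, &dummy_record))
            rejected++;
    return rejected;
}
```

	With no consumer, the first QUEUE_SLOTS puts succeed and every later
one is rejected, which matches the log showing the overflow only after
the scanOnce task has been suspended.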

	We tried to find out what is happening:

-> lkAddr 0x006ec252
0x006ec246 _dbScanLock               text    
0x006ec308 _dbScanUnlock             text    
0x006ec33c _dbLockGetLockId          text    

-> l dbScanLock
                        _dbScanLock:
6ec246  4e56 0000                LINK  .W    A6,#0
6ec24a  2f0b                     MOVE  .L    A3,-(A7)
6ec24c  2f0a                     MOVE  .L    A2,-(A7)
6ec24e  266e 0008                MOVEA .L    (0x8,A6),A3
6ec252  246b 00dc                MOVEA .L    (0xdc,A3),A2
6ec256  4a8a                     TST   .L    A2
6ec258  661a                     BNE         0x006ec274

	and after comparing to dbLock.c:

void dbScanLock(dbCommon *precord)
{
    lockRecord	*plockRecord;
    lockSet	*plockSet;
    STATUS	status;

    if(!(plockRecord= (lockRecord *)precord->lset)) {
	epicsPrintf("dbScanLock plockRecord is NULL record %s\n",
	    precord->name);
	exit(1);
    }
    ...
}

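	we assume that the crash happens on the first executable line of
dbScanLock, the dereference of precord->lset: the faulting instruction at
0x6ec252 reads (0xdc,A3), where A3 holds the precord argument. In other
words, dbScanLock is called with a non-NULL junk precord (0xb4007669 in the
above example, as 0xdc is the offset of lset).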
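	The address arithmetic can be checked directly: the faulting
instruction uses a displacement of 0xdc from A3, so A3 (the precord
argument) must have held the reported access address minus 0xdc. A
minimal sketch; the helper name faulting_base_pointer is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of the lset field inside the record, as read from the
 * disassembly above: MOVEA.L (0xdc,A3),A2. */
#define LSET_OFFSET 0xdcu

/* Given the access address reported by the Access Fault and the
 * displacement used by the faulting instruction, recover the base
 * pointer that was in A3, i.e. the junk precord passed to dbScanLock(). */
uint32_t faulting_base_pointer(uint32_t access_addr, uint32_t displacement)
{
    assert(access_addr >= displacement); /* base address cannot be negative */
    return access_addr - displacement;
}
```

	For the fault above, faulting_base_pointer(0xb4007745u, LSET_OFFSET)
gives 0xb4007669u, the junk pointer quoted in the text.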
	Unfortunately the scanOnce task is not restarted by the taskwd (as
opposed to the periodic scan tasks), so rebooting the IOC is the only way
out.
	So who the hell is calling scanOnce() with a junk record pointer (we
think this is the path through which dbScanLock gets called)?
	We have patched scanOnce to check the precord argument for validity
and, if it is bad, to start a tt() to see which task is the bad guy here.
A further advantage of the patch is that the bad record will not be
processed, which should extend the IOC's uptime.
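	A sketch of the kind of validity check such a patch could perform,
assuming a pointer is rejected when it is NULL, odd (not word-aligned), or
does not lie entirely within a known RAM window. The AddrRange bounds,
RECORD_ALIGN and the function names are hypothetical; the actual patch is
not shown in this post. Assuming local RAM spans 0 to 8 MB on this board,
the junk pointer 0xb4007669 fails both the alignment and the range test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: records are assumed to start on at least a 2-byte boundary. */
#define RECORD_ALIGN 2u

/* Hypothetical window of memory in which records may live,
 * e.g. the local RAM of the board; the bounds are assumptions. */
typedef struct {
    uintptr_t lo;  /* first valid address (inclusive) */
    uintptr_t hi;  /* end of window (exclusive) */
} AddrRange;

/* Reject pointers that are NULL, misaligned, or whose record would not fit
 * entirely inside the window. This only catches obviously bad pointers; it
 * cannot prove that precord really points at a record. */
bool precord_plausible(const void *precord, size_t record_size, AddrRange r)
{
    assert(r.lo < r.hi);
    uintptr_t p = (uintptr_t)precord;
    if (precord == NULL)
        return false;
    if (p % RECORD_ALIGN != 0)
        return false;
    if (p < r.lo || p >= r.hi || r.hi - p < record_size)
        return false;
    return true;
}

/* Hypothetical guarded entry point in the spirit of the patch: refuse bad
 * pointers instead of queueing them. In the IOC patch described above this
 * is where tt() would be started to show the calling task; here we only log. */
int guarded_scan_once(const void *precord, size_t record_size, AddrRange r)
{
    if (!precord_plausible(precord, record_size, r)) {
        fprintf(stderr, "scanOnce: rejecting bad precord %p\n", (void *)precord);
        return -1;
    }
    /* ... queue precord for processing as the unpatched scanOnce() would ... */
    return 0;
}
```

	A range-and-alignment test is cheap enough to run on every call; a
stricter check would compare precord against the set of records actually
loaded on the IOC, at a higher cost per call.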

Some more data:
- IOC: MVME 162-042 (25MHz 8MB), VxWorks 5.2, EPICS core R3.13.0.beta4
- checkStack output looks absolutely normal.
- casr says (at the end):
  There are currently 52420 bytes on the server's free list
  1 client(s), 134 channel(s), and 133 event(s) (monitors)
  The server's resource id conversion table:
  Bucket entries in use = 474 bytes in use = 23988
  Free list bytes in use = 2144
  Bucket entries/hash id - mean = 0.115723 std dev = 0.319892 max = 1
- dbcar states:
  ncalinks 1876 not connected 0 no_read_access 0 no_write_access 0
  nDisconnect 0 nNoWrite 0
- dbnr lists:
  0088 ai
  0066 ao
  0795 bi
  0234 bo
  0394 calc
  0002 fanout
  0114 longout
  0080 mbbiDirect
  0038 mbboDirect
  0038 stringout
  0002 sub
  Total Records: 1851
- Memory is not critical:
   status   bytes    blocks   avg block  max block
   ------ --------- -------- ---------- ----------
  current
     free   2006276       13     154328   1957672

Anyone any hint anywhere?

Clueless,
Ralph
-- 
      __  Ralph Lange                         Email:       [email protected] 
     /\ \                                     WWW: http://www.bessy.de/~lange
    /  \ \  BESSY II                          
   / /\ \ \  Berliner Elektronenspeicherring- Snail:                 BESSY II
  / / /\ \ \  Gesellschaft fuer Synchrotron-               Rudower Chaussee 5
 / / /__\_\ \  strahlung m.b.H.                       D-12489 Berlin, Germany
/ / /________\                                Phone:         +49 30 6392-4862
\/___________/ Control System Group           Fax:                ...   -4859
