Argonne National Laboratory

Experimental Physics and
Industrial Control System

Subject: dbCa::addAction pausing, 10000 channels to clear
From: "Pearson, Matthew R." <pearsonmr@ornl.gov>
To: "tech-talk@aps.anl.gov list" <tech-talk@aps.anl.gov>
Date: Mon, 24 Apr 2017 21:36:49 +0000
Hi,

I’m seeing one of my IOCs seg fault with this message when I do an ‘exit’:

dbCa::addAction pausing, 10000 channels to clear
Segmentation fault (core dumped)

Examining the core file I see:

(gdb) bt
#0  0x00002b8aa7916318 in addAction (pca=0xc6c8390, link_action=1) at ../dbCa.c:150
#1  0x00002b8aa7916785 in dbCaRemoveLink (plink=0xa312f80) at ../dbCa.c:246
#2  0x00002b8aa70b7d76 in doCloseLinks (pdbRecordType=0x25fc590, precord=0xa312bb0, user=0x0) at ../iocInit.c:600
#3  0x00002b8aa70b7749 in iterateRecords (func=0x2b8aa70b7ccc <doCloseLinks>, user=0x0) at ../iocInit.c:396
#4  0x00002b8aa70b7e51 in exitDatabase (dummy=0x0) at ../iocInit.c:623
#5  0x00002b8aa8222c11 in epicsExitCallAtExitsPvt (pep=0x256d2d0) at ../../../src/libCom/misc/epicsExit.c:80
#6  0x00002b8aa8222ce6 in epicsExitCallAtExits () at ../../../src/libCom/misc/epicsExit.c:97
#7  0x00002b8aa8222f01 in epicsExit (status=0) at ../../../src/libCom/misc/epicsExit.c:160
#8  0x00000000004063dd in main (argc=2, argv=0x7fff390894a8) at ../bl11a-SensTech1Main.cpp:21

(gdb) info frame
Stack level 0, frame at 0x7fff39089240:
 rip = 0x2b8aa7916318 in addAction (../dbCa.c:150); saved rip 0x2b8aa7916785
 called by frame at 0x7fff39089270
 source language c.
 Arglist at 0x7fff39089230, args: pca=0xc6c8390, link_action=1
 Locals at 0x7fff39089230, Previous frame's sp is 0x7fff39089240
 Saved registers:
  rbp at 0x7fff39089230, rip at 0x7fff39089238

(gdb) info args
pca = 0xc6c8390
link_action = 1

(gdb) info locals
callAdd = 1

(gdb) list
150                printLinks(pca);
151            }
152            while (removesOutstanding >= removesOutstandingWarning) {
153                epicsMutexUnlock(workListLock);
154                epicsThreadSleep(1.0);
155                epicsMutexMustLock(workListLock);
156            }
157        }
158        pca->link_action |= link_action;
159        if (callAdd)

(gdb) print pca
$10 = (caLink *) 0xc6c8390

(gdb) print pca->plink
$11 = (struct link *) 0x0

In the dbCaRemoveLink function I see where pca->plink is being cleared and where addAction is being called:

void dbCaRemoveLink(struct link *plink)
    235 {
    236     caLink *pca = (caLink *)plink->value.pv_link.pvt;
    237 
    238     if (!pca) return;
    239     epicsMutexMustLock(pca->lock);
    240     pca->plink = 0;
    241     plink->value.pv_link.pvt = 0;
    242     if (pca->putCallback)
    243         pca->plinkPutCallback = plink;
    244     /* Unlock before addAction or dbCaTask might free first */
    245     epicsMutexUnlock(pca->lock);
    246     addAction(pca, CA_CLEAR_CHANNEL);
    247 }


In addAction, the printLinks function dereferences pca->plink, which dbCaRemoveLink has already set to 0 (line 240 above) before calling addAction, so it accesses a null pointer.

If I comment out the printLinks call in addAction, it doesn’t seg fault (it just takes a few seconds to shut down).
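
To make the ordering problem concrete, here is a minimal stand-alone sketch in plain C. It is not the EPICS source and not a proposed patch: the demo_* structs and functions are hypothetical stand-ins that keep only the fields involved, and the "warn" flag stands in for the removesOutstanding threshold check. It shows the same sequence (clear plink, then queue the action, then print a diagnostic) with the print routine guarded against a detached link.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real structures; only the fields used here. */
struct demo_link {
    const char *record_name;   /* name of the record that owns the link */
    void *pvt;                 /* back-pointer to the CA link, cleared on removal */
};

struct demo_ca_link {
    struct demo_link *plink;   /* cleared by the remove step before queuing */
    const char *pvname;        /* target PV name */
    int link_action;           /* bit mask of pending actions */
};

#define DEMO_CLEAR_CHANNEL 0x1

/* Guarded diagnostic: returns 1 if the owning record could be printed,
 * 0 if the link had already been detached (plink == NULL). */
int demo_print_link(const struct demo_ca_link *pca)
{
    if (pca->plink == NULL) {
        printf("(detached) DB CA link to %s\n", pca->pvname);
        return 0;
    }
    printf("%s has DB CA link to %s\n", pca->plink->record_name, pca->pvname);
    return 1;
}

/* Queue an action; when 'warn' is set (standing in for the threshold being
 * exceeded) the diagnostic is printed first. Returns the print result,
 * or -1 if nothing was printed. */
int demo_add_action(struct demo_ca_link *pca, int action, int warn)
{
    int printed = -1;
    if (warn)
        printed = demo_print_link(pca);
    pca->link_action |= action;
    return printed;
}

/* Same order of operations as the listing above: detach first, then queue. */
int demo_remove_link(struct demo_link *plink, int warn)
{
    struct demo_ca_link *pca = plink->pvt;
    if (!pca)
        return -1;
    pca->plink = NULL;
    plink->pvt = NULL;
    return demo_add_action(pca, DEMO_CLEAR_CHANNEL, warn);
}
```

With the NULL check in place, calling demo_remove_link() with warn set prints the "(detached)" line instead of crashing. Without the check, demo_print_link() would read through a null plink, which matches what the core file shows.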

Alternatively, if I increase the removesOutstandingWarning limit, it’s also fine. I don’t think that parameter is configurable via the IOC shell though.

This IOC does have quite a lot of records and makes heavy use of CA/CP links:

dbnr
Records  Aliases  Record Type
 19640        1    ai
 18510        0    ao
  1133        0    bi
   320        0    bo
  1227        0    calc
  6687        0    calcout
  7338        0    dfanout
  4352        0    longin
 34824        0    longout
    11        0    mbbo
  3288        0    stringin
    38        0    stringout
     9        1    sub
     3        0    waveform
    10        0    asyn
  3546        0    sseq
  1088        0    scalcout

[mkp@bl11a-dassrv1 db]$ cat * | grep "record" | wc
 113033  400407 5652852
[mkp@bl11a-dassrv1 db]$ cat * | grep " CA" | wc
  26236   78712 1315560
[mkp@bl11a-dassrv1 db]$ cat * | grep " CP" | wc
   6793   23894  350894

But it’s not doing that much at any one time. I use a series of sseq records (with the CA links) and streamDevice to talk to 10 separate Asyn ports which talk to RS485 chains with about 100 power supplies on each chain (about 1000 power supplies in total). 

On the IOC exit I also tend to see several messages like:

sseq:putCallbackCB: Bad link at index 0

which I suspect is ok given that we’re shutting down in the middle of some put_callback operations.

I could split this IOC into separate processes if necessary. 
Our base version is 3.14.12.4.

Cheers,
Matt


Data Acquisition and Control Engineer
Spallation Neutron Source
Oak Ridge National Lab





