Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: dbCa::addAction pausing, 10000 channels to clear
From: Andrew Johnson <anj@aps.anl.gov>
To: <tech-talk@aps.anl.gov>
Date: Mon, 24 Apr 2017 18:00:46 -0500
Hi Matt,

On 04/24/2017 04:36 PM, Pearson, Matthew R. wrote:
> I’m seeing one of my IOCs seg fault with this message when I do an ‘exit’:
> 
> dbCa::addAction pausing, 10000 channels to clear
> Segmentation fault (core dumped)

<snip> - thanks for all the detail.

> In addAction the printLinks function tries to access a null pointer (pca->plink).
> 
> If I comment out the printLinks function in addAction, it doesn’t seg
> fault (just takes a few seconds to shutdown).

It seems to me that the bug is the use of printLinks() there. That
function calls errlogPrintf(), which queues the link's PV name pointer
onto the errlog queue. If the errlog thread doesn't get scheduled soon,
then by the time it does run the pointer may no longer refer to valid
memory, and the attempt to print from it gives you your core dump.
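
To illustrate the hazard in general terms, here is a minimal standalone
sketch in C. It is not the errlog or dbCa code: PtrEntry, CopyEntry,
queueByPointer(), queueByCopy(), tearDown() and the PV name "example:pv"
are all hypothetical names made up for the illustration. One queue entry
keeps only a pointer to the name, the other keeps its own copy, and the
name is then changed before the deferred reader gets to look at it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_MAX_LEN 64

/* Hypothetical deferred-log entries; this is not the EPICS errlog code. */
typedef struct { const char *name; } PtrEntry;          /* borrows the caller's string */
typedef struct { char name[NAME_MAX_LEN]; } CopyEntry;  /* owns a private copy */

/* Queue only the pointer: cheap, but valid only while the source lives. */
static void queueByPointer(PtrEntry *e, const char *name)
{
    assert(e && name);
    e->name = name;
}

/* Queue a copy: the entry stays usable after the source changes or goes away. */
static void queueByCopy(CopyEntry *e, const char *name)
{
    assert(e && name);
    snprintf(e->name, sizeof e->name, "%s", name);
}

/* Simulate the link being torn down before the logging thread runs.
 * The buffer is overwritten rather than freed so that this sketch stays
 * well-defined C; reading genuinely freed memory would be undefined
 * behaviour and could crash. */
static void tearDown(char *pvName, size_t size)
{
    assert(pvName && size > 0);
    snprintf(pvName, size, "%s", "<gone>");
}
```

If a caller queues the same name both ways and then calls tearDown() on
the buffer, the PtrEntry reads back "<gone>" while the CopyEntry still
holds the original name. In a real IOC the storage would be freed rather
than overwritten, which is where a crash can come from.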

I don't see the need for the printLinks() output at that point, so I
think just removing it from addAction() is probably the best fix, which
you confirmed above works for you.

> Alternatively, if I increase the removesOutstandingWarning limit,
> it’s also fine. I don’t think that parameter is configurable via the
> IOC shell though.

No, that's currently a constant. We could make it configurable, but
since the "pausing" message explains to the user why the IOC's shutdown
is taking a while, I think it's worth keeping as is. Changing the number
would just make the crash happen less often; it wouldn't fix the bug.
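
A rough way to see why a bigger limit only postpones the problem is the
toy model below. It is a sketch under stated assumptions, not the dbCa.c
source: firstWarningAt() is a hypothetical helper, and it assumes the
backlog of channels to clear only grows while requests are being queued
(nothing drains it in the meantime).

```c
#include <assert.h>

/* Hypothetical helper, not the dbCa.c source. Returns the backlog size
 * at which the "pausing" warning would first fire, or -1 if the backlog
 * never reaches the threshold. Assumes nothing drains the backlog while
 * the clear requests are being queued. */
static int firstWarningAt(int queuedClears, int warningThreshold)
{
    int outstanding = 0;
    int i;

    assert(queuedClears >= 0 && warningThreshold > 0);
    for (i = 0; i < queuedClears; i++) {
        if (++outstanding >= warningThreshold)
            return outstanding;   /* the warning path would be taken here */
    }
    return -1;                    /* threshold never reached */
}
```

In this model 12000 queued clears hit a limit of 10000 but not a limit
of 20000, while 25000 queued clears hit both. A larger limit hides the
warning path for this IOC today, but an IOC with more links would still
reach it.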

> This IOC does have quite a lot of records and makes heavy use of CA/CP links:

... which is part of the reason why this IOC sees the problem but others
don't.

> On the IOC exit I also tend to see several messages like:
> 
> sseq:putCallbackCB: Bad link at index 0
> 
> which I suspect is ok given that we’re shutting down in the middle
> of some put_callback operations.

Agreed.

> I could split this IOC into separate processes if necessary. 
> Our base version is 3.14.12.4.

I don't think that should be necessary. I'll commit the above change to
the 3.14 branch of Base and add a patch to the Known Problems page for
Base-3.14.12 and 3.15.5.

- Andrew

-- 
Arguing for surveillance because you have nothing to hide is no
different than making the claim, "I don't care about freedom of
speech because I have nothing to say." -- Edward Snowden

ANJ, 24 Apr 2017