EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: CA subscription synchronisation shutdown problem
From: Benjamin Franksen <[email protected]>
To: <[email protected]>
Date: Mon, 13 May 2013 11:36:32 +0200

Hi Michael

I think this could be a bug in EPICS Base that was fixed in 3.14.12.2. See
http://www.aps.anl.gov/epics/base/R3-14/12-docs/RELEASE_NOTES.html , section
"Changes between 3.14.12.1 and 3.14.12.2", which links to the bug tracker entry
https://bugs.launchpad.net/epics-base/+bug/878372

Cheers
Ben

On Monday, May 13, 2013 06:21:07 [email protected] wrote:
> I'd like to resend the message below.  I would be grateful if someone could
> please try to reproduce the bug using the attached test.
> 
> I'd also like to point out that this bug is not as trivial and contrived as
> it may appear -- any client application which closes camonitor subscriptions
> is liable to the synchronisation error described here and may thus suffer a
> segmentation fault or other misbehaviour as a result.
>
> > This e-mail is really a follow up to this thread from a year ago:
> > http://www.aps.anl.gov/epics/tech-talk/2012/msg00584.php .  (Alas, I
> > can't check this link because the APS web site seems to be poorly this
> > morning.)
> > 
> > Back then I was seeing signs that CA subscription callbacks were being
> > called after returning from ca_clear_subscription ... in this e-mail I
> > have what looks like a definitive demonstration!
> > 
> > In the attached test IOC I repeatedly create 500 subscriptions to 500
> > locally published PVs, pause a few hundred microseconds, and then
> > proceed to tear them all down again.  The context pointer I pass
> > (args.usr) just contains a validity flag which I reset after
> > ca_clear_subscription returns -- and which I test in the callback.
> > 
> > Below is a typical run:
> > 
> > $ ./test 10 500
> > dbLoadDatabase("dbd/TEST.dbd", NULL, NULL)
> > TEST_registerRecordDeviceDriver(pdbbase)
> > dbLoadRecords("db/TEST.db", NULL)
> > iocInit()
> > Starting iocInit
> > ############################################################################
> > ## EPICS R3.14.11 $R3-14-11$ $2009/08/28 18:47:36$
> > ## EPICS Base built Nov  4 2011
> > ############################################################################
> > iocRun: All initialization complete
> > All channels connected
> > Testing 10 cycles, interval 500 us
> > [......................................................................
> > .......................................................................
> > .......................................................................
> > .......................................................................
> > .......................................................................
> > .......................................................................
> > ...............................................................whoops!
> > ][
> > 
> > 
> > The two arguments to `test` are number of times to try and how long to
> > pause between create and clear (in microseconds, passed to usleep(3)).
> > [ and ] are printed at the start and end of a cycle (so [ is
> > immediately followed by a burst of ca_create_subscription() calls) and
> > each . represents a successful callback.  An unsuccessful (invalid)
> > callback is shown by 'whoops!' which is followed by an exit() call.
> > 
> > This test can be very delicate and difficult to reproduce, and may need
> > to be run many times with slightly different pause intervals before
> > being even partially repeatable -- the fault only appears to show when
> > there isn't time for all 500 PVs to complete their initial updates, but
> > there has to be enough time for them all to make the effort.
> > 
> > Another interesting detail follows from some locking I'm doing.  Here
> > is an extract of the relevant code (LOCK() is just
> > pthread_mutex_lock(3p) on a global mutex):
> > 
> > 1	static void on_update(struct event_handler_args args)
> > 2	{
> > 3	    struct event *event = args.usr;
> > 4	    LOCK();
> > 5	    bool valid = event->valid;
> > 6	    UNLOCK();
> > 7	    if (valid) ...
> > 8	}
> > 
> > 	...
> > 
> > 9	    LOCK();		// This should trigger deadlock
> > 10	    ca_clear_subscription(event->event_id);
> > 11	    event->valid = false;
> > 12	    UNLOCK();
> > 
> > It seems to me that if ca_clear_subscription() is correctly doing what
> > we discussed a year ago, which is to say, if it is waiting for all
> > outstanding callbacks to complete before returning, then the LOCK() on
> > line 9 should trigger a deadlock when ca_clear_subscription() is called
> > with its associated callback still only on line 3 (or earlier).  But I
> > never see my test deadlock.
> > 
> > I'm seeing this problem occur in test code which is repeatedly creating
> > and destroying subscriptions, but I've previously reported it on CA
> > client shutdown, so it does look to me like there is a general
> > synchronisation problem here.  I believe I have a workaround, which is
> > to delay releasing the callback context to give time for outstanding
> > callbacks to complete, but this is a bit worrisome...
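
The workaround mentioned at the end of the quoted message (delaying release of the callback context) could look roughly like the sketch below. It is not the code from the attached test. The names event_create, event_is_live, event_retire and event_reap are hypothetical, the Channel Access calls are left out so that the fragment compiles on its own, and the timestamps are plain numbers supplied by the caller. The idea is that a context is marked invalid once ca_clear_subscription() has returned, but its memory is only freed after a grace period, so a late callback still reads valid memory and sees valid == false. This only narrows the window, it does not close it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical callback context, modelled on `struct event` in the extract. */
struct event {
    bool valid;          /* cleared once the subscription has been torn down */
    long retired_at;     /* caller-supplied time at which it was retired */
    struct event *next;  /* link in the list of retired contexts */
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static struct event *retired = NULL;    /* contexts waiting to be freed */

/* Allocate a context in the valid state; returns NULL on allocation failure. */
static struct event *event_create(void)
{
    struct event *event = malloc(sizeof *event);
    if (event) {
        event->valid = true;
        event->retired_at = 0;
        event->next = NULL;
    }
    return event;
}

/* What the subscription callback would do first: test the flag under the
 * lock.  Safe on a retired context as long as it has not been reaped. */
static bool event_is_live(struct event *event)
{
    pthread_mutex_lock(&lock);
    bool valid = event->valid;
    pthread_mutex_unlock(&lock);
    return valid;
}

/* Call after ca_clear_subscription() has returned (CA call omitted here).
 * Marks the context invalid and queues it instead of freeing it. */
static void event_retire(struct event *event, long now)
{
    pthread_mutex_lock(&lock);
    assert(event->valid);               /* guard against retiring twice */
    event->valid = false;
    event->retired_at = now;
    event->next = retired;
    retired = event;
    pthread_mutex_unlock(&lock);
}

/* Free every retired context at least `grace` time units old.
 * Returns the number of contexts freed. */
static int event_reap(long now, long grace)
{
    int freed = 0;
    pthread_mutex_lock(&lock);
    struct event **link = &retired;
    while (*link) {
        struct event *event = *link;
        if (now - event->retired_at >= grace) {
            *link = event->next;        /* unlink, then release */
            free(event);
            freed++;
        } else {
            link = &event->next;
        }
    }
    pthread_mutex_unlock(&lock);
    return freed;
}
```

A client would call event_retire() straight after ca_clear_subscription() and call event_reap() periodically, for example once per create/clear cycle. How long the grace period needs to be is a guess in any case, which is presumably why the workaround feels worrisome.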


