EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: CAC problem between RTEMS and vxWorks
From: Wesley Moore <[email protected]>
To: Jeff Hill <[email protected]>, Benjamin Franksen <[email protected]>
Cc: EPICS tech-talk <[email protected]>
Date: Tue, 25 Sep 2012 15:02:38 -0400 (EDT)
----- Original Message -----
> From: "Jeff Hill" <[email protected]>
> To: "Benjamin Franksen" <[email protected]>, "EPICS tech-talk" <[email protected]>
> Sent: Tuesday, September 25, 2012 2:29:08 PM
> Subject: RE: CAC problem between RTEMS and vxWorks
> 
> > Jeff:
> > 
> > Dynamic re-assign means in CA terms that the sequencer will issue a
> > ca_clear_channel (for some connected channel), immediately followed by a
> > ca_create_channel (to some other PV).
> 
> Be careful about race conditions where a thread continues to use a chid
> after the channel that the chid references has been destroyed.
> 
> > Hmmm, what happens if there is a race between the ca_clear_channel call
> > and the disconnect event caused by the IOC reboot? (Just thinking aloud
> > here.)
> > 
> 
> Reading Wesley's later post I think we can assume that it's the client
> (i.e. sequencer) side that is being rebooted.
> 
> > It should be possible to devise a simple test client that issues a long
> > sequence of such create/clear calls to check whether this is a CA problem.
> > Unfortunately my automatic test setup does not yet support RTEMS, so this
> > will take some time for me to do.
> 
> I had a look at the CA client side regression tests and I don't see a test
> of this nature. Presumably such a test would create a few thousand channels
> and then begin destroying them while at least some of them are in the
> process of connecting. This would perhaps be an unusual test in that we
> aren't testing any metric other than that the test program can survive a
> somewhat unusual sequence of events, and because successful testing of
> anything at all might be highly dependent on the various ratios of
> server/client CPU speeds and the speed of the network.
> 
> Considering this particular situation further: all we know is that we have
> an RTEMS IOC with a sequencer CA client that is not reconnecting reliably
> when the RTEMS IOC undergoes a rapid (close together in time) series of
> reboots. One has to guess that the vxWorks IOC is running low on some
> resource, perhaps its network buffers, file descriptors, ...
> 
> Wesley:
> I think that we see evidence that the vxWorks IOC is taking some time
> to determine that the previous lifetimes of the RTEMS IOC have passed out
> of existence (since the RTEMS IOC didn't clean up its CA circuits when it
> was shut down). Have you seen the situation improve if you leave it for a
> sufficiently long interval? This would be long enough for the TCP watchdog
> timers to fire; that delay varies between operating systems and OS versions
> but is typically on the order, as I recall, of about 20 minutes.
> 
> Jeff

Jeff:
I've been working with Ben offline a bit.  Let me see if I can sum up the findings (Ben, fill in as needed).  I believe that the issues caused by repeated reboots were masking the true problem.  When the sequencer assigns one PV per variable, even dynamically at runtime, there are no connection/reconnection issues.  When a variable is re-assigned, CA communication with the vxWorks server becomes intermittent (without any rebooting).  Even re-assigning a variable to the same PV is a problem.

Below is some debug output from the sequencer, produced every 5 seconds.  Each time, the variable is assigned and then a pvPut is issued.  In this case I'm simply re-assigning the variable to the same output PV.  Sometimes the pvPut succeeds and sometimes it doesn't.

# sequencer output
pvName:           ITVFL02op1_status
pvAssign Status:  0
pvPut Status:     0

pvName:           ITVFL02op1_status
pvAssign Status:  0
pvPut Status:     -2

pvName:           ITVFL02op1_status
pvAssign Status:  0
pvPut Status:     -2

pvName:           ITVFL02op1_status
pvAssign Status:  0
pvPut Status:     0

pvName:           ITVFL02op1_status
pvAssign Status:  0
pvPut Status:     0

pvName:           ITVFL02op1_status
pvAssign Status:  0
pvPut Status:     -2
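
To put a number on "sometimes", output like the above can be tallied over a longer run. The following is only a minimal sketch in C: the function and struct names are made up for illustration, and it assumes that a status of 0 means success and anything else means failure, which is what the output above suggests but is not confirmed here.

```c
#include <stdio.h>
#include <string.h>

/* Tally of pvPut results parsed from sequencer debug output.
 * (Hypothetical helper type, not part of the sequencer or CA.) */
struct put_tally {
    int ok;      /* entries reporting status 0 */
    int failed;  /* entries reporting any non-zero status */
};

/*
 * Scan a text buffer for entries of the form "pvPut Status: <int>"
 * (the label used in the debug output) and count the outcomes.
 * Assumption: 0 is success, every other value is a failure.
 */
struct put_tally tally_put_status(const char *log)
{
    static const char label[] = "pvPut Status:";
    struct put_tally t = {0, 0};
    const char *p = log;

    while ((p = strstr(p, label)) != NULL) {
        int status;

        p += sizeof(label) - 1;          /* step past the label */
        if (sscanf(p, "%d", &status) == 1) {
            if (status == 0)
                t.ok++;
            else
                t.failed++;
        }
    }
    return t;
}
```

Feeding it the six entries shown above should report three successes and three failures, i.e. half of the puts failing in this short sample.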

from Ben: 
"I am fairly certain that this is a CA problem and not one of the sequencer.  The usage pattern (clear channel, then create new ones) is typical for some client applications, but not for code running on an IOC; which may be the reason it hasn't come up before."
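
The clear-then-create pattern could be exercised outside the sequencer with a small churn loop. What follows is a rough sketch in C, not an existing CA regression test: `chan_ops`, `run_churn` and the stub backend are hypothetical names introduced here. In a real test the three callbacks would presumably wrap ca_create_channel plus ca_pend_io, ca_put plus ca_pend_io, and ca_clear_channel from cadef.h; the stub only lets the loop be dry-run without an IOC.

```c
#include <stddef.h>

/* Hypothetical backend interface. Each callback returns 0 on success.
 * A real backend would wrap the corresponding CA client calls. */
struct chan_ops {
    int (*create)(void *ctx, const char *pv_name);
    int (*put)(void *ctx, double value);
    int (*clear)(void *ctx);
};

/* Counts collected over one churn run. */
struct churn_result {
    int cycles;
    int create_failures;
    int put_failures;
    int clear_failures;
};

/*
 * Repeat create -> put -> clear on the same PV name. Because the loop
 * goes straight from clear to the next create, each iteration boundary
 * reproduces the "clear channel, then create" re-assign pattern.
 */
struct churn_result run_churn(const struct chan_ops *ops, void *ctx,
                              const char *pv_name, int cycles)
{
    struct churn_result r = {0, 0, 0, 0};
    int i;

    for (i = 0; i < cycles; i++) {
        r.cycles++;
        if (ops->create(ctx, pv_name) != 0) {
            r.create_failures++;
            continue;               /* no channel to put to or clear */
        }
        if (ops->put(ctx, (double)i) != 0)
            r.put_failures++;
        if (ops->clear(ctx) != 0)
            r.clear_failures++;
    }
    return r;
}

/* Stub backend for a dry run: every third put fails with -2.
 * This is an arbitrary pattern, not a model of the real failure. */
struct stub_state { int puts; };

int stub_create(void *ctx, const char *pv_name)
{
    (void)ctx; (void)pv_name;
    return 0;
}

int stub_put(void *ctx, double value)
{
    struct stub_state *s = ctx;
    (void)value;
    s->puts++;
    return (s->puts % 3 == 0) ? -2 : 0;
}

int stub_clear(void *ctx)
{
    (void)ctx;
    return 0;
}
```

Keeping the CA calls behind callbacks is a design choice made only so the loop can be checked on a host without EPICS; a direct version that calls the CA functions inline would work just as well. As Jeff notes, the failure may well depend on relative CPU and network speeds, so a clean run of such a loop would not prove much.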

Wesley

Replies:
Re: CAC problem between RTEMS and vxWorks Benjamin Franksen
References:
RE: CAC problem between RTEMS and vxWorks Hill, Jeff
