Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: CAC problem between RTEMS and vxWorks
From: "Hill, Jeff" <johill@lanl.gov>
To: Benjamin Franksen <benjamin.franksen@helmholtz-berlin.de>, EPICS tech-talk <tech-talk@aps.anl.gov>
Date: Wed, 26 Sep 2012 01:30:33 +0000
> The message is a bit cryptic:
> 
> """
> Unable to connect because "Connection timed out"
> """

This means that the TCP "connect" function in the socket library returned
an error, and the "Connection timed out" part is the string associated with
the errno for this particular failure of the "connect" function. Typically,
the "connect" function will block for a long time before failing if
the remote node is unavailable. If the remote node is out of resources then
perhaps we would get a more immediate failure.
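
As a minimal sketch of where the quoted text comes from (this is not the CA client library's own code, and `format_connect_failure` is a hypothetical helper name), the string in quotes is simply what `strerror()` returns for the errno left behind by a failed `connect()`:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the CA client library: formats a
 * failed connect() in the same style as the message quoted above. */
int format_connect_failure(int err, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    /* strerror() maps an errno value to the system-defined text; the
     * exact wording for ETIMEDOUT depends on the operating system. */
    return snprintf(buf, len, "Unable to connect because \"%s\"",
                    strerror(err));
}
```

Calling `format_connect_failure(ETIMEDOUT, buf, sizeof buf)` after a timed-out `connect()` would yield a line of the same shape as the one in the report; the text inside the quotes is supplied by the operating system, not by CA.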

> What I find a bit strange about this is that obviously the (server-) IOC
> *did* respond to the name resolution request (else the client would not
> know to which server to (try to) connect), but not to the subsequent connection
> request. Right?

We could guess that there aren't sufficient resources for a new TCP
circuit on this vxWorks CA server.

Here are some additional guesses:

How much CPU is the sequencer program using in these situations, where
pvAssign is followed immediately by pvPut?

Is there a possibility of a leak; for example creating the channel 
without destroying it?

Jeff

> -----Original Message-----
> From: tech-talk-bounces@aps.anl.gov [mailto:tech-talk-bounces@aps.anl.gov]
> On Behalf Of Benjamin Franksen
> Sent: Tuesday, September 25, 2012 3:10 PM
> To: EPICS tech-talk
> Subject: Re: CAC problem between RTEMS and vxWorks
> 
> Hi Guys
> 
> I think we are closing in. I should have looked at the program Wes sent
> to me, which I did not at first. I have now, and it is obvious (to me)
> that it cannot work as expected. Specifically, doing
> 
>         pvAssign(vidOutputStat, pvName);
>         ...
>         pvPut(vidOutputSel);
> 
> in the same action block means there is a good chance that the pvPut
> fails, simply because the channel is *not yet* connected. As I explained
> in a previous message, pvAssign is asynchronous: it causes a connection
> *request* to be sent, but it does *not* wait for completion of this
> request (i.e. the callback from the connection handler). Before you issue
> further requests, you should make sure the PV is actually connected using
> 
> http://www-csr.bessy.de/control/SoftDist/sequencer/Reference.html#pvconnected
> 
> inside a when() clause.
> 
> So, this should clear up the
> 
> > pvPut Status:     -2
> 
> you are seeing.
> 
> The asynchronous nature of pvAssign should be documented. I am taking a
> note
> to fix that. I am also thinking about adding SYNC support for pvAssign.
> 
>  * * *
> 
> Now, the messages from CA client library you see some time after
> rebooting:
> 
> CAC: Unable to connect because "Connection timed out"
> CA.Client.Exception...............................................
>     Warning: "Virtual circuit disconnect"
>     Context: "iocfel8.acc.jlab.org:5064"
>     Source File: ../cac.cpp line 1145
>     Current Time: Wed Sep 19 2012 11:52:27.183784942
> ..................................................................
> 
> Whether this is related to the attempt to connect to another channel as
> indicated above I cannot tell for sure.
> 
> Jeff:
> 
> The message is a bit cryptic:
> 
> """
> Unable to connect because "Connection timed out"
> """
> 
> seems to mean that the client tried to connect to the given IOC [Wes: is
> this
> the same one that hosts the PV you call pvAssign for?] and it did not
> respond.
> 
> What I find a bit strange about this is that obviously the (server-) IOC
> *did* respond to the name resolution request (else the client would not
> know
> to which server to (try to) connect), but not to the subsequent connection
> request. Right?
> 
> On Tuesday, 25 September 2012, 21:02:38, Wesley Moore wrote:
> > > From: "Jeff Hill" <johill@lanl.gov>
> > > > Dynamic re-assign means in CA terms that the sequencer will issue a
> > > > ca_clear_channel (for some connected channel), immediately followed
> > > > by a
> > > > ca_create_channel (to some other PV).
> > >
> > > Be careful about race conditions where a thread continues to use a
> > > chid
> > > after the channel that the chid references has been destroyed.
> 
> Yes, I had that in the past. I /think/ I have eliminated all these
> problems with pvAssign, but taking another look doesn't hurt.
> 
> > > > It should be possible to devise a simple test client that issues a
> > > > long
> > > > sequence of such create/clear calls to check whether this is a CA
> > > > problem.
> > > > Unfortunately my automatic test setup does not yet support RTEMS,
> > > > so this
> > > > will take some time for me to do.
> > >
> > > I had a look at the CA client side regression tests and I don't see a
> > > test
> > > of this nature. Presumably such a test would create a few thousand
> > > channels
> > > and then begin destroying them while at least some of them are in the
> > > process of connecting. This would be perhaps an unusual test in that
> > > we aren't testing
> > > any metric other than that the test program can survive a somewhat
> > > unusual
> > > sequence of events, and because successful testing of anything at
> > > all might
> > > be highly dependent on the various ratios of server/client cpu speeds
> > > and
> > > the speed of the network.
> > >
> > > Considering this particular situation further. All we know is that we
> > > have
> > > an RTEMS IOC with a sequencer CA client that is not reconnecting
> > > reliably
> > > when the RTEMS IOC undergoes a rapid (close together in time) series
> > > of
> > > reboots. One has to guess that the vxWorks IOC is running low on some
> > > resource, perhaps its network buffers, file descriptors, ...
> 
> Hmm, this would most probably show up on the VxWorks IOC and in a much
> more
> drastic way, if I remember correctly, i.e. any further connection requests
> from anywhere should fail.
> 
> Wes: could you check that, i.e. (from a workstation) issue a 'caget' to a
> PV
> that is known to exist on the VxWorks IOC, right after you see the CA
> message
> (the one I quoted above)? If Jeff is right you should get a failure
> message
> saying that the PV could not be found or the request timed out or some
> such.
> In fact, it may be that you can't even ping the IOC any longer.
> Connections (I
> mean *any* kind, not only CA) that are already established, however, may
> continue to work.
> 
> > > Wesley:
> > > I think that we see evidence that the vxWorks IOC is taking some time
> > > to determine that the previous lifetimes of the RTEMS IOC have passed
> > > out
> > > of existence (since the RTEMS IOC didn't clean up its CA circuits when
> > > it
> > > was shutdown). Have you seen the situation improve if you leave it
> > > for a
> > > sufficiently long interval? This would be long enough for the TCP
> > > watchdog
> > > timers to fire; that delay varies between different OS and different
> > > OS
> > > versions but is typically on the order, as I recall, of about 20
> > > minutes.
> 
> Yes, this is also worth a try.
> 
> > Jeff:
> > I've been working with Ben offline a bit.  Let me see if I can sum up
> the
> > findings (Ben, fill in as needed).  I believe that the issues caused by
> > repeated reboots were masking the true problems.  When the sequencer
> > assigns one PV per variable, even dynamically at runtime, there are no
> > connection/reconnection issues.  When a variable is re-assigned, CA to
> the
> > VxWorks server is intermittent (without rebooting).  Even re-assigning a
> > variable to the same PV is a problem.
> >
> > This is some debug output from the sequencer that is triggered every 5
> > seconds.  The variable is assigned and then a pvPut is issued.  In this
> > case I'm simply re-assigning the variable to the same output PV.
> > Sometimes the pvPut succeeds and sometimes it doesn't.
> >
> > # sequencer output
> > pvName:           ITVFL02op1_status
> > pvAssign Status:  0
> > pvPut Status:     0
> >
> > pvName:           ITVFL02op1_status
> > pvAssign Status:  0
> > pvPut Status:     -2
> >
> > pvName:           ITVFL02op1_status
> > pvAssign Status:  0
> > pvPut Status:     -2
> >
> > pvName:           ITVFL02op1_status
> > pvAssign Status:  0
> > pvPut Status:     0
> >
> > pvName:           ITVFL02op1_status
> > pvAssign Status:  0
> > pvPut Status:     0
> >
> > pvName:           ITVFL02op1_status
> > pvAssign Status:  0
> > pvPut Status:     -2
> >
> > from Ben:
> > "I am fairly certain that this is a CA problem and not one of the
> > sequencer.  The usage pattern (clear channel, then create new ones) is
> > typical for some client applications, but not for code running on an
> IOC;
> > which may be the reason it hasn't come up before."
> 
> No longer sure this is the case; maybe it's just an unfortunate
> combination of
> features...
> 
> Cheers
> --
> Ben Franksen
> ()  ascii ribbon campaign - against html e-mail
> /\  www.asciiribbon.org   - against proprietary attachments
> 


