EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: CAC problem between RTEMS and vxWorks
From: Benjamin Franksen <[email protected]>
To: EPICS tech-talk <[email protected]>
Date: Tue, 25 Sep 2012 23:10:28 +0200
Hi Guys

I think we are closing in. I should have looked at the program Wes sent me
right away, which I did not. I have now, and it is obvious (to me) that it
cannot work as expected. Specifically, doing

        pvAssign(vidOutputStat, pvName);
        ...
        pvPut(vidOutputSel);

in the same action block means there is a good chance that the pvPut
fails, simply because the channel is *not yet* connected. As I explained in a
previous message, pvAssign is asynchronous: it causes a connection *request*
to be sent, but it does *not* wait for completion of this request (i.e. the
callback from the connection handler). Before you issue further requests, you
should make sure the PV is actually connected, using pvConnected

http://www-csr.bessy.de/control/SoftDist/sequencer/Reference.html#pvconnected

inside a when() clause.
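
To make the timing issue concrete, here is a small toy model in Python. It is
only a sketch: ToyChannel, process_events, put_immediately and
put_when_connected are hypothetical names invented for illustration, not
sequencer or CA functions, and reading status -2 as "disconnected" is an
assumption based on the log output. The model is deterministic, so the
immediate put always fails here, whereas on a real IOC it is a race that is
sometimes won.

```python
# Toy model only: none of these names come from the sequencer or CA.
PV_STAT_OK = 0        # corresponds to "pvPut Status: 0" in the log
PV_STAT_DISCONN = -2  # assumed meaning of "pvPut Status: -2"


class ToyChannel:
    """A variable whose connection completes only when events are processed."""

    def __init__(self):
        self.pv_name = None
        self.value = None
        self.connected = False
        self._pending = False

    def assign(self, pv_name):
        # Like pvAssign: drop the old channel, *request* a new one,
        # and return immediately without waiting for the connection.
        self.pv_name = pv_name
        self.connected = False
        self._pending = True
        return PV_STAT_OK

    def process_events(self):
        # Stands in for the connection handler callback arriving later.
        if self._pending:
            self._pending = False
            self.connected = True

    def put(self, value):
        # A put on a channel that is not connected fails.
        if not self.connected:
            return PV_STAT_DISCONN
        self.value = value
        return PV_STAT_OK


def put_immediately(chan, pv_name, value):
    # Assign and put in the same action block: no time for the callback.
    chan.assign(pv_name)
    return chan.put(value)


def put_when_connected(chan, pv_name, value, max_polls=10):
    # Assign, then only put once the channel reports being connected,
    # which is the role of a pvConnected guard in a when() clause.
    chan.assign(pv_name)
    for _ in range(max_polls):
        if chan.connected:
            return chan.put(value)
        chan.process_events()
    return PV_STAT_DISCONN
```

In the real program the equivalent of put_when_connected would be a separate
state whose when() condition tests the connection before the pvPut is issued.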

So, this should clear up the

> pvPut Status:     -2

you are seeing.

The asynchronous nature of pvAssign should be documented. I am taking a note
to fix that. I am also thinking about adding SYNC support for pvAssign.

 * * *

Now, the messages from the CA client library you see some time after rebooting:

CAC: Unable to connect because "Connection timed out"
CA.Client.Exception...............................................
    Warning: "Virtual circuit disconnect"
    Context: "iocfel8.acc.jlab.org:5064"
    Source File: ../cac.cpp line 1145
    Current Time: Wed Sep 19 2012 11:52:27.183784942
..................................................................

Whether this is related to the attempt to connect to another channel as
indicated above I cannot tell for sure.

Jeff:

The message is a bit cryptic:

"""
Unable to connect because "Connection timed out"
"""

seems to mean that the client tried to connect to the given IOC [Wes: is this
the same one that hosts the PV you call pvAssign for?] and it did not respond.

What I find a bit strange about this is that obviously the (server-) IOC
*did* respond to the name resolution request (else the client would not know
to which server to (try to) connect), but not to the subsequent connection
request. Right?

On Tuesday, 25 September 2012, 21:02:38, Wesley Moore wrote:
> > From: "Jeff Hill" <[email protected]>
> > > Dynamic re-assign means in CA terms that the sequencer will issue a
> > > ca_clear_channel (for some connected channel), immediately followed
> > > by a ca_create_channel (to some other PV).
> >
> > Be careful about race conditions where a thread continues to use a
> > chid after the channel that the chid references has been destroyed.

Yes, I had that in the past. I /think/ I have eliminated all these problems
with pvAssign, but taking another look doesn't hurt.

> > > It should be possible to devise a simple test client that issues a
> > > long sequence of such create/clear calls to check whether this is a
> > > CA problem. Unfortunately my automatic test setup does not yet
> > > support RTEMS, so this will take some time for me to do.
> >
> > I had a look at the CA client side regression tests and I don't see a
> > test of this nature. Presumably such a test would create a few
> > thousand channels and then begin destroying them while at least some
> > of them are in the process of connecting. This would be perhaps an
> > unusual test in that we aren't testing any metric other than that the
> > test program can survive a somewhat unusual sequence of events, and
> > because successfully testing of anything at all might be highly
> > dependent on the various ratios of server/client cpu speeds and the
> > speed of the network.
> >
> > Considering this particular situation further. All we know is that we
> > have an RTEMS IOC with a sequencer CA client that is not reconnecting
> > reliably when the RTEMS IOC undergoes a rapid (close together in
> > time) series of reboots. One has to guess that the vxWorks IOC is
> > running low on some resource perhaps its network buffers, file
> > descriptors, ...

Hmm, this would most probably show up on the VxWorks IOC and in a much more
drastic way, if I remember correctly, i.e. any further connection requests
from anywhere should fail.

Wes: could you check that, i.e. (from a workstation) issue a 'caget' to a PV
that is known to exist on the VxWorks IOC, right after you see the CA message
(the one I quoted above)? If Jeff is right you should get a failure message
saying that the PV could not be found or the request timed out or some such.
In fact, it may be that you can't even ping the IOC any longer. Connections (I
mean *any* kind, not only CA) that are already established, however, may
continue to work.

> > Wesley:
> > I think that we see evidence that the vxWorks IOC is taking some time
> > to determine that the previous lifetimes of the RTEMS IOC have passed
> > out of existence (since the RTEMS IOC didn't cleanup its CA circuits
> > when it was shutdown). Have you seen the situation improve if you
> > leave it for a sufficiently long interval? This would be long enough
> > for the TCP watchdog timers to fire; that delay varies between
> > different OS and different OS versions but is typically on the
> > order, as I recall, of about 20 minutes.

Yes, this is also worth a try.

> Jeff:
> I've been working with Ben offline a bit.  Let me see if I can sum up the
> findings (Ben, fill in as needed).  I believe that the issues caused by
> repeated reboots were masking the true problems.  When the sequencer
> assigns one PV per variable, even dynamically at runtime, there's no
> connection/reconnection issues.  When a variable is re-assigned, CA to the
> VxWorks server is intermittent (without rebooting).  Even re-assigning a
> variable to the same PV is a problem.
>
> This is some debug output from the sequencer that is triggered every 5
> seconds.  The variable is assigned and then a pvPut is issued.  In this
> case I'm simply re-assigning the variable to the same output PV.
> Sometimes the pvPut succeeds and sometimes it doesn't.
>
> # sequencer output
> pvName:           ITVFL02op1_status
> pvAssign Status:  0
> pvPut Status:     0
>
> pvName:           ITVFL02op1_status
> pvAssign Status:  0
> pvPut Status:     -2
>
> pvName:           ITVFL02op1_status
> pvAssign Status:  0
> pvPut Status:     -2
>
> pvName:           ITVFL02op1_status
> pvAssign Status:  0
> pvPut Status:     0
>
> pvName:           ITVFL02op1_status
> pvAssign Status:  0
> pvPut Status:     0
>
> pvName:           ITVFL02op1_status
> pvAssign Status:  0
> pvPut Status:     -2
>
> from Ben:
> "I am fairly certain that this is a CA problem and not one of the
> sequencer.  The usage pattern (clear channel, then create new ones) is
> typical for some client applications, but not for code running on an IOC;
> which may be the reason it hasn't come up before."

No longer sure this is the case; maybe it's just an unfortunate combination of
features...

Cheers
--
Ben Franksen
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

