Experimental Physics and
Industrial Control System

"Jeff Hill" <[email protected]> · Thu, 3 Nov 2011 15:08:14 -0600

Hi Matt,

In a single threaded (non-preemptive callback) client no CA activity can
occur (i.e. processing network packets and/or calling callbacks) unless your
single thread is executing in the CA library. The ca_pend_event call is
purely a mechanism for spending some time, the amount of time being
specified by its sole argument, in the library. The ca_pend_event call is
not used in any way for synchronizing ca_array_get (or channel
connectivity).

In contrast, ca_pend_io will synchronize completion of all outstanding
ca_array_get requests issued since the beginning of the ca context, or the
previous call to ca_pend_io, whichever is later. In practice, what this
implies is that ca_pend_io is a one-time synchronization. If it returns
ECA_TIMEOUT this implies that any incomplete CA get requests are effectively
canceled. The library will _not_ allow the variable you passed to
ca_array_get to be modified unexpectedly, far in the future after ca_pend_io
returns, if the network is slow. That could result in a disaster, for
example, if you are using a stack automatic variable.
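
Because a timed-out ca_pend_io cancels the incomplete get, a retry has to
issue the get again before waiting again. The following is a minimal sketch
of that retry pattern, not CA library code: the status codes, the function
pointer types, and the fake_link "slow network" are all hypothetical
stand-ins so that the logic can be compiled without EPICS. In a real client,
issue() would wrap ca_array_get and pend() would wrap ca_pend_io. The doubling
of the timeout is an illustrative choice, not something CA requires.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes standing in for ECA_NORMAL and ECA_TIMEOUT. */
enum { GET_OK = 0, GET_TIMEOUT = 1 };

/* issue() would wrap ca_array_get(); pend() would wrap ca_pend_io(). */
typedef int (*issue_fn)(void *ctx);
typedef int (*pend_fn)(void *ctx, double timeout_sec);

/* Re-issue the get before every wait. After a timeout the old request is
 * canceled, so pending again without a new get would wait on nothing. */
int get_with_retry(issue_fn issue, pend_fn pend, void *ctx,
                   double first_timeout, int max_tries)
{
    double timeout = first_timeout;
    int i;

    assert(issue != NULL && pend != NULL && max_tries > 0);
    for (i = 0; i < max_tries; i++) {
        int st = issue(ctx);
        if (st != GET_OK)
            return st;              /* could not even queue the request */
        st = pend(ctx, timeout);
        if (st != GET_TIMEOUT)
            return st;              /* completed, or a non-timeout error */
        timeout *= 2.0;             /* wait longer on the next attempt */
    }
    return GET_TIMEOUT;
}

/* Fake transport for demonstration: a get completes only if the wait is at
 * least `needed` seconds long. */
struct fake_link {
    double needed;
    int issued;
};

int fake_issue(void *ctx)
{
    ((struct fake_link *)ctx)->issued++;
    return GET_OK;
}

int fake_pend(void *ctx, double timeout_sec)
{
    const struct fake_link *link = ctx;
    return timeout_sec >= link->needed ? GET_OK : GET_TIMEOUT;
}
```

With a fake link that needs 0.5 s and a first timeout of 0.1 s, the waits are
0.1, 0.2, 0.4 and 0.8 s, so the get is issued four times before it succeeds.
The data buffer should only be read after the final successful return.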

> That is, how do I know how long to wait in ca_pend_event() and
> ca_pend_io()?    

It's actually quite difficult to write a program that is generic in a wide
range of networking environments if there are any short timeouts which are
hardcoded. In practice, network generic programs with complex state
transitions typically use CA callbacks.

> leaves the data at pvalue incorrect (all zeros).  Trying to access
> this data from python, I can get python to segfault for large enough
> data (say, 4.2M ints).  I'm not sure I fully understand why python is
> crashing -- the segfault happens well after ca_array_get() returns,
> but can happen in ca_pend_event() a short time later.  If I save but
> don't try to access the data in pvalue, or wait "long enough", it
> seems I never have a crash.

I suspect that python, or the ca python interface library, isn't happy if
its data is being modified by an outside thread (i.e. a ca client library
auxiliary thread) at precisely the same time that it's being read. It's
probably a good idea not to touch the data until ca_pend_io has synchronized
(returned ECA_NORMAL), or alternatively to use a callback.

> Is there a way to tell if pend_event() or pend_io() have timed out

They return ECA_TIMEOUT.

> or if there are events that are pending?

The ca_pend_io call returns ECA_TIMEOUT if it timed out before all of the
get requests completed, and otherwise ECA_NORMAL if they did complete. There
is also a mechanism for determining at any given instant if the ca_client
library has outstanding labor (network activity) which needs to be dealt
with. I can provide more details on that if you are interested.

If you need to know precisely which get requests failed out of many then
it's probably best to use a callback mechanism.
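
As a rough illustration of per-request tracking with callbacks, the sketch
below keeps one record per outstanding get and fills it in when that get
completes. It is not CA library code: struct pending_get, the REQ_* states,
and on_get_done are hypothetical names, and the element type int is an
assumption. In a real client, a handler registered with
ca_array_get_callback receives a struct event_handler_args, and its usr,
status, dbr and count fields would supply the arguments used here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-request states. */
enum { REQ_PENDING = 0, REQ_DONE = 1, REQ_FAILED = 2 };

/* One record per outstanding get; a pointer to it would be passed as the
 * user argument when the request is issued. */
struct pending_get {
    int state;
    int *dest;               /* caller-owned destination buffer */
    unsigned long capacity;  /* number of elements dest can hold */
    unsigned long count;     /* number of elements actually received */
};

/* Called once when a single get finishes. `data` is only valid during the
 * call, so the values are copied out before returning. */
void on_get_done(struct pending_get *req, int ok,
                 const int *data, unsigned long count)
{
    assert(req != NULL);
    if (!ok || data == NULL || count > req->capacity) {
        req->state = REQ_FAILED;   /* dest is left untouched */
        return;
    }
    memcpy(req->dest, data, count * sizeof *data);
    req->count = count;
    req->state = REQ_DONE;
}

/* Count how many requests are in a given state, e.g. REQ_FAILED. */
unsigned long count_in_state(const struct pending_get *reqs,
                             unsigned long n, int state)
{
    unsigned long i, hits = 0;
    for (i = 0; i < n; i++)
        if (reqs[i].state == state)
            hits++;
    return hits;
}
```

Each destination buffer is written only inside the completion handler, so a
reader that checks for REQ_DONE first never sees a partially filled array,
and the records that stay REQ_PENDING or become REQ_FAILED identify exactly
which gets did not complete.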

Jeff
______________________________________________________
Jeffrey O. Hill           Email        [email protected]
LANL MS H820              Voice        505 665 1831
Los Alamos NM 87545 USA   FAX          505 665 5107

Message content: TSPA

With sufficient thrust, pigs fly just fine. However, this is
not necessarily a good idea. It is hard to be sure where they
are going to land, and it could be dangerous sitting under them
as they fly overhead. -- RFC 1925

> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Matt Newville
> Sent: Thursday, November 03, 2011 2:09 PM
> To: EPICS Tech Talk
> Subject: reading large data arrays over slow networks
> 
> Hi,
> 
> I'm seeing an issue with Channel Access getting "large data arrays"
> and hope someone can provide some insight.  First, this is not an
> issue with EPICS_CA_MAX_ARRAY_BYTES, which is set large enough.
> Rather the issue I'm seeing seems to raise the naive question:
> 
>     Once I do a ca_array_get(type, count, chid, pvalue), when can I
> successfully read the data in pvalue?
> 
> That is, how do I know how long to wait in ca_pend_event() and
> ca_pend_io()?    In most cases of a fast network and scalar values or
> small arrays, values on the order of
>    ca_pend_event(1.e-3);
>    ca_pend_io(1.0);
> 
> seem to work well.   But, if the array is "large" or the network
> "slow", I have a hard time predicting these values, and accessing the
> data may give incorrect values unless I've waited "long enough".
> Using either a slow network or pend_event/io times that is clearly
> "too short",  say,
>    ca_pend_event(1.e-5);
>    ca_pend_io(0.01);
> 
> leaves the data at pvalue incorrect (all zeros).  Trying to access
> this data from python, I can get python to segfault for large enough
> data (say, 4.2M ints).  I'm not sure I fully understand why python is
> crashing -- the segfault happens well after ca_array_get() returns,
> but can happen in ca_pend_event() a short time later.  If I save but
> don't try to access the data in pvalue, or wait "long enough", it
> seems I never have a crash.
> 
> Related questions are:  What is supposed to happen if ca_pend_event()
> and/or ca_pend_io() do time out?  and: Is there a way to tell if
> pend_event() or pend_io() have timed out or if there are events that
> are pending?
> 
> I was hoping to be able to do something like
>    ret_ev = ca_pend_event(1.e-3);
>    ret_io = ca_pend_io(1.);
> 
>    while (ret_io == ECA_TIMEOUT) {
>       ret_ev = ca_pend_event(1.e-3);
>       ret_io = ca_pend_io(1.);
>    }
> 
> Unfortunately, it seems that checking the return values are not so
> helpful, as pend_event() always(?) returns ECA_TIMEOUT and pend_io()
> seems to return ECA_TIMEOUT no more than one time before returning
> ECA_NORMAL.
> 
> For what it's worth, I'm using base 3.14-12.1.   Thanks in advance for
> any insight.
> 
> --Matt Newville <newville at cars.uchicago.edu> 630-252-0431
