EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: The Gateway, CAS, and EPICS_MAX_ARRAY_BYTES
From: Ken Evans <[email protected]>
To: [email protected]
Cc: [email protected]
Date: Fri, 11 Feb 2005 13:36:08 -0600 (CST)

The Gateway, or rather the underlying CAS code, has a problem when
EPICS_CA_MAX_ARRAY_BYTES is set.  In short, setting it causes code that
processes UDP search requests from clients (such as MEDM) to run that
is not run otherwise.  That code uncovers a bug: one of the server UDP
sockets blocks when it should be non-blocking.  This is known to
happen on Solaris and Linux, in versions through 3.14.7.

I don't usually put bugs in tech-talk but this one proved rather
difficult to track down, and this may help someone else.

At the APS the main Gateways started crashing after operating system
patches were applied at the end of December.  The change that actually
triggered the failure was made sometime in the fall, when
EPICS_CA_MAX_ARRAY_BYTES=140000 was added to everyone's .login.  It
did not affect the Gateways, however, until some time later when they
were restarted after the patches.  The first efforts were to check for
machine problems, since the executable had not changed.  The problem
was caused by a "reverse" Gateway, one that had its server side on the
machine network.  The hanging in this one caused the others to crash,
though it did not crash itself.

The bug is that the CAS socket that receives UDP broadcasts blocks
waiting for input and can delay the server, so it does not process its
queues.  For the Gateway, this in turn delays the Gateway client and
also keeps it from processing.
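
As an illustration only, here is a minimal Python sketch of the
difference a non-blocking UDP socket makes.  It is not the CAS code,
and the function names, address, and buffer size are made up for the
example.  With the socket in non-blocking mode, a read with nothing
queued returns at once instead of stalling the caller, which can then
go on to service its other queues.

```python
import socket


def make_nonblocking_udp_socket(host="127.0.0.1", port=0):
    """Create a UDP socket bound to host:port in non-blocking mode.

    host and port are illustrative defaults; port 0 lets the OS pick one.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    # Without this call, recvfrom() would wait until a datagram arrives.
    sock.setblocking(False)
    return sock


def drain_datagrams(sock, max_bytes=1024):
    """Read every datagram currently queued, then return immediately."""
    received = []
    while True:
        try:
            data, _addr = sock.recvfrom(max_bytes)
        except BlockingIOError:
            # Nothing left to read right now; do not wait for more.
            break
        received.append(data)
    return received
```

A server loop built this way would call drain_datagrams() once per
pass and then move on to its other work whether or not anything had
arrived.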

The hanging Gateway does not crash, but because it is not processing
properly, it apparently sends bad messages to other clients and
Gateways.  The other Gateways are not robust to this and crash.  This
is a second bug.  Both are characterized by "bad message id" messages
in the Gateway logs.

The problem is compounded by the new feature in CAC, where it
disconnects if you do not call ca_poll every 30 sec, even when there
is nothing wrong with the actual connection.  The thrashing caused by
the continual connects and disconnects contributes to the Gateway
interaction problem.

This "feature" also causes other problems and is mentioned here
because there are no apparent plans to change it.  It breaks programs
that used to work.  (Probe was an example.  When it
was not monitoring, there was no reason to call ca_poll and this
worked fine.  It has been fixed now to call ca_poll every 100 ms, even
when it doesn't need information.)  There are many other programs that
used to work and need to be changed or are not amenable to being
changed.  I know of at least one developer's setting EPICS_CA_CONN_TMO
to a huge number to get around this problem.  This is bad.  [End of
commercial]
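
For what it is worth, the kind of change described for Probe might
look roughly like the Python sketch below.  It is a sketch under
assumptions, not Probe's code: poll is a hypothetical stand-in for
whatever call services Channel Access in the program (ca_poll in a C
client), and the function name and the 0.1 s default are only for
illustration.

```python
import time


def wait_with_polling(poll, duration, interval=0.1):
    """Stay idle for about `duration` seconds, calling poll() regularly.

    poll is a hypothetical callable standing in for the client's
    Channel Access servicing call; interval is the time between calls.
    Returns the number of times poll() was called.
    """
    deadline = time.monotonic() + duration
    calls = 0
    while time.monotonic() < deadline:
        poll()  # service the connection even though no data is needed
        calls += 1
        time.sleep(interval)
    return calls
```

A program with nothing to monitor would use something like this in
place of a plain sleep, so the polling continues while it is idle.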

If you want more information, these bugs are described in more detail,
along with a fix for the hanging, in the EPICS bug tracking system,
accessed from the EPICS pages:

http://www.aps.anl.gov/epics/mantis/login_page.php

The bugs are #176 and #177.

Replies:
RE: The Gateway, CAS, and EPICS_MAX_ARRAY_BYTES Jeff Hill
