EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: The Gateway, CAS, and EPICS_MAX_ARRAY_BYTES
From: "Jeff Hill" <[email protected]>
To: "'Ken Evans'" <[email protected]>, <[email protected]>
Date: Fri, 11 Feb 2005 14:20:11 -0700
Ken's description of this bug is accurate, but I should clarify
that this bug (Mantis 176) is in the "portable" CA server and does
not affect the original CA server, which is still used in the
R3.14 and R3.13 IOCs. I have not yet determined whether bug 177 is
in the CA client library or in the Gateway.

Jeff

> -----Original Message-----
> From: Ken Evans [mailto:[email protected]]
> Sent: Friday, February 11, 2005 12:36 PM
> To: [email protected]
> Cc: [email protected]
> Subject: The Gateway, CAS, and EPICS_MAX_ARRAY_BYTES
> 
> 
> The Gateway, or rather the underlying CAS code, has a problem when
> EPICS_CA_MAX_ARRAY_BYTES is set.  In short, setting it causes code
> that processes UDP search requests from clients (such as MEDM) to
> run that does not otherwise run, and this uncovers a bug: one of
> the server UDP sockets blocks when it should be non-blocking.  This
> is known to happen on Solaris and Linux, through 3.14.7.
> 
> I don't usually put bugs in tech-talk but this one proved rather
> difficult to track down, and this may help someone else.
> 
> At the APS the main Gateways started crashing after operating
> system patches were applied at the end of December.  The actual
> event that caused the failure is that sometime in the fall,
> EPICS_CA_MAX_ARRAY_BYTES=140000 was added to everyone's .login.
> However, it did not affect the Gateways until some time later, when
> they were restarted after the patches.  The first efforts were to
> check for machine problems, since the executable had not changed.
> The problem was caused by a "reverse" Gateway, one that had its
> server side on the machine network.  The hanging in this one caused
> the others to crash, though it did not crash itself.
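
[Editorial sketch, not part of the original message.] The paragraph above turns on the fact that a process only sees the environment it was started with, so a change to .login has no effect on a running Gateway until it is restarted. The following Python sketch illustrates that idea only. The function name is hypothetical, the default of 16384 bytes is an assumption to be checked against the Channel Access documentation, and this is not how EPICS base itself reads the variable.

```python
import os


def max_array_bytes(environ=None, default=16384):
    """Return the array size limit found in the given environment.

    `environ` defaults to the environment this process was started
    with. `default` is an assumed fallback value, used when the
    variable is unset or is not a positive integer.
    """
    env = os.environ if environ is None else environ
    raw = env.get("EPICS_CA_MAX_ARRAY_BYTES")
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default
```

A long-running server that reads this value once at startup keeps the old setting until it is restarted, which matches the delay between the .login change in the fall and the failures after the December restarts.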
> 
> The bug is that the CAS socket that receives UDP broadcasts blocks
> waiting for input and can delay the server, so it does not process
> its queues.  For the Gateway, this in turn delays the Gateway
> client and also keeps it from processing.
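
[Editorial sketch, not part of the original message.] To make the blocking versus non-blocking distinction concrete, here is a minimal Python sketch of a UDP socket that never stalls its caller. It is not the CAS code and not the fix recorded in Mantis; the function names, the loopback address, and the buffer size are all assumptions chosen for illustration.

```python
import socket


def make_nonblocking_udp_socket(host="127.0.0.1", port=0):
    """Create a bound UDP socket whose reads never wait for input."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    # Without this call, recvfrom() on an empty socket would block the
    # caller until a datagram arrives.
    sock.setblocking(False)
    return sock


def drain_datagrams(sock, max_size=1024):
    """Read every datagram already queued, then return immediately."""
    received = []
    while True:
        try:
            data, _addr = sock.recvfrom(max_size)
        except BlockingIOError:
            # Nothing left to read: return to the caller so it can
            # service its other queues.
            break
        received.append(data)
    return received
```

With a blocking socket, the recvfrom() call would not return until another search request arrived, which is the kind of stall described above.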
> 
> The hanging Gateway does not crash, but because it is not
> processing properly, it apparently sends bad messages to other
> clients and Gateways.  The other Gateways are not robust to this
> and crash.  This is a second bug.  Both are characterized by "bad
> message id" messages in the Gateway logs.
> 
> The problem is compounded by the new feature in CAC, where it
> disconnects if you do not call ca_poll every 30 sec, even when
> there is nothing wrong with the actual connection.  The thrashing
> caused by the continual connects and disconnects contributes to the
> Gateway interaction problem.
> 
> This "feature" also causes other problems and is mentioned here
> because there are no apparent plans to change it.  It breaks
> programs that used to work.  (Probe was an example.  When it was
> not monitoring, there was no reason to call ca_poll and this worked
> fine.  It has been fixed now to call ca_poll every 100 ms, even
> when it doesn't need information.)  There are many other programs
> that used to work and either need to be changed or are not amenable
> to being changed.  I know of at least one developer who sets
> EPICS_CA_CONN_TMO to a huge number to get around this problem.
> This is bad.  [End of commercial]
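
[Editorial sketch, not part of the original message.] The Probe workaround described above amounts to calling the client library's poll routine on a fixed period even when the program has nothing to ask for. A minimal Python sketch of such a keep-alive loop follows. The `poll` argument is a hypothetical stand-in for whatever wraps ca_poll in a given program, and the function name and parameters are assumptions made for illustration.

```python
import time


def run_with_keepalive(poll, interval=0.1, duration=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Call poll() once per `interval` seconds for `duration` seconds.

    Returns the number of times poll() was called. `clock` and `sleep`
    are injectable so the loop can be exercised without real delays.
    """
    calls = 0
    deadline = clock() + duration
    while clock() < deadline:
        poll()           # keeps the connection serviced even when idle
        calls += 1
        sleep(interval)  # 0.1 s mirrors the 100 ms period quoted above
    return calls
```

In a GUI program the same effect is usually achieved with a periodic timer rather than a sleeping loop, so that the user interface stays responsive.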
> 
> If you want more information, these bugs are described in more
> detail, along with a fix for the hanging, in the EPICS bug tracking
> system, accessed from the EPICS pages:
> 
> http://www.aps.anl.gov/epics/mantis/login_page.php
> 
> The bugs are #176 and #177.


