Ken's description of this bug is accurate, but it is probably
necessary to clarify that this bug (Mantis 176) is in the
"portable" CA server and does not affect the original CA server,
which is still used in the R3.14 and R3.13 IOCs. I have not yet
determined whether bug 177 is in the CA client library or in the
gateway.
Jeff
> -----Original Message-----
> From: Ken Evans [mailto:[email protected]]
> Sent: Friday, February 11, 2005 12:36 PM
> To: [email protected]
> Cc: [email protected]
> Subject: The Gateway, CAS, and EPICS_MAX_ARRAY_BYTES
>
>
> The Gateway, or rather the underlying CAS code, has a problem when
> EPICS_CA_MAX_ARRAY_BYTES is set. In short, setting it causes the
> server to run code, not otherwise exercised, that processes UDP
> search requests from clients (such as MEDM). That code uncovers a
> bug: one of the server UDP sockets blocks when it should be
> non-blocking. This is known to happen on Solaris and Linux, in
> releases through 3.14.7.
>
> I don't usually post bugs to tech-talk, but this one proved rather
> difficult to track down, and this note may help someone else.
>
> At the APS, the main Gateways started crashing after operating system
> patches were applied at the end of December. The actual cause of the
> failure, however, was that sometime in the fall
> EPICS_CA_MAX_ARRAY_BYTES=140000 had been added to everyone's .login.
> This did not affect the Gateways until they were restarted after the
> patches. Because the executable had not changed, our first efforts
> went into checking for machine problems. The problem turned out to be
> caused by a "reverse" Gateway, one whose server side was on the
> machine network. This Gateway hung, and its hanging caused the others
> to crash, although it did not crash itself.
>
> The bug is that the CAS socket that receives UDP broadcasts blocks
> while waiting for input. This can delay the server so that it does
> not process its queues. In the Gateway, this in turn delays the
> Gateway's client side and keeps it from processing as well.
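
The failure mode described above, a UDP socket that sits in a receive call when the server should be getting back to its queues, is a generic one. The usual remedy can be sketched in POSIX C: mark the socket non-blocking with fcntl(O_NONBLOCK), wait for input only with a bounded select() timeout, and treat EAGAIN/EWOULDBLOCK as "nothing to read yet". This is a minimal illustration under those assumptions, not the CAS code or the fix attached to Mantis 176; the helper names set_nonblocking, wait_readable, try_recv and open_udp_loopback are hypothetical.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Switch an existing socket to non-blocking mode. Returns 0 on success. */
static int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Wait at most timeout_ms for fd to become readable.
   Returns 1 if readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, long timeout_ms) {
    fd_set rfds;
    struct timeval tv;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    int rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    return rc < 0 ? -1 : (rc > 0 ? 1 : 0);
}

/* Read one datagram without ever waiting. Returns its length, 0 when
   nothing is queued (an empty datagram also yields 0), or -1 on error. */
static ssize_t try_recv(int fd, char *buf, size_t len) {
    ssize_t n = recvfrom(fd, buf, len, 0, NULL, NULL);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) return 0;
    return n;
}

/* Create a UDP socket bound to an ephemeral loopback port; the chosen
   address is written to *addr. Returns the descriptor or -1. */
static int open_udp_loopback(struct sockaddr_in *addr) {
    socklen_t alen = sizeof *addr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = htons(0); /* let the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)addr, sizeof *addr) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &alen) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Because try_recv() returns immediately when no datagram is queued, a server loop built this way keeps servicing its other queues even when no search requests arrive.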
>
> The hanging Gateway does not crash, but because it is not processing
> correctly, it apparently sends bad messages to other clients and
> Gateways. The other Gateways are not robust against this and crash.
> This is a second bug. Both bugs are characterized by "bad message id"
> messages in the Gateway logs.
>
> The problem is compounded by a new feature in CAC: the client library
> disconnects if you do not call ca_poll at least every 30 sec, even
> when there is nothing wrong with the actual connection. The thrashing
> caused by the continual connects and disconnects contributes to the
> Gateway interaction problem.
>
> This "feature" also causes other problems and is mentioned here
> because there are no apparent plans to change it. It causes programs
> that used to work to stop working. (Probe was an example. When it was
> not monitoring anything, there was no reason to call ca_poll, and this
> worked fine. It has now been fixed to call ca_poll every 100 ms, even
> when it doesn't need information.) There are many other programs that
> used to work and now need to be changed, or are not amenable to being
> changed. I know of at least one developer who sets EPICS_CA_CONN_TMO
> to a huge number to get around this problem. This is bad. [End of
> commercial]
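
The workaround described for Probe amounts to an idle loop that wakes up periodically, even with nothing to monitor, so that the client library is serviced well inside the 30 sec window. A minimal POSIX sketch follows, assuming the 100 ms period mentioned for Probe. service_client_library() is a hypothetical placeholder standing in for ca_poll(), so the sketch builds without EPICS Base; a real client would include cadef.h and call ca_poll() (or ca_pend_event() with a short timeout) at that point.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical placeholder for ca_poll(). The counter only exists so the
   sketch can be checked without EPICS Base. */
static int poll_count = 0;
static void service_client_library(void) { poll_count++; }

/* Idle main loop: even with nothing to monitor, wake up every period_ms
   and let the client library process its queues, so it is never left
   unserviced for anywhere near the 30 sec disconnect window. */
static void idle_loop(long period_ms, long total_ms) {
    for (long elapsed = 0; elapsed < total_ms; elapsed += period_ms) {
        struct timeval tv;
        tv.tv_sec = period_ms / 1000;
        tv.tv_usec = (period_ms % 1000) * 1000;
        /* With no descriptors, select() just sleeps for the timeout. */
        select(0, NULL, NULL, NULL, &tv);
        service_client_library();
    }
}
```

For example, idle_loop(100, 500) services the library five times over half a second; a long-running client would simply loop forever.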
>
> If you want more information, these bugs are described in more
> detail, along with a fix for the hanging, in the EPICS bug tracking
> system, accessed from the EPICS pages:
>
> http://www.aps.anl.gov/epics/mantis/login_page.php
>
> The bugs are #176 and #177.