EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: Darwin and EPICS_CA_AUTO_ADDR_LIST issue
From: "Jeff Hill" <[email protected]>
To: "'Burkhard Kolb'" <[email protected]>
Cc: <[email protected]>
Date: Mon, 12 Mar 2007 10:11:02 -0600

> One more clue:
> I removed the currently down nodes from the EPICS_CA_ADDR_LIST and all
> seems to be okay. I don't get any error messages like "host is down".
> It seems to me, there is a problem in the code with the unicasts: if
> there are nodes on the list which do not respond (in time?) some other
> nodes (which are alive) are considered to be dead???
> Burkhard

The IP kernel has a finite length input queue attached to each UDP port. The
"host is down" diagnostic is communicated using ICMP messages. I suspect
that there are enough ICMP messages to temporarily saturate this input
queue. The saturation clears as soon as CA gets an opportunity to read from
the UDP socket, but legitimate UDP search protocol responses from certain
IOCs that arrive while the queue is still full are discarded.
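
To make the suspected mechanism concrete, here is a toy Python model of a
fixed-capacity input queue. It is only a sketch: the function name, the
capacity, and the read interval are hypothetical, and it takes as an
assumption that the ICMP-triggered messages and the search responses compete
for the same queue, which has not been verified here.

```python
from collections import deque


def simulate_udp_queue(arrivals, capacity, read_every):
    """Toy model of a fixed-capacity per-socket input queue.

    arrivals   -- labels of the arriving messages, in arrival order
    capacity   -- maximum number of queued messages (hypothetical value)
    read_every -- the client drains the whole queue after this many arrivals
    Returns (delivered, dropped) as two lists of labels.
    """
    queue = deque()
    delivered, dropped = [], []
    for count, msg in enumerate(arrivals, start=1):
        if len(queue) < capacity:
            queue.append(msg)
        else:
            dropped.append(msg)  # queue is full, so the message is discarded
        if count % read_every == 0:
            delivered.extend(queue)  # the client finally reads the socket
            queue.clear()
    delivered.extend(queue)  # whatever is still queued at the end
    return delivered, dropped


# A burst of six error messages arrives just ahead of two search responses.
arrivals = ["icmp"] * 6 + ["reply-iocA", "reply-iocB"]
delivered, dropped = simulate_udp_queue(arrivals, capacity=4, read_every=8)
print("delivered:", delivered)
print("dropped:  ", dropped)
```

With a slow reader (read_every=8) both search responses land in the dropped
list. With read_every=1 nothing is dropped, which matches the observation
that removing the down nodes from the list makes the problem go away.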

That may explain the behavior you see, but of course it isn't a solution to
your troubles. I would first attempt to better understand your situation by
monitoring all ICMP traffic received by this host. On Linux this can be done
by typing "tcpdump icmp" or perhaps "tcpdump dst host xxx and icmp". You
will need to change to root first on most systems and also run tcpdump on
the host that is receiving the ICMP messages (the host with the CA client).
You will also need to monitor the ICMP traffic while the CA client that is
experiencing troubles is running. I recommend this because the raw ICMP
traffic can tell us something about what might actually be occurring.

See http://en.wikipedia.org/wiki/Internet_Control_Message_Protocol. 

The solution might simply be to fix the hosts that have some sort of unusual
configuration that the routing system is objecting to. The ICMP messages
might originate in ordinary hosts or possibly in some of the routers in
your system.

> >> CAC: error = "Host is down" sending UDP msg to 140.181.98.50:5064

There is an underlying ICMP message that causes this. Knowing which one it
is might be quite useful when attempting to eliminate the extra traffic.
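
For reading a capture, a small Python sketch like the following names the
ICMP type and, for "destination unreachable", the code from the first bytes
of an ICMP header. The tables follow RFC 792 and are deliberately
incomplete. The function name is hypothetical, and which of these messages
the kernel reports as "Host is down" is not something this sketch asserts.

```python
import struct

# Subset of the ICMP message types defined in RFC 792.
ICMP_TYPES = {
    0: "echo reply",
    3: "destination unreachable",
    4: "source quench",
    5: "redirect",
    8: "echo request",
    11: "time exceeded",
}

# Codes for type 3 (destination unreachable), also from RFC 792.
UNREACH_CODES = {
    0: "network unreachable",
    1: "host unreachable",
    2: "protocol unreachable",
    3: "port unreachable",
    4: "fragmentation needed and DF set",
    5: "source route failed",
}


def describe_icmp(header: bytes) -> str:
    """Return a readable name for the type/code in an ICMP header."""
    if len(header) < 4:
        raise ValueError("an ICMP header is at least 4 bytes long")
    # Layout: type (1 byte), code (1 byte), checksum (2 bytes, big-endian).
    icmp_type, code, _checksum = struct.unpack("!BBH", header[:4])
    name = ICMP_TYPES.get(icmp_type, f"type {icmp_type}")
    if icmp_type == 3:
        return f"{name}: {UNREACH_CODES.get(code, f'code {code}')}"
    return name


print(describe_icmp(bytes([3, 1, 0, 0])))
print(describe_icmp(bytes([8, 0, 0, 0])))
```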

Jeff

> -----Original Message-----
> From: Burkhard Kolb [mailto:[email protected]]
> Sent: Friday, March 09, 2007 3:07 AM
> To: Jeff Hill
> Cc: [email protected]
> Subject: Re: Darwin and EPICS_CA_AUTO_ADDR_LIST issue
> 
> One more clue:
> I removed the currently down nodes from the EPICS_CA_ADDR_LIST and all
> seems to be okay. I don't get any error messages like "host is down".
> It seems to me, there is a problem in the code with the unicasts: if
> there are nodes on the list which do not respond (in time?) some other
> nodes (which are alive) are considered to be dead???
> Burkhard
> 
> Jeff Hill wrote:
> >
> > What is the response to "ping -s 140.181.98.50"?
> >
> > PS: Behavior is known to vary between IP kernels for such unicast
> > addresses if there are two or more daemons (in this case two or more CA
> > UDP servers) listening to the same {IP address, port} tuple on the same
> > host. The typical behavior difference is that on certain IP kernels only
> > one daemon listening on UDP port 5064 will receive a UDP message with a
> > unicast destination address, but in contrast on other IP kernels all
> > daemons listening on UDP port 5064 will receive such messages. In my
> > experience UDP frames with broadcast address destinations always go to
> > all registered listeners listening on UDP port 5064 - a uniform behavior
> > across IP kernel implementations.
> >
> > PPS: One possible workaround to the above dilemma is to use directed
> > broadcast addresses (router forwarded broadcast addresses) in the
> > EPICS_CA_ADDR_LIST. This sometimes requires enabling of "broadcast
> > forwarding" features in your routers. We might also add support for
> > multicasting to future versions of CA.
> >
> > Jeff
> >
> >> -----Original Message-----
> >> From: Burkhard Kolb [mailto:[email protected]]
> >> Sent: Thursday, March 08, 2007 9:28 AM
> >> To: '[email protected]'
> >> Subject: Darwin and EPICS_CA_AUTO_ADDR_LIST issue
> >>
> >> On my MAC (OSX 10.4.8), EPICS base-3.14.9, medm 3.11:
> >>
> >> When I set EPICS_CA_AUTO_ADDR_LIST to NO and have the
> >> EPICS_CA_ADDR_LIST pointing to the list of IOCs medm does not find all
> >> PVs. Sometimes the connections work for a short time then several IOCs
> >> are not seen anymore. If I set it to YES, medm finds them always.
> >>
> >> In the xterm window I get some error messages from really not existing
> >> IOCs:
> >> CAC: error = "Host is down" sending UDP msg to 140.181.98.50:5064
> >> But the lost ones are not reported!
> >>
> >> I use the same list of IOCs on other linux boxes and have no problem
> >> there.
> >> Any idea?
> >>
> 
> --
> -------------------------------------------------------------------
> Dr. Burkhard Kolb
> GSI mbH  |   KP1   |  Planckstr. 1  |  D-64291 Darmstadt
> Email: [email protected]                |  Tel.: +49 (0)6159 / 71 2667
> -------------------------------------------------------------------

