On Mon, May 14, 2012 at 03:29:41PM +0000, Hill, Jeff wrote:
> Hi,
>
> > 2. Call ca_pend_io() to make sure the request is sent on its way.
>
> The ca_pend_io call does also perform an implicit flush, but it is not typically
> used only for that purpose if a flush is all that is needed. It also turns out
> that the UDP-based search messages are sent periodically, and CA internally
> takes care of flushing them out (when you periodically call ca_poll).
OK. I will do ca_flush_io() at that point instead.
> > At present, the only client platform that I have that has these timeouts
> > is Debian 6.0 Linux (Squeeze). The same hardware running Debian 5.0
> > did not have these problems. The clients are using EPICS Base 3.14.10
> > and the IOC is using 3.14.12.1.
>
> Are both installations the same word size (i.e. both 32-bit or both 64-bit)? Is gcc
> at a significantly different version on the two OSes? There might also be
> something changing in the NIC driver between Debian versions.
Yes, they are all running 32-bit operating systems. Here is how they
are configured:
Distribution           Kernel   GCC     Ethernet card   Kernel driver
Debian 6.0 (Squeeze)   2.6.32   4.4.5   nVidia CK804    forcedeth
Debian 5.0 (Lenny)     2.6.26   4.3.2   VIA VT6102      via-rhine
The two Debian 6.0 machines are the user control workstations for two
different APS beamlines. The Debian 5.0 machine was set up specifically
for this test. One thing that just occurred to me is that I do not know
whether the current beamline control computers are old beamline control
computers upgraded to Debian 6.0 or whether they are brand new computers.
I am not involved in administering those computers, so I will have to ask
the people who do that, since that may be the important difference here.
> > I am assuming that what I need to do here is to monitor the network
> > traffic between the Debian 6.0 machine and the IOC and compare it
> > to the traffic between a Debian 5.0 machine and the same IOC.
>
> Do you see these same issues when you change EPICS_CA_AUTO_ADDR_LIST to
> NO and then manually configure the server's address(es) in
> EPICS_CA_ADDR_LIST? (This would fault-isolate an issue with broadcasting.)
> It might be interesting to monitor ICMP (IP error diagnostic) traffic
> on your network with Wireshark or some other Ethernet sniffer. You
> could look at all ICMP messages and also only the ICMP messages that have
> the CA client's host for a destination address. I have seen problems here
> getting a client to connect reliably when there were too many hosts
> on the LAN with misconfigured subnet masks. There were so many ICMP
> error responses to each search broadcast that the finite-length UDP
> input queue was saturating.
I do intend to pursue this using Wireshark. In fact, my original reason
for asking a question was to find out whether or not there are other tools
(perhaps EPICS-specific ones?) besides Wireshark. One possible glitch is
that the beamline computers are supposed to use a netmask of 255.255.255.128.
If some computers are mistakenly configured with 255.255.255.0, then that
might produce the ICMP errors you are talking about.
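
As a rough illustration of that netmask concern, here is a minimal Python
sketch using only the standard ipaddress module. It is not part of any EPICS
tooling. The address 192.0.2.10 is a hypothetical placeholder taken from the
documentation range, not a real beamline host, and the helper names
broadcast_for() and masks_agree() are my own. It only shows that the two
masks lead the same host to compute different broadcast addresses. It does
not by itself show that ICMP errors result.

```python
import ipaddress


def broadcast_for(host: str, netmask: str) -> ipaddress.IPv4Address:
    """Return the broadcast address a host derives from its address and mask."""
    iface = ipaddress.IPv4Interface(f"{host}/{netmask}")
    return iface.network.broadcast_address


def masks_agree(host: str, expected_mask: str, actual_mask: str) -> bool:
    """True if both masks give the same broadcast address for this host."""
    return broadcast_for(host, expected_mask) == broadcast_for(host, actual_mask)


if __name__ == "__main__":
    host = "192.0.2.10"  # hypothetical address, for illustration only
    for mask in ("255.255.255.128", "255.255.255.0"):
        print(f"{host} with netmask {mask}: broadcast {broadcast_for(host, mask)}")
    print("masks agree:", masks_agree(host, "255.255.255.128", "255.255.255.0"))
```

For a host in the lower half of the /24 range the two masks disagree about
the broadcast address. For a host in the upper half they happen to agree, so
a check like this would only flag some of the misconfigured machines.
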
The odd thing is that currently we cannot reproduce the timeout problems.
If we run tests now, everything seems to run just fine. That makes it
rather hard to diagnose the problem.
However, the APS is currently in a shutdown period between user runs.
Given that we had timeout problems when the APS was running, but do not
have timeout problems during the current APS shutdown, my current wild
guess is that the network traffic profile is different on the local subnet
during an APS run compared to what it is during a shutdown. That is why
I am investigating network monitoring tools.
> I assume that other types of network traffic work (i.e. ping, ssh, telnet,
> http) without issues on this network?
>
> Jeff
As far as I know, all other network protocols seem to be working fine.
Thanks.
Bill Lavender
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]]
> > On Behalf Of Bill Lavender
> > Sent: Friday, May 11, 2012 2:21 PM
> > To: [email protected]
> > Subject: Channel Access monitoring tools
> >
> > One of the installations I am responsible for is having trouble with
> > unexplained PV connection failures. What I have is a program that needs
> > to do things in a serialized fashion. In other words, for each of
> > several PVs, it needs to start a connection to a PV, wait for the
> > connection to complete, and then immediately use that PV.
> >
> > At present, I am doing something like this:
> >
> > 1. Invoke ca_create_channel() with a connection state change handler.
> > The connection state change handler sets a flag to indicate that
> > the connection has completed.
> >
> > 2. Call ca_pend_io() to make sure the request is sent on its way.
> >
> > 3. I then wait inside a loop periodically calling ca_poll() until
> > my connection flag has been set.
> >
> > 4. If the loop has been looping for too long, I declare a timeout
> > and call ca_clear_channel() to get rid of the existing unconnected
> > PV and then call ca_pend_io() to send that request on the way.
> >
> > My code did not originally have this ca_clear_channel() call,
> > but I added it to see if it would help and in the name of
> > preventing memory leaks. It didn't help.
> >
> > 5. I then go back to step one to create a new channel.
> >
> > 6. If I execute the outer loop from step 1 to step 5 too many times
> > then I give up and tell the user that I have timed out.
> >
> > Increasing the timeouts has not helped. For debugging, I have tried
> > timeouts as long as 10 seconds and have not seen a change in the frequency
> > of connection timeouts.
> >
> > At present, the only client platform that I have that has these timeouts
> > is Debian 6.0 Linux (Squeeze). The same hardware running Debian 5.0
> > did not have these problems. The clients are using EPICS Base 3.14.10
> > and the IOC is using 3.14.12.1.
> >
> > I am assuming that what I need to do here is to monitor the network
> > traffic between the Debian 6.0 machine and the IOC and compare it
> > to the traffic between a Debian 5.0 machine and the same IOC. I see
> > that there is a Channel Access plugin for Wireshark that I hope will
> > be helpful. Are there other things that I should be trying?
> >
> > The Channel Access code is wrapped in some code of my own, so it will
> > not look quite the same as raw Channel Access code, but if you want
> > to look at it anyway, look at the function mx_epics_pv_connect()
> > in this file
> >
> > http://svn.csrri.iit.edu/mx/trunk/modules/epics/mx_epics.c
> >
> > Thanks.
> >
> > Bill Lavender
> > [email protected]
>