Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: burtrb failing on local LAN, but not through gateway
From: "Hill, Jeff" <johill@lanl.gov>
To: Keith Thorne <kthorne@ligo-la.caltech.edu>, "tech-talk@aps.anl.gov" <tech-talk@aps.anl.gov>
Date: Wed, 17 Oct 2012 18:32:13 +0000

Hi Keith,

> The application successfully completes the ca_search_and_connect calls for
> all channels, but only gets successful connections for a few hundred
> channels.

One possibility might be that the Ubuntu system is running low on some
resource such as file descriptors, threads, and/or network buffers. If this
were the cause, you would probably also see other programs, unrelated to
EPICS, struggling during startup and/or circuit initiation when they run on
the same resource-limited host.
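
One way to check the file descriptor part of this is sketched below in
Python. It is only a sketch, assuming a Linux host with /proc mounted; the
function names max_open_files and fd_usage are made up for illustration and
are not part of EPICS or burt.

```python
import os

def max_open_files(limits_text):
    """Extract the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None if the limit is 'unlimited' or the line is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max" "open" "files" <soft> <hard> <units>
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    return None

def fd_usage(pid="self"):
    """Return (open descriptor count, soft limit) for a Linux process.

    Reading another user's process normally requires root.
    """
    with open(f"/proc/{pid}/limits") as f:
        limit = max_open_files(f.read())
    return len(os.listdir(f"/proc/{pid}/fd")), limit
```

Calling fd_usage() with the pid of the running burt process, while it is
stuck, shows whether the descriptor count is close to the soft limit.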

To debug this situation, I would set the burt connection establishment timeouts
to be very large, and then run some diagnostics while burt is having
trouble.

0) Do you see that it's always the same set of IOCs that burt has trouble
with, or does the problem move around between different IOCs? The latter
points more towards a client side code or resource consumption issue, and
the former points more towards a server side code or resource consumption
issue.

1) Look at the TCP circuits on the Linux systems where the burt client and
the soft IOCs are running. I would use a command like "netstat | grep 5064"
to look at circuits with the CA default port in the server side circuit
endpoint. You will be looking to see if too many of the TCP circuits have
their state machines remaining in a cleanup and/or connection startup
state.

2) Does command line caget, running on the same host as burt or on a
different one, have trouble connecting to the same IOC that burt is having
problems connecting with? Run this test at the same time that burt is
having problems.

3) I would also have a look at the output of casr on the soft IOC's command
line. There is an interest level parameter with this command which can be
used to obtain more information about what is happening.

4) You might also try running the diagnostic dumping function in the ca
client API against the ca client context for the burt program. This can 
be called from the debugger or burt could be modified to call this function, 
after some timeout, when the channels are not connecting. You can increase the
interest level to get additional details about the state of the ca client library.

5) Also have a look at the CPU meter on the hosts where the soft IOCs are
running. If the server is starved for CPU it won't connect to the client.

6) Watch out for too many ICMP error replies competing with the UDP search
reply from the IOC. This has typically been a problem only with old IP kernels
that incorrectly replied to a UDP broadcast with an ICMP error message. To
diagnose this type of problem you will need an Ethernet monitoring
program like tcpdump.

7) If a particular IOC is known to be having problems accepting
connections, then I would attach gdb and type "thread apply all bt". The
output is large, but I can use this type of output to determine if there
is a problem in the code. The same thing can be done with the burt client
process.
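
The circuit check in step 1 can be made a little more systematic by
tallying the TCP states instead of reading the netstat output by eye. The
Python sketch below assumes the usual Linux net-tools column layout
(Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and the
"-tn" options; the function names are made up for illustration.

```python
from collections import Counter
import subprocess

def tally_ca_circuits(netstat_text, port=5064):
    """Count TCP states for circuits with `port` at either endpoint."""
    suffix = f":{port}"
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP connection line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        # Match the port exactly, so that e.g. :50640 is not counted.
        if local.endswith(suffix) or remote.endswith(suffix):
            states[state] += 1
    return states

def current_ca_circuits(port=5064):
    """Run netstat on this host and tally the CA circuit states."""
    out = subprocess.run(["netstat", "-tn"], capture_output=True,
                         text=True, check=True).stdout
    return tally_ca_circuits(out, port)
```

A large count in states such as SYN_SENT, SYN_RECV, or TIME_WAIT compared
with ESTABLISHED, while burt is stuck, would be the pattern to look for.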

> We are seeing the burtrb calls fail when invoked for a large list
> (~14,000) channels

I do connect to large numbers of channels in the CA client side regression 
tests.

Jeff

> -----Original Message-----
> From: tech-talk-bounces@aps.anl.gov [mailto:tech-talk-bounces@aps.anl.gov]
> On Behalf Of Keith Thorne
> Sent: Wednesday, October 17, 2012 10:50 AM
> To: tech-talk@aps.anl.gov
> Subject: burtrb failing on local LAN, but not through gateway
> 
> We use the burt extension heavily for backup and restoration of settings
> on Linux-based IOCs.
> We are seeing the burtrb calls fail when invoked for a large list
> (~14,000) channels when run on the same LAN as the computer with the IOC.
> The application successfully completes the ca_search_and_connect calls for
> all channels, but only gets successful connections for a few hundred
> channels.  Interestingly, the same burtrb call works just fine when run
> from a computer on a different LAN where an EPICS gateway is used to
> provide access.
> 
> We are using Ubuntu 10.04 (64-bit) with EPICS 3.14.2 on the client.
> 
> We tried increasing the number of retries, but all that does is increase
> the wait time (in 1 second increments) for checking that all connections
> are made.  This does not work.
> 
> The Burt extension (as well as the casr extension) has not been updated for
> 3.14.x.  Are there suggestions for an alternate snapshot/restore tool that
> is being updated?
> 
> Thanks
> 	Keith Thorne
> 	LIGO Livingston Observatory

