Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: burtrb failing on local LAN, but not through gateway
From: Keith Thorne <kthorne@ligo-la.caltech.edu>
To: tech-talk@aps.anl.gov
Date: Thu, 18 Oct 2012 14:55:52 -0500
Dear Jeff
      Thanks for all the detailed debugging suggestions. So far I have found that if you wait long enough, all the connections are made.
The rate is about 1,500-2,000 channels per second.  What is not working is burt's connection callback scheme.
It should be decrementing a counter as each PV is connected.
However, with 15,000 connections attempted, the count reported at each 1-second stride stalls like this (example numbers only):

15000 pending
14300 pending
14300 pending
14300 pending
14300 pending
14300 pending
14300 pending
14300 pending
....

so it is not decrementing correctly.  I'll have to add in some debugging to figure out how to fix it.
After that, I will figure out what the bottleneck is.
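
A minimal sketch of the bookkeeping described above, for comparison while debugging. This is not burt's actual code: the class name, the method names, and the idea of tracking channel names in a set are all assumptions made for illustration. It shows one way to keep a pending count that cannot be skewed by repeated connect events, by disconnect events, or by callbacks arriving on different threads.

```python
import threading


class PendingTracker:
    """Hypothetical helper: tracks channels still waiting for a first connection."""

    def __init__(self, names):
        self._lock = threading.Lock()
        self._pending = set(names)

    def on_connection(self, name, connected):
        # Assumed to be called once per connection state change, possibly
        # from a callback thread. Only an "up" event removes a channel, and
        # discard() makes a repeated "up" event for the same name harmless.
        with self._lock:
            if connected:
                self._pending.discard(name)

    def pending(self):
        # Number of channels that have never reported "connected".
        with self._lock:
            return len(self._pending)


tracker = PendingTracker(["pv:a", "pv:b", "pv:c"])
tracker.on_connection("pv:a", True)
tracker.on_connection("pv:a", True)   # duplicate event, ignored
tracker.on_connection("pv:b", False)  # disconnect event, ignored
print(tracker.pending(), "pending")
```

Keeping the set of names rather than a bare integer also makes it possible to print which channels are still pending when the count stops moving.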

				Keith Thorne
On Oct 17, 2012, at 1:32 PM, Hill, Jeff wrote:

> Hi Keith,
> 
>> The application successfully completes the ca_search_and_connect calls for
>> all channels, but only gets successful connections for a few hundred
>> channels.
> 
> One possibility might be that the Ubuntu system is running low on some 
> resource such as file descriptors, threads, and/or network buffers. If this
> was the cause you probably would see other programs, unrelated to EPICS, 
> struggling during startup and/or circuit initiation when they are also running
> on the resource-limited host.
> 
> To debug this situation, I would set the burt connection establishment timeouts 
> to be very large, and then run some diagnostics, at the same time  burt is 
> having troubles.
> 
> 0) Do you see that it's always the same set of IOCs that burt has troubles
> with, or does the situation move around between different IOCs? The latter 
> points more towards a client side code or resource consumption issue, and 
> the former points more towards a server side code or resource consumption 
> issue. 
> 
> 1) look at tcp circuits on the Linux systems where the burt client, and 
> soft IOCs, are running. I would use a command like "netstat | grep 5064" 
> to look at circuits with the CA default port in the server side circuit 
> endpoint. You will be looking to see if too many of the TCP circuits have
> their state machines remaining in a cleanup and/or connection startup 
> state.
> 
> 2) Does command line caget, running on the same or different host from burt, 
> have troubles when connecting to the same IOC that burt is having problems 
> connecting with? Run this test at the same time that burt is having problems.
> 
> 3) I would also have a look at casr on the soft IOC's command line.
> There is an interest level parameter with this command which can be used 
> to obtain more information about what is happening.
> 
> 4) You might also try running the diagnostic dumping function in the ca
> client API against the ca client context for the burt program. This can 
> be called from the debugger or burt could be modified to call this function, 
> after some timeout, when the channels are not connecting. You can increase the
> interest level to get additional details about the state of the ca client library.
> 
> 5) Also have a look at the CPU meter on the hosts where the soft IOCs are
> running. If the server is starved for CPU it won't connect to the client.
> 
> 6) Watch out for too many ICMP error replies competing with the UDP search 
> reply from the IOC. This has typically been a problem only with old IP kernels
> that incorrectly replied to a UDP broadcast with an ICMP error message. To
> diagnose this type of problem you will need an Ethernet monitoring
> program like tcpdump.
> 
> 7) If a particular IOC is known to be having problems accepting 
> connections, then I would attach gdb and type "thread apply all bt". The 
> output is large, but I can use this type of output to determine if there 
> is a problem in the code. The same thing can be done with the burt client process.
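
Step 1 above (looking at TCP circuit states on port 5064) can be sketched as a small tally script. This is an illustration only: the function name is hypothetical, and it assumes the six-column layout that Linux `netstat -tn` prints (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The sample rows are made up.

```python
from collections import Counter


def tally_tcp_states(netstat_text, port=5064):
    """Count TCP states for rows whose local or foreign address ends in :port."""
    counts = Counter()
    suffix = ":%d" % port
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and non-TCP rows; assumed columns are
        # Proto Recv-Q Send-Q Local Foreign State.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if local.endswith(suffix) or foreign.endswith(suffix):
            counts[state] += 1
    return counts


# Made-up sample in the assumed `netstat -tn` layout.
sample = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:43210          10.0.0.9:5064           ESTABLISHED
tcp        0      1 10.0.0.5:43211          10.0.0.9:5064           SYN_SENT
tcp        0      0 10.0.0.5:43212          10.0.0.9:5064           TIME_WAIT
tcp        0      0 10.0.0.5:22             10.0.0.7:50100          ESTABLISHED
"""
for state, n in sorted(tally_tcp_states(sample).items()):
    print(state, n)
```

In practice the text would come from capturing the real `netstat -tn` output on the client or soft IOC host while burt is stalled; a large share of circuits outside ESTABLISHED is the pattern step 1 asks about.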
> 
>> We are seeing the burtrb calls fail when invoked for a large list
>> (~14,000) channels
> 
> I do connect to large numbers of channels in the CA client side regression 
> tests.
> 
> Jeff
> 
>> -----Original Message-----
>> From: tech-talk-bounces@aps.anl.gov [mailto:tech-talk-bounces@aps.anl.gov]
>> On Behalf Of Keith Thorne
>> Sent: Wednesday, October 17, 2012 10:50 AM
>> To: tech-talk@aps.anl.gov
>> Subject: burtrb failing on local LAN, but not through gateway
>> 
>> We use the burt extension heavily for backup and restoration of settings
>> on Linux-based IOCs.
>> We are seeing the burtrb calls fail when invoked for a large list
>> (~14,000) channels when run on the same LAN as the computer with the IOC.
>> The application successfully completes the ca_search_and_connect calls for
>> all channels, but only gets successful connections for a few hundred
>> channels.  Interestingly, the same burtrb call works just fine when run
>> from a computer on a different LAN where an EPICS gateway is used to
>> provide access.
>> 
>> We are using Ubuntu 10.04 (64-bit) with EPICS 3.14.2 on the client.
>> 
>> We tried increasing the number of retries, but all that does is increase
>> the wait time (in 1 second increments) for checking that all connections
>> are made.  This does not work.
>> 
>> The burt extension (as well as the casr extension) has not been updated for
>> 3.14.x.  Are there suggestions for an alternate snapshot/restore tool that
>> is being updated?
>> 
>> Thanks
>> 	Keith Thorne
>> 	LIGO Livingston Observatory


