Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: burtrb failing on local LAN, but not through gateway
From: "Hill, Jeff" <johill@lanl.gov>
To: Keith Thorne <kthorne@ligo-la.caltech.edu>, "tech-talk@aps.anl.gov" <tech-talk@aps.anl.gov>
Date: Fri, 19 Oct 2012 00:34:22 +0000
> if you wait long enough, all the connections are made.
> Rate is about 1,500-2,000 channels per second

If UDP frames are lost, the client side will back off
on the search rate to avoid overwhelming the IOC or the
LAN.

Jeff
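
A minimal sketch of the kind of back-off described above, for illustration only. It is not the CA client library's actual algorithm: the function name, the doubling factor, the reset-on-reply rule, and every numeric default are assumptions made for this example.

```python
def search_intervals(replies, initial=0.03, factor=2.0, ceiling=5.0):
    """Return the delay (in seconds) used before each search attempt.

    replies: iterable of booleans, True when the previous search datagram
    was answered, False when it is presumed lost.
    All defaults are illustrative assumptions, not values taken from CA.
    """
    delay = initial
    delays = []
    for answered in replies:
        delays.append(delay)
        if answered:
            # Replies are arriving, so return to the fast search rate.
            delay = initial
        else:
            # Presumed loss: slow down, but never beyond the ceiling.
            delay = min(delay * factor, ceiling)
    return delays
```

With a pattern of three losses, one reply, and one more loss, the delays grow to the ceiling and then drop back, which is the general behaviour that keeps a lossy LAN or a busy IOC from being flooded with repeated searches.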

> -----Original Message-----
> From: Keith Thorne [mailto:kthorne@ligo-la.caltech.edu]
> Sent: Thursday, October 18, 2012 1:56 PM
> To: tech-talk@aps.anl.gov
> Cc: Hill, Jeff
> Subject: Re: burtrb failing on local LAN, but not through gateway
> 
> Dear Jeff
>       Thanks for all the detailed debugging suggestions. So far I have
> found that if you wait long enough, all the connections are made.
> Rate is about 1,500-2,000 channels per second.  What is not working is
> burt's connection callback scheme.
> It should be decrementing a counter as each PV is connected.
> However, with 15,000 connections attempted, the count proceeds as
> follows at the 1-second stride (example numbers only):
> 
> 15000 pending
> 14300 pending
> 14300 pending
> 14300 pending
> 14300 pending
> 14300 pending
> 14300 pending
> 14300 pending
> ....
> 
> so it is not decrementing correctly.  I'll have to add in some debugging
> to figure out how to fix it.
> After that, I will figure out what the bottleneck is.
> 
> 				Keith Thorne
> 					K
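
The pending-count bookkeeping described in the message above can be modeled as follows. This is a hypothetical sketch, not burt's code: the class and method names are invented, and the callback is assumed to deliver a channel name plus an up/down flag. Tracking the set of channels still waiting, rather than a bare integer, means a repeated "connected" callback cannot decrement twice and a disconnect puts the channel back in the pending set.

```python
class PendingConnections:
    """Track which channels are still waiting for a connection.

    Hypothetical model of a connection-callback counter; the names here
    are illustrative and do not come from burt or the CA library.
    """

    def __init__(self, names):
        self._all = set(names)      # every channel that was requested
        self._waiting = set(names)  # channels not currently connected

    def on_connection(self, name, connected):
        """Call from the connection callback for each up/down event."""
        if name not in self._all:
            return  # ignore events for channels that were never requested
        if connected:
            self._waiting.discard(name)  # idempotent: repeats are harmless
        else:
            self._waiting.add(name)      # a dropped circuit is pending again

    @property
    def pending(self):
        return len(self._waiting)
```

Printing `pending` once per second from such an object would show the count falling only as connect events actually arrive, which helps separate a bookkeeping bug from a real connection stall.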
> On Oct 17, 2012, at 1:32 PM, Hill, Jeff wrote:
> 
> > Hi Keith,
> >
> >> The application successfully completes the ca_search_and_connect calls
> >> for all channels, but only gets successful connections for a few hundred
> >> channels.
> >
> > One possibility might be that the Ubuntu system is running low on some
> > resource such as file descriptors, threads, and/or network buffers. If
> > this were the cause, you would probably also see other programs,
> > unrelated to EPICS, struggling during startup and/or circuit initiation
> > when they run on the same resource-limited host.
> >
> > To debug this situation, I would set the burt connection establishment
> > timeouts to be very large, and then run some diagnostics at the same
> > time burt is having troubles.
> >
> > 0) Do you see that it's always the same set of IOCs that burt has
> > troubles with, or does the situation move around between different IOCs?
> > The latter points more towards a client side code or resource consumption
> > issue, and the former points more towards a server side code or resource
> > consumption issue.
> >
> > 1) Look at TCP circuits on the Linux systems where the burt client, and
> > soft IOCs, are running. I would use a command like "netstat | grep 5064"
> > to look at circuits with the CA default port in the server side circuit
> > endpoint. You will be looking to see if too many of the TCP circuits
> > have their state machines remaining in a cleanup and/or connection
> > startup state.
> >
> > 2) Does command line caget, running on the same or a different host from
> > burt, have troubles when connecting to the same IOC that burt is having
> > problems connecting with? Run this test at the same time that burt is
> > having problems.
> > 3) I would also have a look at the casr output on the soft IOC's command line.
> > There is an interest level parameter with this command which can be used
> > to obtain more information about what is happening.
> >
> > 4) You might also try running the diagnostic dumping function in the ca
> > client API against the ca client context for the burt program. This can
> > be called from the debugger, or burt could be modified to call this
> > function, after some timeout, when the channels are not connecting. You
> > can increase the interest level to get additional details about the state
> > of the ca client library.
> >
> > 5) Also have a look at the CPU meter on the hosts where the soft IOCs
> > are running. If the server is starved for CPU it won't connect to the
> > client.
> >
> > 6) Watch out for too many ICMP error replies competing with the UDP
> > search reply from the IOC. This has typically been a problem only with
> > old IP kernels that incorrectly replied to a UDP broadcast with an ICMP
> > error message. To diagnose this type of problem you will need an Ethernet
> > monitoring program like tcpdump.
> >
> > 7) If a particular IOC is known to be experiencing problems with being
> > connected to, then I would attach gdb and type "thread apply all bt".
> > The output is large, but I can use this type of output to determine if
> > there is a problem in the code. The same thing can be done with the burt
> > client process.
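
For step 1 above, counting circuits by TCP state is easier than reading raw netstat output. The sketch below is only an illustration: it assumes the six-column layout printed by Linux `netstat -tn` (protocol, two queue sizes, local address, foreign address, state), and the function name and sample addresses are made up.

```python
from collections import Counter


def tally_ca_states(netstat_text, port=5064):
    """Count TCP states for circuits with `port` at either endpoint.

    Assumes `netstat -tn` style lines:
    proto recv-q send-q local-address foreign-address state
    """
    suffix = ":%d" % port
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP connection line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if local.endswith(suffix) or foreign.endswith(suffix):
            states[state] += 1
    return states


# Invented sample input, for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:40112          10.0.0.9:5064           ESTABLISHED
tcp        0      0 10.0.0.5:40113          10.0.0.9:5064           SYN_SENT
tcp        0      0 10.0.0.5:40114          10.0.0.9:5064           TIME_WAIT
tcp        0      0 10.0.0.5:22             10.0.0.7:51515          ESTABLISHED
"""

if __name__ == "__main__":
    for state, count in sorted(tally_ca_states(SAMPLE).items()):
        print(state, count)
```

A large share of circuits sitting in states such as SYN_SENT or TIME_WAIT, compared with ESTABLISHED, would be the pattern that step 1 suggests looking for.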
> >
> >> We are seeing the burtrb calls fail when invoked for a large list
> >> (~14,000) channels
> >
> > I do connect to large numbers of channels in the CA client side
> > regression tests.
> >
> > Jeff
> >
> >> -----Original Message-----
> >> From: tech-talk-bounces@aps.anl.gov [mailto:tech-talk-bounces@aps.anl.gov]
> >> On Behalf Of Keith Thorne
> >> Sent: Wednesday, October 17, 2012 10:50 AM
> >> To: tech-talk@aps.anl.gov
> >> Subject: burtrb failing on local LAN, but not through gateway
> >>
> >> We use the burt extension heavily for backup and restoration of
> >> settings on Linux-based IOCs.
> >> We are seeing the burtrb calls fail when invoked for a large list
> >> (~14,000) channels when run on the same LAN as the computer with the
> >> IOC.
> >> The application successfully completes the ca_search_and_connect calls
> >> for all channels, but only gets successful connections for a few hundred
> >> channels.  Interestingly, the same burtrb call works just fine when run
> >> from a computer on a different LAN where an EPICS gateway is used to
> >> provide access.
> >>
> >> We are using Ubuntu 10.04 (64-bit) with EPICS 3.14.2 on the client.
> >>
> >> We tried increasing the number of retries, but all that does is
> >> increase the wait time (in 1 second increments) for checking that all
> >> connections are made.  This does not work.
> >>
> >> The burt extension (as well as the casr extension) has not been updated
> >> for 3.14.x.  Are there suggestions for an alternate snapshot/restore tool
> >> that is being updated?
> >>
> >> Thanks
> >> 	Keith Thorne
> >> 	LIGO Livingston Observatory



