Hi Keith,
> The application successfully completes the ca_search_and_connect calls for
> all channels, but only gets successful connections for a few hundred
> channels.
One possibility is that the Ubuntu system is running low on some
resource such as file descriptors, threads, or network buffers. If this
were the cause, you would probably also see other programs, unrelated to
EPICS, struggling during startup or circuit initiation when they run on the
same resource-limited host.
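
A minimal sketch of one way to check for file descriptor exhaustion on Linux,
assuming the /proc filesystem is available. The helper name fd_usage is
hypothetical and is not part of burt or EPICS base. With no argument it
inspects the Python process itself; to inspect burt, pass the pid of the
running burt process (reading another process's /proc entries normally
requires the same user or root).

```python
import os


def fd_usage(pid="self"):
    """Return (open_fd_count, soft_limit) for a Linux process, read from /proc."""
    # Each entry in /proc/<pid>/fd is one open descriptor (file, socket, pipe).
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    soft_limit = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Columns: name (three words), soft limit, hard limit, units.
                value = line.split()[3]
                # The limit may be the word "unlimited"; report that as None.
                soft_limit = int(value) if value.isdigit() else None
                break
    return open_fds, soft_limit


if __name__ == "__main__":
    used, limit = fd_usage()
    print(f"{used} descriptors open, soft limit {limit}")
```

If the open count for burt sits close to the soft limit while channels are
failing to connect, raising the limit (for example with "ulimit -n" in the
shell that starts burt) would be a reasonable next test.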
To debug this situation, I would set the burt connection establishment timeouts
to be very large, and then run some diagnostics at the same time that burt is
having trouble.
0) Do you see that it's always the same set of IOCs that burt has trouble
with, or does the problem move around between different IOCs? The latter
points more towards a client-side code or resource consumption issue, and
the former points more towards a server-side code or resource consumption
issue.
1) Look at the TCP circuits on the Linux systems where the burt client and
the soft IOCs are running. I would use a command like "netstat | grep 5064"
to look at circuits with the CA default port in the server-side circuit
endpoint. You will be looking to see whether too many of the TCP circuits
have their state machines stuck in a cleanup or connection-startup state.
2) Does command-line caget, running on the same host as burt or on a different
one, have trouble connecting to the same IOC that burt is having problems
with? Run this test at the same time that burt is having problems.
3) I would also have a look at the output of casr on the soft IOC's command
line. This command takes an interest level parameter, which can be used
to obtain more information about what is happening.
4) You might also try running the diagnostic dumping function in the CA
client API against the CA client context for the burt program. It can be
called from the debugger, or burt could be modified to call it after some
timeout when the channels are not connecting. You can increase the interest
level to get additional details about the state of the CA client library.
5) Also have a look at the CPU meter on the hosts where the soft IOCs are
running. If the server is starved for CPU it won't connect to the client.
6) Watch out for too many ICMP error replies competing with the UDP search
reply from the IOC. This has typically been a problem only with old IP kernels
that incorrectly replied to a UDP broadcast with an ICMP error message. To
diagnose this type of problem you will need an Ethernet monitoring
program like tcpdump.
7) If a particular IOC is known to be having problems accepting connections,
then I would attach gdb and type "thread apply all bt". The output is large,
but I can use this type of output to determine whether there is a problem in
the code. The same thing can be done with the burt client process.
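
For step 1, the netstat output can be tallied by TCP state rather than read by
eye. The following is a minimal sketch, not part of burt or EPICS base:
count_states and netstat_text are hypothetical helpers, and the "-tn" flags
(TCP only, numeric addresses) are an assumption added here so that port 5064
is printed as a number.

```python
import subprocess
from collections import Counter


def count_states(netstat_output, port=5064):
    """Tally TCP states for circuits with `port` at either endpoint."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        # The port is whatever follows the last ':' in an address.
        if str(port) in (local.rsplit(":", 1)[-1], foreign.rsplit(":", 1)[-1]):
            states[state] += 1
    return states


def netstat_text():
    """Run `netstat -tn` and return its standard output as text."""
    result = subprocess.run(
        ["netstat", "-tn"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Typical use on the client or IOC host:
#     print(count_states(netstat_text()))
```

A large count in states such as SYN_SENT, TIME_WAIT, or CLOSE_WAIT relative to
ESTABLISHED would be the pattern described in step 1. Running the tally on
both the burt host and the soft IOC host at the same moment makes the two
views easier to compare.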
> We are seeing the burtrb calls fail when invoked for a large list
> (~14,000) channels
I do connect to large numbers of channels in the CA client side regression
tests.
Jeff
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Keith Thorne
> Sent: Wednesday, October 17, 2012 10:50 AM
> To: [email protected]
> Subject: burtrb failing on local LAN, but not through gateway
>
> We use the burt extension heavily for backup and restoration of settings
> on Linux-based IOCs.
> We are seeing the burtrb calls fail when invoked for a large list
> (~14,000) channels when run on the same LAN as the computer with the IOC.
> The application successfully completes the ca_search_and_connect calls for
> all channels, but only gets successful connections for a few hundred
> channels. Interestingly, the same burtrb call works just fine when run
> from a computer on a different LAN where an EPICS gateway is used to
> provide access.
>
> We are using Ubuntu 10.04 (64-bit) with EPICS 3.14.2 on the client.
>
> We tried increasing the number of retries, but all that does is increase
> the wait time (in 1 second increments) for checking that all connections
> are made. This does not work.
>
> The Burt extension (as well as the casr extension) have not been updated for
> 3.14.x. Are there suggestions for an alternate snapshot/restore tool that
> is being updated?
>
> Thanks
> Keith Thorne
> LIGO Livingston Observatory