Folks,
One more followup.
I've now rebuilt base 3.14.10 with the patch on win32-x86. This is
the platform where I originally reported the problem. The patch does
indeed fix the slow reconnect times on win32-x86 as well.
Without the patch, time to reconnect 223 client channels after the IOC
was down for 10 minutes: 3 minutes and 45 seconds.
Time with the patch applied: less than 15 seconds.
Recall that this problem only shows up on little-endian machines when
EPICS_CA_AUTO_ADDR_LIST=NO.
Mark
-----Original Message-----
From: Mark Rivers
Sent: Wednesday, February 11, 2009 4:09 PM
To: 'Jeff Hill'; 'Andrew Johnson'; 'epics'
Subject: RE: Very slow reconnection to medm after IOC reboot
Folks,
This discussion moved off tech-talk for a while, but there is now at
least a partial solution to the problem so I want to post an update.
The main problem I was having was due to a bug in the channel access
server that was introduced in EPICS 3.14.10. The bug manifested itself
under the following conditions:
- Little-endian IOC architecture.
- EPICS_CA_AUTO_ADDR_LIST=NO, i.e. a manually configured
EPICS_CA_ADDR_LIST was being used, even if that list was the same local
subnet broadcast address that would be automatically set.
- EPICS_CAS_BEACON_ADDR_LIST not set.
This bug caused the EPICS beacon address list to be empty, i.e. the IOC
did not send any beacons when it started up. The following message was
displayed on the IOC right after iocInit.
"The CA server's beacon address list was empty after initialization?"
Because the IOC did not send beacons, EPICS clients did not detect any
beacon anomalies and took a long time to reconnect.
Here are the tests I did to demonstrate the problem, and to demonstrate
that it is gone with Jeff Hill's fix to channel access (below).
Test configuration:
- EPICS 3.14.10 IOC on Linux (linux-x86) host
- MEDM 3.1.3 built with EPICS 3.14.10 on same Linux host
- 4 medm screens open with a total of 223 channels
- Measured time for all 223 medm channels to reconnect after IOC was
shut down for 10 minutes
Test 1)
- Unpatched base, EPICS_CA_AUTO_ADDR_LIST not set. Beacons are sent.
- Time to reconnect: less than 10 seconds.
Test 2)
- Unpatched base, EPICS_CA_AUTO_ADDR_LIST=NO,
EPICS_CA_ADDR_LIST=164.54.160.255.
No beacons are sent.
- Time to reconnect: 4 minutes and 30 seconds
Test 3)
- Patched base, EPICS_CA_AUTO_ADDR_LIST=NO,
EPICS_CA_ADDR_LIST=164.54.160.255.
Beacons are sent.
- Time to reconnect: less than 10 seconds
So the following patch to EPICS base definitely makes the reconnections
much faster: 4 minutes goes down to 10 seconds!
corvette:base-3.14.10/src/ca>cvs diff -rR3-14-10 iocinf.cpp
Index: iocinf.cpp
===================================================================
RCS file: /net/phoebus/epicsmgr/cvsroot/epics/base/src/ca/iocinf.cpp,v
retrieving revision 1.23.2.5
diff -u -r1.23.2.5 iocinf.cpp
--- iocinf.cpp 22 Sep 2008 18:20:56 -0000 1.23.2.5
+++ iocinf.cpp 11 Feb 2009 22:04:19 -0000
@@ -97,7 +97,7 @@
continue;
}
- if ( ignoreNonDefaultPort && addr.sin_port != port ) {
+ if ( ignoreNonDefaultPort && ntohs ( addr.sin_port ) != port ) {
continue;
}
As Jeff explained in another message, with large numbers of disconnected
channels and an IOC that has been down for a long time, reconnects in
3.14.7 and later can take longer than they did in previous versions of
EPICS. However, the very long times I was seeing were not due to this
design change, but rather to the fact that no beacons were being sent.
Cheers,
Mark
-----Original Message-----
From: Mark Rivers
Sent: Thursday, January 22, 2009 12:43 PM
To: 'Jeff Hill'; 'Andrew Johnson'; epics
Subject: Very slow reconnection to medm after IOC reboot
Folks,
I have noticed a problem which seems new, though I can't say exactly
when it started.
The time for EPICS CA clients to reconnect when an IOC reboots is quite
long.
Here are some specifics. I am running a soft IOC on Linux or Windows
and running medm on Linux or Windows. There are 4 medm screens opened
to the IOC with about 200 PVs total. There are 2 computers under test:
a recent Redhat Linux machine and a Windows XP machine. They are on the
same subnet. caRepeater is running on both machines.
******************************
Linux IOC (3.14.10), Linux medm client (built with 3.14.8.2) on the same
machine.
Stop IOC, wait 10 seconds, restart IOC.
Time for medm to reconnect all channels: less than 10 seconds.
Stop IOC, wait 8 minutes, restart IOC.
Time for medm to reconnect first channel: 40 seconds
Time for medm to reconnect all channels: 60 seconds
******************************
Windows IOC (3.14.10), same Linux medm client (built with 3.14.8.2)
Stop IOC, wait 10 seconds, restart IOC.
Time for medm to reconnect all channels: less than 10 seconds.
Stop IOC, wait 15 minutes, restart IOC.
After 3 minutes no channels had reconnected. Open a new medm screen to
the same IOC. It immediately connects all PVs on the new screen, but
the PVs on the previous screens are still not connected.
Time for medm to reconnect all channels: 4 minutes (finally, after 4
minutes, all PVs on the previous screens reconnected).
******************************
Windows IOC (3.14.10), Windows medm client (built with 3.14.9) on same
PC.
Stop IOC, wait 10 seconds, restart IOC.
Time for medm to reconnect all channels: less than 10 seconds.
Stop IOC, wait 15 minutes, restart IOC.
Time for medm to reconnect first channel: 60 seconds
Time for medm to reconnect all channels: 4 minutes and 25 seconds.
******************************
This definitely seems like a new problem to me; it never used to take 4
minutes to reconnect. Unfortunately I have recently upgraded both medm
(from a much older version, to the newer one built with 3.14.8.2) and
the IOCs (from 3.14.8.2 to 3.14.10).
On the Windows PC I have the following EPICS environment variables
defined:
EPICS_CA_AUTO_ADDR_LIST=NO
EPICS_HOST_ARCH=win32-x86
EPICS_CA_ADDR_LIST=localhost 164.54.160.255
EPICS_DISPLAY_PATH=C:\EPICS\adls\
EPICS_CA_MAX_ARRAY_BYTES=10000000
On the Linux machine I have the following EPICS environment variables
defined:
EPICS_HOST_ARCH=linux-x86
EPICS_DISPLAY_PATH=/home/epics/adl/all
EPICS_CA_MAX_ARRAY_BYTES=10000000
These delays make IOC development very painful, particularly on Windows.
Each time I restart the IOC, if I wait more than 30 seconds or so, it
then takes 4 minutes for the medm screens to refresh. It becomes quicker
to close them all and re-open them, though this is very tedious.
Have others seen similar problems? Any idea when they started? Do they
go away if clients are built with 3.14.10? I have not tried that (but I
have used 3.14.9).
Thanks,
Mark