EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: Very slow reconnection to medm after IOC reboot
From: "Mark Rivers" <[email protected]>
To: "Jeff Hill" <[email protected]>, "Andrew Johnson" <[email protected]>, "epics" <[email protected]>
Date: Fri, 13 Feb 2009 13:31:05 -0600
Folks,

One more followup.

I've now rebuilt base 3.14.10 on win32-x86 with the patch applied.  This
is the platform where I originally reported the problem.  The patch
does indeed also fix the slow reconnect times on win32-x86.

Without the patch, time to reconnect 223 client channels after the IOC
was down for 10 minutes: 3 minutes and 45 seconds.

Time after patch applied: less than 15 seconds.

Recall that this problem only shows up on little-endian machines when
EPICS_CA_AUTO_ADDR_LIST=NO.

Mark


-----Original Message-----
From: Mark Rivers 
Sent: Wednesday, February 11, 2009 4:09 PM
To: 'Jeff Hill'; 'Andrew Johnson'; 'epics'
Subject: RE: Very slow reconnection to medm after IOC reboot

Folks,

This discussion moved off tech-talk for a while, but there is now at
least a partial solution to the problem so I want to post an update.

The main problem I was having was due to a bug in the channel access
server that was introduced in EPICS 3.14.10.  The bug manifested itself
under the following conditions:

- Little-endian IOC architecture.
- EPICS_CA_AUTO_ADDR_LIST=NO, i.e. a manually configured
EPICS_CA_ADDR_LIST was being used, even if that list was the same local
subnet broadcast address that would be automatically set.
- EPICS_CAS_BEACON_ADDR_LIST not set.

This bug caused the EPICS beacon address list to be empty, i.e. the IOC
did not send any beacons when it started up.  The following message was
displayed on the IOC right after iocInit.

"The CA server's beacon address list was empty after initialization?"

Because the IOC did not send beacons, EPICS clients did not detect any
beacon anomalies and took a long time to reconnect.

Here are the tests I did to demonstrate the problem, and to demonstrate
that the problem is gone with Jeff Hill's fix to channel access (below).

Test configuration: 
- EPICS 3.14.10 IOC on Linux (linux-x86) host
- MEDM 3.1.3 built with EPICS 3.14.10 on same Linux host
- 4 medm screens open with a total of 223 channels
- Measured time for all 223 medm channels to reconnect after IOC was
shut down for 10 minutes

Test 1) 
- Unpatched base, EPICS_CA_AUTO_ADDR_LIST not set.  Beacons are sent.
- Time to reconnect: less than 10 seconds.

Test 2)
- Unpatched base, EPICS_CA_AUTO_ADDR_LIST=NO,
EPICS_CA_ADDR_LIST=164.54.160.255.
  No beacons are sent.
- Time to reconnect: 4 minutes and 30 seconds

Test 3)
- Patched base, EPICS_CA_AUTO_ADDR_LIST=NO,
EPICS_CA_ADDR_LIST=164.54.160.255.
  Beacons are sent.
- Time to reconnect: less than 10 seconds

So the following patch to EPICS base definitely makes the reconnections
much faster: 4 minutes goes down to 10 seconds!

corvette:base-3.14.10/src/ca>cvs diff -rR3-14-10 iocinf.cpp
Index: iocinf.cpp
===================================================================
RCS file: /net/phoebus/epicsmgr/cvsroot/epics/base/src/ca/iocinf.cpp,v
retrieving revision 1.23.2.5
diff -u -r1.23.2.5 iocinf.cpp
--- iocinf.cpp  22 Sep 2008 18:20:56 -0000      1.23.2.5
+++ iocinf.cpp  11 Feb 2009 22:04:19 -0000
@@ -97,7 +97,7 @@
             continue;
         }
 
-        if ( ignoreNonDefaultPort && addr.sin_port != port ) {
+        if ( ignoreNonDefaultPort && ntohs ( addr.sin_port ) != port ) {
             continue;
         }
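
For readers wondering why this one-line change matters on little-endian
hosts only: sin_port is stored in network (big-endian) byte order, so
comparing it directly with a host-order port number fails on
little-endian machines and every entry gets filtered out.  The following
is a minimal sketch of that effect, not the actual EPICS code.  The
names countKept, makeList and applyFix are hypothetical, the port value
5065 in the usage note is only an illustrative number, and POSIX socket
headers are assumed.

```cpp
#include <arpa/inet.h>   // htons, ntohs (POSIX)
#include <netinet/in.h>  // sockaddr_in
#include <sys/socket.h>  // AF_INET
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the filtering loop touched by the patch.
// Returns how many addresses survive the port filter.
std::size_t countKept(const std::vector<sockaddr_in>& addrs,
                      unsigned short port, bool applyFix)
{
    std::size_t kept = 0;
    for (const sockaddr_in& addr : addrs) {
        // sin_port is in network byte order; port is in host byte order.
        // Without ntohs() the two only match on big-endian hosts.
        const unsigned short addrPort =
            applyFix ? ntohs(addr.sin_port) : addr.sin_port;
        if (addrPort != port) {
            continue;  // entry is dropped from the list
        }
        ++kept;
    }
    return kept;
}

// Hypothetical helper: builds n addresses that all use the given port,
// stored in network byte order as the socket API expects.
std::vector<sockaddr_in> makeList(unsigned short port, std::size_t n)
{
    std::vector<sockaddr_in> addrs(n);  // value-initialized, i.e. zeroed
    for (sockaddr_in& a : addrs) {
        a.sin_family = AF_INET;
        a.sin_port = htons(port);
    }
    return addrs;
}
```

With a list built by makeList(5065, 3), countKept(list, 5065, true)
keeps all three entries on any host.  countKept(list, 5065, false)
keeps all three on a big-endian host but none on a little-endian one,
which mirrors the empty beacon address list described above.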

As Jeff explained in another message, with large numbers of disconnected
channels and an IOC that has been down for a long time, it is possible
that reconnects in 3.14.7 and later can take longer than they did in
previous versions of EPICS.  But the very long times I was seeing were
not due to this design change.  They were due to the fact that no
beacons were being sent.

Cheers,
Mark


-----Original Message-----
From: Mark Rivers 
Sent: Thursday, January 22, 2009 12:43 PM
To: 'Jeff Hill'; 'Andrew Johnson'; epics
Subject: Very slow reconnection to medm after IOC reboot

Folks,

I have noticed a problem which seems new, though I can't say exactly
when it started.

The time for EPICS CA clients to reconnect when an IOC reboots is quite
long.

Here are some specifics.  I am running a soft IOC on Linux or Windows
and running medm on Linux or Windows.  There are 4 medm screens opened
to the IOC with about 200 PVs total.  There are 2 computers under test:
a recent Redhat Linux machine and a Windows XP machine.  They are on the
same subnet.  caRepeater is running on both machines.

******************************
Linux IOC (3.14.10), Linux medm client (built with 3.14.8.2) on the same
machine.

Stop IOC, wait 10 seconds, restart IOC.  
Time for medm to reconnect all channels: less than 10 seconds.

Stop IOC, wait 8 minutes, restart IOC.  
Time for medm to reconnect first channel: 40 seconds
Time for medm to reconnect all channels: 60 seconds

******************************
Windows IOC (3.14.10), same Linux medm client (built with 3.14.8.2)

Stop IOC, wait 10 seconds, restart IOC.  
Time for medm to reconnect all channels: less than 10 seconds.

Stop IOC, wait 15 minutes, restart IOC.
After 3 minutes no channels had reconnected.  Open a new medm screen to
the same IOC.  It immediately connects all PVs on the new screen, but
the PVs on the previous screens are still not connected.
Time for medm to reconnect all channels: 4 minutes (finally, after 4
minutes, all PVs on the previous screens reconnected).

******************************
Windows IOC (3.14.10), Windows medm client (built with 3.14.9) on same
PC.

Stop IOC, wait 10 seconds, restart IOC.  
Time for medm to reconnect all channels: less than 10 seconds.

Stop IOC, wait 15 minutes, restart IOC.
Time for medm to reconnect first channel: 60 seconds
Time for medm to reconnect all channels: 4 minutes and 25 seconds.

******************************

This definitely seems like a new problem to me; it never used to take 4
minutes to reconnect.  Unfortunately I have recently upgraded both medm
(from a much older version to the newer one built with 3.14.8.2) and
the IOCs (from 3.14.8.2 to 3.14.10).

On the Windows PC I have the following EPICS environment variables
defined:
EPICS_CA_AUTO_ADDR_LIST=NO
EPICS_HOST_ARCH=win32-x86
EPICS_CA_ADDR_LIST=localhost 164.54.160.255
EPICS_DISPLAY_PATH=C:\EPICS\adls\
EPICS_CA_MAX_ARRAY_BYTES=10000000

On the Linux machine I have the following EPICS environment variables
defined:
EPICS_HOST_ARCH=linux-x86
EPICS_DISPLAY_PATH=/home/epics/adl/all
EPICS_CA_MAX_ARRAY_BYTES=10000000

These delays make IOC development very painful, particularly on Windows.
Each time I restart the IOC, if I wait more than 30 seconds or so, it
then takes 4 minutes for the medm screens to refresh.  It becomes quicker
to close them all and re-open them, though this is very tedious.

Have others seen similar problems?  Any idea when they started?  Do they
go away if clients are built with 3.14.10?  I have not tried that (but I
have used 3.14.9).

Thanks,
Mark



