EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System

1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  <20092010  2011  2012  2013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024  Index 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  <20092010  2011  2012  2013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024 
<== Date ==> <== Thread ==>

Subject: Re: CA reconnection problem
From: Dirk Zimoch <[email protected]>
To: Jeff Hill <[email protected]>
Cc: "'tech-talk'" <[email protected]>
Date: Thu, 06 Aug 2009 11:21:51 +0200
Hi Jeff,

Time advances normally on the IOC.
Bug 299 is not related because I have no suspended tasks.
Bug 330 is also not related because my channels never reconnect, even after days.

The IOC connects to the gateway with normal broadcasts on the local network. I can see beacons from all IOCs and gateway and I can see the beacon anomaly when I start a softioc. Still the IOC does not send any search requests to reconnect the links. I can see the search requests from 3.13 IOCs, 3.14.8 medms and the gateway after a beacon anomaly, but not from 3.14.8 IOCs. It is definitely not the case that the 3.14.8 IOC never stops searching.

I do not want to restart the gateway at the moment because we have user operation. I also think it's not necessary because I can see the channels through the gateway with caget. I doubt that it is a gateway problem at all.

Rebooting the IOC helps. Writing the record name to the link again helps too, but only for this link. All other links on the same IOC to the same target record stay disconnected. I do not understand this.

UDP package size is not an issue because its only 8 broken CA links to 2 records. So it should easily fit into one package. But as I said, I see no package at all (except when I rewnew the link).

I would send a stack trace of the CAC timer thread if I could find it. What's the name?

Here is the full dbcar output, mybe it helps:

X04SA-VME-ID2 > dbcar "",100
    connected X04SA-VME-ID2 X04SA-ID-LMD:PMD.INPB X04SA-ID-LMD:SCALER.S3
    connected X04SA-VME-ID2 X04SA-ID-LMD:PMD-INJ.INPB X04SA-ID-LMD:PMD
    connected X04SA-VME-ID2 X04SA-ID-LMD:PMD-NOINJ.INPB X04SA-ID-LMD:PMD
    connected X04SA-VME-ID2 X04SA-ID-LMD:PMU.INPB X04SA-ID-LMD:SCALER.S2
    connected X04SA-VME-ID2 X04SA-ID-LMU:PMD.INPB X04SA-ID-LMU:SCALER.S3
    connected X04SA-VME-ID2 X04SA-ID-LMU:PMU.INPB X04SA-ID-LMU:SCALER.S2
    connected 129.129.104.31:5064 X04SA-ID-FBPMD:ALARM.INPD X04SA-ID-GAP:ISOPEN
not_connected X04SA-ID-FBPMD:X-COMP.INPE ACOAU-ACCU:OP-MODE
not_connected X04SA-ID-FBPMD:X-COMP.INPI ARIDI-PCT:CURRENT
not_connected X04SA-ID-FBPMD:Y-COMP.INPE ACOAU-ACCU:OP-MODE
not_connected X04SA-ID-FBPMD:Y-COMP.INPI ARIDI-PCT:CURRENT
    connected 129.129.104.31:5064 X04SA-ID-FBPMU:ALARM.INPD X04SA-ID-GAP:ISOPEN
not_connected X04SA-ID-FBPMU:X-COMP.INPE ACOAU-ACCU:OP-MODE
not_connected X04SA-ID-FBPMU:X-COMP.INPI ARIDI-PCT:CURRENT
not_connected X04SA-ID-FBPMU:Y-COMP.INPE ACOAU-ACCU:OP-MODE
not_connected X04SA-ID-FBPMU:Y-COMP.INPI ARIDI-PCT:CURRENT
    connected 129.129.104.31:5064 X04SA-ID:FBPM-ILK.OUT X04SA-ID:ILK-BITS.B1
    connected X04SA-VME-ID2 X04SA-VME-ID2:HBTSYNC.INPA X04SA-VME-ID2:HBT.VAL
    connected X04SA-VME-ID2 X04SA-VME-ID2:MEMALLOC.INPA X04SA-VME-ID2:MEMALLOCRA
W
    connected X04SA-VME-ID2 X04SA-VME-ID2:MEMFREE.INPA X04SA-VME-ID2:MEMFREE-RAW
    connected X04SA-VME-ID2 X04SA-VME-ID2:MEMMAX.INPA X04SA-VME-ID2:MEMMAX-RAW
ncalinks 21 not connected 8 no_read_access 0 no_write_access 0
 nDisconnect 0 nNoWrite 0
ca_client_context at 0x7ca6c90 pndRecvCnt=0 ioSeqNo=0
dbContext at 0x7ca6a80
        event call back cache location 0x7bac130, and its size 24
dbContextReadNotifyCache
dbContextReadNotifyCacheAlocator
        count 1 and size 88
epicsMutexId 0x7ca6c20 source ../../../src/libCom/osi/epicsMutex.cpp line 197

Semaphore Id        : 0x7ca6c50
Semaphore Type      : MUTEX
Task Queuing        : PRIORITY
Pended Tasks        : 3
Owner               : 0x7a331e0    (tShell)
Options             : 0xd       SEM_Q_PRIORITY
                                SEM_DELETE_SAFE
                                SEM_INVERSION_SAFE

VxWorks Events
--------------
Registered Task     : NONE
Event(s) to Send    : N/A
Options             : N/A

Pended Tasks
------------
   NAME      TID    PRI TIMEOUT
---------- -------- --- -------
timerQueue  7bf5d20 148       0
CAC-event   7bb5a90 148       0
CAC-TCP-recv 7a5e3b0 150       0

Channel Access Client Context at 0x7bf8cc0 for user ioc
        revision "@(#) EPICS 3.14.8.2, CA Client Library Nov 29 2006"
Hash table with 512 buckets and 2 items of type 6tcpiiu installed
Virtual circuit to "129.129.104.31:5064" at version V4.8 state 1
        current data cache pointer = 0x7b88c20 current data cache size = 16384
        contiguous receive message count=0, busy detect bool=0, flow control boo
l=0
eceive thread is busy=0
        virtual circuit socket identifier 20
        send thread flush signal:

Semaphore Id        : 0x7b88bc0
Semaphore Type      : BINARY
Task Queuing        : FIFO
Pended Tasks        : 1
State               : EMPTY
Options             : 0x0       SEM_Q_FIFO

VxWorks Events
--------------
Registered Task     : NONE
Event(s) to Send    : N/A
Options             : N/A

Pended Tasks
------------
   NAME      TID    PRI TIMEOUT
---------- -------- --- -------
CAC-TCP-send 7b8fba0 148       0

        send thread:
        recv thread:
        echo pending bool = 0
IO identifier hash table:
Connected channels
Channel "X04SA-ID-GAP:ISOPEN", connected to server 129.129.104.31:5064, native t
ype DBF_ENUM, native element count 1, read access, write access
        network IO pointer = 0x7b96cd0
        server identifier 392
        search retry number=1
        name length=20
Channel "X04SA-ID-GAP:ISOPEN", connected to server 129.129.104.31:5064, native t
ype DBF_ENUM, native element count 1, read access, write access
        network IO pointer = 0x7b96cd0
        server identifier 393
        search retry number=1
        name length=20
Channel "X04SA-ID:ILK-BITS.B1", connected to server 129.129.104.31:5064, native
type DBF_CHAR, native element count 1, read access, write access
        network IO pointer = 0x7b96cd0
        server identifier 394
        search retry number=1
        name length=21
Virtual circuit to "129.129.104.14:5064" at version V4.11 state 1
        current data cache pointer = 0x7af62b0 current data cache size = 16384
        contiguous receive message count=0, busy detect bool=0, flow control boo
l=0
eceive thread is busy=0
        virtual circuit socket identifier 22
        send thread flush signal:

Semaphore Id        : 0x7a1f670
Semaphore Type      : BINARY
Task Queuing        : FIFO
Pended Tasks        : 1
State               : EMPTY
Options             : 0x0       SEM_Q_FIFO

VxWorks Events
--------------
Registered Task     : NONE
Event(s) to Send    : N/A
Options             : N/A

Pended Tasks
------------
   NAME      TID    PRI TIMEOUT
---------- -------- --- -------
CAC-TCP-send 7a61590 148       0

        send thread:
        recv thread:
        echo pending bool = 0
IO identifier hash table:
Unresponsive circuit channels
Channel "ARIDI-PCT:CURRENT" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 1
        search retry number=1
        name length=18
Channel "ARIDI-PCT:CURRENT" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 2
        search retry number=1
        name length=18
Channel "ARIDI-PCT:CURRENT" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 3
        search retry number=1
        name length=18
Channel "ARIDI-PCT:CURRENT" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 4
        search retry number=1
        name length=18
Channel "ACOAU-ACCU:OP-MODE" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 5
        search retry number=1
        name length=19
Channel "ACOAU-ACCU:OP-MODE" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 6
        search retry number=1
        name length=19
Channel "ACOAU-ACCU:OP-MODE" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 7
        search retry number=1
        name length=19
Channel "ACOAU-ACCU:OP-MODE" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 8
        search retry number=1
        name length=19
Virtual circuit to "129.129.104.31:5064" at version V4.8 state 1
        current data cache pointer = 0x7b88c20 current data cache size = 16384
        contiguous receive message count=0, busy detect bool=0, flow control boo
l=0
eceive thread is busy=0
        virtual circuit socket identifier 20
        send thread flush signal:

Semaphore Id        : 0x7b88bc0
Semaphore Type      : BINARY
Task Queuing        : FIFO
Pended Tasks        : 1
State               : EMPTY
Options             : 0x0       SEM_Q_FIFO

VxWorks Events
--------------
Registered Task     : NONE
Event(s) to Send    : N/A
Options             : N/A

Pended Tasks
------------
   NAME      TID    PRI TIMEOUT
---------- -------- --- -------
CAC-TCP-send 7b8fba0 148       0

        send thread:
        recv thread:
        echo pending bool = 0
IO identifier hash table:
Connected channels
Channel "X04SA-ID-GAP:ISOPEN", connected to server 129.129.104.31:5064, native t
ype DBF_ENUM, native element count 1, read access, write access
        network IO pointer = 0x7b96cd0
        server identifier 392
        search retry number=1
        name length=20
Channel "X04SA-ID-GAP:ISOPEN", connected to server 129.129.104.31:5064, native t
ype DBF_ENUM, native element count 1, read access, write access
        network IO pointer = 0x7b96cd0
        server identifier 393
        search retry number=1
        name length=20
Channel "X04SA-ID:ILK-BITS.B1", connected to server 129.129.104.31:5064, native
type DBF_CHAR, native element count 1, read access, write access
        network IO pointer = 0x7b96cd0
        server identifier 394
        search retry number=1
        name length=21
Virtual circuit to "129.129.104.14:5064" at version V4.11 state 1
        current data cache pointer = 0x7af62b0 current data cache size = 16384
        contiguous receive message count=0, busy detect bool=0, flow control boo
l=0
eceive thread is busy=0
        virtual circuit socket identifier 22
        send thread flush signal:

Semaphore Id        : 0x7a1f670
Semaphore Type      : BINARY
Task Queuing        : FIFO
Pended Tasks        : 1
State               : EMPTY
Options             : 0x0       SEM_Q_FIFO

VxWorks Events
--------------
Registered Task     : NONE
Event(s) to Send    : N/A
Options             : N/A

Pended Tasks
------------
   NAME      TID    PRI TIMEOUT
---------- -------- --- -------
CAC-TCP-send 7a61590 148       0

        send thread:
        recv thread:
        echo pending bool = 0
IO identifier hash table:
Unresponsive circuit channels
Channel "ARIDI-PCT:CURRENT" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 1
        search retry number=1
        name length=18
Channel "ARIDI-PCT:CURRENT" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 2
        search retry number=1
        name length=18
Channel "ARIDI-PCT:CURRENT" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 3
        search retry number=1
        name length=18
Channel "ARIDI-PCT:CURRENT" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 4
        search retry number=1
        name length=18
Channel "ACOAU-ACCU:OP-MODE" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 5
        search retry number=1
        name length=19
Channel "ACOAU-ACCU:OP-MODE" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 6
        search retry number=1
        name length=19
Channel "ACOAU-ACCU:OP-MODE" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 7
        search retry number=1
        name length=19
Channel "ACOAU-ACCU:OP-MODE" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 8
        search retry number=1
        name length=19
entries per bucket: mean = 0.003906 std dev = 0.062378 max = 1
        connection time out watchdog period 30.000000
Datagram IO circuit (and disconnected channel repository)
        repeater port 5065
        default server port 5064
Channel Access Address List
129.129.104.255:5064
        socket identifier 18
        bytes in xmit buffer 16
        shut down command bool 0
        recv thread exit signal:
repeater subscribe timer: attempts=2 registered=1 once=0
disconnect governor timer:
search timer delay 0.032000
0 channels with search request pending
0 channels with search response pending
search timer delay 0.064000
0 channels with search request pending
0 channels with search response pending
search timer delay 0.128000
0 channels with search request pending
0 channels with search response pending
search timer delay 0.256000
0 channels with search request pending
0 channels with search response pending
search timer delay 0.512000
0 channels with search request pending
0 channels with search response pending
search timer delay 1.024000
0 channels with search request pending
0 channels with search response pending
search timer delay 2.048000
0 channels with search request pending
0 channels with search response pending
search timer delay 4.096000
0 channels with search request pending
0 channels with search response pending
search timer delay 8.192000
0 channels with search request pending
0 channels with search response pending
search timer delay 16.384000
0 channels with search request pending
0 channels with search response pending
search timer delay 32.768000
0 channels with search request pending
0 channels with search response pending
search timer delay 65.536000
0 channels with search request pending
0 channels with search response pending
search timer delay 131.072000
0 channels with search request pending
0 channels with search response pending
search timer delay 262.144000
0 channels with search request pending
0 channels with search response pending
Program begin time:
epicsTime: MON MAY 11 2009 16:53:29.047623310
epicsTime: revision "@(#) EPICS 3.14.8.2, Common Utilities Library Nov 29 2006"
Channel identifier hash table:
Hash table with 512 buckets and 11 items of type 4nciu installed
Channel "X04SA-ID-GAP:ISOPEN", connected to server 129.129.104.31:5064, native t
ype DBF_ENUM, native element count 1, read access, write access
        network IO pointer = 0x7b96cd0
        server identifier 392
        search retry number=1
        name length=20
Channel "ACOAU-ACCU:OP-MODE" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 5
        search retry number=1
        name length=19
Channel "ARIDI-PCT:CURRENT" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 1
        search retry number=1
        name length=18
Channel "ACOAU-ACCU:OP-MODE" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 6
        search retry number=1
        name length=19
Channel "ARIDI-PCT:CURRENT" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 2
        search retry number=1
        name length=18
Channel "X04SA-ID-GAP:ISOPEN", connected to server 129.129.104.31:5064, native t
ype DBF_ENUM, native element count 1, read access, write access
        network IO pointer = 0x7b96cd0
        server identifier 393
        search retry number=1
        name length=20
Channel "ACOAU-ACCU:OP-MODE" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 7
        search retry number=1
        name length=19
Channel "ARIDI-PCT:CURRENT" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 3
        search retry number=1
        name length=18
Channel "ACOAU-ACCU:OP-MODE" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 8
        search retry number=1
        name length=19
Channel "ARIDI-PCT:CURRENT" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 4
        search retry number=1
        name length=18
Channel "X04SA-ID:ILK-BITS.B1", connected to server 129.129.104.31:5064, native
type DBF_CHAR, native element count 1, read access, write access
        network IO pointer = 0x7b96cd0
        server identifier 394
        search retry number=1
        name length=21
Channel "X04SA-ID-GAP:ISOPEN", connected to server 129.129.104.31:5064, native t
ype DBF_ENUM, native element count 1, read access, write access
        network IO pointer = 0x7b96cd0
        server identifier 392
        search retry number=1
        name length=20
Channel "ACOAU-ACCU:OP-MODE" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 5
        search retry number=1
        name length=19
Channel "ARIDI-PCT:CURRENT" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 1
        search retry number=1
        name length=18
Channel "ACOAU-ACCU:OP-MODE" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 6
        search retry number=1
        name length=19
Channel "ARIDI-PCT:CURRENT" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 2
        search retry number=1
        name length=18
Channel "X04SA-ID-GAP:ISOPEN", connected to server 129.129.104.31:5064, native t
ype DBF_ENUM, native element count 1, read access, write access
        network IO pointer = 0x7b96cd0
        server identifier 393
        search retry number=1
        name length=20
Channel "ACOAU-ACCU:OP-MODE" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 7
        search retry number=1
        name length=19
Channel "ARIDI-PCT:CURRENT" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 3
        search retry number=1
        name length=18
Channel "ACOAU-ACCU:OP-MODE" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 8
        search retry number=1
        name length=19
Channel "ARIDI-PCT:CURRENT" is disconnected
        network IO pointer = 0x7b96f58
        server identifier 4
        search retry number=1
        name length=18
Channel "X04SA-ID:ILK-BITS.B1", connected to server 129.129.104.31:5064, native
type DBF_CHAR, native element count 1, read access, write access
        network IO pointer = 0x7b96cd0
        server identifier 394
        search retry number=1
        name length=21
entries per bucket: mean = 0.021484 std dev = 0.144992 max = 1
IO identifier hash table:
Hash table with 512 buckets and 10 items of type 8baseNMIU installed
event subscription IO at 0x7ae2ef0, type DBF_invalid, element count 1, mask 5
event subscription IO at 0x7ae2f1c, type DBF_invalid, element count 1, mask 5
event subscription IO at 0x7ae2f48, type DBF_invalid, element count 1, mask 5
event subscription IO at 0x7ae2f74, type DBF_invalid, element count 1, mask 5
event subscription IO at 0x7ae2fa0, type DBF_invalid, element count 1, mask 5
event subscription IO at 0x7ae2fcc, type DBF_invalid, element count 1, mask 5
event subscription IO at 0x7ae2ff8, type DBF_invalid, element count 1, mask 5
event subscription IO at 0x7ae3024, type DBF_invalid, element count 1, mask 5
event subscription IO at 0x7ae3050, type DBF_invalid, element count 1, mask 5
event subscription IO at 0x7ae307c, type DBF_invalid, element count 1, mask 5
event subscription IO at 0x7ae2ef0, type DBF_invalid, element count 1, mask 5
event subscription IO at 0x7ae2f1c, type DBF_invalid, element count 1, mask 5
event subscription IO at 0x7ae2f48, type DBF_invalid, element count 1, mask 5
event subscription IO at 0x7ae2f74, type DBF_invalid, element count 1, mask 5
event subscription IO at 0x7ae2fa0, type DBF_invalid, element count 1, mask 5
event subscription IO at 0x7ae2fcc, type DBF_invalid, element count 1, mask 5
event subscription IO at 0x7ae2ff8, type DBF_invalid, element count 1, mask 5
event subscription IO at 0x7ae3024, type DBF_invalid, element count 1, mask 5
event subscription IO at 0x7ae3050, type DBF_invalid, element count 1, mask 5
event subscription IO at 0x7ae307c, type DBF_invalid, element count 1, mask 5
entries per bucket: mean = 0.019531 std dev = 0.138383 max = 1
Beacon source identifier hash table:
Hash table with 512 buckets and 11 items of type 3bhe installed
CA beacon hash entry for 129.129.104.13:32771 with period estimate 15.049622
        beacon number 718291, on MON JUN 29 2009 13:59:03
CA beacon hash entry for 129.129.104.33:5064 with period estimate 14.849221
        beacon number 0, on THU AUG 06 2009 10:56:40
CA beacon hash entry for 129.129.104.34:5064 with period estimate 14.891577
        beacon number 0, on THU AUG 06 2009 10:56:30
CA beacon hash entry for 129.129.104.36:5064 with period estimate 14.936137
        beacon number 0, on THU AUG 06 2009 10:56:36
CA beacon hash entry for 129.129.104.37:5064 with period estimate 14.915017
        beacon number 26678, on THU AUG 06 2009 10:56:33
CA beacon hash entry for 129.129.104.39:5064 with period estimate 14.838267
        beacon number 0, on THU AUG 06 2009 10:56:39
CA beacon hash entry for 129.129.104.13:5064 with period estimate 20.991435
        beacon number 51, on THU AUG 06 2009 10:56:33
CA beacon hash entry for 129.129.104.14:5064 with period estimate 14.935088
        beacon number 218331, on THU AUG 06 2009 10:56:37
CA beacon hash entry for 129.129.104.15:5064 with period estimate 14.881638
        beacon number 218058, on THU AUG 06 2009 10:56:29
CA beacon hash entry for 129.129.104.30:5064 with period estimate 14.953183
        beacon number 0, on THU AUG 06 2009 10:56:39
CA beacon hash entry for 129.129.104.31:5064 with period estimate 14.925085
        beacon number 0, on THU AUG 06 2009 10:56:34
CA beacon hash entry for 129.129.104.13:32771 with period estimate 15.049622
        beacon number 718291, on MON JUN 29 2009 13:59:03
CA beacon hash entry for 129.129.104.33:5064 with period estimate 14.849221
        beacon number 0, on THU AUG 06 2009 10:56:40
CA beacon hash entry for 129.129.104.34:5064 with period estimate 14.891577
        beacon number 0, on THU AUG 06 2009 10:56:30
CA beacon hash entry for 129.129.104.36:5064 with period estimate 14.936137
        beacon number 0, on THU AUG 06 2009 10:56:36
CA beacon hash entry for 129.129.104.37:5064 with period estimate 14.915017
        beacon number 26678, on THU AUG 06 2009 10:56:33
CA beacon hash entry for 129.129.104.39:5064 with period estimate 14.838267
        beacon number 0, on THU AUG 06 2009 10:56:39
CA beacon hash entry for 129.129.104.13:5064 with period estimate 20.991435
        beacon number 51, on THU AUG 06 2009 10:56:33
CA beacon hash entry for 129.129.104.14:5064 with period estimate 14.935088
        beacon number 218331, on THU AUG 06 2009 10:56:37
CA beacon hash entry for 129.129.104.15:5064 with period estimate 14.881638
        beacon number 218058, on THU AUG 06 2009 10:56:29
CA beacon hash entry for 129.129.104.30:5064 with period estimate 14.953183
        beacon number 0, on THU AUG 06 2009 10:56:39
CA beacon hash entry for 129.129.104.31:5064 with period estimate 14.925085
        beacon number 0, on THU AUG 06 2009 10:56:34
entries per bucket: mean = 0.021484 std dev = 0.144992 max = 1
Timer queue:
EPICS threaded timer queue at 0x7bf6240
epicsTimerQueue with 15 items pending
11searchTimer, state = pending, delay = -19.894333
search timer delay 0.064000
0 channels with search request pending
0 channels with search response pending
11searchTimer, state = pending, delay = -20.000999
search timer delay 1.024000
0 channels with search request pending
0 channels with search response pending
11searchTimer, state = pending, delay = -20.163666
search timer delay 0.128000
0 channels with search request pending
0 channels with search response pending
11searchTimer, state = pending, delay = -20.235666
search timer delay 0.256000
0 channels with search request pending
0 channels with search response pending
11searchTimer, state = pending, delay = -20.296333
search timer delay 0.512000
0 channels with search request pending
0 channels with search response pending
11searchTimer, state = pending, delay = -18.926999
search timer delay 2.048000
0 channels with search request pending
0 channels with search response pending
11searchTimer, state = pending, delay = -17.095666
search timer delay 4.096000
0 channels with search request pending
0 channels with search response pending
23disconnectGovernorTimer, state = pending, delay = -16.191666
disconnect governor timer:
11searchTimer, state = pending, delay = -13.316332
search timer delay 8.192000
0 channels with search request pending
0 channels with search response pending
11searchTimer, state = pending, delay = -5.424332
search timer delay 16.384000
0 channels with search request pending
0 channels with search response pending
11searchTimer, state = pending, delay = -5.590332
search timer delay 32.768000
0 channels with search request pending
0 channels with search response pending
15tcpRecvWatchdog, state = pending, delay = 1.191668
Receive virtual circuit watchdog at 0x7b96df0, period 30.000000

11searchTimer, state = pending, delay = 6.361002
search timer delay 65.536000
0 channels with search request pending
0 channels with search response pending
11searchTimer, state = pending, delay = 51.530337
search timer delay 131.072000
0 channels with search request pending
0 channels with search response pending
11searchTimer, state = pending, delay = 236.852334
search timer delay 262.144000
0 channels with search request pending
0 channels with search response pending
reschedule event

Semaphore Id        : 0x7bf6070
Semaphore Type      : BINARY
Task Queuing        : FIFO
Pended Tasks        : 0
State               : EMPTY
Options             : 0x0       SEM_Q_FIFO

VxWorks Events
--------------
Registered Task     : NONE
Event(s) to Send    : N/A
Options             : N/A

exit event

Semaphore Id        : 0x7bf6030
Semaphore Type      : BINARY
Task Queuing        : FIFO
Pended Tasks        : 0
State               : EMPTY
Options             : 0x0       SEM_Q_FIFO

VxWorks Events
--------------
Registered Task     : NONE
Event(s) to Send    : N/A
Options             : N/A

exitFlag = F, terminateFlag = F
IP address to name conversion engine:
ipAddrToAsciiEngine at 0x7bf86c0 with 0 requests pending
mutex:
epicsMutexId 0x7bf8650 source ../../../src/libCom/osi/epicsMutex.cpp line 197

Semaphore Id        : 0x7bf8680
Semaphore Type      : MUTEX
Task Queuing        : PRIORITY
Pended Tasks        : 0
Owner               : 0x7a331e0    (tShell)
Options             : 0xd       SEM_Q_PRIORITY
                                SEM_DELETE_SAFE
                                SEM_INVERSION_SAFE

VxWorks Events
--------------
Registered Task     : NONE
Event(s) to Send    : N/A
Options             : N/A

laborEvent:

Semaphore Id        : 0x7bf8610
Semaphore Type      : BINARY
Task Queuing        : FIFO
Pended Tasks        : 1
State               : EMPTY
Options             : 0x0       SEM_Q_FIFO

VxWorks Events
--------------
Registered Task     : NONE
Event(s) to Send    : N/A
Options             : N/A

Pended Tasks
------------
   NAME      TID    PRI TIMEOUT
---------- -------- --- -------
ipToAsciiProxy 7bf82c0 189       0

exitFlag  boolean = 0
exit event:
Default mutex:
epicsMutexId 0x7ca6c20 source ../../../src/libCom/osi/epicsMutex.cpp line 197

Semaphore Id        : 0x7ca6c50
Semaphore Type      : MUTEX
Task Queuing        : PRIORITY
Pended Tasks        : 5
Owner               : 0x7a331e0    (tShell)
Options             : 0xd       SEM_Q_PRIORITY
                                SEM_DELETE_SAFE
                                SEM_INVERSION_SAFE

VxWorks Events
--------------
Registered Task     : NONE
Event(s) to Send    : N/A
Options             : N/A

Pended Tasks
------------
   NAME      TID    PRI TIMEOUT
---------- -------- --- -------
scan10      7c2d530 136       0
CAC-UDP     7be23d0 147       0
timerQueue  7bf5d20 148       0
CAC-event   7bb5a90 148       0
CAC-TCP-recv 7a5e3b0 150       0

mutex:
epicsMutexId 0x7ca6c20 source ../../../src/libCom/osi/epicsMutex.cpp line 197

Semaphore Id        : 0x7ca6c50
Semaphore Type      : MUTEX
Task Queuing        : PRIORITY
Pended Tasks        : 5
Owner               : 0x7a331e0    (tShell)
Options             : 0xd       SEM_Q_PRIORITY
                                SEM_DELETE_SAFE
                                SEM_INVERSION_SAFE

VxWorks Events
--------------
Registered Task     : NONE
Event(s) to Send    : N/A
Options             : N/A

Pended Tasks
------------
   NAME      TID    PRI TIMEOUT
---------- -------- --- -------
scan10      7c2d530 136       0
CAC-UDP     7be23d0 147       0
timerQueue  7bf5d20 148       0
CAC-event   7bb5a90 148       0
CAC-TCP-recv 7a5e3b0 150       0

        preemptive callback is enabled
        there are 0 unsatisfied IO operations blocking ca_pend_io()
        the current io sequence number is 0
IO done event:

Semaphore Id        : 0x7ca6b70
Semaphore Type      : BINARY
Task Queuing        : FIFO
Pended Tasks        : 0
State               : EMPTY
Options             : 0x0       SEM_Q_FIFO

VxWorks Events
--------------
Registered Task     : NONE
Event(s) to Send    : N/A
Options             : N/A

Synchronous group identifier hash table:
Hash table with 0 buckets and 0 items of type 4CASG installed
value = 0 = 0x0
X04SA-VME-ID2 >



Jeff Hill wrote:
Dirk,

The R3.14 design is more conservative from a perspective of avoiding
positive congestion feedback. Here are the primary differences.
O When a circuit times out the CAC library disconnects the application, but
not the circuit. We rely on TCP to decide when to give up on a circuit, and
disconnect it.

O When the CAC library sees a beacon anomaly the search rate receives a
boost, but this boost is only to once every 5 seconds. In contrast, newly
created channels are initially searched for at a rate which is closer to the
average measured round trip time. The CAC library sends multiple search
requests per UDP frame, and depending on past success rates also multiple
UDP frames per periodic search attempt (that part isn’t different between
R3.14 and R3.15). Nevertheless, if there many channels it can take a little
while to connect all of them if the IOC (or GW in this situation has been
off for a long time).

O In R3.14 the CAC library never stops searching for disconnected channels,
but the maximum search rate is adjustable with an environment variable.

I cannot see any search requests from that IOC.

That doesn’t sound right, but bear in mind that if the IOC is sending
unicasts, and not broadcasts, to the gateway on a switched network then the
network sniffer may or may not see them if it isn’t listening at the source
or the destination cables. Directed broadcasts may also not be seen by the
network sniffer if it isn’t sniffing directly on the directed broadcast's
destination subnet.

The output from dbcar when its interest level argument is set very high (say
to 100) will show many things about the internals of the CAC library. Also,
there is a timer thread created by the CAC library that is scheduling the
search requests, and you might send a stack trace (if it isn’t simply
waiting on its time delay event flag).

Another possibility is that something is amiss in the GW. Does restarting
the GW improve the situation?

Jeff
______________________________________________________
Jeffrey O. Hill           Email        [email protected]
LANL MS H820              Voice        505 665 1831
Los Alamos NM 87545 USA   FAX          505 665 5107

Message content: TSPA


-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of Dirk Zimoch
Sent: Wednesday, August 05, 2009 8:09 AM
To: tech-talk
Subject: CA reconnection problem

Hi all (Jeff in particular)

I have seen a problem with some of our IOCs today. Several beamline IOCs
running
3.14.8 have disconnected CA input links to a central IOC (3.13.10) even
though
the channels are there. The beamline IOCs run behind ca gateways. It may
be that
the gateways or the central IOC had been offline for more than some hours
recently in the last shutdown.

Even though the channels are back and I can read them from the beamline
network
with caget the IOCs don't reconnect.

I have started a softioc in the beamline network to generate a beacon
anomaly
but even then the links do not reconnect. I cannot see any search requests
from
that IOC.

Only 3.14.8 systems show this problem. 3.13. systems don't. Is there a bug
in
the 3.14.8 implementation of CAC I should know of?

Dirk

--
Dr. Dirk Zimoch
Paul Scherrer Institut, WBGB/006
5232 Villigen PSI, Switzerland
Phone +41 56 310 5182



--
Dr. Dirk Zimoch
Paul Scherrer Institut, WBGB/006
5232 Villigen PSI, Switzerland
Phone +41 56 310 5182

Replies:
RE: CA reconnection problem Rees, NP (Nick)
RE: CA reconnection problem Jeff Hill
References:
CA reconnection problem Dirk Zimoch
RE: CA reconnection problem Jeff Hill

Navigate by Date:
Prev: RE: Building ASYN on Windows Wang Xiaoqiang
Next: RE: CA reconnection problem Rees, NP (Nick)
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  <20092010  2011  2012  2013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024 
Navigate by Thread:
Prev: RE: CA reconnection problem Jeff Hill
Next: RE: CA reconnection problem Rees, NP (Nick)
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  <20092010  2011  2012  2013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024 
ANJ, 31 Jan 2014 Valid HTML 4.01! · Home · News · About · Base · Modules · Extensions · Distributions · Download ·
· Search · EPICS V4 · IRMIS · Talk · Bugs · Documents · Links · Licensing ·