Hi Dirk, Jeff,
I think this is possibly the same problem we described in the thread:
"Inter-IOC link problems"
http://www.aps.anl.gov/epics/tech-talk/2008/msg01103.php
That thread seemed to peter out because we basically developed reboot
workarounds that meant it didn't bother us as much. However, I don't
think it was ever solved. I don't know whether there is a Mantis entry
for it.
Cheers,
Nick Rees
Principal Software Engineer Phone: +44 (0)1235-778430
Diamond Light Source Fax: +44 (0)1235-446713
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Dirk Zimoch
> Sent: 06 August 2009 10:22
> To: Jeff Hill
> Cc: 'tech-talk'
> Subject: Re: CA reconnection problem
>
> Hi Jeff,
>
> Time advances normally on the IOC.
> Bug 299 is not related because I have no suspended tasks.
> Bug 330 is also not related because my channels never
> reconnect, even after days.
>
> The IOC connects to the gateway with normal broadcasts on the
> local network. I
> can see beacons from all IOCs and gateway and I can see the
> beacon anomaly when
> I start a softioc. Still the IOC does not send any search
> requests to reconnect
> the links. I can see the search requests from 3.13 IOCs,
> 3.14.8 medms and the
> gateway after a beacon anomaly, but not from 3.14.8 IOCs. It
> is definitely not
> the case that the 3.14.8 IOC never stops searching.
>
> I do not want to restart the gateway at the moment because we
> have user
> operation. I also think it's not necessary because I can see
> the channels
> through the gateway with caget. I doubt that it is a gateway
> problem at all.
>
> Rebooting the IOC helps. Writing the record name to the link
> again helps too,
> but only for this link. All other links on the same IOC to
> the same target
> record stay disconnected. I do not understand this.
>
> UDP package size is not an issue because its only 8 broken CA
> links to 2
> records. So it should easily fit into one package. But as I
> said, I see no
> package at all (except when I rewnew the link).
>
> I would send a stack trace of the CAC timer thread if I could
> find it. What's
> the name?
>
> Here is the full dbcar output, mybe it helps:
>
> X04SA-VME-ID2 > dbcar "",100
> connected X04SA-VME-ID2 X04SA-ID-LMD:PMD.INPB
> X04SA-ID-LMD:SCALER.S3
> connected X04SA-VME-ID2 X04SA-ID-LMD:PMD-INJ.INPB
> X04SA-ID-LMD:PMD
> connected X04SA-VME-ID2 X04SA-ID-LMD:PMD-NOINJ.INPB
> X04SA-ID-LMD:PMD
> connected X04SA-VME-ID2 X04SA-ID-LMD:PMU.INPB
> X04SA-ID-LMD:SCALER.S2
> connected X04SA-VME-ID2 X04SA-ID-LMU:PMD.INPB
> X04SA-ID-LMU:SCALER.S3
> connected X04SA-VME-ID2 X04SA-ID-LMU:PMU.INPB
> X04SA-ID-LMU:SCALER.S2
> connected 129.129.104.31:5064 X04SA-ID-FBPMD:ALARM.INPD
> X04SA-ID-GAP:ISOPEN
> not_connected X04SA-ID-FBPMD:X-COMP.INPE ACOAU-ACCU:OP-MODE
> not_connected X04SA-ID-FBPMD:X-COMP.INPI ARIDI-PCT:CURRENT
> not_connected X04SA-ID-FBPMD:Y-COMP.INPE ACOAU-ACCU:OP-MODE
> not_connected X04SA-ID-FBPMD:Y-COMP.INPI ARIDI-PCT:CURRENT
> connected 129.129.104.31:5064 X04SA-ID-FBPMU:ALARM.INPD
> X04SA-ID-GAP:ISOPEN
> not_connected X04SA-ID-FBPMU:X-COMP.INPE ACOAU-ACCU:OP-MODE
> not_connected X04SA-ID-FBPMU:X-COMP.INPI ARIDI-PCT:CURRENT
> not_connected X04SA-ID-FBPMU:Y-COMP.INPE ACOAU-ACCU:OP-MODE
> not_connected X04SA-ID-FBPMU:Y-COMP.INPI ARIDI-PCT:CURRENT
> connected 129.129.104.31:5064 X04SA-ID:FBPM-ILK.OUT
> X04SA-ID:ILK-BITS.B1
> connected X04SA-VME-ID2 X04SA-VME-ID2:HBTSYNC.INPA
> X04SA-VME-ID2:HBT.VAL
> connected X04SA-VME-ID2 X04SA-VME-ID2:MEMALLOC.INPA
> X04SA-VME-ID2:MEMALLOCRA
> W
> connected X04SA-VME-ID2 X04SA-VME-ID2:MEMFREE.INPA
> X04SA-VME-ID2:MEMFREE-RAW
> connected X04SA-VME-ID2 X04SA-VME-ID2:MEMMAX.INPA
> X04SA-VME-ID2:MEMMAX-RAW
> ncalinks 21 not connected 8 no_read_access 0 no_write_access 0
> nDisconnect 0 nNoWrite 0
> ca_client_context at 0x7ca6c90 pndRecvCnt=0 ioSeqNo=0
> dbContext at 0x7ca6a80
> event call back cache location 0x7bac130, and its size 24
> dbContextReadNotifyCache
> dbContextReadNotifyCacheAlocator
> count 1 and size 88
> epicsMutexId 0x7ca6c20 source
> ../../../src/libCom/osi/epicsMutex.cpp line 197
>
> Semaphore Id : 0x7ca6c50
> Semaphore Type : MUTEX
> Task Queuing : PRIORITY
> Pended Tasks : 3
> Owner : 0x7a331e0 (tShell)
> Options : 0xd SEM_Q_PRIORITY
> SEM_DELETE_SAFE
> SEM_INVERSION_SAFE
>
> VxWorks Events
> --------------
> Registered Task : NONE
> Event(s) to Send : N/A
> Options : N/A
>
> Pended Tasks
> ------------
> NAME TID PRI TIMEOUT
> ---------- -------- --- -------
> timerQueue 7bf5d20 148 0
> CAC-event 7bb5a90 148 0
> CAC-TCP-recv 7a5e3b0 150 0
>
> Channel Access Client Context at 0x7bf8cc0 for user ioc
> revision "@(#) EPICS 3.14.8.2, CA Client Library Nov 29 2006"
> Hash table with 512 buckets and 2 items of type 6tcpiiu installed
> Virtual circuit to "129.129.104.31:5064" at version V4.8 state 1
> current data cache pointer = 0x7b88c20 current data
> cache size = 16384
> contiguous receive message count=0, busy detect
> bool=0, flow control boo
> l=0
> eceive thread is busy=0
> virtual circuit socket identifier 20
> send thread flush signal:
>
> Semaphore Id : 0x7b88bc0
> Semaphore Type : BINARY
> Task Queuing : FIFO
> Pended Tasks : 1
> State : EMPTY
> Options : 0x0 SEM_Q_FIFO
>
> VxWorks Events
> --------------
> Registered Task : NONE
> Event(s) to Send : N/A
> Options : N/A
>
> Pended Tasks
> ------------
> NAME TID PRI TIMEOUT
> ---------- -------- --- -------
> CAC-TCP-send 7b8fba0 148 0
>
> send thread:
> recv thread:
> echo pending bool = 0
> IO identifier hash table:
> Connected channels
> Channel "X04SA-ID-GAP:ISOPEN", connected to server
> 129.129.104.31:5064, native t
> ype DBF_ENUM, native element count 1, read access, write access
> network IO pointer = 0x7b96cd0
> server identifier 392
> search retry number=1
> name length=20
> Channel "X04SA-ID-GAP:ISOPEN", connected to server
> 129.129.104.31:5064, native t
> ype DBF_ENUM, native element count 1, read access, write access
> network IO pointer = 0x7b96cd0
> server identifier 393
> search retry number=1
> name length=20
> Channel "X04SA-ID:ILK-BITS.B1", connected to server
> 129.129.104.31:5064, native
> type DBF_CHAR, native element count 1, read access, write access
> network IO pointer = 0x7b96cd0
> server identifier 394
> search retry number=1
> name length=21
> Virtual circuit to "129.129.104.14:5064" at version V4.11 state 1
> current data cache pointer = 0x7af62b0 current data
> cache size = 16384
> contiguous receive message count=0, busy detect
> bool=0, flow control boo
> l=0
> eceive thread is busy=0
> virtual circuit socket identifier 22
> send thread flush signal:
>
> Semaphore Id : 0x7a1f670
> Semaphore Type : BINARY
> Task Queuing : FIFO
> Pended Tasks : 1
> State : EMPTY
> Options : 0x0 SEM_Q_FIFO
>
> VxWorks Events
> --------------
> Registered Task : NONE
> Event(s) to Send : N/A
> Options : N/A
>
> Pended Tasks
> ------------
> NAME TID PRI TIMEOUT
> ---------- -------- --- -------
> CAC-TCP-send 7a61590 148 0
>
> send thread:
> recv thread:
> echo pending bool = 0
> IO identifier hash table:
> Unresponsive circuit channels
> Channel "ARIDI-PCT:CURRENT" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 1
> search retry number=1
> name length=18
> Channel "ARIDI-PCT:CURRENT" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 2
> search retry number=1
> name length=18
> Channel "ARIDI-PCT:CURRENT" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 3
> search retry number=1
> name length=18
> Channel "ARIDI-PCT:CURRENT" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 4
> search retry number=1
> name length=18
> Channel "ACOAU-ACCU:OP-MODE" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 5
> search retry number=1
> name length=19
> Channel "ACOAU-ACCU:OP-MODE" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 6
> search retry number=1
> name length=19
> Channel "ACOAU-ACCU:OP-MODE" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 7
> search retry number=1
> name length=19
> Channel "ACOAU-ACCU:OP-MODE" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 8
> search retry number=1
> name length=19
> Virtual circuit to "129.129.104.31:5064" at version V4.8 state 1
> current data cache pointer = 0x7b88c20 current data
> cache size = 16384
> contiguous receive message count=0, busy detect
> bool=0, flow control boo
> l=0
> eceive thread is busy=0
> virtual circuit socket identifier 20
> send thread flush signal:
>
> Semaphore Id : 0x7b88bc0
> Semaphore Type : BINARY
> Task Queuing : FIFO
> Pended Tasks : 1
> State : EMPTY
> Options : 0x0 SEM_Q_FIFO
>
> VxWorks Events
> --------------
> Registered Task : NONE
> Event(s) to Send : N/A
> Options : N/A
>
> Pended Tasks
> ------------
> NAME TID PRI TIMEOUT
> ---------- -------- --- -------
> CAC-TCP-send 7b8fba0 148 0
>
> send thread:
> recv thread:
> echo pending bool = 0
> IO identifier hash table:
> Connected channels
> Channel "X04SA-ID-GAP:ISOPEN", connected to server
> 129.129.104.31:5064, native t
> ype DBF_ENUM, native element count 1, read access, write access
> network IO pointer = 0x7b96cd0
> server identifier 392
> search retry number=1
> name length=20
> Channel "X04SA-ID-GAP:ISOPEN", connected to server
> 129.129.104.31:5064, native t
> ype DBF_ENUM, native element count 1, read access, write access
> network IO pointer = 0x7b96cd0
> server identifier 393
> search retry number=1
> name length=20
> Channel "X04SA-ID:ILK-BITS.B1", connected to server
> 129.129.104.31:5064, native
> type DBF_CHAR, native element count 1, read access, write access
> network IO pointer = 0x7b96cd0
> server identifier 394
> search retry number=1
> name length=21
> Virtual circuit to "129.129.104.14:5064" at version V4.11 state 1
> current data cache pointer = 0x7af62b0 current data
> cache size = 16384
> contiguous receive message count=0, busy detect
> bool=0, flow control boo
> l=0
> eceive thread is busy=0
> virtual circuit socket identifier 22
> send thread flush signal:
>
> Semaphore Id : 0x7a1f670
> Semaphore Type : BINARY
> Task Queuing : FIFO
> Pended Tasks : 1
> State : EMPTY
> Options : 0x0 SEM_Q_FIFO
>
> VxWorks Events
> --------------
> Registered Task : NONE
> Event(s) to Send : N/A
> Options : N/A
>
> Pended Tasks
> ------------
> NAME TID PRI TIMEOUT
> ---------- -------- --- -------
> CAC-TCP-send 7a61590 148 0
>
> send thread:
> recv thread:
> echo pending bool = 0
> IO identifier hash table:
> Unresponsive circuit channels
> Channel "ARIDI-PCT:CURRENT" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 1
> search retry number=1
> name length=18
> Channel "ARIDI-PCT:CURRENT" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 2
> search retry number=1
> name length=18
> Channel "ARIDI-PCT:CURRENT" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 3
> search retry number=1
> name length=18
> Channel "ARIDI-PCT:CURRENT" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 4
> search retry number=1
> name length=18
> Channel "ACOAU-ACCU:OP-MODE" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 5
> search retry number=1
> name length=19
> Channel "ACOAU-ACCU:OP-MODE" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 6
> search retry number=1
> name length=19
> Channel "ACOAU-ACCU:OP-MODE" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 7
> search retry number=1
> name length=19
> Channel "ACOAU-ACCU:OP-MODE" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 8
> search retry number=1
> name length=19
> entries per bucket: mean = 0.003906 std dev = 0.062378 max = 1
> connection time out watchdog period 30.000000
> Datagram IO circuit (and disconnected channel repository)
> repeater port 5065
> default server port 5064
> Channel Access Address List
> 129.129.104.255:5064
> socket identifier 18
> bytes in xmit buffer 16
> shut down command bool 0
> recv thread exit signal:
> repeater subscribe timer: attempts=2 registered=1 once=0
> disconnect governor timer:
> search timer delay 0.032000
> 0 channels with search request pending
> 0 channels with search response pending
> search timer delay 0.064000
> 0 channels with search request pending
> 0 channels with search response pending
> search timer delay 0.128000
> 0 channels with search request pending
> 0 channels with search response pending
> search timer delay 0.256000
> 0 channels with search request pending
> 0 channels with search response pending
> search timer delay 0.512000
> 0 channels with search request pending
> 0 channels with search response pending
> search timer delay 1.024000
> 0 channels with search request pending
> 0 channels with search response pending
> search timer delay 2.048000
> 0 channels with search request pending
> 0 channels with search response pending
> search timer delay 4.096000
> 0 channels with search request pending
> 0 channels with search response pending
> search timer delay 8.192000
> 0 channels with search request pending
> 0 channels with search response pending
> search timer delay 16.384000
> 0 channels with search request pending
> 0 channels with search response pending
> search timer delay 32.768000
> 0 channels with search request pending
> 0 channels with search response pending
> search timer delay 65.536000
> 0 channels with search request pending
> 0 channels with search response pending
> search timer delay 131.072000
> 0 channels with search request pending
> 0 channels with search response pending
> search timer delay 262.144000
> 0 channels with search request pending
> 0 channels with search response pending
> Program begin time:
> epicsTime: MON MAY 11 2009 16:53:29.047623310
> epicsTime: revision "@(#) EPICS 3.14.8.2, Common Utilities
> Library Nov 29 2006"
> Channel identifier hash table:
> Hash table with 512 buckets and 11 items of type 4nciu installed
> Channel "X04SA-ID-GAP:ISOPEN", connected to server
> 129.129.104.31:5064, native t
> ype DBF_ENUM, native element count 1, read access, write access
> network IO pointer = 0x7b96cd0
> server identifier 392
> search retry number=1
> name length=20
> Channel "ACOAU-ACCU:OP-MODE" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 5
> search retry number=1
> name length=19
> Channel "ARIDI-PCT:CURRENT" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 1
> search retry number=1
> name length=18
> Channel "ACOAU-ACCU:OP-MODE" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 6
> search retry number=1
> name length=19
> Channel "ARIDI-PCT:CURRENT" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 2
> search retry number=1
> name length=18
> Channel "X04SA-ID-GAP:ISOPEN", connected to server
> 129.129.104.31:5064, native t
> ype DBF_ENUM, native element count 1, read access, write access
> network IO pointer = 0x7b96cd0
> server identifier 393
> search retry number=1
> name length=20
> Channel "ACOAU-ACCU:OP-MODE" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 7
> search retry number=1
> name length=19
> Channel "ARIDI-PCT:CURRENT" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 3
> search retry number=1
> name length=18
> Channel "ACOAU-ACCU:OP-MODE" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 8
> search retry number=1
> name length=19
> Channel "ARIDI-PCT:CURRENT" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 4
> search retry number=1
> name length=18
> Channel "X04SA-ID:ILK-BITS.B1", connected to server
> 129.129.104.31:5064, native
> type DBF_CHAR, native element count 1, read access, write access
> network IO pointer = 0x7b96cd0
> server identifier 394
> search retry number=1
> name length=21
> Channel "X04SA-ID-GAP:ISOPEN", connected to server
> 129.129.104.31:5064, native t
> ype DBF_ENUM, native element count 1, read access, write access
> network IO pointer = 0x7b96cd0
> server identifier 392
> search retry number=1
> name length=20
> Channel "ACOAU-ACCU:OP-MODE" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 5
> search retry number=1
> name length=19
> Channel "ARIDI-PCT:CURRENT" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 1
> search retry number=1
> name length=18
> Channel "ACOAU-ACCU:OP-MODE" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 6
> search retry number=1
> name length=19
> Channel "ARIDI-PCT:CURRENT" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 2
> search retry number=1
> name length=18
> Channel "X04SA-ID-GAP:ISOPEN", connected to server
> 129.129.104.31:5064, native t
> ype DBF_ENUM, native element count 1, read access, write access
> network IO pointer = 0x7b96cd0
> server identifier 393
> search retry number=1
> name length=20
> Channel "ACOAU-ACCU:OP-MODE" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 7
> search retry number=1
> name length=19
> Channel "ARIDI-PCT:CURRENT" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 3
> search retry number=1
> name length=18
> Channel "ACOAU-ACCU:OP-MODE" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 8
> search retry number=1
> name length=19
> Channel "ARIDI-PCT:CURRENT" is disconnected
> network IO pointer = 0x7b96f58
> server identifier 4
> search retry number=1
> name length=18
> Channel "X04SA-ID:ILK-BITS.B1", connected to server
> 129.129.104.31:5064, native
> type DBF_CHAR, native element count 1, read access, write access
> network IO pointer = 0x7b96cd0
> server identifier 394
> search retry number=1
> name length=21
> entries per bucket: mean = 0.021484 std dev = 0.144992 max = 1
> IO identifier hash table:
> Hash table with 512 buckets and 10 items of type 8baseNMIU installed
> event subscription IO at 0x7ae2ef0, type DBF_invalid, element
> count 1, mask 5
> event subscription IO at 0x7ae2f1c, type DBF_invalid, element
> count 1, mask 5
> event subscription IO at 0x7ae2f48, type DBF_invalid, element
> count 1, mask 5
> event subscription IO at 0x7ae2f74, type DBF_invalid, element
> count 1, mask 5
> event subscription IO at 0x7ae2fa0, type DBF_invalid, element
> count 1, mask 5
> event subscription IO at 0x7ae2fcc, type DBF_invalid, element
> count 1, mask 5
> event subscription IO at 0x7ae2ff8, type DBF_invalid, element
> count 1, mask 5
> event subscription IO at 0x7ae3024, type DBF_invalid, element
> count 1, mask 5
> event subscription IO at 0x7ae3050, type DBF_invalid, element
> count 1, mask 5
> event subscription IO at 0x7ae307c, type DBF_invalid, element
> count 1, mask 5
> event subscription IO at 0x7ae2ef0, type DBF_invalid, element
> count 1, mask 5
> event subscription IO at 0x7ae2f1c, type DBF_invalid, element
> count 1, mask 5
> event subscription IO at 0x7ae2f48, type DBF_invalid, element
> count 1, mask 5
> event subscription IO at 0x7ae2f74, type DBF_invalid, element
> count 1, mask 5
> event subscription IO at 0x7ae2fa0, type DBF_invalid, element
> count 1, mask 5
> event subscription IO at 0x7ae2fcc, type DBF_invalid, element
> count 1, mask 5
> event subscription IO at 0x7ae2ff8, type DBF_invalid, element
> count 1, mask 5
> event subscription IO at 0x7ae3024, type DBF_invalid, element
> count 1, mask 5
> event subscription IO at 0x7ae3050, type DBF_invalid, element
> count 1, mask 5
> event subscription IO at 0x7ae307c, type DBF_invalid, element
> count 1, mask 5
> entries per bucket: mean = 0.019531 std dev = 0.138383 max = 1
> Beacon source identifier hash table:
> Hash table with 512 buckets and 11 items of type 3bhe installed
> CA beacon hash entry for 129.129.104.13:32771 with period
> estimate 15.049622
> beacon number 718291, on MON JUN 29 2009 13:59:03
> CA beacon hash entry for 129.129.104.33:5064 with period
> estimate 14.849221
> beacon number 0, on THU AUG 06 2009 10:56:40
> CA beacon hash entry for 129.129.104.34:5064 with period
> estimate 14.891577
> beacon number 0, on THU AUG 06 2009 10:56:30
> CA beacon hash entry for 129.129.104.36:5064 with period
> estimate 14.936137
> beacon number 0, on THU AUG 06 2009 10:56:36
> CA beacon hash entry for 129.129.104.37:5064 with period
> estimate 14.915017
> beacon number 26678, on THU AUG 06 2009 10:56:33
> CA beacon hash entry for 129.129.104.39:5064 with period
> estimate 14.838267
> beacon number 0, on THU AUG 06 2009 10:56:39
> CA beacon hash entry for 129.129.104.13:5064 with period
> estimate 20.991435
> beacon number 51, on THU AUG 06 2009 10:56:33
> CA beacon hash entry for 129.129.104.14:5064 with period
> estimate 14.935088
> beacon number 218331, on THU AUG 06 2009 10:56:37
> CA beacon hash entry for 129.129.104.15:5064 with period
> estimate 14.881638
> beacon number 218058, on THU AUG 06 2009 10:56:29
> CA beacon hash entry for 129.129.104.30:5064 with period
> estimate 14.953183
> beacon number 0, on THU AUG 06 2009 10:56:39
> CA beacon hash entry for 129.129.104.31:5064 with period
> estimate 14.925085
> beacon number 0, on THU AUG 06 2009 10:56:34
> CA beacon hash entry for 129.129.104.13:32771 with period
> estimate 15.049622
> beacon number 718291, on MON JUN 29 2009 13:59:03
> CA beacon hash entry for 129.129.104.33:5064 with period
> estimate 14.849221
> beacon number 0, on THU AUG 06 2009 10:56:40
> CA beacon hash entry for 129.129.104.34:5064 with period
> estimate 14.891577
> beacon number 0, on THU AUG 06 2009 10:56:30
> CA beacon hash entry for 129.129.104.36:5064 with period
> estimate 14.936137
> beacon number 0, on THU AUG 06 2009 10:56:36
> CA beacon hash entry for 129.129.104.37:5064 with period
> estimate 14.915017
> beacon number 26678, on THU AUG 06 2009 10:56:33
> CA beacon hash entry for 129.129.104.39:5064 with period
> estimate 14.838267
> beacon number 0, on THU AUG 06 2009 10:56:39
> CA beacon hash entry for 129.129.104.13:5064 with period
> estimate 20.991435
> beacon number 51, on THU AUG 06 2009 10:56:33
> CA beacon hash entry for 129.129.104.14:5064 with period
> estimate 14.935088
> beacon number 218331, on THU AUG 06 2009 10:56:37
> CA beacon hash entry for 129.129.104.15:5064 with period
> estimate 14.881638
> beacon number 218058, on THU AUG 06 2009 10:56:29
> CA beacon hash entry for 129.129.104.30:5064 with period
> estimate 14.953183
> beacon number 0, on THU AUG 06 2009 10:56:39
> CA beacon hash entry for 129.129.104.31:5064 with period
> estimate 14.925085
> beacon number 0, on THU AUG 06 2009 10:56:34
> entries per bucket: mean = 0.021484 std dev = 0.144992 max = 1
> Timer queue:
> EPICS threaded timer queue at 0x7bf6240
> epicsTimerQueue with 15 items pending
> 11searchTimer, state = pending, delay = -19.894333
> search timer delay 0.064000
> 0 channels with search request pending
> 0 channels with search response pending
> 11searchTimer, state = pending, delay = -20.000999
> search timer delay 1.024000
> 0 channels with search request pending
> 0 channels with search response pending
> 11searchTimer, state = pending, delay = -20.163666
> search timer delay 0.128000
> 0 channels with search request pending
> 0 channels with search response pending
> 11searchTimer, state = pending, delay = -20.235666
> search timer delay 0.256000
> 0 channels with search request pending
> 0 channels with search response pending
> 11searchTimer, state = pending, delay = -20.296333
> search timer delay 0.512000
> 0 channels with search request pending
> 0 channels with search response pending
> 11searchTimer, state = pending, delay = -18.926999
> search timer delay 2.048000
> 0 channels with search request pending
> 0 channels with search response pending
> 11searchTimer, state = pending, delay = -17.095666
> search timer delay 4.096000
> 0 channels with search request pending
> 0 channels with search response pending
> 23disconnectGovernorTimer, state = pending, delay = -16.191666
> disconnect governor timer:
> 11searchTimer, state = pending, delay = -13.316332
> search timer delay 8.192000
> 0 channels with search request pending
> 0 channels with search response pending
> 11searchTimer, state = pending, delay = -5.424332
> search timer delay 16.384000
> 0 channels with search request pending
> 0 channels with search response pending
> 11searchTimer, state = pending, delay = -5.590332
> search timer delay 32.768000
> 0 channels with search request pending
> 0 channels with search response pending
> 15tcpRecvWatchdog, state = pending, delay = 1.191668
> Receive virtual circuit watchdog at 0x7b96df0, period 30.000000
>
> 11searchTimer, state = pending, delay = 6.361002
> search timer delay 65.536000
> 0 channels with search request pending
> 0 channels with search response pending
> 11searchTimer, state = pending, delay = 51.530337
> search timer delay 131.072000
> 0 channels with search request pending
> 0 channels with search response pending
> 11searchTimer, state = pending, delay = 236.852334
> search timer delay 262.144000
> 0 channels with search request pending
> 0 channels with search response pending
> reschedule event
>
> Semaphore Id : 0x7bf6070
> Semaphore Type : BINARY
> Task Queuing : FIFO
> Pended Tasks : 0
> State : EMPTY
> Options : 0x0 SEM_Q_FIFO
>
> VxWorks Events
> --------------
> Registered Task : NONE
> Event(s) to Send : N/A
> Options : N/A
>
> exit event
>
> Semaphore Id : 0x7bf6030
> Semaphore Type : BINARY
> Task Queuing : FIFO
> Pended Tasks : 0
> State : EMPTY
> Options : 0x0 SEM_Q_FIFO
>
> VxWorks Events
> --------------
> Registered Task : NONE
> Event(s) to Send : N/A
> Options : N/A
>
> exitFlag = F, terminateFlag = F
> IP address to name conversion engine:
> ipAddrToAsciiEngine at 0x7bf86c0 with 0 requests pending
> mutex:
> epicsMutexId 0x7bf8650 source
> ../../../src/libCom/osi/epicsMutex.cpp line 197
>
> Semaphore Id : 0x7bf8680
> Semaphore Type : MUTEX
> Task Queuing : PRIORITY
> Pended Tasks : 0
> Owner : 0x7a331e0 (tShell)
> Options : 0xd SEM_Q_PRIORITY
> SEM_DELETE_SAFE
> SEM_INVERSION_SAFE
>
> VxWorks Events
> --------------
> Registered Task : NONE
> Event(s) to Send : N/A
> Options : N/A
>
> laborEvent:
>
> Semaphore Id : 0x7bf8610
> Semaphore Type : BINARY
> Task Queuing : FIFO
> Pended Tasks : 1
> State : EMPTY
> Options : 0x0 SEM_Q_FIFO
>
> VxWorks Events
> --------------
> Registered Task : NONE
> Event(s) to Send : N/A
> Options : N/A
>
> Pended Tasks
> ------------
> NAME TID PRI TIMEOUT
> ---------- -------- --- -------
> ipToAsciiProxy 7bf82c0 189 0
>
> exitFlag boolean = 0
> exit event:
> Default mutex:
> epicsMutexId 0x7ca6c20 source
> ../../../src/libCom/osi/epicsMutex.cpp line 197
>
> Semaphore Id : 0x7ca6c50
> Semaphore Type : MUTEX
> Task Queuing : PRIORITY
> Pended Tasks : 5
> Owner : 0x7a331e0 (tShell)
> Options : 0xd SEM_Q_PRIORITY
> SEM_DELETE_SAFE
> SEM_INVERSION_SAFE
>
> VxWorks Events
> --------------
> Registered Task : NONE
> Event(s) to Send : N/A
> Options : N/A
>
> Pended Tasks
> ------------
> NAME TID PRI TIMEOUT
> ---------- -------- --- -------
> scan10 7c2d530 136 0
> CAC-UDP 7be23d0 147 0
> timerQueue 7bf5d20 148 0
> CAC-event 7bb5a90 148 0
> CAC-TCP-recv 7a5e3b0 150 0
>
> mutex:
> epicsMutexId 0x7ca6c20 source
> ../../../src/libCom/osi/epicsMutex.cpp line 197
>
> Semaphore Id : 0x7ca6c50
> Semaphore Type : MUTEX
> Task Queuing : PRIORITY
> Pended Tasks : 5
> Owner : 0x7a331e0 (tShell)
> Options : 0xd SEM_Q_PRIORITY
> SEM_DELETE_SAFE
> SEM_INVERSION_SAFE
>
> VxWorks Events
> --------------
> Registered Task : NONE
> Event(s) to Send : N/A
> Options : N/A
>
> Pended Tasks
> ------------
> NAME TID PRI TIMEOUT
> ---------- -------- --- -------
> scan10 7c2d530 136 0
> CAC-UDP 7be23d0 147 0
> timerQueue 7bf5d20 148 0
> CAC-event 7bb5a90 148 0
> CAC-TCP-recv 7a5e3b0 150 0
>
> preemptive callback is enabled
> there are 0 unsatisfied IO operations blocking ca_pend_io()
> the current io sequence number is 0
> IO done event:
>
> Semaphore Id : 0x7ca6b70
> Semaphore Type : BINARY
> Task Queuing : FIFO
> Pended Tasks : 0
> State : EMPTY
> Options : 0x0 SEM_Q_FIFO
>
> VxWorks Events
> --------------
> Registered Task : NONE
> Event(s) to Send : N/A
> Options : N/A
>
> Synchronous group identifier hash table:
> Hash table with 0 buckets and 0 items of type 4CASG installed
> value = 0 = 0x0
> X04SA-VME-ID2 >
>
>
>
> Jeff Hill wrote:
> > Dirk,
> >
> > The R3.14 design is more conservative from a perspective of avoiding
> > positive congestion feedback. Here are the primary differences.
> >
> > O When a circuit times out the CAC library disconnects the
> application, but
> > not the circuit. We rely on TCP to decide when to give up
> on a circuit, and
> > disconnect it.
> >
> > O When the CAC library sees a beacon anomaly the search
> rate receives a
> > boost, but this boost is only to once every 5 seconds. In
> contrast, newly
> > created channels are initially searched for at a rate which
> is closer to the
> > average measured round trip time. The CAC library sends
> multiple search
> > requests per UDP frame, and depending on past success rates
> also multiple
> > UDP frames per periodic search attempt (that part isn't
> different between
> > R3.14 and R3.15). Nevertheless, if there many channels it
> can take a little
> > while to connect all of them if the IOC (or GW in this
> situation has been
> > off for a long time).
> >
> > O In R3.14 the CAC library never stops searching for
> disconnected channels,
> > but the maximum search rate is adjustable with an
> environment variable.
> >
> >> I cannot see any search requests from that IOC.
> >
> > That doesn't sound right, but bear in mind that if the IOC
> is sending
> > unicasts, and not broadcasts, to the gateway on a switched
> network then the
> > network sniffer may or may not see them if it isn't
> listening at the source
> > or the destination cables. Directed broadcasts may also not
> be seen by the
> > network sniffer if it isn't sniffing directly on the
> directed broadcast's
> > destination subnet.
> >
> > The output from dbcar when its interest level argument is
> set very high (say
> > to 100) will show many things about the internals of the
> CAC library. Also,
> > there is a timer thread created by the CAC library that is
> scheduling the
> > search requests, and you might send a stack trace (if it
> isn't simply
> > waiting on its time delay event flag).
> >
> > Another possibility is that something is amiss in the GW.
> Does restarting
> > the GW improve the situation?
> >
> > Jeff
> > ______________________________________________________
> > Jeffrey O. Hill Email [email protected]
> > LANL MS H820 Voice 505 665 1831
> > Los Alamos NM 87545 USA FAX 505 665 5107
> >
> > Message content: TSPA
> >
> >
> >> -----Original Message-----
> >> From: [email protected]
> [mailto:[email protected]]
> >> On Behalf Of Dirk Zimoch
> >> Sent: Wednesday, August 05, 2009 8:09 AM
> >> To: tech-talk
> >> Subject: CA reconnection problem
> >>
> >> Hi all (Jeff in particular)
> >>
> >> I have seen a problem with some of our IOCs today. Several
> beamline IOCs
> >> running
> >> 3.14.8 have disconnected CA input links to a central IOC
> (3.13.10) even
> >> though
> >> the channels are there. The beamline IOCs run behind ca
> gateways. It may
> >> be that
> >> the gateways or the central IOC had been offline for more
> than some hours
> >> recently in the last shutdown.
> >>
> >> Even though the channels are back and I can read them from
> the beamline
> >> network
> >> with caget the IOCs don't reconnect.
> >>
> >> I have started a softioc in the beamline network to
> generate a beacon
> >> anomaly
> >> but even then the links do not reconnect. I cannot see any
> search requests
> >> from
> >> that IOC.
> >>
> >> Only 3.14.8 systems show this problem. 3.13. systems
> don't. Is there a bug
> >> in
> >> the 3.14.8 implementation of CAC I should know of?
> >>
> >> Dirk
> >>
> >> --
> >> Dr. Dirk Zimoch
> >> Paul Scherrer Institut, WBGB/006
> >> 5232 Villigen PSI, Switzerland
> >> Phone +41 56 310 5182
> >
> >
>
> --
> Dr. Dirk Zimoch
> Paul Scherrer Institut, WBGB/006
> 5232 Villigen PSI, Switzerland
> Phone +41 56 310 5182
>
--
Scanned by iCritical.
- Replies:
- RE: CA reconnection problem Jeff Hill
- References:
- CA reconnection problem Dirk Zimoch
- RE: CA reconnection problem Jeff Hill
- Re: CA reconnection problem Dirk Zimoch
- Navigate by Date:
- Prev:
Re: CA reconnection problem Dirk Zimoch
- Next:
Linux question Mark Rivers
- Index:
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
<2009>
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
- Navigate by Thread:
- Prev:
Re: CA reconnection problem Dirk Zimoch
- Next:
RE: CA reconnection problem Jeff Hill
- Index:
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
<2009>
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
|