Hi Michael:
There's no CA gateway between these IOCs.
The two IOCs are in fact on the same Linux host, and each depends on PVs from the other one.
The sequence of events is:
The IOC that provides a PV used as an input to a channel access security rule shuts down.
The other IOC, which uses that rule, detects the disconnect. It recomputes the access security state, then tries to notify all of its CA clients about their new read/write permissions.
One of those CA clients was the IOC that just shut down. That client is no longer valid, so the notification crashes.
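The hazard can be illustrated with a minimal, single-threaded C sketch. The backtrace shows ellDelete() being called from casAccessRightsCB(), so the sketch models a per-client channel list where a node may already have been unlinked by the client-teardown path before the access-rights path touches it again. Everything here is an assumption for illustration: List, listAdd, listDelete, Channel, channelAdd, channelDestroy and channelRightsChanged are hypothetical names, the list is a simplified stand-in for libCom's ellLib rather than the real implementation, and the "linked" guard is one possible defensive pattern, not the actual rsrv code or the fix for this bug.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical doubly linked list, loosely modelled on libCom's ellLib.
 * NOT the EPICS implementation; just enough to show the hazard. */
typedef struct Node {
    struct Node *next;
    struct Node *previous;
} Node;

typedef struct {
    Node *first;
    Node *last;
    int count;
} List;

static void listAdd(List *l, Node *n)
{
    n->next = NULL;
    n->previous = l->last;
    if (l->last)
        l->last->next = n;
    else
        l->first = n;
    l->last = n;
    l->count++;
}

/* Like ellDelete, this trusts that n is currently linked into l. If n was
 * already unlinked, or freed when its client went away, n->previous and
 * n->next are stale and the first write through them can corrupt memory or
 * segfault. The assert only catches the grossest misuse (empty list). */
static void listDelete(List *l, Node *n)
{
    assert(l->count > 0);
    if (n->previous)
        n->previous->next = n->next;
    else
        l->first = n->next;
    if (n->next)
        n->next->previous = n->previous;
    else
        l->last = n->previous;
    n->next = n->previous = NULL;
    l->count--;
}

/* Hypothetical per-channel record owned by one CA client. */
typedef struct {
    Node node;      /* list linkage */
    bool linked;    /* hypothetical guard: true while on the client's list */
    int rightsSent; /* access-rights updates delivered to this channel */
} Channel;

static void channelAdd(List *chans, Channel *ch)
{
    ch->rightsSent = 0;
    listAdd(chans, &ch->node);
    ch->linked = true;
}

/* Teardown path (client disconnected): unlink exactly once. */
static void channelDestroy(List *chans, Channel *ch)
{
    if (!ch->linked)
        return;
    listDelete(chans, &ch->node);
    ch->linked = false;
}

/* Access-rights path: after an ACF input changes, only notify channels whose
 * client is still alive; a torn-down channel is skipped, not touched again.
 * In a real multi-threaded server both paths would also need to hold the
 * same lock; this single-threaded sketch omits locking entirely. */
static bool channelRightsChanged(Channel *ch)
{
    if (!ch->linked)
        return false;
    ch->rightsSent++;
    return true;
}
```

With the guard, a rights update that arrives after channelDestroy() returns false instead of unlinking the same node a second time; without it, calling listDelete() again on the already-unlinked node would write through stale pointers, which matches the kind of fault seen in frame #0 of the trace.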
Thanks,
Kay
________________________________________
From: [email protected] <[email protected]> on behalf of Michael Davidsaver <[email protected]>
Sent: Saturday, April 16, 2016 1:43 PM
To: [email protected]
Subject: Re: IOC segmentation fault related to CA security
Hi Matt,
I've created https://bugs.launchpad.net/epics-base/+bug/1571224 for this
bug.
I can recall seeing "dbCa:exceptionCallback" associated with ACF
disconnects before, but don't recall any crashes. This might just be
chance. The 'channel "unknown"' certainly stands out.
Would it be easy for you to take a packet capture of the traffic between
these two IOCs when this crash occurs? This might give some clues about
the specific sequence of events which triggers the crash.
Also, is the communication between these two IOCs direct? You don't
mention any CA gateway.
Michael
On 04/15/2016 05:27 PM, Pearson, Matthew R. wrote:
> Hi,
>
> I’ve had a few instances of one of my soft IOCs crashing with a segmentation fault when I shut down another IOC that hosts a PV used in the CA security logic of the crashing IOC.
>
> There is sometimes (but not always) this message printed out before the crash:
>
> dbCa:exceptionCallback stat "Virtual circuit disconnect" channel "unknown" context "cg1d-dassrv1.ornl.gov:5064"
> nativeType DBR_invalid requestType DBR_invalid nativeCount 0 requestCount 0 noReadAccess noWriteAccess
>
> Printing the stack trace:
>
> Core was generated by `../../bin/linux-x86_64/cg1d-parker1 ./st.cmd'.
> Program terminated with signal 11, Segmentation fault.
> #0 0x00007f5321218599 in ellDelete (pList=0x7f52bc000920, pNode=0x7f52ac008ec0) at ../../../src/libCom/ellLib/ellLib.c:87
> 87 pNode->previous->next = pNode->next;
> Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.9.x86_64 libgcc-4.4.7-11.el6.x86_64 libstdc++-4.4.7-11.el6.x86_64 ncurses-libs-5.7-3.20090208.el6.x86_64 readline-6.0-4.el6.x86_64
> (gdb) bt
> #0 0x00007f5321218599 in ellDelete (pList=0x7f52bc000920, pNode=0x7f52ac008ec0) at ../../../src/libCom/ellLib/ellLib.c:87
> #1 0x00007f532217c289 in casAccessRightsCB (ascpvt=0x7f52b8000db8, type=asClientCOAR) at ../camessage.c:1111
> #2 0x00007f5321d64122 in asComputePvt (asClientPvt=0x7f52b8000db8) at ../asLibRoutines.c:1014
> #3 0x00007f5321d63ea0 in asComputeAsgPvt (pasg=0x1f91ee0) at ../asLibRoutines.c:940
> #4 0x00007f5321d62419 in asComputeAsg (pasg=0x1f91ee0) at ../asLibRoutines.c:455
> #5 0x00007f5321d60482 in connectCallback (arg=...) at ../asCa.c:99
> #6 0x00007f53214c1301 in oldChannelNotify::disconnectNotify (this=0x7f52d0000d10, guard=...) at ../oldChannelNotify.cpp:112
> #7 0x00007f53214abe30 in nciu::unresponsiveCircuitNotify (this=0x7f5323714010, cbGuard=..., guard=...) at ../nciu.cpp:171
> #8 0x00007f53214b7c38 in tcpiiu::disconnectAllChannels (this=0x7f52d40008c0, cbGuard=..., guard=..., discIIU=...) at ../tcpiiu.cpp:1834
> #9 0x00007f532149a042 in cac::destroyIIU (this=0x7f52d000ed20, iiu=...) at ../cac.cpp:1227
> #10 0x00007f53214b27e3 in tcpSendThread::run (this=0x7f52d4000a00) at ../tcpiiu.cpp:229
> #11 0x00007f532122bc51 in epicsThreadCallEntryPoint (pPvt=0x7f52d4000a08) at ../../../src/libCom/osi/epicsThread.cpp:85
> #12 0x00007f53212333ce in start_routine (arg=0x7f52d400a250) at ../../../src/libCom/osi/os/posix/osdThread.c:385
> #13 0x00007f53204a29d1 in start_thread () from /lib64/libpthread.so.0
> #14 0x00007f53207a08fd in clone () from /lib64/libc.so.6
>
>
> The crashed IOC has a CA access security rule that looks like:
>
> ASG(DEFAULT)
> {
>     INPA("$(P):Scan:Active")
>     RULE(1, READ)
>     RULE(1, WRITE)
>     {
>         CALC("A=0")
>     }
>     RULE(1, WRITE)
>     {
>         UAG(epics, beamline, detector)
>         HAG(beamline)
>         CALC("A=1")
>     }
> }
>
> where $(P):Scan:Active is hosted by the IOC that I’m shutting down.
>
> In addition, there is a channel access link (a CP link) between the two IOCs.
>
> I can’t reliably reproduce it, but it’s happened a few times today as I was testing it (stopping and starting the IOC hosting $(P):Scan:Active perhaps 20 times).
>
> Anybody have ideas about this?
>
> Our base version is 3.14.12.4 running on RHEL6.
>
> Cheers,
> Matt
>
>
> Data Acquisition and Control Engineer
> Spallation Neutron Source
> Oak Ridge National Lab