EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System

1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  <20132014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024  Index 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  <20132014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024 
<== Date ==> <== Thread ==>

Subject: Serious issue with JCA / CAJ
From: Ralph Lange <[email protected]>
To: EPICS Tech Talk <[email protected]>
Date: Tue, 03 Sep 2013 12:23:02 +0200
Hi,

ITER is experiencing issues with JCA and CAJ, leading to a CSS application freezing/crashing or running happily, depending on a bugfix level change in EPICS base on the server side.

Nadine Utzel and I were running a few tests, showing the questionable behaviour.


Setup 1:
Server: Gateway 2.0.4.0 using Base 3.14.12.2
Client: CSS using JCA 2.3.6 with JNI to Base 3.14.12.3

When opening panels, or switching tabs in BOY tabbed containers, there are situations where the gateway prints:

CAS:

Sep 03 10:17:35 !!! Errlog message received (message is above)
CAS Request: utzeln on 4501WS-CC-0006.codac.iter.org: cmd=2 cid=135 typ=34 cnt=1 psz=0 avail=1b7
bad resource id in "../../../../src/cas/generic/casStrmClient.cc" at line 2203

Sep 03 10:17:35 !!! Errlog message received (message is above)
filename="../../../../src/cas/generic/st/casStreamOS.cc" line number=479
Bad resource identifier - unexpected problem with client's input - forcing disconnect

Sep 03 10:17:35 !!! Errlog message received (message is above)

while the CSS console shows:

2013-09-03 08:17:35.185 WARNING [Thread 40] org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) - Channel Access Exception from gov.aps.jca.jni.ThreadSafeContext@179bf1b3: Status: Bad event subscription (monitor) identifier

Info: host=ca-gateway-util.codac.iter.org:5064 ctx=Bad Resource ID=439 detected at ../../../../src/cas/generic/casStrmClient.cc.2203
file: null at line 0
2013-09-03 08:17:35.186 WARNING [Thread 58] org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) - Channel Access Exception from gov.aps.jca.jni.ThreadSafeContext@179bf1b3: Status: Virtual circuit disconnect
Info: ca-gateway-util.codac.iter.org:5064
file: ../cac.cpp at line 1214
2013-09-03 08:17:35.442 WARNING [Thread 39] org.csstudio.utility.pv.epics.JCACommandThread (run) - JCACommandThread exception
java.lang.IllegalStateException: Invalid channel
                at gov.aps.jca.jni.JNIChannel.assertState(JNIChannel.java:71)
                at gov.aps.jca.jni.JNIChannel.getConnectionState(JNIChannel.java:221)
                at org.csstudio.utility.pv.epics.EPICS_V3_PV$3.run(Unknown Source)
                at org.csstudio.utility.pv.epics.JCACommandThread.run(Unknown Source)
[...]


The last exception gets repeated many times, probably once per PV.
Many channels reconnect, some stay disconnected and will never connect.

Sometimes that last repeated exception on the client side does not occur.

The bad id followed by server disconnect also causes exceptions to be printed to STDERR of CSS:

2013-09-03 08:24:22.335 WARNING [Thread 102] org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) - Channel Access Exception from gov.aps.jca.jni.ThreadSafeContext@179bf1b3: Status: Bad event subscription (monitor) identifier
Info: host=ca-gateway-util.codac.iter.org:5064 ctx=Bad Resource ID=6334 detected at ../../../../src/cas/generic/casStrmClient.cc.2203

file: null at line 0

2013-09-03 08:24:22.336 WARNING [Thread 111] org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) - Channel Access Exception from gov.aps.jca.jni.ThreadSafeContext@179bf1b3: Status: Virtual circuit disconnect

Info: ca-gateway-util.codac.iter.org:5064

file: ../cac.cpp at line 1214


Setup 2:
Server: Gateway 2.0.4.0 using Base 3.14.12.2
Client: CSS using JCA 2.3.6 with CAJ 1.1.10

The gateway shows similar error messages, but always in pairs of two:

CAS:
Sep 03 10:44:50 !!! Errlog message received (message is above)
bad resource id in "../../../../src/cas/generic/casStrmClient.cc" at line 2203

Sep 03 10:44:50 !!! Errlog message received (message is above)
CAS Request: utzeln on 4501WS-CC-0006.codac.iter.org: cmd=2 cid=135 typ=34 cnt=1 psz=0 avail=1ca
filename="../../../../src/cas/generic/st/casStreamOS.cc" line number=479
Bad resource identifier - unexpected problem with client's input - forcing disconnect

Sep 03 10:44:50 !!! Errlog message received (message is above)
CAS:
Sep 03 10:44:51 !!! Errlog message received (message is above)
CAS Request: utzeln on 4501WS-CC-0006.codac.iter.org: cmd=2 cid=204 typ=17 cnt=1 psz=0 avail=31d
bad resource id in "../../../../src/cas/generic/casStrmClient.cc" at line 2203

Sep 03 10:44:51 !!! Errlog message received (message is above)
filename="../../../../src/cas/generic/st/casStreamOS.cc" line number=479
Bad resource identifier - unexpected problem with client's input - forcing disconnect

Sep 03 10:44:51 !!! Errlog message received (message is above)

CSS freezes immediately and has to be killed manually (no access to CSS console). STDERR shows many times (once per PV?):

2013-09-03 08:47:01.439 SEVERE [Thread 41] com.cosylab.epics.caj.impl.CATransport (processRead) -
java.lang.UnsupportedOperationException

                at java.nio.ByteBuffer.array(ByteBuffer.java:959)

                at com.cosylab.epics.caj.impl.handlers.ExceptionResponse.internalHandleResponse(ExceptionResponse.java:130)

                at com.cosylab.epics.caj.impl.handlers.AbstractCAResponseHandler.handleResponse(AbstractCAResponseHandler.java:110)

                at com.cosylab.epics.caj.impl.CAResponseHandler.handleResponse(CAResponseHandler.java:139)

                at com.cosylab.epics.caj.impl.CATransport.processRead(CATransport.java:530)

                at com.cosylab.epics.caj.impl.CATransport.processRead(CATransport.java:412)

                at com.cosylab.epics.caj.impl.CATransport.handleEvent(CATransport.java:350)

                at com.cosylab.epics.caj.impl.reactor.lf.LeaderFollowersHandler.handleEvent(LeaderFollowersHandler.java:77)

                at com.cosylab.epics.caj.impl.reactor.Reactor.processInternal(Reactor.java:400)

                at com.cosylab.epics.caj.impl.reactor.Reactor.process(Reactor.java:284)

                at com.cosylab.epics.caj.impl.reactor.lf.LeaderFollowersHandler.run(LeaderFollowersHandler.java:91)

                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)

                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)

                at java.lang.Thread.run(Thread.java:722)
[...]


followed by the really bad guy (when CSS freezes):

2013-09-03 08:49:15.736 WARNING [Thread 41] org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) - Channel Access Exception from com.cosylab.epics.caj.CAJContext@15546ea6: Virtual circuit disconnect
2013-09-03 08:49:15.757 SEVERE [Thread 1] org.csstudio.utility.pv.epics.EPICS_V3_PV (handleConnected) - UTIL-S15-AG91:MUT3-JT1 connection handling error

java.lang.IllegalStateException: transport closed

                at com.cosylab.epics.caj.impl.CATransport.submit(CATransport.java:827)

                at com.cosylab.epics.caj.impl.requests.AbstractCARequest.submit(AbstractCARequest.java:88)
                at com.cosylab.epics.caj.impl.requests.ReadNotifyRequest.submit(ReadNotifyRequest.java:171)

                at com.cosylab.epics.caj.CAJChannel.get(CAJChannel.java:952)

                at org.csstudio.utility.pv.epics.EPICS_V3_PV.handleConnected(Unknown Source)

                at org.csstudio.utility.pv.epics.EPICS_V3_PV.connect(Unknown Source)

                at org.csstudio.utility.pv.epics.EPICS_V3_PV.start(Unknown Source)

                at org.csstudio.opibuilder.editparts.PVWidgetEditpartDelegate.startPVs(Unknown Source)

                at org.csstudio.opibuilder.editparts.AbstractPVWidgetEditPart.activate(Unknown Source)

                at org.csstudio.opibuilder.widgets.editparts.TextUpdateEditPart.activate(Unknown Source)

                at org.eclipse.gef.editparts.AbstractEditPart.activate(AbstractEditPart.java:160)

                at org.eclipse.gef.editparts.AbstractGraphicalEditPart.activate(AbstractGraphicalEditPart.java:195)

                at org.csstudio.opibuilder.editparts.AbstractBaseEditPart.activate(Unknown Source)

                at org.csstudio.opibuilder.widgets.editparts.GroupingContainerEditPart.activate(Unknown Source)

                at org.eclipse.gef.editparts.AbstractEditPart.activate(AbstractEditPart.java:160)

                at org.eclipse.gef.editparts.AbstractGraphicalEditPart.activate(AbstractGraphicalEditPart.java:195)

                at org.csstudio.opibuilder.editparts.AbstractBaseEditPart.activate(Unknown Source)

                at org.csstudio.opibuilder.widgets.editparts.GroupingContainerEditPart.activate(Unknown Source)

                at org.eclipse.gef.editparts.AbstractEditPart.activate(AbstractEditPart.java:160)

                at org.eclipse.gef.editparts.AbstractGraphicalEditPart.activate(AbstractGraphicalEditPart.java:195)

                at org.csstudio.opibuilder.editparts.AbstractBaseEditPart.activate(Unknown Source)

                at org.csstudio.opibuilder.widgets.editparts.TabEditPart.activate(Unknown Source)

                at org.eclipse.gef.editparts.AbstractEditPart.activate(AbstractEditPart.java:160)

                at org.eclipse.gef.editparts.AbstractGraphicalEditPart.activate(AbstractGraphicalEditPart.java:195)

                at org.csstudio.opibuilder.editparts.AbstractBaseEditPart.activate(Unknown Source)

                at org.csstudio.opibuilder.editparts.DisplayEditpart.activate(Unknown Source)

                at org.eclipse.gef.editparts.AbstractEditPart.addChild(AbstractEditPart.java:215)

                at org.eclipse.gef.editparts.SimpleRootEditPart.setContents(SimpleRootEditPart.java:105)

                at org.eclipse.gef.ui.parts.AbstractEditPartViewer.setContents(AbstractEditPartViewer.java:617)

                at org.eclipse.gef.ui.parts.AbstractEditPartViewer.setContents(AbstractEditPartViewer.java:626)

                at org.csstudio.opibuilder.runmode.OPIRuntimeDelegate.init(Unknown Source)

                at org.csstudio.opibuilder.runmode.OPIRunner.init(Unknown Source)

                at org.csstudio.opibuilder.runmode.OPIRunner.setOPIInput(Unknown Source)

                at org.csstudio.opibuilder.runmode.RunModeService.replaceOPIRuntimeContent(Unknown Source)

                at org.csstudio.opibuilder.widgetActions.OpenDisplayAction.openOPI(Unknown Source)

                at org.csstudio.opibuilder.widgetActions.AbstractOpenOPIAction.run(Unknown Source)

                at org.csstudio.opibuilder.widgets.editparts.Draw2DButtonEditPartDelegate$1.actionPerformed(Unknown Source)

                at org.csstudio.swt.widgets.figures.ActionButtonFigure.fireActionPerformed(Unknown Source)

                at org.csstudio.swt.widgets.figures.ActionButtonFigure$ButtonEventHandler.mouseReleased(Unknown Source)

                at org.eclipse.draw2d.Figure.handleMouseReleased(Figure.java:944)

                at org.eclipse.draw2d.SWTEventDispatcher.dispatchMouseReleased(SWTEventDispatcher.java:267)

                at org.eclipse.gef.ui.parts.DomainEventDispatcher.dispatchMouseReleased(DomainEventDispatcher.java:374)

                at org.eclipse.draw2d.LightweightSystem$EventHandler.mouseUp(LightweightSystem.java:548)

                at org.eclipse.swt.widgets.TypedListener.handleEvent(TypedListener.java:219)

                at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84)

                at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1258)

                at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:3588)

                at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3209)

                at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:2701)

                at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2665)

                at org.eclipse.ui.internal.Workbench.access$4(Workbench.java:2499)

                at org.eclipse.ui.internal.Workbench$7.run(Workbench.java:679)

                at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:332)

                at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:668)

                at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:149)

                at org.csstudio.utility.product.Workbench.runWorkbench(Unknown Source)

                at org.csstudio.startup.application.Application.startApplication(Unknown Source)

                at org.csstudio.startup.application.Application.start(Unknown Source)

                at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:196)
                [...]


Now for the fun part:

Setup 3:

Server: Gateway 2.0.4.0 using Base 3.14.12.3
Client: CSS using JCA 2.3.6 with CAJ 1.1.10

No errors whatsoever.


Bottom line:
Switching the CAS CA server from base 3.14.12.2 to 3.14.12.3 determines if the pure Java CA client will die a horrible death or just work fine.

Considering that Channel Access is the main separation layer that enables clients and servers of control systems to be updated independently, and that C/C++ Channel Access works reliably across virtually any combination of Base between 3.13 and 3.15, I would say this is very bad behaviour and should be considered a serious bug.

I know that the client is not using the latest version of CAJ, though. Has this issue been addressed?

Has anyone else seen this? I assume this is connected to LP issue 730720 [1]?
The only relevant change in CAS was adding support for the DBE_PROPERTY flag.

Thanks a lot,
~Ralph

[1] https://bugs.launchpad.net/epics-base/+bug/730720

Replies:
RE: Serious issue with JCA / CAJ Mark Rivers
Re: Serious issue with JCA / CAJ Michael Davidsaver

Navigate by Date:
Prev: Vacancy at Diamond Light Source for Accelerator Control Systems Leader mark.heron
Next: RE: Serious issue with JCA / CAJ Mark Rivers
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  <20132014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024 
Navigate by Thread:
Prev: Vacancy at Diamond Light Source for Accelerator Control Systems Leader mark.heron
Next: RE: Serious issue with JCA / CAJ Mark Rivers
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  <20132014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024 
ANJ, 20 Apr 2015 Valid HTML 4.01! · Home · News · About · Base · Modules · Extensions · Distributions · Download ·
· Search · EPICS V4 · IRMIS · Talk · Bugs · Documents · Links · Licensing ·