EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Serious issue with JCA / CAJ
From: Matej Sekoranja <[email protected]>
To: "[email protected]" <[email protected]>
Date: Tue, 3 Sep 2013 22:36:09 +0200
I confirm that Michael is right.

The server responded with a CA exception message. Version 1.1.10 had a
bug in decoding exception responses, introduced when CAJ array handling
performance was improved in 1.1.10. As a result, the monitor was not
marked as invalid. This was fixed in 1.1.11.
Gabriele has just fixed CAJ/JCA download links so that you can get the
latest versions.
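
The stack trace in Ralph's report below shows an
UnsupportedOperationException thrown from ByteBuffer.array() inside
ExceptionResponse.internalHandleResponse. ByteBuffer.array() throws that
exception when the buffer is not backed by an accessible array (a direct
buffer, for example). Here is a minimal sketch of a defensive way to copy
bytes out of either kind of buffer. It is only an illustration, not the
actual change made in CAJ 1.1.11; the class and method names are
hypothetical.

```java
import java.nio.ByteBuffer;

class BufferCopySketch {
    /** Copies the next `length` bytes out of `buf` without assuming a backing array. */
    public static byte[] readBytes(ByteBuffer buf, int length) {
        byte[] out = new byte[length];
        if (buf.hasArray()) {
            // Heap buffer: copy from the backing array, then advance the position.
            System.arraycopy(buf.array(), buf.arrayOffset() + buf.position(), out, 0, length);
            buf.position(buf.position() + length);
        } else {
            // No accessible array (e.g. direct buffer): array() would throw,
            // so use a relative bulk get instead.
            buf.get(out);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3, 4};
        ByteBuffer heap = ByteBuffer.wrap(payload);
        ByteBuffer direct = ByteBuffer.allocateDirect(4);
        direct.put(payload).flip();
        System.out.println(readBytes(heap, 4).length);   // prints 4
        System.out.println(readBytes(direct, 4).length); // prints 4
    }
}
```

Calling buf.get(out) unconditionally would also work; the hasArray()
branch is only there to show where array() is safe to call.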

One issue remains. Imagine an invalid request is sent when a channel
is created. The server drops the connection. The client detects this
and tries to reconnect, i.e. it sends UDP search requests. The server
responds and the client reconnects. On channel creation the client
issues the invalid request again... now you have a live-lock :)
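
The cycle can be sketched as a small simulation. Everything in it is
hypothetical (class, method and parameter names, and the idea of counting
server-forced disconnects); it is not CAJ code. The guard shown, which
stops replaying a request after it has been rejected a number of times,
is just one possible mitigation and not a description of what CAJ does.

```java
class LiveLockSketch {
    /**
     * Simulates the reconnect cycle. Returns the connect attempt at which the
     * client settles, or -1 if it is still cycling after maxCycles attempts.
     */
    public static int simulate(boolean requestIsInvalid, int maxRejects, int maxCycles) {
        int rejects = 0;
        for (int attempt = 1; attempt <= maxCycles; attempt++) {
            // Assumed: the UDP search succeeds and the connection is re-established.
            if (rejects >= maxRejects) {
                return attempt; // guard: stop replaying the rejected request
            }
            // On channel creation the client issues the request again.
            if (!requestIsInvalid) {
                return attempt; // server accepts it, connection stays up
            }
            rejects++; // server drops the connection, client reconnects
        }
        return -1; // never settled: live-lock
    }

    public static void main(String[] args) {
        System.out.println(simulate(true, Integer.MAX_VALUE, 1000)); // prints -1
        System.out.println(simulate(true, 3, 1000));                 // prints 4
        System.out.println(simulate(false, 3, 1000));                // prints 1
    }
}
```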

Matej

>
> Begin forwarded message:
>
> From: Michael Davidsaver <[email protected]>
> Subject: Re: Serious issue with JCA / CAJ
> Date: 3 September 2013 16:54:31 GMT+02:00
> To: [email protected]
>
> Ralph,
>
> This may be related to the issue described here.
>
> http://www.aps.anl.gov/epics/tech-talk/2012/msg02352.php
>
> Michael
>
>
>
> On 09/03/2013 06:23 AM, Ralph Lange wrote:
>
> Hi,
>
> ITER is experiencing issues with JCA and CAJ, leading to a CSS application
> freezing/crashing or running happily, depending on a bugfix level change in
> EPICS base on the server side.
>
> Nadine Utzel and I were running a few tests, showing the questionable
> behaviour.
>
>
> Setup 1:
> Server: Gateway 2.0.4.0 using Base 3.14.12.2
> Client: CSS using JCA 2.3.6 with JNI to Base 3.14.12.3
>
> When opening panels, or switching tabs in BOY tabbed containers, there are
> situations where the gateway prints:
>
> CAS:
> Sep 03 10:17:35 !!! Errlog message received (message is above)
> CAS Request: utzeln on 4501WS-CC-0006.codac.iter.org: cmd=2 cid=135 typ=34
> cnt=1 psz=0 avail=1b7
> bad resource id in "../../../../src/cas/generic/casStrmClient.cc" at line
> 2203
>
> Sep 03 10:17:35 !!! Errlog message received (message is above)
> filename="../../../../src/cas/generic/st/casStreamOS.cc" line number=479
> Bad resource identifier - unexpected problem with client's input - forcing
> disconnect
>
> Sep 03 10:17:35 !!! Errlog message received (message is above)
>
> while the CSS console shows:
>
> 2013-09-03 08:17:35.185 WARNING [Thread 40]
> org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) -
> Channel Access Exception from gov.aps.jca.jni.ThreadSafeContext@179bf1b3:
> Status: Bad event subscription (monitor) identifier
> Info: host=ca-gateway-util.codac.iter.org:5064 ctx=Bad Resource ID=439
> detected at ../../../../src/cas/generic/casStrmClient.cc.2203
> file: null at line 0
> 2013-09-03 08:17:35.186 WARNING [Thread 58]
> org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) -
> Channel Access Exception from gov.aps.jca.jni.ThreadSafeContext@179bf1b3:
> Status: Virtual circuit disconnect
> Info: ca-gateway-util.codac.iter.org:5064
> file: ../cac.cpp at line 1214
> 2013-09-03 08:17:35.442 WARNING [Thread 39]
> org.csstudio.utility.pv.epics.JCACommandThread (run) - JCACommandThread
> exception
> java.lang.IllegalStateException: Invalid channel
>                 at
> gov.aps.jca.jni.JNIChannel.assertState(JNIChannel.java:71)
>                 at
> gov.aps.jca.jni.JNIChannel.getConnectionState(JNIChannel.java:221)
>                 at org.csstudio.utility.pv.epics.EPICS_V3_PV$3.run(Unknown
> Source)
>                 at
> org.csstudio.utility.pv.epics.JCACommandThread.run(Unknown Source)
> [...]
>
> The last exception gets repeated many times, probably once per PV.
> Many channels reconnect, some stay disconnected and will never connect.
>
> Sometimes that last repeated exception on the client side does not occur.
>
> The bad id followed by server disconnect also causes exceptions to be
> printed to STDERR of CSS:
>
> 2013-09-03 08:24:22.335 WARNING [Thread 102]
> org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) -
> Channel Access Exception from gov.aps.jca.jni.ThreadSafeContext@179bf1b3:
> Status: Bad event subscription (monitor) identifier
> Info: host=ca-gateway-util.codac.iter.org:5064 ctx=Bad Resource ID=6334
> detected at ../../../../src/cas/generic/casStrmClient.cc.2203
> file: null at line 0
> 2013-09-03 08:24:22.336 WARNING [Thread 111]
> org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) -
> Channel Access Exception from gov.aps.jca.jni.ThreadSafeContext@179bf1b3:
> Status: Virtual circuit disconnect
> Info: ca-gateway-util.codac.iter.org:5064
> file: ../cac.cpp at line 1214
>
>
> Setup 2:
> Server: Gateway 2.0.4.0 using Base 3.14.12.2
> Client: CSS using JCA 2.3.6 with CAJ 1.1.10
>
> The gateway shows similar error messages, but always in pairs:
>
> CAS:
> Sep 03 10:44:50 !!! Errlog message received (message is above)
> bad resource id in "../../../../src/cas/generic/casStrmClient.cc" at line
> 2203
>
> Sep 03 10:44:50 !!! Errlog message received (message is above)
> CAS Request: utzeln on 4501WS-CC-0006.codac.iter.org: cmd=2 cid=135 typ=34
> cnt=1 psz=0 avail=1ca
> filename="../../../../src/cas/generic/st/casStreamOS.cc" line number=479
> Bad resource identifier - unexpected problem with client's input - forcing
> disconnect
>
> Sep 03 10:44:50 !!! Errlog message received (message is above)
> CAS:
> Sep 03 10:44:51 !!! Errlog message received (message is above)
> CAS Request: utzeln on 4501WS-CC-0006.codac.iter.org: cmd=2 cid=204 typ=17
> cnt=1 psz=0 avail=31d
> bad resource id in "../../../../src/cas/generic/casStrmClient.cc" at line
> 2203
>
> Sep 03 10:44:51 !!! Errlog message received (message is above)
> filename="../../../../src/cas/generic/st/casStreamOS.cc" line number=479
> Bad resource identifier - unexpected problem with client's input - forcing
> disconnect
>
> Sep 03 10:44:51 !!! Errlog message received (message is above)
>
> CSS freezes immediately and has to be killed manually (no access to CSS
> console). STDERR shows many times (once per PV?):
>
> 2013-09-03 08:47:01.439 SEVERE [Thread 41]
> com.cosylab.epics.caj.impl.CATransport (processRead) -
> java.lang.UnsupportedOperationException
>                 at java.nio.ByteBuffer.array(ByteBuffer.java:959)
>                 at
> com.cosylab.epics.caj.impl.handlers.ExceptionResponse.internalHandleResponse(ExceptionResponse.java:130)
>                 at
> com.cosylab.epics.caj.impl.handlers.AbstractCAResponseHandler.handleResponse(AbstractCAResponseHandler.java:110)
>                 at
> com.cosylab.epics.caj.impl.CAResponseHandler.handleResponse(CAResponseHandler.java:139)
>                 at
> com.cosylab.epics.caj.impl.CATransport.processRead(CATransport.java:530)
>                 at
> com.cosylab.epics.caj.impl.CATransport.processRead(CATransport.java:412)
>                 at
> com.cosylab.epics.caj.impl.CATransport.handleEvent(CATransport.java:350)
>                 at
> com.cosylab.epics.caj.impl.reactor.lf.LeaderFollowersHandler.handleEvent(LeaderFollowersHandler.java:77)
>                 at
> com.cosylab.epics.caj.impl.reactor.Reactor.processInternal(Reactor.java:400)
>                 at
> com.cosylab.epics.caj.impl.reactor.Reactor.process(Reactor.java:284)
>                 at
> com.cosylab.epics.caj.impl.reactor.lf.LeaderFollowersHandler.run(LeaderFollowersHandler.java:91)
>                 at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>                 at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>                 at java.lang.Thread.run(Thread.java:722)
> [...]
>
> followed by the really bad guy (when CSS freezes):
>
> 2013-09-03 08:49:15.736 WARNING [Thread 41]
> org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) -
> Channel Access Exception from com.cosylab.epics.caj.CAJContext@15546ea6:
> Virtual circuit disconnect
> 2013-09-03 08:49:15.757 SEVERE [Thread 1]
> org.csstudio.utility.pv.epics.EPICS_V3_PV (handleConnected) -
> UTIL-S15-AG91:MUT3-JT1 connection handling error
> java.lang.IllegalStateException: transport closed
>                 at
> com.cosylab.epics.caj.impl.CATransport.submit(CATransport.java:827)
>                 at
> com.cosylab.epics.caj.impl.requests.AbstractCARequest.submit(AbstractCARequest.java:88)
>                 at
> com.cosylab.epics.caj.impl.requests.ReadNotifyRequest.submit(ReadNotifyRequest.java:171)
>                 at com.cosylab.epics.caj.CAJChannel.get(CAJChannel.java:952)
>                 at
> org.csstudio.utility.pv.epics.EPICS_V3_PV.handleConnected(Unknown Source)
>                 at org.csstudio.utility.pv.epics.EPICS_V3_PV.connect(Unknown
> Source)
>                 at org.csstudio.utility.pv.epics.EPICS_V3_PV.start(Unknown
> Source)
>                 at
> org.csstudio.opibuilder.editparts.PVWidgetEditpartDelegate.startPVs(Unknown
> Source)
>                 at
> org.csstudio.opibuilder.editparts.AbstractPVWidgetEditPart.activate(Unknown
> Source)
>                 at
> org.csstudio.opibuilder.widgets.editparts.TextUpdateEditPart.activate(Unknown
> Source)
>                 at
> org.eclipse.gef.editparts.AbstractEditPart.activate(AbstractEditPart.java:160)
>                 at
> org.eclipse.gef.editparts.AbstractGraphicalEditPart.activate(AbstractGraphicalEditPart.java:195)
>                 at
> org.csstudio.opibuilder.editparts.AbstractBaseEditPart.activate(Unknown
> Source)
>                 at
> org.csstudio.opibuilder.widgets.editparts.GroupingContainerEditPart.activate(Unknown
> Source)
>                 at
> org.eclipse.gef.editparts.AbstractEditPart.activate(AbstractEditPart.java:160)
>                 at
> org.eclipse.gef.editparts.AbstractGraphicalEditPart.activate(AbstractGraphicalEditPart.java:195)
>                 at
> org.csstudio.opibuilder.editparts.AbstractBaseEditPart.activate(Unknown
> Source)
>                 at
> org.csstudio.opibuilder.widgets.editparts.GroupingContainerEditPart.activate(Unknown
> Source)
>                 at
> org.eclipse.gef.editparts.AbstractEditPart.activate(AbstractEditPart.java:160)
>                 at
> org.eclipse.gef.editparts.AbstractGraphicalEditPart.activate(AbstractGraphicalEditPart.java:195)
>                 at
> org.csstudio.opibuilder.editparts.AbstractBaseEditPart.activate(Unknown
> Source)
>                 at
> org.csstudio.opibuilder.widgets.editparts.TabEditPart.activate(Unknown
> Source)
>                 at
> org.eclipse.gef.editparts.AbstractEditPart.activate(AbstractEditPart.java:160)
>                 at
> org.eclipse.gef.editparts.AbstractGraphicalEditPart.activate(AbstractGraphicalEditPart.java:195)
>                 at
> org.csstudio.opibuilder.editparts.AbstractBaseEditPart.activate(Unknown
> Source)
>                 at
> org.csstudio.opibuilder.editparts.DisplayEditpart.activate(Unknown Source)
>                 at
> org.eclipse.gef.editparts.AbstractEditPart.addChild(AbstractEditPart.java:215)
>                 at
> org.eclipse.gef.editparts.SimpleRootEditPart.setContents(SimpleRootEditPart.java:105)
>                 at
> org.eclipse.gef.ui.parts.AbstractEditPartViewer.setContents(AbstractEditPartViewer.java:617)
>                 at
> org.eclipse.gef.ui.parts.AbstractEditPartViewer.setContents(AbstractEditPartViewer.java:626)
>                 at
> org.csstudio.opibuilder.runmode.OPIRuntimeDelegate.init(Unknown Source)
>                 at org.csstudio.opibuilder.runmode.OPIRunner.init(Unknown
> Source)
>                 at
> org.csstudio.opibuilder.runmode.OPIRunner.setOPIInput(Unknown Source)
>                 at
> org.csstudio.opibuilder.runmode.RunModeService.replaceOPIRuntimeContent(Unknown
> Source)
>                 at
> org.csstudio.opibuilder.widgetActions.OpenDisplayAction.openOPI(Unknown
> Source)
>                 at
> org.csstudio.opibuilder.widgetActions.AbstractOpenOPIAction.run(Unknown
> Source)
>                 at
> org.csstudio.opibuilder.widgets.editparts.Draw2DButtonEditPartDelegate$1.actionPerformed(Unknown
> Source)
>                 at
> org.csstudio.swt.widgets.figures.ActionButtonFigure.fireActionPerformed(Unknown
> Source)
>                 at
> org.csstudio.swt.widgets.figures.ActionButtonFigure$ButtonEventHandler.mouseReleased(Unknown
> Source)
>                 at
> org.eclipse.draw2d.Figure.handleMouseReleased(Figure.java:944)
>                 at
> org.eclipse.draw2d.SWTEventDispatcher.dispatchMouseReleased(SWTEventDispatcher.java:267)
>                 at
> org.eclipse.gef.ui.parts.DomainEventDispatcher.dispatchMouseReleased(DomainEventDispatcher.java:374)
>                 at
> org.eclipse.draw2d.LightweightSystem$EventHandler.mouseUp(LightweightSystem.java:548)
>                 at
> org.eclipse.swt.widgets.TypedListener.handleEvent(TypedListener.java:219)
>                 at
> org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84)
>                 at
> org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1258)
>                 at
> org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:3588)
>                 at
> org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3209)
>                 at
> org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:2701)
>                 at
> org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2665)
>                 at
> org.eclipse.ui.internal.Workbench.access$4(Workbench.java:2499)
>                 at
> org.eclipse.ui.internal.Workbench$7.run(Workbench.java:679)
>                 at
> org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:332)
>                 at
> org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:668)
>                 at
> org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:149)
>                 at
> org.csstudio.utility.product.Workbench.runWorkbench(Unknown Source)
>                 at
> org.csstudio.startup.application.Application.startApplication(Unknown
> Source)
>                 at
> org.csstudio.startup.application.Application.start(Unknown Source)
>                 at
> org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:196)
>                 [...]
>
>
> Now for the fun part:
>
> Setup 3:
> Server: Gateway 2.0.4.0 using Base 3.14.12.3
> Client: CSS using JCA 2.3.6 with CAJ 1.1.10
>
> No errors whatsoever.
>
>
> Bottom line:
> Switching the CAS CA server from base 3.14.12.2 to 3.14.12.3 determines if
> the pure Java CA client will die a horrible death or just work fine.
>
> Considering that Channel Access is the main separation layer that enables
> clients and servers of control systems to be updated independently, and that
> C/C++ Channel Access works reliably across virtually any combination of Base
> between 3.13 and 3.15, I would say this is very bad behaviour and should be
> considered a serious bug.
>
> I know that the client is not using the latest version of CAJ, though. Has
> this issue been addressed?
>
> Has anyone else seen this? I assume this is connected to LP issue 730720
> [1]?
> The only relevant change in CAS was adding support for the DBE_PROPERTY
> flag.
>
> Thanks a lot,
> ~Ralph
>
> [1] https://bugs.launchpad.net/epics-base/+bug/730720
>
>
>
