EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: casStrmClient.cc error: invalid channel identifier
From: "Hill, Jeffrey O" <[email protected]>
To: Emma Shepherd <[email protected]>, Tim Mooney <[email protected]>, Mark Rivers <[email protected]>
Cc: Emmanuel Vettoor <[email protected]>, "[email protected]" <[email protected]>
Date: Tue, 3 Apr 2012 16:31:39 +0000

Hi Emma,

> Thanks for the replies.  At the moment I'm relying on reports from the
> beamline scientists so I'm not sure if it's a true 'crash', but it does
> look that way from the logs as other unrelated asyn messages stop being
> generated at that point, and from what I gather the IOC is unusable
> afterwards until it is rebooted.

If it’s a true crash, you will see a suspended task in the vxWorks IOC when you type "i". If so, please type "tt <task id>" and send the result.

Otherwise, if the gateway isn't updating its clients, there is a known issue with earlier versions of the PCAS library/gateway combination, which would stop sending updates to the gateway's clients while a put callback was outstanding. As I recall, when a put callback starts a motor running, there can be quite substantial delays until the completion callback is delivered after the motor stops.

> Mar 18 07:09:19 !!! Errlog message received (message is above)
> filename="../../../../src/cas/generic/st/casStreamOS.cc" line number=582
> Bad resource identifier - unexpected problem with client's input - forcing
> disconnect

This indicates that the PCAS (a component of the gateway) is unable to locate, in its hash table, the unsigned integer resource identifier it received in an incoming message. This certainly could be a sign of some type of data structure corruption, but it does not by itself indicate that the gateway crashed; it appears that the gateway has taken proper evasive action by forcing its badly behaving client to reconnect. One possibility is that the CA client application has passed an invalid channel identifier as an argument (the channel identifier is actually a pointer to a data structure which stores the channel's resource identifier). On embedded systems, invalid (for example, null) pointers are less often caught by the CPU hardware as segmentation violations, which are what produce the suspended-thread scenario.
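
To make the lookup failure described above concrete, here is a minimal Python sketch. It is a toy model only, not the PCAS implementation: the names `ToyServer`, `create_channel`, and `handle_request` are hypothetical, and the real library uses its own hash table and C++ classes. The sketch only shows the general pattern of a server mapping unsigned integer resource identifiers to channels and refusing a request whose identifier it cannot find.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Channel:
    """Hypothetical per-channel record held by the server."""
    name: str


@dataclass
class ToyServer:
    """Toy stand-in for a server-side resource table (an assumption, not PCAS)."""
    table: Dict[int, Channel] = field(default_factory=dict)
    next_id: int = 1

    def create_channel(self, name: str) -> int:
        # Hand out a fresh unsigned integer resource identifier.
        rid = self.next_id
        self.next_id += 1
        self.table[rid] = Channel(name)
        return rid

    def handle_request(self, rid: int) -> str:
        # Look up the identifier that arrived in the incoming message.
        chan = self.table.get(rid)
        if chan is None:
            # Unknown identifier: treat the client's input as bad and
            # drop the connection instead of acting on it.
            return "disconnect"
        return "ok:" + chan.name


if __name__ == "__main__":
    server = ToyServer()
    rid = server.create_channel("demo:pv")
    print(server.handle_request(rid))  # identifier is known to the server
    print(server.handle_request(31))   # identifier was never issued here
```

In this model a request carrying an identifier the server never issued (or no longer holds) is answered with a forced disconnect, which is the defensive behaviour the errlog message reports. The sketch says nothing about why a client would send such an identifier in the first place.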

Jeff

> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Emma Shepherd
> Sent: Monday, April 02, 2012 5:51 PM
> To: Tim Mooney; Mark Rivers
> Cc: [email protected]; Emmanuel Vettoor
> Subject: RE: casStrmClient.cc error: invalid channel identifier
> 
> Hi everyone,
> 
> Thanks for the replies.  At the moment I'm relying on reports from the
> beamline scientists so I'm not sure if it's a true 'crash', but it does
> look that way from the logs as other unrelated asyn messages stop being
> generated at that point, and from what I gather the IOC is unusable
> afterwards until it is rebooted.
> 
> The host IOC is on 3.14.8.2 and the IOC running saveData is on 3.14.9 at
> the moment.  We'll see if we can reproduce it in the lab, but it's
> sounding like it will be worth updating to the latest versions.
> 
> Cheers,
> Emma
> 
> -----Original Message-----
> From: Tim Mooney [mailto:[email protected]]
> Sent: Tuesday, 3 April 2012 2:15 AM
> To: Mark Rivers
> Cc: Emmanuel Vettoor; Emma Shepherd; [email protected]
> Subject: Re: casStrmClient.cc error: invalid channel identifier
> 
> Hi Emma,
> 
> Mark is right that saveData is not a sequence program.  It's just straight
> C code.
> 
> Can you say more about the nature of the ioc crash?  Does it cease to
> function completely, does the saveData task get suspended, do unrelated CA
> tasks also have problems?
> 
> I think I can narrow the scope a little bit.  If saveData has a connection
> through a PV gateway, I think it must be to one of what saveData calls
> "extra" PVs - PVs named in saveData.req, which are read once per scan.
> 
> If so, there are only two CA calls that can have occurred: ca_search() and
> ca_array_get_callback().  ca_search() is only called during saveData's
> initialization, so this is probably a ca_array_get_callback().  saveData
> doesn't use a connection callback, though it does check that the chid is
> nonzero before calling ca_array_get_callback().
> 
> It's pretty common at APS for saveData to connect to an "extra" PV through
> a gateway.  I'm not sure how common it is for the ioc hosting such a PV to
> be rebooted, though I think I've seen it happen personally a few tens of
> times over the last decade or so.
> 
> Tim
> 
> 
> ----- Original Message -----
> From: "Mark Rivers" <[email protected]>
> To: "Emma Shepherd" <[email protected]>, tech-
> [email protected]
> Cc: "Emmanuel Vettoor" <[email protected]>
> Sent: Monday, April 2, 2012 7:39:06 AM
> Subject: RE: casStrmClient.cc error: invalid channel identifier
> 
> I don't think saveData is actually a sequence program, but it is a channel
> access client.  That may be important for tracking down the problem.
> 
> Mark
> 
> ________________________________________
> From: [email protected] [[email protected]] on
> behalf of Emma Shepherd [[email protected]]
> Sent: Monday, April 02, 2012 12:01 AM
> To: [email protected]
> Cc: Emmanuel Vettoor
> Subject: RE: casStrmClient.cc error: invalid channel identifier
> 
> Sorry, I missed a possibly important bit of information - the CA client in
> the IOC is actually the sscan saveData sequence program.
> 
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Emma Shepherd
> Sent: Monday, 2 April 2012 11:29 AM
> To: [email protected]
> Cc: Emmanuel Vettoor
> Subject: casStrmClient.cc error: invalid channel identifier
> 
> Hi,
> 
> I'm trying to track down the cause of an intermittent IOC crash, and came
> across launchpad bug 730720 which seems to be related (although in our
> case it doesn't involve CAJ clients).
> 
> Our symptoms are that when an IOC is rebooted, another IOC that has CA
> links to PVs served by that IOC through a gateway dies.  I believe that
> this happens on reboot of several different IOCs, not just one in
> particular.
> 
> We get the following error messages in the gateway log:
> 
> 
> <snip>
> Mar 18 07:09:11 Warning: Virtual circuit disconnect
> SR00IOC10.accelerator.synchrotron.org.au:5064
> CAS Request: ics on sr05id01ioc01: cmd=2 cid=31 typ=0 cnt=1 psz=0
> avail=101
> CAS:
> Mar 18 07:09:19 !!! Errlog message received (message is above) bad
> resource id in "../../../../src/cas/generic/casStrmClient.cc" at line 1810
> 
> Mar 18 07:09:19 !!! Errlog message received (message is above)
> filename="../../../../src/cas/generic/st/casStreamOS.cc" line number=582
> Bad resource identifier - unexpected problem with client's input - forcing
> disconnect
> 
> Mar 18 07:09:19 !!! Errlog message received (message is above) </snip>
> 
> 
> And the following message in the IOC that subsequently dies:
> 
> 
> <snip>
> CA.Client.Exception...............................................
>     Error: "Invalid channel identifier"
>     Context: "host=10.114.10.246:5064 ctx=Bad Resource ID=31 detected at
> ../../../../src/cas/generic/casStrmClient.cc.1810"
>     Current Time: Sun Mar 18 2012 07:09:19.697135000 </snip>
> 
> A reboot of the IOC is enough to fix it, the gateway itself doesn't need
> to be restarted.
> 
> We are using Gateway Version 2.0.3.0 and EPICS base 3.14.9 running on
> CentOS 4.4.  Has anybody been looking into this issue yet?
> 
> Thanks,
> Emma
> 
> 
> This message and any attachments may contain proprietary or
> confidential information. If you are not the intended recipient or you
> received the message in error, you must not use, copy or distribute the
> message. Please notify the sender immediately and destroy the original
> message. Thank you.
> 
> 
> --
> Tim Mooney ([email protected]) (630)252-5417 Software Services Group
> (www.aps.anl.gov) Advanced Photon Source, Argonne National Lab
> 
> 



