Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: No more CA connections? R3.14.7, 3.14.8.2, CAJ
From: Kay-Uwe Kasemir <kasemirk@ornl.gov>
To: tech talk <tech-talk@aps.anl.gov>
Date: Mon, 24 Apr 2006 15:27:38 -0400
Hi:

At the SNS, we are suddenly experiencing issues with R3.14.7 IOCs under vxWorks.

The symptom: Everything on the IOC looks great,
but the IOC allows no new CA connections.
"casr" shows several existing CA connections,
and they in fact continue to work fine.

Additional suspicious fact:
A "caget some_pv" will typically print the current value of the PV,
or, if the PV cannot be found, give the message "connect timed out: 'some_pv' not found"
after (by default) 2 seconds.

When used with a PV from those affected IOCs,
it will print the error message after 2 seconds,
but then hang for a total of ~30 seconds.
And "casr" on the IOC will show a new connection,
even though caget never got a value!
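To check IOCs for this symptom without watching a terminal, one could time each caget call and flag runs that last far longer than the search timeout. The Python sketch below is a hypothetical helper, not an EPICS tool. It assumes `caget` is on the PATH, takes the ~2 s timeout from the description above, and uses an arbitrary 3 s slack. The `cmd` parameter exists only so another command can be substituted for testing.

```python
import subprocess
import sys
import time


def time_caget(pv, search_timeout=2.0, slack=3.0, cmd=None):
    """Run caget on one PV and report whether it overran its own timeout.

    Hypothetical diagnostic helper (not part of EPICS Base).
    search_timeout: the ~2 s default caget timeout mentioned in the report.
    slack: assumed grace period before a run counts as a hang.
    cmd: optional replacement command line, e.g. for testing.
    """
    argv = cmd if cmd is not None else ["caget", pv]
    start = time.monotonic()
    result = subprocess.run(argv, capture_output=True, text=True)
    elapsed = time.monotonic() - start
    return {
        "pv": pv,
        "returncode": result.returncode,
        "elapsed": elapsed,
        # A client that gives up after search_timeout should exit soon after.
        # A run far beyond that matches "error after 2 s, but hang ~30 s".
        "suspect_hang": elapsed > search_timeout + slack,
        "output": (result.stdout + result.stderr).strip(),
    }


if __name__ == "__main__":
    # Usage: python time_caget.py pv1 pv2 ...
    for name in sys.argv[1:]:
        r = time_caget(name)
        flag = "SUSPECT HANG" if r["suspect_hang"] else "ok"
        print(f"{name}: rc={r['returncode']} {r['elapsed']:.1f} s {flag}")
```

On an affected IOC, one would expect the elapsed time to approach the ~30 seconds described above and the run to be flagged, while healthy IOCs return within the timeout.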

So far, we have not seen any specific error message on the IOC;
in fact, mostly no error messages at all.
No suspended tasks, the network interface is still in "full duplex" with 0 in/out errors,
dbcar keeps all its client connections to other IOCs,
netStatDataPoolShow, endPoolShow have 0 "failed" entries, ...
The only known fix is an IOC reboot.


This started to happen in April.
It has affected at least two different types of IOCs:
Those running low-level RF and those running power-supply applications.
It first affected LLRF IOCs that hadn't changed since January,
so the guess is that this is caused by changes to our network
or CA clients.

What also changed since about that time:
We have more IOCs,
some CA clients already using R3.14.8.2,
and some using CAJ, the pure java CA client.

Maybe completely unrelated:
We are pretty sure that CAJ clients cause occasional IOC error messages
 CAS: request from someIP:port => "bad resource ID"

These messages were on different IOCs at different times.
They coincide with running a CAJ client,
and the IP address matches the host where the CAJ client runs.
They don't happen every time,
and so far the IOCs which gave the "bad resource ID" message
have not been the ones with the no-more-connections problem in the CA server.
I still considered it a good enough reason for an SNS CAJ witch hunt:
If it's causing this error message, maybe CAJ is capable
of causing more confusion inside the CA server?
Confusion that goes unnoticed and then results in the connection issue?
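One way to check the correlation described above is to tally the "bad resource ID" messages by client address and compare them against the hosts known to run CAJ. The Python sketch below is only an illustration. The regular expression assumes the message looks exactly like the one quoted above (numeric IPv4 address and port), and the function names and the `caj_hosts` set are hypothetical.

```python
import re
from collections import Counter

# Assumed log format, taken from the message quoted above:
#   CAS: request from <ip>:<port> => "bad resource ID"
BAD_RID = re.compile(
    r'CAS: request from (\d{1,3}(?:\.\d{1,3}){3}):(\d+) => "bad resource ID"'
)


def bad_rid_sources(log_lines):
    """Count 'bad resource ID' messages per client IP address."""
    counts = Counter()
    for line in log_lines:
        m = BAD_RID.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def split_by_client_type(counts, caj_hosts):
    """Separate offending IPs into known CAJ hosts and all other hosts."""
    caj = {ip: n for ip, n in counts.items() if ip in caj_hosts}
    other = {ip: n for ip, n in counts.items() if ip not in caj_hosts}
    return caj, other
```

If every offending address lands in the CAJ group, that supports the suspicion; any address in the other group would point at a different client.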



Has anybody else seen anything similar?


Thanks,
-Kay



