Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Stalled CA connection (IOC to CS-Studio archiver)
From: "Kasemir, Kay" <kasemirk@ornl.gov>
To: Ralph Lange <ralph.lange@gmx.de>, EPICS Core Talk <core-talk@aps.anl.gov>
Date: Thu, 15 Jun 2017 15:29:18 +0000

Hi:


No real clue.


On the archive engine VM, you would issue


  kill -QUIT {PID of the java process}


which causes Java to dump a stack trace of all threads to its console, including locks that each thread has taken or is trying to take.

(You can also use "jps" to list all java processes, then "jstack {PID}" to fetch a stack trace.)


Maybe do that again 5 minutes later and compare to see if there's one thread that's blocked by a lock, or stuck in some function call and not progressing for some other reason.
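The comparison of two dumps can be partly automated. The following is a minimal Python sketch, assuming the dumps were saved as text in the usual HotSpot layout (a quoted thread name on a header line, a `java.lang.Thread.State:` line, then stack and lock lines, with a blank line between threads). The function names are made up for this illustration and are not part of any EPICS or CS-Studio tool.

```python
import re

# Header line of one thread in a HotSpot thread dump, e.g.
#   "Thread 30" #30 prio=5 ... waiting for monitor entry [0x...]
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
STATE_LINE = re.compile(r'^\s*java\.lang\.Thread\.State:\s*(?P<state>\w+)')


def parse_dump(text):
    """Map thread name -> {'state': ..., 'frames': [...]} for one dump."""
    threads = {}
    name = None
    for line in text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            name = header.group("name")
            threads[name] = {"state": None, "frames": []}
            continue
        if name is None:
            continue  # preamble or text between thread sections
        if not line.strip():
            name = None  # a blank line ends the current thread's section
            continue
        state = STATE_LINE.match(line)
        if state:
            threads[name]["state"] = state.group("state")
        else:
            # Stack frames ("at ...") and lock lines ("- waiting to lock ...")
            threads[name]["frames"].append(line.strip())
    return threads


def unchanged_threads(first_dump, second_dump):
    """Return {name: state} for threads with identical state and stack in both dumps."""
    first, second = parse_dump(first_dump), parse_dump(second_dump)
    return {
        name: info["state"]
        for name, info in first.items()
        if info["frames"] and second.get(name) == info
    }
```

Calling `unchanged_threads()` on the text of the two dumps lists the threads that did not move between them. Idle pool threads will show up as well, so the entries in state BLOCKED, or the ones sitting in a send or receive call, are the ones to look at first.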


-Kay




From: core-talk-bounces@aps.anl.gov <core-talk-bounces@aps.anl.gov> on behalf of Ralph Lange <ralph.lange@gmx.de>
Sent: Thursday, June 15, 2017 5:37 AM
To: EPICS Core Talk
Subject: Stalled CA connection (IOC to CS-Studio archiver)
 
Hi all,

We have an ongoing issue in a test setup that includes a Linux "Fast Controller" (IP...37) running IOCs (40k records each) on one end and a CS-Studio BEAUTY archiver on a VM (IP...41) on the other end. IOCs are running Base 3.15.5, BEAUTY uses a current JCA/CAJ client.

The CA TCP connection is up, but blocked in both directions:

On the fast controller (...37), netstat shows

tcp        0      0 IP...37:5064   0.0.0.0:*      LISTEN      29499/MAG-CYSI
tcp    86888 178656 IP...37:5064   IP...41:40147  ESTABLISHED 29499/MAG-CYSI

On the archiver VM (...41), we see

tcp   495144  70184 IP...41:40147  IP...37:5064   ESTABLISHED 9164/java
tcp        0      0 IP...41:40691  IP...49:5064   ESTABLISHED 9164/java
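For anyone who wants to watch for this state over time, here is a small Python sketch that picks such connections out of saved netstat output. It assumes the column layout shown above (protocol, Recv-Q, Send-Q, local address, foreign address, state, PID/program); the names `Conn`, `parse_netstat` and `backed_up` are invented for the example, and the sample text is simply the two lines from the archiver VM.

```python
from collections import namedtuple

Conn = namedtuple("Conn", "proto recv_q send_q local remote state")


def parse_netstat(text):
    """Parse TCP lines of netstat output into Conn tuples, skipping other lines."""
    conns = []
    for line in text.splitlines():
        fields = line.split()
        # Expected: proto recv-q send-q local remote state [pid/program]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        conns.append(Conn(fields[0], int(fields[1]), int(fields[2]),
                          fields[3], fields[4], fields[5]))
    return conns


def backed_up(conns):
    """Established connections that have data queued in both directions."""
    return [c for c in conns
            if c.state == "ESTABLISHED" and c.recv_q > 0 and c.send_q > 0]


SAMPLE = """\
tcp   495144  70184 IP...41:40147  IP...37:5064   ESTABLISHED 9164/java
tcp        0      0 IP...41:40691  IP...49:5064   ESTABLISHED 9164/java
"""

if __name__ == "__main__":
    for c in backed_up(parse_netstat(SAMPLE)):
        print(c.local, "<->", c.remote, "Recv-Q", c.recv_q, "Send-Q", c.send_q)
```

Only the first sample line is reported, since the second connection has empty queues in both directions.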

tcpdump shows no traffic on that connection.

The archive engine logs things like:

2017-06-12 22:17:53.047 WARNING [Thread 30] com.cosylab.epics.caj.impl.CATransport (noSyncSend) - Failed to send message to /IP...37:5064 - buffer full, will retry.

and has not written data to the archive from this IOC for a long time. It is happily archiving data from other connections (e.g. the one shown in line 2 of the netstat output above).
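To see when the warnings started and how often they repeat, the engine log can be summarized per peer. This is a rough Python sketch that assumes every warning has exactly the format of the line quoted above; `summarize_buffer_full` is a name made up for this example.

```python
import re
from datetime import datetime

# Matches lines like the CATransport warning quoted above.
BUFFER_FULL = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) WARNING .*"
    r"Failed to send message to /?(?P<peer>\S+) - buffer full"
)


def summarize_buffer_full(lines):
    """Return {peer: (count, first_seen, last_seen)} for buffer-full warnings."""
    summary = {}
    for line in lines:
        match = BUFFER_FULL.match(line)
        if not match:
            continue  # some other log message
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        peer = match.group("peer")
        count, first, last = summary.get(peer, (0, ts, ts))
        summary[peer] = (count + 1, min(first, ts), max(last, ts))
    return summary
```

Passing an open log file to `summarize_buffer_full()` works, because the function only iterates over lines. The first timestamp per peer gives a rough idea of when the connection stopped draining.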

Obviously the TCP connection is blocked and backed up to the other host in both directions.

The IOC is alive and casr shows all channels as connected.

Why are both sides not taking data out of their receive-Qs?
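As a generic illustration of the symptom (not a diagnosis of this case), the Python sketch below shows how a stream connection ends up full in both directions when each end keeps sending and neither reads, and how it recovers as soon as one end services its receive queue. It uses a local `socket.socketpair()` rather than a real CA connection, and all function names are made up for the example.

```python
import socket


def fill_until_blocked(sock, chunk=b"x" * 4096):
    """Send without reading until the kernel refuses more; return bytes queued."""
    sent = 0
    while True:
        try:
            sent += sock.send(chunk)
        except BlockingIOError:
            return sent


def drain(sock):
    """Read everything currently queued for this socket; return bytes read."""
    received = 0
    while True:
        try:
            data = sock.recv(65536)
        except BlockingIOError:
            return received
        if not data:
            return received  # peer closed the connection
        received += len(data)


def demo():
    a, b = socket.socketpair()
    a.setblocking(False)
    b.setblocking(False)
    try:
        queued_a = fill_until_blocked(a)  # a writes, b never reads
        queued_b = fill_until_blocked(b)  # b writes, a never reads
        # Both directions are now backed up; neither end can send.
        read_by_b = drain(b)              # b finally empties its receive queue
        resumed = fill_until_blocked(a)   # a can send again
        return queued_a, queued_b, read_by_b, resumed
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(demo())
```

With blocking sockets, the two `fill_until_blocked()` calls would correspond to both ends being stuck in a send while their receive queues stay full.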

This is not the first time this has happened to us in this test setup. Has anyone seen such a situation before? Any ideas on how to proceed in finding out what's happening?

Thanks a lot
~Ralph


Replies:
Re: Stalled CA connection (IOC to CS-Studio archiver) Michael Davidsaver
References:
Stalled CA connection (IOC to CS-Studio archiver) Ralph Lange
