We have an ongoing issue in a test setup that includes a Linux "Fast Controller" (IP...37) running IOCs (40k records each) on one end and a CS-Studio BEAUTY archiver on a VM (IP...41) on the other end. The IOCs are running Base 3.15.5; BEAUTY uses a current JCA/CAJ client.
On the Fast Controller (...37), netstat shows
tcp 0 0 IP...37:5064 0.0.0.0:* LISTEN 29499/MAG-CYSI
tcp 86888 178656 IP...37:5064 IP...41:40147 ESTABLISHED 29499/MAG-CYSI
On the archiver VM (...41), we see
tcp 495144 70184 IP...41:40147 IP...37:5064 ESTABLISHED 9164/java
tcp 0 0 IP...41:40691 IP...49:5064 ESTABLISHED 9164/java
tcpdump shows no traffic on that connection.
The archive engine logs things like:
2017-06-12 22:17:53.047 WARNING [Thread 30] com.cosylab.epics.caj.impl.CATransport (noSyncSend) - Failed to send message to /IP...37:5064 - buffer full, will retry.
and has not written data to the archive from this IOC for a long time. It is happily archiving data from other connections (e.g. the one shown in line 2 of the netstat output above).
Both Recv-Q and Send-Q are non-empty on both hosts for this connection, so the TCP connection is apparently blocked and backed up in both directions.
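
For anyone who wants to check their own hosts for the same pattern, here is a minimal Python sketch that flags connections with data queued in both directions. It is only an illustration, not something we run in the setup: the function names are made up, and it assumes the usual Linux netstat column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program). The sample lines are the ones quoted above.

```python
# Sketch: flag TCP connections whose receive AND send queues are both non-empty.
# Assumption: Linux netstat column order
#   Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program

def parse_netstat_line(line):
    """Split one netstat TCP line into a dict; return None for anything else."""
    fields = line.split()
    if len(fields) < 6 or not fields[0].startswith("tcp"):
        return None
    return {
        "recv_q": int(fields[1]),   # bytes received but not yet read by the program
        "send_q": int(fields[2]),   # bytes sent but not yet acknowledged by the peer
        "local": fields[3],
        "remote": fields[4],
        "state": fields[5],
        "program": fields[6] if len(fields) > 6 else "",
    }

def stalled_both_ways(lines):
    """Return ESTABLISHED connections with both unread and unacknowledged bytes."""
    result = []
    for line in lines:
        conn = parse_netstat_line(line)
        if (conn and conn["state"] == "ESTABLISHED"
                and conn["recv_q"] > 0 and conn["send_q"] > 0):
            result.append(conn)
    return result

sample = [
    "tcp 0 0 IP...37:5064 0.0.0.0:* LISTEN 29499/MAG-CYSI",
    "tcp 86888 178656 IP...37:5064 IP...41:40147 ESTABLISHED 29499/MAG-CYSI",
    "tcp 495144 70184 IP...41:40147 IP...37:5064 ESTABLISHED 9164/java",
    "tcp 0 0 IP...41:40691 IP...49:5064 ESTABLISHED 9164/java",
]

for conn in stalled_both_ways(sample):
    print(conn["local"], "->", conn["remote"],
          "Recv-Q", conn["recv_q"], "Send-Q", conn["send_q"])
```

With the sample lines, only the two ends of the ...37:5064 / ...41:40147 connection are reported; the listening socket and the connection to ...49 are not.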
The IOC is alive and casr shows all channels as connected.
Why are both sides not taking data out of their receive-Qs?
This is not the first time this has happened to us in this test setup. Has anyone seen such situations before? Any ideas on how to proceed in finding out what is happening?
Thanks a lot
~Ralph