EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: EPICS CA problems
From: "Jeff Hill" <[email protected]>
To: "'Mark Rivers'" <[email protected]>
Cc: "'Antonio Lanzirotti'" <[email protected]>, "'tech-talk'" <[email protected]>
Date: Wed, 17 Nov 2010 15:39:27 -0700

Mark,

 

> I wonder if a very simple socket server program and a simple socket client

> could reproduce this problem?  Then there is some hope of the Cygwin developers

> being able to reproduce it and fix it.

 

Considering this further, what might be different with CA compared to other cygwin socket codes?

 

o maybe sending and receiving simultaneously on the same socket from two different threads

o maybe the very large contiguous buffer sizes currently used when EPICS_CA_MAX_ARRAY_SIZE is configured

 

I will place my bet on the 2nd one.
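
A minimal sketch of a stand-alone test that exercises both candidates at once, assuming a plain TCP loopback connection is enough to trigger the problem (which has not been established). It is not Channel Access code. The names recv_exact, run_stress and BLOCK_BYTES are hypothetical, and sending all 16 arrays of 8 KB as one contiguous 128 KB buffer is an assumption made to stress the second candidate. One thread sends the large blocks while a second thread receives on the same server socket; a third thread plays the client.

```python
import socket
import threading

ARRAY_BYTES = 8 * 1024                # one 8 KB array, as in the report
N_ARRAYS = 16                         # number of monitored arrays in the report
BLOCK_BYTES = ARRAY_BYTES * N_ARRAYS  # assumption: sent as one 128 KB buffer


def recv_exact(sock, nbytes):
    """Read nbytes from sock; returns fewer only if the peer closes."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def run_stress(n_updates=20, timeout=30.0):
    """Return (hung, acks_received, bytes_received) for one test run."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # ephemeral port on loopback
    listener.listen(1)
    client_sock = socket.create_connection(listener.getsockname())
    server_sock, _ = listener.accept()
    listener.close()

    block = bytes(BLOCK_BYTES)
    counts = {"acks": 0, "bytes": 0}

    def server_writer():
        # Thread 1 on the server socket: sends the large blocks.
        for _ in range(n_updates):
            server_sock.sendall(block)

    def server_reader():
        # Thread 2 on the same server socket: receives 1-byte acknowledgements.
        for _ in range(n_updates):
            counts["acks"] += len(recv_exact(server_sock, 1))

    def client():
        # Stand-in client: reads each block, then acknowledges it.
        for _ in range(n_updates):
            counts["bytes"] += len(recv_exact(client_sock, BLOCK_BYTES))
            client_sock.sendall(b"A")

    threads = [threading.Thread(target=f, daemon=True)
               for f in (server_writer, server_reader, client)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout)
    hung = any(t.is_alive() for t in threads)  # a thread still alive did not finish in time
    client_sock.close()
    server_sock.close()
    return hung, counts["acks"], counts["bytes"]


if __name__ == "__main__":
    print(run_stress())
```

On a healthy socket layer the run finishes with hung equal to False. Since the reported failure could take hours to appear, a real test would presumably need a far larger n_updates and timeout than the defaults used here.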

 

Jeff

______________________________________________________
Jeffrey O. Hill           Email        [email protected]
LANL MS H820              Voice        505 665 1831
Los Alamos NM 87545 USA   FAX          505 665 5107

 

Message content: TSPA

 

With sufficient thrust, pigs fly just fine. However, this is

not necessarily a good idea. It is hard to be sure where they

are going to land, and it could be dangerous sitting under them

as they fly overhead. -- RFC 1925

 

From: Mark Rivers [mailto:[email protected]]
Sent: Wednesday, November 17, 2010 9:21 AM
To: Mark Rivers; tech-talk
Cc: Jeff Hill; Antonio Lanzirotti
Subject: RE: EPICS CA problems

 

Folks,

 

Just a quick follow-up on this:

 

> We have now observed the failure on the following clients: IDL, Python, medm, and

> (almost definitely) the EPICS sscan record running in another IOC. 

 

I have now determined that the sscan record client running on a vxWorks VME IOC was indeed hanging for the same reason when getting data from the Cygwin IOC.  The CAS-client thread in the Cygwin IOC that was connected to the VME IOC client was hung.  It was waiting for the Send Lock mutex that a CAS-event thread had taken and never released.  This is almost certainly because the CAS-event thread called send(), which never returned.
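
The stall pattern described above can be illustrated with a small sketch. It is not the rsrv code: the name demo_send_lock_stall is hypothetical, a local socket pair stands in for the TCP connection, and the blocked send is provoked deliberately by a peer that does not read, whereas in the reported failure send() blocked even though the clients were healthy. One thread takes a lock and blocks in a send; a second caller then cannot obtain the lock until the send completes.

```python
import socket
import threading


def demo_send_lock_stall(payload_bytes=32 * 1024 * 1024, wait=0.5):
    """Return (got_lock_while_blocked, got_lock_after_drain)."""
    sender_sock, idle_peer = socket.socketpair()
    send_lock = threading.Lock()      # stands in for the Send Lock mutex
    holding = threading.Event()

    def event_thread():
        # Stands in for the CAS-event thread: takes the lock, then sends.
        with send_lock:
            holding.set()
            # Blocks once the socket buffers fill, because the peer is idle.
            sender_sock.sendall(bytes(payload_bytes))

    t = threading.Thread(target=event_thread, daemon=True)
    t.start()
    holding.wait()

    # Stands in for the CAS-client thread: waits for the same lock.
    got_lock_while_blocked = send_lock.acquire(timeout=wait)
    if got_lock_while_blocked:
        send_lock.release()

    # Drain the peer so the blocked send can finish and release the lock.
    remaining = payload_bytes
    while remaining > 0:
        chunk = idle_peer.recv(min(remaining, 1 << 20))
        if not chunk:
            break
        remaining -= len(chunk)
    t.join(10)

    got_lock_after_drain = send_lock.acquire(timeout=wait)
    if got_lock_after_drain:
        send_lock.release()

    sender_sock.close()
    idle_peer.close()
    return got_lock_while_blocked, got_lock_after_drain


if __name__ == "__main__":
    print(demo_send_lock_stall())
```

The payload size is chosen to be much larger than typical socket buffers so that the send cannot complete on its own. If send() never returns, as reported here, the lock is never released and every thread waiting on it hangs with it.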

 

I wonder if a very simple socket server program and a simple socket client could reproduce this problem?  Then there is some hope of the Cygwin developers being able to reproduce it and fix it.

 

Mark

 

 


From: Mark Rivers
Sent: Saturday, November 13, 2010 4:32 PM
To: 'tech-talk'
Cc: Jeff Hill; Antonio Lanzirotti
Subject: RE: EPICS CA problems

 

Folks,

 

Thanks to help from Jeff Hill, the problem with Channel Access clients losing connection to Cygwin IOCs has been tracked down.  It really appears to be a problem with Cygwin itself.

 

We have now observed the failure on the following clients: IDL, Python, medm, and (almost definitely) the EPICS sscan record running in another IOC.  The failure appears to only happen if there are channel access monitors on the arrays in the MCA records in the IOC.  Most of the failures I observed were when the client had monitors on 16 arrays, each 8KB in size.  These were updating frequently, between 1 and 10 Hz.  The failure could take as long as several hours to occur.

 

I put print statements in the cas_send_bs_msg function in rsrv/caserverio.c before and after the call to send().  send() is the function that sends the CA data over the socket to the client.  When the failure happens send() simply never returns.
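
The same before-and-after bracketing can be sketched outside the IOC. This is an illustration only, not the change made in caserverio.c; logged_sendall and unmatched_sends are hypothetical helper names. A send that never returns shows up as a "before" entry with no matching "after" entry.

```python
import socket
import time


def logged_sendall(sock, data, log):
    """Record an entry before and after the send call."""
    log.append(("before", len(data), time.monotonic()))
    sock.sendall(data)
    log.append(("after", len(data), time.monotonic()))


def unmatched_sends(log):
    """Count sends that were started but have not returned."""
    started = sum(1 for entry in log if entry[0] == "before")
    finished = sum(1 for entry in log if entry[0] == "after")
    return started - finished


if __name__ == "__main__":
    a, b = socket.socketpair()
    log = []
    logged_sendall(a, b"x" * 100, log)
    print(unmatched_sends(log))
    a.close()
    b.close()
```

Because the log is appended to before the call, it can be inspected from another thread while the sending thread is still stuck inside the send.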

 

This could happen if the client had failed, and was not reading the socket.  However, since it fails on so many commonly used clients, this is almost certainly not the case.

 

The problem really appears to be a bug in the Cygwin socket call.

 

This is really unfortunate, because it means that Cygwin cannot be used as a reliable platform for an IOC until this problem is solved.  Cygwin has the following advantages over the win32-x86 architecture on Windows:

 

- The gcc compiler is free

- It supports termios, which means asyn can work on local serial ports.  win32-x86 cannot.

- It supports xdr and rpc.  This is required for VXI-11 support in asyn.  It is also required for the saveData utility in synApps that saves data from the sscan record directly to disk.

 

If anyone has any ideas on how to proceed on getting this problem fixed, I'd love to hear them.

 

I think it is possible that this problem is new to Cygwin 1.7.x, since it has not been previously reported in Cygwin 1.5.x, but perhaps we just never stressed the systems in the same way with the older version of Cygwin.

 

Cheers,

Mark

 



ANJ, 19 Nov 2010