Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: caRepeater must run before casr
From: Dennis Nicklaus <nicklaus@fnal.gov>
To: tech-talk@aps.anl.gov
Date: Wed, 03 Jan 2007 17:09:58 -0600
We recently ran into a very puzzling problem here using the EPICS casr (channel access save restore) tool. The problem showed up in one of two ways after you push the casrSave or casrRestore buttons.

Sometimes the Tcl/Tk casr interface would give an error dialog saying,
"error waiting for process to exit: child process lost (is SIGCHLD ignored or trapped?)"
and other times it would just hang forever after you push casrSave/casrRestore
without the error dialog (though the save/restore would be processed).


The short solution is that you must have caRepeater running before running casr.
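One way to make sure of that is to start the repeater yourself before launching casr. The following is only a rough Python sketch, not something we run here: the helper name start_detached is made up for illustration, and it assumes the caRepeater executable is on your PATH. It points the repeater's standard streams at /dev/null so the repeater cannot inherit anyone else's pipe.

```python
import subprocess

def start_detached(cmd):
    """Start cmd with stdin/stdout/stderr on /dev/null (hypothetical helper).

    The started process shares no pipe with its parent, so it cannot keep
    a reader of the parent's output waiting for end-of-file.
    """
    return subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # POSIX only: run in its own session
    )
```

Calling start_detached(["caRepeater"]) before starting casr would be the intended use; Popen raises FileNotFoundError if the executable is not found.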

A brief summary of the gory details: pressing the Tk casrSave button causes Tcl to
exec the casave program. casave in turn starts carepeater if carepeater isn't already running.
carepeater, in trying to be a nice forked process, closes all its file descriptors except
stdin, stdout, and stderr. This is where the problem starts: the pipe between
the top-level wish (Tcl) shell and the casave program is dup-ed onto casave's stdout,
so when casave clones/forks off carepeater, that same stdout remains open in carepeater.
When casave finishes, it's dead, but the higher-level Tcl is still blocked in read() on the pipe,
which carepeater is holding open. This wouldn't be a problem if the high-level Tcl shell
got a SIGCHLD from the casave process, but by sifting through trace output
we saw that the casave process was being started with the clone() system call without
SIGCHLD in the flags. As the clone() man page says, "If no signal is specified, then the parent process is not signaled when the child terminates."
We don't know whether this is a mistake in the version of Tcl we have or something to do with
the version of Linux and TLS we happen to be running, though it happens on multiple
Linux kernel versions we have.
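The held-open pipe is easy to reproduce without EPICS at all. Below is a small Python sketch, assuming a POSIX system; CHILD_SRC and read_until_eof are made-up names, with the child standing in for casave and the sleeping helper standing in for carepeater. It only shows the pipe half of the problem (read() not seeing end-of-file), not the missing SIGCHLD.

```python
import subprocess
import sys
import time

# Stand-in for casave: start a long-lived helper (the carepeater role),
# print one line, and exit immediately without waiting for the helper.
CHILD_SRC = r'''
import subprocess, sys
detach = sys.argv[1] == "detach"
subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(3)"],
    # None means the helper inherits our stdout, i.e. the parent's pipe.
    stdout=subprocess.DEVNULL if detach else None,
)
print("save done", flush=True)
'''

def read_until_eof(mode):
    """Run the stand-in child and read its stdout to end-of-file.

    mode is "inherit" or "detach". Returns (output, elapsed_seconds).
    """
    start = time.monotonic()
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SRC, mode],
        stdout=subprocess.PIPE,
    )
    # read() returns only once every process holding the write end of the
    # pipe has closed it, not merely when the direct child has exited.
    output = proc.stdout.read()
    proc.wait()
    return output.decode().strip(), time.monotonic() - start
```

With read_until_eof("inherit") the read should take about as long as the helper sleeps, even though the child itself exits right away; with read_until_eof("detach") it should return almost immediately. The same output comes back in both cases.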


YMMV widely depending on your versions of unix and Tcl.

I'm not suggesting anything necessarily needs to change in casr or caRepeater, just trying to point out a bizarre problem someone else may bump into along the way.

Many thanks to Ron Rechenmacher who spent many hours puzzling over this one.

Dennis



