EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: caRepeater must run before casr
From: Eric Norum <[email protected]>
To: Dennis Nicklaus <[email protected]>
Cc: tech-talk tech-talk <[email protected]>, Andrew Johnson <[email protected]>
Date: Mon, 8 Jan 2007 13:14:58 -0600
Jeff's reply is correct, but it doesn't really address the problem Dennis discovered: it is stdout that is causing the problem, and stdout does not have 'close-on-exec' set.

I propose that on startup caRepeater check stdin/stdout/stderr, and if fstat() on fileno(fp) reveals that a stream is a socket, that the stream be closed and reopened to /dev/null. This precludes running caRepeater in a pipeline, but I'm not sure that's really much of a problem.



On Jan 8, 2007, at 12:53 PM, Jeff Hill wrote:


> We recently ran into a very puzzling problem here using the EPICS casr
> (channel access save restore) tool. The problem showed up in one of two
> ways after you push the casrSave or casrRestore buttons.

On UNIX systems the caRepeater process is spawned off using a call to the
fork function to create the new process followed by a call to the exec
function to force the new process to run the caRepeater executable.
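The fork-then-exec pattern described above can be sketched as follows, assuming POSIX. This is an illustration only, not the EPICS base code; spawn_program() and wait_exit_code() are hypothetical names.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not the EPICS base implementation.
 * Starts file (looked up on PATH) as a new process.
 * Returns the child's pid in the parent, or -1 if fork() failed. */
pid_t spawn_program(const char *file)
{
    pid_t pid = fork();             /* child starts as a copy of the caller */

    if (pid == 0) {
        /* Child: replace the process image with the requested program. */
        execlp(file, file, (char *)NULL);
        _exit(127);                 /* reached only if exec failed */
    }
    return pid;                     /* parent: child's pid, or -1 on error */
}

/* Waits for pid and returns its exit code,
 * or -1 if it could not be waited for or did not exit normally. */
int wait_exit_code(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The wait helper is included only so the sketch can be exercised; a long-running process such as caRepeater would normally outlive its creator rather than be waited for.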


The fork function does duplicate any open file descriptors into the new
process. To avoid problems EPICS base does the following.


o In R3.13 the CA client library closes all open files except stdin/out/err.

o In almost all versions of R3.14, instead of closing open files, the "close
on exec flag" is set for all sockets created by a special socket creation
function in EPICS base. This is a less intrusive approach.
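The close-on-exec approach in the second item can be sketched as below, assuming POSIX fcntl(). The wrapper name create_cloexec_socket() is hypothetical and only stands in for the socket creation function mentioned above; it is not the EPICS base function.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Marks fd so that it is closed automatically on a successful exec*().
 * Returns 0 on success, -1 on error. */
int set_close_on_exec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}

/* Hypothetical wrapper: create a socket that a spawned program
 * will not inherit. Returns the descriptor, or -1 on error. */
int create_cloexec_socket(int domain, int type, int protocol)
{
    int fd = socket(domain, type, protocol);

    if (fd >= 0 && set_close_on_exec(fd) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Only descriptors created through such a wrapper get the flag. Descriptors inherited from the parent, such as stdin, stdout and stderr, are not touched and stay open across exec.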


Jeff

-----Original Message-----
From: Dennis Nicklaus [mailto:[email protected]]
Sent: Wednesday, January 03, 2007 4:10 PM
To: [email protected]
Subject: caRepeater must run before casr

We recently ran into a very puzzling problem here using the EPICS casr
(channel access save restore) tool. The problem showed up in one of two
ways after you push the casrSave or casrRestore buttons.


Sometimes the Tcl/Tk casr interface would give an error dialog saying,
"error waiting for process to exit: child process lost (is SIGCHLD
ignored or trapped?)"
and other times it would just hang forever after you push
casrSave/casrRestore, without the error dialog (though the save/restore
would be processed).


The short solution is that you must have caRepeater running before
running casr.

A brief summary of the gory details: when one presses the Tk casrSave
button, Tcl execs the casave program. casave in turn starts caRepeater
if caRepeater isn't already running. caRepeater, in trying to be a nice
forked process, closes all its file descriptors except stdin, stdout,
and stderr. This is where the problem starts: the pipe between the
top-level wish (Tcl) shell and the casave program is dup-ed to casave's
stdout, so when casave clones/forks off caRepeater, the same stdout
remains open in caRepeater. Then when casave finishes, it's dead, but
the higher-level Tcl is still trying to read() on the pipe, which is
being held open by caRepeater.

This wouldn't be a problem if the high-level Tcl shell were getting a
SIGCHLD from the casave process, but by sifting through trace output we
saw that the casave process was being started with the clone() system
call without SIGCHLD specified in the flags, and, as the clone() man
page says, "If no signal is specified, then the parent process is not
signaled when the child terminates." We don't know if this is a mistake
in the version of Tcl we have or something to do with the version of
Linux and TLS we happen to be running, though it happens on multiple
Linux kernel versions we have.
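The pipe behaviour described above can be reproduced with a small C sketch, assuming POSIX. The function reader_sees_eof() is a hypothetical demo and none of this is casr or caRepeater code: the caller plays the Tcl shell, its child plays casave, and the grandchild plays caRepeater. The 500 ms and 1 s intervals are arbitrary values chosen for the demo.

```c
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo. Returns 1 if the reader sees end-of-file on the
 * pipe within 500 ms of the child's exit, 0 if the pipe is still held
 * open by the grandchild, -1 on error. */
int reader_sees_eof(int grandchild_detaches)
{
    int p[2];
    pid_t child;
    struct pollfd pfd;
    int n;

    if (pipe(p) != 0)
        return -1;

    child = fork();
    if (child < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (child == 0) {
        /* "casave": stdout becomes the write end of the pipe */
        dup2(p[1], STDOUT_FILENO);
        close(p[0]);
        close(p[1]);
        if (fork() == 0) {
            /* "caRepeater": inherits that stdout from its parent */
            if (grandchild_detaches) {
                int nullfd = open("/dev/null", O_WRONLY);
                if (nullfd >= 0) {
                    dup2(nullfd, STDOUT_FILENO);
                    close(nullfd);
                }
            }
            sleep(1);               /* stand-in for a long-lived daemon */
            _exit(0);
        }
        _exit(0);                   /* "casave" finishes right away */
    }

    close(p[1]);                    /* reader keeps only the read end */
    waitpid(child, NULL, 0);        /* the direct child is gone ... */

    pfd.fd = p[0];
    pfd.events = POLLIN;
    n = poll(&pfd, 1, 500);         /* ... but has the pipe reached EOF? */
    close(p[0]);
    return n < 0 ? -1 : (n > 0);
}
```

With grandchild_detaches set to 0 the reader gets no end-of-file while the grandchild is alive, which matches the hang described above. With it set to 1 the grandchild points its stdout at /dev/null and the reader sees end-of-file as soon as the child exits.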


YMMV widely depending on your versions of Unix and Tcl.

I'm not suggesting anything necessarily needs to change in casr or
caRepeater, just trying to point out a bizarre problem someone else may
bump into along the way.


Many thanks to Ron Rechenmacher who spent many hours puzzling over this
one.


Dennis




--
Eric Norum <[email protected]>
Advanced Photon Source
Argonne National Laboratory
(630) 252-4793


