Subject: | EPICS and interruptable system calls |
From: | "Mark Rivers" <[email protected]> |
To: | "Eric Norum" <[email protected]>, "Andrew Johnson" <[email protected]> |
Cc: | Core-Talk <[email protected]> |
Date: | Fri, 30 Jan 2009 12:30:27 -0600 |
Eric and Andrew,

According to this message from Prosilica they are simply using the
new POSIX Timer API provided in librt.so. I’ve looked at the
documentation for this, and indeed the default for the timers is to use
SIGALRM, and indeed if one uses this library then all system calls need to be
restarted if they terminate with EINTR.

I realize this is a big job, but it seems to me that allowing the use
of the POSIX Timer API within EPICS applications is actually a very good idea. I
have made the minor change to epicsThreadSleep in libCom/osi/os/posix/osdThread.c,
and it now works fine when the Prosilica driver is initialized and thus SIGALRM
signals are happening frequently.

I am not sure how many other system
calls in EPICS base would need to be changed. I found, for example, that
disk I/O I am doing in the areaDetector application, using the netCDF library,
seems to work fine. They do not appear to be retrying if write() fails. The
documentation in the link below says that record-oriented I/O will never return
partial data and hence does not need to be retried.

If the epicsTimer were changed to use that API would it be possible to
get timer resolution better than, for example, the 0.01 second that Linux now
provides?

Mark

From: Arlin Kalenchuk
[mailto:[email protected]]

Hello Mark,

The last version we sent you should be
able to find the camera ... It is the version you should be using to avoid the
seg-fault issues you have encountered.

We do not use any signal in the driver,
aside from SIGALRM which is used by the POSIX Timer API (librt.so). As a result
of this, you really should be protecting EVERY system call that can
be interrupted by a signal (usually stated in the call's documentation). This
can be done as shown in the version of epicsThreadSleep() we sent earlier, or
by using the macro TEMP_FAILURE_RETRY:

http://www.gnu.org/software/libtool/manual/libc/Interrupted-Primitives.html

The flag SA_RESTART is unfortunately not
respected by all system calls, if at all on Linux.

I understand that rewriting your code
isn’t an attractive option. We'd be more than happy to get rid of the
POSIX Timer API we are using, however we aren’t aware of any suitable
substitute. If you know of something, we'd love to hear about it.

------------------------------------------------
Prosilica Inc.
Tel: 604-875-8855 ext 124
Fax: 604-875-8856
www.prosilica.com

From: Mark Rivers
[mailto:[email protected]]

Dear Arlin,

I have now done some more tests. I modified epicsThreadSleep()
as you suggested, so that it retries if it failed.

My driver and other parts of EPICS are the same as for the
previous results I sent, i.e. I am waiting 5 seconds 3 times in my driver, and
I am not installing any signal handler for SIGALRM in EPICS channel access in
base. Here is the result with your new version of libPvAPI.a

prosilicaConfig("PS1", 101271, 50, 200000000)
2009/01/29 13:37:18.846 Calling epicsThreadSleep for 5 seconds before calling PvInitialize()
2009/01/29 13:37:23.846 Return from epicsThreadSleep for 5 seconds before calling PvInitialize()
2009/01/29 13:37:23.848 Calling epicsThreadSleep for 5 seconds after calling PvInitialize()
2009/01/29 13:37:30.835 Return from epicsThreadSleep for 5 seconds after calling PvInitialize()
Called siginterrupt, status=0
2009/01/29 13:37:30.835 Calling epicsThreadSleep for 5 seconds after calling PvInitialize() and siginterrupt()
2009/01/29 13:37:35.835 Return from epicsThreadSleep for 5 seconds after calling PvInitialize() and siginterrupt()
2009/01/29 13:37:35.843 prosilica:connectCamera: Cannot find camera 101271
prosilica:prosilica: cannot connect to camera 101271, manually connect when available.

Note that the call to epicsThreadSleep now takes 5.0, 7.0 and
5.0 seconds the 3 times I call it. Before modifying epicsThreadSleep it
was only 1ms on the last 2 calls.

However, this does not fix the problem of not finding the
camera. It is not finding the camera, which is available.

I then simply relinked my application with the Oct. 3
version of libPvAPI.a. It produces the following output:

prosilicaConfig("PS1", 101271, 50, 200000000)
2009/01/29 13:45:02.143 Calling epicsThreadSleep for 5 seconds before calling PvInitialize()
2009/01/29 13:45:07.143 Return from epicsThreadSleep for 5 seconds before calling PvInitialize()
2009/01/29 13:45:07.145 Calling epicsThreadSleep for 5 seconds after calling PvInitialize()
2009/01/29 13:45:12.145 Return from epicsThreadSleep for 5 seconds after calling PvInitialize()
Called siginterrupt, status=0
2009/01/29 13:45:12.145 Calling epicsThreadSleep for 5 seconds after calling PvInitialize() and siginterrupt()
2009/01/29 13:45:17.145 Return from epicsThreadSleep for 5 seconds after calling PvInitialize() and siginterrupt()
asynSetTraceIOMask("PS1",0,2)

The old version of the driver works fine. Each call to
epicsThreadSleep waits for 5.0 seconds as it should. It finds the camera, and I
can control the camera fine.

With this version of the driver I also ran my file saving plugin.
I was able to stream 1000 frames to disk at 50 frames/second, for a total of
348MB. So perhaps my worries about read() and write() function calls
being interrupted and not completing correctly are unfounded? Can you
tell me if this is true, do I need to worry about those calls failing because
of your driver's use of signals? Exactly what Linux system calls do I need
to retry if I use your driver in my application?

The readline library still does not work correctly. It does
not output a newline between commands and command line recall does not work.

Bottom line: I can make the old version of the library
work (with significant issues), but the new version of the library will not
find the camera.

Thanks,
Mark

From: Mark Rivers

I just did a Google search on readline and SIGALRM.
Readline does by default install a signal handler for SIGALRM: http://www.delorie.com/gnu/docs/readline/rlman_43.html 2.5
Readline Signal Handling
Signals
are asynchronous events sent to a process by the Unix kernel, sometimes on
behalf of another process. They are intended to indicate exceptional events,
like a user pressing the interrupt key on his terminal, or a network connection
being broken. There is a class of signals that can be sent to the process
currently reading input from the keyboard. Since Readline changes the terminal
attributes when it is called, it needs to perform special processing when such
a signal is received in order to restore the terminal to a sane state, or
provide application writers with functions to do so manually. Readline contains
an internal signal handler that is installed for a number of signals ( There is
an additional Readline signal handler, for Readline provides
two variables that allow application writers to control whether or not it will
catch certain signals and act on them when they are received. It is important
that applications change the values of these variables only when calling Variable: int rl_catch_signals If this variable is non-zero, Readline will install
signal handlers for The default value of Mark From: Mark Rivers Dear Arlin, I am making some real progress in getting the Prosilica
cameras running under Linux. I now have a configuration which allows me
to reliably start my application, connect to the camera and read frames.
However, there are some side-effects that we really need to work out. The first problem is the one I mentioned in my previous
message: a call to PvInitialize appears to result in a call to
siginterrupt(SIGALRM, 1) or the equivalent. This means that all
“slow” system calls such as nanosleep(), read(), write(), etc. can
return prematurely and need to be retried. I really don’t think it
is an option to rewrite all of EPICS to look for all system calls that could be
interrupted and retry them.

My first set of tests is with the old (Oct.3 2008) version
of the driver. I modified my driver temporarily to the following to do 3
things:

1) Do an epicsThreadSleep(5) before I call PvInitialize(). It works as expected.
2) Do an epicsThreadSleep(5) after calling PvInitialize(). It returns in 1.0
   second, not 5.0.
3) Call siginterrupt(SIGALRM, 0) to turn off interruptable system calls, and then call
   epicsThreadSleep(5) again. To my surprise it still returns in 1.0 or 2.0
   seconds, not 5.0 seconds. I don’t understand this.

    asynPrint(this->pasynUserSelf, ASYN_TRACE_ERROR,
        "Calling epicsThreadSleep for 5 seconds before calling PvInitialize()\n");
    epicsThreadSleep(5.);
    asynPrint(this->pasynUserSelf, ASYN_TRACE_ERROR,
        "Return from epicsThreadSleep for 5 seconds before calling PvInitialize()\n");
    if (!PvApiInitialized) {
        status = PvInitialize();
        if (status) {
            printf("%s:%s: ERROR: PvInitialize failed for camera %d, status=%d\n",
                   driverName, functionName, uniqueId, status);
            return;
        }
        PvApiInitialized = 1;
    }
    asynPrint(this->pasynUserSelf, ASYN_TRACE_ERROR,
        "Calling epicsThreadSleep for 5 seconds after calling PvInitialize()\n");
    epicsThreadSleep(5.);
    asynPrint(this->pasynUserSelf, ASYN_TRACE_ERROR,
        "Return from epicsThreadSleep for 5 seconds after calling PvInitialize()\n");
#ifdef linux
    /* Try changing the siginterrupt to not interrupt system calls */
    status = siginterrupt(SIGALRM, 0);
    printf("Called siginterrupt, status=%d\n", status);
#endif
    /* It appears to be necessary to wait a little for the PvAPI library to find the cameras */
    asynPrint(this->pasynUserSelf, ASYN_TRACE_ERROR,
        "Calling epicsThreadSleep for 5 seconds after calling PvInitialize() and siginterrupt()\n");
    epicsThreadSleep(5.);
    asynPrint(this->pasynUserSelf, ASYN_TRACE_ERROR,
        "Return from epicsThreadSleep for 5 seconds after calling PvInitialize() and siginterrupt()\n");
    /* Try to connect to the camera.
     * It is not a fatal error if we cannot now, the camera may be off or owned by
     * someone else. It may connect later. */
    status = connectCamera();

Here is the output when I start my application:

prosilicaConfig("PS1", 101271, 50, 200000000)
2009/01/29 12:27:22.357 Calling epicsThreadSleep for 5 seconds before calling PvInitialize()
2009/01/29 12:27:27.357 Return from epicsThreadSleep for 5 seconds before calling PvInitialize()
2009/01/29 12:27:27.359 Calling epicsThreadSleep for 5 seconds after calling PvInitialize()
2009/01/29 12:27:28.359 Return from epicsThreadSleep for 5 seconds after calling PvInitialize()
Called siginterrupt, status=0
2009/01/29 12:27:28.359 Calling epicsThreadSleep for 5 seconds after calling PvInitialize() and siginterrupt()
2009/01/29 12:27:30.359 Return from epicsThreadSleep for 5 seconds after calling PvInitialize() and siginterrupt()

Note that the second 2 calls to epicsThreadSleep returned in
1.0 and 2.0 seconds respectively, not 5.0 seconds.

The application connected to the camera and exchanged some
information, but still crashed with a segfault.

The traceback showed that a signal handler in EPICS base was
being called, and then calling your driver function.

I then modified EPICS base to remove the calls to install
our handler for SIGALRM (which simply calls any preexisting handler). I
was told that on Linux EPICS does not actually generate any SIGALRM
signals. I removed 2 calls to epicsSignalInstallSigAlarmIgnore() in the
EPICS channel access code. Once I did that my application runs OK, and
can control the camera, read frames, etc.

I believe I may understand that problem. Once I
removed the calls to epicsSignalInstallSigAlarmIgnore() it appears that
the readline library is no longer working: it does not insert newlines
when I type Enter, and command line recall no longer works. I strongly suspect
that the Linux readline library is using SIGALRM, and somehow that was
interacting badly with your driver when we had installed our own signal handler
for SIGALRM. Perhaps readline was generating a SIGALRM signal and then we
were calling your driver when you did not expect it?

I then did the identical test but with my application linked
with the new version of libPvAPI.a that you sent yesterday.

That version behaves very differently, as seen in this
output:

2009/01/29 12:59:08.761 Calling epicsThreadSleep for 5 seconds before calling PvInitialize()
2009/01/29 12:59:13.761 Return from epicsThreadSleep for 5 seconds before calling PvInitialize()
2009/01/29 12:59:13.762 Calling epicsThreadSleep for 5 seconds after calling PvInitialize()
2009/01/29 12:59:13.763 Return from epicsThreadSleep for 5 seconds after calling PvInitialize()
Called siginterrupt, status=0
2009/01/29 12:59:13.763 Calling epicsThreadSleep for 5 seconds after calling PvInitialize() and siginterrupt()
2009/01/29 12:59:13.763 Return from epicsThreadSleep for 5 seconds after calling PvInitialize() and siginterrupt()
2009/01/29 12:59:13.763 prosilica:connectCamera: Cannot find camera 101271

The first call to epicsThreadSleep(5) waits 5 seconds as
expected. But the next 2 calls to epicsThreadSleep(5) return in 1 msec or
less. This is not enough time for your library to find all the cameras,
so it fails to find the camera. It appears that the new version of your
library is generating SIGALRM signals over 1000 times per second because the nanosleep()
is always returning in 1msec or less. Is this true?

So the bottom line is that I can make my application
“work” with the old version of the library. But it still has
serious problems because EPICS is not going to function well if system calls
like nanosleep(), read() and write() can be interrupted.

Is it possible to re-write your driver so that it does not
call siginterrupt(SIGALRM, 1)? We can help provide code for timers or
other functions if that is what is needed.

Thanks,
Mark

From: Mark Rivers

Hi Arlin,

The problem is that EPICS has hundreds of calls to system
functions that can be interrupted by signals such as SIGALRM, including read(),
write(), etc. It is simply not possible to change EPICS to be compatible
with a call to:

siginterrupt(SIGALRM, 1);

That would require way too much modification to the tens of
thousands of lines of code in EPICS.

Does your driver use a call to siginterrupt such as that
above?

I have done some experimenting with adding a call to
siginterrupt(SIGALRM, 0) after the call to PvInitialize(). However, this
does not seem to fix the problem, it still seems to interrupt calls to
nanosleep(). Do you repeatedly call siginterrupt()?

I have actually made a change to EPICS to remove our calls
to catch SIGALRM signals. That actually fixes the seg faults we were
getting with the old version of the driver, and I can actually reliably collect
images. However, we don’t have an acceptable solution until we
resolve the problem with interruptable system calls since calls to nanosleep
will return too soon, calls to write() or read() may return without completing
properly, etc.

I will send a more detailed message in a few minutes that
explains what I have done and how it works better now. I will also make
tests with both your old library and the new version you sent yesterday.

Thanks,
Mark

From:

The POSIX function nanosleep() can be
interrupted by a signal (such as SIGALRM which is used by the Linux timer API).
When this occurs, the function will return -1 and errno will be set to EINTR.
The epicsThreadSleep() function could be modified as follows:

epicsShareFunc void epicsShareAPI epicsThreadSleep(double seconds)
{
    struct timespec delayTime;
    struct timespec remainingTime;
    double nanoseconds;

    delayTime.tv_sec = (time_t)seconds;
    nanoseconds = (seconds - (double)delayTime.tv_sec) * 1e9;
    delayTime.tv_nsec = (long)nanoseconds;
    while (nanosleep(&delayTime, &remainingTime) == -1)
        delayTime = remainingTime;
}

------------------------------------------------
Prosilica Inc.
Tel: 604-875-8855 ext 124
Fax: 604-875-8856
www.prosilica.com

From: Mark Rivers
[mailto:[email protected]]

Hi Arlin,

Thanks for the quick response.

I untarred that file and linked my application with the following
library from it.

-rw-r--r-- 1 epics epics 2300264 Jan 28 16:14 PvAPI1.19.3/lib-pc/x86/4.2/libPvAPI.a

The behavior is different, it does not consistently crash. But it
still does not work correctly. It is not finding the camera, because the
epicsThreadSleep function we are using is not working correctly when linked
with your library. My code does the following.

    /* Initialize the Prosilica PvAPI library
     * We get an error if we call this twice, so we need a global flag to see if
     * it's already been done.*/
    if (!PvApiInitialized) {
        status = PvInitialize();
        if (status) {
            printf("%s:%s: ERROR: PvInitialize failed for camera %d, status=%d\n",
                   driverName, functionName, uniqueId, status);
            return;
        }
        PvApiInitialized = 1;
    }

    /* It appears to be necessary to wait a little for the PvAPI library to find the cameras */
    asynPrint(this->pasynUserSelf, ASYN_TRACE_ERROR,
        "Sleeping for 20 seconds\n");
    epicsThreadSleep(20.0);
    asynPrint(this->pasynUserSelf, ASYN_TRACE_ERROR,
        "Done sleeping for 20 seconds\n");

    /* Try to connect to the camera.
     * It is not a fatal error if we cannot now, the camera may be off or owned by
     * someone else. It may connect later. */
    status = connectCamera();

In the normal version of this code the call to epicsThreadSleep() is for
0.2 seconds (not 20 seconds), and it does not print the messages. That
0.2 second delay is always sufficient (on Windows) to allow your code to have
found the cameras.

I changed it to 20 seconds just to prove that it is not sleeping at
all! My code now prints out the message “Sleeping for 20
seconds”, but in fact it does not sleep at all, it immediately goes to
the next statement and calls connectCamera(), which fails.

There must be something that your library is doing which causes the
epicsThreadSleep() function to return prematurely under Linux.

Here is the output when I run my program. As shown by the
timestamps on these messages, there are only 6 milliseconds between the message
“Sleeping for 20 seconds” and the “Done sleeping for 20
seconds” message, so the epicsThreadSleep command is returning
immediately and connectCamera() is failing.

prosilicaConfig("PS1", 101271, 50, 200000000)
2009/01/28 18:00:07.462 Sleeping for 20 seconds
2009/01/28 18:00:07.468 Done sleeping for 20 seconds
2009/01/28 18:00:09.463 prosilica:connectCamera: Cannot find camera 101271

When I execute this modified version of my driver on Windows it works
fine, and I get the following. It sleeps for 20 seconds, and does not get
an error trying to connect to the camera.

prosilicaConfig("PS1", 101271, 50, 200000000)
2009/01/28 18:04:45.640 Sleeping for 20 seconds
2009/01/28 18:05:05.641 Done sleeping for 20 seconds
asynSetTraceIOMask("PS1",0,2)

On Posix platforms (e.g. Linux) EPICS uses the following code to
implement epicsThreadSleep()

epicsShareFunc void epicsShareAPI epicsThreadSleep(double seconds)
{
    struct timespec delayTime;
    struct timespec remainingTime;
    double nanoseconds;

    delayTime.tv_sec = (time_t)seconds;
    nanoseconds = (seconds - (double)delayTime.tv_sec) * 1e9;
    delayTime.tv_nsec = (long)nanoseconds;
    nanosleep(&delayTime, &remainingTime);
}

So it looks like nanosleep is returning before the time has elapsed.

I then reverted back to the previous version of your library. When
I run that I get the following:

prosilicaConfig("PS1", 101271, 50, 200000000)
2009/01/28 17:55:30.547 Sleeping for 20 seconds
2009/01/28 17:55:31.547 Done sleeping for 20 seconds
asynSetTraceIOMask("PS1",0,2)

So it does not wait for 20 seconds, but it does wait for 1.0 seconds
and it does find the camera. The fact that it only sleeps for 1 second
instead of 20 is obviously a problem, but at least it sleeps long enough to
allow your code time to find the cameras. But then it crashes later, after
some communication with the camera, with the seg fault like I sent you
previously. Here is the backtrace from gdb:

(gdb) bt
#0  0x080f6753 in SIGAction ()
#1  0x0823d61d in ignoreSigAlarm (signal=14) at ../../../src/libCom/osi/os/posix/osdSignal.cpp:109
#2  <signal handler called>
#3  0x00110416 in __kernel_vsyscall ()
#4  0x002228ff in sigprocmask () from /lib/libc.so.6
#5  0x05ae08f7 in ?? () from /lib/libreadline.so.5
#6  <signal handler called>
#7  0x00110416 in __kernel_vsyscall ()
#8  0x002ce671 in select () from /lib/libc.so.6
#9  0x080f6b3e in pPvMultiplexer::Body ()
#10 0x080fa720 in _ThreadFunction ()
#11 0x003b932f in start_thread () from /lib/libpthread.so.0
#12 0x002d620e in clone () from /lib/libc.so.6

Mark

From:

Hello Mark,

Can you try running the attached
engineering version of our latest driver? It should fix the issue you’re
having. Let us know the results.

Thanks.

------------------------------------------------
Prosilica Inc.
Tel: 604-875-8855 ext 124
Fax: 604-875-8856
www.prosilica.com

From: Mark Rivers [mailto:[email protected]]
Dear Arlin,

I have not resolved the issue, it is still outstanding.

The customer you mention may be John Skinner from Brookhaven
National Laboratory? He got a Prosilica camera on trial, and hoped to use
it with my EPICS software. Unfortunately he had the same trouble I did,
it would crash the program after initial successful communication with the
camera. This really looks like a problem with Linux signals being used
both by EPICS and by your driver.

There are a lot of potential Prosilica users who want to run
my EPICS software under Linux. My software works fine on Windows, but
crashes on Linux. My code is identical in terms of how it communicates
with your driver in both cases.

Thanks,
Mark

From:

Hello Mark,

A customer recently contacted us, who apparently is running
into the same problems you had running our GigE series cameras with EPICS. Paul
Kozik forwarded me your email correspondence, but it seems to me nobody got
back to you after a Dec 18th email:

“Paul,

While most of the time the backtrace is the one I
sent previously, occasionally it is as follows:

(gdb) bt
#0  0x0808ce33 in SIGAction ()
#1  0x0821bf4d in ignoreSigAlarm (signal=14) at ../../../src/libCom/osi/os/posix/osdSignal.cpp:109
#2  <signal handler called>
#3  0x00110416 in __kernel_vsyscall ()
…
”

Have you managed to resolve this issue, or is it
still outstanding?

Thanks,

------------------------------------------------
Prosilica Inc.
Tel: 604-875-8855 ext 124
Fax: 604-875-8856
www.prosilica.com
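
The retry loop proposed earlier in this thread repeats nanosleep() whenever it returns -1. A slightly more defensive variant, sketched below, retries only when errno is EINTR, so that a genuine failure such as EINVAL cannot spin forever on an unset remaining time. This is an illustrative sketch and not the change that was made in EPICS base: the names sleep_retrying_on_eintr, start_periodic_sigalrm and monotonic_now are hypothetical, and the setitimer()-driven SIGALRM is only a stand-in for whatever the PvAPI library does internally.

```c
#define _XOPEN_SOURCE 700   /* expose nanosleep, sigaction, setitimer, clock_gettime */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper (not part of EPICS base): sleep for `seconds`,
 * resuming after every signal interruption. Returns 0 once the whole
 * interval has elapsed, or -1 with errno set on a non-EINTR failure. */
int sleep_retrying_on_eintr(double seconds)
{
    struct timespec delay, remaining;

    if (seconds < 0.0) {
        errno = EINVAL;
        return -1;
    }
    delay.tv_sec = (time_t)seconds;
    delay.tv_nsec = (long)((seconds - (double)delay.tv_sec) * 1e9);
    if (delay.tv_nsec > 999999999L)     /* guard against rounding up to 1e9 */
        delay.tv_nsec = 999999999L;

    while (nanosleep(&delay, &remaining) == -1) {
        if (errno != EINTR)
            return -1;      /* real error: `remaining` is not meaningful */
        delay = remaining;  /* interrupted: sleep for what is left */
    }
    return 0;
}

/* Demo only: counts SIGALRM deliveries. */
static volatile sig_atomic_t alarm_count = 0;

static void count_alarm(int signo)
{
    (void)signo;
    alarm_count++;
}

/* Demo only: deliver SIGALRM every `usec` microseconds (usec < 1000000),
 * with SA_RESTART deliberately left off so blocking calls see EINTR. */
int start_periodic_sigalrm(long usec)
{
    struct sigaction sa;
    struct itimerval tv;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = count_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;

    tv.it_interval.tv_sec = 0;
    tv.it_interval.tv_usec = usec;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_REAL, &tv, NULL);
}

/* Monotonic clock reading in seconds, for measuring elapsed time. */
double monotonic_now(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}
```

With start_periodic_sigalrm(10000) active, a call to sleep_retrying_on_eintr(0.3) should still block for roughly 0.3 seconds even though SIGALRM arrives every 10 ms. Each restart can add a little drift; where it is available, clock_nanosleep() with TIMER_ABSTIME is one way to avoid that.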