Experimental Physics and Industrial Control System
Hi all,
Last night, I had a problem with a CA gateway. The log file contained the
following error messages:
epicsMutex pthread_mutex_lock failed: error Invalid argument
Mar 02 00:04:46 !!! Errlog message received (message is above)
fatal error: epicsMutexOsdLock
Mar 02 00:04:46 !!! Errlog message received (message is above)
I did not find these messages very helpful, because they tell me neither where
the error occurred nor why the argument is invalid. This makes it impossible to
debug the problem.
Unfortunately, this is often the case with EPICS errors. I see three recurring
problems with how error locations are reported:
1. The error message does not give any hint of the location at all.
2. The reported location is inside an (EPICS) system function or a macro, but the
caller that actually caused the error is not reported.
3. The reported location is inside the source code of a file parser, but the
error location in the faulty file is not reported.
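To make problem 3 concrete: a parser can report the position in its input, rather than its own source location, simply by counting lines while it reads. The following is a minimal sketch in C assuming a hypothetical file format of "name value" pairs with "#" comments; parseConfig() is an illustrative name, not an EPICS function.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parser for lines of the form "name value".
 * Errors are reported as "<input file>:<input line>: ...", i.e. the
 * location in the faulty file, not in the parser's own source.
 * Lines longer than the buffer are not handled specially.
 * Returns the number of errors found. */
int parseConfig(FILE *in, const char *fileName)
{
    char buf[256];
    int lineNo = 0;
    int errors = 0;

    while (fgets(buf, sizeof buf, in)) {
        char name[64], value[128];
        lineNo++;
        buf[strcspn(buf, "\r\n")] = '\0';          /* strip line ending */
        if (buf[strspn(buf, " \t")] == '\0' || buf[0] == '#')
            continue;                              /* blank or comment */
        if (sscanf(buf, "%63s %127s", name, value) != 2) {
            fprintf(stderr, "%s:%d: expected \"name value\", got: %s\n",
                    fileName, lineNo, buf);
            errors++;
        }
    }
    return errors;
}
```

The caller would pass the file name as the user gave it, so for an input whose fourth line is just "bogus" the message starts with the file name followed by ":4:" and points straight at the faulty line.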
In this case, we have situation 2. The problem occurred in epicsMutexOsdLock(),
but this function is called at so many places in EPICS that it's impossible to
track down the origin of the fault.
When looking into epicsMutexOsdLock(), I found that the calling thread is
suspended via a call to cantProceed() when an error happens. In my opinion, this
is not very helpful either, because the calling function gets no indication of
the failure and thus cannot print a proper error message.
In my opinion, cantProceed() does more harm than good as long as it cannot
provide useful debug information (at least a stack trace) -- which is
probably hard to do in a portable way. I think a call to abort() would be more
helpful, as it produces a core dump (at least on Unix systems) that can be
analyzed.
Generally, error handling should not be part of the low-level functions. These
functions have no application knowledge and thus cannot decide whether
suspending the thread, terminating the program, printing an error message or
continuing normally is the "correct" behavior. Instead, low-level functions
should only report errors (by exceptions or return values -- I am not
religious in this matter) and leave the choice of the correct response to the
caller.
In this case, the gateway (maybe the CAS library) stopped accepting new clients,
which made it effectively useless. On the other hand, it did not terminate, so
no mechanism that restarts the gateway on failure could kick in either. I do not
consider this "useful" behavior.
Improving the error messages provided by EPICS could be a goal for the next
codathlon.
Dirk
--
Dr. Dirk Zimoch
Paul Scherrer Institut, WBGB/006
5232 Villigen PSI, Switzerland
Phone +41 56 310 5182
- Replies:
- Re: Useless error messages in EPICS Martin L. Smith
- Re: Useless error messages in EPICS Andrew Johnson
ANJ, 31 Jan 2014