EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Useless error messages in EPICS
From: Dirk Zimoch <[email protected]>
To: "Martin L. Smith" <[email protected]>, Jeff Hill <[email protected]>
Cc: tech-talk <[email protected]>
Date: Mon, 02 Mar 2009 16:47:41 +0100
Hi Martin,

Thanks for the hint.

I still use R3.14.8.2. The code looks a bit different:

    if ( status != S_cas_success ) {
        this->chanTable.remove ( *pChan->pChanI );
        this->chanList.remove ( *pChan->pChanI );
        pChan->pChanI->uninstallFromPV ( this->eventSys );
        pChan->getPV()->pPVI->deleteSignal ();
        delete pChan->pChanI;
    }

    return status;

I can try removing this line:
- pChan->getPV()->pPVI->deleteSignal ();

Jeff, do you think that will do?

Dirk


Martin L. Smith wrote:
Hi Dirk,

I ran into this same problem back in January and it turns out that this
is Mantis entry 329. Jeff Hill made a fix for this; from a previous
email that I still have:

diff -u -b -w -b -r1.92.2.16 casStrmClient.cc
--- casStrmClient.cc 21 Oct 2008 20:50:26 -0000 1.92.2.16
+++ casStrmClient.cc 8 Jan 2009 22:13:09 -0000
@@ -1413,9 +1413,7 @@
         this->chanTable.remove ( *pChan->pChanI );
         this->chanList.remove ( *pChan->pChanI );
         pChan->pChanI->uninstallFromPV ( this->eventSys );
-        casPVI * pPVI = pChan->getPV()->pPVI;
         delete pChan->pChanI;
-        pPVI->deleteSignal ();
     }

     return status;

After Janet Anderson rebuilt the GW using this fix, which should be
committed to the repository, I have not had this problem since.

Marty

Dirk Zimoch wrote:
Hi all,

Last night, I had a problem with a CA gateway. The log file contained the following error messages:

epicsMutex pthread_mutex_lock failed: error Invalid argument

Mar 02 00:04:46 !!! Errlog message received (message is above)
fatal error: epicsMutexOsdLock

Mar 02 00:04:46 !!! Errlog message received (message is above)

I found these messages not very helpful because they do not tell me where the error occurred and why the argument is invalid. This makes it impossible to debug the problem.

Unfortunately, this is often the case with EPICS errors. I observe three problems with locations of errors:

1. The error message does not give any hint of the location at all.

2. The reported location is inside an (EPICS) system function or a macro, but the caller who actually caused the error is not reported.

3. The reported location is inside the source code of a file parser but the error location in the faulty file is not reported.
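As an illustration of what point 3 asks for, here is a minimal sketch of a parser that reports the location in the faulty input file rather than in its own source code. The function name checkConfig, the "name=value" syntax and the message format are assumptions made up for this example; they are not taken from any EPICS parser.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical checker for "name=value" lines. Returns an empty string on
// success, otherwise a message naming the input file and the line number.
std::string checkConfig(std::istream &in, const std::string &fileName)
{
    std::string line;
    int lineNo = 0;
    while (std::getline(in, line)) {
        ++lineNo;
        // Skip blank lines and comment lines (assumed syntax).
        if (line.empty() || line[0] == '#')
            continue;
        if (line.find('=') == std::string::npos) {
            std::ostringstream msg;
            // Report where the fault is in the user's file, not in this code.
            msg << fileName << ":" << lineNo
                << ": expected name=value, got \"" << line << "\"";
            return msg.str();
        }
    }
    return std::string();
}
```

With this, a broken third line in a file called gw.conf would be reported with the prefix "gw.conf:3:", which is usually enough to find the mistake.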

In this case, we have situation 2. The problem occurred in epicsMutexOsdLock(), but this function is called from so many places in EPICS that it's impossible to track down the origin of the fault.
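One way to make the caller visible in situation 2 is to capture the call site with a macro and pass it down to the reporting code. The following is only a sketch: describeLockFailure and DESCRIBE_LOCK_FAILURE are hypothetical names, not part of EPICS base, and the message format is an assumption.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: formats a failure report that names the caller's
// file and line in addition to the failing system call.
std::string describeLockFailure(const char *what, int err,
                                const char *file, int line)
{
    char buf[256];
    std::snprintf(buf, sizeof buf, "%s failed: %s (called from %s:%d)",
                  what, std::strerror(err), file, line);
    return buf;
}

// Hypothetical macro: __FILE__ and __LINE__ expand at the place where the
// macro is used, so the report points at the caller, not at the helper.
#define DESCRIBE_LOCK_FAILURE(what, err) \
    describeLockFailure((what), (err), __FILE__, __LINE__)
```

A caller would write something like DESCRIBE_LOCK_FAILURE("pthread_mutex_lock", status) at the point where the lock is taken. The same idea works for a lock wrapper itself, if it is turned into a macro that forwards the call site.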

When looking into epicsMutexOsdLock(), I found that the calling thread is suspended via a call to cantProceed() when an error happens. In my opinion, this is not very helpful either, because the calling function does not get any indication of the failure and thus cannot print a proper error message.

In my opinion, cantProceed() does more harm than good as long as it is not able to provide useful debug information (at least a stack trace) -- which is probably hard to do in a portable way. I think a call to abort() would be more helpful, as it produces a process dump (at least on Unix systems) which can be analyzed.
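A fail-fast alternative along these lines could look roughly like the sketch below. The name failFast and its parameters are hypothetical and this is not how cantProceed() is implemented. The optional terminate argument exists only so that the behaviour can be exercised without killing the process; when it is left out, the function calls abort(). Whether a core file is actually written depends on the system's core dump settings.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical fail-fast helper: print what went wrong and where, then
// terminate the process instead of suspending the calling thread.
void failFast(const char *msg, const char *file, int line,
              void (*terminate)() = nullptr)
{
    std::fprintf(stderr, "fatal: %s at %s:%d\n", msg, file, line);
    std::fflush(stderr);   // make sure the message reaches the log first
    if (terminate)
        terminate();       // test hook (an assumption of this sketch)
    else
        std::abort();      // raises SIGABRT; typically leaves a core dump
}
```

Because the process really terminates, an external watchdog or restart script gets a chance to bring the gateway back up.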

Generally, error handling should not be part of the low-level functions. These functions do not have any application knowledge and thus cannot decide if suspending the thread, terminating the program, printing an error message or continuing normally is the "correct" behavior. Instead, low-level functions should only report errors (by exceptions or return values -- I am not religious in this matter) and leave the choice of the correct response to the caller.
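To sketch the return-value variant of this idea: the low-level wrapper below only translates the result of pthread_mutex_lock() into a status and takes no action of its own. LockStatus and lockReporting are hypothetical names chosen for this example, and the null-pointer check is an extra assumption of the sketch.

```cpp
#include <cerrno>
#include <pthread.h>

// Hypothetical status type returned to the caller.
enum class LockStatus { Ok, Invalid, Error };

// Hypothetical low-level wrapper: reports the outcome and nothing else.
// It does not print, suspend the thread, or terminate the program.
LockStatus lockReporting(pthread_mutex_t *mutex)
{
    if (!mutex)
        return LockStatus::Invalid;
    int status = pthread_mutex_lock(mutex);
    if (status == 0)
        return LockStatus::Ok;
    return status == EINVAL ? LockStatus::Invalid : LockStatus::Error;
}
```

The caller, which knows the application context, can then decide: a gateway might log the channel name and shut down so that it gets restarted, while a test program might simply print the status and carry on.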

In this case, the gateway (maybe the CAS library) stopped accepting new clients, which made it effectively useless. On the other hand, it did not terminate. Thus any mechanism to restart the gateway on failure does not work either. I do not consider this a "useful" behavior.

It may be a goal of the next codathlon to improve the error messages provided by EPICS.

Dirk





--
Dr. Dirk Zimoch
Paul Scherrer Institut, WBGB/006
5232 Villigen PSI, Switzerland
Phone +41 56 310 5182

Replies:
RE: Useless error messages in EPICS Jeff Hill
References:
Useless error messages in EPICS Dirk Zimoch
Re: Useless error messages in EPICS Martin L. Smith
