EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: soft ioc runs into fatal exception (base-3.14.12)
From: Carsten Winkler <[email protected]>
To: Jeff Hill <[email protected]>, <[email protected]>
Date: Mon, 14 Feb 2011 12:46:05 +0100
Hello Jeff,
after manually patching osdThread.c (the patch file has corrupt line numbers :-( ), "softIoc.exe" works well.
But now "caput.exe" sometimes runs into an exception! This exception is very hard to catch (1% of calls
in my case).
The exception occurs in the following function in "src/ca/tcpiiu.cpp":

bool tcpiiu :: connectNotify (
    epicsGuard < epicsMutex > & guard, nciu & chan )
{
...
    if ( chan.channelNode::listMember == channelNode::cs_createRespPend ) {
        this->createRespPend.remove ( chan );
        this->subscripReqPend.add ( chan ); <==== exception occurs here!
...

The failing dereference for "this->subscripReqPend.add ( chan );" is in "include/tsDLList.h":

template <class T>
inline void tsDLList<T>::remove ( T &item )
{
    tsDLNode<T> &theNode = item;

    if ( this->pLast == &item ) {
        this->pLast = theNode.pPrev;
    }
    else {
        tsDLNode<T> &nextNode = *theNode.pNext;
        nextNode.pPrev = theNode.pPrev; <====== sometimes "nextNode.pPrev" is NULL here!!!
...

Memory map:
-        nextNode    {pNext=??? pPrev=??? }    tsDLNode<nciu> &
        pNext    CXX0030: Error: expression cannot be evaluated
        pPrev    CXX0030: Error: expression cannot be evaluated
+        this    0x003db1b4 {pFirst=0x00000000 pLast=0x00000000 itemCount=0 }    tsDLList<nciu> * const
-        item    {eventq={...} accessRightState={...} cacCtx={...} ...}    nciu &
+        cacChannel    {priorityMax=99 priorityMin=0 priorityDefault=0 ...}    cacChannel
+        chronIntIdRes<nciu>    {...}    chronIntIdRes<nciu>
+        channelNode    {listMember=cs_createRespPend }    channelNode
+        privateInterfaceForIO    {...}    privateInterfaceForIO
+        eventq    {pFirst=0x00000000 pLast=0x00000000 itemCount=0 }    tsDLList<baseNMIU>
+        accessRightState    {f_readPermit=true f_writePermit=true f_operatorConfirmationRequest=false }    caAccessRights
+        cacCtx    {_refLocalHostName={...} chanTable={...} ioTable={...} ...}    cac &
+        pNameStr    0x003dafd8 "demoHost:double1"    char *
+        piiu    0x100543ec class noopiiu noopIIU    netiiu *
        sid    4294967295    unsigned int
        count    0    unsigned int
        retry    0    unsigned int
        nameLength    17    unsigned short
        typeCode    65535    unsigned short
        priority    0    unsigned char
-        theNode    {pNext=0x00000000 pPrev=0x00000000 }    tsDLNode<nciu> &
+        pNext    0x00000000 {eventq={...} accessRightState={...} cacCtx=??? ...}    nciu *
+        pPrev    0x00000000 {eventq={...} accessRightState={...} cacCtx=??? ...}    nciu *
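
In the dump above, the list reports pFirst=0, pLast=0 and itemCount=0, while theNode has both pNext and pPrev set to NULL. A node in that state cannot be linked into that list, so unlinking it must dereference a null pointer. The following is only a minimal sketch in C of a constant-time plausibility check for this condition. DNode, DList and looksLinked are hypothetical stand-ins that reuse the field names from the dump; they are not the tsDLList implementation, and the check is a necessary condition only (a node linked into a different list can still pass it).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins using the field names seen in the debugger dump (assumption:
 * simplified, non-template model; not the real tsDLNode / tsDLList). */
typedef struct DNode {
    struct DNode *pNext;
    struct DNode *pPrev;
} DNode;

typedef struct DList {
    DNode *pFirst;
    DNode *pLast;
    unsigned itemCount;
} DList;

/* Returns 1 if `node` could be a member of `list`, 0 if it cannot be.
 * A member with no predecessor must be the first item, and a member
 * with no successor must be the last item. */
int looksLinked(const DList *list, const DNode *node)
{
    assert(list != NULL && node != NULL);
    if (list->itemCount == 0) {
        return 0;                       /* empty list has no members */
    }
    if (node->pPrev == NULL && list->pFirst != node) {
        return 0;                       /* no predecessor, yet not first */
    }
    if (node->pNext == NULL && list->pLast != node) {
        return 0;                       /* no successor, yet not last */
    }
    return 1;
}
```

Called with the state from the dump (empty list, node with two NULL links), looksLinked() returns 0, which is the situation in which the unlink code shown earlier faults.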

I checked this on several machines and I don't know what to do next.

Hope you can help me
Carsten
Hello Carsten,

I committed and pushed a fix to the R3.14 branch on Launchpad. Please
test to see if the relevant changes address your issue.

I created Launchpad bug 717252. The details, including diffs, are in the bug
report.

https://bugs.launchpad.net/epics-base/+bug/717252

Thanks for your detailed bug report,

Jeff
______________________________________________________
Jeffrey O. Hill           Email        [email protected]
LANL MS H820              Voice        505 665 1831
Los Alamos NM 87545 USA   FAX          505 665 5107

Message content: TSPA

With sufficient thrust, pigs fly just fine. However, this is
not necessarily a good idea. It is hard to be sure where they
are going to land, and it could be dangerous sitting under them
as they fly overhead. -- RFC 1925


-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of Carsten Winkler
Sent: Friday, February 11, 2011 1:01 AM
To: [email protected]
Subject: soft ioc runs into fatal exception (base-3.14.12)

Problem:        softIoc.exe runs into a fatal exception after a caput call
                from the local host. When running the test, there is a 40%
                chance for this exception to occur. Waiting between starting
                the softIoc and caput, and/or waiting between caput calls,
                does not change the behavior.

Error message:  "softIoc has encountered a problem and needs to close. We
                are sorry for the inconvenience."

Error details:  AppName: softioc.exe; AppVer: 0.0.0.0; ModName: com.dll;
                ModVer: 3.14.12.0; Offset: 0000a613

System:         EPICS 3.14.12 plus all published patches (8 Feb. 2011), no
                local changes (compiled with MS Visual Studio 2010
                Professional, without any errors)

Host:           Windows XP SP3 on a Pentium 4 with 3.2 GHz and 3 GB RAM
                AND Windows XP SP3 on VMware 3.5.0
                AND Windows XP SP3 on VirtualBox 3.2.12
                AND Windows XP SP3 on KVM

Test setup:     The following configuration files have been used:

      startDemo.bat:
          set EPICS_CA_ADDR_LIST=localhost
          set EPICS_CA_AUTO_ADDR_LIST=NO
          start DemoIOC.cmd
          caput demoHost:double1 3.141
          caput demoHost:double2 2.718
          caput demoHost:long 12345

      DemoIOC.cmd:
          softIoc.exe -D dbd\softIoc.dbd -d db\demo.db

      demo.db:
          record(ao, "demoHost:double1")
          {
              field(DESC, "Double output with range infos")
              field(EGU, "mm")
              field(HOPR, "10")
              field(LOPR, "0")
              field(HIHI, "8")
              field(HIGH, "6")
              field(LOW, "4")
              field(LOLO, "2")
              field(HHSV, "MAJOR")
              field(HSV, "MINOR")
              field(LSV, "MINOR")
              field(LLSV, "MAJOR")
          }
          record(ao, "demoHost:double2")
          {
              field(DESC, "Double output without range infos")
              field(EGU, "nm")
          }
          record(longout, "demoHost:long")
          {
              field(DESC, "Long output without range infos")
              field(EGU, "m")
          }
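
Because the failure is intermittent, it can help to repeat a command from the test setup many times and count how often it fails, for example to compare a patched build against an unpatched one. The following is a minimal sketch in C, not part of EPICS: count_failures is a hypothetical helper, and it assumes that a failed or crashed run shows up as a nonzero status from system(). Whether a crash of softIoc itself becomes visible this way depends on the setup.

```c
#include <assert.h>
#include <stdlib.h>

/* Run `cmd` through the system shell `runs` times and return how many
 * of those runs reported a nonzero status.
 * Assumption: a failing or crashing command yields a nonzero status. */
int count_failures(const char *cmd, int runs)
{
    int failures = 0;
    int i;

    assert(cmd != NULL);
    for (i = 0; i < runs; i++) {
        if (system(cmd) != 0) {
            failures++;
        }
    }
    return failures;
}
```

For example, count_failures("caput demoHost:double1 3.141", 100) would issue the first caput call from startDemo.bat one hundred times and return the number of runs that did not end with status 0.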

call stack:
Com.dll!ellDelete(ELLLIST * pList=0x00a22ed0, ELLNODE * pNode=0x00cf0d08)  Line 82 + 0xb bytes    C
Com.dll!epicsParmCleanupWIN32(epicsThreadOSD * pParm=0x00cf0d08)  Line 246 + 0x10 bytes    C
Com.dll!epicsWin32ThreadEntry(void * lpParameter=0x00cf0d08)  Line 516 + 0x9 bytes    C
msvcr100d.dll!_callthreadstartex()  Line 314 + 0xf bytes    C
msvcr100d.dll!_threadstartex(void * ptd=0x00cf1500)  Line 297    C
kernel32.dll!_BaseThreadStart@8()  + 0x37 bytes

exception occurred here:
void ellDelete (ELLLIST *pList, ELLNODE *pNode)
{
      if (pList->node.previous == pNode)
          pList->node.previous = pNode->previous;
      else
          pNode->next->previous = pNode->previous;  <== "pNode->next" is a NULL pointer in the error case (see memory map)

      if (pList->node.next == pNode)
          pList->node.next = pNode->next;
      else
          pNode->previous->next = pNode->next;

      pList->count--;

      return;
}
This function was called from "static void epicsParmCleanupWIN32 ( win32ThreadParam * pParm )"
in osdThread.c.

memory maps:
- pList    0x00a22ed0 {node={...} count=23 }    ELLLIST *
      - node    {next=0x00a24728 previous=0x00cf0c60 }    ELLNODE
          - next    0x00a24728 {next=0x00a26340 previous=0x00000000 }    ELLNODE *
              - next    0x00a26340 {next=0x00adbd90 previous=0x00a24728 }    ELLNODE *
[... 20 thread parameter blocks in this list without address 0x00cf0d08]
                     - next    0x00cf0c60 {next=0x00000000 previous=0x00cf0de0 }    ELLNODE *
                         - next    0x00000000 {next=??? previous=??? }    ELLNODE *
                             next    CXX0030: Error: expression cannot be evaluated
                             previous    CXX0030: Error: expression cannot be evaluated
          + previous    0x00cf0de0 {next=0x00cf0c60 previous=0x00cf0b40 }    ELLNODE *
              + previous    0x00cf0b40 {next=0x00cf0de0 previous=0x00af1ac0 }    ELLNODE *
[... 20 thread parameter blocks in this list without address 0x00cf0d08]
      count    23    int

- pNode    0x00cf0d08 {next=0x00000000 previous=0x00000000 }    ELLNODE *
      + next    0x00000000 {next=??? previous=??? }    ELLNODE *
          next    CXX0030: Error: expression cannot be evaluated
          previous    CXX0030: Error: expression cannot be evaluated
      + previous    0x00000000 {next=??? previous=??? }    ELLNODE *
          next    CXX0030: Error: expression cannot be evaluated
          previous    CXX0030: Error: expression cannot be evaluated

0x00CF0C80  00 00 00 00 43 41 53 2d 65 76 65 6e 74 00 fd fd fd fd dd dd dd dd dd dd 09 00 0c 00 a1 01 0c 02 a0 12  ....CAS-event.ýýýýÝÝÝÝÝÝ....¡... .
0x00CF0CA2  cf 00 e8 0c cf 00 00 00 00 00 00 00 00 00 18 00 00 00 01 00 00 00 fd 26 00 00 fd fd fd fd 00 00 00 00  Ï.è.Ï.................ý&..ýýýý....

This looks like there is a situation (with a 40% chance) in which the
thread parameter block of a CAS-event task is not added to the thread list
when the client connects. When the client disconnects and the thread gets
shut down, it tries to remove its parameter block from the thread list by
calling ellDelete() with a node that is not in the list. ellDelete() does
not guard against this and crashes the softIoc (by dereferencing a null
pointer). This exception seems to occur only when caput has been called
from the local host.
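
The failure mode described above can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: Node and List are local stand-ins that mirror the next, previous and count fields visible in the dump, and listInit, listAppend, listContains and guardedDelete are hypothetical helpers, not EPICS functions. guardedDelete uses the same unlink logic as the ellDelete() excerpt but first walks the list and refuses a node that is not a member. It does not address why the parameter block was never added in the first place.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the structures seen in the dump (not the EPICS headers). */
typedef struct Node {
    struct Node *next;
    struct Node *previous;
} Node;

typedef struct List {
    Node node;   /* node.next is the first item, node.previous the last item */
    int count;
} List;

void listInit(List *pList)
{
    pList->node.next = NULL;
    pList->node.previous = NULL;
    pList->count = 0;
}

/* Append pNode at the tail of the list. */
void listAppend(List *pList, Node *pNode)
{
    pNode->next = NULL;
    pNode->previous = pList->node.previous;
    if (pList->count > 0) {
        pList->node.previous->next = pNode;
    } else {
        pList->node.next = pNode;
    }
    pList->node.previous = pNode;
    pList->count++;
}

/* Linear search: 1 if pNode is a member of the list, else 0. */
int listContains(const List *pList, const Node *pNode)
{
    const Node *p;
    for (p = pList->node.next; p != NULL; p = p->next) {
        if (p == pNode) {
            return 1;
        }
    }
    return 0;
}

/* Unlink pNode like the ellDelete() excerpt, but return -1 and leave the
 * list untouched when pNode is not a member. Returns 0 on success. */
int guardedDelete(List *pList, Node *pNode)
{
    assert(pList != NULL && pNode != NULL);
    if (!listContains(pList, pNode)) {
        return -1;
    }
    if (pList->node.previous == pNode) {
        pList->node.previous = pNode->previous;
    } else {
        pNode->next->previous = pNode->previous;
    }
    if (pList->node.next == pNode) {
        pList->node.next = pNode->next;
    } else {
        pNode->previous->next = pNode->next;
    }
    pList->count--;
    pNode->next = NULL;
    pNode->previous = NULL;
    return 0;
}
```

The membership walk costs time proportional to the list length, which is the price of the guard in this sketch; with 23 entries as in the dump that is small, but it is a design trade-off rather than a free check.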

Has anyone seen this behavior?

________________________________

Helmholtz-Zentrum Berlin für Materialien und Energie GmbH

Member of the Hermann von Helmholtz-Gemeinschaft Deutscher
Forschungszentren e.V.

Supervisory Board: Chair Prof. Dr. Dr. h.c. mult. Joachim Treusch,
Deputy Chair Dr. Beatrix Vierkorn-Rudolph
Managing Directors: Prof. Dr. Anke Rita Kaysser-Pyzalla, Prof. Dr. Dr. h.c.
Wolfgang Eberhardt, Dr. Ulrich Breuer

Registered office: Berlin, AG Charlottenburg, 89 HRB 5583

Postal address:
Hahn-Meitner-Platz 1
D-14109 Berlin

http://www.helmholtz-berlin.de


