EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System

2002  2003  2004  2005  2006  2007  2008  <20092010  2011  2012  2013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024  Index 2002  2003  2004  2005  2006  2007  2008  <20092010  2011  2012  2013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024 
<== Date ==> <== Thread ==>

Subject: Re: proper thread cleanup on Linux?
From: Andrew Johnson <[email protected]>
To: Jeff Hill <[email protected]>
Cc: "'Core-Talk'" <[email protected]>
Date: Tue, 25 Aug 2009 15:25:56 -0500
Hi Jeff,

We don't use pthread_cancel() any more, which was what was throwing those 
funny exception objects in earlier releases.

I don't know whether it's relevent, but I get some strange behavior when I ran 
acctst under gdb, which might be related.  Here's my first session:

======================================================
(gdb) run anjCas:bill 10
Starting 
program: /home/phoebus3/ANJ/epics/base/3-14-dev/bin/linux-x86_64/acctst 
anjCas:bill 10
[Thread debugging using libthread_db enabled]
CA Client V4.11, channel name "anjCas:bill", timeout 1e+20
[New Thread 140647857112832 (LWP 31531)]
[New Thread 1084512592 (LWP 31539)]
[Thread 1084512592 (LWP 31539) exited]
[New Thread 1087383888 (LWP 31541)]
[New Thread 1101023568 (LWP 31543)]
Spurious thread death event.
Missing separate debuginfos, use: debuginfo-install gcc.x86_64 glibc.x86_64 
ncurses.x86_64 readline.x86_64
(gdb) info threads
[New Thread 1083005264 (LWP 31544)]
  5 Thread 1083005264 (LWP 31544)  0x0000003a152d4ad1 in clone ()
   from /lib64/libc.so.6
  4 Thread 1101023568 (LWP 31543)  0x0000003a15e0cebe in 
__lll_lock_wait_private () from /lib64/libpthread.so.0
  3 Thread 1087383888 (LWP 31541)  0x0000003a15e0a8f9 in 
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 140647857112832 (LWP 31531)  0x0000003a152d4ad1 in clone ()
   from /lib64/libc.so.6
(gdb) c
Continuing.
[New Thread 1084512592 (LWP 31551)]
[Thread 1083005264 (LWP 31544) exited]
Waiting for test channel to connect.[New Thread 1090611536 (LWP 31553)]
Spurious thread death event.
(gdb) c
Continuing.
.[New Thread 1077299536 (LWP 31555)]
confirmed.
Please force test channel to disconnect...[Thread 1090611536 (LWP 31553) 
exited]
[Thread 1077299536 (LWP 31555) exited]
confirmed.
Waiting for test channel to connect..Spurious thread death event.
(gdb) c
Continuing.
Couldn't get registers: No such process.
(gdb) info threads
Couldn't get registers: No such process.
======================================================

Once the "Couldn't get registers: No such process." message appears even 
though the process does actually still exist it is no longer possible to 
debug it.

When I run it without the debugger, I get this output which I assume is 
normal:

======================================================
tux% bin/linux-x86_64/acctst anjCas:bill 10
CA Client V4.11, channel name "anjCas:bill", timeout 1e+20
Waiting for test channel to connect..confirmed.
Please force test channel to disconnect..confirmed.
Waiting for test channel to connect..confirmed.
verifyImmediateTearDown {..........} 2.926606 sec
verifyTearDownWhenChannelConnected {} 3.196328 sec
unequalServerBufferSizeTest {...} 0.055779 sec
clearChannelInGetCallbackTest {} 0.208205 sec
clearChannelInPutCallbackTest {} 0.107796 sec
clearChannelInSubscrCallbackTest {} 0.108289 sec
monitorAddConnectionCallbackTest {} 0.200117 sec
connecting to test channel {} 0.023999 sec
native type was DBF_DOUBLE, native count was 1
verifyConnectWithDisconnectedChannels {..........} 10.028629 sec
grEnumTest {} 0.000398 sec
test_sync_groups {} 0.043113 sec
verifyChannelPriorities {} 0.795214 sec
verifyTimeStamps {Processing time of channel "anjCas:bill" was "Tue Aug 25 
2009 14:13:11.039982982"
Time difference between two successive reads was 0 sec
Time difference between client and server -0.000202196 sec
} 0.001146 sec
verifyOldPend {} 0.000019 sec
exceptionTest {} 0.501087 sec
arrayTest {} 0.200782 sec
verifyMonitorSubscriptionFlushIO {} 0.100090 sec
monitorSubscriptionFirstUpdateTest {} 0.422984 sec
ctrlDoubleTest {} 0.000265 sec
verifyBlockInPendIO {} 0.000178 sec
verifyAnalogIO {} 0.098819 sec
verifyAnalogIO {} 0.101892 sec
multiSubscriptionDeleteTest {...} 0.158459 sec
singleSubscriptionDeleteTest {..........} 0.841244 sec
channelClearWithEventTrafficTest {} 27.008180 sec
monitorUpdateTest {...} 1.124018 sec
verifyHighThroughputRead {} 0.035451 sec
verifyHighThroughputWrite {} 0.003346 sec
verifyHighThroughputReadCallback {} 0.213053 sec
verifyHighThroughputWriteCallback {} 0.034205 sec
verifyBadString {Test fails if stim "8" isnt roughly equiv to resp "8.0000"
} 0.000435 sec
fdManagerVerify {...} 2.019339 sec

CA pend event delay = 1.000000 sec results in error = 0.003195 %
CA pend event delay = 0.100000 sec results in error = 0.026119 %
CA pend event delay = 0.250000 sec results in error = 0.008583 %
ca_client_context at 0x7f546c021460 pndRecvCnt=0 ioSeqNo=4371
verifyConnectionHandlerConnect {....} 9.417790 sec
verifyBlockingConnect {....} 2.577709 sec
verifyClear {} 0.000096 sec
verifyReasonableBeaconPeriod {Beacon anomalies detected since program start 9
Estimated beacon period for channel anjCas:bill = 14.995 sec.
busy: receive watchdog for "anjCas:bill" expires in 29.9935 sec.
...............
inactive: receive watchdog for "anjCas:bill" expires in 29.9914 sec.
} 30.001330 sec
caTaskExitTest {} 0.005461 sec
verifyContextRundownFlush {..........} 58.721795 sec
verifyContextRundownChanStillExist {} 0.323408 sec

Test Complete
======================================================

As that runs, it causes three copies of this message to appear from the excas 
process, after the connect/reconnect:

filename="../../../include/gddAppFuncTable.h" line number=263
application type unregistered - ukn appl type code = 19


Going back to the debugger, I put a breakpoint on main() and subsequently 
epicsThreadCreate() to trace the threads starting, with this output:

======================================================
(gdb) break main
Breakpoint 1 at 0x4020b0: file ../acctstMain.c, line 18.
(gdb) run anjCas:bill 10
Starting 
program: /home/phoebus3/ANJ/epics/base/3-14-dev/bin/linux-x86_64/acctst 
anjCas:bill 10
[Thread debugging using libthread_db enabled]
[New Thread 140403263534848 (LWP 21713)]
[Switching to Thread 140403263534848 (LWP 21713)]

Breakpoint 1, main (argc=3, argv=0x7fff36a8d9e8) at ../acctstMain.c:18
18      {
Missing separate debuginfos, use: debuginfo-install gcc.x86_64 glibc.x86_64 
ncurses.x86_64 readline.x86_64
(gdb) break epicsThreadCreate
Breakpoint 2 at 0x64a690: file ../../../src/libCom/osi/os/posix/osdThread.c, 
line 363.
(gdb) c
Continuing.
CA Client V4.11, channel name "anjCas:bill", timeout 1e+20

Breakpoint 2, epicsThreadCreate (name=0x65787c "ipToAsciiProxy", priority=10,
    stackSize=524288, funptr=0x643780 <epicsThreadCallEntryPoint>,
    parm=0x13dd858) at ../../../src/libCom/osi/os/posix/osdThread.c:363
363     {
(gdb) c
Continuing.
[New Thread 1106401616 (LWP 21724)]

Breakpoint 2, epicsThreadCreate (name=0x659ea3 "timerQueue", priority=2,
    stackSize=262144, funptr=0x643780 <epicsThreadCallEntryPoint>,
    parm=0x13ddfd0) at ../../../src/libCom/osi/os/posix/osdThread.c:363
363     {
(gdb) c
Continuing.
[New Thread 1086654800 (LWP 21726)]

Breakpoint 2, epicsThreadCreate (name=0x162507 "CAC-UDP", priority=4,
    stackSize=262144, funptr=0x643780 <epicsThreadCallEntryPoint>,
    parm=0x13fd1d0) at ../../../src/libCom/osi/os/posix/osdThread.c:363
363     {
(gdb) c
Continuing.
[New Thread 1085253968 (LWP 21728)]
Waiting for test channel to connect...
[Switching to Thread 1085253968 (LWP 21728)]

Breakpoint 2, epicsThreadCreate (name=0x162c15 "CAC-TCP-recv", priority=0,
    stackSize=524288, funptr=0x643780 <epicsThreadCallEntryPoint>,
    parm=0x14018d0) at ../../../src/libCom/osi/os/posix/osdThread.c:363
363     {
(gdb) c
Continuing.
[New Thread 1089636688 (LWP 21732)]

Breakpoint 2, epicsThreadCreate (name=0x162c22 "CAC-TCP-send", priority=2,
    stackSize=262144, funptr=0x643780 <epicsThreadCallEntryPoint>,
    parm=0x1401928) at ../../../src/libCom/osi/os/posix/osdThread.c:363
363     {
(gdb) c
Continuing.
[New Thread 1089108304 (LWP 21734)]
confirmed.
Please force test channel to disconnect..
[Thread 1089636688 (LWP 21732) exited]
[Thread 1089108304 (LWP 21734) exited]
confirmed.
Waiting for test channel to connect..
Breakpoint 2, epicsThreadCreate (name=0x162c15 "CAC-TCP-recv", priority=0,
    stackSize=524288, funptr=0x643780 <epicsThreadCallEntryPoint>,
    parm=0x14018d0) at ../../../src/libCom/osi/os/posix/osdThread.c:363
363     {
(gdb) c
Continuing.
Spurious thread death event.
(gdb) info threads
* 4 Thread 1085253968 (LWP 21728)  0x0000003a15e07130 in 
pthread_create@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
  3 Thread 1086654800 (LWP 21726)  0x0000003a15e0cef4 in __lll_lock_wait ()
   from /lib64/libpthread.so.0
  2 Thread 1106401616 (LWP 21724)  0x0000003a15e0a8f9 in 
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  1 Thread 140403263534848 (LWP 21713)  0x0000003a1529ac61 in nanosleep ()
   from /lib64/libc.so.6
(gdb) bt
#0  0x0000003a15e07130 in pthread_create@@GLIBC_2.2.5 ()
   from /lib64/libpthread.so.0
#1  0x000000000064a7f8 in epicsThreadCreate (name=0x162c15 "CAC-TCP-recv",
    priority=0, stackSize=524288, funptr=0x643780 <epicsThreadCallEntryPoint>,
    parm=0x14018d0) at ../../../src/libCom/osi/os/posix/osdThread.c:377
#2  0x0000000000642f7f in epicsThread (this=0x14018d0,
    runableIn=<value optimized out>, pName=0x162c15 "CAC-TCP-recv",
    stackSize=524288, priority=0)
    at ../../../src/libCom/osi/epicsThread.cpp:183
#3  0x0000000000149529 in tcpRecvThread (this=0x14018c8, iiuIn=@0x14017e0,
    cbMutexIn=@0x13dcaf0, ctxNotifyIn=@0x13dca20,
    pName=0x20 <Address 0x20 out of bounds>, stackSize=4294967295, priority=0)
    at ../tcpiiu.cpp:389
#4  0x000000000014ef39 in tcpiiu (this=0x14017e0, cac=@0x13dce10,
    mutexIn=@0x13dcae8, cbMutexIn=@0x13dcaf0, ctxNotifyIn=@0x13dca20,
    connectionTimeout=30, timerQueue=@0x13ddf00, addrIn=@0x40af9df0,
    comBufMemMgrIn=@0x13dcf88, minorVersion=11, engineIn=@0x13dd400,
    priorityIn=@0x40af9d9c) at ../tcpiiu.cpp:697
#5  0x0000000000134a6d in cac::transferChanToVirtCircuit (this=0x13dce10,
    cid=<value optimized out>, sid=4294967295, typeCode=65535, count=0,
    minorVersionNumber=11, addr=@0x40af9df0, currentTime=@0x40afa000)
    at ../cac.cpp:540
#6  0x0000000000146495 in udpiiu::searchRespAction (
    this=<value optimized out>, msg=<value optimized out>,
    addr=<value optimized out>, currentTime=@0x206) at ../udpiiu.cpp:672
#7  0x0000000000146c90 in udpiiu::postMsg (this=0x13ecd80,
    net_addr=@0x40af9fe0, pInBuf=0x13ed1b0 "\006", blockSize=24,
    currentTime=@0x40afa000) at ../udpiiu.cpp:839
#8  0x00000000001478bd in udpRecvThread::run (this=0x13fd1b0)
    at ../udpiiu.cpp:386
#9  0x00000000006437fa in epicsThreadCallEntryPoint (
    pPvt=<value optimized out>) at ../../../src/libCom/osi/epicsThread.cpp:85
#10 0x0000000000649763 in start_routine (arg=<value optimized out>)
    at ../../../src/libCom/osi/os/posix/osdThread.c:282
#11 0x0000003a15e06407 in start_thread () from /lib64/libpthread.so.0
#12 0x0000003a152d4b0d in clone () from /lib64/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread 1086654800 (LWP 21726))]#0  0x0000003a15e0cef4 
in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x0000003a15e0cef4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003a15e0894b in _L_lock_879 () from /lib64/libpthread.so.0
#2  0x0000003a15e087db in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000000000064ac66 in epicsMutexOsdLock (pmutex=<value optimized out>)
    at ../../../src/libCom/osi/os/posix/osdMutex.c:44
#4  0x0000000000643e4c in epicsMutex::lock (this=<value optimized out>)
    at ../../../src/libCom/osi/epicsMutex.cpp:238
#5  0x000000000014300d in searchTimer::expire (this=0x13fe110,
    currentTime=@0x80) at ../../../include/epicsGuard.h:71
#6  0x0000000000650c22 in timerQueue::process (this=0x13ddf30,
    currentTime=@0x40c50020) at ../../../src/libCom/timer/timerQueue.cpp:140
#7  0x0000000000651786 in timerQueueActive::run (this=0x13ddf00)
    at ../../../src/libCom/timer/timerQueueActive.cpp:91
#8  0x00000000006437fa in epicsThreadCallEntryPoint (
    pPvt=<value optimized out>) at ../../../src/libCom/osi/epicsThread.cpp:85
#9  0x0000000000649763 in start_routine (arg=<value optimized out>)
    at ../../../src/libCom/osi/os/posix/osdThread.c:282
#10 0x0000003a15e06407 in start_thread () from /lib64/libpthread.so.0
#11 0x0000003a152d4b0d in clone () from /lib64/libc.so.6
(gdb) thread 2
[Switching to thread 2 (Thread 1106401616 (LWP 21724))]#0  0x0000003a15e0a8f9 
in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0  0x0000003a15e0a8f9 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x000000000064b2a1 in epicsEventWait (pevent=0x13dd990)
    at ../../../src/libCom/osi/os/posix/osdEvent.c:75
#2  0x000000000064424c in epicsEvent::wait (this=<value optimized out>)
    at ../../../src/libCom/osi/epicsEvent.cpp:63
#3  0x0000000000641452 in ipAddrToAsciiEnginePrivate::run (this=0x13dd400)
    at ../../../src/libCom/misc/ipAddrToAsciiAsynchronous.cpp:305
#4  0x00000000006437fa in epicsThreadCallEntryPoint (
    pPvt=<value optimized out>) at ../../../src/libCom/osi/epicsThread.cpp:85
#5  0x0000000000649763 in start_routine (arg=<value optimized out>)
    at ../../../src/libCom/osi/os/posix/osdThread.c:282
#6  0x0000003a15e06407 in start_thread () from /lib64/libpthread.so.0
#7  0x0000003a152d4b0d in clone () from /lib64/libc.so.6
(gdb) thread 1
[Switching to thread 1 (Thread 140403263534848 (LWP 21713))]#0  
0x0000003a1529ac61 in nanosleep () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003a1529ac61 in nanosleep () from /lib64/libc.so.6
#1  0x00000000006490ea in epicsThreadSleep (seconds=999991320.00000036)
    at ../../../src/libCom/osi/os/posix/osdThread.c:572
#2  0x0000000000153504 in ca_client_context::pendEvent (
    this=<value optimized out>, timeout=@0x7fff36a8d7d8)
    at ../ca_client_context.cpp:585
#3  0x000000000013d584 in ca_pend_event (timeout=5) at ../access.cpp:466
#4  0x0000000000402c03 in verifyDisconnect (
    pName=0x7fff36a8f3b8 "anjCas:bill", interestLevel=<value optimized out>)
    at ../acctst.c:2926
#5  0x000000000040ba36 in acctst (pName=0x7fff36a8f3b8 "anjCas:bill",
    interestLevel=10, channelCount=20000, repetitionCount=1,
    select=ca_disable_preemptive_callback) at ../acctst.c:3067
#6  0x0000000000402120 in main (argc=3, argv=0x7fff36a8d9e8)
    at ../acctstMain.c:67
(gdb) c
Continuing.
.
[New Thread 1089108304 (LWP 21756)]

Breakpoint 2, epicsThreadCreate (name=0x162c22 "CAC-TCP-send", priority=2,
    stackSize=262144, funptr=0x643780 <epicsThreadCallEntryPoint>,
    parm=0x1401928) at ../../../src/libCom/osi/os/posix/osdThread.c:363
363     {
(gdb) c
Continuing.
[New Thread 1086388560 (LWP 21758)]
.confirmed.
verifyImmediateTearDown {
[Thread 1086388560 (LWP 21758) exited]
[Thread 1089108304 (LWP 21756) exited]

Breakpoint 2, epicsThreadCreate (name=0x162c15 "CAC-TCP-recv", priority=0,
    stackSize=524288, funptr=0x643780 <epicsThreadCallEntryPoint>,
    parm=0x14018d0) at ../../../src/libCom/osi/os/posix/osdThread.c:363
363     {
(gdb) c
Continuing.
Spurious thread death event.
(gdb) c
Continuing.
[New Thread 1089636688 (LWP 21762)]

Breakpoint 2, epicsThreadCreate (name=0x162c22 "CAC-TCP-send", priority=2,
    stackSize=262144, funptr=0x643780 <epicsThreadCallEntryPoint>,
    parm=0x1401928) at ../../../src/libCom/osi/os/posix/osdThread.c:363
363     {
(gdb) c
Continuing.
Spurious thread death event.
(gdb) c
Continuing.
[New Thread 1086388560 (LWP 21765)]
[Thread 1089636688 (LWP 21762) exited]
[Thread 1086388560 (LWP 21765) exited]
[Switching to Thread 140403263534848 (LWP 21713)]

Breakpoint 2, epicsThreadCreate (name=0x653fe4 "errlog", priority=10,
    stackSize=131072, funptr=0x633dd0 <errlogThread>, parm=0x0)
    at ../../../src/libCom/osi/os/posix/osdThread.c:363
363     {
(gdb) c
Continuing.
[Thread 1085253968 (LWP 21728) exited]
[Thread 1106401616 (LWP 21724) exited]
[Thread 1086654800 (LWP 21726) exited]
Spurious thread death event.
(gdb)
Continuing.
Couldn't get registers: No such process.
(gdb)
======================================================

I'm seeing those "Spurious thread death event." errors on both 64-bit and 
32-bit SMP linux systems, Fedora Core 8 and 10 respectively.

I did a Google search on that error message and it came up with the source 
file from gdb, where it looks like it means that gdb received a thread death 
message from the process for a thread that it didn't know about.

- Andrew
-- 
The best FOSS code is written to be read by other humans -- Harold Welte

References:
proper thread cleanup on Linux? Jeff Hill

Navigate by Date:
Prev: proper thread cleanup on Linux? Jeff Hill
Next: RE: proper thread cleanup on Linux? Jeff Hill
Index: 2002  2003  2004  2005  2006  2007  2008  <20092010  2011  2012  2013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024 
Navigate by Thread:
Prev: proper thread cleanup on Linux? Jeff Hill
Next: RE: proper thread cleanup on Linux? Jeff Hill
Index: 2002  2003  2004  2005  2006  2007  2008  <20092010  2011  2012  2013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024 
ANJ, 02 Feb 2012 Valid HTML 4.01! · Home · News · About · Base · Modules · Extensions · Distributions · Download ·
· Search · EPICS V4 · IRMIS · Talk · Bugs · Documents · Links · Licensing ·