Hi Jeff,
It sounds like it might be a tricky one to track down!
I've seen it occur at least twice (on separate but similarly configured
IOCs), but haven't yet figured out how to reproduce it. If I do, I will
let you know and try to get some more debugging information.
Cheers,
Emma
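
For readers following the thread: one hypothesis in Jeff's message quoted below is that a data structure is used after it was deleted. The following is a minimal sketch of that general failure mode and not code from dbCa.c. The names demo_link, demo_link_close and demo_event_callback are hypothetical, and it assumes pgetNative is a buffer pointer that becomes NULL when the link is torn down. Dropping the stale event is only an illustrative choice here, not a proposed fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a CA link record; NOT the real structure in dbCa.c. */
typedef struct demo_link {
    double *pgetNative;  /* buffer for incoming values; NULL once torn down (assumed) */
    int     updates;     /* number of monitor events applied so far */
} demo_link;

/* Tear the link down, as a link-removal path might do. */
void demo_link_close(demo_link *link)
{
    link->pgetNative = NULL;
}

/* Monitor callback. Returns 1 if the value was applied, 0 if the event
 * arrived after teardown and was dropped. */
int demo_event_callback(demo_link *link, double value)
{
    if (link->pgetNative == NULL) {
        /* An assert(link->pgetNative) at this point would fail, which on
         * an IOC suspends the calling thread instead of returning. */
        fprintf(stderr, "stale event dropped\n");
        return 0;
    }
    *link->pgetNative = value;
    link->updates++;
    return 1;
}
```

Calling demo_event_callback once before and once after demo_link_close shows the first event applied and the second dropped, which is the ordering a race between link removal and a late monitor event could produce.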
> -----Original Message-----
> From: Jeff Hill [mailto:[email protected]]
> Sent: 04 September 2007 17:52
> To: Shepherd, EL (Emma); [email protected]
> Subject: RE: CAC-TCP-recv suspended
>
>
>
> Hello Emma,
>
> I created Mantis 299 to track this issue. It's difficult at
> this point to isolate to a subsystem. The assert fail in
> dbCa.c initially points to a logic error in the db ca link
> code, or alternatively a race condition - possibly a data
> structure that is being used after it was deleted.
> Alternatively, this might be generalized corruption, or a
> failure in another subsystem (possibly the CA client
> library). I am not intimately familiar with the dbCa.c code
> so this may require some time spent looking at the sources.
>
> Have you seen this occur more than once?
>
> If the problem is repeatable, is it possible to reproduce it
> with a small database along with a well defined recipe of
> external circumstances? If the problem is repeatable, but not
> with a small database, you might also obtain further details
> (a stack trace with arguments and possibly the contents of
> related data structures) by building base for debugging and
> then attaching to the crashed thread using the Tornado debugger.
>
> Jeff
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]
> On Behalf Of Shepherd, EL (Emma)
> Sent: Monday, September 03, 2007 10:09 AM
> To: [email protected]
> Subject: CAC-TCP-recv suspended
>
> Hi all,
>
> I have come across a problem on an R3.14.8.2 IOC that is
> affecting channel access links - some records are in LINK
> ERROR and others have CP links that fail to update. When we
> started investigating we found that the CAC-TCP-recv task was
> in SUSPEND+I state, and the following messages had been
> printed to the console:
>
> BL18I-MO-IOC-01.diamond.ac.uk:1 Wed Aug 15 16:37:26 2007 CAC-TCP-recv: A call to "assert (pca->pgetNative)" failed in ../dbCa.c at 629
> BL18I-MO-IOC-01.diamond.ac.uk:1 Wed Aug 15 16:37:26 2007 Current time WED AUG 15 2007 15:37:23.708349950.
> BL18I-MO-IOC-01.diamond.ac.uk:1 Wed Aug 15 16:37:26 2007 EPICS Release EPICS R3.14.8.2 $R3-14-8-2$ $2006/01/06 15:55:13$.
> BL18I-MO-IOC-01.diamond.ac.uk:1 Wed Aug 15 16:37:26 2007 Please E-mail this message and the output from "tt (0x1e0ff9e0)"
> BL18I-MO-IOC-01.diamond.ac.uk:1 Wed Aug 15 16:37:26 2007 to the author or to [email protected]
>
> Here is the task trace:
>
> BL18I-MO-IOC-01 -> tt 0x1e0ff9e0
> 231ff8 vxTaskEntry +68 : 1e8cb6e4 ()
> 1e8cb754 epicsThreadPrivateGet+f8 : epicsThreadCallEntryPoint ()
> 1e8bd048 epicsThreadCallEntryPoint+15c: 1e88b718 (1)
> 1e88b718 tcpRecvThread::run(void)+990: 1e88e78c ()
> 1e88e78c tcpiiu::processIncoming(epicsTime const &, callbackManager &)+408: cac::executeResponse(callbackManager &, tcpiiu &, epicsTime const &, caHdrLargeArray &, char *) ()
> 1e87a588 cac::executeResponse(callbackManager &, tcpiiu &, epicsTime const &, caHdrLargeArray &, char *)+bc : cac::eventRespAction(callbackManager &, tcpiiu &, epicsTime const &, caHdrLargeArray const &, void *) ()
> 1e875fc8 cac::eventRespAction(callbackManager &, tcpiiu &, epicsTime const &, caHdrLargeArray const &, void *)+194: netSubscription::completion(epicsGuard<epicsMutex> &, cacRecycle &, unsigned int, unsigned long, void const *) ()
> 1e89a364 netSubscription::completion(epicsGuard<epicsMutex> &, cacRecycle &, unsigned int, unsigned long, void const *)+84 : oldSubscription::current(epicsGuard<epicsMutex> &, unsigned int, unsigned long, void const *) ()
> 1e855ff4 oldSubscription::current(epicsGuard<epicsMutex> &, unsigned int, unsigned long, void const *)+104: 1e815434 ()
> 1e8156d0 dbCaGetUnits +790: epicsAssert ()
> 1e8c9a5c epicsAssert +154: epicsThreadSuspendSelf ()
> 1e8cb010 epicsThreadSuspendSelf+2c : taskSuspend ()
> value = 0 = 0x0
>
> Any ideas what could have caused this?
>
> Emma
> This e-mail and any
> attachments may contain confidential, copyright and or
> privileged material, and are for the use of the intended
> addressee only. If you are not the intended addressee or an
> authorised recipient of the addressee please notify us of
> receipt by returning the e-mail and do not use, copy, retain,
> distribute or disclose the information in or attached to the
> e-mail. Any opinions expressed within this e-mail are those
> of the individual and not necessarily of Diamond Light Source Ltd.
> Diamond Light Source Ltd. cannot guarantee that this e-mail
> or any attachments are free from viruses and we cannot accept
> liability for any damage which you may sustain as a result of
> software viruses which may be transmitted in or with the
> message. Diamond Light Source Limited (company no. 4375679).
> Registered in England and Wales with its registered office at
> Diamond House, Harwell Science and Innovation Campus, Didcot,
> Oxfordshire, OX11 0DE, United Kingdom
>
>