Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: Problems with Channel Access calls on an IOC moving from R3.13.7 -> 3.14.8
From: "Jeff Hill" <johill@lanl.gov>
To: "'Hammonds, John P.'" <jphammonds@anl.gov>, <tech-talk@aps.anl.gov>
Date: Mon, 11 Dec 2006 13:01:16 -0700

John,

 

One possibility might be that your state sequencer distribution isn’t compatible with R3.14, but I suspect that if that were the case there would be undefined symbols when the sequencer support code was loaded.

 

I see that the channel is connected directly to a record, and that the crash occurs in dbChannelIO::write() calling a function at address 4. The dbChannelIO::write() function does very little other than call db_put_field, and it is bound to db_put_field at compile time, so there is no good explanation for a call to a function at address 4. It seems reasonable to guess that there has been either object code corruption or stack corruption in your system. Object code corruption might have occurred some time before the failure itself. Stack corruption might have occurred in some function called by db_put_field, perhaps in device support that db_put_field calls. You could check the validity of the object code for dbChannelIO::write by typing “l dbChannelIO::write”. A less likely possibility is that you are running into an architecture-specific bug in the GNU compiler.
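The argument about compile-time binding can be illustrated with a small sketch. The names example_ops, example_write, call_direct and call_indirect_checked are hypothetical and are not EPICS API.

- **Direct call:** the target is encoded in the instruction itself, so only damage to the object code can change where it goes.
- **Indirect call:** the target is loaded from memory at run time. A nil pointer therefore sends the CPU to address 0, and an overwritten pointer sends it to whatever garbage value it now holds, such as 0x4.

The guard assumes that nothing in the first 4 KB is valid code. It cannot detect a large bogus pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table of function pointers, similar in spirit to a
 * device support entry table. Names are illustrative only. */
typedef long (*write_fn)(void *record);

struct example_ops {
    long     number;  /* number of entries that follow */
    write_fn write;   /* indirect call target, loaded from memory at run time */
};

long example_write(void *record)
{
    (void)record;
    return 0;
}

/* Direct call: the target is fixed when the code is compiled and linked,
 * so only corrupting the object code can change where this call goes. */
long call_direct(void *record)
{
    return example_write(record);
}

/* Indirect call: the target is whatever value is stored in ops->write.
 * If that memory was clobbered, the CPU jumps to the garbage value.
 * Assumption: addresses in the first 4 KB are never valid code. A large
 * bogus pointer would still get through this check. */
long call_indirect_checked(const struct example_ops *ops, void *record)
{
    if (ops == NULL || ops->write == NULL)
        return -1;                     /* nil table or nil function pointer */
    if ((uintptr_t)ops->write < 4096u)
        return -1;                     /* e.g. 0x4, as in the exception above */
    return ops->write(record);
}
```

A guard like this only turns a crash into an error return. The real fix is still to find what overwrote the pointer or the code.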

 

Here are some debugging steps you might try.

- Temporarily change the target record to soft device support, and see if that changes the behavior.
- Temporarily turn off optimization and rebuild the device support and/or EPICS base.
- Set a breakpoint in dbChannelIO::write or in db_put_field and single-step until the bug comes into sight. This will be easier in the Tornado debugger than from the command line, because you will be debugging at the C source level instead of at the machine instruction level.

 

Jeff

 


From: Hammonds, John P. [mailto:jphammonds@anl.gov]
Sent: Monday, December 11, 2006 10:53 AM
To: Jeff Hill; tech-talk@aps.anl.gov
Subject: RE: Problems with Channel Access calls on an IOC moving from R3.13.7 -> 3.14.8

 

Jeff,

 

I finally got a chance to get back to this. I have done the tt printout that you suggested; the output is below:

 

tt 0x27598f8

19b9a8 vxTaskEntry    +60 : 77a6110 ()
77a6180 epicsThreadPrivateGet+f8 : sequencer ()
7685df8 sequencer      +1a0: ss_entry ()
76860b8 ss_entry       +280: 765ab00 (e8402a60)
765ab00 dbAddrSetup    +880: 767fb4c ()
767fb4c LoadAncParamFiles+624: ca_array_put ()
771e8d8 ca_array_put   +37c: 76e974c (761ac50)
76e974c dbChannelIO::write(epicsGuard<epicsMutex> &, unsigned int, unsigned long, void const *)+818: 4 ()
value = 0 = 0x0

 

LoadAncParamFiles is the routine that I have managed to track this down to. It is C code, called by the sequencer, that performs the basic pseudo-code in my previous message.

 

I am not familiar with debugging from this information. Is there a way to determine from this where in the code the failure occurred? I am currently not set up to run the Tornado tools.
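One way to read the trace mechanically is sketched below. Each line appears to pair a hex address with the nearest symbol plus a hex offset, followed by the routine being called and its arguments. Subtracting the offset from the address gives the start of that symbol, and the last line shows the call to address 4.

This layout is inferred from the output above, not taken from vxWorks documentation. The names tt_frame and parse_tt_line are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* One line of "tt" output, e.g.
 *   76e974c dbChannelIO::write(...)+818: 4 ()
 * Assumed layout: <hex addr> <symbol>+<hex offset> : <callee> (<args>) */
struct tt_frame {
    unsigned long addr;       /* address printed in the first column */
    char symbol[256];         /* nearest symbol, possibly a C++ signature */
    unsigned long offset;     /* hex offset of addr from that symbol */
    unsigned long sym_start;  /* addr - offset: where the symbol begins */
    int callee_is_addr;       /* 1 if the callee is a bare hex address */
    unsigned long callee;     /* that address, when callee_is_addr is 1 */
};

/* Returns 0 on success, -1 if the line does not match the assumed layout. */
int parse_tt_line(const char *line, struct tt_frame *f)
{
    char *end;
    const char *p, *sym, *colon = NULL;
    size_t n;

    f->addr = strtoul(line, &end, 16);
    if (end == line)
        return -1;
    sym = end;
    while (*sym == ' ')
        sym++;

    /* Find the '+' followed by hex digits and then ':'. A plain search
     * for ':' would stop inside "dbChannelIO::write". */
    for (p = strchr(sym, '+'); p != NULL; p = strchr(p + 1, '+')) {
        if (!isxdigit((unsigned char)p[1]))
            continue;
        f->offset = strtoul(p + 1, &end, 16);
        while (*end == ' ')
            end++;
        if (*end == ':') {
            colon = end;
            break;
        }
    }
    if (colon == NULL)
        return -1;

    /* Copy the symbol text, trimming any padding before the '+'. */
    n = (size_t)(p - sym);
    while (n > 0 && sym[n - 1] == ' ')
        n--;
    if (n >= sizeof f->symbol)
        n = sizeof f->symbol - 1;
    memcpy(f->symbol, sym, n);
    f->symbol[n] = '\0';
    f->sym_start = f->addr - f->offset;

    /* The callee is either a symbol name or a bare hex address. */
    p = colon + 1;
    while (*p == ' ')
        p++;
    f->callee = strtoul(p, &end, 16);
    f->callee_is_addr = (end != p && (*end == ' ' || *end == '(' || *end == '\0'));
    return 0;
}
```

Applied to the last line of the trace, this gives dbChannelIO::write starting at 0x76e8f34. The printed address 0x76e974c lies 0x818 bytes into that function, and the callee is 0x4, the same address as in the exception report.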

 

Thanks, John

 

 

John Hammonds

Data Acquisition Systems Manager

Intense Pulsed Neutron Source

JPHammonds@anl.gov

(630)252-5317

 


From: Jeff Hill [mailto:johill@lanl.gov]
Sent: Wednesday, December 06, 2006 3:08 PM
To: Hammonds, John P.; tech-talk@aps.anl.gov
Subject: RE: Problems with Channel Access calls on an IOC moving from R3.13.7 -> 3.14.8

 

John,

 

The ETIMEDOUT indicates that there are problems building a TCP/IP circuit to 192.168.6.101:5064. To get that far, the UDP services on 192.168.6.101:5064 must have been responding, so presumably the local host has a route to 192.168.6.101. Nevertheless, you might try pinging 192.168.6.101 from this host. Depending on build options, modern versions of vxWorks may or may not include the ping command. Depending on the options you supply to ping, you can get some coarse information about network responsiveness. You might also look at the ifShow output to see whether there are any link errors on either host. Finally, you could test whether the IP kernel is working both before and after this code runs; a change might indicate corruption of the IP kernel’s data structures.
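Besides ping and ifShow, another check is to try to build a plain TCP connection to the CA server port with an explicit timeout. This separates "no route or no answer" from "host reachable but refusing".

The sketch below uses the POSIX/BSD socket API as found on a Unix workstation. It has not been checked against the vxWorks socket library, so porting it to the IOC may need changes. tcp_probe is a hypothetical helper name.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: try to open a TCP connection to ip:port and give up
 * after timeout_sec seconds. Returns 0 if the connection was established,
 * otherwise an errno value (ETIMEDOUT if nothing answered in time). */
int tcp_probe(const char *ip, unsigned short port, long timeout_sec)
{
    struct sockaddr_in addr;
    struct timeval tv;
    fd_set wfds;
    int fd, flags, err = 0;
    socklen_t len = sizeof err;

    assert(ip != NULL);
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return EINVAL;                 /* not a dotted-quad IPv4 address */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    /* Non-blocking connect, so the timeout is ours rather than the stack's. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        err = errno;
        close(fd);
        return err;
    }

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0) {
        close(fd);                     /* connected immediately */
        return 0;
    }
    if (errno != EINPROGRESS) {
        err = errno;                   /* e.g. ECONNREFUSED, ENETUNREACH */
        close(fd);
        return err;
    }

    /* Wait until the socket becomes writable, i.e. the handshake finished. */
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    switch (select(fd + 1, NULL, &wfds, NULL, &tv)) {
    case 0:
        err = ETIMEDOUT;               /* no answer within the timeout */
        break;
    case -1:
        err = errno;
        break;
    default:
        /* Writable: SO_ERROR holds the final outcome (0 on success). */
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) != 0)
            err = errno;
        break;
    }
    close(fd);
    return err;
}
```

For example, tcp_probe("192.168.6.101", 5064, 5) returning ETIMEDOUT would be consistent with the CAC messages in the original report. ECONNREFUSED would instead suggest that the host is reachable but nothing is accepting connections on that port.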

 

I can’t speculate about the exception with instruction address 0x4, other than to guess that the code was clobbered or a nil function pointer was used. I might be able to make further guesses if you send the output from “tt <task id>”, in this case “tt 0x275bfd8”. This will provide a stack trace. There are also ways to request a stack trace from the debugging facilities supplied with Tornado.

 

Note also that “preemptive callback mode” was the default on vxWorks prior to R3.14. In R3.14, non-preemptive mode is the default, and you must specifically request “preemptive callback mode” regardless of the OS you are running on. This might not be a significant issue for the pseudo-code that you sent.

 

Jeff

 

 


From: Hammonds, John P. [mailto:jphammonds@anl.gov]
Sent: Wednesday, December 06, 2006 12:37 PM
To: tech-talk@aps.anl.gov
Subject: Problems with Channel Access calls on an IOC moving from R3.13.7 -> 3.14.8

 

I am working on moving our detector data acquisition code from R3.13.7 to R3.14.8 and am having trouble with a piece of the code that makes Channel Access calls from C. I have two crates, both running as vxWorks IOCs. One is a master crate that reads setup data from a file and loads it into the IOCs. The slave crate basically handles sample environment and beamline control. The master crate uses the ca_put() C library call to write data to the slave crate. Since moving this to R3.14.8, I have had trouble getting a connection between the crates. The basic code does the following:

      ca_create_channel()   (under 3.13 I was using ca_search())
      ca_pend_io()          (I have now added code to check for ECA_NORMAL)
      ca_put()
      ca_pend_io()
      ca_clear_channel()

 

I get the following messages from the IOC. This code worked well under 3.13. Does anyone have ideas about what could be going on here?

 

     

 

 

iocsand1> CAC: Unable to connect because "S_errno_ETIMEDOUT"

CA.Client.Exception...............................................

    Warning: "Virtual circuit disconnect"

    Context: "192.168.6.101:5064"

    Source File: ../cac.cpp line 1142

    Current Time: WED DEC 06 2006 11:41:56.170496250 ..................................................................

CAC: Unable to connect because "S_errno_ETIMEDOUT"

CA.Client.Exception...............................................

    Warning: "Virtual circuit disconnect"

    Context: "192.168.6.101:5064"

    Source File: ../cac.cpp line 1142

    Current Time: WED DEC 06 2006 11:43:16.170493050 ..................................................................

CAC: Unable to connect because "S_errno_ETIMEDOUT"

CA.Client.Exception...............................................

    Warning: "Virtual circuit disconnect"

    Context: "192.168.6.101:5064"

    Source File: ../cac.cpp line 1142

    Current Time: WED DEC 06 2006 11:44:36.170489850 ..................................................................

Setting sand0A:chngr_0:veto.DESC to SAND Sample Chan

 

program

Exception current instruction address: 0x00000004 Machine Status Register: 0x0008b030 Condition Register: 0x44000049

Task: 0x275bfd8 "ipns_daq"

filename="../../../src/libCom/taskwd/taskwd.c" line number=174 task 0x275bfd8 suspended

           

 

John Hammonds

Data Acquisition Systems Manager

Intense Pulsed Neutron Source

JPHammonds@anl.gov

(630)252-5317

 

