EPICS Controls Argonne National Laboratory

Experimental Physics and Industrial Control System


Subject: RE: Asyn ModbusTCP communication KO without error messages
From: Mark Rivers <[email protected]>
To: "'haquin'" <[email protected]>, tech-talk <[email protected]>
Date: Wed, 23 Sep 2015 15:44:54 +0000
Hi,

It is not a call to lfSetErrLogSev. That is just the closest global symbol that VxWorks could find at a lower memory address than the calling code. Note that it says:

lfSetErrLogSev+0x3584

That means the calling address is 0x3584 bytes past the start of lfSetErrLogSev, so it is probably inside some completely unrelated function.

Similarly this line:

drvAsynIPPortConfigure+0xc94: send ()

does not mean that the function being executed was drvAsynIPPortConfigure; again, that was just the closest global symbol. Because it is calling send(), and send() is called in only one place in drvAsynIPPort.c, I know that it is being called from this function:

static asynStatus writeIt(void *drvPvt, asynUser *pasynUser,
    const char *data, size_t numchars, size_t *nbytesTransfered)
{
    /* ... */
}

That is a static function, so its address is not in the VxWorks symbol table, and tt cannot tell you which function the call actually came from.
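Mark's explanation amounts to a nearest-preceding-symbol lookup: each address is labelled with the closest global symbol at or below it. The C sketch below mimics that behaviour with a hypothetical five-entry symbol table whose addresses were back-computed from the tt listing in this thread (printed address minus printed offset). It is not VxWorks code, and a real symbol table holds many more globals, any of which could turn out to be the "nearest" one. Because writeIt() is static it has no entry, so an address inside it is charged to whichever global precedes it, as in the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical excerpt of a global symbol table, sorted by address.
 * Each address is back-computed from the tt output (printed address
 * minus printed offset); a real table has many more entries. */
typedef struct {
    const char *name;
    uint32_t    addr;
} Symbol;

static const Symbol symtab[] = {
    { "lfSetErrLogSev",               0x008ee3b0u },
    { "drvModbusAsynConfigure",       0x008f221cu },
    { "modbusInterposeConfig",        0x008f43f8u },
    { "drvAsynIPPortConfigure",       0x00903f78u },
    { "drvAsynIPServerPortConfigure", 0x00905a40u },
};

/* Find the closest global symbol at or below addr and store the distance
 * from it in *offset. Static functions such as writeIt() have no entry,
 * so an address inside them is attributed to the preceding global. */
static const Symbol *nearest_symbol(uint32_t addr, uint32_t *offset)
{
    const Symbol *best = NULL;
    size_t i;

    assert(offset != NULL);
    for (i = 0; i < sizeof symtab / sizeof symtab[0]; i++) {
        if (symtab[i].addr > addr)
            break;              /* table is sorted, so stop here */
        best = &symtab[i];
    }
    if (best != NULL)
        *offset = addr - best->addr;
    return best;
}

/* Print an address the way the tt listing labels it: "symbol+0xoffset". */
static void print_addr(uint32_t addr)
{
    uint32_t off = 0;
    const Symbol *s = nearest_symbol(addr, &off);

    if (s != NULL)
        printf("0x%08x %s+0x%x\n", (unsigned)addr, s->name, (unsigned)off);
    else
        printf("0x%08x (no symbol)\n", (unsigned)addr);
}
```

With this table, print_addr(0x008f1934u) prints "0x008f1934 lfSetErrLogSev+0x3584", the same label that appeared in the tt output, even though nothing in the call chain calls lfSetErrLogSev.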

Mark


-----Original Message-----
From: haquin [mailto:[email protected]] 
Sent: Wednesday, September 23, 2015 10:01 AM
To: Mark Rivers; tech-talk
Subject: Re: Asyn ModbusTCP communication KO without error messages

Thank you, Mark, for this analysis!

In the output of the tt command, I noticed a call to "lfSetErrLogSev" between "drvModbusAsynConfigure" and "drvAsynIPServerPortConfigure".

This is one of our local functions; it sets the log level and is called when a genSub record is initialized.
How can this function be called by "drvAsynIPServerPortConfigure"? Any idea?



On 23/09/2015 14:28, Mark Rivers wrote:
> Hi,
>
> So it looks like the last line being executed in drvAsynIPPort.c is this:
>
>              thisWrite = send(tty->fd, (char *)data, (int)numchars, 0);
>
> That is simply calling the vxWorks function send().  That ultimately results in a call to the internal vxWorks function ipcom_sendmsg() which is calling free().  It looks to me like the memory pointer being passed to free() is corrupted.  However, that would not be freeing the memory buffer "data" that was passed from drvAsynIPPort.c, it must be some internal buffer in the vxWorks routines.
>
> I suspect some other software in your IOC has a buffer overflow problem or uninitialized pointer and is corrupting memory elsewhere in the system.
>
> These are hard problems to track down, particularly when the failure can take days to appear. The typical approach is to remove software modules from the IOC one at a time until the problem goes away, and then zero in on the last thing you removed. But that can take a long time when the failure rate is low. And there is no guarantee that it will work, since the memory corruption may just move to an area where it never shows up.
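The failure mode Mark describes, where an overrun in one piece of code is detected only later, when an unrelated free() checks the block, can be illustrated with a toy allocator. The sketch below is hypothetical: guarded_alloc, guarded_block_ok, guarded_free and the guard values are invented for illustration, and it makes no claim about how VxWorks memLib or memPartBlockIsValid is implemented. It only shows why the code that trips over the corruption is usually not the code that caused it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical guard values; any distinctive bit patterns would do. */
#define HEAD_MAGIC 0xC0FFEE11u
#define TAIL_MAGIC 0xDEADBEEFu

/* Header stored just before the user area of each block. */
typedef struct {
    uint32_t magic;
    size_t   size;
} BlockHeader;

/* Allocate size bytes framed by a header and a trailing guard word. */
static void *guarded_alloc(size_t size)
{
    unsigned char *raw;
    BlockHeader hdr = { HEAD_MAGIC, size };
    uint32_t tail = TAIL_MAGIC;

    if (size > SIZE_MAX - sizeof(BlockHeader) - sizeof(uint32_t))
        return NULL;                          /* total size would overflow */
    raw = malloc(sizeof(BlockHeader) + size + sizeof(uint32_t));
    if (raw == NULL)
        return NULL;
    memcpy(raw, &hdr, sizeof hdr);
    memcpy(raw + sizeof hdr + size, &tail, sizeof tail); /* may be unaligned */
    return raw + sizeof hdr;
}

/* Return 1 if both guard words are intact, 0 if the block was overrun. */
static int guarded_block_ok(const void *p)
{
    const unsigned char *user = p;
    BlockHeader hdr;
    uint32_t tail;

    assert(p != NULL);
    memcpy(&hdr, user - sizeof hdr, sizeof hdr);
    if (hdr.magic != HEAD_MAGIC)
        return 0;
    memcpy(&tail, user + hdr.size, sizeof tail);
    return tail == TAIL_MAGIC;
}

/* Validate before releasing. A corrupted block is refused (and leaked)
 * rather than handed back to the allocator; returns 1 on success. */
static int guarded_free(void *p)
{
    if (p == NULL)
        return 1;
    if (!guarded_block_ok(p))
        return 0;
    free((unsigned char *)p - sizeof(BlockHeader));
    return 1;
}
```

Writing one byte past an 8-byte block damages the trailing guard word, so guarded_block_ok() fails and guarded_free() refuses the block. The overrun itself raised no error; only the later check noticed it, which matches the symptom of a task suspended inside free().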
>
> Mark
>
>
> ________________________________
> From: haquin [[email protected]]
> Sent: Wednesday, September 23, 2015 7:13 AM
> To: Mark Rivers; tech-talk
> Subject: Re: Asyn ModbusTCP communication KO without error messages
>
> Hello Mark,
>
> The problem just happened again; it is always the same task that gets suspended, without any error message.
> Here is the output of the "tt" command:
>
> iocVMELB1 > tt 0x17d1fb0
> 0x0012bd34 vxTaskEntry  +0x48 : 0x00a4c4cc ()
> 0x00a4c538 epicsThreadCreate+0x1c0: 0x008ffc84 ()
> 0x008ffc84 modbusInterposeConfig+0xb88c: 0x00917bf8 ()
> 0x00917c60 asynInterposeFlushConfig+0x79d4: 0x008f2898 ()
> 0x008f2a44 drvModbusAsynConfigure+0x828: 0x008f157c ()
> 0x008f1934 lfSetErrLogSev+0x3584: 0x009081e4 ()
> 0x00908344 drvAsynIPServerPortConfigure+0x2904: 0x008f4b04 ()
> 0x008f4bf4 modbusInterposeConfig+0x7fc: 0x009070fc ()
> 0x0090711c drvAsynIPServerPortConfigure+0x16dc: 0x009049a4 ()
> 0x00904c0c drvAsynIPPortConfigure+0xc94: send ()
> 0x002b0a48 send         +0x7c : 0x002e1938 ()
> 0x002e194c ipcom_spinlock_delete+0x97c: ipcom_send ()
> 0x001d6430 ipcom_sendto +0x48 : ipcom_sendmsg ()
> 0x001d639c ipcom_sendmsg+0x718: free ()
> 0x002077b4 free         +0x3c : 0x00207148 ()
> 0x00207238 memPartBlockIsValid+0x180: taskSuspend ()
> value = 0 = 0x0
> iocVMELB1 >
>
> What should I make of this "memPartBlockIsValid"? It is consistent with the task error code: Errno 0x110003 => S_memLib_BLOCK_ERROR.
> Could some memory addresses be corrupted?
>
> Thanks in advance,
> best regards
>
> On 09/09/2015 18:01, Mark Rivers wrote:
> I suspect your problem is indeed that suspended task.  Are you sure there was no error message when the task got suspended?
> You can learn something about the task status by running the "tt" command on it:
> tt 0x16963d0
> Mark
>
> From: haquin [mailto:[email protected]]
> Sent: Wednesday, September 09, 2015 6:44 AM
> To: Mark Rivers; tech-talk
> Subject: Re: Asyn ModbusTCP communication KO without error messages
>
> Mark,
>
> The problem happened again yesterday. I've attached files with the results of the various commands.
> What I've noticed is that there is a PLC task that is suspended with an error code:
> Errno 0x110003 => S_memLib_BLOCK_ERROR
>    NAME         ENTRY       TID    PRI   STATUS      PC       SP     ERRNO  DELAY
> ----------  ------------ -------- --- ---------- -------- -------- ------- -----
> LBE1-PLCV   a4c48c        167fb20 149 PEND         2c9038  1682b70       0     0
> LBE1-PLCV-> a4c48c        1686000 149 PEND         2c7154  1689080       0     0
> LBE1-PLCV-> a4c48c        168a9e0 149 PEND         2c7154  168c990      46     0
> LBE1-PLCV-> a4c48c        168f510 149 PEND         2c7154  1692420       0     0
> LBE1-PLCV-> a4c48c        16963d0 149 SUSPEND      2ce5c4  1699110  110003     0
> LBE1-PLCV-> a4c48c        169c1b0 149 PEND         2c7154  169f0c0       0     0
> LBE1-PLCV-> a4c48c        16a29c0 149 PEND         2c7154  16a58d0       0     0
> LBE1-PLCV-> a4c48c        16a87c0 149 PEND         2c7154  16ab6d0       0     0
> LBE1-PLCV-> a4c48c        16ae5c0 149 PEND         2c7154  16b14d0       0     0
> LBE1-PLCV-> a4c48c        16b4280 149 PEND         2c7154  16b7190       0     0
> LBE1-PLCV-> a4c48c        16ba620 149 PEND         2c7154  16bd6a0       0     0
> LBE1-PLCV-> a4c48c        16c0420 149 PEND         2c7154  16c34a0       0     0
> LBE1-PLCV-> a4c48c        16c6230 149 PEND         2c7154  16c92b0       0     0
> LBE1-PLCV-> a4c48c        16cc030 149 PEND         2c7154  16cef40       0     0
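The suspended task's errno, 0x110003, is the value identified in this thread as S_memLib_BLOCK_ERROR. VxWorks status values put a module number in the upper 16 bits and a module-specific error number in the lower 16 bits. The small sketch below decodes a value on that basis; split_status and print_status are hypothetical helper names, and the code does not look up symbolic names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Split a VxWorks status value into its module number (upper 16 bits)
 * and module-specific error number (lower 16 bits). */
static void split_status(uint32_t status, unsigned *module, unsigned *code)
{
    assert(module != NULL && code != NULL);
    *module = (unsigned)(status >> 16);
    *code   = (unsigned)(status & 0xffffu);
}

/* Print both fields in decimal. */
static void print_status(uint32_t status)
{
    unsigned module, code;

    split_status(status, &module, &code);
    printf("0x%x: module %u, error %u\n", (unsigned)status, module, code);
}
```

For example, print_status(0x110003u) prints "0x110003: module 17, error 3": error number 3 of the memory library module, raised by the allocator's own consistency check rather than by the Modbus or asyn code.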
>
> - I don't think there is a deadlock, but I don't know what output to expect from the command in this case.
> - There is no stack overflow, not even in the suspended task.
> - The asynReport output is attached.
>
> I added your asyn record, and the following errors appeared when I tried to connect to the READ port or to switch a trace ON/OFF:
> iocVMELB1 > 2015/09/08 11:29:24.509 [LBE1-PLCV-READ,0,0] [../../asyn/asynDriver/asynManager.c:1509] [CAS-client,0x8f70500,20] LBE1-PLCV-READ addr 0 queueRequest priority 3 not lockHolder
> 2015/09/08 11:29:24.509 [LBE1-PLCV-READ,0,0] [../../asyn/asynDriver/asynManager.c:1520] [CAS-client,0x8f70500,20] LBE1-PLCV-READ schedule queueRequest timeout
> 2015/09/08 11:29:27.925 [LBE1-PLCV-READ,0,0] [../../asyn/asynDriver/asynManager.c:637] [timerQueue,0xc04f20,60] LBE1-PLCV-READ asynManager:queueTimeoutCallback
> 2015/09/08 11:29:27.925 [LBE1-PLCV-READ,0,0] [../../asyn/asynRecord/asynRecord.c:2003] [timerQueue,0xc04f20,60] LBE1-PLCV-asyn: special queueRequest timeout
> 2015/09/08 11:29:34.509 [LBE1-PLCV-READ,0,0] [../../asyn/asynDriver/asynManager.c:637] [timerQueue,0xc04f20,60] LBE1-PLCV-READ asynManager:queueTimeoutCallback
> 2015/09/08 11:29:34.509 [LBE1-PLCV-READ,0,0] [../../asyn/asynRecord/asynRecord.c:2003] [timerQueue,0xc04f20,60] LBE1-PLCV-asyn: special queueRequest timeout
>
>
>
> iocVMELB1 > 2015/09/08 11:31:44.577 [LBE1-PLCV-READ,0,0] [../../asyn/asynRecord/asynRecord.c:899] [CAS-client,0x8f70500,20] LBE1-PLCV-asyn: exception 3
> 2015/09/08 11:31:51.910 [LBE1-PLCV-READ,0,0] [../../asyn/asynRecord/asynRecord.c:899] [CAS-client,0x8f70500,20] LBE1-PLCV-asyn: exception 4
> 2015/09/08 11:31:54.527 [LBE1-PLCV-READ,0,0] [../../asyn/asynRecord/asynRecord.c:899] [CAS-client,0x8f70500,20] LBE1-PLCV-asyn: exception 4
> 2015/09/08 11:31:57.244 [LBE1-PLCV-READ,0,0] [../../asyn/asynRecord/asynRecord.c:899] [CAS-client,0x8f70500,20] LBE1-PLCV-asyn: exception 3
> 2015/09/08 11:32:00.077 [LBE1-PLCV-READ,0,0] [../../asyn/asynRecord/asynRecord.c:899] LBE1-PLCV-asyn: exception 5
> 2015/09/08 11:32:03.277 [LBE1-PLCV-READ,0,0] [../../asyn/asynRecord/asynRecord.c:899] [CAS-client,0x8f70500,20] LBE1-PLCV-asyn: exception 5
>
> Thanks for your help
> best regards
>
> On 02/09/2015 16:46, Mark Rivers wrote:
>
> Hi Christophe,
>
> Here are some things to look for:
>
> - On vxWorks, perhaps a task has been suspended. Issue the "i" command to look at the status of all of the tasks.
>
> - Perhaps there is a deadlock. Issue this command several times in a row to see if there is a mutex that is always locked:
>
> epicsMutexShowAll 1
>
> - Perhaps there was a stack overflow. Issue this command and look for tasks with a margin of 0:
>
> checkStack
>
> If you don't find anything there, send us the output of "asynReport 10" on the Read port and the Write port.
>
> Mark
>
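The point of running epicsMutexShowAll 1 several times is that a mutex that is merely busy shows up in some listings but not in others, while one held in every listing is a deadlock suspect. Assuming the identifiers of the locked mutexes have been copied by hand out of each listing, the check reduces to an intersection, sketched below in C. The function always_locked() is hypothetical, and a mutex held in every snapshot is only a suspect, not proof of a deadlock.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if id occurs among the n identifiers in ids, else 0. */
static int contains(const unsigned long *ids, size_t n, unsigned long id)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (ids[i] == id)
            return 1;
    return 0;
}

/* Collect the identifiers that appear in every snapshot.
 * snaps[k] holds lens[k] identifiers of mutexes seen locked in run k;
 * each identifier is assumed to appear at most once per snapshot.
 * Up to outcap results are written to out; the count written is returned. */
static size_t always_locked(const unsigned long *const snaps[],
                            const size_t lens[], size_t nsnaps,
                            unsigned long *out, size_t outcap)
{
    size_t found = 0, i, k;

    assert(snaps != NULL && lens != NULL);
    if (nsnaps == 0)
        return 0;
    for (i = 0; i < lens[0]; i++) {
        unsigned long id = snaps[0][i];
        for (k = 1; k < nsnaps; k++)
            if (!contains(snaps[k], lens[k], id))
                break;              /* released at least once: just busy */
        if (k == nsnaps && found < outcap)
            out[found++] = id;      /* held in every snapshot */
    }
    return found;
}
```

For example, with the made-up snapshots {0x1000, 0x2000, 0x3000}, {0x2000, 0x4000} and {0x5000, 0x2000}, always_locked() reports only 0x2000.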
> -----Original Message-----
> From: [email protected]<mailto:[email protected]> [mailto:[email protected]] On Behalf Of haquin
> Sent: Wednesday, September 02, 2015 9:00 AM
> To: tech-talk
> Subject: Asyn ModbusTCP communication KO without error messages
>
> Hi all,
>
> I have a VxWorks IOC (an MVME CPU board using both Ethernet interfaces) communicating with a Siemens S7 PLC via Asyn/ModbusTCP.
> EPICS release 3.14.12.4, Asyn v4.22
>
> After a while (1 or 2 days), communication stops working, but I get no error messages (no timeout, no disconnection...).
> On the IOC side, I have a "Read Multiple Register" function reading the whole Modbus table (109 registers) every second, and a "Write Multiple Register" function writing the value of a counter that is incremented every second at the record level.
>
> When I enable asynTrace on the IP port, or on the Read or Write ports, there are no messages.
> asynReport on the Read port indicates only 1 Read OK.
> asynReport on the Write port indicates 0 Write OK.
>
> I can read the PLC registers with the "modpoll" tool from a Linux PC.
> I can start a Linux IOC connected to the same PLC.
> The netstat command in the IOC shell shows the TCP connection as established, but Recv-Q is not 0 (12, for example).
>
> What could explain this behavior?
>
> Thanks in advance!
>
> --
> Christophe Haquin
> Control and Real Time systems Engineer
>
> +33 231454661 office
> +33 231454728 fax
> SdA/GIM
> GANIL
> Bd Henri Becquerel BP 55027
> 14076 CAEN CEDEX5
>
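The original report gives enough detail to sketch the Modbus TCP traffic involved. The C sketch below assumes the read uses function 3 (Read Holding Registers), which the thread does not state, and the helper names are hypothetical. It builds the 12-byte read request and computes the normal response lengths. One observation follows from the arithmetic, offered only as a possibility: a normal function 16 (Write Multiple Registers) response is 7 + 5 = 12 bytes, the same as the Recv-Q value quoted, which would be consistent with a write reply left unread on the socket after the port's task stopped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MBAP_LEN 7u  /* transaction id, protocol id, length, unit id */

/* Store a 16-bit value big-endian, as Modbus requires. */
static void put_u16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xffu);
}

/* Build a Modbus TCP "Read Holding Registers" (function 3) request into
 * out, which must hold at least 12 bytes. Returns the frame length. */
static size_t build_read_holding(uint8_t *out, uint16_t tid, uint8_t unit,
                                 uint16_t start, uint16_t count)
{
    assert(out != NULL && count >= 1 && count <= 125); /* function 3 limit */
    put_u16(out + 0, tid);      /* transaction identifier */
    put_u16(out + 2, 0);        /* protocol identifier, always 0 */
    put_u16(out + 4, 6);        /* bytes that follow: unit id + 5-byte PDU */
    out[6] = unit;              /* unit identifier */
    out[7] = 0x03;              /* function code */
    put_u16(out + 8, start);    /* first register address */
    put_u16(out + 10, count);   /* number of registers */
    return MBAP_LEN + 5;
}

/* Length of a normal (non-exception) function 3 response. */
static size_t read_holding_response_len(uint16_t count)
{
    return MBAP_LEN + 2 + 2u * count;   /* function code, byte count, data */
}

/* Length of a normal function 16 (Write Multiple Registers) response. */
static size_t write_multiple_response_len(void)
{
    return MBAP_LEN + 5;                /* function code, start, quantity */
}
```

For the 109-register read described above, read_holding_response_len(109) gives 7 + 2 + 218 = 227 bytes per poll.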



