EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: Sequence monitor not getting callback
From: "Shankar, Murali" <[email protected]>
To: Jeff Hill <[email protected]>, "[email protected]" <[email protected]>
Date: Wed, 12 Oct 2011 10:18:39 -0700

This is on Linux, so I will pop this into gdb and send you the relevant info offline.

 

Also, I did manage to reproduce this in a scenario where the sequencer was not involved at all; though the test code was doing what I thought the sequencer would be doing.

 

Thanks again for your assistance.

 

Regards,

Murali

 

 

From: Jeff Hill [mailto:[email protected]]
Sent: Wednesday, October 12, 2011 10:08 AM
To: Shankar, Murali; [email protected]
Subject: RE: Sequence monitor not getting callback

 

Hi Murali,

 

Very sorry about the delay in responding; due to my own mistakes packing, I didn’t have access to my email while on travel.

 

> In a recent run, I had 14 out of 2600 sequences not getting their monitor callbacks.
> I did a dbel on the non-functioning PV’s (H75 H73 H82 H78 H72 L52 L54 P93 R71
> R80 V82 V71 V86 Z80)  and a couple of functioning ones (A01 A75). The results are
> pasted below. The ones that are not getting their monitor callbacks called seem
> to have a fair number of “undelivered” or “discarded by replacement” messages.
> The ones that are functioning have “queue empty”.  Is this expected?

 

The Channel Access event queue serves as a conduit from the high priority database scan threads to the low priority CA server, or the local sharing-the-same-address-space ca client (specifically in this situation - the sequencer). The purpose of the event queue is to guarantee that the high priority threads never block when they post updates to the event queue, and also to temporarily preserve update bursts from the high priority db scan threads until low priority threads can service them.
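
To make the "never block, replace on overflow" behaviour concrete, here is a toy model in Python. It is only a sketch of the idea described above, not the actual EPICS event queue code; the class name `ToyEventQueue`, its methods, and the fixed capacity are all made up for illustration.

```python
from collections import deque


class ToyEventQueue:
    """Toy model of a bounded event queue (hypothetical, not EPICS code).

    The producer never waits: when the queue is full, the newest update
    overwrites the most recently queued one and is counted as a replacement.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()
        self.replaced = 0  # analogous in spirit to "discarded by replacement"

    def post(self, value):
        """Called by the high priority producer; returns immediately."""
        if len(self.entries) < self.capacity:
            self.entries.append(value)
        else:
            # Queue full: keep the newest value, drop the one it replaces.
            self.entries[-1] = value
            self.replaced += 1

    def drain(self, callback):
        """Called by the low priority consumer; delivers the whole backlog."""
        delivered = 0
        while self.entries:
            callback(self.entries.popleft())
            delivered += 1
        return delivered


def demo():
    # Producer posts 10 updates while the consumer is not running.
    q = ToyEventQueue(capacity=4)
    for v in range(10):
        q.post(v)
    backlog = len(q.entries)  # updates still waiting to be delivered
    seen = []
    q.drain(seen.append)
    return backlog, q.replaced, seen
```

In this model a consumer that stops draining shows up as a non-zero backlog together with a growing replacement count, which is the same pattern as the "undelivered" and "discarded by replacement" figures in the dbel output.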

 

The symptoms that you observe, a backlog of unserviced events and also replaced events on the queue, probably indicate that for some reason the sequencer is either slow on the uptake when reading events from the event queue (the sequencer’s update consumption rate is slower than the database’s update production rate), or alternatively that some code in the sequencer is blocked or deadlocked. It is also possible, but maybe somewhat less likely (depending on your perspective :-), that something is wrong in the ca client library or the event queue.

 

The fact that the issue disappears when moving this sequence out of the IOC does help somewhat with fault isolation. It’s true that the ca client library uses a more direct attachment to the database when it’s embedded inside the IOC, and so there are different failure possibilities. In particular, since this in-memory attachment of the ca client library to the database runs much faster, there are more opportunities for race conditions to manifest in the user’s sequence program, in the sequencer infrastructure, in the ca client infrastructure, or in the ca event queue infrastructure.

 

Since we know that there is a significant event queue backlog, we can expect that the thread managing the event queue is probably stuck somewhere. To isolate the fault we need a stack trace for that thread, obtained on vxWorks using the “i” and “tt <task id>” commands (presuming that the sequencer is running on vxWorks). Look for an event queue thread that has almost the same priority as the sequencer. If this is instead running on Linux, then one can use some commands in gdb to get a stack trace {“info threads”, “thread nnn”, and “bt”}. Once we have a few stack trace snapshots from the running system, it may be easy to find the culprit by examining which part of the code this thread spends most of its time lingering in. The event queue thread typically waits on an event semaphore when there is no event queue backlog. Otherwise it is typically executing inside of one of the user callbacks.
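
On Linux, taking "a few stack trace snapshots" can be scripted. The sketch below is one possible way to do it, assuming gdb is installed and you are allowed to attach to the IOC process. The helper names, the snapshot count, and the delay are illustrative only, and the process id has to be supplied by you. It uses gdb's `-p`, `-batch` and `-ex` command-line options, and it runs `thread apply all bt` (not mentioned above) so that every thread is dumped without having to pick a thread number first.

```python
import subprocess
import time


def gdb_snapshot_command(pid):
    """Build a gdb command line that attaches to `pid`, lists the threads,
    prints a backtrace for every thread, and then detaches."""
    return [
        "gdb", "-p", str(pid), "-batch",
        "-ex", "info threads",
        "-ex", "thread apply all bt",
    ]


def take_snapshots(pid, count=3, delay=1.0):
    """Run gdb `count` times, `delay` seconds apart, and return the outputs.

    Note: the process is paused while gdb is attached.
    """
    outputs = []
    for _ in range(count):
        result = subprocess.run(
            gdb_snapshot_command(pid),
            capture_output=True, text=True, check=False,
        )
        outputs.append(result.stdout)
        time.sleep(delay)
    return outputs
```

Comparing the backtraces of the same thread across snapshots then shows whether it is waiting on its semaphore, sitting inside a user callback, or blocked on a lock.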

 

Feel free to call me and/or send a list of threads obtained on vxWorks with the “i” command, or in gdb with the “info threads” command, if you need help determining which thread to look at.

 

Jeff
______________________________________________________
Jeffrey O. Hill           Email        [email protected]
LANL MS H820              Voice        505 665 1831
Los Alamos NM 87545 USA   FAX          505 665 5107

 

Message content: TSPA

 

With sufficient thrust, pigs fly just fine. However, this is

not necessarily a good idea. It is hard to be sure where they

are going to land, and it could be dangerous sitting under them

as they fly overhead. -- RFC 1925

 

From: [email protected] [mailto:[email protected]] On Behalf Of Shankar, Murali
Sent: Tuesday, October 04, 2011 12:26 PM
To: [email protected]
Subject: Re: Sequence monitor not getting callback

 

I traced it down to ca_add_masked_array_event/ca_create_subscription as well.

 

In a recent run, I had 14 out of 2600 sequences not getting their monitor callbacks. I did a dbel on the non-functioning PV’s (H75 H73 H82 H78 H72 L52 L54 P93 R71 R80 V82 V71 V86 Z80)  and a couple of functioning ones (A01 A75). The results are pasted below. The ones that are not getting their monitor callbacks called seem to have a fair number of “undelivered” or “discarded by replacement” messages. The ones that are functioning have “queue empty”.  Is this expected?

 

Any suggestions as to next steps for debugging are appreciated.

 

Regards,

Murali

 

epics> < /tmp/dbel.txt

dbel H75  10

1 PV Event Subscriptions ( monitors ).

VAL { VALUE ALARM } undelivered=48, thread=0x7f1058000b60, unused entries=32, discarded by replacement=28, duplicate count =94, ev 0x7f1094013890, ev que 0x7f10a4008520, ev user 0x7f105400ed70

dbel H73  10

1 PV Event Subscriptions ( monitors ).

VAL { VALUE ALARM } undelivered=48, thread=0x7f1058000b60, unused entries=32, discarded by replacement=28, duplicate count =94, ev 0x7f1094013980, ev que 0x7f10a40074f8, ev user 0x7f105400ed70

dbel H82  10

1 PV Event Subscriptions ( monitors ).

VAL { VALUE ALARM } undelivered=48, thread=0x7f1058000b60, unused entries=32, discarded by replacement=28, duplicate count =94, ev 0x7f10940138e0, ev que 0x7f10a4008520, ev user 0x7f105400ed70

dbel H78  10

1 PV Event Subscriptions ( monitors ).

VAL { VALUE ALARM } undelivered=48, thread=0x7f1058000b60, unused entries=32, discarded by replacement=28, duplicate count =94, ev 0x7f1094013840, ev que 0x7f10a40074f8, ev user 0x7f105400ed70

dbel H72  10

1 PV Event Subscriptions ( monitors ).

VAL { VALUE ALARM } undelivered=76, thread=0x7f1058000b60, unused entries=52, duplicate count =75, ev 0x7f10940139d0, ev que 0x7f10a4009548, ev user 0x7f105400ed70

dbel L52  10

1 PV Event Subscriptions ( monitors ).

VAL { VALUE ALARM } undelivered=76, thread=0x7f1058000b60, unused entries=52, duplicate count =75, ev 0x1ed5230, ev que 0x7f109c009670, ev user 0x7f105400ed70

dbel L54  10

1 PV Event Subscriptions ( monitors ).

VAL { VALUE ALARM } undelivered=76, thread=0x7f1058000b60, unused entries=52, duplicate count =75, ev 0x1ed5280, ev que 0x7f109c00a698, ev user 0x7f105400ed70

dbel P93  10

1 PV Event Subscriptions ( monitors ).

VAL { VALUE ALARM } undelivered=76, thread=0x7f1058000b60, unused entries=52, duplicate count =75, ev 0x7f1090014670, ev que 0x7f10700137c8, ev user 0x7f105400ed70

dbel R71  10

1 PV Event Subscriptions ( monitors ).

VAL { VALUE ALARM } undelivered=76, thread=0x7f1058000b60, unused entries=52, duplicate count =75, ev 0x7f1090010ac0, ev que 0x7f106c01f178, ev user 0x7f105400ed70

dbel R80  10

1 PV Event Subscriptions ( monitors ).

VAL { VALUE ALARM } undelivered=49, thread=0x7f1058000b60, unused entries=32, discarded by replacement=27, duplicate count =94, ev 0x7f1090010a20, ev que 0x7f106c0211c8, ev user 0x7f105400ed70

dbel V82  10

1 PV Event Subscriptions ( monitors ).

VAL { VALUE ALARM } undelivered=32, thread=0x7f1058000b60, unused entries=32, discarded by replacement=44, duplicate count =93, ev 0x7f10a0024e10, ev que 0x7f105000bd00, ev user 0x7f105400ed70

dbel V71  10

1 PV Event Subscriptions ( monitors ).

VAL { VALUE ALARM } undelivered=32, thread=0x7f1058000b60, unused entries=32, discarded by replacement=44, duplicate count =93, ev 0x7f10a0024dc0, ev que 0x7f105000bd00, ev user 0x7f105400ed70

dbel V86  10

1 PV Event Subscriptions ( monitors ).

VAL { VALUE ALARM } undelivered=32, thread=0x7f1058000b60, unused entries=32, discarded by replacement=44, duplicate count =93, ev 0x7f10a0024e60, ev que 0x7f105000bd00, ev user 0x7f105400ed70

dbel Z80  10

1 PV Event Subscriptions ( monitors ).

VAL { VALUE ALARM } undelivered=76, thread=0x7f1058000b60, unused entries=52, duplicate count =75, ev 0x7f109c038ee0, ev que 0x7f1094028f80, ev user 0x7f105400ed70

 

#### Functioning PV’s

 

dbel A01 10

1 PV Event Subscriptions ( monitors ).

VAL { VALUE ALARM }, thread=0x7f1058000b60, queue empty, ev 0x7f105800b4e0, ev que 0x7f105400ed70, ev user 0x7f105400ed70

dbel A75 10

1 PV Event Subscriptions ( monitors ).

VAL { VALUE ALARM }, thread=0x7f1058000b60, queue empty, ev 0x7f1058009e10, ev que 0x7f1070006dc0, ev user 0x7f105400ed70

epics>
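
With 2600 sequences, checking dbel output by eye gets tedious. The following is a rough sketch of a helper that pulls the counters out of one "VAL { ... }" line so that backlogged subscriptions can be picked out automatically. The function name and the returned dictionary keys are made up here, and the regular expressions assume only the line format seen in the output above; other EPICS versions may print it differently.

```python
import re

# Counter names as they appear in the dbel output above.
COUNTERS = (
    "undelivered",
    "unused entries",
    "discarded by replacement",
    "duplicate count",
)


def parse_dbel_line(line):
    """Extract the counters and thread id from one dbel 'VAL { ... }' line."""
    fields = {"queue_empty": "queue empty" in line}
    for name in COUNTERS:
        # Allow optional spaces around '=' (the output prints 'duplicate count =94').
        match = re.search(re.escape(name) + r"\s*=\s*(\d+)", line)
        if match:
            fields[name] = int(match.group(1))
    match = re.search(r"thread=(0x[0-9a-fA-F]+)", line)
    if match:
        fields["thread"] = match.group(1)
    return fields


# Sample lines taken from the output above (H75 abbreviated, A01 in full).
stuck = parse_dbel_line(
    "VAL { VALUE ALARM } undelivered=48, thread=0x7f1058000b60, "
    "unused entries=32, discarded by replacement=28, duplicate count =94"
)
healthy = parse_dbel_line(
    "VAL { VALUE ALARM }, thread=0x7f1058000b60, queue empty, "
    "ev 0x7f105800b4e0, ev que 0x7f105400ed70, ev user 0x7f105400ed70"
)
```

A subscription can then be flagged as suspect when `queue_empty` is false and `undelivered` stays non-zero across repeated dbel runs.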

 

 

 


