EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: IOC hangs (still)
From: [email protected] (Ralph Lange)
To: [email protected] (EPICS Tech-Talk)
Date: Fri, 27 Jun 1997 17:58:57 +0200 (METDST)

Hi all -

While we're sitting at our test stand again trying to reproduce the IOC
crashes, we think we have come up with a scenario in which the onceQ gets
filled with crap without any faulty driver or device support doing weird
things.

[First of all: During the last week I have learned *a lot* about CA and the
database by reading the sources and found a couple of things and thoughts
in the concept and design really amazing.]

We found that the crap in the onceQ looked as if a rngBufPut() call had
been interrupted by another rngBufPut() call.

This is an excerpt from the onceQ buffer:

00e3e010:  00f7941c 00f7941c 00f7941c 00f7941c   *................*
00e3e020:  00f7941c 00f7941c 00f7941c 00f7941c   *................*
00e3e030:  f7941c00 f7941c00 f7941c00 f7941c00   *................*
00e3e040:  941c00f7 941c00f7 941c00f7 941c00f7   *................*
00e3e050:  941c00f7 1c00f794 1c00f794 1c00f794   *................*
00e3e060:  1c00f794 1c00f794 1c00f794 00f7941c   *................*

00f7941c is a valid record pointer. Do you see how bytes get swallowed from
time to time?
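
To make the missing bytes easier to spot, here is a small sketch that walks
the dump byte by byte and reports where the repeating pattern 00 f7 94 1c
skips a byte. It assumes a big-endian CPU, so that each word of the dump
shows its bytes in memory order; the function and array names are made up
for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* The 24 words of the excerpt, in address order starting at 0x00e3e010. */
static const uint32_t dump_words[24] = {
    0x00f7941c, 0x00f7941c, 0x00f7941c, 0x00f7941c,
    0x00f7941c, 0x00f7941c, 0x00f7941c, 0x00f7941c,
    0xf7941c00, 0xf7941c00, 0xf7941c00, 0xf7941c00,
    0x941c00f7, 0x941c00f7, 0x941c00f7, 0x941c00f7,
    0x941c00f7, 0x1c00f794, 0x1c00f794, 0x1c00f794,
    0x1c00f794, 0x1c00f794, 0x1c00f794, 0x00f7941c
};

/* Bytes of the record pointer 0x00f7941c in big-endian memory order. */
static const uint8_t pattern[4] = { 0x00, 0xf7, 0x94, 0x1c };

/*
 * Stores the byte offsets (relative to 0x00e3e010) at which exactly one
 * pattern byte is missing, and returns how many such places were found.
 */
size_t find_swallowed_bytes(size_t *offsets, size_t max_offsets)
{
    size_t phase = 0;   /* index of the pattern byte expected next */
    size_t found = 0;

    for (size_t i = 0; i < 24 * 4; i++) {
        uint32_t word = dump_words[i / 4];
        uint8_t byte = (uint8_t)(word >> (8 * (3 - i % 4)));

        if (byte == pattern[phase]) {
            phase = (phase + 1) % 4;            /* pattern continues */
        } else if (byte == pattern[(phase + 1) % 4]) {
            if (found < max_offsets)
                offsets[found] = i;             /* one byte was skipped here */
            found++;
            phase = (phase + 2) % 4;
        }
        /* Any other byte would be unrelated data; none occurs in this dump. */
    }
    return found;
}
```

For this excerpt the walk finds four skipped bytes, at offsets 32, 48, 68
and 92, i.e. at addresses 00e3e030, 00e3e040, 00e3e054 and 00e3e06c.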

This is what we think is happening: [I'm giving more details than necessary
to give you the chance to find exactly the point where we are wrong ...;-]

scanOnce() is the only function that writes into the onceQ.

The onceQ is not guarded by a mutex semaphore, so the design must rely on
a call to scanOnce() never being interrupted by another call to scanOnce().

scanOnce() may be called from two different function contexts:
- recGblFwdLink() calls scanOnce() if the record is to be reprocessed. This
  is part of record processing, so possible task contexts are all cb and
  scan tasks (once and periodic) running at priorities between 51 and 65.
- the event_task() function calls scanOnce() indirectly by executing
  event_read() on each event it takes out of its event queue. event_read()
  calls the event->user_sub() which depends on the event type. The
  user_sub() may be (e.g.) eventCallback(), accessRightsCallback() or
  connectionCallback() from dbCa.c which call scanOnce() if the link's type
  is CP or CPP. The possible task contexts for this are CA event tasks (at
  priority 181) and the EV dbCaLink task (at priority 99) which handles the
  event queue for the "user" database.

Imagine the following:
A CA event task (prio 181) is peacefully processing its queue and calls
scanOnce() for a CPP link. While scanOnce() uses rngBufPut() to put the
record pointer into the onceQ, a periodic scan task (prio ~56) gets ready
and interrupts. During record processing the monitor for a CPP link is
fired: db_post_events() puts an event into the database's event queue and
wakes up the owner, i.e. the EV dbCaLink task (prio 99) gets ready. After
the periodic scan task is finished, the EV dbCaLink task runs next, because
its priority number is lower (i.e. its priority is higher) than that of the
CA event task, which is still inside rngBufPut(). The EV dbCaLink
task processes its event queue, calls the event->user_sub() of the CPP
monitor event and ... scanOnce() is called again thereby interrupting a
call to rngBufPut() by another rngBufPut() which ruins the onceQ.
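
The interleaving can be replayed deterministically with a toy ring buffer.
This is only a sketch under assumptions: the ring structure and the split of
a put into a "begin" and a "finish" step are hypothetical and are not the
vxWorks rngBufPut() code. It shows the simplest consequence, a lost entry;
the byte shifts seen in the dump would need a more detailed model.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 64

/* Hypothetical minimal ring: a buffer plus an unprotected write index. */
struct ring {
    uint8_t buf[RING_SIZE];
    size_t  write_idx;
};

/* State a task holds between sampling the index and storing the data. */
struct put_state {
    size_t   idx;
    uint32_t value;
};

/* First half of a put: sample the current write index. */
void put_begin(const struct ring *r, struct put_state *s, uint32_t value)
{
    s->idx = r->write_idx;
    s->value = value;
}

/* Second half: copy the data and publish the new write index. */
void put_finish(struct ring *r, const struct put_state *s)
{
    memcpy(&r->buf[s->idx], &s->value, sizeof s->value);
    r->write_idx = (s->idx + sizeof s->value) % RING_SIZE;
}

/* Performs two puts and returns how many entries the ring then claims. */
size_t simulate_two_puts(int interleave)
{
    struct ring r = { {0}, 0 };
    struct put_state ca_event, db_ca_link;

    if (interleave) {
        put_begin(&r, &ca_event, 0x00f7941c);   /* prio 181 starts its put */
        put_begin(&r, &db_ca_link, 0x00f7a000); /* prio 99 preempts ...     */
        put_finish(&r, &db_ca_link);            /* ... and completes a put  */
        put_finish(&r, &ca_event);              /* prio 181 resumes with a
                                                   stale index              */
    } else {
        put_begin(&r, &ca_event, 0x00f7941c);
        put_finish(&r, &ca_event);
        put_begin(&r, &db_ca_link, 0x00f7a000);
        put_finish(&r, &db_ca_link);
    }
    return r.write_idx / sizeof(uint32_t);
}
```

Run back to back, the two puts leave two entries in the ring. Interleaved,
the second task's entry is overwritten and the ring claims only one entry.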

Solution: Guarding the onceQ with a binary semaphore.
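
A minimal sketch of such a guard, written with a POSIX mutex so that it can
be compiled outside vxWorks. On the IOC the lock and unlock calls would
presumably be semTake() and semGive() on a semaphore created once at
startup. The queue layout and the name guarded_put() are hypothetical and
are not the EPICS scanOnce() code.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QUEUE_BYTES 64

/* Hypothetical stand-in for the onceQ: raw bytes plus bookkeeping. */
static uint8_t queue_buf[QUEUE_BYTES];
static size_t  queue_write_idx;
static size_t  queue_used;

/* The guard; on vxWorks this would be a semaphore taken with semTake(). */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Appends one record pointer to the queue.
 * Returns 1 on success and 0 if the queue is full.
 * Index read, copy and index update all happen while the lock is held,
 * so a second caller cannot slip in between these steps.
 */
int guarded_put(const void *record)
{
    int ok = 0;

    pthread_mutex_lock(&queue_lock);
    if (QUEUE_BYTES - queue_used >= sizeof record) {
        memcpy(&queue_buf[queue_write_idx], &record, sizeof record);
        queue_write_idx = (queue_write_idx + sizeof record) % QUEUE_BYTES;
        queue_used += sizeof record;
        ok = 1;
    }
    pthread_mutex_unlock(&queue_lock);
    return ok;
}
```

The point of the design is that the critical section covers the whole put,
not just the copy. The cost is that a caller may now block for a short time
while another task holds the lock.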

What are we missing?
Ralph
-- 
      __  Ralph Lange                         Email:       [email protected] 
     /\ \                                     WWW: http://www.bessy.de/~lange
    /  \ \  BESSY II                          
   / /\ \ \  Berliner Elektronenspeicherring- Snail:                 BESSY II
  / / /\ \ \  Gesellschaft fuer Synchrotron-               Rudower Chaussee 5
 / / /__\_\ \  strahlung m.b.H.                       D-12489 Berlin, Germany
/ / /________\                                Phone:         +49 30 6392-4862
\/___________/ Control System Group           Fax:                ...   -4859

Replies:
Re: IOC hangs (still) Marty Kraimer
References:
Re: IOC hangs (again) Ralph Lange
