EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Asynchronous Momentary Binary Output Bug
From: [email protected]
Date: Fri, 15 Jul 94 15:29:36 -0500
>I think that the problem can be stated as follows: 
>
>1) For asyn records what do we do if anyone besides that record's
>   support module attempts to write into one or more fields of the
>   record between asyn start and asyn completion? 

I would like to see the record locked so that async records are treated
the same way as sync records: kept completely locked, blocking any other
thread from accessing the record until processing is finished.  I do
not believe that a queue alone would do this because, depending on how
the thread that does the put() or get() is implemented, it can
introduce a time delay between the access request and the resulting
processing that is, again, not the same as how a sync record is
processed.  Therefore a queue would probably end up serving as a
band-aid for yet another problem caused by some larger and more
primitive annoyance.

I believe that the whole reason for even HAVING async processing in the
first place was our decision on how to handle the usage of CPU on the
threads that process records.  Again, I have no better ideas about that
(other than that load balancing might be nice, but that is _another_
issue.)  The technical issue with the threads is that
we have multiple record instances whose processing is done by the same
thread.  If we block that thread for too long a time, we fall behind in
processing everything the thread is responsible for.
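The cost of blocking a shared thread can be put in numbers with a small
hypothetical Python model (not EPICS code) of one scan thread serving
three records, where the middle record's device I/O takes 500 ms.  The
timings are illustrative, and the async case assumes that starting the
I/O costs the scan thread nothing.

```python
def finish_times(io_ms, asynchronous=False):
    """Return the time (ms) at which each record's processing completes.

    io_ms: device I/O time for each record, processed in list order by a
    single scan thread starting at t = 0.  In synchronous mode the thread
    blocks for each record's full I/O time.  In asynchronous mode the
    thread only starts the I/O (assumed to take no time) and moves on;
    the I/O completes later on some other thread.
    """
    t = 0          # the scan thread's own clock
    done = []
    for cost in io_ms:
        if asynchronous:
            done.append(t + cost)   # completes elsewhere; thread not held
        else:
            t += cost               # thread is stuck until the I/O ends
            done.append(t)
    return done

print(finish_times([1, 500, 1]))                     # [1, 501, 502]
print(finish_times([1, 500, 1], asynchronous=True))  # [1, 500, 1]
```

In the synchronous run the third record, which needs only 1 ms of work,
is not finished until 502 ms because it waited behind the slow one.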

In an ideal world, every record could have its own thread.  Then, that
thread could block all it wants.  And this problem would not even exist.

I would be interested in discussing a detailed walkthrough of what is
really happening to the data in the current implementation, and why,
followed by a few alternative blocking-oriented schemes.
I believe that we will be able to provide a locking type service
(so that everything appears to be synchronous) without holding up the
entire workload of a thread.
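One scheme of that kind could look like the following hypothetical
Python sketch: a caller that wants to touch a busy record waits on a
per-record condition variable until PACT clears, so only that caller
stalls, while the thread that started phase 1 returns at once.  The
class and method names are invented for illustration and are not part
of EPICS.

```python
import threading

class AsyncRecord:
    """Hypothetical record with two-phase (PACT) processing.

    One condition variable guards every field.  Callers that arrive while
    PACT is set sleep on it instead of reading or writing a half-processed
    record; only they wait, not the thread that started the processing.
    """

    def __init__(self):
        self.cond = threading.Condition()
        self.pact = False
        self.val = 0    # field written by outside callers
        self.rval = 0   # field written by the async completion

    def process(self, device_io):
        """Phase 1: mark the record active and hand I/O to another thread."""
        with self.cond:
            self.pact = True
        worker = threading.Thread(target=self._complete, args=(device_io,))
        worker.start()
        return worker   # the calling (scan) thread returns immediately

    def _complete(self, device_io):
        result = device_io()        # slow I/O runs without holding the lock
        with self.cond:             # Phase 2: finish and clear PACT
            self.rval = result
            self.pact = False
            self.cond.notify_all()  # wake any blocked put()/get() callers

    def put(self, value):
        """Wait until the record is idle, then write: looks synchronous."""
        with self.cond:
            self.cond.wait_for(lambda: not self.pact)
            self.val = value

    def get(self):
        """Wait until the record is idle, then read a consistent value."""
        with self.cond:
            self.cond.wait_for(lambda: not self.pact)
            return self.rval

rec = AsyncRecord()
rec.process(lambda: 42)
rec.put(7)                 # blocks only this caller until phase 2 is done
print(rec.get(), rec.val)  # 42 7
```

A real version would also need a timeout on the wait and a rule for
what happens if process() is requested again while PACT is set; both are
left out of this sketch.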

Just to make things clear for those that have not REALLY considered
what the task threads are doing in async processing... A thread calls
the process() routine for one of three reasons: the record is periodic
and its scan time has come, a put() to a PP record was just done, or a
get() is GOING TO BE DONE on a PP field and the record needs processing
first.

Here are some async details:

1) Periodic processing.
   No big deal here.  Nothing is really WAITING for the results of the
   processing unless a monitor is on some fields in the record.  And
   then the thread that put the monitor on the field is not waiting on
   a result at this time.  Theoretically a periodic process call COULD
   block and wait as needed if the thread doing it does not have any
   other work that needs immediate attention.

   The way this works right now is that the thread that calls process
   to get the record processed only does the first async phase and then
   returns.  At a future time, the second phase is done by a different
   thread (often a callback task... never the periodic scan task, by
   the way.)  Both phases of the processing hold the record lock while
   running and PACT=TRUE during the time between process phases.

2) put() to a PP field.
   In this operation the record is locked, the target field updated and
   then the process routine is called.  When process returns, PACT
   will be set TRUE, the record is unlocked and the put() call returns.
   The caller to put() does not even know that the record is running in
   async mode and continues on its way.  When async completion time
   comes around, some task thread (typically a callback) calls process.
   The record is locked, async completion is done, PACT is set FALSE,
   the record is unlocked, and the call to process() returns.

   This appears OK, but I have often wondered about the impact of the
   put() operation possibly setting signal values AFTER the put() call
returns.  It is possible that the lack of guaranteed operation order
(like you would otherwise get with truly sync device support)
   could cause trouble.  At the bare minimum, it is not even close to
   being an intuitive way for it to work.

3) get() on a PP field.
   This is my personal fave.  This little beauty is similar to the
   put() but it processes the record first.

   So then, the get() calls process which locks the record, does phase
   1 of async processing, leaves PACT=TRUE, takes the value from
   the record, unlocks the record, and returns.  Note that the value
was grabbed at this point... where it is fairly certain that device
support has not set it yet, or may be in the middle of altering it.  So
   then later the async phase 2 time comes and process() is called,
   record locked, value updated, record unlocked, and the call to
   process() returns.

This one leaves a lot to be desired.  The only correct thing here
   is that the record is getting processed at the right time.
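The get() sequence in case 3 can be reproduced in a small hypothetical
Python model.  To keep it deterministic, the callback task is replaced
by an explicit second call to process(); the names VAL and PACT mirror
the record fields, and everything else is invented for illustration.

```python
class AsyncInput:
    """Hypothetical async input record: process() runs in two phases."""

    def __init__(self):
        self.VAL = 0         # value last delivered by device support
        self.PACT = False    # True between phase 1 and phase 2
        self.pending = None  # reading the device will deliver in phase 2

    def process(self, device_reading=None):
        if not self.PACT:
            # Phase 1: start the I/O, set PACT, return to the caller.
            self.PACT = True
            self.pending = device_reading
        else:
            # Phase 2 (normally run later by a callback task).
            self.VAL = self.pending
            self.PACT = False

    def get_pp(self, device_reading):
        """get() on a PP field: process first, then read VAL."""
        self.process(device_reading)  # only phase 1 happens here
        return self.VAL               # so this is the OLD value

rec = AsyncInput()
print(rec.get_pp(device_reading=99), rec.PACT)  # 0 True  (stale read)
rec.process()                                   # callback: phase 2
print(rec.VAL, rec.PACT)                        # 99 False
```

The caller of get_pp() receives 0 even though the processing it
triggered will produce 99, which is exactly the mismatch described
above.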

In all three of these cases, it should be pretty obvious that the
record's field values are exposed, while the record sits idle between
phases with PACT=TRUE, to put() and get() calls that do not do the
processing that would be done with sync records.  Those calls return
and/or generate possibly stale or incorrect values that can leave the
record with internal referential integrity problems (which are sort of
fixed by the extra processing call done when processing finally
finishes, if put caching is enabled.)  And all this is due to the fact
that we don't want to block the thread that made the call to put(),
get(), or process().
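The put-caching workaround can be sketched as a hypothetical Python
model: a put() that arrives while PACT is set only stores its value and
raises a reprocess flag, and phase 2 then processes the record once more
so the cached value finally reaches the device.  The flag name
`reprocess` and the other names are assumptions for illustration, not
EPICS record support code.

```python
class CachingOutput:
    """Hypothetical async output record with put caching."""

    def __init__(self):
        self.val = None         # requested setpoint
        self.pact = False       # async processing in progress
        self.reprocess = False  # a put arrived while pact was set
        self.device_log = []    # values actually sent to the hardware
        self._in_flight = None  # value device support is writing now

    def put(self, value):
        self.val = value
        if self.pact:
            self.reprocess = True   # cache it; don't touch the I/O
        else:
            self._phase1()

    def _phase1(self):
        self.pact = True
        self._in_flight = self.val  # device support starts this write

    def complete(self):
        """Phase 2, normally run by a callback task."""
        self.device_log.append(self._in_flight)
        self.pact = False
        if self.reprocess:          # the extra processing pass
            self.reprocess = False
            self._phase1()

out = CachingOutput()
out.put(1)       # starts writing 1
out.put(2)       # arrives mid-flight: cached
out.put(3)       # overwrites the cached value
out.complete()   # 1 reaches the device, then reprocessing starts 3
out.complete()
print(out.device_log)  # [1, 3]
```

The value 2 never reaches the device, which is why put caching only
"sort of" restores consistency: the final state is right, but the
sequence of outputs is not what three sync puts would have produced.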

>2) Are gets always meaningful if values are read between the two
>   phases?

No.  The prime example of this is the get() that is done while a device
support module is updating a waveform array.  The get() can end up with
the first half of the new waveform and the last half of the old one.
This is caused by a get() that is done by a high-priority vxWorks thread
that comes along while a device support module (running in a
lower-priority thread) is churning through a data conversion or copy
operation.
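The torn read can be reproduced deterministically in a hypothetical
Python model by writing the copy as a generator, so the "preemption"
point is explicit: the reader runs after the writer has copied half of
the new samples.  The second part shows one common cure, double
buffering, where device support fills a private buffer and publishes it
with a single reference swap, so a reader sees either the old waveform
or the new one, never a mix.  This is an illustration under those
assumptions, not what EPICS waveform device support actually does.

```python
import threading

def copy_in_place(dest, src):
    """Device-support-style copy, yielding after each element so the
    caller can simulate a higher-priority reader preempting it."""
    for i, x in enumerate(src):
        dest[i] = x
        yield

old, new = [0] * 4, [1] * 4

# Torn read: the reader runs after half of the copy.
shared = list(old)
writer = copy_in_place(shared, new)
for _ in range(2):
    next(writer)
torn = list(shared)   # reader preempts here
for _ in writer:      # writer finishes the copy
    pass
print(torn)           # [1, 1, 0, 0]

# Double buffering: fill a private buffer, then publish it in one step.
class Waveform:
    def __init__(self, data):
        self._lock = threading.Lock()
        self._current = list(data)

    def publish(self, data):
        fresh = list(data)        # slow conversion/copy, no lock held
        with self._lock:
            self._current = fresh # readers see old or new, never a mix

    def read(self):
        with self._lock:
            return list(self._current)

wf = Waveform(old)
print(wf.read())      # [0, 0, 0, 0]
wf.publish(new)
print(wf.read())      # [1, 1, 1, 1]
```

Because the slow conversion happens before the lock is taken, it could
in principle run at a low priority without exposing half-written data,
which is the trade-off raised in the next paragraph.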

I believe that we have not seen problems like this yet (at least with
GPIB), because I run the GPIB work task (that would do the conversion &
copy) at a higher priority than most other stuff on the IOC.  This is a
bit of a drag because this conversion stuff is time consuming and
should be done at a really low priority.

I regret that I will not be able to discuss this stuff for the next 2 1/2
weeks as I will be at a class and on vacation.  This is an issue that
has been a thorn in my side since my 4th week with APS (exactly 3 years
as of the EPICS meeting in June.)  And NOTHING would please me more
(professionally speaking) than to see it improved upon.

! John Winans                     Advanced Photon Source  (Controls)    !
! [email protected]       Argonne National Laboratory, Illinois !
!                                                                       !
!"The large print giveth, and the small print taketh away." - Tom Waits !
