Experimental Physics and
Industrial Control System

[email protected] · Mon, 18 Jul 94 09:31:21 -0500

Reply to comments by Winans.

> >I think that the problem can be stated as follows: 
> >
> >1) For asyn records what do we do if anyone besides that record's
> >   support module attempts to write into one or more fields of the
> >   record between asyn start and asyn completion? 
> 
> I would like to see the record locked such that it is treated the same
> way as sync records... and that is to keep it completely locked and
> block any other thread from accessing it until it is finished.

The entire lock set? Think of what this means.

> I do
> not believe that a queue alone would do this because, depending on how
> the thread that does the put() or get() is implemented, it can
> introduce a time delay between the access request and the resulting
> processing that is, again, not the same as how a sync record is
> processed.  Therefore a queue would probably end up serving as a
> band-aid to yet another problem caused by some larger and more
> primitive annoyance.

What primitive annoyance? By the way, I remember someone making the claim that
queuing was the answer to all problems. I am happy to see this is no longer
true.

> In regards to keeping I believe that the whole reason for even HAVING
> async processing in the
> first place was caused by our decision on how to handle the usage
> of CPU on the threads that process records.  Again, I have no better
> ideas about that (other than that load balancing might be nice, but
> that is _another_ issue.)  The technical issue with the threads is that
> we have multiple record instances whose processing is done by the same
> thread.  If we block that thread for too long a time, we fall behind in
> processing everything the thread is responsible for.

Load balancing has nothing to do with this discussion. It is true that many
records share the same thread. See the next comment. 

> In an ideal world, every record could have its own thread.  Then, that
> thread could block all it wants.  And this problem would not even exist.
> 

Ah, a 5000-record database with 5000 tasks!! That would really be great on a
68040 running vxWorks.

It would still not solve the problem. Let me give an example. Let's say that we
have a pid record that drives an asyn ao record, and that the ao is a
GPIB device. Once in a while the ao request gets delayed because the GPIB
request queue gets unusually long. Should we then make the pid record wait?
I say no. Just let it proceed. It can continue writing values to the ao record,
and as long as the latest value is used when the ao record finally gets a
chance to send a GPIB message (caching), things work as well as they can. By
the way, don't offer the solution of having the pid record look to see
if the ao record is busy, because it will not become busy UNTIL it sends the
GPIB message.
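
A minimal sketch of the latest-value idea, in Python rather than EPICS C.
The class and method names (CachedAo, write, gpib_turn) are made up for
illustration and are not part of any EPICS API. The pid side overwrites a
single cached value and never waits; the GPIB side sends whatever is cached
when its turn finally comes.

```python
class CachedAo:
    """Hypothetical ao device support that keeps only the latest value."""

    def __init__(self):
        self.latest = None   # most recent value written by the pid record
        self.sent = []       # values actually sent as GPIB messages

    def write(self, value):
        # The pid record never blocks: it just overwrites the cached value.
        self.latest = value

    def gpib_turn(self):
        # Called when the GPIB request queue finally reaches this request.
        if self.latest is not None:
            self.sent.append(self.latest)
            self.latest = None


ao = CachedAo()
for v in (1.0, 1.5, 2.0):    # three pid cycles while the GPIB queue is backed up
    ao.write(v)
ao.gpib_turn()               # the queue finally services the ao request
```

Only the last of the three values reaches the device; the two intermediate
values are dropped, which is the intended behavior for a control loop.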

> I would be interested in discussing a detailed walkthru of what is
> really happening to the data, along with why, in the current
> implementation.  And then a few alternate blocking orientated schemes. 
> I believe that we will be able to provide a locking type service
> (so that everything appears to be synchronous) without holding up the
> entire workload of a thread.
> 

The Application Developer's Guide has at least the beginnings of this
walkthrough.

> Just to make things clear for those that have not REALLY considered
> what the task threads are doing in async processing... There is a
> thread that is used for the call to the process() routine.  It can call
> it because it is periodic and time to go, because a put() to a PP
> record was just done, or a get() is GOING TO BE DONE on a PP field and
> the record needs processing first.
> 

Let's remember that we also have Event and I/O Event scanning.

> There are some async details:
> 
> 1) Periodic processing.
>    No big deal here.  Nothing is really WAITING for the results of the
>    processing unless a monitor is on some fields in the record.  And
>    then the thread that put the monitor on the field is not waiting on
>    a result at this time.  Theoretically a periodic process call COULD
>    block and wait as needed if the thread doing it does not have any
>    other work that needs immediate attention.
> 

Like the other records being processed at the same scan rate!!! Also remember
that each periodically scanned record can be the root of a tree of
records to be processed.

>    The way this works right now is that the thread that calls process
>    to get the record processed only does the first async phase and then
>    returns.  At a future time, the second phase is done by a different
>    thread (often a callback task... never the periodic scan task, by
>    the way.)  Both phases of the processing hold the record lock while
>    running and PACT=TRUE during the time between process phases.
> 
> 2) put() to a PP field.
>    In this operation the record is locked, the target field updated and
>    then the process routine is called.  When process returns, PACT
>    will be set TRUE, the record is unlocked and the put() call returns.

Should read: process returns leaving PACT TRUE.

>    The caller to put() does not even know that the record is running in
>    async mode and continues on its way.  When async completion time
>    comes around, some task thread (typically a callback) calls process.
>    The record is locked, async completion is done, PACT is set FALSE,
>    the record is unlocked, and the call to process() returns.
> 
>    This appears OK, but I have often wondered about the impact of the
>    put() operation possibly setting signal values AFTER the put() call
>    returns. It is possible that the lack of guaranteed operation order
>    (like you would otherwise get with a truly sync device support)
>    could cause trouble.  At the bare minimum, it is not even close to
>    being an intuitive way for it to work.
>

What is not intuitive? A value is put in a record and the record's process
routine is called. If necessary, process converts the value to a raw value and
asks to have the raw value sent to the device. Some time later (asyn completion)
the device says whether the request was successful. The record then performs
alarm and monitor processing. I agree it would be nicer if everything happened
instantly, but that's not the real world.
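
The sequence just described can be sketched as a toy model in Python. This
is not EPICS code: the AsyncAo class, the device_ok flag, and the linear
conversion factor are assumptions made for illustration. Only the PACT flag
and the two calls to process come from the discussion.

```python
class AsyncAo:
    """Toy model of an asyn ao record; not EPICS code."""

    def __init__(self):
        self.val = 0.0            # engineering value written by a put
        self.rval = 0             # raw value handed to the device
        self.sevr = "NO_ALARM"    # alarm severity, updated only at completion
        self.pact = False         # TRUE between asyn start and asyn completion
        self.device_ok = True     # set by the (hypothetical) device layer
        self.monitors = []        # (val, sevr) pairs posted to monitors

    def process(self):
        if not self.pact:
            # Asyn start: convert and ask the device layer to send the raw value.
            self.rval = round(self.val * 100)   # assumed linear conversion
            self.pact = True
            return
        # Asyn completion: the device has reported success or failure.
        self.sevr = "NO_ALARM" if self.device_ok else "INVALID_ALARM"
        self.monitors.append((self.val, self.sevr))
        self.pact = False


rec = AsyncAo()
rec.val = 2.5
rec.process()                             # asyn start
between = (rec.val, rec.sevr, rec.pact)   # what a get would see between phases
rec.device_ok = False                     # device reports that the write failed
rec.process()                             # asyn completion
```

Between the two calls the new value sits next to the old alarm severity;
the monitor posted at completion carries a consistent pair.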

> 3) get() on a PP field.
>    This is my personal fave.  This little beauty is similar to the
>    put() but it processes the record first.
> 
>    So then, the get() calls process which locks the record, does phase
>    1 of async processing, leaves PACT=TRUE, takes the value from
>    the record, unlocks the record, and returns.  Note that the value
>    was grabbed at this point... where it is fairly sure that the device
>    support has not set it yet or could be currently altering it.  So
>    then later the async phase 2 time comes and process() is called,
>    record locked, value updated, record unlocked, and the call to
>    process() returns.
> 
>    This one leaves little to be desired.  The only correct thing here
>    is that the record is getting processed at the right time.
> 

The Application Developer's Guide mentions this case and warns that it is the
application developer's responsibility to prevent it from happening.

> In all three of these cases, it should be pretty obvious that the
> record field values are exposed to be the subject of put() and get()
> calls when idle and PACT=TRUE that do not do the processing that would
> be done with sync records, return and/or generate possible stale or
> incorrect values that can leave the record with internal referential
> integrity problems (that are sort-a fixed by the extra processing call
> done when it finally finishes when the put-caching is enabled.)

What is it that is supposed to be pretty obvious?

> And
> all this is due to the fact that we don't want to block the thread that
> made the call to put() get() or process().
>

Yes, this is the root of our problems. We could make everything MUCH MUCH
slower and easily solve our problem. Asking for a 10 Hz scan rate would then
mean at most 10 Hz, with no guarantee of the actual rate.

> >2) Are gets always meaningful if values are read between the two
> >   phases?
> 
> No.  The prime example of this is the get() that is done while a device
> support module is updating a waveform array.  The get() can end up with
> the first 1/2 of the new waveform and last 1/2 of the old one.  This is
> caused by a get() that is done by a high priority vxWorks thread that
> comes along while a device support module (running at a lower priority
> thread) is churning thru a data conversion or copy operation.
> 

This does not have to be true. The device support can keep two arrays and then
do one of two things. First, it can make one array private and one public, and
at asyn completion copy the private array to the public array. Second, it can
double buffer: when anyone asks to read the array, the get_array_info record
support routine is called with the dbAddr structure as one argument, and it can
set pfield to the array that currently holds the latest data. If neither of
these is acceptable for performance or memory reasons, then the user should be
warned. If the user is obtaining data via a Channel Access monitor, the monitor
will be invoked after the asyn completion. If the value cannot be read out
until after the beginning of the next asyn start, the client is in trouble
anyway because it can't keep up.
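
The double-buffer option can be sketched as follows, again in Python as a
stand-in for record/device support C code. The class and its private_buffer
and asyn_complete methods are hypothetical; get_array_info here only mimics
the role of the real routine by returning the buffer that readers should
see, where the real one would set pfield.

```python
class DoubleBufferedWaveform:
    """Toy double-buffered array; readers only ever see a complete buffer."""

    def __init__(self, nelm):
        self.buffers = [[0.0] * nelm, [0.0] * nelm]
        self.public = 0                  # index of the buffer readers see

    def get_array_info(self):
        # Stand-in for pointing pfield at the array with the latest data.
        return self.buffers[self.public]

    def private_buffer(self):
        # Device support writes only into the buffer readers cannot see.
        return self.buffers[1 - self.public]

    def asyn_complete(self):
        # Swap roles once the private buffer holds a complete waveform.
        self.public = 1 - self.public


wf = DoubleBufferedWaveform(4)
wf.private_buffer()[:] = [1.0, 1.0, 1.0, 1.0]
wf.asyn_complete()                       # first waveform becomes visible

scratch = wf.private_buffer()            # next acquisition starts
scratch[0] = scratch[1] = 2.0            # only half written so far
mid_read = list(wf.get_array_info())     # a get() that arrives mid-update
scratch[2] = scratch[3] = 2.0
wf.asyn_complete()
final = list(wf.get_array_info())
```

The read taken halfway through the update returns the complete old waveform
rather than half new and half old. The cost is a second array per record.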

> I believe that we have not seen problems like this yet (at least with
> GPIB), because I run the GPIB work task (that would do the conversion &
> copy) at a higher priority than most other stuff on the IOC.  This is a
> bit of a drag because this conversion stuff is time consuming and
> should be done at a really low priority.
> 
> I regret that I will not be able to discuss this stuff for the next 2 1/2
> weeks as I will be at a class and on vacation.  This is an issue that
> has been a thorn in my side since my 4th week with APS (3 years even at
> the EPICS meeting in June.)  And NOTHING would please me more
> (professionally speaking) than to see it improved upon.

I still have not heard any perfect solution. Caching puts and locking
gets are an improvement on what existed 3 years ago.

> 
> ! John Winans                     Advanced Photon Source  (Controls)    !
> ! [email protected]       Argonne National Laboratory, Illinois !
> !                                                                       !
> !"The large print giveth, and the small print taketh away." - Tom Waits !

I just want to add that I see only two problems with what we are presently
doing. Remember also that problems only exist for asyn records.

1) If something outside of record/device support pokes values into a record
   between the asyn start and asyn completion phases, how does this affect
   record processing? I will also say that there are solutions available.
   a) The put caching implemented in 3.12 makes sure that the record will get
      processed again when the asyn completion phase completes.
   b) If some other field can affect the current record processing, the field
      can be declared SPC_MOD and the record support routine will be called
      whenever the field is changed.

2) Between asyn start and asyn completion, fields can be out of sync. For
   example, if a put is done to an asyn output record, the alarm fields
   are not changed until the asyn completion phase. Thus gets to the record
   between asyn start and asyn completion can see the value and alarms
   conflict. Note that if monitors are used, then they go off at asyn
   completion, when everything is back in sync.

I think that these are small problems to face for the performance and
flexibility we obtain.

Marty Kraimer
