Experimental Physics and
Industrial Control System

Tim Mooney <[email protected]> · Tue, 23 Nov 2010 12:46:46 -0600 (CST)

----- Original Message -----
> From: "Ben Franksen" <[email protected]>
> To: [email protected]
> Sent: Monday, November 22, 2010 9:59:04 PM
> Subject: Re: ca_put_callback once again
> Hi Andrew
> 
> We have two separate issues here.
> 
> (1) The bo record. I was aware of the fact that the bo record does not
> remain active during its HIGH time. Maybe I should have explained the
> idea behind the test db file a bit more, but I added the test setup as
> an afterthought and the mail was already long enough. I am using the bo
> record's HIGH field only so that I can see a change in the value with a
> camonitor in another window. The actual asynchronous processing is done
> by the seq record, which has a longer delay for exactly this reason. I
> could as well have used a longout for the two "front-end" records and
> the result would have surprised me as much.
> 
> I am not proposing to change the bo behaviour.
> 
> (2) The putNotify behaviour. I admit that I did not read the manual
> carefully enough. You cite the ADG:
> 
> >     6. In general a set of records may be processed as a result of a
> > single dbPutNotify. If a record in the set is found to be active,
> > either because PACT is true or because a putNotify already owns the
> > record, then that record is not made part of the set of records that
> > must complete before the putNotify request completes.
> 
> This should have led me to expect the behaviour I see.
> 
> > Your second caput is completing immediately because its bo record
> > tries to process the OUT link but finds the seq.PACT=TRUE, so it
> > gives up and tells putNotify that everything it started has
> > finished.
> 
> Agreed, but this is exactly the problem I complained about.
> 
> It tells me that an operation successfully completed, when in fact it
> did *not* succeed. The bo was supposed to start more than its own
> processing, namely processing of the seq record, and this did not
> happen.

I think there's a legitimate question about what should happen in this
case, and I hope to get a better view of it by thinking of similar cases:

Suppose the second dbput to the seq record had resulted from a loop in
the database, rather than from a second ca_put_callback.  Normal database
processing rules would have called for loop processing to stop when it
tried to process a record already processing as the result of a dbput.
I think neither of us would regard this as an error -- I have databases,
in fact, that rely on this behavior to function correctly.  But this
case would have looked almost exactly the same as yours to the database
processing routines: the only difference is that the second dbput would
have propagated the /original/ put_callback to the seq record for a
second time, instead of propagating a /new/ put_callback.  (Thus, one
conceivable solution might involve named putNotifys.)

Another similar case: Suppose you had executed the bo records with 
ordinary ca_puts, instead of ca_put_callbacks.  You would have gotten
exactly the same database-processing result, which is good -- ca_put
should achieve the same result as ca_put_callback.  But, with ca_put,
there's no possibility of signaling the collision as an error, because
the execution is not traced, so when a collision occurs no code knows
who started it.  (The nice thing about ca_put_callback is that it gives
the caller a way to avoid a database-processing collision, as long
as nobody else is trying to process the database at the same time.)
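
The cases above (a second put_callback, a loop in the database, and a plain put) can be compared in a toy model. This is a minimal sketch in Python, not EPICS code: `Record`, `PutNotify`, `process` and `finish` are hypothetical names invented for the illustration, `pact` stands in for the PACT field, and `out` stands in for an OUT or forward link. The model only checks `pact`; it ignores the "a putNotify already owns the record" condition quoted from the ADG.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical stand-in for a database record (not an EPICS type)."""
    name: str
    out: Optional["Record"] = None  # record processed via this record's link
    is_async: bool = False          # True: stays active until finish() is called
    pact: bool = False              # stand-in for the PACT field
    count: int = 0                  # how many times processing actually started


@dataclass
class PutNotify:
    """Hypothetical stand-in for the set of records one put_callback waits on."""
    waiting: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)

    @property
    def done(self) -> bool:
        return not self.waiting


def process(rec: Record, notify: Optional[PutNotify] = None) -> None:
    if rec.pact:
        # Record is already active: processing stops here, and the record
        # is NOT added to the set the notify must wait for.
        if notify is not None:
            notify.skipped.append(rec.name)
        return
    rec.pact = True
    rec.count += 1
    if notify is not None:
        notify.waiting.append(rec.name)
    if rec.out is not None:
        process(rec.out, notify)
    if not rec.is_async:
        finish(rec, notify)


def finish(rec: Record, notify: Optional[PutNotify] = None) -> None:
    rec.pact = False
    if notify is not None and rec.name in notify.waiting:
        notify.waiting.remove(rec.name)


# Two front-end records driving one asynchronous seq record.
seq = Record("seq", is_async=True)
bo1 = Record("bo1", out=seq)
bo2 = Record("bo2", out=seq)

n1 = PutNotify()
process(bo1, n1)          # first put_callback: seq starts and stays active
n2 = PutNotify()
process(bo2, n2)          # second put_callback collides with the busy seq
print(n1.done, n2.done)   # False True: the second request "completes" at once
print(n2.skipped, seq.count)  # ['seq'] 1: seq was never processed a second time

process(bo2)              # plain put: same database result, but nobody to tell
print(seq.count)          # 1

finish(seq, n1)           # seq's delay expires; only now is the first put done
print(n1.done)            # True
```

In this model a loop (a record whose link chain leads back to itself) takes exactly the same `if rec.pact` branch as the second put, which is why the processing routine alone cannot tell the two apart. The only difference visible here is which `PutNotify` object arrives at the busy record.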

One possible view of this is that the problem is in the collision, not
in either the database-processing rules or in ca_put_callback.  It's
unfortunate that the user can rarely be expected to know whether or
not two puts are going to the same lock set and might collide, but I
can't think of a way out that doesn't involve a deep redesign.

> It should be possible to record such a failure and return it to
> the requester. Even better, the CA server could detect the failure and
> wait until the record in question is no longer busy. It already does so
> under certain circumstances, namely if I target the same front-end
> record from the same client: in this case the second call waits for the
> first to finish, i.e. the CA server queues the second request.
> (Unfortunately I cannot as easily demonstrate this because caput does
> not allow grouping several asynchronous put operations in one
> command.)
> 
> If I target a different record (even from the same client), this queuing
> does not happen. This is inconsistent. I think that the queuing should
> ideally always happen for ca_put_callback as long as any record in the
> chain is busy. Failing that, or if the wait times out, at least an
> error should be sent back.
> 
> Considering the effort that went into the implementation of the
> putNotify feature it seems a shame that it cannot be used for what it
> was designed, at least not reliably.
>
> Since I don't expect this will be fixed any time soon, I will warn users
> of the sequencer that using pvPut(var,SYNC) is inherently unreliable.
> Instead, completion should be detected by other means, such as reading
> back a status bit from the hardware.

We tried that.  For years I tried different ways to make that work.  It was
awful.  put_callback made a *huge* improvement in the speed, robustness, and
flexibility of scans.  Importantly, it allows clients to scan a PV without
knowing which other PV to read to detect completion.

I agree that put_callback has problems with collisions, and I don't mean
to minimize the problems.  But I think it's so much better than the
achievable alternatives that I would not suggest SNL developers avoid it.

> Cheers
> Ben
