Experimental Physics and Industrial Control System

Benjamin Franksen <[email protected]> · Wed, 24 Nov 2010 11:51:15 +0100

Hi Tim, Andrew, Jeff

thanks for all your comments.

I don't think it is necessary, or even a good idea, to try to avoid all 
collisions in the database. For the moment I would be happy just to be 
notified if parts of the process network are busy and therefore do not get 
processed as a result of my put.

I went ahead and took a look at the code. It turns out that this can be 
achieved with minimal effort (a 4-line change to the code in 
src/db/dbNotify.[ch]). And indeed, this ensures that the CA server 
sends back an error. My test now says:

> time caput pvPutAsync1 1 &; sleep 1; time caput -c -w5 pvPutAsync2 1             
1 
[1] 19959
Old : pvPutAsync1                    Low 
New : pvPutAsync1                    High 
caput pvPutAsync1 1  0.01s user 0.00s system 20% cpu 0.038 total
[1]  + 19959 done       time caput pvPutAsync1 1
Old : pvPutAsync2                    Low 
Error occured writing data.
caput -c -w5 pvPutAsync2 1  0.00s user 0.01s system 22% cpu 0.035 total

However, I thought it would be nicer if the server could send a more specific 
error code instead of the generic "put failed", all the more so because the put 
itself did not in fact fail; only the processing chain was cut short.

This is also easy to do: it needs a single 4-line change in 
src/rsrv/camessage.c, plus two lines for the new error code (actually 
just a warning, as the definition shows):

#define ECA_PUTCBNOTCMPL    DEFMSG(CA_K_WARNING,   61)

Then I saw that caput does not retrieve the error message from the CA 
library, so I added this too.

All in all:

> darcs whatsnew -s
M ./src/ca/access.cpp -1 +2
M ./src/ca/caerr.h +1
M ./src/catools/caput.c -1 +2
M ./src/db/dbNotify.c -1 +4
M ./src/db/dbNotify.h -1 +2
M ./src/rsrv/camessage.c -1 +4

A patch against base-3.14.12-rc1 is attached.

My test now does what it's supposed to do:

> time ./bin/linux-x86/caput pvPutAsync1 1 &; sleep 1; time ./bin/linux-x86/caput -c -w5 pvPutAsync2 1
[1] 29635
Old : pvPutAsync1                    Low
New : pvPutAsync1                    High
./bin/linux-x86/caput pvPutAsync1 1  0.01s user 0.00s system 24% cpu 0.032 total
[1]  + 29635 done       time ./bin/linux-x86/caput pvPutAsync1 1
Old : pvPutAsync2                    Low
Error occured writing data:Put callback processing incomplete
./bin/linux-x86/caput -c -w5 pvPutAsync2 1  0.00s user 0.00s system 12% cpu 0.032 total

On Tuesday, November 23, 2010, you wrote:
> ----- Original Message -----
> 
> > From: "Ben Franksen" <[email protected]>
> > To: [email protected]
> > Sent: Monday, November 22, 2010 9:59:04 PM
> > Subject: Re: ca_put_callback once again
> > Hi Andrew
> > 
> > We have two separate issues here.
> > 
> > (1) The bo record. I was aware of the fact that the bo record does not
> > remain active during its HIGH time. Maybe I should have explained the
> > idea behind the test db file a bit more, but I added the test setup as
> > an afterthought and the mail was already long enough. I am using the
> > bo record's HIGH field only so that I can see a change in the value
> > with a camonitor in another window. The actual asynchronous processing
> > is done by the seq record, which has a longer delay for exactly this
> > reason. I could just as well have used a longout for the two
> > "front-end" records, and the result would have surprised me just as
> > much.
> > 
> > I am not proposing to change the bo behaviour.
> > 
> > (2) The putNotify behaviour. I admit that I did not read the manual
> > carefully enough. You cite the ADG:
> > >     6. In general a set of records may be processed as a result of a
> > > single dbPutNotify. If a record in the set is found to be active,
> > > either because PACT is true or because a putNotify already owns the
> > > record, then that record is not made part of the set of records that
> > > must complete before the putNotify request completes.
> > 
> > This should have led me to expect the behaviour I see.
> > 
> > > Your second caput is completing immediately because its bo record
> > > tries to process the OUT link but finds the seq.PACT=TRUE, so it
> > > gives up and tells putNotify that everything it started has
> > > finished.
> > 
> > Agreed, but this is exactly the problem I complained about.
> > 
> > It tells me that an operation successfully completed, when in fact it
> > did *not* succeed. The bo was supposed to start more than its own
> > processing, namely processing of the seq record, and this did not
> > happen.
> 
> I think there's a legitimate question about what should happen in this
> case, and I hope to get a better view of it by thinking of similar cases:
> 
> Suppose the second dbput to the seq record had resulted from a loop in
> the database, rather than from a second ca_put_callback.  Normal database
> processing rules would have called for loop processing to stop when it
> tried to process a record already processing as the result of a dbput.
> I think neither of us would regard this as an error -- I have databases,
> in fact, that rely on this behavior to function correctly.  But this
> case would have looked almost exactly the same as yours to the database
> processing routines: the only difference is that the second dbput would
> have propagated the /original/ put_callback to the seq record for a
> second time, instead of propagating a /new/ put_callback.  (Thus, one
> conceivable solution might involve named putNotifys.)
> 
> Another similar case: Suppose you had executed the bo records with
> ordinary ca_puts instead of ca_put_callbacks.  You would have gotten
> exactly the same database-processing result, which is good: ca_put
> should achieve the same result as ca_put_callback.  But with ca_put,
> there's no possibility of signaling the collision as an error, because
> the execution is not traced, so when a collision occurs no code knows
> who started it.  (The nice thing about ca_put_callback is that it gives
> the caller a way to avoid a database-processing collision, as long
> as nobody else is trying to process the database at the same time.)
> 
> One possible view of this is that the problem is in the collision, not
> in either the database-processing rules or in ca_put_callback.  It's
> unfortunate that the user can rarely be expected to know whether or
> not two puts are going to the same lock set and might collide, but I
> can't think of a way out that doesn't involve a deep redesign.
> 
> > It should be possible to record such a failure and return it to
> > the requester. Even better, the CA server could detect the failure and
> > wait until the record in question is no longer busy. It already does so
> > under certain circumstances, namely if I target the same front-end
> > record from the same client: in this case the second call waits for the
> > first to finish, i.e. the CA server queues the second request.
> > (Unfortunately I cannot as easily demonstrate this because caput does
> > not allow grouping several asynchronous put operations in one
> > command.)
> > 
> > If I target a different record (even from the same client), this queuing
> > does not happen. This is inconsistent. I think that the queuing should
> > ideally always happen for ca_put_callback as long as any record in the
> > chain is busy. Failing that, or if the wait times out, at least an
> > error should be sent back.
> > 
> > Considering the effort that went into the implementation of the
> > putNotify feature it seems a shame that it cannot be used for what it
> > was designed, at least not reliably.
> > 
> > Since I don't expect this will be fixed any time soon, I will warn users
> > of the sequencer that using pvPut(var,SYNC) is inherently unreliable.
> > Instead completion should be detected by other means, such as reading
> > back a status bit from the hardware.
> 
> We tried that.  For years I tried different ways to make that work.  It
> was awful.  put_callback made a *huge* improvement in the speed,
> robustness, and flexibility of scans.  Importantly, it allows clients
> to scan a PV without knowing which other PV to read to detect
> completion.
> 
> I agree that put_callback has problems with collisions, and I don't mean
> to minimize the problems.  But I think it's so much better than the
> achievable alternatives that I would not suggest SNL developers avoid it.
> 
> > Cheers
> > Ben

Helmholtz-Zentrum Berlin für Materialien und Energie GmbH   
Hahn-Meitner-Platz 1, 14109 Berlin   
Vorsitzende des Aufsichtsrates: Prof. Dr. Dr. h.c. mult. Joachim Treusch   
Stellvertretende Vorsitzende: Dr. Beatrix Vierkorn-Rudolph Geschäftsführer: Prof. Dr. Anke Rita Kaysser-Pyzalla, Prof. Dr. Dr. h.c. Wolfgang Eberhardt, Dr. Ulrich Breuer   
Sitz der Gesellschaft: Berlin Handelsregister: AG Charlottenburg, 89 HRB 5583   

Disclaimer automatically attached by the E-Mail Security Appliance   
mail0.bessy.de 11/24/10 at Helmholtz-Zentrum Berlin GmbH. 
