EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System

1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  <20102011  2012  2013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024  Index 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  <20102011  2012  2013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024 
<== Date ==> <== Thread ==>

Subject: RE: ca_put_callback once again
From: "Jeff Hill" <[email protected]>
To: "'Andrew Johnson'" <[email protected]>, <[email protected]>
Cc: "'Ben Franksen'" <[email protected]>
Date: Tue, 23 Nov 2010 09:04:09 -0700
> > #define ECA_PUTCBINPROG     DEFMSG(CA_K_ERROR,     45)
> >
> > that I assumed is there for this case. Not so.
> 
> There is code in src/rsrv that generates this error under some
> circumstances, but I don't think I need to try and describe what 
> causes it.

If a 2nd put notify, from the same client for the same channel, waits for the 1st put notify to finish for more than 60 seconds then the 1st put notify is canceled and ECA_PUTCBINPROG (which has the associated message "Put callback timed out") is returned to the client's callback.

Jeff
______________________________________________________
Jeffrey O. Hill           Email        [email protected]
LANL MS H820              Voice        505 665 1831
Los Alamos NM 87545 USA   FAX          505 665 5107

Message content: TSPA

With sufficient thrust, pigs fly just fine. However, this is
not necessarily a good idea. It is hard to be sure where they
are going to land, and it could be dangerous sitting under them
as they fly overhead. -- RFC 1925


> -----Original Message-----
> From: [email protected] [mailto:tech-talk-
> [email protected]] On Behalf Of Andrew Johnson
> Sent: Monday, November 22, 2010 3:24 PM
> To: [email protected]
> Cc: Ben Franksen
> Subject: Re: ca_put_callback once again
> 
> Hi Ben,
> 
> On Sunday 21 November 2010 18:12:32 Ben Franksen wrote:
> >
> > I have made more experiments with ca_put_callback and discovered what
> I
> > think is a serious bug.
> 
> There is a bug, but it's not where you think it is.  It's actually in
> the bo
> record's implementation of the HIGH delay, and fixing it may cause
> screams
> from some database maintainers.  I'll get back to this topic at the end
> of
> this message.
> 
> The code that implements putNotify is complicated, but it does still
> pass
> Marty Kraimer's regression tests and I believe it is still behaving as
> it was
> originally supposed to.
> 
> > If the two calls target different records, but the processing chains
> (or
> > networks) overlap, then what happens is that the second call
> > *immediately* gets a *successful* callback.
> 
> Actually I don't believe that's what's happening, but there is a very
> long-
> standing non-standard behavior in the bo record that is causing what
> you're
> seeing.
> 
> The AppDevGuide section 15.4.4 says:
> 
>     6. In general a set of records may be processed as a result of a
> single
>        dbPutNotify. If a record in the set is found to be active,
> either
>        because PACT is true or because a putNotify already owns the
> record,
>        then that record is not made part of the set of records that
> must
>        complete before the putNotify request completes.
> 
> This is important as it helps explain the behavior you're seeing.
> 
> > This is not what I expected. I had assumed that the CA server would
> send
> > back an error to the second client, saying that there is already
> > another request pending. Indeed, there is an error code in caerr.h,
> > namely
> >
> > #define ECA_PUTCBINPROG     DEFMSG(CA_K_ERROR,     45)
> >
> > that I assumed is there for this case. Not so.
> 
> There is code in src/rsrv that generates this error under some
> circumstances,
> but I don't think I need to try and describe what causes it.
> 
> > The problem is that this defeats the whole purpose of the
> > ca_put_callback resp. putNotify which is to make sure that the
> > processing really has completed. If another client, *any* client
> > anywhere on the control network happens to issue a call that
> interferes
> > with mine it might be that I get a success callback even though
> nothing
> > has completed yet.
> 
> As I said above, that's not what's happening; your second
> ca_put_callback()
> _does_ complete the bo record processing, but the OUT link points to
> the seq
> record that has PACT=TRUE so as the AppDevGuide passage above says, the
> putNotify code drops the seq record from the list of records to wait
> for and
> replies almost immediately.  You could forward link the second bo to
> another
> seq record to prove that.
> 
> If your overall system design allows for overlapping putNotify
> operations
> you need to treat that as a resource conflict and protect (or
> duplicate) the
> records in the overlapping sets.
> 
> > This is a bug, IMO. If a client specifically asks for a callback to
> make
> > sure something it wishes to happen in the server has actually
> happened,
> > then the server must either wait for completion, however long it
> takes,
> > or else respond with an error.
> 
> The promise that putNotify and ca_put_callback make is that all
> processing on
> the local IOC that was directly caused by your put operation has
> finished when
> you receive the callback.  Unfortunately it is not hard for individual
> record
> types or device support to break the requirements that putNotify
> depends on.
> 
> One of those requirements is that the call to recGblFwdLink(prec) in a
> record's process() routine come after _all_ processing.  The long-
> standing
> implementation of the bo record breaks that rule though, and I believe
> that's
> partly what is causing the behavior that you're seeing, see below...
> 
> > BTW, the scenario can be easily tested, this is my test db file
> (which I
> > run with 'softIoc -d <dbfile>'):
> 
> Thanks that helps me understand your complaint.  The thing about the bo
> records is that using the HIGH delay field of the bo record does not
> trigger
> asynchronous completion; the record code calls recGblFwdLink(prec) at
> the end
> of processing the VAL=1 phase.  The act of processing the later VAL=0
> phase is
> run by a private timer callback, and will not be seen by the putNotify
> code as
> being caused by your initial ca_put_callback().
> 
> > This is the test command line and the output:
> >
> > ben@sarun[3]: .../seq/work > time caput -c -w5 pvPutAsync1 1 &; sleep
> 1;
> > time caput -c -w5 pvPutAsync2 1
> > [1] 32140
> > Old : pvPutAsync1                    Low
> > Old : pvPutAsync2                    Low
> > New : pvPutAsync2                    High
> > caput -c -w5 pvPutAsync2 1  0,00s user 0,01s system 22% cpu 0,036
> total
> > New : pvPutAsync1                    Low
> > caput -c -w5 pvPutAsync1 1  0,00s user 0,00s system 0% cpu 2,034
> total
> > [1]  + done       time caput -c -w5 pvPutAsync1 1
> 
> Your second caput is completing immediately because its bo record tries
> to
> process the OUT link but finds the seq.PACT=TRUE, so it gives up and
> tells
> putNotify that everything it started has finished.  A second later the
> HIGH
> delay kicks in and reprocesses the record, but as I said above that's
> not seen
> by the putNotify since it has already notified you.
> 
> You have several delays in your test which are interacting and may be
> making
> it hard for you to follow what's actually going on; try setting HIGH to
> 5 and
> doing a single caput -c to see what I meant about the bo reprocessing
> being
> disconnected from the putNotify.  It's also very instructive to set
> TPRO=1 on
> all your records as that tells you if the IOC managed to process a
> record
> "<task>: Process <rec>" or was if it was unable to because it was
> already busy
> "<task>: Active <rec>".
> 
> Ok, so there's a bug in the bo record; the simple fix is to only
> process the
> forward link of a bo record that has HIGH set at the end of the HIGH
> delay.  I
> don't think that would be very popular though, and there might be sites
> that
> depend on the existing ca_put_callback() to a bo record behavior, so
> even the
> slightly more complex fix that I can see isn't necessarily something we
> want
> to do.
> 
> Discuss...
> 
> HTH,
> 
> - Andrew
> 
> PS: It's thanksgiving week here in the US, I may not respond to email
> for the
> rest of this week.
> --
> If a man is offered a fact which goes against his instincts, he will
> scrutinize it closely, and unless the evidence is overwhelming, he will
> refuse to believe it.  If, on the other hand, he is offered something
> which affords a reason for acting in accordance to his instincts, he
> will accept it even on the slightest evidence.  -- Bertrand Russell




References:
ca_put_callback once again Ben Franksen
Re: ca_put_callback once again Andrew Johnson

Navigate by Date:
Prev: RE: MAXv issue Mark Rivers
Next: RE: ca_put_callback once again Jeff Hill
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  <20102011  2012  2013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024 
Navigate by Thread:
Prev: Re: ca_put_callback once again Ben Franksen
Next: MAXv issue Dirk Zimoch
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  <20102011  2012  2013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024 
ANJ, 23 Nov 2010 Valid HTML 4.01! · Home · News · About · Base · Modules · Extensions · Distributions · Download ·
· Search · EPICS V4 · IRMIS · Talk · Bugs · Documents · Links · Licensing ·