EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Fwd: Sequencer bug?
From: Benjamin Franksen <[email protected]>
To: [email protected]
Date: Thu, 25 Feb 2010 14:43:51 +0100
On Wednesday 24 February 2010, Andrew Johnson wrote:
> On Wednesday 24 February 2010 14:43:09 Eric Norum wrote:
> > So, what is the bug?
> > I can certainly reproduce it.   I thought that adding 'option +r' at
> > the beginning would fix it, but no such luck.
> >
> > >> Since there is only one place where x gets set to one (in the
> > >> master), and one place where it gets reset to zero (in the slave),
> > >> the assertions in the calls to check() should never fail.
> 
> The above statement is false; x can also be set by the sequencer itself
>  since it is named in a monitor statement.  

Yes.

>  I haven't played with this,
>  but I suspect the issue is that the state sets don't wait for the CA
>  round-trip, and they can get ahead of the server replies.

Not exactly. We checked experimentally that state sets do indeed wait for 
the CA round-trip. Nevertheless, it is true that "the state sets ... 
can get ahead of the server replies". The reason is that the monitor event 
generated by a pvPut (and the subsequent change in the PV) arrives at /both/ 
state sets, so both wake up and try to check their when-conditions.

Now, if the state set that issued the pvPut in the first place gets woken 
up but is interrupted immediately, before it has a chance to check its 
condition, then the other state set may continue /and overwrite the value 
of the state variable/. When the original state set finally gets the 
processor and checks its condition, it sees the overwritten value instead 
of the one the event delivered, and things get out of control. It's a 
classic race condition.


For the interested, we (that is, Götz Pfeiffer and I) have analyzed a 
concrete trace and reconstructed the execution in detail. The following is a 
slightly modified version of the program, annotated in the comments with an 
ad hoc notation that describes the flow of execution:

program SeqTest

%{
#include <assert.h>

static int check_zero(int x) {
    assert(x==0);
    return 1;
}

static int check_one(int x) {
    assert(x==1);
    return 1;
}
}%

int x;
assign x to "x";
monitor x;

ss Master {
    state idle {
        when (check_zero(x)) {                          /* 3, 24, 34 */
            printf("Master: after check (x=%d)\n",x);   /* 4, 25 */
            x = 1;                                      /* 5, 26 */
            printf("Master: before put (x=%d)\n",x);    /* 6, 27 */
            pvPut(x);                                   /* 7->ev1[1]
                                                           28->ev3[1] */
            printf("Master: after put (x=%d)\n",x);     /* 8, 29 */
        } state work
    }
    state work {                                        /* 9:S
                                                           10<-ev1[1]
                                                           30<-ev2[0] */
        when (x==0) {                                   /* 21, 31 */
            printf("Master: after when (x=%d)\n",x);    /* 1, 22, 32 */
        } state idle                                    /* 2, 23
                                                           33<-ev3[1] */
    }
}

ss Slave {
    state idle {                                        /* 10<-ev1[1]
                                                           20:M
                                                           30<-ev2[0]
                                                           33<-ev3[1] */
        when (x==1) {                                   /* 11 */
            printf("Slave: after when (x=%d)\n",x);     /* 12 */
        } state work
    }
    state work {                                        /* 13 */
        when (check_one(x)) {                           /* 14 */
            printf("Slave: after check (x=%d)\n",x);    /* 15 */
            x = 0;                                      /* 16 */
            printf("Slave: before put (x=%d)\n",x);     /* 17 */
            pvPut(x);                                   /* 18->ev2[0] */
            printf("Slave: after put (x=%d)\n",x);      /* 19 */
        } state idle
    }
}

The annotations (in comments) are to be understood as follows:

timestamp / program counter = integer (1, 2, ...)
event                       = ev<number>[<value>] (e.g. ev1[1])
event generation            = "->" event
event consumption           = "<-" event
task switch                 = ":S" | ":M" (switch to slave or master, respectively)

These are the last 14 lines of the output (annotated with "program counters" 
and event generation):

1   Master: after when (x=0)
4   Master: after check (x=0)
6   Master: before put (x=1)
7       (pvPut generates ev1[1])
8   Master: after put (x=1)
12  Slave: after when (x=1)
15  Slave: after check (x=1)
17  Slave: before put (x=0)
18      (pvPut generates ev2[0])
19  Slave: after put (x=0)
22  Master: after when (x=0)
25  Master: after check (x=0)
27  Master: before put (x=1)
28      (pvPut generates ev3[1])
29  Master: after put (x=1)
32  Master: after when (x=0)
34  check_zero: Assertion `x==0' failed.


The interesting stuff happens after point 9: control switches to the slave, 
and afterwards ev1[1] arrives, causing both state sets to unblock. But only 
the slave actually proceeds, overwriting x with 0 and sending ev2[0]. Thus, 
when the master finally gets back control at point 20, it sees the 
"wrong" (overwritten) value for x and proceeds to send yet another event, 
ev3[1] (at point 28), while ev2[0] has not yet been consumed. At point 30, 
ev2[0] finally arrives and gets consumed by both state sets. The master 
keeps going and enters state idle; at the same time ev3[1] arrives and sets 
x to 1, so the assertion that x is zero fails (point 34).
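
The interleaving described above can also be replayed without the sequencer. The following is only a minimal single-threaded sketch in plain C, under the assumption that x is one variable shared by both state sets and overwritten whenever a monitor event is delivered. The names replay_trace, pv_put and deliver_event are made up for this illustration and are not sequencer functions; the numbers in the comments are the program counters from the annotated listing.

```c
#include <assert.h>

/* One copy of x, shared by both state sets and by event delivery.
   This is a simplified model, not the sequencer's actual internals. */
static int x;

/* FIFO of monitor events generated by a put but not yet delivered. */
static int queue[8];
static int head, tail;

static void pv_put(int value)   { queue[tail++] = value; }  /* "->ev[value]" */
static void deliver_event(void) { x = queue[head++]; }      /* "<-ev[value]" */

/* Replays points 3..34 of the trace in the order given there.
   Returns the value of x that the final check_zero() would see. */
int replay_trace(void)
{
    x = 0;
    head = tail = 0;

    /* Master, state idle (3..8). */
    assert(x == 0);     /* 3: check_zero holds */
    x = 1;              /* 5 */
    pv_put(x);          /* 7: generates ev1[1] */

    /* 9:S, 10: ev1[1] arrives, but only the slave gets to run. */
    deliver_event();
    assert(x == 1);     /* 11: Slave idle, when (x==1) */
    assert(x == 1);     /* 14: check_one holds */
    x = 0;              /* 16: local overwrite */
    pv_put(x);          /* 18: generates ev2[0], still pending */

    /* 20:M: the master resumes and sees the slave's local write. */
    assert(x == 0);     /* 21: Master work, when (x==0) fires early */
    assert(x == 0);     /* 24: check_zero holds */
    x = 1;              /* 26 */
    pv_put(x);          /* 28: generates ev3[1]; ev2[0] not yet consumed */

    deliver_event();    /* 30: ev2[0] arrives, x becomes 0 */
    assert(x == 0);     /* 31: Master work, when (x==0) fires again */

    deliver_event();    /* 33: ev3[1] arrives, x becomes 1 */
    return x;           /* 34: check_zero would now see this value */
}
```

All intermediate assertions hold, and replay_trace() returns 1: the master is in state idle with x equal to 1, which is exactly the situation in which check_zero fails in the real output.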

It remains to draw practical conclusions on how to avoid such race 
conditions in SNL code. Should we generally be wary of using pvPut and 
monitor on the same state variable? Could the compiler warn us about that? 
Or is this too restrictive?

Cheers
Ben

