EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: Status Report
From: "Jeff Hill" <[email protected]>
To: "'Matej Sekoranja'" <[email protected]>
Cc: "'Thomas Pelaia II'" <[email protected]>, "'Hiroyuki Sako'" <[email protected]>, "'John Galambos'" <[email protected]>, "'Christopher K. Allen'" <[email protected]>, "'Ernest Williams'" <[email protected]>, "'Kay Kasemir'" <[email protected]>, "'NP Rees'" <[email protected]>, "'EPICS core-talk'" <[email protected]>
Date: Fri, 22 Dec 2006 15:32:17 -0700

Hi Matej,

 

> While testing I've also found a (dead)lock in CA library.

> Simple lines as:

> for (p = 0; p < 1000; p++) {
>        ca_create_channel (pvs[n].name,
>                           pCB,
>                           &pvs[n],
>                           0,
>                           &pvs[n].chid);
>        /*epicsThreadSleep(0.1);*/
>        }

> can produce a (dead)lock.

> Lock happens randomly and can not be determined by a number of issued requests.

> If I uncomment thread sleep, the lock disappears.

> I can provide a stack trace.

 

Tests of this type are a normal part of “catime” (the CA performance test) and “acctst” (the CA regression test), so I am not sure what is different about your situation. Do you see this problem only when access security files are not installed? At the moment I am testing with the example server; I will also need to run the tests against an ordinary IOC.

For example, the following code is in catime:

 

/*
 * test_search ()
 */
LOCAL void test_search (
    ti          *pItems,
    unsigned    iterations,
    unsigned    *pInlineIter
)
{
    unsigned i;
    int status;

    for ( i = 0u; i < iterations; i++ ) {
        status = ca_search ( pItems[i].name, &pItems[i].chix );
        SEVCHK ( status, NULL );
    }
    status = ca_pend_io ( 0.0 );
    SEVCHK ( status, NULL );

    *pInlineIter = 1;
}

 

 

 


From: Matej Sekoranja [mailto:[email protected]]
Sent: Tuesday, December 19, 2006 12:35 PM
To: Jeff Hill
Cc: Thomas Pelaia II; Hiroyuki Sako; John Galambos; Christopher K. Allen; Ernest Williams; Kay Kasemir; NP Rees
Subject: Re: Status Report

 

Hi,

 

I've been investigating this issue about CAJ deadlocking an IOC. I've done lots of testing on the latest CAJ and an IOC but I've always failed to deadlock the IOC.

Based on the stack traces from the one IOC deadlock (seen once and never reproduced since) and Mantis bug 268 mentioned by Jeff, I concentrated on the access rights callback.

 

I've created a simple record counter:

 

record(calc, record2) {
  field(VAL, "1")
  field(SCAN, ".1 second")
  field(CALC, "A+1")
  field(INPA, "record2")
}

 

... and changed the access rights policy (so that the IOC will generate ACL callbacks to the client).

 

ASG(DEFAULT) {
   INPA(record2)
   RULE(0, READ)
   RULE(0, WRITE) {
     CALC("A%2")
   }
}
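
To see why this setup should produce a steady stream of access rights callbacks: record2 increments every 0.1 second, and the WRITE rule applies only while A%2 evaluates to 1, so write access flips on each scan. A minimal C sketch of that arithmetic follows. The names write_allowed and count_toggles are hypothetical helpers for illustration, not part of EPICS, and the sketch assumes every record update reaches the access security layer.

```c
#include <stdio.h>

/* Hypothetical helper: mirrors CALC("A%2") from the ASG above.
 * Write access is granted only while the expression evaluates to 1. */
static int write_allowed(long a)
{
    return (a % 2) == 1;
}

/* Hypothetical helper: count how often write access changes while the
 * record processes `ticks` times, starting from value `start`.
 * Each tick applies the record's CALC "A+1". */
static unsigned count_toggles(long start, unsigned ticks)
{
    unsigned toggles = 0;
    long a = start;
    int prev = write_allowed(a);
    unsigned i;

    for (i = 0; i < ticks; i++) {
        int now;
        a = a + 1;                 /* record2: CALC "A+1" */
        now = write_allowed(a);    /* ASG: CALC "A%2"      */
        if (now != prev) {
            toggles++;             /* one access rights change */
        }
        prev = now;
    }
    return toggles;
}
```

Starting from VAL = 1, ten scans (one second at ".1 second") give ten changes of write access, each of which is reported to clients holding a channel in this ASG.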

 

Then I wrote a C client with a preemptive callback context that blocks (sleeps) in a connection callback.

The client blocked at the first connection callback and the server reported 18 created channels.

Looking at the "casr 10" output I noticed that the server's send queue size for this particular blocking client increases up to its limit of 16k.

When this limit is reached the IOC is unable to create any new channel for other clients (e.g. using the caget utility).

So any badly written client (C or Java) can block (its TCP buffers become full), but this should not affect the server's other clients.

This was tested with 3.14.8.2.

Of course I can’t know why or when any particular client application, or client library implementation, might not be reading its input queue. There can be many reasons. We can conclude that the server should not have such vulnerabilities, and so I have committed a patch (which redirects access security callbacks through the event thread).

I tested this with 3.14.9-pre2 and the IOC does not block anymore. Jeff, your patch seems to work.
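
The general idea behind such a fix can be shown with a toy model. This is only a sketch under stated assumptions and not the actual patch or the real server data structures: client_t, try_send and post_access_rights are hypothetical names, the 16-byte message size is an illustrative value, and only the 16k limit comes from the "casr 10" observation above. The point is that a notification for a client whose send queue is full is parked on that client's own pending list, so the caller returns at once and can keep serving other clients.

```c
#include <stdbool.h>
#include <stddef.h>

/* The 16k send queue limit observed with "casr 10". */
#define SEND_QUEUE_LIMIT ((size_t)16 * 1024)

/* Hypothetical per-client state; not the real server structures. */
typedef struct {
    size_t   queued;          /* bytes waiting in this client's send queue */
    unsigned pending_events;  /* notifications deferred for later delivery */
} client_t;

/* Append nbytes to the send queue if it fits; never blocks. */
static bool try_send(client_t *c, size_t nbytes)
{
    if (c->queued + nbytes > SEND_QUEUE_LIMIT) {
        return false;         /* queue full: client is not reading */
    }
    c->queued += nbytes;
    return true;
}

/* Post an access rights change. If the client's queue is full, the
 * notification is deferred on that client only, so the caller is
 * never stalled by one misbehaving client. */
static void post_access_rights(client_t *c, size_t nbytes)
{
    if (!try_send(c, nbytes)) {
        c->pending_events++;
    }
}
```

In this model a client that never drains its queue accepts 1024 messages of 16 bytes and defers the rest, while a second, healthy client can still be sent data immediately.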

 

I guess the deadlocking issue was present when CAJ did not implement flow control (first versions). Now CAJ has it implemented (along with the search algorithm) and I guess the IOC deadlock cannot happen anymore.

 

While testing I've also found a (dead)lock in CA library.

Simple lines as:

 

for (p = 0; p < 1000; p++) {
        ca_create_channel (pvs[n].name,
                           pCB,
                           &pvs[n],
                           0,
                           &pvs[n].chid);
        /*epicsThreadSleep(0.1);*/
        }

 

can produce a (dead)lock.

Lock happens randomly and can not be determined by a number of issued requests.

If I uncomment thread sleep, the lock disappears.

I can provide a stack trace.

 

Cheers,

Matej

 

 

 

 

