EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: workQPanic! Oh no!
From: Tim Mooney <[email protected]>
To: [email protected], [email protected]
Date: Wed, 1 Sep 1999 14:13:55 -0500 (CDT)
re...,

> I know that this is probably a VxWorks question, but what exactly does
> 
>   workQPanic: Kernel work queue overflow.
> 
> mean? We note a perfect correlation with IOC crashes ;). Running
> R3.13.0beta11 and VxWorks 5.2 on a MVME162-532A, 16 MB, not obviously near
> the memory limit, not totally CPU limited either. Running lots of
> sequences, for what that's worth.
> 
>   -- Mark
> 
> --
> Mark M. Ito, Thomas Jefferson National Accelerator Facility

Too many interrupts.  The perfect correlation with IOC crashes is not
coincidental.

John Winans got off on a rant on this subject several years ago.  Here's the
part of his reply that was on point, or at least in the vicinity of the point:

>>  ...The only workQPanic stuff I am
>> aware of is based on the fact that VW uses a ring buffer to collect a
>> list of junk to do 'next'.  It basically contains a list of semaphores
>> that need to be 'given' now.  The reason they have this set up this way
>> is because they are idiots and don't know enough to write code that
>> scales properly.  You see, this ring buffer is a fixed size (ingenious
>> eh?) and it can fill up if it decides to defer the giving of these
>> semaphores.
>> 
>> Well, the only time these semaphore 'give' operations are deferred is
>> when a semGive() is in an interrupt handler.  Thus this message means
>> that there were too many semGive()s that happened in a narrow time in
>> one or more interrupt handlers.  Where a "narrow time" is one where not
>> enough time has elapsed to let the cpu run the code to complete all the
>> semGive()s.
>> 
>> Personally, I think the boyz at VW are idiots because it is not very
>> hard to replace the code that currently looks like this:
>> 
>> if (ringAroundTheCollarBufferIsFull)
>> 	panic();
>> else
>> 	addMoreCrapTo(ringAroundTheCollarBuffer, crap);
>> 
>> with code that is more sensible and truly useful in the real world that
>> might look more like this:
>> 
>> if (ringAroundTheCollarBufferIsFull)
>> {
>> 	flushTheDamnThing(ringAroundTheCollarBuffer);
>> 	consoleWrite("WARNING: ring buffer overflow\n");
>> }
>> addMoreCrapTo(ringAroundTheCollarBuffer, crap);
>> 
>> Yah, it is a bad thing to do list-processing and large amounts of work
>> at an interrupt level, but it sure beats asserting a panic()!!!!
>> 
>> 
>> Now, the way I deal with this misfeature is to figure out what IRQs might
>> be generated at the time that it dies... it is usually fairly easy
>> because you can normally narrow things down to some single device that
>> is being used or saturated at the panic moment... that driver is the
>> one that is generating too many IRQs.
>> 
>> The most common occurrence of this that I have seen is caused by using
>> raw (non-debounced) binary inputs on a card whose driver is configured
>> to generate interrupts on input transitions.  It is pretty much
>> guaranteed to panic the beast.  The second easy way to do it would be to
>> enable IRQ processing of global events on Frank Lenksus's global event
>> boards... and then generate a boat-load of events with the event code
>> numbers of those enabled IRQs... I wrote the code... I know it will
>> fail if not configured properly.  I have caused it to do so while bench
>> testing.
>> 
>> Groovus?
>> 
>> 
>> --John Winans
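
To make the failure mode John describes concrete, here is a minimal
sketch in portable C. It is not VxWorks source: the queue size, the
struct, and the names simWorkQAdd, simWorkQDrain and simulate are all
hypothetical, made up for illustration. It only models the idea that
interrupt-level code appends deferred semGive() requests to a fixed-size
ring, and that the ring overflows when a burst arrives faster than the
CPU can drain it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_WORK_Q_SIZE 8  /* hypothetical capacity, not the real kernel value */

typedef struct {
    int    jobs[SIM_WORK_Q_SIZE]; /* stand-ins for deferred semGive() requests */
    size_t head;                  /* index of the oldest queued entry */
    size_t count;                 /* number of entries currently queued */
} SimWorkQ;

/* "Interrupt level": queue one deferred give.
 * Returns false on overflow, the condition that would trigger the panic. */
static bool simWorkQAdd(SimWorkQ *q, int semId)
{
    assert(q != NULL);
    if (q->count == SIM_WORK_Q_SIZE)
        return false;
    q->jobs[(q->head + q->count) % SIM_WORK_Q_SIZE] = semId;
    q->count++;
    return true;
}

/* "Task level": complete up to n deferred gives. Returns how many were done. */
static size_t simWorkQDrain(SimWorkQ *q, size_t n)
{
    size_t done = 0;
    assert(q != NULL);
    while (done < n && q->count > 0) {
        q->head = (q->head + 1) % SIM_WORK_Q_SIZE;
        q->count--;
        done++;
    }
    return done;
}

/* Fire `burst` interrupts back to back, then drain up to `drainPerRound`
 * entries, and repeat for `rounds` rounds.
 * Returns true if the queue overflowed at any point. */
static bool simulate(size_t burst, size_t drainPerRound, size_t rounds)
{
    SimWorkQ q = {0};
    for (size_t r = 0; r < rounds; r++) {
        for (size_t i = 0; i < burst; i++) {
            if (!simWorkQAdd(&q, (int)i))
                return true;
        }
        simWorkQDrain(&q, drainPerRound);
    }
    return false;
}
```

Calling simulate(4, 4, 100) never overflows because every burst is fully
drained, while simulate(5, 4, 10) does: one entry is left over per round,
so the backlog creeps up until the ring is full. That is the "narrow
time" John refers to, just stretched out over several bursts.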

Tim Mooney ([email protected]) (630)252-5417
Beamline Controls & Data Acquisition Group
Advanced Photon Source, Argonne National Lab



ANJ, 10 Aug 2010