Ric Claus wrote:
>
> We (SLAC's PEP-II RF group) are having a problem with hanging/crashing
> IOCs and wonder if anyone has some suggestions. We're running EPICS
> R3.13.0.beta4 with dm version 2.3. The processor is a National
> Instruments VXIcpu-030 with 8 MB of RAM. When everything is up and
> running normally we see about 2 MB of free memory. Task stacks are not
> close to the edge. There is no VXI interrupt activity, although perhaps
> some are being generated by the Allen Bradley scanner module.
Early versions of the NI CPU030 had serious hardware flaws. These
problems were fixed over several board swap iterations here.
We eventually ended up with a version that had no hard faults;
however, the system would not run under heavy Ethernet load
without eventually crashing. NI was blaming WRS and vice versa.
We lived with the problem for a year or so. They eventually gave us
another hardware/software upgrade and our reliability problems
went away.
LBL had a similar experience except that they started using
the board after the initial bugs were worked out. They did
experience the reliability problems but didn't wait as long for
them to be fixed because their project started up not long
before NI discovered the cause of the reliability problem.
As I recall BOEING also saw an improvement after the upgrade.
Apparently the reliability problem was fixed by WRS SPR 4143,
which is intended to fix a system crash resulting from a locked-up
LANCE Ethernet chip. As I recall a fix for this SPR was
included with REV D6 of the board, which most likely also
included other hardware/BSP changes. I am not sure about this
rev level, so you should check with NI. We are checking the
rev levels of our boards but have not so far located where NI
hides the rev sticker.
> When an IOC crashes or hangs, there is generally nothing to see on the
> console port record. Once, however, I saw tNetTask trying to say
> something, but it never made it out.
>
> One possibility I wonder about is that CA clients cause the creation
> of "CA client" and "CA event" tasks that consume a lot (800 Mb) of
> memory.
It's true that memory is allocated for each client. Memory is also
allocated for each channel that is created and for each monitor
that is created. If you are seeing 800 Mb then the client that
is attaching must be setting up many CA monitors.
> Perhaps these crashes are due to one too many dm sessions
> being fired up in people's offices.
I routinely run the IOC out of memory when I am testing channel access.
I have not seen any problems of this sort. If there were enough clients
to get the CA server to its 100k bytes free limit then you would see
messages on the console.
It should be easy to test this by creating a dm screen with many
channels and running it many times.
> Is there a way to limit the
> number of CA clients?
Since, depending on the number of channels/monitors created, different
clients use different amounts of memory it is not that useful to limit
only the number of clients.
Should we consider adding a configurable hard limit on the
total number each of {clients,channels,monitors} in the future?
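To make the question above concrete, here is a minimal sketch of what
such per-category hard limits might look like. It is written in Python
purely for illustration. Nothing here exists in the CA server; the names
ServerLimits, ResourceCounter, try_add and release are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ServerLimits:
    """Hypothetical hard caps; no such settings exist in the CA server."""
    max_clients: int
    max_channels: int
    max_monitors: int


@dataclass
class ResourceCounter:
    """Counts live clients, channels and monitors against the caps."""
    limits: ServerLimits
    counts: dict = field(
        default_factory=lambda: {"clients": 0, "channels": 0, "monitors": 0}
    )

    def try_add(self, kind: str) -> bool:
        """Admit one more resource of `kind` unless its cap is reached."""
        cap = getattr(self.limits, "max_" + kind)
        if self.counts[kind] >= cap:
            return False
        self.counts[kind] += 1
        return True

    def release(self, kind: str) -> None:
        """Give back one resource of `kind` (e.g. on client disconnect)."""
        if self.counts[kind] > 0:
            self.counts[kind] -= 1
```

A server would call try_add("clients") on connect, try_add("channels")
on channel creation and so on, refusing the request when it returns
False. Counting each category separately reflects the point that clients
differ widely in how many channels and monitors they create.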
CA will not allow clients to connect if there is less than 100k free.
The WRS routine that returns the number of free bytes was (last time
I checked) written very inefficiently (it was summing up all of the
blocks on the free list). Therefore, the CA server checks the number
of free bytes periodically (every EPICS_CA_BEACON_PERIOD), and not every
time that it calls malloc().
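The idea of polling an expensive free-memory query and gating new
connections on the cached result can be sketched as follows. This is an
illustrative Python model, not the server's code: FreeMemoryGate,
query_free_bytes and poll_period_s are hypothetical names, the threshold
is assumed to be 100 * 1024 bytes, and the refresh happens lazily on
demand, whereas the real server presumably polls from a periodic task.

```python
import time

# The "100k free" threshold; the exact byte value is an assumption.
MIN_FREE_BYTES = 100 * 1024


class FreeMemoryGate:
    """Caches a costly free-memory query and refreshes it periodically."""

    def __init__(self, query_free_bytes, poll_period_s, clock=time.monotonic):
        self._query = query_free_bytes    # stands in for the slow OS routine
        self._period = poll_period_s      # stands in for the beacon period
        self._clock = clock
        self._cached = query_free_bytes()
        self._last_poll = clock()

    def _refresh_if_due(self):
        # Only walk the free list again once a full period has elapsed.
        now = self._clock()
        if now - self._last_poll >= self._period:
            self._cached = self._query()
            self._last_poll = now

    def may_accept_client(self):
        """True if the last polled free-memory figure is above the floor."""
        self._refresh_if_due()
        return self._cached >= MIN_FREE_BYTES
```

The cost of this design is staleness: between polls the gate can still
admit a client even though memory has already dropped below the
threshold, so the limit is enforced only to within one polling period.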
The CA gateway could also be used to concentrate several clients
into one effective client (as seen by the IOC).
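The concentrating effect can be sketched as a fan-in/fan-out: many
downstream clients share one upstream subscription per channel. This is
an illustrative Python model only, not the CA gateway's implementation;
GatewaySketch, upstream_subscribe and upstream_unsubscribe are
hypothetical names.

```python
from collections import defaultdict


class GatewaySketch:
    """Shares one upstream subscription per channel among many clients."""

    def __init__(self, upstream_subscribe, upstream_unsubscribe):
        self._sub = upstream_subscribe      # called once per channel
        self._unsub = upstream_unsubscribe  # called when the last client leaves
        self._listeners = defaultdict(list)

    def add_client(self, channel, callback):
        # The first client for a channel triggers the single upstream monitor.
        if not self._listeners[channel]:
            self._sub(channel, self._make_dispatch(channel))
        self._listeners[channel].append(callback)

    def remove_client(self, channel, callback):
        self._listeners[channel].remove(callback)
        if not self._listeners[channel]:
            del self._listeners[channel]
            self._unsub(channel)

    def _make_dispatch(self, channel):
        def dispatch(value):
            # Fan each upstream update out to every downstream client.
            for cb in list(self._listeners.get(channel, ())):
                cb(value)
        return dispatch
```

However many offices open the same screen, the IOC in this model sees
one client and one monitor per channel, so its per-client and
per-monitor memory cost no longer grows with the number of viewers.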
> What do these tasks do with that much memory?
I suspect that most of the memory in your case is being used for a
monitor queue. The new CA server (which is _not_ the IOC server in
3.13) is better about memory consumption for the monitor queue.
> When IOC resources
> run out, does CA currently stop allowing new client connections or
> does it let the IOC die?
It stops allowing new clients when the IOC's free memory drops below
100k (as polled every EPICS_CA_BEACON_PERIOD).
Jeff
--
______________________________________________________________________
Jeffrey O. Hill Internet [email protected]
LANL MS H820 Voice 505 665 1831
Los Alamos, NM 87545 USA FAX 505 665 5107
- References:
- hanging IOCs Ric Claus