EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Buffer problems
From: Tim Mooney <[email protected]>
To: Mark Rivers <[email protected]>
Cc: Andrew Johnson <[email protected]>, [email protected]
Date: Thu, 25 Mar 2004 11:34:25 -0600
Mark Rivers wrote:
Andrew,


I'm not quite sure what causes ENOBUFS errors when sending CA beacons; I have seen similar errors on IOCs here that have never run out of DataPool or SysPool buffers, and I don't yet know what they're complaining about (any ideas Jeff?). They do appear to be benign though, if my memory is correct - the loss of a beacon can cause some unnecessary CA searches to occur, but nothing should break.


Things are not benign for us, though the CA error messages may just be
symptomatic of deeper problems.  We have lost NFS and CA connectivity,
and we have to reboot.  It runs for a while, and then it happens again.
I suspect there is a machine on the network there that is infected with
some virus, and periodically blasting each machine on the network.

Disconnecting the beamline switch from the network has fixed the
problem, but of course that means I can't diagnose it from here at
Argonne.

One of our crates is seeing a similar problem, and it is not benign for them either. The processor is an mv5100 running Tornado 2.02, using Andrew's mv5100-asd6_nodns BSP with boot flags = 0x1000 (which increases the number of network buffers), under EPICS 3.13.9.

This crate crashes roughly once a week, with error messages like

0x1e465488 (save_restore): save_file: Backup file (autosave/auto_settings.savB) is bad.  Writing a new one.
0x1e465488 (save_restore): save_file - unable to open file autosave/auto_settings.savB
0x1e465488 (save_restore): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
0x1e465488 (save_restore): save_restore:save_file: Can't write new backup file.  I quit.
0x1e465488 (save_restore): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
sendto failed: S_errno_ENOBUFS
0x1e64ae80 (CA_online): ../online_notify.c: CA beacon error was "S_errno_ENOBUFS"
0x1e64ae80 (CA_online): ../online_notify.c: CA beacon error was "S_errno_ENOBUFS"
CAC: error = "S_errno_ENOBUFS" sending UDP msg to 164.54.115.255:5064
CAC: error = "S_errno_ENOBUFS" sending UDP msg to 164.54.255.255:5064

After the most recent crash, the user ran the command
netStackEndPoolShow "fei" (a command Andrew added to the -asd6 and higher
BSPs, which prints the status of the network interface memory buffer pool).
The command yielded

type        number
---------   ------
FREE    :    256
DATA    :      0
HEADER  :      0
SOCKET  :      0
PCB     :      0
RTABLE  :      0
HTABLE  :      0
ATABLE  :      0
SONAME  :      0
ZOMBIE  :      0
SOOPTS  :      0
FTABLE  :      0
RIGHTS  :      0
IFADDR  :      0
CONTROL :      0
OOBDATA :      0
IPMOPTS :      0
IPMADDR :      0
IFMADDR :      0
MRTABLE :      0
TOTAL   :    256
number of mbufs: 256
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size     clusters  free      usage
-------------------------------------------------------------------------------
1540      320       256       6802377
sendto failed: S_errno_ENOBUFS
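
A dump like this can be checked mechanically by parsing the type table and computing how many mbufs are off the free list. What follows is only a minimal Python sketch, not part of the BSP or of EPICS: the function names parse_mbuf_table and mbufs_in_use are hypothetical, and it assumes the "TYPE : number" layout shown above, with the dump already captured as text on a host machine.

```python
import re

# Matches rows such as "FREE    :    256" (upper-case type name, colon, count).
ROW = re.compile(r"^\s*([A-Z]+)\s*:\s*(\d+)\s*$")

def parse_mbuf_table(text):
    """Return {type: count} for every 'TYPE : number' row in the dump text."""
    counts = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts

def mbufs_in_use(counts):
    """Number of mbufs not on the free list (TOTAL minus FREE)."""
    return counts["TOTAL"] - counts["FREE"]

# A few rows copied from the dump above; other rows are parsed the same way.
sample = """\
FREE    :    256
DATA    :      0
HEADER  :      0
TOTAL   :    256
number of mbufs: 256
"""

counts = parse_mbuf_table(sample)
print(mbufs_in_use(counts))
```

For the numbers in this dump the result is 0, i.e. every mbuf was on the free list when the command ran.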

So, no obvious reason for the problem, but the crash takes everything
down, sometimes corrupting save-restore files, and requires at least
a reboot to recover from.  I'm not sure if the save-restore errors
shown above always precede the S_errno_ENOBUFS condition, and if they
do, I don't know whether they're a cause or just the first symptom.
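
One way to settle the ordering question would be to capture the console output to a file on each crash and check which message appears first. The Python sketch below is an assumption-laden illustration, not an existing tool: the function names and the regular expressions are hypothetical, it assumes each console message sits on one line of the capture, and it only reports print order, which by itself says nothing about cause.

```python
import re

# Hypothetical patterns based on the messages quoted above.
SAVE_ERR = re.compile(r"\(save_restore\):.*(unable to open file|Can't write|is bad)")
ENOBUFS = re.compile(r"S_errno_ENOBUFS")

def first_indices(lines):
    """Return (index of first save_restore error, index of first ENOBUFS line).

    Either value is None if no such line is present.
    """
    first_save = first_enobufs = None
    for i, line in enumerate(lines):
        if first_save is None and SAVE_ERR.search(line):
            first_save = i
        if first_enobufs is None and ENOBUFS.search(line):
            first_enobufs = i
    return first_save, first_enobufs

def save_errors_come_first(lines):
    """True/False for the print order, or None if either kind of message is missing."""
    s, e = first_indices(lines)
    if s is None or e is None:
        return None
    return s < e

# Three lines taken from the crash output quoted above.
log = [
    "0x1e465488 (save_restore): save_file - unable to open file",
    "sendto failed: S_errno_ENOBUFS",
    '0x1e64ae80 (CA_online): ../online_notify.c: CA beacon error was "S_errno_ENOBUFS"',
]
print(save_errors_come_first(log))
```

Run over the captures from several crashes, a consistent answer would at least show whether the save-restore failure is always the first visible symptom.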

At the moment, the only thing I know how to do for these guys is to
get them on the new version of save-restore, so that at least their
settings file won't get trashed, and so that they can get running
again with just a reboot.

--
Tim Mooney ([email protected]) (630)252-5417
Beamline Controls & Data Acquisition Group
Advanced Photon Source, Argonne National Lab


