Pedro,
From your message I am inferring the following.
1) There is a failure in your vxWorks IP kernel subsystem
or some part of your network infrastructure because ping
is no longer working.
2) A CA thread is reporting that it found that one of its
network circuits wasn't working, and therefore disconnected
by closing the network socket. Probably at almost the same
time, another CA thread that was watching for incoming network
data in the select() system call reported that one of the
socket file descriptors that it was watching had been closed by
the other CA thread and was therefore invalid. It was not
possible to cause the other thread to drop out of select()
without closing the socket in this circumstance. I chose
at the time not to suppress the message because in other
circumstances these messages are useful if CA is failing or
its data structures have been corrupted. These messages do
not occur in EPICS R3.14.
> The only
> way to bring the system back to life was to install the old version of
> the software that ran on a previous version of EPICS (3.12.2).
3) Do I infer correctly that rebooting the R3.13.4 system did not
restore proper operation? I don't ask this question to imply
that rebooting should be a normal part of system operation.
I am only trying to understand more about your symptoms. If
rebooting does not help then this might be a large clue about
why you are intermittently losing network connectivity.
Was a network switch recently upgraded? Was a network switch
reset without powering off the systems that use it? We have
seen problems with certain network gear that had 10/100 Mb
auto negotiation if the switch lost power and an IOC interfaced
with a 10/100 auto negotiated LAN tap was not powered up after
the switch.
It is difficult to make further guesses about the cause
of the problem.
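
As a rough illustration of the select() report described in item 2, the sketch below shows the general POSIX analogue: calling select() on a descriptor that has already been closed fails with an "invalid descriptor" error. This is not the CA client code and it does not reproduce the two-thread timing on vxWorks; the function name select_on_closed_fd is hypothetical, and the exact error value is platform dependent (on a POSIX host I would expect EBADF, the counterpart of S_iosLib_INVALID_FILE_DESCRIPTOR).

```python
import errno
import select
import socket


def select_on_closed_fd():
    """Return the errno that select() reports for an already-closed descriptor.

    Hypothetical helper for illustration only; assumes a POSIX host.
    """
    a, b = socket.socketpair()
    fd = a.fileno()  # keep the raw integer descriptor
    a.close()        # stands in for the thread that closes the circuit
    try:
        # Zero timeout: return immediately instead of blocking.
        select.select([fd], [], [], 0)
    except OSError as exc:
        return exc.errno
    finally:
        b.close()
    return None  # select() did not complain


if __name__ == "__main__":
    err = select_on_closed_fd()
    print(errno.errorcode.get(err, err))
```

The point is only that the message by itself indicates a closed socket, which is consistent with a deliberate disconnect and not necessarily with corrupted data structures.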
When this occurs again I would run the
following commands from the serial port in order to verify
that certain EPICS, CA, and/or vxWorks data structures have
not been corrupted.
dbcar 0, 10
casr 10
dbl
iosFdShow
inetstatShow
ifShow
ipstatShow
udpstatShow
tcpstatShow
mbufShow
In particular, look carefully at the error counts reported
by ifShow and the free mbuf count reported by mbufShow.
We have also seen vxWorks IP kernel failures occurring when
there are threads in the system using the network that run
at a higher priority than tNetTask.
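
A minimal sketch of that priority check, assuming the task names and priorities have been transcribed by hand from the console (for example from the vxWorks "i" task listing). The function name tasks_above and the task table are hypothetical; the value 50 for tNetTask is the usual default as far as I know, so verify it on your system. vxWorks priorities run from 0 (highest) to 255 (lowest), so a numerically smaller value outranks tNetTask.

```python
def tasks_above(tasks, reference="tNetTask"):
    """Return the names of tasks that run at a higher priority than `reference`.

    `tasks` maps task name -> vxWorks priority (0 is highest, 255 is lowest).
    """
    ref = tasks[reference]
    return sorted(name for name, prio in tasks.items() if prio < ref)


if __name__ == "__main__":
    # Hypothetical task table; the names and numbers are made up for illustration.
    example = {"tNetTask": 50, "myDriverTask": 40, "myLoggerTask": 60}
    print(tasks_above(example))
```

The sketch only compares priorities. Whether a flagged task actually uses the network still has to be checked by hand.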
Did moving back to the 3.12.2 system also involve using a
different version of vxWorks? You might check to see if any
patches available for vxWorks and the network interface
driver are installed.
Jeff
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Pedro
> Gigoux
> Sent: Monday, September 09, 2002 9:10 AM
> To: EPICS Tech-Talk
> Subject: CA problem
>
> Hi,
>
> We have a system that keeps losing the connection to the DM screens
> after a few minutes of operation. The following message shows up on the
> screen after that happens:
>
> ecs-MK> dbCa:exceptionCallback stat Network connection lost channel
> unknown
>
> ecs-MK>
> CA.Client.Diagnostic..............................................
> Message: "Network connection lost"
> Severity: "Warning" Context: "10.2.2.104:5064"
> Source File: ../iocinf.c Line Number: 1488
> ....................................................................
> CAC: unexpected select fail: 851971=S_iosLib_INVALID_FILE_DESCRIPTOR
>
> After this message it's not possible to ping the machine where the crate
> boots from (or any other machine). Tried changing the CPU and transition
> module but the symptom stayed the same. Also tried changing the ethernet
> cable and switch port with no luck.
>
> Except for the network problem the system seems to be running fine and
> I'm able to access the console (serial port) without problems. The only
> way to bring the system back to life was to install the old version of
> the software that ran on a previous version of EPICS (3.12.2). We used to
> run under 3.13.4. The program in the crate has been running for several
> weeks flawlessly, and runs in our twin facility with no problems at all.
>
> Any ideas?
>
> Pedro.