Aloha
Finally, 8 years after deploying R3.13.0beta12, we at the Keck Observatory are attempting to deploy R3.13.10.
During about 80 hours of testing we have had 3 events of the form “ring buffer overflow” within a CA_event task. On two of those occasions the VxWorks 5.3.1 system (PPC) on which the error occurred appeared to lose its connections to all of its clients, including other EPICS IOCs. The events lasted 7-12 minutes, after which most connections were re-established (at least one channel to another IOC did not reconnect!). On the third such event a CA_event task was suspended and the following was in the iocLog:
tdc2:1040 Fri Jan 20 13:42:39 2006 filename="../taskwd.c" line number=159
tdc2:1040 Fri Jan 20 13:42:39 2006 task 36a5718 CA_event suspended
tdc2:1040 Fri Jan 20 13:43:28 2006 interrupt: ./taskwd.c" line number=159
tdc2:1040 Fri Jan 20 13:43:28 2006 callbackRequest ring buffer full
It so happened that I was monitoring the IOC via a telnet connection, which captured the following output that was not in the iocLog:
instruction access
Exception next instruction address: 0x30303030
Machine Status Register: 0x4000b030
Condition Register: 0x42000040
Task: 0x36a5718 "CA_event"^G
filename="../taskwd.c" line number=159
task 36a5718 CA_event suspended
interrupt: callbackRequest ring buffer full
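
For anyone reading the register dump, the Machine Status Register value can be split into named bits. The following is only a rough sketch in Python, assuming the board uses a classic 32-bit PowerPC (6xx/7xx style) MSR layout, which the post does not confirm. `decode_msr` is a hypothetical helper name. Any bit outside that layout is reported as "leftover" rather than interpreted.

```python
# Bit masks for the classic 32-bit PowerPC MSR (assumed layout, see note above).
MSR_BITS = {
    0x00040000: "POW",  # power management enable
    0x00010000: "ILE",  # exception little-endian mode
    0x00008000: "EE",   # external interrupts enabled
    0x00004000: "PR",   # problem (user) state; clear means supervisor
    0x00002000: "FP",   # floating point available
    0x00001000: "ME",   # machine check enable
    0x00000800: "FE0",  # floating-point exception mode 0
    0x00000400: "SE",   # single-step trace enable
    0x00000200: "BE",   # branch trace enable
    0x00000100: "FE1",  # floating-point exception mode 1
    0x00000040: "IP",   # exception prefix
    0x00000020: "IR",   # instruction address translation enabled
    0x00000010: "DR",   # data address translation enabled
    0x00000002: "RI",   # recoverable exception
    0x00000001: "LE",   # little-endian mode
}


def decode_msr(value):
    """Return (names of known bits that are set, mask of bits not in the table)."""
    names = [name for mask, name in MSR_BITS.items() if value & mask]
    known = 0
    for mask in MSR_BITS:
        known |= mask
    leftover = value & ~known & 0xFFFFFFFF
    return names, leftover


names, leftover = decode_msr(0x4000B030)
print(names, hex(leftover))
```

Under that assumption, the value from the dump has EE, FP, ME, IR and DR set and PR clear, with one bit (0x40000000) that is not in the table; how VxWorks fills in that bit is not something this sketch tries to answer.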
Seeing as the instruction address 0x30303030 is outside of the address space… life does not seem too good.
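
One detail about that address may be worth checking: 0x30 is the ASCII code for the character '0', so 0x30303030 is what four consecutive '0' characters look like when read as a 32-bit word. That pattern is at least consistent with text overrunning a buffer and overwriting a saved return address, though the dump alone does not prove it. A minimal Python sketch of the check follows; `address_as_ascii` is a hypothetical helper name, not part of EPICS or VxWorks.

```python
def address_as_ascii(address):
    """Return the 4 bytes of a 32-bit address as text if all are printable ASCII, else None."""
    raw = address.to_bytes(4, byteorder="big")
    if all(0x20 <= b <= 0x7E for b in raw):
        return raw.decode("ascii")
    return None


# The faulting "next instruction address" from the console output.
print(repr(address_as_ascii(0x30303030)))

# The task ID from the same dump, for comparison: not printable text.
print(repr(address_as_ascii(0x036A5718)))
```

The first call returns the string '0000', while the task ID returns None because it contains non-printable bytes. If the address really is string data, any code on the CA_event task's call path that formats or copies numeric strings into fixed-size buffers would be a reasonable place to start looking.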
I was not physically present during this event, so I did not get a chance to perform a task trace before the IOC was rebooted.
I really hate to think that there is an errant pointer somewhere. Any thoughts would be appreciated.
Thanks,
Al Honey
Kevin Tsubota