Aloha
Finally, eight years after deploying R3.13.0beta12, we at the
Keck Observatory are attempting to deploy R3.13.10.
During about 80 hours of testing we have had three occurrences of a
“ring buffer overflow” error within a CA_event task. On two of those
occasions the VxWorks 5.3.1 (PPC) system on which the error occurred appeared
to lose its connections to all of its clients, even other EPICS-based
IOCs. The outages lasted 7-12 minutes, after which most
connections were re-established (at least one channel to another IOC did not
reconnect!). On the third occasion a CA_event task was suspended and the
following appeared in the iocLog:
tdc2:1040 Fri Jan 20 13:42:39 2006 filename="../taskwd.c" line number=159
tdc2:1040 Fri Jan 20 13:42:39 2006 task 36a5718 CA_event suspended
tdc2:1040 Fri Jan 20 13:43:28 2006 interrupt: ./taskwd.c" line number=159
tdc2:1040 Fri Jan 20 13:43:28 2006 callbackRequest ring buffer full
It so happened that I was monitoring the IOC via a telnet
connection, thereby logging the following, which was not in the iocLog:
instruction access
Exception next instruction address: 0x30303030
Machine Status Register: 0x4000b030
Condition Register: 0x42000040
Task: 0x36a5718 "CA_event"^G
filename="../taskwd.c" line number=159
task 36a5718 CA_event suspended
interrupt: callbackRequest ring buffer full
Seeing as the instruction address 0x30303030 lies outside
the address space… life does not seem too good.
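One detail worth noting: 0x30 is the ASCII code for the character '0', so the faulting address 0x30303030 reads as the text "0000". A program counter made of four printable characters is often a sign that a saved return address was overwritten with character data (for example by an overflowing string buffer on the stack), though that is only a hypothesis here. The small C helper below is a hypothetical triage aid, not part of EPICS or VxWorks; it decodes a 32-bit register value as big-endian (PowerPC byte order) ASCII so such patterns stand out in exception dumps.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: decode a 32-bit word as four bytes in big-endian
 * (PowerPC) order. Printable bytes are copied as-is; any other byte is
 * shown as '.'. out must have room for 5 chars (4 bytes + NUL).
 * Returns the number of printable bytes found. */
int word_to_ascii(uint32_t word, char out[5])
{
    int printable = 0;
    for (int i = 0; i < 4; i++) {
        /* Most significant byte first, matching big-endian memory layout. */
        unsigned char byte = (unsigned char)(word >> (24 - 8 * i));
        if (isprint(byte)) {
            out[i] = (char)byte;
            printable++;
        } else {
            out[i] = '.';
        }
    }
    out[4] = '\0';
    return printable;
}

/* Hypothetical helper: print a register value with its ASCII decoding and
 * flag it when every byte is printable (possible string data). */
void report_suspect_address(uint32_t addr)
{
    char text[5];
    int n = word_to_ascii(addr, text);
    printf("0x%08lx -> \"%s\"%s\n", (unsigned long)addr, text,
           n == 4 ? "  (all printable: possibly string data)" : "");
}
```

For instance, report_suspect_address(0x30303030) flags the value as all-printable "0000", while the Machine Status Register value 0x4000b030 decodes to "@..0" and is not flagged.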
I was not physically present during this event, so I did not
get a chance to perform a task trace before the IOC was rebooted.
I really hate to think that there is an errant pointer
somewhere.
Any thoughts would be appreciated.
Thanks,
Al Honey
Kevin Tsubota