Andrew and Martin (and EPICS community),
After experiencing the repeated crashes and examining a memory dump to the extent necessary to determine where they occurred (according to gdb, it was in camessage.c, line #1344, EPICS v3.14.12.3), I modified the databases to not use
fanout records and moved on with my work - and committed the programming sin of not keeping a copy of the failed configuration. Sure enough, when I tried to recreate the failure scenario yesterday - no crash occurred. I know the cpp processes were the same
(the only changes there were a couple of printf debug statements), but besides modifying the .db files I had also touched the .dbd file and the st.cmd file (we use a customized version) and probably a couple of other things. The only conclusion I can
draw is that I’ve missed some small but significant change I made that somehow contributed to the problem.
I apologize for raising a concern and then not being able to back it up. The schedule here prohibits me from spending much more time researching this issue, but I will continue to pursue it as time permits and will let you know if I
can get the core dump to recur.
-Mark Poff
-----Original Message-----
From: Konrad, Martin [mailto:[email protected]]
Sent: Thursday, October 09, 2014 4:04 PM
To: Poff, Mark A; [email protected]
Subject: Re: Fanout to Sub Records in a Different IOC Causes Core Dump
Hi Mark,
> If you can't, take a look at how much stack space your
> Process1_routine subroutine needs, running out of stack space is one
> possible cause for this kind of strange behaviour in addition to the
> usual kinds of problems. A full stack trace for all threads when the
> crash happens would also help us help you (type 'thread apply all bt'
> in gdb).
Keep in mind that the stack trace could be corrupted. You might want to run with a trivial subroutine for testing.
In addition to that, it's always a good idea to write a short test program that calls the subroutine function with some test values. You can run this test program with
valgrind --tool=memcheck <your_test_program>
to see if it is writing to memory it should not write to.
Regards,
Martin
--
Martin Konrad
Control System Engineer
Facility for Rare Isotope Beams
Michigan State University
640 South Shaw Lane
East Lansing, MI 48824-1321, USA
Tel. 517-908-7253
Email: [email protected]