We've been hit by a problem that has been reported several times on
tech-talk, most recently in 2013. The final message of that thread
identifies its cause:
http://www.aps.anl.gov/epics/tech-talk/2013/msg00836.php
Mistakenly relying on Google search (instead of searching tech-talk
directly) we only found a thread from 2011 with no resolution. So we
debugged this again (after Mark and David already did in 2013), arriving
at the same conclusion: ca_context_destroy leads to destruction of an
object of the class ipAddrToAsciiEnginePrivate, its destructor calling
this->thread.exitWait(). Depending on your OS type, configuration, and
version, this may hang until the call to gethostbyaddr finally times
out, if the host that serves your PV does not have a DNS entry.
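To make the mechanism concrete, here is a minimal sketch of the same pattern outside of EPICS. It is not the libCom code: LookupEngine, slowLookup and timeTeardown are hypothetical names, and the blocking gethostbyaddr call is replaced by a plain sleep. It only shows that a destructor which waits for its worker thread cannot return before the blocked call does.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Stand-in for a reverse DNS lookup that blocks until the resolver gives up.
// Assumption: the real engine calls gethostbyaddr; here we only sleep.
inline void slowLookup(std::chrono::milliseconds delay) {
    std::this_thread::sleep_for(delay);
}

// Hypothetical engine: owns a worker thread and waits for it on destruction,
// analogous to the this->thread.exitWait() call described above.
class LookupEngine {
public:
    explicit LookupEngine(std::chrono::milliseconds delay)
        : worker_(slowLookup, delay) {}
    ~LookupEngine() { worker_.join(); }  // blocks until slowLookup returns
    LookupEngine(const LookupEngine&) = delete;
    LookupEngine& operator=(const LookupEngine&) = delete;

private:
    std::thread worker_;
};

// Measures how long it takes to create and immediately destroy an engine.
inline std::chrono::milliseconds timeTeardown(std::chrono::milliseconds delay) {
    const auto start = std::chrono::steady_clock::now();
    {
        LookupEngine engine(delay);
    }  // destructor runs here and waits for the worker
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);
}
```

Calling timeTeardown(std::chrono::milliseconds(200)) takes roughly the full 200 ms even though the caller never asked for the lookup result, which is the behaviour we see in caget with a resolver timeout in place of the sleep.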
I think we can agree that this is not how things should be. Whatever the
purpose of starting the reverse name resolution (in the background
thread) may be, there are certainly lots of CA client applications that
can live without this feature, as witnessed by caget working flawlessly
(terminating without any delays) when I comment out the call to
ca_context_destroy.
(There is, by the way, nothing in the docs suggesting that CA servers
must have a valid DNS name or else programs may hang indefinitely inside
ca_context_destroy.)
I can see three ways to move forward from here:
(1) Remove the call to ca_context_destroy from the CA utilities. I don't
like this very much: their source code should serve as demonstration of
good practice when programming a CA client and thus should include
proper cleanup of the client context.
(2) Apply more forceful OS-specific ways of getting rid of the name
resolution thread (even when it is blocked on a call to gethostbyaddr).
Doing this properly would mean adding some sort of "thread killing"
method to the epicsThread class, something which has been proposed
before and rejected for various good reasons.
(3) Let the user choose whether they want to have the extra features
enabled by the host name lookup, or whether they would rather ensure
quick termination of their programs or threads. This could be made
configurable by an environment variable, for instance.
I think the third solution is preferable since it is backward compatible
(no API or ABI change) and can be applied without changing the source
code of the client applications, or even recompiling them (if they are
dynamically linked).
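As a rough sketch of what option (3) could look like: the variable name EXAMPLE_CA_REVERSE_LOOKUP and the function reverseLookupEnabled are made up for illustration and are not existing EPICS settings or APIs. The default keeps today's behaviour, so only users who explicitly opt out would lose the host name lookup.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical variable name, chosen for illustration only; it is not an
// existing EPICS setting.
static const char* const kLookupVar = "EXAMPLE_CA_REVERSE_LOOKUP";

// Returns true when the reverse name lookup should be started.
// Unset, or any value other than "NO", keeps the current behaviour
// (lookup enabled), so existing setups are unaffected.
inline bool reverseLookupEnabled() {
    const char* value = std::getenv(kLookupVar);
    return value == nullptr || std::strcmp(value, "NO") != 0;
}
```

The client library would consult such a function once, before starting the background lookup thread, and skip the lookup when it returns false. A user would then set the variable to NO in the environment of caget (or any other client) to get quick termination.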
Cheers
Ben
--
"Make it so they have to reboot after every typo." ― Scott Adams