Client exit delays caused by reverse name lookup
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
EPICS Base | Fix Released | Low | mdavidsaver |
3.14 | Fix Released | Undecided | Unassigned |
3.15 | Fix Released | Undecided | Unassigned |
3.16 | Fix Released | Low | mdavidsaver |
Bug Description
From benjamin.franksen _AT_ helmholtz-
We've been hit by a problem reported several times on tech-talk, last
time in 2013, with the final message that identifies its cause here:
http://
Mistakenly relying on a Google search (instead of searching tech-talk
directly), we only found a thread from 2011 with no resolution. So we
debugged this again (after Mark and David already did in 2013), arriving
at the same conclusion: ca_context_destroy leads to destruction of an
object of the class ipAddrToAsciiEn
this->thread.
version, this may hang until the call to gethostbyaddr finally times
out, if the host that serves your PV does not have a DNS entry.
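To make the mechanism concrete, here is a minimal standalone sketch of why a destructor that waits for a worker thread inherits that thread's delay. It is not the EPICS code: `LookupEngine`, `slowLookup` and `timeTeardown` are hypothetical names, and a plain sleep stands in for the blocking gethostbyaddr call.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Stand-in for a blocking reverse lookup such as gethostbyaddr():
// it just sleeps for the given time and cannot be interrupted.
inline void slowLookup(std::chrono::milliseconds delay) {
    std::this_thread::sleep_for(delay);
}

// Hypothetical engine that owns a worker thread. This only models the
// situation described above; it is NOT the EPICS implementation.
class LookupEngine {
public:
    explicit LookupEngine(std::chrono::milliseconds delay)
        : worker_(slowLookup, delay) {}

    // The destructor has to wait for the worker, so it blocks for as
    // long as the pending lookup does.
    ~LookupEngine() { worker_.join(); }

    LookupEngine(const LookupEngine&) = delete;
    LookupEngine& operator=(const LookupEngine&) = delete;

private:
    std::thread worker_;
};

// Measures how long it takes to create and then destroy an engine
// whose worker is busy with a lookup lasting `delay`.
inline std::chrono::milliseconds timeTeardown(std::chrono::milliseconds delay) {
    assert(delay.count() >= 0);
    const auto start = std::chrono::steady_clock::now();
    {
        LookupEngine engine(delay);
    }  // destructor runs here and joins the worker
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);
}
```

Calling `timeTeardown(std::chrono::milliseconds(300))` should report a teardown time of roughly the lookup delay, which is the same shape of problem as a client exit waiting on a DNS timeout.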
I think we can agree that this is not how things should be. Whatever the
purpose of starting the reverse name resolution (in the background
thread) may be, there are certainly lots of CA client applications that
can live without this feature, as witnessed by caget working flawlessly
(terminating without any delays) when I comment out the call to
ca_context_destroy.
(There is, by the way, nothing in the docs suggesting that CA servers
must have a valid DNS name or else programs may hang indefinitely inside
ca_context_
I can see three ways to move forward from here:
(1) Remove the call to ca_context_destroy from the CA utilities. I don't
like this very much: their source code should serve as demonstration of
good practice when programming a CA client and thus should include
proper cleanup of the client context.
(2) Apply more forceful OS-specific ways of getting rid of the name
resolution thread (even when it is blocked on a call to gethostbyaddr).
Doing this properly would mean adding some sort of "thread killing"
method to the epicsThread class, something which has been proposed
before and rejected for various good reasons.
(3) Let the user choose whether they want to have the extra features
enabled by the host name lookup, or whether they rather want to ensure
quick termination of their programs or threads. This could be made
configurable by an environment variable, for instance.
I think the third solution is preferable since it is backward compatible
(no API or ABI change) and can be applied without changing the source
code of the client applications, or even recompiling them (if they are
dynamically linked).
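As a rough illustration of option (3), the sketch below shows how an environment variable could gate the lookup. Everything in it is an assumption made for the example: the variable name `EXAMPLE_CA_SKIP_NAME_LOOKUP`, the accepted values, and the helper names `skipNameLookup` and `hostLabel` do not come from EPICS Base.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical variable name, chosen only for this sketch.
const char* const kSkipLookupVar = "EXAMPLE_CA_SKIP_NAME_LOOKUP";

// True when the user asked to skip reverse name lookup. An unset
// variable, an empty value, "0", "NO" or "no" all mean "keep the lookup".
inline bool skipNameLookup() {
    const char* value = std::getenv(kSkipLookupVar);
    if (value == nullptr) {
        return false;
    }
    const std::string v(value);
    return !(v.empty() || v == "0" || v == "NO" || v == "no");
}

// Returns the label to show for a server. When lookups are disabled the
// numeric address is used as-is and the (possibly slow) resolver is
// never called; otherwise the caller-supplied resolver decides.
template <typename Resolver>
std::string hostLabel(const std::string& numericAddress, Resolver resolve) {
    if (skipNameLookup()) {
        return numericAddress;
    }
    return resolve(numericAddress);
}
```

With the variable unset, `hostLabel` defers to the resolver, so existing behaviour is unchanged. Setting it in the environment before starting a client would switch that client to numeric addresses without touching its source.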
Cheers
Ben
Related branches
- Andrew Johnson: Approve
- Ralph Lange: Approve
-
Diff: 603 lines (+339/-81), 4 files modified
src/libCom/misc/ipAddrToAsciiAsynchronous.cpp (+171/-81)
src/libCom/misc/ipAddrToAsciiAsynchronous.h (+4/-0)
src/libCom/test/Makefile (+3/-0)
src/libCom/test/ipAddrToAsciiTest.cpp (+161/-0)
Changed in epics-base:
status: New → Confirmed
importance: Undecided → Low
Changed in epics-base:
milestone: none → 3.14.branch
status: Confirmed → Fix Committed
From Steven Hartman <hartmansm _AT_ ornl.gov>:
We also stumbled across this issue a few months ago. (As a beneficial side effect, we identified and corrected a DNS misconfiguration.)
I agree with your recommendation of choice three: some mechanism to enable or disable the name lookup.