[2.3a1] named stuck on reload, DNS broken
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
BIND | Fix Released | Undecided | Unassigned |
MAAS | Fix Released | Critical | Blake Rouse |
MAAS 2.2 | Fix Released | Critical | Blake Rouse |
MAAS 2.4 | Fix Committed | Undecided | Blake Rouse |
MAAS 2.6 | Fix Released | Critical | Blake Rouse |
MAAS 2.7 | Fix Released | Critical | Blake Rouse |
bind9 (Ubuntu) | In Progress | Medium | Unassigned |
bind9 (Ubuntu Xenial) | In Progress | Medium | Unassigned |
bind9 (Ubuntu Bionic) | In Progress | Medium | Unassigned |
bind9 (Ubuntu Disco) | Won't Fix | Medium | Unassigned |
bind9 (Ubuntu Eoan) | Won't Fix | Medium | Unassigned |
maas (Ubuntu) | Confirmed | Undecided | Unassigned |
maas (Ubuntu Bionic) | In Progress | Undecided | Blake Rouse |
Bug Description
[Impact]
* systemd reports the service as running, but named does not respond to any commands or requests. It also ignores every signal except SIGKILL (kill -9). Service restarts hang, and rndc hangs.
* Because the deadlock is in the bind9 that ships in Bionic, the MAAS fix needs to be backported to MAAS 2.4.3. That way the MAAS in the Bionic archive can handle the deadlock and get bind9 unstuck.
* The change in MAAS watches for this condition in bind9. When it occurs, MAAS force-kills the service and restarts it.
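The detection step described in [Impact] can be sketched in Python, the language of the MAAS branches listed below. This is a minimal sketch and not the MAAS implementation. The probe command (`rndc status`), the 10-second timeout, and the helper name `is_hung` are assumptions for illustration. The idea is that a deadlocked named never answers, so a probe that misses its deadline is treated as a hang rather than as an ordinary error.

```python
import subprocess
from typing import Sequence

# Hypothetical probe: `rndc status` is a real rndc command, but using it
# (with this timeout) as the health check is an assumption of this sketch.
DEFAULT_PROBE = ("rndc", "status")


def is_hung(probe: Sequence[str] = DEFAULT_PROBE, timeout: float = 10.0) -> bool:
    """Return True if the probe command does not finish within `timeout` seconds.

    A deadlocked named answers nothing, so rndc blocks instead of failing.
    Only a timeout counts as "hung". A probe that exits, even with a
    non-zero status, means named either answered or is not running; a
    separate check would handle the latter.
    """
    try:
        # On timeout, subprocess.run kills the probe process before raising.
        subprocess.run(
            list(probe),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return True
    return False
```

A watchdog would call `is_hung()` periodically. Only when it returns True would it move on to forcibly stopping and restarting named.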
[Test Case]
* Very hard to reproduce. The issue occurs when bind9 deadlocks: it responds to nothing over the network or via rndc. SIGTERM does not kill it. Only SIGKILL force-kills the process so that systemd can restart it.
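The SIGTERM-then-SIGKILL behaviour described in [Test Case] can be sketched for a process started with `subprocess.Popen`. The function name `stop_with_escalation` and the 5-second grace period are assumptions. A real watchdog would act on the systemd-managed named service rather than on its own child process, and this sketch does not show that part.

```python
import subprocess


def stop_with_escalation(proc: subprocess.Popen, grace: float = 5.0) -> str:
    """Send SIGTERM; if the process is still alive after `grace` seconds,
    send SIGKILL. Returns the name of the signal that ended the process.
    """
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        # A deadlocked named ignores SIGTERM; SIGKILL cannot be caught or ignored.
        proc.kill()  # SIGKILL on POSIX
        proc.wait()
        return "SIGKILL"
```

The grace period matters. Sending SIGKILL immediately would also kill a healthy named in the middle of a clean shutdown.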
[Regression Potential]
* bind9 might not be started correctly, or it might be placed into an endless restart loop.
Related branches
- MAAS Lander: Needs Fixing
- Adam Collard (community): Approve
Diff: 876 lines (+344/-48), 15 files modified:
debian/extras/99-maas-common-sudoers (+6/-0)
src/maasserver/dns/config.py (+4/-4)
src/maasserver/dns/tests/test_config.py (+5/-2)
src/maasserver/region_controller.py (+38/-5)
src/maasserver/service_monitor.py (+3/-0)
src/maasserver/tests/test_region_controller.py (+49/-8)
src/provisioningserver/dns/actions.py (+11/-5)
src/provisioningserver/dns/config.py (+2/-2)
src/provisioningserver/dns/tests/test_actions.py (+4/-4)
src/provisioningserver/dns/tests/test_config.py (+6/-2)
src/provisioningserver/utils/service_monitor.py (+101/-9)
src/provisioningserver/utils/shell.py (+2/-1)
src/provisioningserver/utils/tests/test_service_monitor.py (+102/-6)
src/provisioningserver/utils/tests/test_shell.py (+9/-0)
versions.cfg (+2/-0)
- MAAS Maintainers: Pending requested
Diff: 3421 lines (+1314/-676) (has conflicts), 36 files modified:
.eslintrc.js (+3/-0)
Makefile (+7/-0)
debian/changelog (+11/-1)
debian/copyright (+0/-4)
dev/null (+0/-631)
jest.config.js (+6/-0)
package.json (+16/-0)
src/maasserver/models/signals/scriptresult.py (+6/-0)
src/maasserver/models/signals/tests/test_scriptresult.py (+13/-0)
src/maasserver/static/js/angular/controllers/pods_list.js (+5/-0)
src/maasserver/static/js/angular/controllers/settings.js (+114/-0)
src/maasserver/static/js/angular/controllers/tests/test_pods_list.js (+5/-0)
src/maasserver/static/js/angular/controllers/tests/test_settings.js (+264/-0)
src/maasserver/static/js/angular/controllers/tests/test_zones_list.js (+4/-0)
src/maasserver/static/js/angular/directives/machines_table.js (+31/-9)
src/maasserver/static/js/angular/directives/script_status.js (+3/-0)
src/maasserver/static/js/angular/directives/tests/test_machines_table.js (+30/-0)
src/maasserver/static/js/angular/entry.js (+1/-7)
src/maasserver/static/js/bundle/maas-min.js (+4/-0)
src/maasserver/static/js/bundle/maas-min.js.map (+5/-1)
src/maasserver/static/js/bundle/vendor-min.js (+4/-0)
src/maasserver/static/js/bundle/vendor-min.js.map (+5/-1)
src/maasserver/static/partials/dashboard.html (+1/-1)
src/maasserver/static/partials/machines-table.html (+24/-2)
src/maasserver/static/partials/networks-list.html (+2/-2)
src/maasserver/static/partials/node-events.html (+1/-1)
src/maasserver/static/partials/nodes-list.html (+66/-2)
src/maasserver/static/partials/pods-list.html (+5/-0)
src/maasserver/static/partials/subnet-details.html (+1/-1)
src/maasserver/static/partials/switches-table.html (+1/-1)
src/maasserver/static/partials/zones-list.html (+6/-0)
src/maasserver/testing/html-loader.js (+0/-0)
src/metadataserver/user_data/templates/snippets/maas_run_remote_scripts.py (+8/-0)
src/metadataserver/user_data/templates/snippets/tests/test_maas_run_remote_scripts.py (+20/-0)
src/provisioningserver/utils/version.py (+4/-0)
yarn.lock (+638/-12)
- Blake Rouse (community): Approve
Diff: 738 lines (+260/-43), 15 files modified:
debian/extras/99-maas-common-sudoers (+3/-2)
src/maasserver/dns/config.py (+4/-4)
src/maasserver/dns/tests/test_config.py (+5/-2)
src/maasserver/region_controller.py (+25/-4)
src/maasserver/service_monitor.py (+3/-0)
src/maasserver/tests/test_region_controller.py (+48/-8)
src/provisioningserver/dns/actions.py (+11/-5)
src/provisioningserver/dns/config.py (+2/-2)
src/provisioningserver/dns/tests/test_actions.py (+4/-4)
src/provisioningserver/dns/tests/test_config.py (+6/-2)
src/provisioningserver/service_monitor.py (+3/-0)
src/provisioningserver/utils/service_monitor.py (+56/-3)
src/provisioningserver/utils/shell.py (+2/-1)
src/provisioningserver/utils/tests/test_service_monitor.py (+79/-6)
src/provisioningserver/utils/tests/test_shell.py (+9/-0)
- Lee Trager (community): Approve
- MAAS Lander: Approve
Diff: 738 lines (+260/-43), 15 files modified:
debian/extras/99-maas-common-sudoers (+3/-2)
src/maasserver/dns/config.py (+4/-4)
src/maasserver/dns/tests/test_config.py (+5/-2)
src/maasserver/region_controller.py (+25/-4)
src/maasserver/service_monitor.py (+3/-0)
src/maasserver/tests/test_region_controller.py (+48/-8)
src/provisioningserver/dns/actions.py (+11/-5)
src/provisioningserver/dns/config.py (+2/-2)
src/provisioningserver/dns/tests/test_actions.py (+4/-4)
src/provisioningserver/dns/tests/test_config.py (+6/-2)
src/provisioningserver/service_monitor.py (+3/-0)
src/provisioningserver/utils/service_monitor.py (+56/-3)
src/provisioningserver/utils/shell.py (+2/-1)
src/provisioningserver/utils/tests/test_service_monitor.py (+79/-6)
src/provisioningserver/utils/tests/test_shell.py (+9/-0)
- Blake Rouse (community): Approve
Diff: 332 lines (+196/-5), 4 files modified:
src/maasserver/dns/config.py (+6/-0)
src/maasserver/dns/tests/test_config.py (+19/-0)
src/maasserver/region_controller.py (+46/-3)
src/maasserver/tests/test_region_controller.py (+125/-2)
- Mike Pontillo (community): Approve
Diff: 332 lines (+196/-5), 4 files modified:
src/maasserver/dns/config.py (+6/-0)
src/maasserver/dns/tests/test_config.py (+19/-0)
src/maasserver/region_controller.py (+46/-3)
src/maasserver/tests/test_region_controller.py (+125/-2)
tags: added: server-next
Changed in bind9 (Ubuntu):
  status: New → Triaged
  importance: Undecided → High
Changed in maas:
  status: Triaged → In Progress
  assignee: nobody → Blake Rouse (blake-rouse)
  milestone: none → 2.3.0
Changed in maas:
  status: In Progress → Fix Committed
Changed in maas:
  milestone: 2.3.0 → 2.3.0alpha2
Changed in maas:
  status: Fix Committed → Fix Released
tags: removed: server-next
Changed in bind9 (Ubuntu):
  assignee: nobody → Blake Rouse (blake-rouse)
Changed in bind9 (Ubuntu Eoan):
  assignee: Blake Rouse (blake-rouse) → Dan Streetman (ddstreet)
Changed in bind9 (Ubuntu Disco):
  assignee: nobody → Dan Streetman (ddstreet)
Changed in bind9 (Ubuntu Bionic):
  assignee: nobody → Dan Streetman (ddstreet)
Changed in bind9 (Ubuntu Disco):
  importance: Undecided → Medium
Changed in bind9 (Ubuntu Bionic):
  importance: Undecided → Medium
Changed in bind9 (Ubuntu Eoan):
  status: Triaged → In Progress
Changed in bind9 (Ubuntu Disco):
  status: New → In Progress
Changed in bind9 (Ubuntu Bionic):
  status: New → In Progress
Changed in bind9 (Ubuntu Eoan):
  importance: High → Medium
Changed in bind9 (Ubuntu Xenial):
  assignee: nobody → Dan Streetman (ddstreet)
  importance: Undecided → Medium
  status: New → In Progress
tags: added: sts
Changed in bind9 (Ubuntu Bionic):
  assignee: Dan Streetman (ddstreet) → Eric Desrochers (slashd)
Changed in bind9 (Ubuntu Disco):
  assignee: Dan Streetman (ddstreet) → Eric Desrochers (slashd)
Changed in bind9 (Ubuntu Eoan):
  assignee: Dan Streetman (ddstreet) → Eric Desrochers (slashd)
Changed in bind9 (Ubuntu Xenial):
  assignee: Dan Streetman (ddstreet) → Eric Desrochers (slashd)
Changed in bind9 (Ubuntu Xenial):
  assignee: Eric Desrochers (slashd) → nobody
Changed in bind9 (Ubuntu Bionic):
  assignee: Eric Desrochers (slashd) → nobody
Changed in bind9 (Ubuntu Disco):
  assignee: Eric Desrochers (slashd) → nobody
Changed in bind9 (Ubuntu Eoan):
  assignee: Eric Desrochers (slashd) → nobody
no longer affects: maas (Ubuntu Xenial)
no longer affects: maas (Ubuntu Eoan)
no longer affects: maas (Ubuntu Disco)
Changed in maas (Ubuntu Bionic):
  assignee: nobody → Blake Rouse (blake-rouse)
  status: Confirmed → In Progress
Changed in bind:
  status: New → Fix Released
Changed in maas:
  status: Fix Committed → Fix Released
Changed in bind9 (Ubuntu Disco):
  status: In Progress → Won't Fix
Assuming the debug symbols I grabbed[1] for my install of bind9 on Xenial match yours (I have bind9 version 1:9.10.3.dfsg.P4-8ubuntu1.7 installed, per "apt-cache policy bind9"), I did the following to grab a traceback:
$ sudo apt-get install bind9-dbgsym libdns162-dbgsym libisc160-dbgsym
$ gdb /usr/sbin/named core
(gdb) set pagination off
(gdb) thread apply all bt
... [2] ...
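The interactive session above can also be run non-interactively, which helps when collecting traces from several hung hosts. This sketch builds and runs the same gdb commands using gdb's `-batch` and `-ex` command-line options. The helper names and the default binary and core paths are assumptions for illustration.

```python
import subprocess
from typing import List

# The same commands typed interactively above. "set pagination off" is
# redundant under -batch (batch mode already disables paging) but is kept
# to mirror the session.
GDB_COMMANDS = ("set pagination off", "thread apply all bt")


def gdb_backtrace_argv(binary: str, core: str) -> List[str]:
    """Build a gdb command line that prints every thread's backtrace and exits."""
    argv = ["gdb", "-batch"]
    for command in GDB_COMMANDS:
        argv += ["-ex", command]
    argv += [binary, core]
    return argv


def dump_backtraces(binary: str = "/usr/sbin/named", core: str = "core") -> str:
    """Run gdb and return its stdout (requires gdb plus the -dbgsym packages)."""
    result = subprocess.run(
        gdb_backtrace_argv(binary, core),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```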
Looking at the backtrace in [2], the interesting parts to me are threads 8, 11 and 20, which are possibly involved in a deadlock[3]. It looks like one of the threads is reloading the configuration (something we would expect MAAS to do), and another is calling dns_resolver_shutdown() via view_flushanddetach().
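One way to make "possibly involved in a deadlock" precise is a wait-for graph. Each thread points at the thread it is blocked on, and a cycle means none of them can make progress. The sketch below shows the cycle check. The edges in the usage example are hypothetical. Thread 11 appears to be waiting in isc__task_beginexclusive for other task threads, such as thread 8, to go idle, and thread 8 is blocked on a mutex. However, which thread holds that mutex (thread 20 is the candidate named above), and what thread 20 waits on, are not shown in the excerpt here.

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_on: Dict[str, str]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph, or None if there is none.

    `waits_on` maps a thread to the single thread it is blocked on, so every
    node has at most one outgoing edge and we can simply follow the chain.
    """
    for start in waits_on:
        path: List[str] = []
        position: Dict[str, int] = {}
        node: Optional[str] = start
        # Walk the chain until it ends or revisits a node on this walk.
        while node is not None and node not in position:
            position[node] = len(path)
            path.append(node)
            node = waits_on.get(node)
        if node is not None:
            # `node` was reached twice: the cycle starts at its first visit.
            return path[position[node]:]
    return None


# Hypothetical edges only; confirming them needs the full trace.
hypothetical = {"thread 11": "thread 8", "thread 8": "thread 20", "thread 20": "thread 11"}
print(find_wait_cycle(hypothetical))  # ['thread 11', 'thread 8', 'thread 20']
```

If thread 20 turned out to be waiting on something outside this set, the function would return None, and the hang would need a different explanation.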
[1]: https://wiki.ubuntu.com/Debug%20Symbol%20Packages
[2]: http://paste.ubuntu.com/25292729/
[3]:
Thread 8 (Thread 0x7f95226aa700 (LWP 3203)):
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f952a351efe in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f94d4014fe8) at ../nptl/pthread_mutex_lock.c:135
#2 0x00007f952b7a0794 in dns_view_weakdetach (viewp=viewp@entry=0x7f9504389780) at ../../../lib/dns/view.c:597
#3 0x00007f952b7993de in destroy (val=0x7f9504389750) at ../../../lib/dns/validator.c:3891
#4 0x00007f952b79927b in dns_validator_destroy (validatorp=validatorp@entry=0x7f9519462628) at ../../../lib/dns/validator.c:3915
#5 0x00007f952b76b9d1 in validated (task=<optimized out>, event=0x7f95194625d0) at ../../../lib/dns/resolver.c:4722
#6 0x00007f952a9a6360 in dispatch (manager=0x7f952be3b010) at ../../../lib/isc/task.c:1130
#7 run (uap=0x7f952be3b010) at ../../../lib/isc/task.c:1302
#8 0x00007f952a34f6ba in start_thread (arg=0x7f95226aa700) at pthread_create.c:333
#9 0x00007f9529a993dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 11 (Thread 0x7f9520ea7700 (LWP 3206)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f952a9a516b in isc__task_beginexclusive (task0=<optimized out>) at ../../../lib/isc/task.c:1717
#2 0x0000557c34997dc1 in load_configuration (filename=<optimized out>, server=server@entry=0x7f952be44010, first_time=first_time@entry=isc_boolean_false) at ../../../bin/named/server.c:5651
#3 0x0000557c3499a826 in loadconfig (server=0x7f952be44010) at ../../../bin/named/server.c:7162
#4 0x0000557c3499ad48 in reload (server=0x7f952be44010) at ../../../bin/named/server.c:7183
#5 ns_server_reloadcommand (server=0x7f952be44010, args=args@entry=0x7f94fc120af0 "reload", text=text@entry=0x7f9520ea6590) at ../../../bin/named/server.c:7416
#6 0x0000557c34975db5 in ns_control_docommand (message=<optimized out>, text=text@entry=0x7f9520ea6590) at ../../../bin/named/control.c:102
#7 0x0000557c34978b97 in control_recvmessage (task=0x7f952be51010, event=<optimized out>) at ../../../bin/named/controlconf.c:458
#8 0x00007f952a9a6360 in dispatch (manager=0x7f952be3b010) at ../../../lib/isc/task.c:1130
#9 run (uap=0x7f952be3b010) at ../../../lib/isc/task.c:1302
#10 0x00007f952a34f6ba in start_thread (arg=0x7f9520ea7700) at pthread_create.c:333
#11 0x00007f9529a993dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
...
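Picking the "interesting" threads out of a full `thread apply all bt` dump by hand is tedious. The sketch below groups frames by thread and reports threads whose innermost frame (#0) is a lock or condition-variable wait. The list of wait functions is an assumption: it covers only the two seen in the traces in [3], and other dumps will need more entries. Threads flagged this way are candidates to inspect, not proof of a deadlock.

```python
import re
from typing import Dict, List, Optional

THREAD_RE = re.compile(r"^Thread (\d+) ")
# Frame lines look like "#N  [0xADDR in ]function (args) at file:line".
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+)")

# Assumption: innermost-frame functions treated as "blocked".
BLOCKING_FUNCS = ("__lll_lock_wait", "pthread_cond_wait")


def innermost_frames(backtrace: str) -> Dict[str, str]:
    """Map each thread number to the function name in its frame #0."""
    frames: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in backtrace.splitlines():
        line = raw.strip()
        thread = THREAD_RE.match(line)
        if thread:
            current = thread.group(1)
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None and frame.group(1) == "0":
            frames[current] = frame.group(2)
    return frames


def blocked_threads(backtrace: str) -> List[str]:
    """Thread numbers whose innermost frame is one of BLOCKING_FUNCS."""
    return [
        tid
        for tid, func in innermost_frames(backtrace).items()
        if func.startswith(BLOCKING_FUNCS)
    ]
```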