Ceph dashboard is not enabled and hook fails "dashboard-relation-changed"
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ceph Dashboard Charm | Fix Committed | High | Unassigned |
Quincy.2 | Fix Released | Undecided | Unassigned |
Bug Description
The run fails on a juju wait timeout because the ceph-dashboard units end up blocked or in a hook error:
-------
ceph-mon/0* waiting executing 0/lxd/1 10.246.65.28 Monitor bootstrapped but waiting for number of OSDs to reach expected-osd-count (6)
ceph-dashboard/0* blocked idle 10.246.65.28 Dashboard is not enabled
logrotated/15 active idle 10.246.65.28 Unit is ready.
ceph-mon/1 active executing 2/lxd/1 10.246.65.55 Unit is ready and clustered
ceph-dashboard/1 error idle 10.246.65.55 hook failed: "dashboard-
logrotated/20 active idle 10.246.65.55 Unit is ready.
ceph-mon/2 active executing 4/lxd/1 10.246.65.52 Unit is ready and clustered
ceph-dashboard/2 error idle 10.246.65.52 hook failed: "dashboard-
logrotated/22 active idle 10.246.65.52 Unit is ready.
-------
Ceph dashboard log:
-------
2021-11-24 17:26:00 INFO unit.ceph-
2021-11-24 17:26:01 ERROR unit.ceph-
Traceback (most recent call last):
File "./src/charm.py", line 378, in _run_cmd
output = subprocess.
File "/usr/lib/
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/usr/lib/
raise CalledProcessEr
subprocess.
-------
Later:
-------
2021-11-24 17:26:11 ERROR unit.ceph-
Traceback (most recent call last):
File "./src/charm.py", line 597, in <module>
main(
File "/var/lib/
_emit_
File "/var/lib/
event_
File "/var/lib/
framework.
File "/var/lib/
self.
File "/var/lib/
custom_
File "/var/lib/
self.
File "/var/lib/
framework.
File "/var/lib/
self.
File "/var/lib/
custom_
File "./src/charm.py", line 427, in _configure_
self.
File "./src/charm.py", line 539, in _configure_tls
ceph_
File "/var/lib/
subprocess.
File "/usr/lib/
raise CalledProcessEr
subprocess.
2021-11-24 17:26:11 ERROR juju.worker.
-------
Testruns where this happened (3 today):
https:/
https:/
https:/
With artifacts respectively:
https:/
https:/
https:/
We also had a run with the same configuration where this did not happen and ceph-dashboard came up healthy:
https:/
with artifacts: https:/
Future occurrences can be found here: https:/
description: updated
tags: added: cdo-qa foundations-engine
Changed in charm-ceph-dashboard:
status: Triaged → In Progress
Changed in charm-ceph-dashboard:
status: Fix Committed → Fix Released
Changed in charm-ceph-dashboard:
status: Fix Released → Fix Committed
It looks like this is happening on the non-leader units, which suggests a race. The leader unit checks whether the dashboard is enabled and, if it's not, enables it. However, all units attempt to apply the charm options [0] to the dashboard regardless of whether the dashboard module is enabled, which is where things run into problems.
From the logs, the leader unit (ceph-dashboard/0, machine 0/lxd/1) successfully executes this sequence of events at 04:51:11, whereas ceph-dashboard/1 fails the same sequence earlier, at 04:49:31. I think the non-leader units need to check whether the dashboard is enabled before setting the configuration, or defer the event.
[0] - https://opendev.org/openstack/charm-ceph-dashboard/src/branch/master/src/charm.py#L426
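The check-or-defer idea suggested above could look roughly like the sketch below. This is not the charm's actual code or the committed fix: `dashboard_enabled`, `configure_dashboard`, and the `apply_options` / `enable` callbacks are hypothetical names, and the sketch assumes that `ceph mgr module ls --format json` returns an object with an `enabled_modules` list and that the event object offers a `defer()` method, as events in the ops framework do.

```python
import json
import subprocess


def dashboard_enabled(run=subprocess.check_output):
    """Return True if the mgr 'dashboard' module is listed as enabled.

    Assumption: the JSON output has an 'enabled_modules' list.
    'run' is injectable so the check can be exercised without a cluster.
    """
    try:
        raw = run(["ceph", "mgr", "module", "ls", "--format", "json"])
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Treat "cannot ask the cluster" the same as "not enabled yet".
        return False
    modules = json.loads(raw)
    return "dashboard" in modules.get("enabled_modules", [])


def configure_dashboard(is_leader, event, apply_options, enable,
                        is_enabled=dashboard_enabled):
    """Apply charm options only once the dashboard module is enabled.

    Hypothetical helper: the leader enables the module itself, while a
    non-leader defers the event instead of racing ahead of the leader.
    Returns True if the options were applied, False if deferred.
    """
    if not is_enabled():
        if is_leader:
            enable()
        else:
            event.defer()
            return False
    apply_options()
    return True
```

With a guard like this, a non-leader that runs its hook before the leader has enabled the module would defer and retry later rather than fail with CalledProcessError.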