Upgrade to Jewel impacts cluster by taking entire node offline
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ceph OSD Charm | Fix Released | High | Billy Olsen |
OpenStack Ceph Charm (Retired) | Won't Fix | High | Unassigned |
charms.ceph | Fix Released | High | Unassigned |
ceph (Juju Charms Collection) | Invalid | Undecided | Unassigned |
ceph-osd (Juju Charms Collection) | Invalid | Critical | Billy Olsen |
Bug Description
When upgrading from Hammer->Jewel, all of the OSDs on a single node are stopped for the duration of the recursive chown on /var/lib/ceph. In production environments, this recursive chown is non-trivial and takes hours to complete. While all OSDs on the node are down, the Ceph cluster is essentially running with one node missing.
If the user has set noout on the cluster, the OSDs will not be marked out, but they may have a large amount of backfilling to do when restarted. During this period, the cluster is at greater risk from an outage of another OSD or node. Given that nodes are becoming denser with larger disks (10-20 OSDs per node at ~4-8 TB/OSD is becoming common), this approach does not scale to production levels.
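For reference, the noout handling mentioned above could be wrapped around a maintenance step roughly as follows. This is only a sketch: the `noout` helper and its injectable `run` parameter are hypothetical names used for illustration, and it assumes the `ceph` CLI is available on the host with sufficient privileges. It is not taken from any of the charms.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def noout(run=subprocess.check_call):
    """Hold the cluster-wide noout flag for the duration of the block.

    `run` is a hypothetical injection point so the helper can be
    exercised without a live cluster; by default it shells out to ceph.
    """
    run(["ceph", "osd", "set", "noout"])
    try:
        yield
    finally:
        # Always clear the flag, even if the maintenance step fails,
        # so stopped OSDs can be marked out again afterwards.
        run(["ceph", "osd", "unset", "noout"])
```

Usage would look like `with noout(): ...` around the stop/chown/start work. The flag only prevents OSDs from being marked out; it does not shorten the window in which they are down.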
The upgrade process should intelligently decide to make use of the `setuser match path /var/lib/
Related branches
- Jorge Niedbalski (community): Approve
  Diff: 296 lines (+190/-30), 2 files modified: charmhelpers/core/host.py (+158/-30), tests/core/test_host.py (+32/-0)
- Jorge Niedbalski (community): Approve
  Diff: 49 lines (+28/-0), 2 files modified: charmhelpers/core/host.py (+14/-0), tests/core/test_host.py (+14/-0)
Changed in ceph-osd (Juju Charms Collection): | |
milestone: | none → 17.01 |
importance: | Undecided → Critical |
Changed in ceph (Juju Charms Collection): | |
status: | New → Invalid |
Changed in charm-ceph-osd: | |
assignee: | nobody → Billy Olsen (billy-olsen) |
importance: | Undecided → Critical |
status: | New → In Progress |
Changed in ceph-osd (Juju Charms Collection): | |
status: | In Progress → Invalid |
Changed in charm-ceph-osd: | |
importance: | Critical → High |
milestone: | none → 17.05 |
Changed in charm-ceph: | |
importance: | Undecided → Medium |
importance: | Medium → High |
status: | New → Triaged |
Changed in charms.ceph: | |
status: | New → Triaged |
importance: | Undecided → High |
Changed in charm-ceph: | |
milestone: | none → 17.05 |
Changed in charm-ceph: | |
milestone: | 17.05 → 17.08 |
Changed in charm-ceph-osd: | |
milestone: | 17.05 → 17.08 |
tags: | added: stable-backport |
Changed in charm-ceph-osd: | |
status: | Fix Committed → Fix Released |
Changed in charm-ceph: | |
milestone: | 17.08 → 17.11 |
Changed in charms.ceph: | |
status: | Triaged → Fix Released |
Changed in charm-ceph: | |
milestone: | 17.11 → 18.02 |
Reviewed: https://review.openstack.org/430062
Committed: https://git.openstack.org/cgit/openstack/charms.ceph/commit/?id=c421aa742909f78c8b7c9a4548874795b70dad87
Submitter: Jenkins
Branch: master
commit c421aa742909f78c8b7c9a4548874795b70dad87
Author: Billy Olsen <email address hidden>
Date: Mon Feb 6 21:59:51 2017 -0700
Roll osd ownership changes through node
Change the OSD upgrade path so that the file ownership change
for the OSD directories are run one OSD at a time rather than
all of the OSDs at once.
Partial-Bug: #1662591
Change-Id: I3a1cf05207c070a8699e7ba749a0587b619d4679
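
The rolling approach described in the commit message can be illustrated with a minimal sketch. This is not the code from the commit: `chown_tree`, `roll_osd_ownership`, and the `stop_osd`/`start_osd` callables are hypothetical names, and the sketch assumes each OSD has its own data directory and that `uid`/`gid` belong to the account the upgraded daemons run as.

```python
import os


def chown_tree(path, uid, gid):
    """Recursively change ownership of path without following symlinks."""
    os.lchown(path, uid, gid)
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            os.lchown(os.path.join(root, name), uid, gid)


def roll_osd_ownership(osd_dirs, uid, gid, stop_osd, start_osd,
                       chown=chown_tree):
    """Fix ownership one OSD at a time instead of stopping the whole node.

    stop_osd and start_osd are hypothetical callables that take an OSD
    directory and stop or start the matching daemon. Only one OSD is
    down at any moment; the others keep serving I/O.
    """
    for osd_dir in osd_dirs:
        stop_osd(osd_dir)
        chown(osd_dir, uid, gid)
        # If chown raises, the loop aborts and this OSD stays stopped
        # rather than starting on a partially chowned directory.
        start_osd(osd_dir)
```

With this shape, the per-OSD downtime is bounded by the chown of a single data directory rather than of everything under /var/lib/ceph, which is the scaling concern raised in the bug description.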