Similar to the issues in LP#1611082, we recently upgraded ceph and ceph-osd by changing source from cloud:trusty-liberty to cloud:trusty-mitaka.
While I hit other issues during the upgrade (for example, unattended-upgrades kicking off between the ceph-osd upgrade and the smooshed nova-compute charm upgrade, causing all sorts of problems), I ran into ceph directory permission problems after rebooting one of the ceph-osd nodes once the upgrade had completed successfully.
During the ceph-osd upgrade I saw the charm properly take each ceph-osd daemon offline, do a recursive chown of the /var/lib/ceph/osd/$OSD directory to ceph:ceph, and then restart the ceph-osd daemon for that particular OSD. This stepped through all four disks on all seven nodes without a problem.
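For clarity, the per-OSD sequence I observed looks roughly like the sketch below. This is a hedged reconstruction, not the charm's actual code; the osd ids and paths are illustrative, and DRY_RUN=1 only prints the commands rather than running them.

```shell
# Hedged sketch of the per-OSD upgrade sequence observed from the charm.
# DRY_RUN=1 only prints the commands; unset it to actually run them on a node.
DRY_RUN=1
run() { if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi; }

for id in 0 1 2 3; do                     # one pass per OSD on the node
    run service ceph-osd stop id="$id"    # take the daemon offline
    run chown -R ceph:ceph "/var/lib/ceph/osd/ceph-$id"   # recursive re-own
    run service ceph-osd start id="$id"   # restart that particular OSD
done
```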
However, when I rebooted one of the nodes, the ceph-osd logs showed assert failures due to permission issues on files in the /var/lib/ceph/osd/*/current directory. Over 10,000 files had been re-owned to root after that chown but before the reboot a day later.
It appears that the restart of the ceph-osd daemon performed by the charm leaves the daemon running as root rather than the ceph user until the system is rebooted.
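A quick way to check whether a node is in this state (a hedged sketch; the pgrep form in the comment is how you would point it at the real daemons) is to look at the uid each ceph-osd process is running under. After the mitaka upgrade it should be the ceph uid (64045), not 0:

```shell
# Hedged check: print the uid a process is running as. On an affected node,
# ceph-osd would still show uid 0 (root) instead of 64045 (ceph) until reboot.
# Demonstrated against this shell's own pid, since no ceph install is assumed:
#   for p in $(pgrep ceph-osd); do ps -o pid=,uid= -p "$p"; done
ps -o uid= -p $$
```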
Reproducible issue:
1. Deploy cs:trusty/ceph with source=cloud:trusty-liberty, num_units=3 and osd-devices=/srv/osd
2. juju config ceph source=cloud:trusty-mitaka
   (will not upgrade, because it thinks it's firefly -> jewel)
3. juju config ceph source=cloud:trusty-liberty
   (will not upgrade/downgrade, because it thinks it's jewel -> hammer)
4. juju config ceph source=cloud:trusty-mitaka
   (will process the upgrade)
After the mons are done upgrading, the charm sets the osd disks down one at a time while it upgrades the OSDs (each goes from Primary to Stray, then back to Primary after the chown is done).
The ceph-osd process is not actually stopped and started; it just goes stray for a bit.
You must then run "service ceph-mon stop id=0" (or 1 or 2) followed by "service ceph-mon start id=0".
The osd will not start back up, and it leaves the following in the log:
2018-02-08 00:50:36.276517 7f71713e0800  0 set uid:gid to 64045:64045 (ceph:ceph)
2018-02-08 00:50:36.276572 7f71713e0800  0 ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0), process ceph-osd, pid 6534
2018-02-08 00:50:36.277828 7f71713e0800  0 pidfile_write: ignore empty --pid-file
2018-02-08 00:50:36.287126 7f71713e0800  0 filestore(/var/lib/ceph/osd/ceph-0) backend generic (magic 0x2fc12fc1)
2018-02-08 00:50:36.287974 7f71713e0800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2018-02-08 00:50:36.287989 7f71713e0800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2018-02-08 00:50:36.288016 7f71713e0800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice is supported
2018-02-08 00:50:36.290737 7f71713e0800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2018-02-08 00:50:36.293968 7f71713e0800  1 leveldb: Recovering log #8
2018-02-08 00:50:36.309589 7f71713e0800  1 leveldb: Level-0 table #10: started
2018-02-08 00:50:36.323265 7f71713e0800  1 leveldb: Level-0 table #10: 256941 bytes OK
2018-02-08 00:50:36.324833 7f71713e0800  1 leveldb: Delete type=3 #6
2018-02-08 00:50:36.324950 7f71713e0800  1 leveldb: Delete type=0 #8
2018-02-08 00:50:36.328880 7f71713e0800 -1 FileJournal::_open_any: aio not supported without directio; disabling aio
2018-02-08 00:50:36.328932 7f71713e0800  0 filestore(/var/lib/ceph/osd/ceph-0) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2018-02-08 00:50:36.332939 7f71713e0800  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 18: 1073741824 bytes, block size 4096 bytes, directio = 0, aio = 0
2018-02-08 00:50:36.336407 7f71713e0800 -1 os/filestore/FileJournal.h: In function 'virtual FileJournal::~FileJournal()' thread 7f71713e0800 time 2018-02-08 00:50:36.333147
os/filestore/FileJournal.h: 440: FAILED assert(fd == -1)

 ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x5620fd5ee06b]
 2: (()+0x3097d4) [0x5620fcf117d4]
 3: (()+0x6e393a) [0x5620fd2eb93a]
 4: (JournalingObjectStore::journal_replay(un...
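A plausible remediation for a node stuck in this state (my own assumption, not something verified in this report) is to re-own anything still root-owned under the OSD's data directory while the daemon is stopped, then start it again. The sketch below exercises only the re-own step against a scratch directory so it is safe to run anywhere; on a real node DIR would be /var/lib/ceph/osd/ceph-0, the target owner would be ceph:ceph, and the whole thing would be bracketed by "service ceph-osd stop id=0" and "service ceph-osd start id=0".

```shell
# Hedged remediation sketch, demonstrated on a throwaway directory.
# On a real node: DIR=/var/lib/ceph/osd/ceph-0, owner ceph:ceph, and the
# ceph-osd daemon must be stopped first and started again afterwards.
DIR="$(mktemp -d)"
touch "$DIR/obj1" "$DIR/obj2"
# Re-own only entries whose owner is wrong (here: not the current user)
find "$DIR" ! -user "$(id -un)" -exec chown "$(id -un):$(id -gn)" {} +
remaining=$(find "$DIR" ! -user "$(id -un)" | wc -l)
echo "remaining wrong-owner entries: $remaining"
rm -rf "$DIR"
```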