OSA run: units related to ceph got an empty ceph.conf

Bug #1453940 reported by Andreas Hasenack
This bug affects 2 people
Affects                                  Status        Importance  Assigned to  Milestone
Landscape Server                         Fix Released  High        Chris Glass
15.07                                    Fix Released  High        Chris Glass
Cisco-odl                                Fix Released  High        Chris Glass
Trunk                                    Fix Released  High        Chris Glass
ceph (Juju Charms Collection)            Fix Released  High        Liam Young
cinder (Juju Charms Collection)          Fix Released  High        Liam Young
cinder-ceph (Juju Charms Collection)     Fix Released  High        Liam Young
glance (Juju Charms Collection)          Fix Released  High        Liam Young
nova-compute (Juju Charms Collection)    Fix Released  High        Liam Young

Bug Description

This is almost identical to https://bugs.launchpad.net/charms/+source/cinder/+bug/1453934. The same thing happened, but this time with glance.

After an OSA run, where we relate glance to ceph, glance image upload failed. Further inspection showed that /etc/ceph/ceph.conf on all 3 glance units was basically empty:
"""
###############################################################################
# [ WARNING ]
# glance configuration file maintained by Juju
# local changes may be overwritten.
###############################################################################
[global]
log to syslog =
 err to syslog =
 clog to syslog =
"""

After I manually destroyed the ceph-glance relation and re-created it, /etc/ceph/ceph.conf on the 3 glance units had a valid configuration:
"""
###############################################################################
# [ WARNING ]
# glance configuration file maintained by Juju
# local changes may be overwritten.
###############################################################################
[global]
auth_supported = cephx
 #keyring = /etc/ceph/$cluster.$name.keyring
 keyring = /etc/ceph/ceph.$name.keyring
 mon host = 10.1.4.142 10.1.4.152 10.1.4.158
log to syslog = false
 err to syslog = false
 clog to syslog = false
"""

And glance started to work then.
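
For anyone wanting to check their own units for the same symptom, here is a minimal sketch of a detector for the "basically empty" state shown above. It is not part of any charm; the function names are made up for illustration, and it assumes that a usable client ceph.conf has a non-empty "mon host" setting and no settings with empty values.

```python
def parse_settings(conf_text):
    """Collect 'key = value' pairs, ignoring comments and section headers."""
    settings = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and headers such as [global].
        if not line or line.startswith(("#", ";", "[")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def looks_unrendered(conf_text):
    """True if the file has no mon host or has settings with empty values."""
    settings = parse_settings(conf_text)
    missing_mon = not settings.get("mon host")
    empty = [key for key, value in settings.items() if not value]
    return missing_mon or bool(empty)


# The two states quoted in the bug description (banner comments omitted).
BROKEN = """\
[global]
log to syslog =
 err to syslog =
 clog to syslog =
"""

WORKING = """\
[global]
auth_supported = cephx
 #keyring = /etc/ceph/$cluster.$name.keyring
 keyring = /etc/ceph/ceph.$name.keyring
 mon host = 10.1.4.142 10.1.4.152 10.1.4.158
log to syslog = false
 err to syslog = false
 clog to syslog = false
"""

if __name__ == "__main__":
    print("broken file flagged:", looks_unrendered(BROKEN))
    print("working file flagged:", looks_unrendered(WORKING))
```

The parser is hand-rolled on purpose: the rendered file has indented lines, which Python's configparser would treat as continuations of the previous value.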

Andreas Hasenack (ahasenack) wrote :
description: updated
Chris Holcombe (xfactor973) wrote :

I'm going to try and reproduce this and see where it leads me

Ryan Beisner (1chb1n) wrote :

FYI - adding a bug link, as it's a separate issue in the same chunk of charm-helpers (c-h) code that generates this ceph.conf file.
https://bugs.launchpad.net/charm-helpers/+bug/1468511

Chris Holcombe (xfactor973) wrote :

Repro worked! Tracking down what is going wrong. It's interesting that the ceph client keyring is created successfully but the conf file is not.

Chris Holcombe (xfactor973) wrote :

I think there's a logic bug in there. It looks like if you add-relation, let it fail, remove-relation, let it clean up and then add-relation again, it gets further and then fails on librados permissions.

Ryan Beisner (1chb1n) wrote :

Can we please get the following info on this (and any other potential Ubuntu OpenStack charm bugs)?

* What Ubuntu-OpenStack combo was deployed? e.g. trusty-icehouse, trusty-kilo, etc.
* `juju get <service>` output on the relevant services
* `juju status --format yaml` output [OK, ALREADY ATTACHED]

This information is necessary in order to attempt to reproduce the occurrence.

Thank you.

Ryan Beisner (1chb1n) wrote :

FYI: Trusty-Juno is where the reporter observed this behavior, as indicated in the all-machines log.

unit-neutron-gateway-0[4945]: 2015-05-11 16:50:08 INFO unit.neutron-gateway/0.install logger.go:40 Ign http://ubuntu-cloud.archive.canonical.com trusty-updates/juno InRelease

Alberto Donato (ack) wrote :

We've seen this issue with trusty/kilo.

Liam Young (gnuoy) wrote :

I have not been able to reproduce the bug, as it's a race between ceph and glance. If glance is ready before ceph has posted all the required relation data, then the Ceph context is empty, and when ceph.conf is rendered the data is missing. Although this is not ideal, from what I can see this should only be a transitory state: when ceph posts the full set of data, hooks should fire on the glance unit and re-render ceph.conf fully populated. So the bug is that ceph.conf is not being re-rendered once the context is complete.
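
To make the race concrete, here is a rough sketch of a render step that refuses to write ceph.conf until the context is complete. It is not the charm-helpers code; the function names, the context key names ("auth", "key", "mon_hosts") and the template are assumptions chosen for illustration.

```python
# Hypothetical set of keys a client needs before ceph.conf is usable.
REQUIRED_KEYS = ("auth", "key", "mon_hosts")


def context_is_complete(ctxt, required=REQUIRED_KEYS):
    """True only if every required key is present and non-empty."""
    return all(ctxt.get(name) for name in required)


def render_ceph_conf(ctxt):
    """Return the rendered text, or None while the context is incomplete."""
    if not context_is_complete(ctxt):
        # Ceph has not posted everything yet: skip rendering and rely on a
        # later relation-changed hook to try again.
        return None
    template = "[global]\nauth_supported = {auth}\nmon host = {mon_hosts}\n"
    return template.format(**ctxt)


if __name__ == "__main__":
    # Early hook: only part of the relation data has arrived.
    print(render_ceph_conf({"auth": "cephx"}))
    # Later hook: the full set of data is available.
    print(render_ceph_conf(
        {"auth": "cephx", "key": "secret", "mon_hosts": "10.5.42.205"}))
```

Skipping the render is only safe if a later hook is guaranteed to fire once the data is complete, which is exactly the part that was going wrong here.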

I think I found the cause of the bug, which has another symptom that is easy to reproduce:

1) Deploy ceph and relate to glance (I used this bundle: http://paste.ubuntu.com/12123358/)
2) juju status --format=tabular | grep 'ceph/' | wc -l
   3
3) juju run --unit glance/0 "grep mon /var/lib/charm/glance/ceph.conf" 2>/dev/null
   mon host = 10.5.42.205 10.5.42.206 10.5.42.207
4) juju add-unit ceph
5) Wait for ceph/3 to finish deploying
6) juju status --format=tabular | grep 'ceph/' | wc -l
   4
7) juju run --unit glance/0 "grep mon /var/lib/charm/glance/ceph.conf" 2>/dev/null
   mon host = 10.5.42.205 10.5.42.206 10.5.42.207

The list of mon hosts has not changed. The reason for this is that the glance charm will not render ceph.conf unless it has a broker_rsp from the unit it is talking to, in this case ceph/3. But the only unit that replies with a broker_rsp is the leader, which is (probably) not ceph/3.
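
A toy model of that check, to show why the hook for ceph/3 never triggers a render. This is not the glance charm's code: the function names and the "address" key are hypothetical, ceph/0 is assumed to be the leader, and the address for ceph/3 is invented for the example.

```python
def renders_on_hook(remote_unit, relation_data):
    """Behaviour described above: render only if the unit whose hook is
    firing has itself posted a broker_rsp."""
    return "broker_rsp" in relation_data.get(remote_unit, {})


def renders_if_any_unit_replied(relation_data):
    """Alternative for comparison: accept a broker_rsp from any related unit."""
    return any("broker_rsp" in settings for settings in relation_data.values())


# Relation data as seen from glance/0 after `juju add-unit ceph`.
relation_data = {
    "ceph/0": {"address": "10.5.42.205", "broker_rsp": '{"exit-code": 0}'},
    "ceph/1": {"address": "10.5.42.206"},
    "ceph/2": {"address": "10.5.42.207"},
    "ceph/3": {"address": "10.5.42.208"},  # invented address for the new unit
}

if __name__ == "__main__":
    # The hook for the new unit fires, but ceph/3 is not the leader.
    print(renders_on_hook("ceph/3", relation_data))
    # Looking across all related units would find the leader's reply.
    print(renders_if_any_unit_replied(relation_data))
```

The second function is only there to contrast with the first; it is not a claim about how the charms were eventually fixed.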

Changed in glance (Juju Charms Collection):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Liam Young (gnuoy)
Liam Young (gnuoy) wrote :

Looks like there are four bugs conspiring to cause this:

1) Ceph config is registered with an incomplete context when the identity relation fires
2) CephContext does not stop looping through relids when it has a complete context,
   which means that if ceph/0 has a full set of data but ceph/1 does not, the returned
   context is incomplete
3) Once a glance unit has issued a CephRequests it can never issue another
   one, because the relation data making the request is never cleared or changed,
   so a subsequent request sets the same relation data as was there previously,
   meaning no hooks fire on the ceph side.
4) Once the ceph leader has responded to a CephRequests it can never issue another
   response, because the relation data carrying the response is never cleared or changed,
   so a subsequent response sets the same relation data as was there previously,
   meaning no hooks fire on the client side.
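
Points 2 to 4 can be illustrated with a small simulation. This is a sketch only, not the charm-helpers implementation: the Relation class, the function names, the required keys and the "request-id" field are all hypothetical, and the op shown is just an example payload. It assumes, as described above, that change hooks fire only when relation data actually changes.

```python
import json
import uuid


def first_complete_context(unit_settings, required=("auth", "key")):
    """Point 2: stop at the first unit that provides a complete context."""
    for settings in unit_settings:
        if all(settings.get(name) for name in required):
            return dict(settings)
    return {}


class Relation:
    """Toy relation: a change hook fires only when the stored data changes."""

    def __init__(self):
        self.data = {}
        self.hooks_fired = 0

    def relation_set(self, **settings):
        if any(self.data.get(k) != v for k, v in settings.items()):
            self.data.update(settings)
            self.hooks_fired += 1


def make_request(ops, with_id=False):
    """Serialise a request; optionally tag it so repeats are never identical."""
    request = {"ops": ops}
    if with_id:
        request["request-id"] = str(uuid.uuid4())  # hypothetical field
    return json.dumps(request, sort_keys=True)


if __name__ == "__main__":
    units = [{"auth": "cephx", "key": "secret"}, {"auth": "cephx"}]
    print(first_complete_context(units))

    ops = [{"op": "create-pool", "name": "glance"}]  # example payload
    rel = Relation()
    rel.relation_set(broker_req=make_request(ops))
    rel.relation_set(broker_req=make_request(ops))  # identical: no hook
    print("hooks after two identical requests:", rel.hooks_fired)
    rel.relation_set(broker_req=make_request(ops, with_id=True))
    rel.relation_set(broker_req=make_request(ops, with_id=True))
    print("hooks after two tagged requests:", rel.hooks_fired)
```

The same reasoning applies in the other direction for point 4: a response that is byte-for-byte identical to the previous one changes nothing on the relation, so the client never gets a hook.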

Changed in ceph (Juju Charms Collection):
status: New → In Progress
Changed in cinder (Juju Charms Collection):
status: New → In Progress
Changed in cinder-ceph (Juju Charms Collection):
status: New → In Progress
Changed in nova-compute (Juju Charms Collection):
status: New → In Progress
Changed in ceph (Juju Charms Collection):
importance: Undecided → High
Changed in cinder (Juju Charms Collection):
importance: Undecided → High
Changed in cinder-ceph (Juju Charms Collection):
importance: Undecided → High
Changed in nova-compute (Juju Charms Collection):
importance: Undecided → High
Changed in ceph (Juju Charms Collection):
assignee: nobody → Liam Young (gnuoy)
Changed in cinder (Juju Charms Collection):
assignee: nobody → Liam Young (gnuoy)
Changed in cinder-ceph (Juju Charms Collection):
assignee: nobody → Liam Young (gnuoy)
Changed in nova-compute (Juju Charms Collection):
assignee: nobody → Liam Young (gnuoy)
Changed in ceph (Juju Charms Collection):
milestone: none → 15.10
Changed in cinder (Juju Charms Collection):
milestone: none → 15.10
Changed in cinder-ceph (Juju Charms Collection):
milestone: none → 15.10
Changed in glance (Juju Charms Collection):
milestone: none → 15.10
Changed in nova-compute (Juju Charms Collection):
milestone: none → 15.10
tags: added: backport-potential openstack sts
summary: - OSA run: glance related to ceph got an empty ceph.conf
+ OSA run: units related to ceph got an empty ceph.conf
David Britton (dpb)
tags: added: landscape-release-29
David Britton (dpb)
tags: added: landscape
Liam Young (gnuoy)
Changed in ceph (Juju Charms Collection):
status: In Progress → Fix Committed
Changed in cinder (Juju Charms Collection):
status: In Progress → Fix Committed
Changed in glance (Juju Charms Collection):
status: In Progress → Fix Committed
Changed in nova-compute (Juju Charms Collection):
status: In Progress → Fix Committed
Changed in cinder-ceph (Juju Charms Collection):
status: In Progress → Fix Committed
Liam Young (gnuoy)
Changed in cinder (Juju Charms Collection):
status: Fix Committed → Fix Released
Changed in ceph (Juju Charms Collection):
status: Fix Committed → Fix Released
Changed in glance (Juju Charms Collection):
status: Fix Committed → Fix Released
Changed in nova-compute (Juju Charms Collection):
status: Fix Committed → Fix Released
Changed in cinder-ceph (Juju Charms Collection):
status: Fix Committed → Fix Released
David Britton (dpb)
tags: removed: landscape-release-29