workload-status is wrong

Bug #1517940 reported by Edward Hope-Morley
This bug affects 5 people
Affects: ceph (Juju Charms Collection)
Status: Fix Released
Importance: High
Assigned to: Liam Young
Nominated for Trusty by Adam Collard

Bug Description

ubuntu@hopem-bastion:~/ceph-rados-gateway$ juju status ceph-radosgw| grep -E -A3 "^ workload-status:"
        workload-status:
          current: blocked
          message: 'Missing relations: mon'
          since: 19 Nov 2015 14:39:32Z
--
        workload-status:
          current: blocked
          message: 'Missing relations: mon'
          since: 19 Nov 2015 14:39:32Z
--
        workload-status:
          current: blocked
          message: 'Missing relations: mon'
          since: 19 Nov 2015 14:39:33Z
ubuntu@hopem-bastion:~/ceph-rados-gateway$ juju status ceph-radosgw| grep -A1 " mon:"
      mon:
      - ceph
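
For context, a charm normally derives this kind of blocked message in its status-assessment code by checking which required relations are joined. A minimal sketch of that pattern using charmhelpers' hookenv (the function name and required-interface list here are illustrative, not the actual ceph-radosgw code):

# Illustrative sketch only; not the ceph-radosgw charm's real code.
from charmhelpers.core.hookenv import relation_ids, status_set

REQUIRED_INTERFACES = ['mon']  # assumed: relations this unit needs to function

def assess_status():
    missing = [iface for iface in REQUIRED_INTERFACES if not relation_ids(iface)]
    if missing:
        # This branch produces a status like "blocked / Missing relations: mon".
        status_set('blocked', 'Missing relations: {}'.format(', '.join(missing)))
    else:
        status_set('active', 'Unit is ready')

The problem reported here is that the unit stays in the blocked state even though juju status shows the mon relation does exist.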

Revision history for this message
James Page (james-page) wrote :

Did this resolve? I.e., was it a transitory problem?

Revision history for this message
Edward Hope-Morley (hopem) wrote :

Nope. All units had been idle for a while.

James Page (james-page)
Changed in ceph-radosgw (Juju Charms Collection):
milestone: 16.01 → 16.04
Revision history for this message
Ursula Junque (ursinha) wrote :

This just happened to me while deploying OpenStack. I've been waiting for ~20 mins and it remains like this: ceph-mon is ready and the ceph-radosgw units are blocked with missing mon relations. I'm using cs:trusty/ceph-radosgw-19 and cs:~chris.macnaughton/trusty/ceph-mon-7.

Ursula Junque (ursinha)
tags: added: landscape
Ursula Junque (ursinha)
tags: added: kanban-cross-team
tags: removed: kanban-cross-team
Revision history for this message
Liam Young (gnuoy) wrote :

I can't reproduce this. I've tried adding the relations to ceph-radosgw in various orders but it always seems to work. If someone hits this again please update the bug with:

* Juju version
* Output of "juju status-history <unit>" for each ceph-radosgw unit
* Logs from /var/log/juju on the ceph-radosgw units
* Relation data being passed from ceph to the gateway node. You can capture this with juju run, something like:

ceph_units=$(juju status ceph --format=oneline | awk '/ceph/ {gsub(/:/,""); print $2}' | paste -sd " ")
ceph_radosgw_units=$(juju status ceph-radosgw --format=oneline | awk '/ceph/ {gsub(/:/,""); print $2}' | paste -sd " ")
for rgw_unit in $ceph_radosgw_units; do
    echo "${rgw_unit}:"
    for ceph_unit in $ceph_units; do
        echo " $ceph_unit"
        juju run --unit ${rgw_unit} "relation-get -r \$(relation-ids mon) - $ceph_unit"
        echo ""
    done
done

Please make sure you obfuscate any sensitive data from the above command.

Changed in ceph-radosgw (Juju Charms Collection):
status: New → Incomplete
Revision history for this message
Adam Collard (adam-collard) wrote :

Liam - note that we only started seeing this when using the ceph-mon charm[0]

[0] https://jujucharms.com/u/chris.macnaughton/ceph-mon/trusty/

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

I have a deployment up with this problem. I'm gathering the info you requested, but if you are around it can still be debugged live.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

juju 1.25.3

status-history for ceph-radosgw/0 (only one unit): http://pastebin.ubuntu.com/15090506/
/var/log/juju/* for ceph-radosgw/0: ceph-radosgw-0-juju-logs.tar.bz2 (attached)
juju status: attached as juju-status.txt
juju status tabular: attached as juju-status-tabular.txt

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

This topology has ceph-mon, ceph-osd and ceph-radosgw. I changed your script to get the relation data between ceph-mon and ceph-radosgw (http://pastebin.ubuntu.com/15090591/), and here is the (uninteresting?) output:

ceph-radosgw/0:
 ceph-mon/0
private-address: 10.96.4.38

 ceph-mon/1
private-address: 10.96.5.54

 ceph-mon/2
private-address: 10.96.4.130

Changed in ceph-radosgw (Juju Charms Collection):
status: Incomplete → Confirmed
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

And in case it's not obvious at first: it's not just that the workload-status message is wrong, the service is really non-functional.

Liam Young (gnuoy)
Changed in ceph-radosgw (Juju Charms Collection):
assignee: nobody → Liam Young (gnuoy)
Revision history for this message
Liam Young (gnuoy) wrote :

I can reproduce this by deploying the full stack with one too few ceph-mon units and then adding the remaining unit once the other services are deployed and related, i.e.:

$ cat cephradosgw.yaml
openstack-services:
  services:
    mysql:
      branch: lp:~openstack-charmers/charms/trusty/percona-cluster/next
      constraints: mem=1G
      options:
        dataset-size: 50%
    ceph-mon:
      charm: cs:~chris.macnaughton/trusty/ceph-mon-10
      num_units: 2
      constraints: mem=1G
      options:
        monitor-count: 3
        fsid: 6547bd3e-1397-11e2-82e5-53567c8d32dc
        monitor-secret: AQCXrnZQwI7KGBAAiPofmKEXKxu5bUzoYLVkbQ==

    ceph-osd:
      branch: lp:~openstack-charmers/charms/trusty/ceph-osd/next
      num_units: 3
      constraints: mem=1G
      options:
        osd-devices: /dev/vdb
        osd-reformat: "yes"
        ephemeral-unmount: /mnt
    keystone:
      branch: lp:~openstack-charmers/charms/trusty/keystone/next
      constraints: mem=1G
      options:
        admin-password: openstack
        admin-token: ubuntutesting
    ceph-radosgw:
      branch: lp:~openstack-charmers/charms/trusty/ceph-radosgw/next
  relations:
    - [ keystone, mysql ]
    - [ ceph-radosgw, keystone ]
    - [ ceph-radosgw, ceph-mon ]
    - [ ceph-mon, ceph-osd ]
trusty-liberty:
  inherits: openstack-services
  series: trusty
  overrides:
    openstack-origin: cloud:trusty-liberty
    source: cloud:trusty-liberty

$ juju-deployer -c cephradosgw.yaml trusty-liberty
$ juju add-unit ceph-mon

Revision history for this message
Liam Young (gnuoy) wrote :

So, it looks like we have two bugs at play here.

The first is that mon-relation-changed calls notify_radosgws(), which calls radosgw_relation() with only a relid and no remote unit. Since we are not running in the context of the radosgw relation, settings = relation_get(rid=relid) returns the wrong data, which means the create-pool request from radosgw is missed.
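
In other words, outside a radosgw relation hook, relation_get needs both the relation id and an explicit remote unit. A rough sketch of the shape of the fix, assuming charmhelpers' hookenv (the function names are taken from the description above; the bodies are paraphrased, not the charm's actual code):

from charmhelpers.core.hookenv import related_units, relation_get, relation_ids

def notify_radosgws():
    # Re-drive the radosgw relation for every remote unit, not just each relid.
    for relid in relation_ids('radosgw'):
        for unit in related_units(relid):
            radosgw_relation(relid=relid, unit=unit)

def radosgw_relation(relid=None, unit=None):
    # With an explicit unit, relation_get returns that unit's settings even
    # when we are not running inside a radosgw-relation-* hook, so the
    # create-pool request from radosgw is no longer missed.
    settings = relation_get(rid=relid, unit=unit)
    # ... act on the create-pool request in settings ...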

The second, which I'm still looking at, is that at the point when the last mon-relation-changed hook fires, ceph.is_quorum() is false.

James Page (james-page)
Changed in ceph-radosgw (Juju Charms Collection):
status: Confirmed → In Progress
affects: ceph-radosgw (Juju Charms Collection) → ceph (Juju Charms Collection)
Changed in ceph (Juju Charms Collection):
status: In Progress → Fix Committed
Revision history for this message
James Page (james-page) wrote :

I've merged the two proposed changes for ceph and ceph-mon; however, I think the same problem might exist in the client_relation_changed bits - these never get re-executed in the event that the hook fires prior to quorum.
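
The pattern being suggested is that, once quorum is reached, the mon hook should re-drive the client relations whose hooks may have fired (and returned without doing anything) earlier. A hedged sketch, assuming the charm's local ceph helper module and charmhelpers' hookenv (notify_client and client_relation are illustrative names based on this discussion, not verified against the charm):

import ceph  # the charm's local helper module referenced above
from charmhelpers.core.hookenv import related_units, relation_ids

def mon_relation_changed():
    # ... existing monitor bootstrap handling ...
    if ceph.is_quorum():
        # Re-notify clients whose hooks fired before the monitors were ready.
        notify_client()

def notify_client():
    for relid in relation_ids('client'):
        for unit in related_units(relid):
            client_relation(relid=relid, unit=unit)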

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Could we have a backport of these fixes into 16.01, or rather, the charm store?

James Page (james-page)
Changed in ceph (Juju Charms Collection):
status: Fix Committed → Fix Released