nova-compute scale out always results in service restart if related with ceph

Bug #1694963 reported by Edward Hope-Morley
This bug affects 2 people
Affects: OpenStack Nova Compute Charm
Status: Fix Released
Importance: High
Assigned to: Edward Hope-Morley

Bug Description

When related with Ceph, the nova-compute charm checks whether its local copy of the client key matches the one provided by Ceph. If it doesn't, the charm overwrites the local key and restarts nova-compute. The trouble is that this happens every time, because the check is broken: it does not strip whitespace from the key extracted from the virsh secret before comparing.

UPDATE: the description above is not the complete problem; see comment #5 below.
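
A minimal Python sketch of the comparison problem, assuming the stored key is read with `virsh secret-get-value`, whose output ends with a newline. The helper names below are hypothetical and this is not the charm's actual code; it only shows why an unstripped comparison never matches and how stripping fixes it.

```python
import subprocess


def get_virsh_secret_value(secret_uuid: str) -> str:
    """Return the raw output of `virsh secret-get-value` (hypothetical helper)."""
    # check_output returns bytes; the printed value ends with a newline.
    return subprocess.check_output(
        ["virsh", "secret-get-value", secret_uuid]
    ).decode("utf-8")


def key_matches_broken(virsh_value: str, ceph_key: str) -> bool:
    # Buggy check: the trailing newline from virsh makes this always False,
    # so the key looks "changed" and nova-compute gets restarted every time.
    return virsh_value == ceph_key


def key_matches(virsh_value: str, ceph_key: str) -> bool:
    # Fixed check: strip surrounding whitespace/newlines before comparing.
    return virsh_value.strip() == ceph_key.strip()


if __name__ == "__main__":
    key = "AQAmvzdZ4ekkFhAAn1KrnC5qT9MA7HAl0Ymvnw=="
    raw = key + "\n"  # what virsh prints for the same key
    print(key_matches_broken(raw, key))  # False
    print(key_matches(raw, key))         # True
```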


description: updated
Edward Hope-Morley (hopem) wrote :
Changed in charm-nova-compute:
assignee: nobody → Edward Hope-Morley (hopem)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (master)

Reviewed: https://review.openstack.org/469844
Committed: https://git.openstack.org/cgit/openstack/charm-nova-compute/commit/?id=19cc5ff0942b6932587f1148d151a3a085926a39
Submitter: Jenkins
Branch: master

commit 19cc5ff0942b6932587f1148d151a3a085926a39
Author: Edward Hope-Morley <email address hidden>
Date: Thu Jun 1 12:27:15 2017 +0100

    Fix ceph client virsh secret check

    Previously we were not stripping whitespace or newlines
    from the value returned from virsh secret-get-value which
    resulted in the key comparison always returning False and
    services being restarted.

    Change-Id: If465183860dc9d6e13c81d363791c06c0e5adb76
    Closes-Bug: 1694963

Changed in charm-nova-compute:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (stable/17.02)

Fix proposed to branch: stable/17.02
Review: https://review.openstack.org/470186

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (stable/17.02)

Reviewed: https://review.openstack.org/470186
Committed: https://git.openstack.org/cgit/openstack/charm-nova-compute/commit/?id=22f2d4e0fee4e229ac3de1c40202f2a2539d987e
Submitter: Jenkins
Branch: stable/17.02

commit 22f2d4e0fee4e229ac3de1c40202f2a2539d987e
Author: Edward Hope-Morley <email address hidden>
Date: Thu Jun 1 12:27:15 2017 +0100

    Fix ceph client virsh secret check

    Previously we were not stripping whitespace or newlines
    from the value returned from virsh secret-get-value which
    resulted in the key comparison always returning False and
    services being restarted.

    Includes charm-helpers rev 708 required to get amulet
    unstuck.

    Change-Id: If465183860dc9d6e13c81d363791c06c0e5adb76
    Closes-Bug: 1694963
    (cherry picked from commit 19cc5ff0942b6932587f1148d151a3a085926a39)

Edward Hope-Morley (hopem) wrote :

I'm re-opening this bug as it appears that the previous fix is not sufficient to resolve the problem.

When nova-compute is related with ceph, it requests resources such as keys and pools using the ceph broker API. When a request is answered, the requesting unit receives a response that echoes the request id it sent. The problem is that these responses go to all nova-compute units: each time a new nova-compute unit relates with ceph and sends a request, its response is sent to all units. The units currently have no way to know whether they have already received their own response, so they all blindly restart nova-compute (see code at [1]).

For example, a request looks like:

    ~$ juju run --unit nova-compute/0 'relation-get -r `relation-ids ceph` - nova-compute/0'
    broker_req: '{"api-version": 1, "request-id": "f1c63e45-4b5e-11e7-8a92-fa163e37c682",
    "ops": []}'
    private-address: 10.5.59.164

and a response looks like:

    ~$ juju run --unit nova-compute/0 'relation-get -r `relation-ids ceph` - ceph/2'
    auth: cephx
    broker-rsp-nova-compute-0: '{"request-id": "f1c63e45-4b5e-11e7-8a92-fa163e37c682",
      "exit-code": 0}'
    broker-rsp-nova-compute-1: '{"request-id": "457b8757-4b5f-11e7-a324-fa163e2db5b7",
      "exit-code": 0}'
    broker-rsp-nova-compute-2: '{"request-id": "31c08f4b-4b5f-11e7-9756-fa163e7e11e1",
      "exit-code": 0}'
    broker-rsp-nova-compute-3: '{"request-id": "cfc17751-4b63-11e7-976d-fa163e6cc737",
      "exit-code": 0}'
    broker-rsp-nova-compute-4: '{"request-id": "0dab0cb5-4b6b-11e7-8e4b-fa163e053776",
      "exit-code": 0}'
    broker-rsp-nova-compute-4: '{"request-id": "3ed7d4d9-4b7b-11e7-a2b9-fa163e70844e",
      "exit-code": 0}'
    broker_rsp: '{"request-id": "3ed7d4d9-4b7b-11e7-a2b9-fa163e70844e", "exit-code": 0}'
    ceph-public-address: 10.5.59.150
    key: AQAmvzdZ4ekkFhAAn1KrnC5qT9MA7HAl0Ymvnw==
    private-address: 10.5.59.150

In this case, the response to the request from the new unit nova-compute-4 has been sent to all compute units, which results in all of them restarting the nova-compute service.
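
As a hedged sketch of how a unit could pick out its own reply, assume the ceph relation data has already been read into a Python dict (for example from `relation-get` output like the one above). The helpers `broker_rsp_key` and `own_response` are hypothetical and not taken from the charm. The point is that only the per-unit `broker-rsp-<unit>` key, matched against the unit's own request id, tells a unit that its request was answered; the shared `broker_rsp` key holds whichever response was written last.

```python
import json
from typing import Optional


def broker_rsp_key(unit_name: str) -> str:
    # "nova-compute/4" -> "broker-rsp-nova-compute-4", matching the key
    # naming in the relation data shown above.
    return "broker-rsp-" + unit_name.replace("/", "-")


def own_response(relation_data: dict, unit_name: str,
                 request_id: str) -> Optional[dict]:
    """Return this unit's broker response, or None if it is not there yet."""
    raw = relation_data.get(broker_rsp_key(unit_name))
    if raw is None:
        return None
    rsp = json.loads(raw)
    # A response carrying a different request-id answers some other
    # (older) request, so it is not treated as ours.
    if rsp.get("request-id") != request_id:
        return None
    return rsp


if __name__ == "__main__":
    data = {
        "broker-rsp-nova-compute-0":
            '{"request-id": "f1c63e45-4b5e-11e7-8a92-fa163e37c682", "exit-code": 0}',
        "broker_rsp":
            '{"request-id": "3ed7d4d9-4b7b-11e7-a2b9-fa163e70844e", "exit-code": 0}',
    }
    print(own_response(data, "nova-compute/0",
                       "f1c63e45-4b5e-11e7-8a92-fa163e37c682"))
```

Note that receiving its own response is necessary but not sufficient: the unit still has to remember that it already acted on that response, otherwise every later relation change re-triggers the restart.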

[1] https://github.com/openstack/charm-nova-compute/blob/master/hooks/nova_compute_hooks.py#L402

Changed in charm-nova-compute:
status: Fix Committed → Triaged
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (master)

Fix proposed to branch: master
Review: https://review.openstack.org/471869

Changed in charm-nova-compute:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (master)

Reviewed: https://review.openstack.org/471869
Committed: https://git.openstack.org/cgit/openstack/charm-nova-compute/commit/?id=cff1cfa5a420963411adae9c6d21be56225d18e7
Submitter: Jenkins
Branch: master

commit cff1cfa5a420963411adae9c6d21be56225d18e7
Author: Edward Hope-Morley <email address hidden>
Date: Wed Jun 7 18:39:10 2017 +0100

    Prevent repeated nova-compute service restart

    Once a ceph broker request has completed and the
    nova-compute service has been restarted as a result,
    ensure that this does not get retriggered when the
    ceph relation fires as a result of a peer unit's data
    coming onto the wire and no new information is provided
    to the local unit.

    Change-Id: Ie359a0ec9af7edfb9d453dcf4dbd9880af324d37
    Closes-Bug: 1694963
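
A minimal sketch of the idea in this commit, assuming the unit can persist the id of the last broker request it has acted on between hook executions. This is not the code from the review above; `should_restart` and the `last-handled-request-id` key are hypothetical, and a plain dict stands in for whatever persistent per-unit storage is actually used.

```python
import json
from typing import MutableMapping


def should_restart(rsp_json: str, local_request_id: str,
                   state: MutableMapping[str, str]) -> bool:
    """Decide whether a ceph-relation hook run should restart nova-compute.

    `state` stands in for hypothetical persistent storage that survives
    between hook executions.
    """
    rsp = json.loads(rsp_json)
    # Not a successful answer to this unit's own request: nothing to do.
    if rsp.get("request-id") != local_request_id or rsp.get("exit-code") != 0:
        return False
    # Already restarted for this completed request in an earlier hook run,
    # e.g. the hook fired again only because a peer unit's data changed.
    if state.get("last-handled-request-id") == local_request_id:
        return False
    state["last-handled-request-id"] = local_request_id
    return True


if __name__ == "__main__":
    state = {}
    rsp = '{"request-id": "f1c63e45-4b5e-11e7-8a92-fa163e37c682", "exit-code": 0}'
    req_id = "f1c63e45-4b5e-11e7-8a92-fa163e37c682"
    print(should_restart(rsp, req_id, state))  # True: first completion
    print(should_restart(rsp, req_id, state))  # False: no new information
```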

Changed in charm-nova-compute:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (stable/17.02)

Fix proposed to branch: stable/17.02
Review: https://review.openstack.org/479340

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (stable/17.02)

Reviewed: https://review.openstack.org/479340
Committed: https://git.openstack.org/cgit/openstack/charm-nova-compute/commit/?id=a697c61377c356e1fc95099bc49ab71edb8db136
Submitter: Jenkins
Branch: stable/17.02

commit a697c61377c356e1fc95099bc49ab71edb8db136
Author: Edward Hope-Morley <email address hidden>
Date: Wed Jun 7 18:39:10 2017 +0100

    Prevent repeated nova-compute service restart

    Once a ceph broker request has completed and the
    nova-compute service has been restarted as a result,
    ensure that this does not get retriggered when the
    ceph relation fires as a result of a peer unit's data
    coming onto the wire and no new information is provided
    to the local unit.

    Change-Id: Ie359a0ec9af7edfb9d453dcf4dbd9880af324d37
    Closes-Bug: 1694963
    (cherry picked from commit cff1cfa5a420963411adae9c6d21be56225d18e7)

James Page (james-page)
Changed in charm-nova-compute:
status: Fix Committed → Fix Released