ceilometer group partitioning coordination with tooz+redis+sentinel fails to fail over to new master

Bug #1434043 reported by Chris Dent
This bug affects 1 person
Affects      Status                  Importance  Assigned to  Milestone
Ceilometer   Fix Released            Undecided   Unassigned
tooz         Status tracked in Kilo
  Kilo       Fix Released            Medium      Chris Dent
  Liberty    Fix Released            Medium      Chris Dent

Bug Description

When tooz is configured with multiple sentinels to coordinate group membership for the central (and other) agents, the coordinator fails to switch to the new master redis server after a failover.
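
For reference, the tooz redis backend is pointed at the sentinels through its backend URL. Below is a small standard-library sketch of building and parsing such a URL. It assumes the redis driver's 'sentinel' (name of the monitored master) and repeatable 'sentinel_fallback' (extra sentinel host:port) query parameters; the host names and 'mymaster' are placeholders, and both helper functions are hypothetical, not part of tooz.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sentinel_url(sentinels, master_name):
    """Build a tooz-style redis URL for a sentinel-managed master.

    Assumption: the first sentinel goes in host:port and the others are
    passed as repeated 'sentinel_fallback' query parameters, with
    'sentinel' naming the monitored master.
    """
    (host, port), *fallbacks = sentinels
    query = [("sentinel", master_name)]
    query += [("sentinel_fallback", f"{h}:{p}") for h, p in fallbacks]
    # safe=":" keeps host:port pairs readable instead of percent-encoding them.
    return f"redis://{host}:{port}?{urlencode(query, safe=':')}"


def parse_sentinel_url(url):
    """Recover the list of (host, port) sentinels and the master name."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    sentinels = [(parts.hostname, parts.port)]
    for entry in params.get("sentinel_fallback", []):
        h, _, p = entry.rpartition(":")
        sentinels.append((h, int(p)))
    return sentinels, params["sentinel"][0]


# Placeholder hosts and master name, for illustration only.
url = build_sentinel_url(
    [("sentinel1", 26379), ("sentinel2", 26379), ("sentinel3", 26379)],
    "mymaster",
)
sentinels, master_name = parse_sentinel_url(url)
```

Listing several sentinels matters here: if one sentinel is unreachable, the others can still be asked which server is the current master.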

This appears to happen because there is no retry logic on a ToozConnectionError that would (eventually) lead to tooz.drivers.redis:_make_client being called to query the sentinels for the new master.
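
A minimal simulation of the kind of retry logic meant here (not tooz code): when a call fails with a connection error, ask the sentinel for the current master again and retry. ConnectionLost, FakeServer, FakeSentinel and call_with_failover are hypothetical stand-ins; in real code the exception would be tooz's ToozConnectionError and the re-query would go through something like _make_client.

```python
class ConnectionLost(Exception):
    """Stand-in for tooz's ToozConnectionError in this simulation."""


class FakeServer:
    """A toy redis server that can be taken down."""

    def __init__(self, name):
        self.name = name
        self.up = True

    def ping(self):
        if not self.up:
            raise ConnectionLost(f"{self.name} is unreachable")
        return self.name


class FakeSentinel:
    """A toy sentinel that reports the currently elected master."""

    def __init__(self, master):
        self.master = master

    def discover_master(self):
        return self.master


def call_with_failover(sentinel, client, operation, retries=3):
    """Run operation(client), re-asking the sentinel for the master on failure.

    Returns (result, client) so the caller can keep the refreshed client.
    Re-raises ConnectionLost once the retries are used up.
    """
    for attempt in range(retries + 1):
        try:
            return operation(client), client
        except ConnectionLost:
            if attempt == retries:
                raise
            # The step missing in this report: look up the master again.
            client = sentinel.discover_master()


# Simulated failover: redis-a dies and the sentinel promotes redis-b.
old, new = FakeServer("redis-a"), FakeServer("redis-b")
sentinel = FakeSentinel(old)
client = sentinel.discover_master()
old.up = False
sentinel.master = new
result, client = call_with_failover(sentinel, client, lambda c: c.ping())
```

The fix that was eventually merged did not take this route; it handed master discovery to redis-py's sentinel connection pool instead.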

There is an open question about where the retry logic should go: in ceilometer.coordination, or in the tooz redis driver?

When the redis sentinel code was first written, there was a (now proven mistaken) belief that ceilometer already had retry logic. However, since sentinel handling is quite specific in how it works, and tooz is used for much more than ceilometer, the retry logic should probably go in tooz.

There are some (now out of date) notes that led to this discovery at: https://tank.peermore.com/tanks/cdent-rhat/TestCeiloRedisPackstack#things-dont-work

Julien Danjou (jdanjou)
Changed in python-tooz:
status: New → Triaged
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote: Fix proposed to tooz (master)

Fix proposed to branch: master
Review: https://review.openstack.org/165890

Changed in python-tooz:
assignee: nobody → Chris Dent (chdent)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote:

Fix proposed to branch: master
Review: https://review.openstack.org/166291

OpenStack Infra (hudson-openstack) wrote: Change abandoned on tooz (master)

Change abandoned by Chris Dent (<email address hidden>) on branch: master
Review: https://review.openstack.org/165890
Reason: Abandoned in favor of: I8fd672e664d98097944a7c984cadab5fb08dd2d6

Thanks to harlowj for pointing me in the right direction.

OpenStack Infra (hudson-openstack) wrote: Fix merged to tooz (master)

Reviewed: https://review.openstack.org/166291
Committed: https://git.openstack.org/cgit/openstack/tooz/commit/?id=54d6bb1c94270d2794ecefbcaf3f8832010e3d58
Submitter: Jenkins
Branch: master

commit 54d6bb1c94270d2794ecefbcaf3f8832010e3d58
Author: Chris Dent <email address hidden>
Date: Fri Mar 20 15:50:56 2015 +0000

    Use a sentinel connection pool to manage failover

    When configured to use sentinel with the redis driver, allow the
    redis-py client to manage the connection to the currently elected
    master.

    'master_for' will return a StrictRedis client which is bound to a
    connection pool that queries the Sentinel[s] when providing
    connections from the pool.

    This means that failover handling is automatic as long as the
    sentinels can be reached and they have elected a new master.

    Change-Id: I8fd672e664d98097944a7c984cadab5fb08dd2d6
    Closes-Bug: #1434043
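
The behaviour described in this commit message can be modelled without a running Redis. The toy classes below are hypothetical and are not redis-py's implementation; they only mimic the key property: a client whose pool asks the sentinels for the master address when it hands out a connection keeps working after a failover, while a client bound to a fixed address (roughly the pre-fix setup) keeps pointing at the old master. The addresses and 'mymaster' are placeholders.

```python
class FakeSentinels:
    """Toy sentinel set that reports the address of the elected master."""

    def __init__(self, master_address):
        self.master_address = master_address

    def discover_master(self, service_name):
        # A real sentinel would look up service_name; this toy has one master.
        return self.master_address


class FixedAddressPool:
    """Pool bound to one address when it is created (roughly the pre-fix setup)."""

    def __init__(self, address):
        self.address = address

    def get_connection(self):
        return self.address


class SentinelBackedPool:
    """Pool that asks the sentinels for the master on every checkout."""

    def __init__(self, sentinels, service_name):
        self.sentinels = sentinels
        self.service_name = service_name

    def get_connection(self):
        return self.sentinels.discover_master(self.service_name)


class ToyClient:
    """Stand-in for a StrictRedis client: it only talks through its pool."""

    def __init__(self, pool):
        self.pool = pool

    def current_target(self):
        return self.pool.get_connection()


sentinels = FakeSentinels(("10.0.0.1", 6379))
fixed = ToyClient(FixedAddressPool(sentinels.discover_master("mymaster")))
managed = ToyClient(SentinelBackedPool(sentinels, "mymaster"))

# Failover: the sentinels elect a new master.
sentinels.master_address = ("10.0.0.2", 6379)
```

With redis-py itself, the corresponding call is roughly Sentinel([('sentinel1', 26379)], socket_timeout=0.1).master_for('mymaster', socket_timeout=0.1), using Sentinel from the redis.sentinel module; the host and master name there are placeholders.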

Changed in python-tooz:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote: Fix proposed to tooz (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/167598

OpenStack Infra (hudson-openstack) wrote: Fix merged to tooz (stable/kilo)

Reviewed: https://review.openstack.org/167598
Committed: https://git.openstack.org/cgit/openstack/tooz/commit/?id=01859a03e84e7b5de8dc08bbdff0d458e4130c51
Submitter: Jenkins
Branch: stable/kilo

commit 01859a03e84e7b5de8dc08bbdff0d458e4130c51
Author: Chris Dent <email address hidden>
Date: Fri Mar 20 15:50:56 2015 +0000

    Use a sentinel connection pool to manage failover

    When configured to use sentinel with the redis driver, allow the
    redis-py client to manage the connection to the currently elected
    master.

    'master_for' will return a StrictRedis client which is bound to a
    connection pool that queries the Sentinel[s] when providing
    connections from the pool.

    This means that failover handling is automatic as long as the
    sentinels can be reached and they have elected a new master.

    Change-Id: I8fd672e664d98097944a7c984cadab5fb08dd2d6
    Closes-Bug: #1434043
    (cherry picked from commit 54d6bb1c94270d2794ecefbcaf3f8832010e3d58)

tags: added: in-stable-kilo
Changed in python-tooz:
milestone: none → 0.13.2
status: Fix Committed → Fix Released
Julien Danjou (jdanjou)
Changed in python-tooz:
milestone: 0.13.2 → 0.14.0
no longer affects: ceilometer/kilo
Julien Danjou (jdanjou)
Changed in python-tooz:
milestone: 0.14.0 → 0.13.2
no longer affects: ceilometer/liberty
Chris Dent (cdent)
Changed in ceilometer:
status: New → Fix Released