Redis switchover breaks the coordination for Gnocchi
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tooz | In Progress | Undecided | Gabor Orosz |
Bug Description
Reproduction:
Given three OpenStack controllers, each running a single Redis instance, with the instances configured in master-slave mode. HAProxy directs sessions to the current master instance. Pacemaker manages the Redis instances, HAProxy, its network namespace, and the virtual IP. Gnocchi runs on all controllers and is configured to use Redis as its coordination backend through the virtual IP.
1. Trigger a graceful switchover of the Redis service by banning the current Redis master instance. Pacemaker demotes the master instance and promotes a slave node to become the new master.
2. As a result, the gnocchi-metricd workers are disconnected, and some of them start reporting errors like the following once they have re-established the connection to Redis:
2019-08-
Traceback (most recent call last):
File "/usr/lib/
work()
File "/usr/lib/
return self.callback(
File "/usr/lib/
return f(*args, **kwargs)
File "/usr/lib/
self.coord.
File "/usr/lib/
result = super(RedisDriver, self).run_
File "/usr/lib/
MemberLeftGroup
File "/usr/lib/
return list(map(lambda cb: cb(*args, **kwargs), self))
File "/usr/lib/
return list(map(lambda cb: cb(*args, **kwargs), self))
File "/usr/lib/
self.ring.
File "/usr/lib/
raise UnknownNode(node)
UnknownNode: Unknown node `fc3584da-
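The last frames of the traceback show a `MemberLeftGroup` callback reaching into the hash ring, which then raises `UnknownNode` for a member it does not hold. The sketch below is a minimal, hypothetical illustration of that failure mode only: `MiniHashRing` and `on_member_leave` are made-up names, not tooz's actual `HashRing` or partitioner code. It assumes that after the failover a worker processes a "member left" event for a member that is no longer (or was never) in its local ring, for example a duplicate or stale event.

```python
import bisect
import hashlib


class UnknownNode(Exception):
    """Raised when removing a node the ring does not contain."""


class MiniHashRing:
    """Hypothetical, simplified consistent hash ring (not tooz's HashRing)."""

    def __init__(self, replicas=32):
        self.replicas = replicas
        self.nodes = set()
        self._ring = {}   # hash value -> node
        self._keys = []   # sorted hash values

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        self.nodes.add(node)
        for i in range(self.replicas):
            h = self._hash(f"{node}-{i}")
            self._ring[h] = node
            bisect.insort(self._keys, h)

    def remove_node(self, node):
        # Strict behavior: an unknown node is treated as an error.
        if node not in self.nodes:
            raise UnknownNode(node)
        self.nodes.remove(node)
        for i in range(self.replicas):
            h = self._hash(f"{node}-{i}")
            del self._ring[h]
            self._keys.remove(h)

    def get_node(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[self._keys[idx]]


def on_member_leave(ring, member_id):
    # Naive handler: assumes every "left" event names a node currently in the ring.
    ring.remove_node(member_id)


ring = MiniHashRing()
for member in ("worker-a", "worker-b", "worker-c"):
    ring.add_node(member)

on_member_leave(ring, "worker-b")  # first event for worker-b: succeeds

caught = None
try:
    # Assumed scenario: the same departure is delivered again after reconnecting.
    on_member_leave(ring, "worker-b")
except UnknownNode as exc:
    caught = exc
    print(f"UnknownNode: {exc}")
```

Because `remove_node` is strict, a single repeated or out-of-order membership event is enough to abort the worker's callback chain.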
The same issue has been reported in the following Gnocchi ticket:
https:/
However, our troubleshooting indicates that this is a bug in the tooz library's HashRing and coordination implementation.
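One way a membership consumer could tolerate such events is sketched below. This is a hypothetical illustration, not the change proposed in the linked review: `NodeSet`, `on_member_leave_tolerant`, and `resync` are made-up names, the ring is reduced to a membership set, and how the authoritative member list is fetched after a reconnect is assumed and left out.

```python
class UnknownNode(Exception):
    """Raised when removing a node the ring does not contain."""


class NodeSet:
    """Hypothetical stand-in for a hash ring: only tracks membership."""

    def __init__(self):
        self.nodes = set()

    def add_node(self, node):
        self.nodes.add(node)

    def remove_node(self, node):
        if node not in self.nodes:
            raise UnknownNode(node)
        self.nodes.remove(node)


def on_member_leave_tolerant(ring, member_id):
    # A "left" event for an unknown member becomes a no-op instead of an error.
    try:
        ring.remove_node(member_id)
    except UnknownNode:
        pass


def resync(ring, current_members):
    # Reconcile the ring with the authoritative member list (e.g. after reconnecting),
    # rather than relying only on incremental join/leave events.
    current = set(current_members)
    for node in sorted(ring.nodes - current):
        ring.remove_node(node)
    for node in sorted(current - ring.nodes):
        ring.add_node(node)


ring = NodeSet()
resync(ring, ["worker-a", "worker-b", "worker-c"])
on_member_leave_tolerant(ring, "worker-b")
on_member_leave_tolerant(ring, "worker-b")  # duplicate/stale event: ignored
resync(ring, ["worker-a", "worker-c", "worker-d"])  # membership seen on the new master
print(sorted(ring.nodes))  # ['worker-a', 'worker-c', 'worker-d']
```

Full reconciliation makes the ring's state depend on the current group membership instead of on every event having been delivered exactly once, which is the property a failover can break.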
Fix proposed to branch: master
Review: https://review.opendev.org/678842