inconsistent suffix hashes after ssync replication of a tombstone
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Object Storage (swift) | Fix Released | Undecided | Unassigned |
Bug Description
When an object dir has a .data and a .meta file (so object_
Cut and pasted from the commit message of the patch that will be proposed:
Consider two replicas of the same object whose ondisk files
have diverged due to failures:
A has t2.ts
B has t1.data, t4.meta
(The DELETE at t2 did not make it to B. The POST at t4 was
rejected by A.)
After ssync replication the two ondisk file sets will not be
consistent:
A has t2.ts (ssync cannot POST t4.meta to this node)
B has t2.ts, t4.meta (ssync should not delete t4.meta,
because it may become useful should a t3.data be replicated in future)
Consequently the two nodes will report different hashes for the
object's suffix, and replication will repeat, always with the
inconsistent outcome. This scenario is reproduced by the probe
test added in this patch.
(Note that rsync replication does result in (t2.ts, t4.meta)
on both nodes.)
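The failure mode can be illustrated with a minimal Python sketch of the pre-patch scheme. This is a simplification, not Swift's code. It hashes the names of the files in a single object dir with md5, whereas Swift combines every object dir in a suffix. The function name `old_suffix_hash` and the `t<N>` file names, taken from the scenario above, are illustrative assumptions.

```python
import hashlib


def old_suffix_hash(files):
    """Simplified pre-patch scheme: md5 over the name of every file
    in one object dir, whether or not the file is useful."""
    md5 = hashlib.md5()
    for name in sorted(files):
        md5.update(name.encode('ascii'))
    return md5.hexdigest()


# On-disk state after ssync replication, in the 't<N>' notation above.
node_a = ['t2.ts']
node_b = ['t2.ts', 't4.meta']

# The hashes differ, so every replication pass sees a mismatch.
print(old_suffix_hash(node_a) == old_suffix_hash(node_b))  # False
```

The name t4.meta contributes to the hash on B but not on A. The two replicas therefore never agree, and replication keeps repeating with the same outcome.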
The solution is to change the way that suffix hashes are
calculated. Currently the names of *all* files found in each
object dir are added to the hash. With this patch the
timestamps of only those files that could be used to
construct a valid diskfile are added to the hash. File
extensions are appended to the timestamp so that in most
'normal' situations the result of the hashing is the same
as before this patch. That avoids a storm of hash mismatches
when this patch is deployed in an existing cluster.
In the problem case described above, t4.meta is no longer
added to the hash, since it is not useful for constructing
a diskfile. (Note that t4.meta is not deleted because it
may become useful should a t3.data be replicated in future).
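The patched approach can be sketched in the same way. This is a hypothetical, simplified model rather than the patch itself. The names `parse`, `useful_files`, `old_suffix_hash` and `new_suffix_hash` are illustrative. In addition, `useful_files` applies an assumed rule in place of Swift's full ondisk-file logic: the newest .data or .ts is the base, and a .meta counts only when it is newer than a .data base. The sketch checks both claims made above. In the problem case, t4.meta drops out of the hash. In a normal case, the hash is unchanged.

```python
import hashlib


def parse(name):
    """Split 't4.meta' into (4, '.meta'); 't<N>' is the notation used above."""
    stem, _, ext = name.rpartition('.')
    return int(stem[1:]), '.' + ext


def useful_files(files):
    """Return the (timestamp, ext) pairs that could construct a diskfile.

    Assumed, simplified rule: the newest .data or .ts is the base, and the
    newest .meta is useful only if it is newer than a .data base. A .meta
    next to a newer tombstone is ignored here but stays on disk.
    """
    parsed = [parse(name) for name in files]
    bases = [p for p in parsed if p[1] in ('.data', '.ts')]
    if not bases:
        return []
    base = max(bases)
    useful = [base]
    if base[1] == '.data':
        metas = [p for p in parsed if p[1] == '.meta' and p[0] > base[0]]
        if metas:
            useful.append(max(metas))
    return useful


def old_suffix_hash(files):
    """Simplified pre-patch scheme: md5 over every file name."""
    md5 = hashlib.md5()
    for name in sorted(files):
        md5.update(name.encode('ascii'))
    return md5.hexdigest()


def new_suffix_hash(files):
    """Hash timestamp + extension of the useful files only."""
    useful = ['t%d%s' % (ts, ext) for ts, ext in useful_files(files)]
    return old_suffix_hash(useful)


# Problem case: t4.meta no longer feeds the hash, so A and B agree.
print(new_suffix_hash(['t2.ts']) == new_suffix_hash(['t2.ts', 't4.meta']))  # True

# Normal case: every file is useful, so the hash matches the old scheme.
normal = ['t1.data', 't4.meta']
print(new_suffix_hash(normal) == old_suffix_hash(normal))  # True
```

The sketch rebuilds each name as timestamp plus extension and then reuses the old hashing step. As a result, its hash is identical to the old one whenever every file in the dir is useful. Only dirs that hold unusable files, such as B's t4.meta, hash differently. Like the patch, the sketch filters only what is hashed and deletes nothing.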
Fix proposed here: https://review.openstack.org/#/c/267788