Replicator/reconstructor can't rehash partitions on full drives

Bug #1491676 reported by Caleb Tennis
This bug affects 2 people
Affects: OpenStack Object Storage (swift)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

If a disk is completely full but is missing a tombstone, the reconstructor will try to push one over using some combination of REPLICATE and DELETE (both of which fail):

Sep 3 00:46:43 localhost object-server: ERROR __call__ error with DELETE /d4/484/AUTH_user1/ssbench_000072_default_policy/large_000088 :
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/swift/obj/server.py", line 956, in __call__
    res = method(req)
  File "/usr/local/lib/python2.7/dist-packages/swift/common/utils.py", line 2671, in wrapped
    return func(*a, **kw)
  File "/usr/local/lib/python2.7/dist-packages/swift/common/utils.py", line 1208, in _timing_stats
    resp = func(ctrl, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/swift/obj/server.py", line 899, in DELETE
    disk_file.delete(req_timestamp)
  File "/usr/local/lib/python2.7/dist-packages/swift/obj/diskfile.py", line 1849, in delete
    with self.create() as deleter:
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/swift/obj/diskfile.py", line 1784, in create
    raise DiskFileNoSpace()
DiskFileNoSpace

Sep 3 00:46:43 localhost object-server: ERROR __call__ error with REPLICATE /d4/484/195-2a8-66b-9ee : [Errno 28] No space left on device: '/srv/node/d4/objects-1/484/tmprajHAV.tmp'

This may not be a bug per se, but since it is doing two operations here, it seems like it should be smarter and skip the second one when the first has already hit the no-space error.

Note that nothing exists for AUTH_user1/ssbench_000072_default_policy/large_000088 in /d4/484. I'm not sure why it's actually trying to DELETE it (maybe to create a tombstone?). The actual object itself was deleted hours ago and all that's left are other tombstones in the system.
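To illustrate the suggested behaviour, here is a minimal sketch in Python (the client object and its method names are hypothetical, not Swift's actual reconstructor API) of attempting the tombstone DELETE first and skipping the follow-up REPLICATE when the drive is already full:

import errno

def push_tombstone(client, device, partition, obj_path, timestamp):
    # Try to land the .ts tombstone on the remote drive first.
    try:
        client.delete(device, partition, obj_path, timestamp)
    except OSError as err:
        if err.errno == errno.ENOSPC:
            # The drive is full; a REPLICATE would only trigger a rehash
            # that fails the same way, so skip it.
            return False
        raise
    # Only ask the remote to rehash once the tombstone actually landed.
    client.replicate(device, partition)
    return True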

Tags: ec
Caleb Tennis (ctennis) wrote:

A side effect of this bug is that full EC drives never free up space again.

I have a node with 8 drives, and I took 5 of them out of the ring in order to force more data onto the others; this filled up the remaining 3. Then I put the 5 back into the ring. 12 hours later the data distribution looks like:

ubuntu@ip-172-30-3-43:~$ df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/xvda1 8115168 1524844 6155048 20% /
none 4 0 4 0% /sys/fs/cgroup
udev 2018460 12 2018448 1% /dev
tmpfs 404684 500 404184 1% /run
none 5120 0 5120 0% /run/lock
none 2023420 60 2023360 1% /run/shm
none 102400 0 102400 0% /run/user
/dev/xvdb 8378368 6138984 2239384 74% /srv/node/d0
/dev/xvdf 8378368 8378316 52 100% /srv/node/d4
/dev/xvdd 8378368 5780160 2598208 69% /srv/node/d2
/dev/xvdc 8378368 5820984 2557384 70% /srv/node/d1
/dev/xvdg 8378368 8378348 20 100% /srv/node/d5
/dev/xvde 8378368 5988340 2390028 72% /srv/node/d3
/dev/xvdh 8378368 6922720 1455648 83% /srv/node/d6
/dev/xvdi 8378368 8378328 40 100% /srv/node/d7

The 3 "full" drives are still full, due to the errors noted above in this ticket. The ring has rebalanced so the # of partitions assigned to each drive are approximately equal.

Changed in swift:
status: New → Incomplete
Ganesh Maharaj Mahalingam (ganesh-mahalingam) wrote:

I found the same scenario when I was trying to reproduce https://bugs.launchpad.net/swift/+bug/1491675. Are there recommendations on recovering those drives from such a situation?

clayg (clay-gerrard) wrote:

related to lp bug #1359160

Changed in swift:
status: Incomplete → Confirmed
importance: Undecided → Medium
Tim Burke (1-tim-z)
summary: - Reconstructor has some troubles with tombstones on full drives
+ Replicator/reconstructor can't rehash partitions on full drives