empty 5GB container DB

Bug #1691648 reported by Hugo Kou
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

Container DBs with frequent PUT/DELETE traffic keep growing. The size remains large even after all objects have been DELETED (empty container). Should the replicator vacuum it under specific conditions?

[root@prdd1slzswcon04 94a91c0b60f945ba2054b057cd4f1979]# sqlite3 94a91c0b60f945ba2054b057cd4f1979.db
SQLite version 3.6.20
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> select * from object;
sqlite>

[root@prdd1slzswcon04 94a91c0b60f945ba2054b057cd4f1979]# curl -g -I -XHEAD "http://10.125.229.161:6001/d693/152228/AUTH_ems_prod/7488"
HTTP/1.1 204 No Content
X-Backend-Timestamp: 1414598357.76436
X-Container-Object-Count: 0
X-Put-Timestamp: 1414598357.76549
X-Backend-Put-Timestamp: 1414598357.76549

### Original empty DB size ###
-rw------- 1 root root 5.7G May 17 21:42 94a91c0b60f945ba2054b057cd4f1979.db

### Vacuum Test ###
-rw------- 1 root root 19K May 17 21:44 94a91c0b60f945ba2054b057cd4f1979.db
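The shrink above was presumably produced with a plain SQLite VACUUM. A minimal sketch of such a test, using only the stock Python sqlite3 module and the DB path from the listing above (the printed numbers are illustrative, not from the report):

    #!/usr/bin/env python
    # Sketch: report reclaimable space in a container DB, then VACUUM it.
    import os
    import sqlite3

    db_path = '94a91c0b60f945ba2054b057cd4f1979.db'

    # autocommit mode so VACUUM is not blocked by an implicit transaction
    conn = sqlite3.connect(db_path, isolation_level=None)
    page_size = conn.execute('PRAGMA page_size').fetchone()[0]
    free_pages = conn.execute('PRAGMA freelist_count').fetchone()[0]
    print('reclaimable: %d bytes' % (free_pages * page_size))

    print('before: %d bytes' % os.path.getsize(db_path))
    conn.execute('VACUUM')  # rewrites the DB file, dropping the free pages
    conn.close()
    print('after:  %d bytes' % os.path.getsize(db_path))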

Not vacuuming brings several potential issues, including https://bugs.launchpad.net/swift/+bug/1691566:

* Wasted network bandwidth.
* Longer DB lock times.

Hugo

Matthew Oliver (matt-0) wrote:

The idea of vacuuming has come up before. In the past we haven't bothered with vacuum because the normal life cycle of a Swift container in most clusters is heavily PUT-focused, and vacuuming was an overhead that wasn't worth it.

I don't think anyone is against it per se, but it would be nice to find out how much overhead we'd incur, and see if the benefits are worth it.

Having said that, if we were to do it, doing it only when we are already going to need to do a bunch of I/O, rather than ticking on some timer, would be better in my opinion. Like you say, it would be nice to send a vacuumed database when we need to push the whole database to a new node, say on a rebalance, or on a rsync_then_merge. If we do it before we send, however, I could imagine an I/O spike on the node replicating the DB (vacuuming before we send) and then when writing the DB on the recipient's end, so that would be the overhead I'm speaking of.

I would think it would look something like the attached patch.

NOTE: the patch is untested; it seems to run, but it is only a demonstration. It could be a starting point to test the effects of vacuuming on a cluster (you can turn it on/off on the db replicators).
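For illustration only (this is not the attached patch): a minimal sketch of the conditional-vacuum idea described above, vacuuming a container DB just before the replicator ships the whole file to another node, and only when a sizeable fraction of its pages are free. The maybe_vacuum() helper name and the 50% threshold are made up for this example.

    # Sketch: vacuum db_path only if vacuuming would actually reclaim space.
    import sqlite3

    def maybe_vacuum(db_path, min_free_ratio=0.5):
        """VACUUM db_path if at least min_free_ratio of its pages are free."""
        # autocommit mode so VACUUM is allowed
        conn = sqlite3.connect(db_path, isolation_level=None)
        try:
            page_count = conn.execute('PRAGMA page_count').fetchone()[0]
            free_pages = conn.execute('PRAGMA freelist_count').fetchone()[0]
            if page_count and float(free_pages) / page_count >= min_free_ratio:
                conn.execute('VACUUM')  # extra read/write pass over the live data
                return True
            return False
        finally:
            conn.close()

    # e.g. called just before the replicator rsyncs the whole DB file to a
    # new node on a rebalance or an rsync_then_merge:
    #     if maybe_vacuum(path_to_container_db):
    #         ...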

clayg (clay-gerrard)
Changed in swift:
importance: Undecided → Medium
status: New → Confirmed