Junk data when a segmented upload is interrupted
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
python-swiftclient | New | Undecided | Unassigned |
Bug Description
Junk data is left behind when a segmented upload is interrupted.
To reproduce:
1. Start a segmented upload of a big object; this happens frequently if the object is >10 GB.
[root@proxy1 ~]# bash callswift.sh upload c data -S 1024000
2. Cancel the upload. Some segments are already stored in the cluster, while the whole object is still not accessible from the client. If the object is then uploaded successfully at some later time, new segments (with new timestamps) are stored, and the old segments are a big waste of disk space (a cleanup sketch follows the listing below).
[root@proxy1 ~]# bash callswift.sh delete c data
Object 'c/data' not found
[root@proxy1 ~]# bash callswift.sh list c_segments
data/1388140244
data/1388140244
data/1388140244
data/1388140244
data/1388140244
data/1388140244
data/1388140244
data/1388140244
data/1388140244
data/1388140244
data/1388140244
data/1388140244
data/1388140244
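Until the client handles this itself, the orphaned segments can be removed by hand. Below is a minimal cleanup sketch (not part of python-swiftclient) using the SwiftService API, assuming the segment container is named c_segments and auth settings come from the usual environment variables:

from swiftclient.service import SwiftService, SwiftError

# Hypothetical one-off cleanup script for leftover segments.
with SwiftService() as swift:
    try:
        leftovers = []
        # Collect every object name in the segment container.
        for page in swift.list(container='c_segments'):
            if not page['success']:
                raise page['error']
            leftovers.extend(item['name'] for item in page['listing'])
        # Bulk-delete the collected segments.
        for result in swift.delete(container='c_segments', objects=leftovers):
            if not result['success']:
                print('failed to delete %s' % result.get('object'))
    except SwiftError as err:
        print(err.value)

Note that this blindly deletes everything in the segment container, so it should only be run when no other upload of that object is in progress.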
Changed in python-swiftclient:
assignee: nobody → Ritesh (rsritesh)
Changed in python-swiftclient:
assignee: Ritesh (rsritesh) → nobody
I thought the 'junk' segments were kept in Swift to allow 'resuming' uploads when an SLO upload failed. This is not the case. The wasteful behaviour that Zhou Yuan describes is accurate. It is doubly wasteful: it keeps unused data in Swift, and it doesn't use already-uploaded segments to speed up retries.
A new CLI option should be created for SLO segmented uploads that controls what happens to partial uploads. The junk should either be deleted or reused by subsequent uploads to speed up retries: if the hash and name of a segment are the same as one that already exists, the segment should not be uploaded again.
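For illustration only, here is a rough sketch of the proposed skip-if-identical check. The helper function and the local segment path are hypothetical, not existing swiftclient behaviour; the only assumption taken from Swift itself is that an object listing exposes each object's MD5 in its 'hash' (ETag) field:

import hashlib
from swiftclient.service import SwiftService

def _md5_of(path):
    # MD5 of a local segment file, computed in chunks.
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()

def segment_already_uploaded(swift, container, name, local_path):
    # True if an object with the same name and MD5 (ETag) is already stored.
    for page in swift.list(container=container, options={'prefix': name}):
        if not page['success']:
            return False
        for item in page['listing']:
            if item['name'] == name:
                return item['hash'] == _md5_of(local_path)
    return False

# Hypothetical usage on retry: skip re-uploading an unchanged segment.
with SwiftService() as swift:
    if segment_already_uploaded(swift, 'c_segments', 'data/1388140244', '/tmp/segment0'):
        print('segment unchanged, skipping re-upload')

A real implementation inside swiftclient would hash the segment as it splits the source file, rather than reading a separate local segment file as this sketch does.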