"testflinger-cli artifacts" should verify the checksum after downloading artifacts
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Testflinger | New | Medium | Paul Larson | | |
Bug Description
We hit an intermittent failure that I suspect is caused by timing:
https:/
+ testflinger-cli artifacts 766584d4-
Downloading artifacts tarball...
Artifacts downloaded to artifacts.tgz
+ tar -xzf artifacts.tgz
gzip: stdin: unexpected end of file
It passes most of the time, though. For example, in the previous build:
https:/
+ cp artifacts/
cp: cannot stat 'artifacts/
+ /bin/true
+ rm -rf 'artifacts*'
+ testflinger-cli artifacts bf8aef7a-
Downloading artifacts tarball...
No artifacts tarball found for that job id.
+ sleep 30
+ testflinger-cli artifacts bf8aef7a-
Downloading artifacts tarball...
In build #38, "testflinger-cli artifacts" downloaded artifacts.tgz and returned a success exit code, but artifacts.tgz was truncated.
The Jenkins script does wait for the job to finish before fetching the artifacts:
https:/
"
#testflinge
TEST_
cp artifacts/
rm -rf artifacts*
#retry getting the artifacts after a delay if it fails
testflinger-cli artifacts ${JOB_ID} || (sleep 30 && testflinger-cli artifacts ${JOB_ID})
"
So "testflinger-cli artifacts" should verify the checksum of artifacts.tgz before returning success, to make sure the downloaded tarball is complete and correct.
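As a rough illustration of the kind of check being requested, here is a minimal sketch that compares a file's SHA-256 digest with an expected value. It assumes the expected digest is available from somewhere (hypothetical: nothing in this report says the server publishes one today), and `verify_sha256` is a name made up for this example, not part of testflinger-cli.

```shell
# Hypothetical helper, not part of testflinger-cli.
# Succeeds only if the SHA-256 digest of FILE matches EXPECTED_HEX.
# Usage: verify_sha256 FILE EXPECTED_HEX
verify_sha256() {
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$1" | cut -d' ' -f1)
    [ "$actual" = "$2" ]
}

# Example (EXPECTED_SHA256 is an assumed variable holding the published digest):
#   verify_sha256 artifacts.tgz "$EXPECTED_SHA256" || echo "artifacts.tgz is corrupt" >&2
```

A truncated download would produce a different digest, so the command could return a failure code instead of reporting success.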
Until testflinger-cli performs such a check, a workaround is to verify manually that artifacts.tgz is intact and, if it is not, download it again.
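The workaround could be scripted roughly as follows. This is only a sketch: without a published checksum it falls back to `gzip -t` and `tar -tzf`, which detect a truncated archive but not every kind of corruption. The function names `tarball_ok` and `fetch_with_retry`, and the retry count and delay in the usage line, are assumptions made for this example.

```shell
# Succeeds only if FILE is a non-empty gzip archive that decompresses
# completely and whose tar listing can be read to the end.
tarball_ok() {
    [ -s "$1" ] && gzip -t "$1" 2>/dev/null && tar -tzf "$1" >/dev/null 2>&1
}

# Run a download command until artifacts.tgz passes the integrity check.
# Usage: fetch_with_retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
fetch_with_retry() {
    max_tries=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_tries" ]; do
        # Remove any stale or partial tarball before each attempt.
        rm -f artifacts.tgz
        if "$@" && tarball_ok artifacts.tgz; then
            return 0
        fi
        echo "attempt $attempt/$max_tries: artifacts.tgz missing or truncated" >&2
        if [ "$attempt" -lt "$max_tries" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example, replacing the single retry in the Jenkins script:
#   fetch_with_retry 5 30 testflinger-cli artifacts "${JOB_ID}" && tar -xzf artifacts.tgz
```

Passing the download command as arguments keeps the check independent of testflinger-cli itself, so both the "no tarball found" case and the truncated-tarball case go through the same retry path.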
Changed in testflinger:
assignee: nobody → Paul Larson (pwlars)
Changed in testflinger:
importance: Undecided → Medium
Yes, there's a possible window where your job has finished executing all the commands but the artifacts tarball is not yet available on the server. This is because the agent uploads the artifacts for you automatically once the run is over, and it's also smart enough to retry the upload if it fails for any reason. At the time, I thought it would be better to return control to the user as soon as the test run is complete and let the agent take care of all this in the background. But if you have something monitoring the test run and trying to collect the results (like Jenkins), you want to get those artifacts before finishing. The way I deal with this in our Jenkins jobs is to do something like this:
testflinger-cli artifacts ${JOB_ID} || (sleep 30 && testflinger-cli artifacts ${JOB_ID})
I'm not sure about making poll hang indefinitely until the artifacts are confirmed to be uploaded. If the upload fails, it's actually possible that the agent has already moved on to processing other test jobs. But maybe we can try once and, if it fails, warn the user that they may need to wait for their artifacts to become available for download. I'm not sure that really solves your problem though, because I think you may still need to handle the possibility that your artifacts are not there yet. It might be OK for 99% of cases, because in reality the upload rarely if ever fails on the first try.
What do you think?