better handling when 'ctr image pull' fails

Bug #1884282 reported by Kevin W Monroe
This bug affects 2 people
Affects: Charmed Kubernetes Testing
Status: Triaged
Importance: Medium
Assigned to: Unassigned
Milestone:

Bug Description

I've noticed cdk-addons build failures across all supported releases that appear to be caused by network errors:

https://jenkins.canonical.com/k8s/job/build-release-cdk-addons-amd64-1.17/269/console
sudo lxc exec image-processor -- ctr image tag docker.io/****/registry-arm64:2.6 upload.rocks.canonical.com:5000/cdk/****/registry-arm64:2.6
ctr: image "docker.io/****/registry-arm64:2.6": not found

https://jenkins.canonical.com/k8s/job/build-release-cdk-addons-amd64-1.18/268/console
sudo lxc exec image-processor -- ctr image tag quay.io/cephcsi/cephfsplugin:v1.0.0 upload.rocks.canonical.com:5000/cdk/cephcsi/cephfsplugin:v1.0.0
ctr: image "quay.io/cephcsi/cephfsplugin:v1.0.0": not found

https://jenkins.canonical.com/k8s/job/build-release-cdk-addons-amd64-1.19/117/console
sudo lxc exec image-processor -- ctr image tag docker.io/kubernetesui/metrics-scraper:v1.0.4 upload.rocks.canonical.com:5000/cdk/kubernetesui/metrics-scraper:v1.0.4
ctr: image "docker.io/kubernetesui/metrics-scraper:v1.0.4": not found

These upstream images do exist, and there doesn't seem to be any pattern to which image triggers the failure.

When the pull fails with an "image not found" error, we always continue. Our assumption was that this error would only happen if an unsupported arch was requested, so we'd just move on to the next image:

https://github.com/charmed-kubernetes/jenkins/blob/master/jobs/build-snaps/build-release-cdk-addons.groovy#L177

However, it now seems that the pull can fail with a different error. The "grep not found" check above does not match that error, so the loop does not move on to the next image. This leads to us trying to tag a nonexistent image a few lines later:

https://github.com/charmed-kubernetes/jenkins/blob/master/jobs/build-snaps/build-release-cdk-addons.groovy#L196
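
To illustrate the gap described above, here is a minimal sketch of a three-way check on the pull result. It is not taken from build-release-cdk-addons.groovy; the function name classify_pull and the "ok" / "skip" / "error" labels are hypothetical. It assumes the caller has captured the exit status and combined output of the pull, and it keeps the existing assumption that "not found" means an unsupported arch.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the real pipeline.
# Classify a pull attempt from its exit status ($1) and captured output ($2):
#   ok     the pull succeeded, so the image can be tagged
#   skip   the output contains "not found" (assumed: unsupported arch)
#   error  any other failure; the caller must not go on to tag
classify_pull() {
    local status="$1" output="$2"
    if [ "$status" -eq 0 ]; then
        echo "ok"
    elif printf '%s\n' "$output" | grep -q "not found"; then
        echo "skip"
    else
        echo "error"
    fi
}
```

With something like this, only "ok" would reach the 'ctr image tag' step, and "error" could fail the job at the pull instead of surfacing later as a misleading "not found" from the tag.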

We'll need to figure out why the pull is failing. One option is to retry the pull until it succeeds, giving up after a sane number of tries.
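
A rough sketch of the bounded retry idea, assuming a plain bash helper wrapped around whatever pull command the job runs. The function name retry, its argument order, and the attempt count and delay are all illustrative; none of them come from the existing pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds, giving up after a
# fixed number of attempts.
#   $1  maximum number of attempts
#   $2  seconds to sleep between attempts
#   $@  (remaining arguments) the command to run
retry() {
    local max_tries="$1" delay="$2" attempt=1
    shift 2
    until "$@"; do
        if [ "$attempt" -ge "$max_tries" ]; then
            echo "giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}
```

It might be called as `retry 5 10 sudo lxc exec image-processor -- ctr image pull "$image"`, where the real pull arguments would be whatever the job already passes. Note that this sketch retries every failure, including the unsupported-arch case, so the existing "not found" check would still need to run before deciding to retry.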

summary: - retry ctr image pull on failure
+ better handling when 'ctr image pull' fails
George Kraft (cynerva)
Changed in charmed-kubernetes-testing:
importance: Undecided → Medium
status: New → Triaged