internal libpod error: open /sys/fs/cgroup/cpuset/machine.slice/cpuset.cpus: no such file or directory

Bug #1803937 reported by Rabi Mishra
This bug affects 1 person
Affects   Status        Importance  Assigned to  Milestone
tripleo   Fix Released  Critical    Unassigned

Bug Description

I just noticed this in the gate.

2018-11-19 05:48:00.238 14608 WARNING tripleoclient.v1.tripleo_deploy.Deploy [ ] "2018-11-19 05:45:06,816 ERROR: 8688 -- Failed running docker-puppet.py for ironic",
2018-11-19 05:48:00.239 14608 WARNING tripleoclient.v1.tripleo_deploy.Deploy [ ] "2018-11-19 05:45:06,816 ERROR: 8688 -- container create failed: container_linux.go:336: starting container process caused \"process_linux.go:293: applying cgroup configuration for process caused \\\"open /sys/fs/cgroup/cpuset/machine.slice/cpuset.cpus: no such file or directory\\\"\"",
2018-11-19 05:48:00.239 14608 WARNING tripleoclient.v1.tripleo_deploy.Deploy [ ] ": internal libpod error",
2018-11-19 05:48:00.239 14608 WARNING tripleoclient.v1.tripleo_deploy.Deploy [ ] "",

http://logs.openstack.org/57/614457/10/check/tripleo-ci-centos-7-containers-multinode/19ea5bb/logs/undercloud/home/zuul/install-undercloud.log.txt.gz#_2018-11-19_05_48_00_238

A logstash query shows that this failure happens intermittently.

It looks like some kind of race. Digging a little, there appear to be known races in runc [1] that were fixed some time ago. Maybe we're using an older runc?

[1] https://github.com/opencontainers/runc/pull/1683
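One cheap way to test the race theory on an affected host is to poll for the file named in the error and see whether it shows up after a short delay. The sketch below is a hypothetical diagnostic, not something from runc or libpod: the function name `wait_for_cpuset` and its retry parameters are made up for illustration, and it assumes the cgroup v1 cpuset layout implied by the error path.

```python
import time

# Path copied from the error in this report; assumes a cgroup v1 cpuset mount.
CPUSET_CPUS = "/sys/fs/cgroup/cpuset/machine.slice/cpuset.cpus"


def wait_for_cpuset(path=CPUSET_CPUS, attempts=5, delay=0.2):
    """Poll `path` until it exists and is non-empty (hypothetical helper).

    Returns (attempt, contents) on success, or (None, None) if the file is
    still missing or empty after `attempts` tries.
    """
    for attempt in range(1, attempts + 1):
        try:
            with open(path) as f:
                contents = f.read().strip()
            if contents:
                return attempt, contents
        except FileNotFoundError:
            pass  # Not created yet (or never will be); try again.
        if attempt < attempts:
            time.sleep(delay)  # Only sleep between attempts, not after the last.
    return None, None
```

If the call succeeds only on a later attempt, that would be consistent with a creation race; if it returns `(None, None)` every time, the `machine.slice` cpuset directory is more likely missing altogether.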

Revision history for this message
Rabi Mishra (rabi) wrote :

It does not look related to the runc pull request mentioned above: the fix already seems to be present in libpod's vendored dependencies [1]. I guess this needs further investigation.

[1] https://github.com/containers/libpod/blob/master/vendor.conf#L51

Rafael Folco (rafaelfolco) wrote :
Cédric Jeanneret (cjeanner) wrote :

So apparently we're getting some hits on this issue in the CI, for instance about 6 over the last 7 days. I wouldn't mark it as "urgent" or "blocker", but I'd still put it on the "important" list.

More info about the logs via the logstash:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%20%5C%22applying%20cgroup%20configuration%20for%20process%20caused%5C%22

(adjust the time range until the error shows up)
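For a single downloaded log rather than the whole logstash index, a rough local equivalent of that query is to count matching lines per day. This is a minimal sketch: the helper name `hits_per_day` is hypothetical, and it assumes each line starts with a `YYYY-MM-DD` timestamp as in the excerpt in the bug description.

```python
import re
from collections import Counter

# Phrase searched for by the logstash query linked in this report.
PATTERN = "applying cgroup configuration for process caused"
DATE_PREFIX = re.compile(r"(\d{4}-\d{2}-\d{2})")


def hits_per_day(lines, pattern=PATTERN):
    """Count lines containing `pattern`, keyed by their leading YYYY-MM-DD date.

    Matching lines without a date prefix are counted under "unknown".
    """
    counts = Counter()
    for line in lines:
        if pattern not in line:
            continue
        m = DATE_PREFIX.match(line)
        counts[m.group(1) if m else "unknown"] += 1
    return counts
```

Since iterating a text-mode file yields lines, a gzipped CI log can be passed straight in, e.g. `with gzip.open("install-undercloud.log.txt.gz", "rt") as f: print(hits_per_day(f))`.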

We might need to open a dedicated issue on the libpod GitHub?

tags: added: alert
Changed in tripleo:
status: New → Triaged
importance: Undecided → Critical
milestone: none → stein-2
Cédric Jeanneret (cjeanner) wrote :
tags: removed: alert
Changed in tripleo:
milestone: stein-2 → stein-3
Changed in tripleo:
status: Triaged → Fix Released