At termination, LXC rootfs is not always unmounted before rmtree() is called
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | High | Pádraig Brady |
Essex | Fix Released | Undecided | Pádraig Brady |
nova (Ubuntu) | Fix Released | Undecided | Unassigned |
Precise | Fix Released | Undecided | Unassigned |
Bug Description
nova version used:
commit 20c6bb6c9000fa0
Merge: aedaf10 0876cf5
Author: Jenkins <email address hidden>
Date: Wed Aug 29 14:33:01 2012 +0000
Merge "Do not run pylint by default"
Symptom:
The rootfs of an LXC instance is not unmounted before rmtree() is called in the nova/virt/
I've seen this problem in both Essex and Folsom.
It does not always happen, though.
I suspect there is a timing issue between unmount() and rmtree().
Each leaked mount keeps its nbd device busy, so this bug eventually leads to "no free nbd device".
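If the cause really is a race between the unmount and rmtree(), one possible guard is to poll until the path is no longer a mount point before deleting anything. The sketch below is only an illustration of that idea, not nova code: the helper name `rmtree_when_unmounted`, its parameters, and the default timeout are all hypothetical.

```python
import os
import shutil
import time


def rmtree_when_unmounted(path, is_mounted=os.path.ismount,
                          timeout=5.0, interval=0.1):
    """Remove `path` only once nothing is mounted on it.

    Hypothetical helper, not part of nova. Returns True if the tree
    was removed, False if `path` was still a mount point when the
    timeout expired (in which case nothing is deleted).
    """
    deadline = time.monotonic() + timeout
    while is_mounted(path):
        if time.monotonic() >= deadline:
            return False  # still mounted: leave the tree alone
        time.sleep(interval)
    shutil.rmtree(path)
    return True
```

The `is_mounted` parameter exists only so the check can be replaced in tests; by default it uses `os.path.ismount`. Returning False instead of raising leaves it to the caller to decide whether to retry the unmount or log an error.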
Example:
After terminating instance i-00000005, I still see its rootfs mounted from /dev/nbd15:
$ mount
/dev/nbd15 on /usr/local/
Since it is not unmounted before rmtree() is called, nova-compute complains.
Here is the log of nova-compute:
2012-09-04 09:11:46 INFO nova.virt.
2012-09-04 09:11:46 DEBUG nova.utils [req-52c4813e-
2012-09-04 09:11:46 DEBUG nova.utils [req-52c4813e-
2012-09-04 09:11:46 DEBUG nova.utils [req-52c4813e-
2012-09-04 09:11:46 DEBUG nova.utils [req-52c4813e-
2012-09-04 09:11:46 DEBUG nova.utils [req-52c4813e-
2012-09-04 09:11:46 DEBUG nova.utils [req-52c4813e-
2012-09-04 09:11:46 DEBUG nova.utils [req-52c4813e-
2012-09-04 09:11:46 DEBUG nova.utils [req-52c4813e-
2012-09-04 09:11:46 DEBUG nova.utils [req-52c4813e-
2012-09-04 09:11:46 DEBUG nova.utils [req-52c4813e-
2012-09-04 09:11:46 DEBUG nova.utils [req-52c4813e-
2012-09-04 09:11:47 DEBUG nova.utils [req-52c4813e-
2012-09-04 09:11:47 DEBUG nova.network.
2012-09-04 09:11:47 INFO nova.virt.
2012-09-04 09:11:47 ERROR nova.virt.
I can manually unmount it and release /dev/nbd15 to finish the
termination process.
Without doing that, nbd15 is permanently occupied by the terminated instance.
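To see how many nbd devices are stuck like this, something along the following lines could be used. It is a rough sketch that assumes the kernel exposes a `pid` file under /sys/block/nbdN/ while a device is connected; the function name and the default of 16 devices are illustrative, not taken from nova.

```python
import os


def nbd_devices_in_use(sys_block="/sys/block", max_devices=16):
    """Return the nbd device nodes that look connected.

    Assumption: a connected nbd device has a `pid` file under
    <sys_block>/nbdN/. `max_devices` is an illustrative default.
    """
    busy = []
    for i in range(max_devices):
        name = "nbd%d" % i
        if os.path.exists(os.path.join(sys_block, name, "pid")):
            busy.append("/dev/" + name)
    return busy
```

Comparing this list with the devices that appear in the output of `mount` should show which ones are held by instances that no longer exist.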
tags: added: lxc
Changed in nova: milestone: none → folsom-rc1
Changed in nova: status: Fix Committed → Fix Released
Changed in nova: milestone: folsom-rc1 → 2012.2
Changed in nova (Ubuntu): status: New → Fix Released
tags: added: verification-done; removed: verification-needed
It's possible that umount returns before the unmount has actually completed
(communication with the qemu-nbd processes etc.), though that would
be a bug (as we don't specify the -z option). There is a similar bug in
FUSE that we had to work around in the libguestfs implementation, so it's possible.
What kernel version/distro are you using?
Also, I'm a bit confused that the mount point showed up
when you ran the mount command. If it were a race, the mount point
would probably have been unmounted before you ran that command.
There are definitely no mentions of the following in the log?
"Failed to unmount container filesystem"
If not, and this only happens intermittently, then nova/nova/virt/disk/api.py doesn't find the corresponding device.
Logically, either umount sometimes fails silently with nbd,
or sometimes the _device_for_path() function in
/usr/local/
Both surprising TBH.
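For reference, a path-to-device lookup of this kind can be sketched as below. This is not the nova implementation, just a guess at the general shape: `device_for_path` and its `mounts_text` parameter are hypothetical, and the input is assumed to follow the /proc/mounts layout (device, mount point, fstype, options, ...). One way such a lookup could miss an entry is a textual mismatch between the path passed in and the path recorded in the mount table (a symlink or a trailing slash), so the sketch compares resolved paths on both sides.

```python
import os


def device_for_path(path, mounts_text):
    """Return the device mounted on `path`, or None if not found.

    Hypothetical helper. `mounts_text` is the content of a
    /proc/mounts style table: one mount per line, whitespace
    separated, device first and mount point second.
    """
    target = os.path.realpath(path)
    for line in mounts_text.splitlines():
        fields = line.split()
        # Compare resolved paths so symlinks and trailing slashes
        # do not cause a spurious miss.
        if len(fields) >= 2 and os.path.realpath(fields[1]) == target:
            return fields[0]
    return None
```

On a live system it would be called as `device_for_path(rootfs_dir, open("/proc/mounts").read())`, where `rootfs_dir` stands for the container's mount point. A None result for a path that `mount` still lists would point at the lookup rather than at umount.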