major upgrade: unstopped containers cannot be managed after docker update
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Invalid | Undecided | Damien Ciabrini |
Bug Description
We use docker's --live-restore option to prevent a daemon restart from killing all running containers. When the docker rpm is updated, the yum scriptlets stop the docker daemon and restart it with the new version from the rpm. During this update, the containers themselves are not stopped.
If the rpm update introduces changes in docker internals, the restarted daemon may no longer be able to manage the running containers. Even docker stop and docker rm might not be enough to stop the running containerized processes.
The --live-restore option is handled appropriately during minor updates: if we detect that docker is going to be updated, we forcibly stop all running containers beforehand. However, we do not do the same during major upgrades.
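The minor-update behaviour described above (detect a pending docker update, then stop all running containers first) could be sketched roughly as follows. This is only an illustration, not the actual TripleO implementation: the function names are hypothetical, and it assumes the package is simply named `docker` and that `yum check-update` exits with status 100 when an update is available.

```shell
# Sketch only: hypothetical helper names, not the TripleO update tasks.

docker_update_pending() {
    # "yum check-update" exits 100 when updates are available and 0 when
    # there are none; any other status is treated as "no update" here.
    rc=0
    yum -q check-update docker >/dev/null 2>&1 || rc=$?
    [ "$rc" -eq 100 ]
}

stop_running_containers() {
    ids=$(docker ps -q)
    if [ -n "$ids" ]; then
        # Word splitting on $ids is intentional: one container ID per argument.
        docker stop $ids
    fi
}

pre_docker_update() {
    # Stop containers only when the docker package is about to change.
    if docker_update_pending; then
        stop_running_containers
    fi
}
```

Calling `pre_docker_update` before `yum update` would then stop the containers while the old daemon is still able to manage them.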
The following error has been seen when deploying a non-HA overcloud and updating docker from docker-
. After a yum update, the mysql container is marked as Exited (255) in docker ps:
[root@overcloud
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4058e814383e 192.168.
However, the mysql process is still running:
[root@overcloud
42434 19001 18984 0 13:42 ? 00:00:00 /bin/sh /usr/bin/
42434 19661 19001 0 13:42 ? 00:00:00 /usr/libexec/mysqld --basedir=/usr --datadir=
Subsequent attempts at restarting the container fail:
[root@overcloud
Error response from daemon: Unknown runtime specified oci
Error: failed to start containers: mysql
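One way to confirm that processes from a container reported as Exited are still alive is to look for the container ID in each process's cgroup file. The sketch below assumes that the cgroup path of a containerized process contains the full container ID, which begins with the short ID shown by docker ps. This is typical for docker, but the exact path layout depends on the cgroup driver. The function name and the optional second argument (an alternate proc root, useful for testing) are hypothetical.

```shell
# Sketch only: list PIDs whose cgroup file mentions the given container ID.
pids_for_container() {
    cid="$1"
    proc_root="${2:-/proc}"
    for f in "$proc_root"/[0-9]*/cgroup; do
        [ -r "$f" ] || continue
        # -F: treat the container ID as a fixed string, not a regex.
        if grep -qF "$cid" "$f" 2>/dev/null; then
            pid_dir=${f%/cgroup}
            echo "${pid_dir##*/}"
        fi
    done
}
```

For example, `pids_for_container 4058e814383e` would print the PIDs still attached to that container's cgroup, which could then be inspected or stopped by hand.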
Changed in tripleo:
status: Triaged → In Progress
Changed in tripleo:
status: In Progress → Incomplete
Changed in tripleo:
status: Incomplete → Invalid
After more tests, this is probably happening because the docker update comes from a different channel:
docker-1.13.1-47.2.gitf43d177.el7.x86_64 comes from delorean-queens-testing
docker-2:1.13.1-53.git774336d.el7.centos.x86_64 comes from EPEL (extras/7/x86_64)
That is probably the source of the incompatibility, especially since the EPEL version could not figure out which container runtime my running containers had been configured to run with.
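A rough way to check for this kind of mismatch is to compare the runtime recorded in a container's configuration with the runtimes the current daemon knows about. The sketch below relies on `docker inspect --format '{{.HostConfig.Runtime}}'` and on the `Runtimes` map in `docker info`. The function name is hypothetical, and it assumes the daemon can still inspect the container.

```shell
# Sketch only: returns 0 if the container's runtime is known to the daemon,
# 1 if it is not, and 2 if docker itself could not be queried.
container_runtime_known() {
    name="$1"
    wanted=$(docker inspect --format '{{.HostConfig.Runtime}}' "$name") || return 2
    known=$(docker info --format '{{range $k, $v := .Runtimes}}{{$k}} {{end}}') || return 2
    for r in $known; do
        [ "$r" = "$wanted" ] && return 0
    done
    echo "runtime '$wanted' of container '$name' is not among: $known" >&2
    return 1
}
```

Running `container_runtime_known mysql` after the package update would be expected to fail in a situation like the one above, where the container was created with a runtime name the new daemon does not recognise.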