major upgrade: unstopped containers cannot be managed after docker update

Bug #1758069 reported by Damien Ciabrini
This bug affects 1 person
Affects: tripleo
Status: Invalid
Importance: Undecided
Assigned to: Damien Ciabrini

Bug Description

We use docker's --live-restore option to prevent a daemon restart from killing all running containers. When the docker rpm is updated, the yum scriptlets stop the docker daemon and restart it with the new version from the rpm. During this update, the containers themselves are not stopped.

If the rpm update changes docker internals, the restarted daemon may no longer be able to manage the containers that are still running. Even docker stop and docker rm might not be enough to stop the running containerized processes.

The --live-restore option is handled appropriately during minor updates: if we detect that docker is going to be updated, we forcibly stop all running containers beforehand. However, we do not do the same during major upgrades.
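
The pre-update check described above could look roughly like the sketch below. This is an illustration only, not the actual tripleo implementation: the function names and the DOCKER/YUM override variables are hypothetical. It relies on yum check-update exiting with status 100 when an update is available for the named package, and on docker ps -q listing the IDs of running containers.

```shell
# Sketch only: function names and the DOCKER/YUM overrides are hypothetical.
DOCKER=${DOCKER:-docker}
YUM=${YUM:-yum}

# Succeed when yum reports a pending update for the docker package.
# yum check-update exits 100 if updates are available, 0 if none, 1 on error.
docker_update_pending() {
    local rc=0
    "$YUM" -q check-update docker >/dev/null 2>&1 || rc=$?
    [ "$rc" -eq 100 ]
}

# Stop every container the daemon currently reports as running.
stop_running_containers() {
    local ids
    ids=$("$DOCKER" ps -q)
    if [ -z "$ids" ]; then
        return 0
    fi
    # $ids is left unquoted on purpose: one argument per container ID.
    "$DOCKER" stop $ids
}

# Stop containers only when the docker rpm is about to change.
prepare_docker_update() {
    if docker_update_pending; then
        stop_running_containers
    fi
}
```

Calling prepare_docker_update before the package update step of a major upgrade would mirror what is already done for minor updates; the containers then have to be started again once the new daemon is up.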

The following error has been seen when deploying a non-HA overcloud and updating docker from docker-1.13.1-47.2.gitf43d177.el7.x86_64 to docker-2:1.13.1-53.git774336d.el7.centos.x86_64:

. After a yum update, the mysql container is marked as Exited (255) in docker ps:
[root@overcloud-controller-0 ~]# docker ps -a
CONTAINER ID  IMAGE                                                      COMMAND        CREATED        STATUS                      PORTS  NAMES
4058e814383e  192.168.24.1:8787/tripleomaster/centos-binary-mariadb:5-5  "kolla_start"  4 minutes ago  Exited (255) 8 seconds ago         mysql

. However the mysql process is still running:

[root@overcloud-controller-0 ~]# ps -ef | grep mysql
42434 19001 18984 0 13:42 ? 00:00:00 /bin/sh /usr/bin/mysqld_safe
42434 19661 19001 0 13:42 ? 00:00:00 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --open-files-limit=-1 --pid-file=/var/lib/mysql/mariadb.pid --socket=/var/lib/mysql/mysql.sock --port=3306 --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1

. Subsequent attempts at restarting the container fail:
[root@overcloud-controller-0 ~]# docker start mysql
Error response from daemon: Unknown runtime specified oci
Error: failed to start containers: mysql
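
The mismatch shown in the steps above (container reported as exited while its process is still alive) can be checked for with a small helper. This is a rough sketch under stated assumptions: the function names and the DOCKER/PGREP override variables are hypothetical, and the pgrep match is host-wide, so it would also match an unrelated process with the same command line.

```shell
# Sketch only: function names and the DOCKER/PGREP overrides are hypothetical.
DOCKER=${DOCKER:-docker}
PGREP=${PGREP:-pgrep}

# Print the daemon's view of a container (running, exited, ...).
container_status() {
    "$DOCKER" inspect --format '{{.State.Status}}' "$1"
}

# Succeed when the daemon does not report the container as running
# but a process matching the pattern is still alive on the host.
# Returns 2 if the container cannot be inspected at all.
has_orphaned_process() {
    local name="$1" pattern="$2" status
    status=$(container_status "$name") || return 2
    if [ "$status" = "running" ]; then
        return 1
    fi
    # pgrep -f matches against the full command line of host processes.
    "$PGREP" -f "$pattern" >/dev/null
}
```

For the case in this report, the call would be has_orphaned_process mysql /usr/libexec/mysqld.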

Changed in tripleo:
status: Triaged → In Progress
Changed in tripleo:
status: In Progress → Incomplete
Damien Ciabrini (dciabrin) wrote :

After more tests, this is probably happening because the docker update comes from a different channel:

docker-1.13.1-47.2.gitf43d177.el7.x86_64 comes from delorean-queens-testing
docker-2:1.13.1-53.git774336d.el7.centos.x86_64 comes from EPEL (extras/7/x86_64)

That is probably the source of the incompatibility, in particular the fact that the EPEL version could not figure out which container runtime my running containers were configured to run with.

Changed in tripleo:
status: Incomplete → Invalid
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by Damien Ciabrini (<email address hidden>) on branch: master
Review: https://review.openstack.org/555318
