Kernel panic during 'docker -D stop <container_name>'

Bug #1485954 reported by Dennis Dmitriev
This bug affects 3 people
Affects               Status        Importance  Assigned to
Fuel for OpenStack    Fix Released  Critical    Aleksander Mogylchenko
  6.0.x               Invalid       Critical    Andrew Woodward
  6.1.x               Invalid       Critical    Andrew Woodward

Bug Description

Reproduced on CI, 'test ha_one_controller_backup_restore': http://jenkins-product.srt.mirantis.net:8080/view/7.0_swarm/job/7.0.system_test.ubuntu.known_issues/62
ISO: #185

Steps to reproduce:
            1. Create cluster in HA mode
            2. Add 1 node with controller role
            3. Add 1 node with compute role
            4. Deploy the cluster
            5. Backup master using command 'dockerctl backup'
            6. Restore master using command 'dockerctl restore /var/backup/fuel/<backup_folder>/<archive_name.lrz>'

Expected result: Fuel master node is restored
Actual result: Fuel master node hits a kernel panic while executing 'dockerctl restore ...'

Manual reproduction:
 - backup: http://paste.openstack.org/show/420268/
 - restore: http://paste.openstack.org/show/420271/ (see the screenshot attached to the bug)

This test case passed on ISO #156.

Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :
Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :

[root@nailgun ~]# fuel --fuel-version
DEPRECATION WARNING: /etc/fuel/client/config.yaml exists and will be used as the source for settings. This behavior is deprecated. Please specify the path to your custom settings file in the FUELCLIENT_CUSTOM_SETTINGS environment variable.
api: '1.0'
astute_sha: e24ca066bf6160bc1e419aaa5d486cad1aaa937d
auth_required: true
build_id: 2015-08-17_17-24-26
build_number: '185'
feature_groups:
- mirantis
fuel-agent_sha: 57145b1d8804389304cd04322ba0fb3dc9d30327
fuel-library_sha: 0062e69db17f8a63f85996039bdefa87aea498e1
fuel-nailgun-agent_sha: e01693992d7a0304d926b922b43f3b747c35964c
fuel-ostf_sha: 17786b86b78e5b66d2b1c15500186648df10c63d
fuelmain_sha: c0e6a17f014d86c8075ee896055a61fbe27e52b0
nailgun_sha: 4710801a2f4a6d61d652f8f1e64215d9dde37d2e
openstack_version: 2015.1.0-7.0
production: docker
python-fuelclient_sha: 4c74a60aa60c06c136d9197c7d09fa4f8c8e2863
release: '7.0'
release_versions:
  2015.1.0-7.0:
    VERSION:
      api: '1.0'
      astute_sha: e24ca066bf6160bc1e419aaa5d486cad1aaa937d
      build_id: 2015-08-17_17-24-26
      build_number: '185'
      feature_groups:
      - mirantis
      fuel-agent_sha: 57145b1d8804389304cd04322ba0fb3dc9d30327
      fuel-library_sha: 0062e69db17f8a63f85996039bdefa87aea498e1
      fuel-nailgun-agent_sha: e01693992d7a0304d926b922b43f3b747c35964c
      fuel-ostf_sha: 17786b86b78e5b66d2b1c15500186648df10c63d
      fuelmain_sha: c0e6a17f014d86c8075ee896055a61fbe27e52b0
      nailgun_sha: 4710801a2f4a6d61d652f8f1e64215d9dde37d2e
      openstack_version: 2015.1.0-7.0
      production: docker
      python-fuelclient_sha: 4c74a60aa60c06c136d9197c7d09fa4f8c8e2863
      release: '7.0'

description: updated
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Matthew Mosesohn (raytrac3r)
Changed in fuel:
status: New → Confirmed
Changed in fuel:
importance: High → Critical
Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :

It looks like a package update around August 11 (rebase to CentOS 6.6?) caused this regression. It happens when stopping any docker container, but not consistently: you might stop 8 containers and then the 9th breaks, or it could be the 2nd or 3rd one. Rather than reverting all CentOS packages, I'd prefer to find which ones are related to this issue.

Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :

Reproduced by manually stopping containers one by one; the panic hits different containers (sometimes nginx, sometimes rsyslogd, sometimes nailgun):

[root@nailgun ~]# docker -D stop fuel-core-7.0-nginx
+ docker -D stop fuel-core-7.0-nginx
fuel-core-7.0-nginx
[root@nailgun ~]# docker -D stop fuel-core-7.0-rabbitmq
+ docker -D stop fuel-core-7.0-rabbitmq
fuel-core-7.0-rabbitmq
[root@nailgun ~]# docker -D stop fuel-core-7.0-astute
+ docker -D stop fuel-core-7.0-astute

# here is a kernel panic
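The manual reproduction can be scripted. The POSIX shell function below is a minimal sketch and is not part of Fuel or dockerctl: the name `stop_fuel_containers` is hypothetical, and the default list assumes the `fuel-core-7.0-<service>` naming and the container set shown in the `dockerctl restore` output later in this report. It stops containers one at a time with `docker -D stop` and returns at the first failing command. On an affected kernel this may panic the host, so run it only on a disposable test node.

```shell
#!/bin/sh
# Hypothetical helper: stop Fuel 7.0 containers one at a time with docker
# debug output (-D), returning non-zero at the first failed stop.
# WARNING: on an affected kernel this can panic the host; test nodes only.
stop_fuel_containers() {
    # Default set (assumption): the containers stopped by 'dockerctl restore'
    # in the 7.0 transcript; pass service names to override.
    if [ "$#" -eq 0 ]; then
        set -- nginx rabbitmq astute rsync keystone postgres \
               rsyslog nailgun cobbler ostf mcollective
    fi
    for name in "$@"; do
        echo "Stopping fuel-core-7.0-$name"
        docker -D stop "fuel-core-7.0-$name" || return 1
    done
}
```

For example, `stop_fuel_containers nginx rabbitmq astute` repeats the exact sequence from the transcript above. Stopping sequentially rather than in parallel keeps the panic attributable to a single `docker stop` call.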

summary: - Kernel panic during 'dockerctl restore /var/backup/fuel/...lrz'
+ Kernel panic during 'docker -D stop <container_name>'
Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :

Command Line : downgrade kernel lvm2 lvm2-libs device-mapper device-mapper-libs device-mapper-event device-mapper-event-libs
Transaction performed with:
    Installed rpm-4.8.0-38.el6_6.x86_64 @anaconda-CentOS-201410241409.x86_64/6.3
    Installed yum-3.2.29-60.el6.centos.noarch @anaconda-CentOS-201410241409.x86_64/6.3
    Installed yum-plugin-fastestmirror-1.1.30-30.el6.noarch @anaconda-CentOS-201410241409.x86_64/6.3
Packages Altered:
    Downgrade device-mapper-1.02.90-2.el6.mira3.x86_64 @oldnailgun
    Downgraded 1.02.90-2.el6_6.3.x86_64 @anaconda-CentOS-201410241409.x86_64/6.3
    Downgrade device-mapper-event-1.02.90-2.el6.mira3.x86_64 @oldnailgun
    Downgraded 1.02.90-2.el6_6.3.x86_64 @anaconda-CentOS-201410241409.x86_64/6.3
    Downgrade device-mapper-event-libs-1.02.90-2.el6.mira3.x86_64 @oldnailgun
    Downgraded 1.02.90-2.el6_6.3.x86_64 @anaconda-CentOS-201410241409.x86_64/6.3
    Downgrade device-mapper-libs-1.02.90-2.el6.mira3.x86_64 @oldnailgun
    Downgraded 1.02.90-2.el6_6.3.x86_64 @anaconda-CentOS-201410241409.x86_64/6.3
    Downgrade lvm2-2.02.111-2.el6.mira3.x86_64 @oldnailgun
    Downgraded 2.02.111-2.el6_6.3.x86_64 @anaconda-CentOS-201410241409.x86_64/6.3
    Downgrade lvm2-libs-2.02.111-2.el6.mira3.x86_64 @oldnailgun
    Downgraded 2.02.111-2.el6_6.3.x86_64 @anaconda-CentOS-201410241409.x86_64/6.3

After the downgrade, stop works just fine. Since downgrading the kernel didn't help Dennis, it must be device-mapper or lvm2. I will continue investigating.

Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :

Correction: the kernel downgrade alone is sufficient. I was able to get docker to behave with the following packages:
kernel-2.6.32-504.1.3.el6.mos64.x86_64 (from August 9)
device-mapper-1.02.90-2.el6_6.3.x86_64 (current master)
lvm2-2.02.111-2.el6_6.3.x86_64 (current master)

Changed in fuel:
assignee: Matthew Mosesohn (raytrac3r) → MOS Linux (mos-linux)
Revision history for this message
Aleksander Mogylchenko (amogylchenko) wrote :

Are we sure that we have not seen this before? According to reports, this was seen even on old kernels:
https://github.com/docker/docker/issues/9856#issuecomment-119490919

Changed in fuel:
assignee: MOS Linux (mos-linux) → Aleksander Mogylchenko (amogylchenko)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-qa (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/214479

Revision history for this message
Pavel Boldin (pboldin) wrote :
Revision history for this message
Aleksander Mogylchenko (amogylchenko) wrote :

I'm running tests against 2.6.32-573 (from CentOS 6.7, available in the CentOS 6.6 cr repo):
~]# uname -a
Linux kh-env-3-fuel-70-185.mol.local 2.6.32-573.1.1.el6.x86_64 #1 SMP Sat Jul 25 17:05:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

So far I have not been able to reproduce the problem (on 2.6.32-504.30.3 it took me about half an hour to reproduce it).

Revision history for this message
Artem Silenkov (asilenkov) wrote :

I can confirm that I've managed to fix this by using the CentOS 6.7 release, which ships the 2.6.32-573 kernel.

Revision history for this message
Artem Silenkov (asilenkov) wrote :

I've pushed an experimental patch just to make sure we are using the appropriate kernel. I'm still not sure.
If the tests pass, we need to reapply fix-do_tcp_sendpages.patch to it.

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Related fix proposed to packages/centos6/kernel (7.0)

Related fix proposed to branch: 7.0
Change author: Artem Silenkov <email address hidden>
Review: https://review.fuel-infra.org/10605

Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
Artem Silenkov (asilenkov) wrote :

Unfortunately, the patch was not successful: the kernel still panics, though with a different process tainted.

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on packages/centos6/kernel (7.0)

Change abandoned by Artem Silenkov <email address hidden> on branch: 7.0
Review: https://review.fuel-infra.org/10605

Changed in fuel:
status: In Progress → Confirmed
Revision history for this message
Alexey Galkin (agalkin) wrote :

I also encountered this bug on:

{"build_id": "2015-08-12_15-16-47", "build_number": "164", "release_versions": {"2015.1.0-7.0": {"VERSION": {"build_id": "2015-08-12_15-16-47", "build_number": "164", "api": "1.0", "fuel-library_sha": "22f848670e49d89fc04aaed4d8efd1b07360cbe7", "nailgun_sha": "fff6bda090fac15c48b27cca7832a70f8e381101", "feature_groups": ["mirantis"], "fuel-nailgun-agent_sha": "e01693992d7a0304d926b922b43f3b747c35964c", "openstack_version": "2015.1.0-7.0", "fuel-agent_sha": "57145b1d8804389304cd04322ba0fb3dc9d30327", "production": "docker", "python-fuelclient_sha": "26fc025e0fc5791b62e5ed8561a6016bf8a406bc", "astute_sha": "e1d3a435e5df5b40cbfb1a3acf80b4176d15a2dc", "fuel-ostf_sha": "58220583f10fa47f12291488ef77854809c68310", "release": "7.0", "fuelmain_sha": "67e5214c0dc5d4ba6da4ae651cef9934800459a9"}}}, "auth_required": true, "api": "1.0", "fuel-library_sha": "22f848670e49d89fc04aaed4d8efd1b07360cbe7", "nailgun_sha": "fff6bda090fac15c48b27cca7832a70f8e381101", "feature_groups": ["mirantis"], "fuel-nailgun-agent_sha": "e01693992d7a0304d926b922b43f3b747c35964c", "openstack_version": "2015.1.0-7.0", "fuel-agent_sha": "57145b1d8804389304cd04322ba0fb3dc9d30327", "production": "docker", "python-fuelclient_sha": "26fc025e0fc5791b62e5ed8561a6016bf8a406bc", "astute_sha": "e1d3a435e5df5b40cbfb1a3acf80b4176d15a2dc", "fuel-ostf_sha": "58220583f10fa47f12291488ef77854809c68310", "release": "7.0", "fuelmain_sha": "67e5214c0dc5d4ba6da4ae651cef9934800459a9"}

I was just using Fuel in normal mode, without any backup or restore of the master node.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-main (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/215588

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-main (master)

Reviewed: https://review.openstack.org/215588
Committed: https://git.openstack.org/cgit/stackforge/fuel-main/commit/?id=36f7344d4cf71c07d31c14b7f0ea5575769973dd
Submitter: Jenkins
Branch: master

commit 36f7344d4cf71c07d31c14b7f0ea5575769973dd
Author: Artem Silenkov <email address hidden>
Date: Fri Aug 21 14:44:33 2015 +0300

    Downgrade kernel version on ISO

    New kernel 504.30.3 triggers a kernel panic when operating
    with cgroups. The old kernel is free from this regression,
    so we have decided to downgrade.

    The upstream kernel version is newer, so we must pin this one
    explicitly in order to use it instead of the upstream one.

    Change-Id: I1e713b53df6a0b9ab3ca147b9dd917cebf5d95eb
    Related-Bug: #1485954

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on packages/centos6/kernel (7.0)

Change abandoned by Aleksandr Mogylchenko <email address hidden> on branch: 7.0
Review: https://review.fuel-infra.org/10620

Revision history for this message
Roman Vyalov (r0mikiam) wrote :

The public mirror has different versions of kernel and kernel-headers. This task is still in progress.

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to packages/centos6/kernel (7.0)

Fix proposed to branch: 7.0
Change author: Aleksandr Mogylchenko <email address hidden>
Review: https://review.fuel-infra.org/10736

Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :

On ISO #224, downgraded kernel 2.6.32-504.1.3.el6.mos64.x86_64 resolved the issue with restarting docker containers.

Also I checked the upstream kernel 2.6.32-573.3.1.el6.x86_64 from http://mirror.centos.org/centos/6/updates/x86_64/Packages/, the issue is not reproduced with that kernel.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-main (master)

Fix proposed to branch: master
Review: https://review.openstack.org/217051

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-qa (master)

Reviewed: https://review.openstack.org/214479
Committed: https://git.openstack.org/cgit/stackforge/fuel-qa/commit/?id=dfb6354b3c4b33e177843dd5bbc1fa9b81d0ab42
Submitter: Jenkins
Branch: master

commit dfb6354b3c4b33e177843dd5bbc1fa9b81d0ab42
Author: Dennis Dmitriev <email address hidden>
Date: Fri Aug 21 01:58:18 2015 +0300

    Refactor nova-network cases to neutron

    - use neutron in tests migrate_vm_backed_with_ceph and
      check_ceph_partitions_after_reboot
    - add primitives for requesting OVS database
    - add showing "steps" from docstring into test cases
    - refactor fuel master backup/restore helper methods

    Change-Id: I0c9727eb7f2f4067f615386790b12fb0220c548c
    Closes-Bug:#1484155
    Closes-Bug:#1483767
    Related-Bug:#1485954

Revision history for this message
Aleksander Mogylchenko (amogylchenko) wrote :

I'm closing this bug since the original issue is fixed.

For the record, there is a separate initiative to update to the 2.6.32-504.16.2 kernel, which is not affected by this problem and also supports the needed hardware:
https://review.openstack.org/#/c/217051/
https://review.fuel-infra.org/#/c/10736/

A custom ISO is being tested at the moment.

Changed in fuel:
status: In Progress → Fix Released
Changed in fuel:
status: Fix Released → Won't Fix
status: Won't Fix → Fix Committed
Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to packages/centos6/kernel (7.0)

Reviewed: https://review.fuel-infra.org/10736
Submitter: Artem Silenkov <email address hidden>
Branch: 7.0

Commit: 4333a7138495d377f7a68b5a932869ca0135ed82
Author: Aleksandr Mogylchenko <email address hidden>
Date: Tue Aug 25 15:46:10 2015

Downgrade kernel version to 2.6.32-504.16.2

Due to bugs in cgroups:
https://bugs.launchpad.net/fuel/+bug/1485954
https://github.com/docker/docker/issues/14181

it is possible to cause a kernel panic just by restarting a docker container.
Since it was decided against updating to 2.6.32-573, we need an older kernel with
Dell R630 & Gen9 support.

After a possible update to 6.7, the native kernel will override this one.

Closes-Bug: #1485954
Change-Id: I148d5067fd66dca8e017ce28f6bff3ae186c7982

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-main (master)

Reviewed: https://review.openstack.org/217051
Committed: https://git.openstack.org/cgit/stackforge/fuel-main/commit/?id=d359037cc4eb4115d291ae16a010b0ee545dfc71
Submitter: Jenkins
Branch: master

commit d359037cc4eb4115d291ae16a010b0ee545dfc71
Author: Aleksandr Mogylchenko <email address hidden>
Date: Wed Aug 26 12:28:50 2015 +0300

    CESA-2015:0864 kernel

    This kernel is not affected by the docker/cgroups regression and supports
    the needed hardware, so it is preferable for the release.

    Change-Id: If5019d6b3b99936e813b9ba8ca032540e9fd52e3
    Closes-Bug: #1485954

Revision history for this message
Bartłomiej Piotrowski (bpiotrowski) wrote :

Verified on 7.0-259.

[root@nailgun ~]# dockerctl restore /var/backup/fuel/backup_2015-08-31_1203/fuel_backup_2015-08-31_1203.tar.lrz
DEPRECATION WARNING: /etc/fuel/client/config.yaml exists and will be used as the source for settings. This behavior is deprecated. Please specify the path to your custom settings file in the FUELCLIENT_CUSTOM_SETTINGS environment variable.
Shut down
Stopping containers...
Stopping nginx...
fuel-core-7.0-nginx
Stopping rabbitmq...
fuel-core-7.0-rabbitmq
Stopping astute...
fuel-core-7.0-astute
Stopping rsync...
fuel-core-7.0-rsync
Stopping keystone...
fuel-core-7.0-keystone
Stopping postgres...
fuel-core-7.0-postgres
Stopping rsyslog...
fuel-core-7.0-rsyslog
Stopping nailgun...
fuel-core-7.0-nailgun
Stopping cobbler...
fuel-core-7.0-cobbler
Stopping ostf...
fuel-core-7.0-ostf
Stopping mcollective...
fuel-core-7.0-mcollective
Output filename is: /var/backup/fuel/restore-2015-08-31_1203//fuel_backup.tar
Decompressing...
100% 77.58 / 77.58 MB
Average DeCompression Speed: 77.000MB/s
Output filename is: /var/backup/fuel/restore-2015-08-31_1203//fuel_backup.tar: [OK] - 81346560 bytes
Total time: 00:00:00.56
Starting containers...
(...)

tags: added: on-verification
Revision history for this message
Bartłomiej Piotrowski (bpiotrowski) wrote :

Plus uname output:

[root@nailgun ~]# uname -a
Linux nailgun.test.domain.local 2.6.32-504.16.2.el6.x86_64 #1 SMP Thu Aug 27 13:36:58 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Changed in fuel:
status: Fix Committed → Fix Released
Revision history for this message
Andrew Woodward (xarses) wrote :

Found by a customer on 6.0. Backports are required for 6.1 and 6.0.

Roman Rufanov (rrufanov)
tags: added: customer-found support
Revision history for this message
Aleksander Mogylchenko (amogylchenko) wrote :

This is not the correct workflow: the original problem was found only in kernels after 2.6.32-504.24. MOS 6.0 and MOS 6.1 shipped older kernels, so targeting this bug to the older releases makes no sense. A new investigation should be performed.
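To check a node against the version range discussed in this thread, a rough heuristic can be written in shell. This is a sketch under stated assumptions, not an authoritative list of affected kernels: the function name `kernel_in_affected_range` is hypothetical, it reads "after 2.6.32-504.24" as strictly greater than 504.24, and it only recognizes 2.6.32-504.x release strings. It is consistent with the data points in this report: 504.30.3 panicked, while 504.1.3, 504.16.2 and 573.x did not reproduce the issue.

```shell
#!/bin/sh
# Hypothetical heuristic: return 0 if a kernel release string is in the range
# this bug reports as affected (assumed: 2.6.32-504.N with N > 24), else 1.
# Anything outside 2.6.32-504.x returns 1, meaning "not in the reported
# range", which is NOT the same as "known safe".
kernel_in_affected_range() {
    rel="${1:-$(uname -r)}"
    case "$rel" in
        2.6.32-504.*) ;;
        *) return 1 ;;
    esac
    rest="${rel#2.6.32-504.}"   # e.g. "30.3.el6.x86_64"
    minor="${rest%%.*}"         # e.g. "30"
    case "$minor" in
        ''|*[!0-9]*) return 1 ;;  # unexpected format: do not guess
    esac
    [ "$minor" -gt 24 ]
}
```

Called without arguments it checks `uname -r`, e.g. `kernel_in_affected_range && echo "kernel is in the reported range"`. Matching on the release string is deliberately narrow; anything it does not recognize is left for manual review rather than classified.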

Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

Andrew, we are unable to reproduce this issue with the kernels included in MOS 6.0 and 6.1. Please provide more information on which kernel versions have this issue.

Revision history for this message
Dmitry Klenov (dklenov) wrote :

@Andrew Woodward, @Dennis Dmitriev

Folks, can you please provide more data about bug repro on 6.0 and 6.1?

Revision history for this message
Dmitry Klenov (dklenov) wrote :

Closing as invalid.

@Denis, please reopen if you have new info.

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on packages/centos6/kernel (7.0)

Change abandoned by Aleksandr Mogylchenko <email address hidden> on branch: 7.0
Review: https://review.fuel-infra.org/9814
