[dockerctl] Race condition in check_ready function for container 'nailgun'

Bug #1545825 reported by Oleg S. Gelbukh
This bug affects 1 person
Affects             Status         Importance  Assigned to              Milestone
Fuel for OpenStack  Fix Committed  Medium      Oleg S. Gelbukh
8.0.x               Fix Committed  Medium      Oleg S. Gelbukh
Mitaka              Won't Fix      Medium      Registry Administrators

Bug Description

When the 'nailgun' container is started with dockerctl, dockerctl incorrectly reports that the container is ready, i.e. that Puppet has been successfully applied inside the container.

Steps to reproduce:
1. Install Fuel Admin node using ISO 7.0
2. Install fuel-octane from the source code:

    yum install -y git python-pip python-paramiko
    git clone https://github.com/openstack/fuel-octane
    cd fuel-octane && git checkout stable/7.0 && pip install --no-deps -e .

3. Create backup of Fuel Admin node configuration

    octane fuel-backup --to /tmp/backup.tar.gz

4. Copy backup.tar.gz file to external location.
5. Install Fuel Admin node using ISO 8.0
6. Install fuel-octane from the source code:

    yum install -y git python-pip python-paramiko
    git clone https://github.com/openstack/fuel-octane
    cd fuel-octane && git checkout stable/8.0 && pip install --no-deps -e .

7. Copy backup file to 8.0 Admin node's /tmp dir
8. Restore backup:

    octane fuel-restore -v --debug --from /tmp/backup.tar.gz -p <admin_pass>

Expected result:

Restore finishes successfully.

Actual result:

Restore failed at the OpenStack fixture upload phase due to a ProgrammingError from the SQL server:

  2016-02-15 12:37:16 ERROR octane.util.subprocess fuel[22974] stderr: 500 Server Error: Internal Server Error ((psycopg2.ProgrammingError) column releases.components_metadata does not exist
  2016-02-15 12:37:16 ERROR octane.util.subprocess fuel[22974] stderr: LINE 1: ..._metadata AS releases_vmware_attributes_metadata, releases.c...
  2016-02-15 12:37:16 ERROR octane.util.subprocess fuel[22974] stderr: ^
  2016-02-15 12:37:16 ERROR octane.util.subprocess fuel[22974] stderr: [SQL: 'SELECT releases.id AS releases_id, releases.name AS releases_name, releases.version AS releases_version, releases.can_update_from_versions AS releases_can_update_from_versions, releases.description AS releases_description, releases.operating_system AS releases_operating_system, releases.state AS releases_state, releases.networks_metadata AS releases_networks_metadata, releases.attributes_metadata AS releases_attributes_metadata, releases.volumes_metadata AS releases_volumes_metadata, releases.modes_metadata AS releases_modes_metadata, releases.roles_metadata AS releases_roles_metadata, releases.network_roles_metadata AS releases_network_roles_metadata, releases.wizard_metadata AS releases_wizard_metadata, releases.deployment_tasks AS releases_deployment_tasks, releases.vmware_attributes_metadata AS releases_vmware_attributes_metadata, releases.components_metadata AS releases_components_metadata, releases.modes AS releases_modes, releases.extensions AS releases_extensions \nFROM releases'])

Root cause:

The command 'dockerctl start nailgun' starts the 'nailgun' container and immediately reports its readiness. However, at the moment of the check, Puppet has not yet started in the container, let alone finished its work.

Thus, the database sync is not yet done when the request is sent, and the SQL server replies with a ProgrammingError, since the column in question (releases.components_metadata) has not yet been created in the DB schema.

The regression was introduced by the following change, which fixed a bug in systemd support in dockerctl: https://github.com/openstack/fuel-library/commit/c3fc592bbf235985e17eca20b92ae0c1185aa7c8#diff-1ce68b99aae786dd6a4fb77c5c52dd67R139

Workaround:

Insert a delay in dockerctl at the point where the docker-nailgun.service unit is checked.

Solution:

Add a check for the nailgun API response to the dockerctl functions.
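
A readiness check of this kind could look roughly like the sketch below. It is not the code from the actual fix. It only illustrates polling an HTTP endpoint until it answers instead of trusting the state of the systemd unit. The URL, the function names (api_responds, wait_for_ready) and the retry parameters are all assumptions made for illustration. Note that an HTTP 200 only shows that the API process answers. Whether a given endpoint also exercises the migrated DB schema would have to be verified separately.

```python
import time
import urllib.request

# Assumption: the nailgun API is reachable on this URL on the Fuel Admin node.
# Adjust host, port and path to the actual deployment.
NAILGUN_VERSION_URL = "http://127.0.0.1:8000/api/version"


def api_responds(url=NAILGUN_VERSION_URL, timeout=5):
    """Return True if the URL answers with HTTP 200, False on connection or HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError are subclasses of OSError, as are socket
        # errors such as "connection refused" and timeouts.
        return False


def wait_for_ready(probe=api_responds, attempts=60, interval=5.0, sleep=time.sleep):
    """Call probe() until it returns True or the attempts are exhausted.

    Returns True as soon as the probe succeeds, False otherwise.
    The sleep function is injectable so the loop can be tested without waiting.
    """
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False
```

A caller would run wait_for_ready() after starting the container and proceed with the fixture upload only if it returns True.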

Changed in fuel:
milestone: none → 9.0
status: New → Confirmed
importance: Undecided → Medium
milestone: 9.0 → 8.0-updates
assignee: nobody → Fuel Octane Dev Team (fuel-octane)
Changed in fuel:
milestone: 8.0-updates → 9.0
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-octane (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/280608

Changed in fuel:
assignee: Fuel Octane Dev Team (fuel-octane) → Oleg S. Gelbukh (gelbuhos)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-octane (master)

Reviewed: https://review.openstack.org/280608
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=0144ae69625f189e9e693554c1b7c8694eb19ae4
Submitter: Jenkins
Branch: master

commit 0144ae69625f189e9e693554c1b7c8694eb19ae4
Author: Oleg Gelbukh <email address hidden>
Date: Tue Feb 16 11:02:20 2016 +0000

    Start container before systemd unit for nailgun and keystone

    Dockerctl improperly detects readyness of docker container. Thus,
    we need to start the container before corresponding systemd unit
    (service).

    Run container first ensures that the systemd unit properly finishes
    its work.

    Add wait function to check if container is ready as a fix for race
    condition in dockerctl's check_ready function (bug #1545825).

    Change-Id: Id34986f42dd9944969475d5e0c1bec9814746b5f
    Closes-bug: 1545825

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-octane (stable/8.0)

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/281853

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-octane (stable/8.0)

Reviewed: https://review.openstack.org/281853
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=a4f274f88620dfcafd2d4cd2f75acf8bff3f6e9a
Submitter: Jenkins
Branch: stable/8.0

commit a4f274f88620dfcafd2d4cd2f75acf8bff3f6e9a
Author: Oleg Gelbukh <email address hidden>
Date: Tue Feb 16 11:02:20 2016 +0000

    Start container before systemd unit for nailgun and keystone

    Dockerctl improperly detects readyness of docker container. Thus,
    we need to start the container before corresponding systemd unit
    (service).

    Run container first ensures that the systemd unit properly finishes
    its work.

    Add wait function to check if container is ready as a fix for race
    condition in dockerctl's check_ready function (bug #1545825).

    Change-Id: Id34986f42dd9944969475d5e0c1bec9814746b5f
    Closes-bug: 1545825
    (cherry picked from commit 0144ae69625f189e9e693554c1b7c8694eb19ae4)

tags: added: non-release team-upgrades