Repo sync not complete before attempting to install packages

Bug #1543146 reported by Hugh Saunders
This bug affects 2 people
Affects            Status         Importance  Assigned to     Milestone
openstack-ansible  Fix Released   Medium      Hugh Saunders
Kilo               Fix Committed  Medium      Darren Birkett
Liberty            Fix Committed  Medium      Darren Birkett

Bug Description

When there are multiple repo containers, lsyncd is configured to keep them synchronised. However, nothing checks that the initial synchronisation has completed.

As a result, there is a race between the initial repo sync and the first Python package installation: a pip install can reach a repo container whose contents have not yet been synced.

Example Failure:

===========================
06:38:50 TASK: [galera_client | Install pip packages] **********************************
06:38:50 failed: [jrpcaioiad-bfb_galera_container-28d13a84] => (item=MySQL-python) => {"attempts": 5, "cmd": "/usr/local/bin/pip install MySQL-python", "failed": true, "item": "MySQL-python"}
06:38:50 msg: Task failed as maximum retries was encountered
06:38:50 failed: [jrpcaioiad-bfb_galera_container-28d13a84] => (item=python-memcached) => {"attempts": 5, "cmd": "/usr/local/bin/pip install python-memcached", "failed": true, "item": "python-memcached"}
06:38:50 msg: Task failed as maximum retries was encountered
06:38:50 failed: [jrpcaioiad-bfb_galera_container-28d13a84] => (item=pycrypto) => {"attempts": 5, "cmd": "/usr/local/bin/pip install pycrypto", "failed": true, "item": "pycrypto"}
06:38:50 msg: Task failed as maximum retries was encountered
===========================
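
To make the race concrete, here is a minimal Python sketch of the kind of check the description says is missing: comparing a secondary repo directory against the primary before treating it as usable. It is not part of openstack-ansible; the function names and the idea of comparing file listings are assumptions made for illustration only.

```python
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def missing_on_secondary(primary_root, secondary_root):
    """Return files present on the primary but not yet on the secondary."""
    return relative_files(primary_root) - relative_files(secondary_root)


def initial_sync_complete(primary_root, secondary_root):
    """True once every file on the primary also exists on the secondary.

    Hypothetical helper: it only compares file names, not contents,
    so it is a coarse readiness check rather than a full verification.
    """
    return not missing_on_secondary(primary_root, secondary_root)
```

A deployment step could poll a check like `initial_sync_complete` before the first pip install. The fix that was eventually merged takes a different route and keeps the secondaries offline until the sync finishes.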

summary: - Repo Sync not complete before attempting to install packages
+ Repo sync not complete before attempting to install packages
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible (kilo)

Fix proposed to branch: kilo
Review: https://review.openstack.org/277430

Changed in openstack-ansible:
assignee: nobody → Hugh Saunders (hughsaunders)
status: New → Confirmed
importance: Undecided → Medium
milestone: none → mitaka-3
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-repo_server (master)

Fix proposed to branch: master
Review: https://review.openstack.org/279452

Changed in openstack-ansible:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on openstack-ansible (kilo)

Change abandoned by Hugh Saunders (<email address hidden>) on branch: kilo
Review: https://review.openstack.org/277430
Reason: Abandon in favour of: https://review.openstack.org/#/c/279452

Changed in openstack-ansible:
milestone: mitaka-3 → 13.0.0
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-repo_server (master)

Reviewed: https://review.openstack.org/279452
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-repo_server/commit/?id=b457f3bda65a418884e1847759e6bfa26a7ed5ab
Submitter: Jenkins
Branch: master

commit b457f3bda65a418884e1847759e6bfa26a7ed5ab
Author: Hugh Saunders <email address hidden>
Date: Fri Feb 12 10:04:59 2016 +0000

    Disable slave repo servers while syncing

    Currently there is a race between the repo servers syncing and the first
    role that attempts to install a pip package. This change ensures that
    only the primary repo server is accessible until the slaves are synced.

    This is achieved by adding a hook into lsyncd that allows a command to
    be run before and after each sync. This command is an ssh command to
    connect to the relevant secondary container and stop/start nginx. As the
    nginx user is unprivileged, a sudoers file is added to allow nginx to be
    stopped and started.

    Notes on adding the hook into lsyncd:
     * There is an existing script in lsyncd/examples for postcmd. This
       works at a higher level by adding an event onto the stack for executing a
       command once the sync has finished. I experimented with that, but
       events don't get fired for the initial recursive sync, only on
       subsequent changes. As it is the initial sync that causes the problem
       that this patch is addressing, I had to look at a lower level.

     * The lsyncd Lua C lib has an exec function, but it is hidden from
       config scripts except through the spawn(...) function. However, spawn
       requires an event, so it can't be used for the initial sync.

     * I ended up going outside the lsyncd framework and using Lua's own
       os.execute() function for pre/post cmds.

    While this looks like a big patch, it's actually a relatively small
    change to the default rsync script. See
    https://github.com/hughsaunders/lsyncd/compare/master...hughsaunders:rsync_prepost
    for a comparison.

    Bug: #1543146
    Change-Id: I045a4a6bf722d6f1e01d21fbbec733872acb87a5
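
The control flow the commit message describes (stop nginx on the secondary, sync, then start nginx again) can be sketched as follows. This is an illustration in Python, not the patch itself, which lives in lsyncd's Lua configuration and uses `os.execute()`. The `service nginx` invocation, the host name in the usage note, and the choice to leave nginx stopped when the sync fails are all assumptions of this sketch.

```python
import subprocess


def nginx_cmd(host, action):
    """Build an ssh command that stops or starts nginx on a secondary.

    Assumption: nginx is managed with `sudo service nginx <action>`;
    the real patch may use a different command.
    """
    return ["ssh", host, "sudo", "service", "nginx", action]


def sync_with_hooks(host, sync_cmd, run=subprocess.call):
    """Run the pre hook, the sync, and (on success) the post hook.

    `run` takes an argument list and returns an exit status; it
    defaults to subprocess.call and can be swapped out for testing.
    Returns the exit status of the sync command.
    """
    # Pre hook: take the secondary out of service before syncing.
    run(nginx_cmd(host, "stop"))
    status = run(sync_cmd)
    # Post hook: only re-expose the secondary if the sync succeeded.
    # (Sketch's own choice; not confirmed from the patch.)
    if status == 0:
        run(nginx_cmd(host, "start"))
    return status
```

For example, `sync_with_hooks("repo2", ["rsync", "-a", "/src/", "repo2:/dst/"])` would run the three commands in order, where `repo2` and both paths are placeholder values.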

Changed in openstack-ansible:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible (master)

Fix proposed to branch: master
Review: https://review.openstack.org/293691

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible (master)

Reviewed: https://review.openstack.org/293691
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible/commit/?id=cc5234571959515663e231792a5c639e85a3132f
Submitter: Jenkins
Branch: master

commit cc5234571959515663e231792a5c639e85a3132f
Author: Hugh Saunders <email address hidden>
Date: Wed Mar 16 20:02:48 2016 +0000

    Release note for Repo Sync Patch

    Patch in Question: https://review.openstack.org/#/c/279452/8
    Bug: #1543146

    Change-Id: Ifb2d8cda57862f92fb8b772716e07e090323318a

no longer affects: openstack-ansible/mitaka
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible (kilo)

Fix proposed to branch: kilo
Review: https://review.openstack.org/316032

Darren Birkett (darren-birkett) wrote :

Fix proposed to branch: liberty
Review: https://review.openstack.org/#/c/316033

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible (liberty)

Reviewed: https://review.openstack.org/316033
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible/commit/?id=a60ad57f8cd35e57fb160ba5ea8fd190f6810ef3
Submitter: Jenkins
Branch: liberty

commit a60ad57f8cd35e57fb160ba5ea8fd190f6810ef3
Author: Darren Birkett <email address hidden>
Date: Fri May 13 11:46:15 2016 +0100

    Disable slave repo servers while syncing

    Currently there is a race between the repo servers syncing and the first
    role that attempts to install a pip package. This change ensures that
    only the primary repo server is accessible until the slaves are synced.

    This is achieved by adding a hook into lsyncd that allows a command to
    be run before and after each sync. This command is an ssh command to
    connect to the relevant secondary container and stop/start nginx. As the
    nginx user is unprivileged, a sudoers file is added to allow nginx to be
    stopped and started.

    Notes on adding the hook into lsyncd:
     * There is an existing script in lsyncd/examples for postcmd. This
       works at a higher level by adding an event onto the stack for
       executing a command once the sync has finished. I experimented with
       that, but events don't get fired for the initial recursive sync, only
       on subsequent changes. As it is the initial sync that causes the
       problem that this patch is addressing, I had to look at a lower level.

     * The lsyncd Lua C lib has an exec function, but it is hidden from
       config scripts except through the spawn(...) function. However, spawn
       requires an event, so it can't be used for the initial sync.

     * I ended up going outside the lsyncd framework and using Lua's own
       os.execute() function for pre/post cmds.

    While this looks like a big patch, it's actually a relatively small
    change to the default rsync script. See
    https://github.com/hughsaunders/lsyncd/compare/master...hughsaunders:rsync_prepost
    for a comparison.

    Bug: #1543146
    Co-Authored-By: Hugh Saunders <email address hidden>
    Based on commit: b457f3bda65a418884e1847759e6bfa26a7ed5ab
    Change-Id: I045a4a6bf722d6f1e01d21fbbec733872acb87a5

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible (kilo)

Reviewed: https://review.openstack.org/316032
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible/commit/?id=ac7b2d68433f9560a5d401f846349c2e37751f0f
Submitter: Jenkins
Branch: kilo

commit ac7b2d68433f9560a5d401f846349c2e37751f0f
Author: Darren Birkett <email address hidden>
Date: Fri May 13 11:46:15 2016 +0100

    Disable slave repo servers while syncing

    ****
    While this looks like a big patch, it's actually a relatively small
    change to the default rsync script. I understand that Kilo is pre EOL,
    but this is an isolated change and removes one more potential way
    that a juno-kilo upgrade could go wrong.
    ****

    Commit message from original master patch:

    Currently there is a race between the repo servers syncing and the first
    role that attempts to install a pip package. This change ensures that
    only the primary repo server is accessible until the slaves are synced.

    This is achieved by adding a hook into lsyncd that allows a command to
    be run before and after each sync. This command is an ssh command to
    connect to the relevant secondary container and stop/start nginx. As the
    nginx user is unprivileged, a sudoers file is added to allow nginx to be
    stopped and started.

    Notes on adding the hook into lsyncd:
     * There is an existing script in lsyncd/examples for postcmd. This
       works at a higher level by adding an event onto the stack for
       executing a command once the sync has finished. I experimented with
       that, but events don't get fired for the initial recursive sync, only
       on subsequent changes. As it is the initial sync that causes the
       problem that this patch is addressing, I had to look at a lower level.

     * The lsyncd Lua C lib has an exec function, but it is hidden from
       config scripts except through the spawn(...) function. However, spawn
       requires an event, so it can't be used for the initial sync.

     * I ended up going outside the lsyncd framework and using Lua's own
       os.execute() function for pre/post cmds.

    While this looks like a big patch, it's actually a relatively small
    change to the default rsync script. See
    https://github.com/hughsaunders/lsyncd/compare/master...hughsaunders:rsync_prepost
    for a comparison.

    Bug: #1543146
    Change-Id: I045a4a6bf722d6f1e01d21fbbec733872acb87a5
    Co-Authored-By: Hugh Saunders <email address hidden>
    Based on commit: b457f3bda65a418884e1847759e6bfa26a7ed5ab
