Fuel for OpenStack

Comment 21 for bug 1559136

OpenStack Infra (hudson-openstack) wrote on 2016-08-23: Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/324647
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=67e9b3d74f22a433da8def35a7c8bfb40f78ae89
Submitter: Jenkins
Branch: stable/mitaka

commit 67e9b3d74f22a433da8def35a7c8bfb40f78ae89
Author: Dmitry Mescheryakov <email address hidden>
Date: Wed May 25 10:48:50 2016 +0300

Enhance split-brain detection logic

    The previous split-brain logic worked as follows: each slave checked
    that it was connected to the master and restarted if the check
    failed. The fundamental flaw in that logic is that nothing guarantees
    the master is alive at that moment. Moreover, if the master dies, the
    slaves will very probably detect its death during the next monitor
    check and restart, causing complete RabbitMQ cluster downtime.

    With the new approach, the master node checks that the slaves are
    connected to it and orders them to restart if they are not. The check
    is performed after the master node's own health check, so at least
    that node survives. Also, orders expire after one minute, and a
    freshly started node ignores restart orders for three minutes to give
    the cluster time to stabilize.
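
The decision rules described above can be sketched in Python as follows. This is only an illustration, not the OCF script itself (which is shell). The names `NodeState`, `master_check` and `should_restart` are hypothetical, and storing the order as a timestamp is an assumption; only the one-minute expiry and the three-minute startup grace period come from the commit message.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

ORDER_TTL_SECONDS = 60        # orders expire after one minute
STARTUP_GRACE_SECONDS = 180   # fresh nodes ignore orders for three minutes


@dataclass
class NodeState:
    # Hypothetical in-memory stand-ins for the two private attributes.
    start_phase_1_time: float                   # rabbit-start-phase-1-time
    ordered_to_restart: Optional[float] = None  # rabbit-ordered-to-restart


def master_check(master_healthy: bool, connected: Set[str],
                 slaves: Dict[str, NodeState], now: float) -> List[str]:
    """Master side: order every slave that is not connected to restart."""
    if not master_healthy:
        return []  # the master's own health check comes first
    ordered = []
    for name, state in slaves.items():
        if name not in connected:
            state.ordered_to_restart = now  # assumption: order stored as a timestamp
            ordered.append(name)
    return sorted(ordered)


def should_restart(state: NodeState, now: float) -> bool:
    """Slave side: obey an order only if the order is fresh and the node is not."""
    if state.ordered_to_restart is None:
        return False
    if now - state.ordered_to_restart > ORDER_TTL_SECONDS:
        return False  # the order has expired
    if now - state.start_phase_1_time < STARTUP_GRACE_SECONDS:
        return False  # node started recently; let the cluster stabilize
    return True
```

In this sketch, because the master issues orders only after passing its own health check, at least one node is never told to restart.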

    Also fixed a problem that occurred when a node starts and is already
    clustered. In that case the OCF script forgot to start the RabbitMQ
    app, causing a subsequent restart. Now we ensure that the RabbitMQ app
    is running.

    The two newly introduced attributes, rabbit-start-phase-1-time and
    rabbit-ordered-to-restart, are private. To allow the master to set
    a node's restart order, the signatures of both ocf_update_private_attr
    and ocf_get_private_attr are extended to accept a node name.
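
A minimal sketch of what "accepting a node name" could look like, using a hypothetical in-memory store in place of the cluster attribute daemon. The class `PrivateAttrStore`, its argument order and its defaults are assumptions made for illustration; the real shell functions may differ.

```python
from typing import Dict, Optional, Tuple


class PrivateAttrStore:
    """Hypothetical stand-in for per-node private attributes."""

    def __init__(self, local_node: str) -> None:
        self.local_node = local_node
        self._values: Dict[Tuple[str, str], str] = {}

    def update(self, name: str, value: str, node: Optional[str] = None) -> None:
        # Without a node name the attribute is set on the local node.
        self._values[(node or self.local_node, name)] = value

    def get(self, name: str, default: str = "", node: Optional[str] = None) -> str:
        # Without a node name the attribute is read from the local node.
        return self._values.get((node or self.local_node, name), default)
```

With the optional `node` argument, code running on the master can write `rabbit-ordered-to-restart` for another node, while existing callers that omit it keep operating on the local node.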

    Finally, a bug is fixed in ocf_get_private_attr. Unlike crm_attribute,
    attrd_updater returns an empty string instead of "(null)" when an
    attribute is not defined on the requested node but is defined on some
    other node. The code now expects an empty string rather than
    "(null)".
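
The empty-string case can be illustrated with a small Python helper. This is a sketch under assumptions: `parse_private_attr` is a hypothetical name, it works on an already extracted value string rather than on real attrd_updater output, and accepting both markers is a defensive choice made here, whereas the commit switches the check to the empty string.

```python
from typing import Optional


def parse_private_attr(raw_value: str) -> Optional[str]:
    """Return the attribute value, or None when it is not defined on the node."""
    value = raw_value.strip()
    # An empty string is the "undefined" marker described in the commit message;
    # "(null)" is also accepted here purely as a defensive assumption.
    if value == "" or value == "(null)":
        return None
    return value
```

A caller can then fall back to a default whenever the helper returns `None`, instead of comparing against a literal marker in several places.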

Closes-Bug: #1561894
Closes-Bug: #1559136

Change-Id: Ib72794361dac54817975163593ea7e07f7e8b4e1
