Pacemaker tries to start mysql eternally

Bug #1505735 reported by Ilya Shakhat
This bug affects 2 people
Affects             Status        Importance  Assigned to        Milestone
Fuel for OpenStack  Fix Released  High        Sergii Golovatiuk
6.1.x               Won't Fix     High        MOS Maintenance
7.0.x               Fix Released  High        Denis Meltsaykin

Bug Description

The cluster has the mysql resource in the stopped state:
Clone Set: clone_p_mysql [p_mysql]
     Stopped: [ node-2.domain.tld node-4.domain.tld node-5.domain.tld ]

Attempts to start mysql by enabling the resource lead nowhere: Pacemaker starts the resource and gives up after 5 minutes, when the start operation times out. According to the logs:
  info: log_execute: executing - rsc:p_mysql action:start call_id:1251
  warning: child_timeout_callback: p_mysql_start_0 process (PID 27884) timed out
  warning: operation_finished: p_mysql_start_0:27884 - timed out after 300000ms

Attempt to start the resource manually:
root@node-4:~# export OCF_ROOT=/usr/lib/ocf
root@node-4:~# export OCF_RESOURCE_INSTANCE="p_mysql:1"
root@node-4:~# /bin/bash /usr/lib/ocf/resource.d/fuel/mysql-wss start
ocf-mysql-wss: INFO: PIDFile /var/run/resource-agents/mysql-wss/mysql-wss.pid of MySQL server not found. Sleeping for 5 seconds. 0 retries left
ocf-mysql-wss: INFO: MySQL is not running
ocf-mysql-wss: INFO: PIDFile /var/run/resource-agents/mysql-wss/mysql-wss.pid of MySQL server not found. Sleeping for 5 seconds. 0 retries left
ocf-mysql-wss: ERROR: MySQL is not running
ocf-mysql-wss: INFO: GTID OK: 776d73ad-61e5-11e5-bdf7-2b63b2f996e8:3299466
ocf-mysql-wss: INFO: Galera GTID: 776d73ad-61e5-11e5-bdf7-2b63b2f996e8:3299466
ocf-mysql-wss: INFO: Checking if Primary Component
ocf-mysql-wss: INFO: GTID OK: 776d73ad-61e5-11e5-bdf7-2b63b2f996e8:-1
ocf-mysql-wss: INFO: Galera GTID: 776d73ad-61e5-11e5-bdf7-2b63b2f996e8:-1
ocf-mysql-wss: INFO: GTID OK: 776d73ad-61e5-11e5-bdf7-2b63b2f996e8:3299466
ocf-mysql-wss: INFO: Galera GTID: 776d73ad-61e5-11e5-bdf7-2b63b2f996e8:3299466
ocf-mysql-wss: INFO: GTID OK: 776d73ad-61e5-11e5-bdf7-2b63b2f996e8:3299466
ocf-mysql-wss: INFO: Galera GTID: 776d73ad-61e5-11e5-bdf7-2b63b2f996e8:3299466
ocf-mysql-wss: INFO: Possible masters: node-4.domain.tld node-5.domain.tld
ocf-mysql-wss: INFO: Choosed master: node-5.domain.tld
ocf-mysql-wss: INFO: Waiting for master. 290 seconds left
ocf-mysql-wss: INFO: GTID OK: 776d73ad-61e5-11e5-bdf7-2b63b2f996e8:-1
ocf-mysql-wss: INFO: Galera GTID: 776d73ad-61e5-11e5-bdf7-2b63b2f996e8:-1
ocf-mysql-wss: INFO: GTID OK: 776d73ad-61e5-11e5-bdf7-2b63b2f996e8:3299466
ocf-mysql-wss: INFO: Galera GTID: 776d73ad-61e5-11e5-bdf7-2b63b2f996e8:3299466
ocf-mysql-wss: INFO: GTID OK: 776d73ad-61e5-11e5-bdf7-2b63b2f996e8:3299466
ocf-mysql-wss: INFO: Galera GTID: 776d73ad-61e5-11e5-bdf7-2b63b2f996e8:3299466
ocf-mysql-wss: INFO: Possible masters: node-4.domain.tld node-5.domain.tld
ocf-mysql-wss: INFO: Choosed master: node-5.domain.tld
ocf-mysql-wss: INFO: Waiting for master. 280 seconds left

Revision history for this message
Ilya Shakhat (shakhat) wrote :

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2015.1.0-7.0"
  api: "1.0"
  build_number: "301"
  build_id: "301"
  nailgun_sha: "4162b0c15adb425b37608c787944d1983f543aa8"
  python-fuelclient_sha: "486bde57cda1badb68f915f66c61b544108606f3"
  fuel-agent_sha: "50e90af6e3d560e9085ff71d2950cfbcca91af67"
  fuel-nailgun-agent_sha: "d7027952870a35db8dc52f185bb1158cdd3d1ebd"
  astute_sha: "6c5b73f93e24cc781c809db9159927655ced5012"
  fuel-library_sha: "5d50055aeca1dd0dc53b43825dc4c8f7780be9dd"
  fuel-ostf_sha: "2cd967dccd66cfc3a0abd6af9f31e5b4d150a11c"
  fuelmain_sha: "a65d453215edb0284a2e4761be7a156bb5627677"

Revision history for this message
Ilya Shakhat (shakhat) wrote :

The execution of "mysql-wss" loops inside check_if_galera_pc() function:
    while [ $timeout -gt 0 ]; do
        NODES=$(nodes_in_cluster_online)
        MASTERS=$(get_possible_masters "$NODES")
        MASTER=$(choose_master "$MASTERS")
        if [ "$MASTER" = "$HOSTNAME" ]; then
            ocf_log info "I\'m Primary Component. Join me!"
            return 1
        fi

        if check_if_reelection_needed; then
            ocf_log info "My neighbour is Primary Component"
            return 0
        fi

        sleep 10
        (( timeout -= 10 ))
        ocf_log info "Waiting for master. ${timeout} seconds left"
    done

The loop is expected to terminate either if the host is the primary controller or if it has a primary neighbour. However, the code is wrong: the condition "if check_if_reelection_needed; then" will always fail. Instead it should be:
        check_if_reelection_needed
        cirn_rc=$?
        if [ $cirn_rc -eq 1 ]; then
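
For readers comparing the two forms, here is a minimal sketch of how bash evaluates each one. stub_returning is a hypothetical stand-in for check_if_reelection_needed, whose body and return convention are not shown in this report, so the sketch only illustrates the shell semantics and does not say which form is right for mysql-wss.

```shell
#!/bin/bash
# Hypothetical stand-in for check_if_reelection_needed: it simply returns
# the status passed as its first argument.
stub_returning() { return "$1"; }

# Form used in the loop: the branch is taken only when the status is 0.
branch_if_form() {
    if stub_returning "$1"; then echo taken; else echo skipped; fi
}

# Form proposed above: the branch is taken only when the status is 1.
branch_rc_form() {
    stub_returning "$1"
    local rc=$?
    if [ "$rc" -eq 1 ]; then echo taken; else echo skipped; fi
}

for rc in 0 1; do
    echo "status=$rc if-form=$(branch_if_form "$rc") rc-form=$(branch_rc_form "$rc")"
done
```

The two forms take the branch on opposite statuses, so switching between them inverts the loop's exit condition for any function that returns only 0 or 1.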

Changed in fuel:
milestone: none → 8.0
assignee: nobody → Fuel Library Team (fuel-library)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/234314

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Ilya Shakhat (shakhat)
status: New → In Progress
Changed in fuel:
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/234314
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=0768a542e3a4b48485e351e50ae9e51164389219
Submitter: Jenkins
Branch: master

commit 0768a542e3a4b48485e351e50ae9e51164389219
Author: Ilya Shakhat <email address hidden>
Date: Tue Oct 13 19:06:02 2015 +0300

    Check reelection need correctly

    Fix bash error that breaks check of election need.

    Change-Id: I19cec58c3d78e76dc48de34719843737cb947196
    Closes-Bug: #1505735

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Ilya Shakhat (shakhat) wrote :

Re-opening the bug since the fix is not correct (per late comments in https://review.openstack.org/#/c/234314/1/files/fuel-ha-utils/ocf/mysql-wss)

Changed in fuel:
status: Fix Committed → Triaged
assignee: Ilya Shakhat (shakhat) → Sergii Golovatiuk (sgolovatiuk)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/234999

Changed in fuel:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/234999
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=14ec4750e763d82ab89f0b315f0d4f1341e893d6
Submitter: Jenkins
Branch: master

commit 14ec4750e763d82ab89f0b315f0d4f1341e893d6
Author: Sergii Golovatiuk <email address hidden>
Date: Fri Oct 16 16:43:32 2015 +0200

    Remove exit code verification from MySQL OCF

    Without set -o pipefail exit code of pipeline operations reports of last
    operation. This change removes exit code verification to make it POSIX
    compliant

    Issue was introduced by Ia7670a4f1b8540f6ac9ca8ff66dabd36e7f8738a

    Change-Id: Ief67bb96c7806011e246597cf1f46b88613981ea
    Closes-Bug: 1505735
    Signed-off-by: Sergii Golovatiuk <email address hidden>
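
The pipeline behaviour that the commit message refers to can be reproduced with a short sketch. pipeline_status is a hypothetical helper written for this illustration, not code from mysql-wss; it runs "false | cat" in a subshell, with or without the bash pipefail option, and prints the resulting exit status.

```shell
#!/bin/bash
# Hypothetical helper: print the exit status of the pipeline "false | cat".
# $1 = "on" enables pipefail inside the subshell; anything else leaves the
# shell default in place.
pipeline_status() {
    local rc=0
    (
        if [ "$1" = "on" ]; then set -o pipefail; fi
        false | cat
    ) || rc=$?
    echo "$rc"
}

# Default: the status of the last command (cat) is reported, so the failure
# of the first stage is hidden.
echo "default:  $(pipeline_status off)"

# With pipefail: the failing stage (false) determines the status.
echo "pipefail: $(pipeline_status on)"
```

Under the default setting, testing the exit code of a function whose last command is a pipeline therefore only tests the final stage of that pipeline.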

Changed in fuel:
status: In Progress → Fix Committed
Dmitry Pyzhov (dpyzhov)
tags: added: area-library
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/7.0)

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/241140

tags: added: regression-8.0
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/7.0)

Reviewed: https://review.openstack.org/241140
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=1b2e0f23582622f58ca92f54966447958e4b40f1
Submitter: Jenkins
Branch: stable/7.0

commit 1b2e0f23582622f58ca92f54966447958e4b40f1
Author: Sergii Golovatiuk <email address hidden>
Date: Fri Oct 16 16:43:32 2015 +0200

    Remove exit code verification from MySQL OCF

    Without set -o pipefail exit code of pipeline operations reports of last
    operation. This change removes exit code verification to make it POSIX
    compliant

    Issue was introduced by Ia7670a4f1b8540f6ac9ca8ff66dabd36e7f8738a

    Change-Id: Ief67bb96c7806011e246597cf1f46b88613981ea
    Closes-Bug: 1505735
    Signed-off-by: Sergii Golovatiuk <email address hidden>
    (cherry picked from commit 14ec4750e763d82ab89f0b315f0d4f1341e893d6)

tags: added: on-verification
tags: removed: on-verification
tags: added: on-verification
Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

Sergii Golovatiuk: I can't reproduce this bug on the current version of MOS 7.0. Could you please provide steps to reproduce?

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

Fix is verified for MOS 7.0. Package: fuel-ha-utils_7.0.0-7257.1.gita808cd9_all.deb

tags: removed: on-verification
Revision history for this message
Ksenia Svechnikova (kdemina) wrote :

Verified on MOS 8.0 ISO#496
Steps for verification:

1. Put the mysql resource into the stopped state:
  pcs resource disable clone_p_mysql
  pcs status | grep -A 2 mysql
 Clone Set: clone_p_mysql [p_mysql]
     Stopped: [ node-3.test.domain.local node-4.test.domain.local node-5.test.domain.local ]

2. Start the resource by enabling it:
  pcs resource enable clone_p_mysql

3. Check the state of the resource:
  pcs status | grep -A 2 mysql
 Clone Set: clone_p_mysql [p_mysql]
     Started: [ node-3.test.domain.local node-4.test.domain.local node-5.test.domain.local ]
 Master/Slave Set: master_p_conntrackd [p_conntrackd]

4. Run OSTF: http://paste.openstack.org/show/485555/
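
Steps 2 and 3 above could be scripted roughly as follows. This is a sketch under assumptions: clone_state and wait_for_clone_started are hypothetical helpers, the 300-second default and 10-second polling interval are illustrative choices, and the parsing relies on `pcs status` printing the "Clone Set:" header followed by "Started:" / "Stopped:" lines exactly as in the output quoted above.

```shell
#!/bin/bash
# Hypothetical helper: read `pcs status` output on stdin and print the state
# keywords (Started, Stopped) listed directly under the named clone set.
clone_state() {
    awk -v name="$1" '
        index($0, "Clone Set: " name " [") { found = 1; next }
        found && ($1 == "Started:" || $1 == "Stopped:") {
            states = states sep substr($1, 1, length($1) - 1)
            sep = " "
            next
        }
        found { exit }
        END { print states }
    '
}

# Hypothetical helper: poll until the clone set reports only "Started",
# or give up after the timeout (in seconds, assumed default 300).
wait_for_clone_started() {
    local name=$1 timeout=${2:-300} state
    while [ "$timeout" -gt 0 ]; do
        state=$(pcs status | clone_state "$name")
        if [ "$state" = "Started" ]; then
            return 0
        fi
        sleep 10
        timeout=$((timeout - 10))
    done
    return 1
}
```

A possible invocation after step 2 would be `wait_for_clone_started clone_p_mysql 300`, which returns non-zero if any node is still listed as stopped when the timeout expires.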

Changed in fuel:
status: Fix Committed → Fix Released
tags: added: 7.0-mu-2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/6.1)

Fix proposed to branch: stable/6.1
Review: https://review.openstack.org/316085

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/7.0)

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/316803

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/8.0)

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/317979

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/8.0)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: stable/8.0
Review: https://review.openstack.org/317979

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/7.0)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: stable/7.0
Review: https://review.openstack.org/316803

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/6.1)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: stable/6.1
Review: https://review.openstack.org/316085

Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

Won't Fix for 6.1-updates, as this is too large a change to be accepted into the stable branch.
