Pacemaker Galera-MySQL restart race in OCF script

Bug #1281625 reported by Vladimir Kuklin
This bug affects 2 people
Affects: Fuel for OpenStack
Status: Fix Committed
Importance: High
Assigned to: Vladimir Kuklin

Bug Description

There is a race condition in our Galera OCF script (mysql-wss) that can fail the deployment, because the script does not correctly check the MySQL process status.

Here:

mysql_status() {
    if [ ! -e $OCF_RESKEY_pid ]; then
        ocf_log $1 "MySQL is not running"
        return $OCF_NOT_RUNNING;
    fi

Sometimes the mysqld_safe process has not yet created the pid file at the moment the OCF script checks for its existence. This sends Pacemaker into an infinite MySQL start/stop loop, which fails the deployment.
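
To illustrate the kind of sleep/retry check that avoids this race, here is a minimal sketch. It is not the committed fix: the function name wait_for_pidfile, its arguments, and the default of 10 retries are assumptions made for illustration. A real OCF agent would return $OCF_NOT_RUNNING or $OCF_SUCCESS instead of 1 or 0.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the mysql-wss script): poll for a pid
# file instead of failing on the first check.
#   $1 - path to the pid file
#   $2 - number of one-second retries (assumed default: 10)
wait_for_pidfile() {
    pidfile=$1
    retries=${2:-10}
    while [ "$retries" -gt 0 ]; do
        # -s: the file exists and is not empty
        if [ -s "$pidfile" ]; then
            pid=$(cat "$pidfile")
            # kill -0 sends no signal; it only tests that the process exists
            if kill -0 "$pid" 2>/dev/null; then
                return 0
            fi
        fi
        retries=$((retries - 1))
        sleep 1
    done
    # The pid file never appeared, or it never named a live process
    return 1
}
```

A status function could call wait_for_pidfile "$OCF_RESKEY_pid" 10 and report "not running" only once the retries are exhausted. Checking the process with kill -0 also guards against a stale pid file left behind by a crashed mysqld.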

Tags: ha
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Vladimir Kuklin (vkuklin)
status: Triaged → In Progress
Andrew Woodward (xarses) wrote :

I've seen this occur on a running deployment after doing some maintenance. It causes two mysqld_safe processes to spawn if the first one doesn't create a pidfile (Ubuntu).

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/74431

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/74431
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=696219351246050311a82903b5eba898db672288
Submitter: Jenkins
Branch: master

commit 696219351246050311a82903b5eba898db672288
Author: Vladimir Kuklin <email address hidden>
Date: Tue Feb 18 20:01:48 2014 +0400

    Add sleep/retry cycle for galera OCF script

    Change-Id: I1a0e3758a785acf5013eb65e27b653e8063f8044
    Closes-bug: #1281625

Changed in fuel:
status: In Progress → Fix Committed
Andrew Woodward (xarses)
tags: added: ha
Andrew Woodward (xarses)
summary: - race condition in galera OCF script
+ Pacemaker Galera-MySQL restart race in OCF script
Chen Xin Jiang (jcxxin) wrote :

My code already contains the fix, but I'm still experiencing this problem. Here is the output of 'crm status':
Failed actions:
    p_mysql_monitor_0 (node=node-2.domain.tld, call=19, rc=1, status=Timed Out, last-rc-change=Thu Jun 5 19:04:02 2014, queued=55000ms, exec=0ms): unknown error
    openstack-heat-engine_monitor_20000 (node=node-3.domain.tld, call=115, rc=7, status=complete, last-rc-change=Thu Jun 5 19:05:18 2014, queued=0ms, exec=0ms): not running
