[fuel-library] Pacemaker resource 'ceilometer-alarm-evaluator' fails to start

Bug #1549062 reported by Andrey Bubyr
This bug affects 2 people
Affects: Fuel for OpenStack
Status: Fix Released
Importance: Critical
Assigned to: Alex Schultz

Bug Description

Puppet fails when trying to start the corresponding resource:
-> Simple primitive 'p_ceilometer-alarm-evaluator' global status:
   node-1.test.domain.local: (FAIL)
...
2016-02-24 01:02:46 +0000 /Stage[main]/Ceilometer::Alarm::Evaluator/Service[ceilometer-alarm-evaluator]/ensure (err): change from stopped to running failed: Execution timeout after 600 seconds!

Manual attempt:
# pcs resource debug-start p_ceilometer-alarm-evaluator
Operation start for p_ceilometer-alarm-evaluator (ocf:fuel:ceilometer-alarm-evaluator) returned 6
 > stderr: ocf-ceilometer-alarm-evaluator: INFO: validate_port(): Port provided is empty

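The current version of the OCF script 'ceilometer-alarm-evaluator' uses the undefined variable OCF_RESKEY_amqp_server_port, so validate_port() receives an empty port and the start operation returns 6 (OCF_ERR_CONFIGURED).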
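
The failure mode can be reproduced outside Pacemaker with a minimal sketch. The validate_port below is a hypothetical stand-in written for illustration (the real library function is not shown in this report), and agent_validate is an assumed name for the check quoted later in this report:

```shell
#!/bin/sh
# Hypothetical stand-in for the library's validate_port; the real
# implementation is not shown in this report.
OCF_ERR_CONFIGURED=6   # standard OCF return code for a configuration error

validate_port() {
    port="$1"
    if [ -z "$port" ]; then
        echo "validate_port(): Port provided is empty" >&2
        return 1
    fi
    case "$port" in
        *[!0-9]*) return 1 ;;   # reject anything that is not all digits
    esac
    [ "$port" -ge 1 ] && [ "$port" -le 65535 ]
}

# Mirrors the check in the resource agent (assumed function name).
agent_validate() {
    if ! validate_port $OCF_RESKEY_amqp_server_port; then
        return ${OCF_ERR_CONFIGURED}
    fi
    return 0
}

# With the variable unset, validate_port is called with no argument.
unset OCF_RESKEY_amqp_server_port
agent_validate; rc_unset=$?

# With a port defined, the same check passes.
OCF_RESKEY_amqp_server_port=5673
agent_validate; rc_set=$?

echo "unset: $rc_unset, set: $rc_set"
```

Because the script never defines the parameter, the unquoted expansion disappears entirely and the check can only fail, which matches the "returned 6" seen in the debug-start output.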

The regression was introduced by this commit:

commit df082efdb3c5c7474ed15070401e7b0b607a1285
Author: Sergii Golovatiuk <email address hidden>
Date: Sat Oct 10 12:57:17 2015 +0200

    Refactor OCF functions

    - Move port validation *_check_port from all OCF script to OCF library
      (validate_port)
       - ceilometer_agent_central_check_port()
       - ceilometer_alarm_evaluator_check_port()
       - heat_engine_check_port()
       - nova_compute_check_port()
       - write bats unit tests for validate_port
    - Remove 'ocf-' from neutron OCF scripts names to make them typical to all
      OpenStack services
      - Rename ocf-neutron-dhcp-agent to neutron-dhcp-agent
      - Rename ocf-neutron-l3-agent to neutron-l3-agent
      - Rename ocf-neutron-ovs-agent to neutron-ovs-agent
    - Some minor bash typos and old comments clean-up

    Change-Id: Ic4bcecdbc05f6306be63dea211df96a104cb2d36
    Signed-off-by: Sergii Golovatiuk <email address hidden>

Tags: area-library
Dmitry Klenov (dklenov) wrote :

@Andrey, can you please attach a diagnostic snapshot?

Changed in fuel:
milestone: none → 9.0
assignee: nobody → Fuel Library Team (fuel-library)
importance: Undecided → High
tags: added: area-library
Changed in fuel:
status: New → Incomplete
Andrey Bubyr (abubyr) wrote :

I have no environment available at the moment, but the problem is definitely the undefined variable OCF_RESKEY_amqp_server_port.

It is used by this piece of code:
https://github.com/openstack/fuel-library/blob/master/files/fuel-ha-utils/ocf/ceilometer-alarm-evaluator#L145

Workarounds

1) Define a default value for this variable:
OCF_RESKEY_amqp_server_port_default="5673"
...
: ${OCF_RESKEY_amqp_server_port=${OCF_RESKEY_amqp_server_port_default}}

@@ -37,6 +37,7 @@
 OCF_RESKEY_config_default="/etc/ceilometer/ceilometer.conf"
 OCF_RESKEY_user_default="ceilometer"
 OCF_RESKEY_pid_default="${HA_RSCTMP}/${__SCRIPT_NAME}/${__SCRIPT_NAME}.pid"
+OCF_RESKEY_amqp_server_port_default="5673"

 : ${HA_LOGTAG="ocf-ceilometer-alarm-evaluator"}
 : ${HA_LOGFACILITY="daemon"}
@@ -44,6 +45,7 @@
 : ${OCF_RESKEY_config=${OCF_RESKEY_config_default}}
 : ${OCF_RESKEY_user=${OCF_RESKEY_user_default}}
 : ${OCF_RESKEY_pid=${OCF_RESKEY_pid_default}}
+: ${OCF_RESKEY_amqp_server_port=${OCF_RESKEY_amqp_server_port_default}}

Then clear, enable, and start the resource:

/usr/sbin/pcs resource clear p_ceilometer-alarm-evaluator node-1.test.domain.local
/usr/sbin/pcs resource enable p_ceilometer-alarm-evaluator
/usr/sbin/pcs resource debug-start p_ceilometer-alarm-evaluator
(OR /usr/sbin/crm resource restart p_ceilometer-alarm-evaluator)

The resource then comes up and runs:
p_ceilometer-alarm-evaluator (ocf::fuel:ceilometer-alarm-evaluator): Started node-1.test.domain.local
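
One detail of the default-assignment idiom used in workaround 1 is worth noting: `${VAR=default}` applies the default only when the variable is unset, not when it is set to an empty string. The sketch below illustrates this with throwaway variable names (port_a, port_b, port_c, and default_port are made up for the example); whether Pacemaker ever passes the parameter as an empty string is not established in this report.

```shell
#!/bin/sh
default_port="5673"   # value taken from the workaround above

# Case 1: variable unset -> '=' applies the default.
unset port_a
: ${port_a=${default_port}}

# Case 2: variable set but empty -> '=' leaves it empty.
port_b=""
: ${port_b=${default_port}}

# Case 3: variable set but empty -> ':=' applies the default.
port_c=""
: ${port_c:=${default_port}}

echo "unset with '=':  '${port_a}'"
echo "empty with '=':  '${port_b}'"
echo "empty with ':=': '${port_c}'"
```

The workaround keeps the `=` form to stay consistent with the other defaults in the script; `:=` would be the stricter choice if empty values also need to be covered.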

2) Alternatively, comment out the block that calls validate_port() with the undefined variable OCF_RESKEY_amqp_server_port:

@@ -142,9 +142,9 @@
     check_binary $OCF_RESKEY_binary
     check_binary netstat

- if ! validate_port $OCF_RESKEY_amqp_server_port; then
- return ${OCF_ERR_CONFIGURED}
- fi
+ #if ! validate_port $OCF_RESKEY_amqp_server_port; then
+ # return ${OCF_ERR_CONFIGURED}
+ #fi

     # A config file on shared storage that is not available
     # during probes is OK.

Dmitry Klenov (dklenov)
Changed in fuel:
status: Incomplete → Confirmed
Changed in fuel:
importance: High → Critical
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/284471

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Alex Schultz (alex-schultz)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https://review.openstack.org/284471
Reason: should be fixed with the switch to aodh, https://review.openstack.org/#/c/279127

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/284471
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=0a4d7e6fe745a5a7eefdd0e18f62b05ad1a5750c
Submitter: Jenkins
Branch: master

commit 0a4d7e6fe745a5a7eefdd0e18f62b05ad1a5750c
Author: Alex Schultz <email address hidden>
Date: Wed Feb 24 16:46:57 2016 -0700

    Remove port check from alarm-evaluator

    This change removes the port check from the ocf script for
    ceilometer-alarm-evaluator. This port check for amqp was improperly
    added as part of Ic4bcecdbc05f6306be63dea211df96a104cb2d36.

    Change-Id: I9999401787f31ce6cd9f30021011fd198f1f24ad
    Closes-Bug: #1549062

Changed in fuel:
status: In Progress → Fix Committed
tags: added: on-verification
Dmitriy Kruglov (dkruglov) wrote :

Verified on MOS 9.0, ISO build 257.
The issue is fixed.

ISO details:
cat /etc/fuel_build_id:
 257
cat /etc/fuel_build_number:
 257
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-release-9.0.0-1.mos6342.noarch
 fuel-misc-9.0.0-1.mos8329.noarch
 fuel-mirror-9.0.0-1.mos133.noarch
 shotgun-9.0.0-1.mos88.noarch
 fuel-openstack-metadata-9.0.0-1.mos8671.noarch
 fuel-notify-9.0.0-1.mos8329.noarch
 fuel-ostf-9.0.0-1.mos928.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8671.noarch
 python-fuelclient-9.0.0-1.mos313.noarch
 fuel-9.0.0-1.mos6342.noarch
 fuel-utils-9.0.0-1.mos8329.noarch
 fuel-nailgun-9.0.0-1.mos8671.noarch
 rubygem-astute-9.0.0-1.mos741.noarch
 fuel-library9.0-9.0.0-1.mos8329.noarch
 network-checker-9.0.0-1.mos72.x86_64
 fuel-agent-9.0.0-1.mos273.noarch
 fuel-ui-9.0.0-1.mos2676.noarch
 fuel-setup-9.0.0-1.mos6342.noarch
 nailgun-mcagents-9.0.0-1.mos741.noarch
 python-packetary-9.0.0-1.mos133.noarch
 fuelmenu-9.0.0-1.mos269.noarch
 fuel-bootstrap-cli-9.0.0-1.mos273.noarch
 fuel-migrate-9.0.0-1.mos8329.noarch

Changed in fuel:
status: Fix Committed → Fix Released
tags: removed: on-verification