HA CI jobs do not collect cluster state's log files

Bug #1695237 reported by Damien Ciabrini
Affects   Status         Importance   Assigned to   Milestone
tripleo   Fix Released   Low          Matt Young

Bug Description

Gate jobs that exercise HA deployments (e.g. gate-tripleo-ci-centos-7-ovb-ha) currently save pacemaker's internal log file, /var/log/cluster/corosync.tx.gz. However, other important text logs under /var/lib/pacemaker are not saved. For instance:

 * /var/lib/pacemaker/pengine/pe-input* files record why the cluster executed its state transitions

 * /var/lib/pacemaker/cib/cib*.raw files snapshot the state of the cluster after every state transition

These files need to be saved and made available in the job logs; otherwise it can be very difficult to diagnose why the cluster made a specific decision (e.g. starting or stopping a resource).
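
As an illustration of the collection being requested, here is a minimal Python sketch that copies every file matching the two patterns from this report into a destination directory, preserving the original paths. It is not how the CI jobs actually gather logs. The function name collect_logs, the src_root parameter (added so the sketch can be tried against a fake directory tree), and the destination layout are all assumptions made for the example.

```python
import glob
import shutil
from pathlib import Path

# Glob patterns named in the bug description.
PACEMAKER_PATTERNS = [
    "/var/lib/pacemaker/pengine/pe-input*",
    "/var/lib/pacemaker/cib/cib*.raw",
]


def collect_logs(patterns, dest_root, src_root="/"):
    """Copy files matching the patterns into dest_root, preserving their paths.

    src_root is a hypothetical parameter that lets the sketch run against
    a fake tree instead of the real filesystem root.
    Returns the sorted list of destination paths that were written.
    """
    src_root = Path(src_root)
    dest_root = Path(dest_root)
    copied = []
    for pattern in patterns:
        # Re-anchor the absolute pattern under src_root before globbing.
        full_pattern = str(src_root / pattern.lstrip("/"))
        for match in sorted(glob.glob(full_pattern)):
            src = Path(match)
            if not src.is_file():
                continue  # skip directories that happen to match
            dest = dest_root / src.relative_to(src_root)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 also keeps timestamps
            copied.append(dest)
    return sorted(copied)
```

Calling collect_logs(PACEMAKER_PATTERNS, "/tmp/job-logs") on a controller node would, under these assumptions, leave the transition inputs and CIB snapshots under /tmp/job-logs/var/lib/pacemaker/, ready to be published with the rest of the job logs. Reading /var/lib/pacemaker usually requires elevated privileges.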

Changed in tripleo:
status: New → Triaged
tags: added: low-hanging-fruit
Changed in tripleo:
importance: Undecided → Low
milestone: none → pike-3
Changed in tripleo:
milestone: pike-3 → pike-rc1
Changed in tripleo:
milestone: pike-rc1 → queens-1
Changed in tripleo:
milestone: queens-1 → queens-2
Changed in tripleo:
milestone: queens-2 → queens-3
Matt Young (halcyondude) wrote :

We currently collect /var/lib/pacemaker/cib/cib* [1], but do not seem to collect /var/lib/pacemaker/pengine/pe-input*.

[1] https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/collect-logs/defaults/main.yml#L6

Changed in tripleo:
assignee: nobody → Matt Young (halcyondude)
Changed in tripleo:
status: Triaged → In Progress
Matt Young (halcyondude) wrote :

Are there other logs we should add as well while we're updating the logs for this issue?

Matt Young (halcyondude) wrote :

https://review.openstack.org/#/c/527554/ adds the location mentioned above. Is this sufficient for this issue? I've tagged it as a partial fix.

Changed in tripleo:
milestone: queens-3 → queens-rc1
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.openstack.org/527554
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=13c7815bfb96ada93561c6f5bf2cd86490190ccf
Submitter: Zuul
Branch: master

commit 13c7815bfb96ada93561c6f5bf2cd86490190ccf
Author: Matt Young <email address hidden>
Date: Tue Dec 12 20:31:31 2017 -0500

    Add /var/lib/pacemaker/pengine/pe-input* to saved logs

    This commit adds the pengine/pe-input* logs to assist with
    debugging CI jobs

    Change-Id: I6fa7b951ff9c41fbb83a9bc8e5767524bfa364f8
    Partial-Bug: #1695237

Changed in tripleo:
milestone: queens-rc1 → rocky-1
Changed in tripleo:
milestone: rocky-1 → rocky-2
Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3
Changed in tripleo:
status: In Progress → Fix Released