HA: galera and ovn-dbs are needlessly restarted at each stack update

Bug #1906505 reported by Damien Ciabrini
This bug affects 1 person
Affects: tripleo
Status: Confirmed
Importance: Medium
Assigned to: Damien Ciabrini

Bug Description

With [1] we introduced coordinated restart of HA resources across the
pacemaker cluster nodes, for resources like galera and ovn-dbs that
don't support reloading their certificate when a new one is
issued.

However, we're seeing that on every stack update, even noop ones,
when the tripleo certmonger puppet module is called to assert the
state of the certificates, it ends up regenerating new certificates
unconditionally, even if the old ones haven't expired.

Dec 2 10:10:10 database-0 puppet-user[117078]: Debug: Prefetching certmonger_certificate resources for certmonger_certificate
Dec 2 10:10:10 database-0 certmonger[29459]: 2020-12-02 10:10:10 [117460] Setting "CERTMONGER_REQ_SUBJECT" to "CN=database-0.internalapi.redhat.local" for child.
Dec 2 10:10:10 database-0 certmonger[29459]: 2020-12-02 10:10:10 [117460] Setting "CERTMONGER_REQ_HOSTNAME" to "overcloud.internalapi.redhat.local
Dec 2 10:10:10 database-0 certmonger[29459]: database-0.internalapi.redhat.local" for child.
Dec 2 10:10:10 database-0 certmonger[29459]: 2020-12-02 10:10:10 [117460] Setting "CERTMONGER_REQ_PRINCIPAL" to "<email address hidden>" for child.
Dec 2 10:10:10 database-0 certmonger[29459]: 2020-12-02 10:10:10 [117460] Setting "CERTMONGER_OPERATION" to "SUBMIT" for child.
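To check whether a given (supposedly noop) stack update triggered resubmissions, one could scan the certmonger syslog lines for `CERTMONGER_OPERATION` set to `SUBMIT`. The Python below is a minimal sketch, not part of tripleo: the helper name `submitted_children` is hypothetical, and it assumes the bracketed number after the timestamp (e.g. `[117460]`) identifies the helper child certmonger spawned, as the log format above suggests.

```python
import re

# Matches lines such as the excerpt above:
#   ... certmonger[29459]: 2020-12-02 10:10:10 [117460] Setting "CERTMONGER_OPERATION" to "SUBMIT" for child.
# Assumption: the number in brackets before "Setting" identifies the spawned child.
OPERATION_RE = re.compile(
    r'\[(?P<child>\d+)\] Setting "CERTMONGER_OPERATION" to "(?P<op>[A-Z_]+)"'
)


def submitted_children(log_lines):
    """Return the child ids for which certmonger set CERTMONGER_OPERATION=SUBMIT."""
    children = []
    for line in log_lines:
        match = OPERATION_RE.search(line)
        if match and match.group("op") == "SUBMIT":
            children.append(int(match.group("child")))
    return children
```

Fed the journal output captured during a noop stack update, a non-empty result would indicate that certificates were resubmitted even though nothing should have changed.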

This in turn restarts galera and ovn-dbs on every stack update, even
when that is not needed.

[1] Ib2b62e33b34cf72edfdae6299cf432259bf960a2

Changed in tripleo:
milestone: wallaby-1 → wallaby-2
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
Changed in tripleo:
milestone: wallaby-3 → wallaby-rc1
OpenStack Infra (hudson-openstack) wrote : Change abandoned on puppet-tripleo (master)

Change abandoned by "Damien Ciabrini <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/puppet-tripleo/+/771227
Reason: certs are now managed in ansible, so let's drop that

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (stable/train)

Change abandoned by "Damien Ciabrini <email address hidden>" on branch: stable/train
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/771224

Changed in tripleo:
milestone: wallaby-rc1 → xena-1
Changed in tripleo:
milestone: xena-1 → xena-2
Changed in tripleo:
milestone: xena-2 → xena-3
Grzegorz Grasza (xek) wrote :

This was fixed in wallaby, when linux-system-roles were introduced

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to puppet-tripleo (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/puppet-tripleo/+/822244

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/822313

OpenStack Infra (hudson-openstack) wrote : Related fix merged to puppet-tripleo (stable/train)

Reviewed: https://review.opendev.org/c/openstack/puppet-tripleo/+/822244
Committed: https://opendev.org/openstack/puppet-tripleo/commit/93a93c166254070f54876aaeec81479a332651ee
Submitter: "Zuul (22348)"
Branch: stable/train

commit 93a93c166254070f54876aaeec81479a332651ee
Author: Damien Ciabrini <email address hidden>
Date: Thu Jan 14 13:44:05 2021 +0100

    [train-only] certmonger: track change in service's private keys

    Certmonger_certificate resources are marked as changed
    by puppet when some of their properties change, e.g.
    the filename of the service's private key.

    However, puppet-certmonger has no property to track that
    the content of the private key has changed, so puppet
    cannot trigger a certificate renewal when a user
    explicitly regenerates the service's private key.

    Add a file resource for tracking any content change in
    the service's private key file, and make sure that puppet
    notifies puppet-certmonger to trigger a certificate renewal.

    Note: some services that already use a file resource for
    their private key are updated, because they configure a
    wrong/backward dependency that causes a dependency loop
    when we add a notify dependency to the file resource.

    Related-Bug: #1906505

    Change-Id: I48658413f69a68dcad0a2f24ea66fe027e987f26
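The commit relies on a Puppet file resource to notice content changes in the private key and notify puppet-certmonger. The same idea, outside Puppet, amounts to comparing a content digest against the one recorded on the previous run. The Python below is only an illustrative sketch, not the puppet-tripleo code: `key_changed` and the digest "state file" are hypothetical, and it assumes the first observation of a key is not a change (the certificate was presumably requested with that key).

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 hex digest of the file's content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def key_changed(key_path, state_path):
    """Return True if the key content differs from the digest recorded in
    state_path (a hypothetical bookkeeping file), then record the current digest.

    The first run only records the digest and reports no change.
    """
    current = file_digest(key_path)
    state = Path(state_path)
    previous = state.read_text().strip() if state.exists() else None
    state.write_text(current)
    return previous is not None and previous != current
```

A True result is the point at which a renewal would be requested; repeated runs with an unchanged key keep returning False, so they trigger nothing.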

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/822313
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/34fd20704a0dd366d9f72bed5df29b20052b266c
Submitter: "Zuul (22348)"
Branch: stable/train

commit 34fd20704a0dd366d9f72bed5df29b20052b266c
Author: Damien Ciabrini <email address hidden>
Date: Thu Jan 14 12:39:43 2021 +0100

    [train-only] Make principal realms configurable in certs

    The certificate specs for certmonger are configured in hiera
    as 'service/host_fqdn'. Certmonger automatically appends
    a default realm to it, so it looks like 'service/host_fqdn@REALM'.

    This discrepancy makes puppet think certificate resources
    differ each time puppet apply is run, so puppet-certmonger
    resubmits the certificates, and this causes unnecessary service
    restarts, which can be costly (e.g. mariadb).

    Allow the principals to be configured with a user-defined realm,
    and use the uppercased cloud-domain by default (i.e. what
    certmonger automatically appends by default).

    Change-Id: I0a217b4a457881367de27414faca347e50f2db72
    Related-Bug: #1906505
    Depends-On: https://review.opendev.org/c/openstack/puppet-tripleo/+/822244
    Co-Authored-By: Damien Ciabrini <email address hidden>
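The mismatch described in this commit can be illustrated with a small comparison. The Python below is a sketch under stated assumptions, not tripleo code: the helpers are hypothetical, the service name `mysql` and the cloud domain `redhat.local` are made-up example values, and it assumes the default realm is the uppercased cloud domain, as the commit states. The commit fixes the problem by putting the realm into the configured principal; normalizing both sides before comparing has the same effect on the comparison.

```python
def default_realm(cloud_domain: str) -> str:
    """Hypothetical helper: default realm = uppercased cloud domain."""
    return cloud_domain.upper()


def with_realm(principal: str, realm: str) -> str:
    """Append '@realm' to a principal that does not already carry one."""
    return principal if "@" in principal else f"{principal}@{realm}"


def needs_resubmit(desired: str, actual: str, realm: str) -> bool:
    """Report a change only if the principals differ once both carry a realm."""
    return with_realm(desired, realm) != with_realm(actual, realm)
```

With `desired = "mysql/database-0.internalapi.redhat.local"` and `actual = desired + "@REDHAT.LOCAL"`, a plain `desired != actual` is true on every run, which is the "always changed" behaviour that leads to a resubmission and a service restart. `needs_resubmit(desired, actual, default_realm("redhat.local"))` is false, and it only becomes true when the service or host part actually differs.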
