tripleo-ci-centos-9-scenario001-standalone failed during step5 because gnocchi couldn't connect to redis

Bug #1978997 reported by John Fulton
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

2022-06-16 17:47:41.106170 | fa163e58-38f7-4108-1b3e-000000004f72 | FATAL | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_5 | standalone | error={"changed": false, "msg": "Failed containers: gnocchi_db_sync, ceilometer_gnocchi_upgrade"}

2022-06-16 17:42:12,816 [8] CRITICAL root: Traceback (most recent call last):

  File "/usr/lib/python3.9/site-packages/gnocchi/incoming/__init__.py", line 117, in NUM_SACKS
    self._num_sacks = int(self._get_storage_sacks())
  File "/usr/lib/python3.9/site-packages/gnocchi/incoming/redis.py", line 66, in _get_storage_sacks
    return self._client.hget(self.CFG_PREFIX, self.CFG_SACKS)
  File "/usr/lib/python3.9/site-packages/redis/client.py", line 3010, in hget
    return self.execute_command('HGET', name, key)
  File "/usr/lib/python3.9/site-packages/redis/client.py", line 898, in execute_command
    conn = self.connection or pool.get_connection(command_name, **options)
  File "/usr/lib/python3.9/site-packages/redis/connection.py", line 1192, in get_connection
    connection.connect()
  File "/usr/lib/python3.9/site-packages/redis/connection.py", line 567, in connect
    self.on_connect()
  File "/usr/lib/python3.9/site-packages/redis/connection.py", line 643, in on_connect
    auth_response = self.read_response()
  File "/usr/lib/python3.9/site-packages/redis/connection.py", line 739, in read_response
    response = self._parser.read_response()
  File "/usr/lib/python3.9/site-packages/redis/connection.py", line 324, in read_response
    raw = self._buffer.readline()
  File "/usr/lib/python3.9/site-packages/redis/connection.py", line 256, in readline
    self._read_from_socket()
  File "/usr/lib/python3.9/site-packages/redis/connection.py", line 201, in _read_from_socket
    raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)
redis.exceptions.ConnectionError: Connection closed by server.

This container is still passed credentials for Ceph, though I'm not sure that's related. If it needs the credentials, nothing about how they are configured has changed (our client code to do this still runs during overcloud deployment, but we moved server deployment earlier). If there were something wrong with the shared ceph.conf and shared ceph.client.openstack.keyring, then glance, cinder, and nova (which also use the same files) would have had a problem earlier.

If it's Ceph-related I can help, but I'd like to get input from someone who works on gnocchi.

[1]

https://411deec051d768e42c33-554a55c2926ce345d8a8f0805ecfe993.ssl.cf5.rackcdn.com/846159/4/check/tripleo-ci-centos-9-scenario001-standalone/a9a4889/logs/undercloud/home/zuul/standalone_deploy.log

tags: added: alert promotion-blocker
Revision history for this message
Takashi Kajinami (kajinamit) wrote (last edit ):

There seems to be an issue with redis, and it is never promoted.

https://411deec051d768e42c33-554a55c2926ce345d8a8f0805ecfe993.ssl.cf5.rackcdn.com/846159/4/check/tripleo-ci-centos-9-scenario001-standalone/a9a4889/logs/undercloud/var/log/extra/pcs.txt
~~~
  * Container bundle: redis-bundle [198.72.124.73:5001/tripleomastercentos9/openstack-redis:pcmklatest]:
    * redis-bundle-0 (ocf:heartbeat:redis): Unpromoted standalone
~~~

ceilometer-manage is trying to connect to redis because redis is configured as its incoming storage, but it cannot reach it because no master is promoted.
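
A quick way to confirm both halves of that on the standalone node (a sketch; the gnocchi config path is the usual TripleO location and may differ per deployment):

~~~
# Sketch: confirm redis is the configured incoming driver and that the
# pacemaker bundle is stuck unpromoted.
sudo grep -A 2 '^\[incoming\]' \
    /var/lib/config-data/puppet-generated/gnocchi/etc/gnocchi/gnocchi.conf
sudo pcs status | grep -A 2 redis-bundle
~~~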

https://411deec051d768e42c33-554a55c2926ce345d8a8f0805ecfe993.ssl.cf5.rackcdn.com/846159/4/check/tripleo-ci-centos-9-scenario001-standalone/a9a4889/logs/undercloud/var/log/containers/haproxy/haproxy.log
~~~
Jun 16 17:24:28 standalone haproxy[7]: Server redis_be/standalone.ctlplane.localdomain is DOWN, reason: Layer4 connection problem, info: "Connection refused at step 1 of tcp-check (connect port 6379)", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jun 16 17:24:28 standalone haproxy[7]: backend redis_be has no server available!
...
Jun 16 17:37:45 standalone haproxy[7]: 192.168.24.3:58548 [16/Jun/2022:17:37:45.108] redis redis_be/<NOSRV> -1/-1/0 0 SC 2/1/0/0/0 0/0
Jun 16 17:37:45 standalone haproxy[7]: 192.168.24.3:58550 [16/Jun/2022:17:37:45.111] redis redis_be/<NOSRV> -1/-1/0 0 SC 2/1/0/0/0 0/0
Jun 16 17:37:45 standalone haproxy[7]: 192.168.24.3:58552 [16/Jun/2022:17:37:45.113] redis redis_be/<NOSRV> -1/-1/0 0 SC 2/1/0/0/0 0/0
Jun 16 17:37:45 standalone haproxy[7]: 192.168.24.3:58556 [16/Jun/2022:17:37:45.114] redis redis_be/<NOSRV> -1/-1/0 0 SC 2/1/0/0/0 0/0
Jun 16 17:37:45 standalone haproxy[7]: 192.168.24.3:58558 [16/Jun/2022:17:37:45.117] redis redis_be/<NOSRV> -1/-1/0 0 SC 2/1/0/0/0 0/0
Jun 16 17:37:45 standalone haproxy[7]: 192.168.24.3:58560 [16/Jun/2022:17:37:45.120] redis redis_be/<NOSRV> -1/-1/0 0 SC 2/1/0/0/0 0/0
Jun 16 17:37:45 standalone haproxy[7]: 192.168.24.3:58562 [16/Jun/2022:17:37:45.121] redis redis_be/<NOSRV> -1/-1/0 0 SC 2/1/0/0/0 0/0
Jun 16 17:37:45 standalone haproxy[7]: 192.168.24.3:58564 [16/Jun/2022:17:37:45.122] redis redis_be/<NOSRV> -1/-1/0 0 SC 2/1/0/0/0 0/0
Jun 16 17:37:45 standalone haproxy[7]: 192.168.24.3:58566 [16/Jun/2022:17:37:45.123] redis redis_be/<NOSRV> -1/-1/0 0 SC 2/1/0/0/0 0/0
Jun 16 17:37:45 standalone haproxy[7]: 192.168.24.3:58568 [16/Jun/2022:17:37:45.124] redis redis_be/<NOSRV> -1/-1/0 0 SC 2/1/0/0/0 0/0
Jun 16 17:37:45 standalone haproxy[7]: 192.168.24.3:58570 [16/Jun/2022:17:37:45.124] redis redis_be/<NOSRV> -1/-1/0 0 SC 2/1/0/0/0 0/0
Jun 16 17:37:45 standalone haproxy[7]: 192.168.24.3:58572 [16/Jun/2022:17:37:45.125] redis redis_be/<NOSRV> -1/-1/0 0 SC 2/1/0/0/0 0/0
Jun 16 17:37:45 standalone haproxy[7]: 192.168.24.3:58574 [16/Jun/2022:17:37:45.129] redis redis_be/<NOSRV> -1/-1/0 0 SC 2/1/0/0/0 0/0
Jun 16 17:37:45 standalone haproxy[7]: 192.168.24.3:58576 [16/Jun/2022:17:37:45.130] redis redis_be/<NOSRV> -1/-1/0 0 SC 2/1/0/0/0 0/0
Jun 16 17:37:45 standalone haproxy[7]: 192.168.24.3:58578 [16/Jun/2022:17:37:45.135] redis redis_be/<NOSRV> -1/-1/0 0 SC 2/1/0/0...


Revision history for this message
chandan kumar (chkumar246) wrote :

In passing job https://ac8676a2e4bb22cb45b3-6ec72dc5947dcd41834023b354511109.ssl.cf5.rackcdn.com/842370/6/gate/tripleo-ci-centos-9-scenario001-standalone/93b1333/logs/undercloud/var/log/extra/podman/containers/redis-bundle-podman-0/podman_info.log

```
pacemaker.x86_64 2.1.2-4.el9 @quickstart-centos-highavailability
pacemaker-cli.x86_64 2.1.2-4.el9 @quickstart-centos-highavailability
pacemaker-cluster-libs.x86_64 2.1.2-4.el9 @quickstart-centos-highavailability
pacemaker-libs.x86_64 2.1.2-4.el9 @quickstart-centos-highavailability
pacemaker-remote.x86_64 2.1.2-4.el9 @quickstart-centos-highavailability
pacemaker-schemas.noarch 2.1.2-4.el9 @quickstart-centos-highavailability
```

and in failed one
https://411deec051d768e42c33-554a55c2926ce345d8a8f0805ecfe993.ssl.cf5.rackcdn.com/846159/4/check/tripleo-ci-centos-9-scenario001-standalone/a9a4889/logs/undercloud/var/log/extra/podman/containers/redis-bundle-podman-0/podman_info.log

```
pacemaker.x86_64 2.1.3-2.el9 @quickstart-centos-highavailability
pacemaker-cli.x86_64 2.1.3-2.el9 @quickstart-centos-highavailability
pacemaker-cluster-libs.x86_64 2.1.3-2.el9 @quickstart-centos-highavailability
pacemaker-libs.x86_64 2.1.3-2.el9 @quickstart-centos-highavailability
pacemaker-remote.x86_64 2.1.3-2.el9 @quickstart-centos-highavailability
pacemaker-schemas.noarch 2.1.3-2.el9 @quickstart-centos-highavailability
```

Revision history for this message
Matthias Runge (mrunge) wrote :

We have seen redis issues with gnocchi in the past. The connection to redis can be unreliable, and gnocchi will retry in that case. However, it is a real problem if redis never selects a primary.

Revision history for this message
Matthias Runge (mrunge) wrote :

I wonder if the redis log message is a red herring here. In my more recent devstack setups I found that gnocchi does not work (well) with more recent sqlalchemy versions.

Revision history for this message
Matthias Runge (mrunge) wrote :

E.g. see here: https://ac8676a2e4bb22cb45b3-6ec72dc5947dcd41834023b354511109.ssl.cf5.rackcdn.com/842370/6/gate/tripleo-ci-centos-9-scenario001-standalone/93b1333/logs/undercloud/var/log/containers/gnocchi/app.log

2022-06-15 21:45:49,096 [17] WARNING py.warnings: /usr/lib/python3.9/site-packages/gnocchi/indexer/sqlalchemy.py:482: SAWarning: relationship 'ResourceHistory.metrics' will copy column resource_history.id to column metric.resource_id, which conflicts with relationship(s): 'Metric.resource' (copies resource.id to metric.resource_id), 'Resource.metrics' (copies resource.id to metric.resource_id). If this is not the intention, consider if these relationships should be linked with back_populates, or if viewonly=True should be applied to one or more if they are read-only. For the less common case that foreign key constraints are partially overlapping, the orm.foreign() annotation can be used to isolate the columns that should be written towards. To silence this warning, add the parameter 'overlaps="metrics,resource"' to the 'ResourceHistory.metrics' relationship. (Background on this error at: https://sqlalche.me/e/14/qzyx)
  resource_type = session.query(ResourceType).get(name)

Revision history for this message
Alfredo Moralejo (amoralej) wrote :

I'm seeing a similar issue in RDO. In case it helps, I'm adding redis logs for a passing and a failing job:

Last passing:

https://logserver.rdoproject.org/63/43563/2/check/rdoinfo-tripleo-master-testing-centos-9-scenario001-standalone/5aaa84a/logs/undercloud/var/log/containers/redis/redis.log.txt.gz

Failing:

https://logserver.rdoproject.org/73/42273/8/check/rdoinfo-tripleo-master-testing-centos-9-scenario001-standalone/e0f6746/logs/undercloud/var/log/containers/redis/redis.log.txt.gz

I see some log lines in the passing one that are not in the failing one:

115:M 13 Jun 2022 13:55:24.315 * Discarding previously cached master state.
115:M 13 Jun 2022 13:55:24.315 # Setting secondary replication ID to 213924cf53ab965342abadfa215441591e17c616, valid up to offset: 1. New replication ID is d57b4a27c96225a5dc1b3835d0fad08e271c70a7
115:M 13 Jun 2022 13:55:24.316 * MASTER MODE enabled (user request from 'id=13 addr=/var/run/redis/redis.sock:0 laddr=/var/run/redis/redis.sock:0 fd=8 name= age=0 idle=0 flags=U db=0 sub=0 psub=0 multi=-1 qbuf=34 qbuf-free=40920 argv-mem=12 obl=0 oll=0 omem=0 tot-mem=61476 events=r cmd=slaveof user=default redir=-1')
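
The "MASTER MODE enabled" line above is the resource agent promoting redis over its unix socket; roughly the equivalent command (a sketch based on the cmd=slaveof field in that log line):

~~~
# Rough equivalent of the promotion recorded in the passing job's log above;
# the failing job never reaches this step.
redis-cli -s /var/run/redis/redis.sock SLAVEOF NO ONE
~~~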

Revision history for this message
Damien Ciabrini (dciabrin) wrote :

I'm looking into it.

In response to #2, looking at chandan's logs for the passing and failing jobs ([1] and [2] respectively), the passing job has redis effectively promoted in pacemaker [1]:

Jun 15 21:29:29.080 standalone.localdomain pacemaker-attrd [63860] (attrd_peer_update) notice: Setting master-redis[standalone]: (unset) -> 1 | from standalone

Whereas the failing job never sets master-redis to 1 in pacemaker.

This setting is driven by the behaviour of the redis resource agent; I don't think pacemaker itself is at fault here.

I do see that in the passing job [1], redis-bundle is restarted once because of the way we set up the resource in pacemaker at creation time. It's not in the failing job [2]. But again, I don't see why this would interfere with the redis resource agent setting the master-redis flag to 1.

I am going to try to replicate this locally on a standalone environment to see what could make the redis resource agent misbehave.

[1] https://ac8676a2e4bb22cb45b3-6ec72dc5947dcd41834023b354511109.ssl.cf5.rackcdn.com/842370/6/gate/tripleo-ci-centos-9-scenario001-standalone/93b1333/logs/undercloud/var/log/pacemaker/pacemaker.log
[2] https://411deec051d768e42c33-554a55c2926ce345d8a8f0805ecfe993.ssl.cf5.rackcdn.com/846159/4/check/tripleo-ci-centos-9-scenario001-standalone/a9a4889/logs/undercloud/var/log/pacemaker/pacemaker.log
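
For reference, the master-redis promotion score mentioned above can be checked directly on the node; a quick sketch (attribute name taken from the pacemaker-attrd log line quoted above):

~~~
# Sketch: query the promotion score that attrd sets in the passing job;
# in the failing job it is never set to 1.
sudo crm_attribute --node standalone --name master-redis --lifetime reboot --query
~~~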

Revision history for this message
Takashi Kajinami (kajinamit) wrote :

So initially pacemaker started redis-bundle-podman-0, which represents the podman container.
This completed without any problem.
~~~
Jun 16 17:26:47.502 standalone.localdomain pacemaker-schedulerd[61881] (log_list_item) notice: Actions: Start redis-bundle-podman-0 ( standalone )
Jun 16 17:26:47.503 standalone.localdomain pacemaker-controld [61882] (te_rsc_command) notice: Initiating start operation redis-bundle-podman-0_start_0 locally on standalone | action 66
...
Jun 16 17:26:49.231 standalone.localdomain pacemaker-execd [61879] (log_finished) info: redis-bundle-podman-0 start (call 48, PID 92341) exited with status 0 (execution time 1.728s)
~~~

Later, pacemaker started the nested resources, but at that moment it tried to restart the root resource (redis-bundle-podman-0) because of a resource definition change.

~~~
Jun 16 17:26:55.128 standalone.localdomain pacemaker-schedulerd[61881] (log_list_item) notice: Actions: Restart redis-bundle-podman-0 ( standalone ) due to resource definition change
Jun 16 17:26:55.129 standalone.localdomain pacemaker-schedulerd[61881] (log_list_item) notice: Actions: Start redis-bundle-0 ( standalone )
Jun 16 17:26:55.129 standalone.localdomain pacemaker-schedulerd[61881] (log_list_item) notice: Actions: Start redis:0 ( redis-bundle-0 )
~~~

Then it successfully stopped the container.
~~~
Jun 16 17:26:56.030 standalone.localdomain pacemaker-controld [61882] (log_executor_event) notice: Result of stop operation for redis-bundle-podman-0 on standalone: ok | CIB update 150, graph action confirmed; call=51 key=redis-bundle-podman-0_stop_0 rc=0
...
~~~

Then all resources were started but the redis resource was not promoted at that time.
~~~
Jun 16 17:26:56.033 standalone.localdomain pacemaker-controld [61882] (te_rsc_command) notice: Initiating start operation redis-bundle-podman-0_start_0 locally on standalone | action 11
...
Jun 16 17:26:57.719 standalone.localdomain pacemaker-controld [61882] (log_executor_event) notice: Result of start operation for redis-bundle-podman-0 on standalone: ok | CIB update 152, graph action confirmed; call=52 key=redis-bundle-podman-0_start_0 rc=0
...
Jun 16 17:26:57.728 standalone.localdomain pacemaker-controld [61882] (te_rsc_command) notice: Initiating start operation redis-bundle-0_start_0 locally on standalone | action 71
...
Jun 16 17:26:58.264 standalone.localdomain pacemaker-controld [61882] (log_executor_event) notice: Result of start operation for redis-bundle-0 on standalone: ok | CIB update 161, graph action confirmed; call=8 key=redis-bundle-0_start_0 rc=0
...
Jun 16 17:26:58.314 standalone.localdomain pacemaker-schedulerd[61881] (rsc_action_default) info: Leave redis-bundle-podman-0...


Revision history for this message
chandan kumar (chkumar246) wrote :

846287: [DNM] Downgrade pacemaker and resource-agents | https://review.opendev.org/c/openstack/tripleo-common/+/846287

Ronelle Landy (rlandy)
Changed in tripleo:
importance: High → Critical
Revision history for this message
Takashi Kajinami (kajinamit) wrote (last edit ):

So the problem is reproduced if we downgrade only resource-agents.
 https://review.opendev.org/c/openstack/tripleo-common/+/846351
 https://zuul.opendev.org/t/openstack/build/72c725792946461f9a16760c6437fb8e

On the other hand, it is not reproduced when we downgrade pacemaker/pacemaker_remote
 https://review.opendev.org/c/openstack/tripleo-common/+/846352
 https://zuul.opendev.org/t/openstack/build/eed22d317f0e4c67b358a5f308b036af

So this is likely a problem with pacemaker_remote (as the pacemaker package itself is not used inside the container).

Good
~~~
pacemaker.x86_64 2.1.2-4.el9 @quickstart-centos-highavailability
pacemaker-cli.x86_64 2.1.2-4.el9 @quickstart-centos-highavailability
pacemaker-cluster-libs.x86_64 2.1.2-4.el9 @quickstart-centos-highavailability
pacemaker-libs.x86_64 2.1.2-4.el9 @quickstart-centos-highavailability
pacemaker-remote.x86_64 2.1.2-4.el9 @quickstart-centos-highavailability
pacemaker-schemas.noarch 2.1.2-4.el9 @quickstart-centos-highavailability
~~~

Bad
~~~
pacemaker.x86_64 2.1.3-2.el9 @quickstart-centos-highavailability
pacemaker-cli.x86_64 2.1.3-2.el9 @quickstart-centos-highavailability
pacemaker-cluster-libs.x86_64 2.1.3-2.el9 @quickstart-centos-highavailability
pacemaker-libs.x86_64 2.1.3-2.el9 @quickstart-centos-highavailability
pacemaker-remote.x86_64 2.1.3-2.el9 @quickstart-centos-highavailability
pacemaker-schemas.noarch 2.1.3-2.el9 @quickstart-centos-highavailability
~~~
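
For context, the DNM reviews above effectively pin the container build back to the "Good" versions; roughly the downgrade they exercise (a sketch, the exact NVRs come from the listing above and the real change lives in the tripleo-common reviews):

~~~
# Rough equivalent of what the DNM patches do during the container build.
dnf downgrade -y pacemaker-2.1.2-4.el9 pacemaker-remote-2.1.2-4.el9 resource-agents
~~~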

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-common/+/846287
Committed: https://opendev.org/openstack/tripleo-common/commit/bf87555420bfd17d8b90f70ea9616476741b1819
Submitter: "Zuul (22348)"
Branch: master

commit bf87555420bfd17d8b90f70ea9616476741b1819
Author: Chandan Kumar (raukadah) <email address hidden>
Date: Fri Jun 17 14:51:49 2022 +0530

    Downgrade pacemaker and resource-agents

    The patch downgrades:
    pacemaker pacemaker-remote resource-agents
    in container builds to avoid
    errors at deployment step 5 with latest versions.

    Related-Bug: #1978997
    Signed-off-by: Chandan Kumar (raukadah) <email address hidden>
    Change-Id: Ie5288864cd6f346a5bb5b481b4aa5dbd1abb9f47

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/846474

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/846557

Revision history for this message
Damien Ciabrini (dciabrin) wrote :

Ok the failure to promote the Redis resource seems to come from a change in the default output of crm_attribute in pacemaker.x86_64 2.1.3-2.el9.

Prior to that version, trying to fetch a non-existing attribute from the CIB would return an empty string. With the newer pacemaker, we get the "(null)" string instead. This breaks the logic implemented in the redis resource agent.

This is probably a regression. I've filed https://bugzilla.redhat.com/show_bug.cgi?id=2099331 to track that pacemaker issue separately.
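
To make the breakage concrete, here is a minimal sketch of the pattern that breaks (illustrative only, not the redis resource agent's actual code; the attribute name is a placeholder):

~~~
# Illustration of the regression: an attribute that has never been set used
# to come back as an empty string.
val=$(crm_attribute --type crm_config --name redis_REPL_INFO --query --quiet 2>/dev/null)
if [ -z "$val" ]; then
    echo "no replication info recorded yet; this node may bootstrap as master"
fi
# With pacemaker 2.1.3-2 the same query prints the literal string "(null)",
# so the -z test is false and the promotion path is never taken, matching
# the permanently unpromoted redis-bundle seen in the failing jobs.
~~~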

Revision history for this message
John Fulton (jfulton-org) wrote :

Still seeing symptoms of this bug. E.g.

  https://review.opendev.org/c/openstack/puppet-tripleo/+/845854/

Conjecture: the following needs to go through first

  https://review.opendev.org/c/openstack/tripleo-common/+/846287

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-common/+/847222

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-common/+/847222
Committed: https://opendev.org/openstack/tripleo-common/commit/0b6ae01e797f3f34c83d14fa03151b33ce6894bb
Submitter: "Zuul (22348)"
Branch: master

commit 0b6ae01e797f3f34c83d14fa03151b33ce6894bb
Author: Ronelle Landy <email address hidden>
Date: Wed Jun 22 17:16:13 2022 -0400

    Downgrade pacemaker, resource-agents - exact ver

    https://review.opendev.org/c/openstack/tripleo-common/+/846287
    downgrades pacemaker and resource-agents but does
    not specify the version. So the problem resurfaced when
    these rpms upgraded yet again.

    This patch specifies the downgrade version.

    Change-Id: If221c0c4cfe4b7a08568916400f4b50a72ab9e21
    Related-Bug: #1978997

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by "John Fulton <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/846474
Reason: fixed by https://review.opendev.org/c/openstack/tripleo-common/+/847222

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (stable/wallaby)

Change abandoned by "John Fulton <email address hidden>" on branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/846557
Reason: fixed by https://review.opendev.org/c/openstack/tripleo-common/+/847222

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/tripleo-common/+/847437

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/tripleo-common/+/847437
Committed: https://opendev.org/openstack/tripleo-common/commit/e8c7d086a48429c4eade1518955fb765a74d464f
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit e8c7d086a48429c4eade1518955fb765a74d464f
Author: Chandan Kumar (raukadah) <email address hidden>
Date: Fri Jun 17 14:51:49 2022 +0530

    Downgrade pacemaker and resource-agents

    The patch downgrades:
    pacemaker pacemaker-remote resource-agents
    in container builds to avoid
    errors at deployment step 5 with latest versions.

    This patch also specifies the downgrade version.

    Related-Bug: #1978997
    Signed-off-by: Chandan Kumar (raukadah) <email address hidden>
    Change-Id: Ie5288864cd6f346a5bb5b481b4aa5dbd1abb9f47

tags: added: in-stable-wallaby
Ronelle Landy (rlandy)
Changed in tripleo:
status: Triaged → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-common/+/850676

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-common/+/850676
Committed: https://opendev.org/openstack/tripleo-common/commit/d407c857a6e96683e9834136349d880d92f2f94d
Submitter: "Zuul (22348)"
Branch: master

commit d407c857a6e96683e9834136349d880d92f2f94d
Author: Takashi Kajinami <email address hidden>
Date: Fri Jul 22 02:11:46 2022 +0900

    Stop downgrading pacemaker

    This reverts the following two changes in a single commit.

    commit bf87555420bfd17d8b90f70ea9616476741b1819
    Downgrade pacemaker and resource-agents

    commit 0b6ae01e797f3f34c83d14fa03151b33ce6894bb
    Downgrade pacemaker, resource-agents - exact ver

    The bug[1] in pacemaker was already fixed and the new pacemaker package
    ( 2.1.4-2 ) with the fix was already released in CentOS Stream 9 repo.

    [1] https://bugzilla.redhat.com/show_bug.cgi?id=2099331

    Related-Bug: #1978997
    Change-Id: I480ff3878c2ed3ae8222fe3b8a47af790673bcc8

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/tripleo-common/+/865402

Revision history for this message
Sandeep Yadav (sandeepyadav93) wrote :

Reopening this bug until the wallaby cherry-pick merges.

Wallaby jobs are failing because the versions we are trying to downgrade to are no longer available in the repos.

https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_0a6/865397/1/check/tripleo-ci-centos-9-content-provider/0a6c693/logs/container-builds/5d4ead5b-f90f-45f5-b774-4366676dd58e/base/redis/redis-build.log

~~~
No package pacemaker-2.1.2-4.el9 available.
No package pacemaker-remote-2.1.2-4.el9 available.
No package resource-agents-4.10.0-17.el9 available.
Error: No packages marked for downgrade.
~~~

Changed in tripleo:
status: Fix Released → In Progress
milestone: zed-1 → antelope-1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/tripleo-common/+/865402
Committed: https://opendev.org/openstack/tripleo-common/commit/f7cd739c38611ae914a7bf3e48ce9f4368c89caf
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit f7cd739c38611ae914a7bf3e48ce9f4368c89caf
Author: Takashi Kajinami <email address hidden>
Date: Fri Jul 22 02:11:46 2022 +0900

    Stop downgrading pacemaker

    This reverts the following two changes in a single commit.

    commit bf87555420bfd17d8b90f70ea9616476741b1819
    Downgrade pacemaker and resource-agents

    commit 0b6ae01e797f3f34c83d14fa03151b33ce6894bb
    Downgrade pacemaker, resource-agents - exact ver

    The bug[1] in pacemaker was already fixed and the new pacemaker package
    ( 2.1.4-2 ) with the fix was already released in CentOS Stream 9 repo.

    [1] https://bugzilla.redhat.com/show_bug.cgi?id=2099331

    Related-Bug: #1978997
    Change-Id: I480ff3878c2ed3ae8222fe3b8a47af790673bcc8
    (cherry picked from commit d407c857a6e96683e9834136349d880d92f2f94d)

Revision history for this message
Sandeep Yadav (sandeepyadav93) wrote :
Changed in tripleo:
status: In Progress → Fix Released