tripleo

Podman error when stopping heat_engine container with systemd

Bug #1821241 reported by Emilien Macchi on 2019-03-21

Affects   Status         Importance   Assigned to      Milestone
tripleo   Fix Released   High         Emilien Macchi   tripleo stein-rc1

Bug Description

Originally reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=1691408

Problem: Redeploying an Undercloud fails when paunch tries to stop containers.

How to reproduce:
1) Deploy an Undercloud on RHEL8 with Stein.
2) Run: $ sudo systemctl stop tripleo_heat_engine
3) Look at the logs: $ sudo journalctl -u tripleo_heat_engine

You'll see the error:

Mar 21 18:08:03 undercloud.localdomain systemd[1]: Stopping heat_engine container...
Mar 21 18:08:38 undercloud.localdomain systemd[1]: heat_engine container is not active.
Mar 21 18:09:01 undercloud.localdomain podman[253956]: d6c494d9657c73e5fa9ac946136bf085ad84be8d17db26725ca54a8e8cec759f
Mar 21 18:09:01 undercloud.localdomain podman[26123]: time="2019-03-21T18:09:01Z" level=error msg="Error forwarding signal 15 to container d6c494d9657c73e5fa9ac946136bf085ad84be8d17db26725ca54a8e8cec759f: can only kill running containers: container state improper"

It always happens with the same container (heat_engine).

Emilien Macchi (emilienm) on 2019-03-21

Changed in tripleo:
milestone:	none → stein-rc1
importance:	Undecided → High
status:	New → Triaged
assignee:	nobody → Emilien Macchi (emilienm)

Emilien Macchi (emilienm) wrote on 2019-03-22:

I created a bug in libpod: https://github.com/containers/libpod/issues/2740

Bogdan Dobrelya (bogdando) on 2019-03-22

tags:	added: idempotency

OpenStack Infra (hudson-openstack) wrote on 2019-03-22: Fix proposed to paunch (master)

Fix proposed to branch: master
Review: https://review.openstack.org/645550

Changed in tripleo:
status:	Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote on 2019-03-22: Fix merged to paunch (master)

Reviewed: https://review.openstack.org/645550
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=110a9496b2c87a4d41d863aa918c2c4535b32e22
Submitter: Zuul
Branch: master

commit 110a9496b2c87a4d41d863aa918c2c4535b32e22
Author: Emilien Macchi <email address hidden>
Date: Fri Mar 22 08:14:48 2019 -0400

    systemd: switch KillMode to 'none'

    When running the services with KillMode=process, there is a race
    condition between ExecStop and the command specified in ExecStart.
    The ExecStop seems faster and the container is killed then cleaned up.
    However podman started by ExecStart is still running and systemd kills
    it as soon as the ExecStop finished.

    Since we rely on Podman to manage the containers & processes, let's
    switch to KillMode=none.

    Credits to Giuseppe Scrivano for explaining the root cause.

    Change-Id: Icbf2b81477902e3d7ff9e064bf2408c2fc7e510e
    Closes-Bug: #1821241
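
For readers unfamiliar with the setting, the change described in the commit message can be pictured with a unit file along the following lines. This is an illustrative sketch only, not the file paunch actually generates: the ExecStart and ExecStop command lines, the stop timeout, and the [Install] target are assumptions. Only the description "heat_engine container" (seen in the journal output above) and KillMode=none come from this report.

```ini
[Unit]
Description=heat_engine container

[Service]
# Assumed commands: one podman process stays attached to the container
# (ExecStart) and a second podman process stops it (ExecStop).
ExecStart=/usr/bin/podman start -a heat_engine
ExecStop=/usr/bin/podman stop -t 10 heat_engine
# With KillMode=process, systemd signals the main (ExecStart) process once
# ExecStop has finished, even though the container is already gone.
# With KillMode=none, systemd only runs ExecStop and kills nothing itself,
# leaving process cleanup to podman.
KillMode=none

[Install]
WantedBy=multi-user.target
```

On a deployed node, the value systemd is actually using can be checked with `systemctl show -p KillMode tripleo_heat_engine`.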