mistral leaks ssh processes

Bug #1821854 reported by Luca Miccini
This bug affects 1 person
Affects   Status      Importance  Assigned to  Milestone
Mistral   Invalid     Undecided   Unassigned
tripleo   Incomplete  High        Unassigned

Bug Description

Freshly deployed py3 undercloud (+ overcloud). On the undercloud I see:

42430 65974 0.8 0.5 911104 115236 ? Ss Mar26 9:11 \_ /usr/bin/python3 /usr/bin/mistral-server --config-file=/etc/mistral/mistral.conf --log-file=/var/log/mistral/api.log --server=api
42430 66057 0.2 0.5 922472 110752 ? S Mar26 2:12 \_ /usr/bin/python3 /usr/bin/mistral-server --config-file=/etc/mistral/mistral.conf --log-file=/var/log/mistral/api.log --server=api
root 66267 0.0 0.2 1541664 49540 ? Ssl Mar26 0:01 /usr/bin/podman start -a mistral_executor
42430 66404 0.3 0.8 988736 177372 ? Ss Mar26 3:45 \_ /usr/bin/python3 /usr/bin/mistral-server --config-file=/etc/mistral/mistral.conf --log-file=/var/log/mistral/executor.log --server=executor
42430 146163 0.0 0.0 0 0 ? Zs Mar26 0:00 \_ [ssh] <defunct>
42430 146164 0.0 0.0 0 0 ? Zs Mar26 0:00 \_ [ssh] <defunct>
42430 146166 0.0 0.0 0 0 ? Zs Mar26 0:00 \_ [ssh] <defunct>
42430 146167 0.0 0.0 0 0 ? Zs Mar26 0:00 \_ [ssh] <defunct>
42430 146169 0.0 0.0 0 0 ? Zs Mar26 0:00 \_ [ssh] <defunct>
42430 146170 0.0 0.0 0 0 ? Zs Mar26 0:00 \_ [ssh] <defunct>
42430 146172 0.0 0.0 0 0 ? Zs Mar26 0:00 \_ [ssh] <defunct>
42430 146173 0.0 0.0 0 0 ? Zs Mar26 0:00 \_ [ssh] <defunct>
42430 147080 0.0 0.0 0 0 ? Zs Mar26 0:00 \_ [ssh] <defunct>
42430 147081 0.0 0.0 0 0 ? Zs Mar26 0:00 \_ [ssh] <defunct>
42430 147191 0.0 0.0 0 0 ? Zs Mar26 0:00 \_ [ssh] <defunct>
42430 147192 0.0 0.0 0 0 ? Zs Mar26 0:00 \_ [ssh] <defunct>
42430 147194 0.0 0.0 0 0 ? Zs Mar26 0:00 \_ [ssh] <defunct>
42430 147195 0.0 0.0 0 0 ? Zs Mar26 0:00 \_ [ssh] <defunct>
42430 147197 0.0 0.0 0 0 ? Zs Mar26 0:00 \_ [ssh] <defunct>
42430 147198 0.0 0.0 0 0 ? Zs Mar26 0:00 \_ [ssh] <defunct>
42430 147200 0.0 0.0 0 0 ? Zs Mar26 0:00 \_ [ssh] <defunct>
42430 147201 0.0 0.0 0 0 ? Zs Mar26 0:00 \_ [ssh] <defunct>

Bogdan Dobrelya (bogdando) wrote :
Changed in tripleo:
status: New → Triaged
importance: Undecided → High
milestone: none → stein-rc1
Bogdan Dobrelya (bogdando) wrote :

mistral-server should be doing a better job of reaping those ssh processes started by ansible invoked via workflows.

Changed in mistral:
status: New → Confirmed
Dougal Matthews (d0ugal) wrote :

This isn't a Mistral bug. Mistral just runs the Ansible action that is in tripleo-common. AFAIK, Mistral has no good way to know that we are shelling out to ansible.

Therefore, I believe we need to handle this in the tripleo-common code.

Changed in mistral:
status: Confirmed → Invalid
Adriano Petrich (apetrich) wrote :

There are two code paths: one uses oslo_concurrency.execute and the other uses subprocess.Popen, depending on whether queue_name was passed to the action. To better understand where the issue is, I'd like to know which one I should dive into.

Unfortunately, from the output above I'm unable to tell which action was being called that might be leaking defunct ssh processes.

Dougal Matthews (d0ugal) wrote :

I believe these zombie processes are left by ansible, not tripleo. Since we are running in a container without init they are not reaped for us automatically.

I found this bug/comment which aligns with that theory.

https://github.com/ansible/ansible/issues/49270#issuecomment-462306244
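
To illustrate the mechanism only: a process stays in the defunct (Z) state until its parent, or an init process that has adopted it, collects its exit status with wait(). Below is a minimal Python sketch of such a reaping loop. The name reap_children is hypothetical and is not taken from Mistral, tripleo-common or Ansible; this is not the change that was made for this bug.

```python
import os
import subprocess
import sys
import time


def reap_children():
    """Collect the exit status of every already-exited child without blocking.

    Returns the list of reaped PIDs. Hypothetical helper, for illustration only.
    """
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


if __name__ == "__main__":
    # Start a short-lived child and deliberately never call child.wait().
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    time.sleep(1)  # the child has normally exited by now and shows up as <defunct> in ps
    print("reaped:", reap_children())
```

A loop like this is roughly what a minimal init does for the orphans it adopts. Calling os.waitpid(-1, ...) inside an application that also uses subprocess can take exit statuses away from its Popen objects, which is one reason to leave reaping to a dedicated PID 1 in the container.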

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (master)

Reviewed: https://review.openstack.org/648674
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=8ec33d15a576f145e21189a77310be82a1080cbc
Submitter: Zuul
Branch: master

commit 8ec33d15a576f145e21189a77310be82a1080cbc
Author: Emilien Macchi <email address hidden>
Date: Fri Mar 29 12:37:20 2019 +0000

    Revert "Stop dumb-init usage"

    Context: https://bugzilla.redhat.com/show_bug.cgi?id=1693752
    We agreed that we need dumb-init upstream and downstream,
    one of the reasons is to handle zombie processes in the
    containers.

    e.g. ssh connections in mistral_executor container,
    created by Ansible playbook runs.

    This reverts commit 89cbba272daca16d87429db6400848b911dbfe2f.

    Related-Bug: 1821854
    Change-Id: I93f97f76b598da268a77ff356fb1bffddfddef5f

Changed in tripleo:
milestone: stein-rc1 → train-1
Changed in tripleo:
milestone: train-1 → train-2
Changed in tripleo:
milestone: train-2 → train-3
Changed in tripleo:
milestone: train-3 → ussuri-1
Changed in tripleo:
milestone: ussuri-1 → ussuri-2
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-2 → ussuri-3
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-3 → ussuri-rc3
wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Incomplete
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-rc3 → victoria-1
Changed in tripleo:
milestone: victoria-1 → victoria-3