prlimit bound qemu-img call fails when running under wsgi configuration

Bug #1712463 reported by Brian Rosmaita
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Glance
In Progress
High
Matthew Treinish
oslo.concurrency
Fix Released
Undecided
Matthew Treinish

Bug Description

When running under uwsgi, the tasks that call qemu-image fail. See comment #1 for a stacktrace and comment #2 for details.

Ignore the business about waitall() in comment #1, that's unrelated and is being moved to a different bug, namely, https://bugs.launchpad.net/glance/+bug/1742813

Tags: uwsgi
Changed in glance:
milestone: none → queens-1
assignee: nobody → Brian Rosmaita (brian-rosmaita)
status: Triaged → In Progress
Revision history for this message
Brian Rosmaita (brian-rosmaita) wrote :

Had to put a waitall() on the greenpool when the task was being spawned. That gave me some useful logs and I can watch the task going to failure. Not sure yet exactly why without the waitall() the tasks are just sitting there.

This is definitely a problem though:

Aug 29 11:47:02 br-virtual-machine <email address hidden>[65490]: DEBUG oslo_concurrency.processutils [None req-81a7fc37-af25-499f-9190-70851c43eda5 admin admin] Running cmd (subprocess): /usr/local/bin/uwsgi -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 -- qemu-img info --output=json file:///tmp/glance_workdir/0d397d5e-0662-40c5-ba24-230d11f5788c {{(pid=65491) execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:355}}
Aug 29 11:47:02 br-virtual-machine <email address hidden>[65490]: DEBUG oslo_concurrency.processutils [None req-81a7fc37-af25-499f-9190-70851c43eda5 admin admin] CMD "/usr/local/bin/uwsgi -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 -- qemu-img info --output=json file:///tmp/glance_workdir/0d397d5e-0662-40c5-ba24-230d11f5788c" returned: 1 in 0.016s {{(pid=65491) execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:385}}
Aug 29 11:47:02 br-virtual-machine <email address hidden>[65490]: DEBUG oslo_concurrency.processutils [None req-81a7fc37-af25-499f-9190-70851c43eda5 admin admin] None
Aug 29 11:47:02 br-virtual-machine <email address hidden>[65490]: command: u'/usr/local/bin/uwsgi -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 -- qemu-img info --output=json file:///tmp/glance_workdir/0d397d5e-0662-40c5-ba24-230d11f5788c'
Aug 29 11:47:02 br-virtual-machine <email address hidden>[65490]: exit code: 1
Aug 29 11:47:02 br-virtual-machine <email address hidden>[65490]: stdout: u''
Aug 29 11:47:02 br-virtual-machine <email address hidden>[65490]: stderr: u'unable to load configuration from oslo_concurrency.prlimit\n' {{(pid=65491) execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:424}}
Aug 29 11:47:02 br-virtual-machine <email address hidden>[65490]: DEBUG oslo_concurrency.processutils [None req-81a7fc37-af25-499f-9190-70851c43eda5 admin admin] u'/usr/local/bin/uwsgi -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 -- qemu-img info --output=json file:///tmp/glance_workdir/0d397d5e-0662-40c5-ba24-230d11f5788c' failed. Not Retrying. {{(pid=65491) execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:433}}

The command being run in the subprocess seems kind of weird, i.e., /usr/local/bin/uwsgi

Revision history for this message
Brian Rosmaita (brian-rosmaita) wrote :

We're using oslo_concurrency.processutils to call qemu-img ; want to use the process limits facility that gives us because of a security issue (bug #1449062).

When you call processutils.execute() with a prlimit argument, it uses sys.executable to get the python interpreter to make the call with [0] and then passes -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 --' before the command.

In the screen-based devstack, sys.executable was /usr/bin/python and everything worked.
In the systemd-based devstack, sys.executable is /usr/local/bin/uwsgi, which doesn't understand the arguments and thinks it's being passed a config file.

From the command line:
$ /usr/local/bin/uwsgi -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 -- qemu-img info --output=json file:///tmp/glance_workdir/a4a3a99e-6ba3-4e7a-96d9-1b01f0bc097e
output:
unable to load configuration from oslo_concurrency.prlimit

$ /usr/bin/python -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 -- qemu-img info --output=json file:///tmp/glance_workdir/a4a3a99e-6ba3-4e7a-96d9-1b01f0bc097e
output:
{
"virtual-size": 1265664,
"filename": "///tmp/glance_workdir/a4a3a99e-6ba3-4e7a-96d9-1b01f0bc097e",
"format": "raw",
"actual-size": 1265664,
"dirty-flag": false
}

[0] https://github.com/openstack/oslo.concurrency/blob/aeaee20c8db756e66f59285520a4e5cacdb740f2/oslo_concurrency/processutils.py#L353

Revision history for this message
Matthew Treinish (treinish) wrote :

This definitely looks like a bug in the prlimit stuff to me. The assumption that sys.executable is always going to something that can run the prlimit module definitely isn't always the case. Besides the uwsgi case (which admittedly does some weird things) there is also the case where sys.executable is None.

There should be a more robust way to enforce rate limiting on a subprocess that doesn't involve calling a python module with python -m in a s subprocess. Off the top of my head something like multiprocessing.Process(target=prlimit.main, args=($cmd)) should work and not involve trying to divine the python executable path for every conceivable runtime to manually do that.

description: updated
Revision history for this message
Brian Rosmaita (brian-rosmaita) wrote :

It looks like there are two different bugs here:

(1) The taskflow executor is not working in the wsgi devstack.

(2) Tasks that use oslo_concurrency.processutils fail when glance is running as a wsgi application.

summary: - taskflow not running under wsgi configuration
+ failure when running under wsgi configuration
Revision history for this message
Brian Rosmaita (brian-rosmaita) wrote : Re: failure when running under wsgi configuration

Taskflow problem:

In the Tasks and Images controllers, when an API request comes in, a task is created and a spawn_n is called on the 'tasks_eventlet_pool'. In the pike devstack, these tasks remain stuck in pending. I added a waitall() right after the spawn_n() call, and the tasks run to completion. This happens whether the taskflow engine is being run in serial or parallel mode. It does not occur when glance is run in the screen-based devstack.

Revision history for this message
Brian Rosmaita (brian-rosmaita) wrote :

To be more specific about the preceding comment:

In the Tasks controller, when I put a pool.waitall() after this line, the import task is executed (it goes to 'failure', but that's because of the oslo_concurrency problem; the key thing is that it gets out of 'pending'):
http://git.openstack.org/cgit/openstack/glance/tree/glance/api/v2/tasks.py?h=stable/pike#n82

Changed in glance:
milestone: queens-1 → queens-2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.concurrency (master)

Fix proposed to branch: master
Review: https://review.openstack.org/531493

Changed in oslo.concurrency:
assignee: nobody → Matthew Treinish (treinish)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.concurrency (master)

Reviewed: https://review.openstack.org/531493
Committed: https://git.openstack.org/cgit/openstack/oslo.concurrency/commit/?id=55e06261aa86c87c7c059fbddc97cdbaae06e8dd
Submitter: Zuul
Branch: master

commit 55e06261aa86c87c7c059fbddc97cdbaae06e8dd
Author: Matthew Treinish <email address hidden>
Date: Fri Jan 5 15:11:17 2018 -0500

    Add python_exec kwarg to processutils.execute()

    This commit adds a new kwarg to the process_utils.execute() function to
    specify the python executable to use when launching python to check
    prlimits. This is necessary when processutils.execute() is called from
    inside an API server running with uwsgi. In this case sys.executable is
    uwsgi (because uwsgi links libpython.so and is actually the interpreter)
    This doesn't work with the execute() function because it assumes the
    cpython interpreter CLI is used for the arguments it uses to call the
    prlimits module. To workaround this and enable API servers that may run
    under uwsgi to use this those applications can simply pass in an
    executable to use.

    Longer term it might be better to migrate the prlimits usage to call
    multiprocessing instead of subprocessing python. But that would require
    a more significant rewrite of both processutils and prlimit to
    facilitate that.

    Change-Id: I0ae60f0b4cc3700c783f6018e837358f0e053a09
    Closes-Bug: #1712463

Changed in oslo.concurrency:
status: In Progress → Fix Released
Revision history for this message
Brian Rosmaita (brian-rosmaita) wrote : Re: failure when running under wsgi configuration

The Glance part of this bug is addressed by https://review.openstack.org/531498

Changed in glance:
assignee: Brian Rosmaita (brian-rosmaita) → Matthew Treinish (treinish)
milestone: queens-2 → queens-3
summary: - failure when running under wsgi configuration
+ prlimit bound qemu-img call fails when running under wsgi configuration
description: updated
description: updated
tags: added: uwsgi
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/oslo.concurrency 3.25.0

This issue was fixed in the openstack/oslo.concurrency 3.25.0 release.

Changed in glance:
milestone: queens-3 → queens-rc1
Changed in glance:
milestone: queens-rc1 → queens-rc2
Changed in glance:
milestone: queens-rc2 → rocky-1
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.