mistral engine sporadically stops processing
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Mistral | Incomplete | High | Unassigned | |
Bug Description
Hi!
I am running Mistral in a container-based setup with the multi-node configuration. After some time the Mistral engine stops processing messages and everything comes to a halt (no workflow or task is processed anymore). You can see the errors in the API log below (for a cron-triggered and an API-triggered run). If I redeploy the Mistral engine, everything resumes where it left off and works until the error reoccurs.
I have observed the sporadic error after weeks of uptime, but also as soon as 30 minutes after a redeploy. At the moment I don't know where to look next; I already did some online research but didn't find any hint, hence this bug report.
Mistral 8.0.0
Python 3.7.2
PostgresDB
RabbitMQ 3.7.12
Happy to provide any further information!
Steffn
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base [req-c140887e-
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base Traceback (most recent call last):
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base File "/usr/local/
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base return self._queues[
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base File "/usr/local/
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base return waiter.wait()
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base File "/usr/local/
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base return get_hub().switch()
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base File "/usr/local/
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base return self.greenlet.
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base _queue.Empty
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base During handling of the above exception, another exception occurred:
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base Traceback (most recent call last):
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base File "/opt/stack/
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base return method(*args, **kwargs)
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base File "/opt/stack/
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base params=params
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base File "/opt/stack/
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base **kwargs
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base File "/usr/local/
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base retry=self.retry)
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base File "/usr/local/
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base retry=retry)
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base File "/usr/local/
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base call_monitor_
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base File "/usr/local/
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base call_monitor_
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base File "/usr/local/
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base message = self.waiters.
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base File "/usr/local/
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base 'to message ID %s' % msg_id)
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base oslo_messaging.
2019-07-01 08:10:58.216 28 ERROR mistral.rpc.base
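The traceback above shows the oslo.messaging reply-waiter pattern: the RPC caller blocks on an internal per-call queue for the reply, the underlying `queue.Empty` fires when nothing arrives, and it is translated into a messaging timeout referencing the message ID. A minimal stand-alone sketch of that pattern (the names `ReplyTimeout` and `wait_for_reply` are illustrative, not oslo.messaging's actual internals):

```python
import queue

class ReplyTimeout(Exception):
    """Stand-in for the timeout error oslo.messaging raises (illustrative)."""

def wait_for_reply(replies, msg_id, timeout):
    """Block on the per-call reply queue for up to `timeout` seconds.

    Translates the low-level queue.Empty into a domain-specific timeout
    mentioning the message ID, as the traceback above shows happening
    inside oslo.messaging.
    """
    try:
        return replies.get(timeout=timeout)
    except queue.Empty:
        raise ReplyTimeout("no reply to message ID %s" % msg_id)
```

In the failure reported here, every RPC call ends up on this timeout path because the engine's RabbitMQ connection is gone, so replies can never arrive.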
2019-07-01 08:10:58.218 28 ERROR mistral.
(A second traceback was logged here at 08:10:58.218, but every one of its lines is truncated after "ERROR mistral." in the report.)
description: updated
summary: mistral engine stops processing → mistral engine sporadically stops processing
Changed in mistral:
milestone: none → train-1
Changed in mistral:
status: New → Incomplete
milestone: train-rc1 → none
In the meantime I ran into the issue again and was able to check whether the RabbitMQ connection was still there. I was inspired to look at that by the healthcheck script https://github.com/openstack/tripleo-common/blob/master/healthcheck/mistral-engine
In the error situation the RabbitMQ connection is indeed gone. As a workaround I applied the same healthcheck in my environment and redeploy the mistral-engine container if it fails.
It would still be interesting to figure out the root cause and avoid the problem altogether.
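The workaround can be sketched as a minimal reachability check (a sketch only: the tripleo-common script inspects the engine process's open sockets, which is stronger; here the host, the default AMQP port 5672, and the function name are assumptions):

```python
import socket

def rabbitmq_reachable(host="localhost", port=5672, timeout=3.0):
    """Return True if a TCP connection to the (assumed) RabbitMQ port succeeds.

    Note: this only verifies network reachability of the broker. It does not
    prove the engine still holds a live AMQP connection -- checking the
    engine process's established sockets, as the tripleo-common healthcheck
    does, catches the failure mode described in this bug more directly.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A container healthcheck would call this (or the socket-based variant) and exit non-zero on failure, so the orchestrator restarts the mistral-engine container automatically instead of requiring a manual redeploy.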