Heartbeat not sent for listen connections waiting for a message

Bug #2035113 reported by Arnaud Morin
This bug affects 1 person
Affects: oslo.messaging
Status: In Progress
Importance: Undecided
Assigned to: Arnaud Morin

Bug Description

When heartbeat_in_pthreads = True is set, the threading and queue libraries are no longer eventlet monkey-patched for heartbeats (see [1]).

Because of this, while waiting for a message, the call to queue.get(block=True) (see [2]) blocks the thread completely and prevents it from sending heartbeats.
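To illustrate the blocking pattern, here is a minimal sketch using only the standard-library queue module. It is not the oslo.messaging code and not necessarily the proposed fix; wait_for_message, heartbeat_interval and send_heartbeat are hypothetical names chosen for the example. Instead of one long blocking get(), the wait is split into short slices so that a heartbeat can be sent between them.

```python
import queue
import time


def wait_for_message(q, timeout, heartbeat_interval, send_heartbeat):
    """Wait up to `timeout` seconds for an item on `q`.

    Hypothetical helper: rather than a single q.get(block=True) that
    would block the thread for the whole timeout, block in slices of at
    most `heartbeat_interval` seconds and call `send_heartbeat()` after
    each slice that ends without a message.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Overall timeout reached without receiving anything.
            raise queue.Empty
        try:
            # Block only for a short slice, never past the deadline.
            return q.get(block=True, timeout=min(heartbeat_interval, remaining))
        except queue.Empty:
            # Nothing arrived during this slice: keep the connection alive.
            send_heartbeat()
```

With a plain q.get(block=True, timeout=300), nothing else would run in that thread for up to 300 seconds, which is the situation described above.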

After a while, RabbitMQ may drop the connection because of the missing heartbeats.

Note that this also depends on the rpc_response_timeout value, which happens to be 60 seconds by default (the same as the heartbeat timeout), so the bug is not triggered with default values. However, if you increase rpc_response_timeout to 300 seconds and stop nova-conductor, you will see some nova-compute RPC connections being killed by the RabbitMQ servers because of missed heartbeats.

[1] https://github.com/openstack/oslo.messaging/blob/7705b4f3023e0e63f3b37e9a25c774f309fec55e/oslo_messaging/_drivers/impl_rabbit.py#L630-L657
[2] https://github.com/openstack/oslo.messaging/blob/7705b4f3023e0e63f3b37e9a25c774f309fec55e/oslo_messaging/_drivers/amqpdriver.py#L441

Changed in oslo.messaging:
assignee: nobody → Arnaud Morin (arnaud-morin)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (master)
Changed in oslo.messaging:
status: New → In Progress