Comment 2 for bug 1840348

Dan Hill (hillpd) wrote:

There are two edge cases in 12.2.11 where a worker thread's suicide_grace value is dropped:
[0] In the ThreadPool context, ThreadPool::worker() drops suicide_grace while waiting on an empty work queue.
[1] In the ShardedThreadPool context, OSD::ShardedOpWQ::_process() drops suicide_grace while opportunistically waiting for more work (to avoid additional lock contention).

The ThreadPool context always re-assigns suicide_grace before processing any work. The ShardedThreadPool context does not follow this pattern: after waiting for additional work, it does not re-apply the default sharded work queue timeouts.
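The asymmetry can be illustrated with a small C++ sketch. This is a simplified, hypothetical model, not Ceph code: `HeartbeatHandle`, `reset_timeout()` (loosely modeled on `HeartbeatMap::reset_timeout()`), the two step functions, and the timeout values are all assumptions chosen for illustration. It only tracks the configured intervals, not real deadlines.

```cpp
#include <cassert>

// Hypothetical, simplified model of a worker thread's heartbeat handle.
// Ceph's real HeartbeatMap tracks absolute deadlines; this keeps only the
// configured intervals (in seconds) so the control flow is easy to follow.
struct HeartbeatHandle {
  int grace = 0;          // seconds without progress before a warning
  int suicide_grace = 0;  // seconds before the thread aborts; 0 disables it
};

// Hypothetical stand-in for HeartbeatMap::reset_timeout(h, grace, suicide_grace).
void reset_timeout(HeartbeatHandle& h, int grace, int suicide_grace) {
  assert(grace >= 0 && suicide_grace >= 0);
  h.grace = grace;
  h.suicide_grace = suicide_grace;
}

// Assumed example values, not Ceph's configured defaults.
constexpr int kIdleGrace = 60;
constexpr int kWqGrace = 15;
constexpr int kWqSuicideGrace = 150;

// Pattern [0]: one iteration of a ThreadPool-style worker loop.
HeartbeatHandle threadpool_worker_step(HeartbeatHandle h, bool queue_empty) {
  if (queue_empty) {
    reset_timeout(h, kIdleGrace, 0);  // idle: suicide disabled while waiting
    return h;                         // caller loops back and waits
  }
  reset_timeout(h, kWqGrace, kWqSuicideGrace);  // re-applied before any work
  // ... process the item ...
  return h;
}

// Pattern [1]: a ShardedOpWQ::_process()-style step that waits
// opportunistically for more work before processing an item.
HeartbeatHandle sharded_process_step(HeartbeatHandle h, bool reapply_after_wait) {
  reset_timeout(h, kIdleGrace, 0);  // opportunistic wait: suicide disabled
  // ... wait, then dequeue an item ...
  if (reapply_after_wait) {
    reset_timeout(h, kWqGrace, kWqSuicideGrace);  // the step that is missing
  }
  // ... process the item; a hang here can only trigger suicide if
  // suicide_grace is non-zero ...
  return h;
}
```

With `reapply_after_wait` set to `false`, `sharded_process_step()` returns a handle whose `suicide_grace` is 0, so a thread that then hangs while processing the item would never hit the suicide timeout. Passing `true` restores the work queue values, mirroring what the ThreadPool loop does before every item.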

This oversight exists from Luminous onwards. Mimic and Nautilus have each reworked the ShardedOpWQ code path but did not address the problem.

[0] https://github.com/ceph/ceph/blob/v12.2.11/src/common/WorkQueue.cc#L137
[1] https://github.com/ceph/ceph/blob/v12.2.11/src/osd/OSD.cc#L10476