There are two edge cases in 12.2.11 where a worker thread's suicide_grace value gets dropped:
[0] In the Threadpool context, ThreadPool::worker() drops suicide_grace while waiting on an empty work queue.
[1] In the ShardedThreadpool context, OSD::ShardedOpWQ::_process() drops suicide_grace while opportunistically waiting for more work (to prevent additional lock contention).
The Threadpool context always re-assigns suicide_grace before driving any work. The ShardedThreadpool context does not follow this pattern. After delaying to find additional work, the default sharded work queue timeouts are not re-applied.
This oversight exists from Luminous onwards. Mimic and Nautilus have each reworked the ShardedOpWQ code path, but neither addressed the problem.
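
The difference between the two contexts can be illustrated with a minimal sketch. This is a simplified model, not the Ceph source: HeartbeatHandle, QueueTimeouts, reset_timeout(), wait_for_work() and the numeric timeout values are all hypothetical stand-ins chosen for illustration. The model assumes that a suicide_grace of 0 means no suicide timeout is armed.

```cpp
#include <ctime>

// Hypothetical, simplified stand-in for a worker thread's heartbeat handle.
struct HeartbeatHandle {
  std::time_t grace = 0;          // 0: no health timeout armed (assumption)
  std::time_t suicide_grace = 0;  // 0: no suicide timeout armed (assumption)
};

// Hypothetical per-queue timeouts; the values are illustrative only.
struct QueueTimeouts {
  std::time_t timeout_interval = 15;
  std::time_t suicide_interval = 150;
};

// Arms both timeouts on the handle.
inline void reset_timeout(HeartbeatHandle& hb, std::time_t grace,
                          std::time_t suicide_grace) {
  hb.grace = grace;
  hb.suicide_grace = suicide_grace;
}

// Models an idle wait: the worker keeps a health timeout but drops
// suicide_grace so that sleeping on an empty queue is not fatal.
inline void wait_for_work(HeartbeatHandle& hb, std::time_t default_timeout) {
  reset_timeout(hb, default_timeout, 0);
}

// Threadpool-style pattern: the queue timeouts are re-applied after the
// wait and before any work is driven.
inline std::time_t threadpool_style(HeartbeatHandle& hb,
                                    const QueueTimeouts& wq) {
  wait_for_work(hb, 60);
  reset_timeout(hb, wq.timeout_interval, wq.suicide_interval);
  return hb.suicide_grace;  // value in effect while the item is processed
}

// Sharded-style pattern as described in this report: the timeouts are set
// once, dropped during the opportunistic wait, and never re-applied.
inline std::time_t sharded_style(HeartbeatHandle& hb,
                                 const QueueTimeouts& wq) {
  reset_timeout(hb, wq.timeout_interval, wq.suicide_interval);
  wait_for_work(hb, 60);
  // Missing step: reset_timeout(hb, wq.timeout_interval, wq.suicide_interval);
  return hb.suicide_grace;  // value in effect while the item is processed
}
```

In this model the first function processes work with the queue's suicide_interval in effect, while the second processes work with suicide_grace still at 0. Under the stated assumptions, re-applying the timeouts after the wait would bring the sharded pattern in line with the Threadpool one.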
[0] https://github.com/ceph/ceph/blob/v12.2.11/src/common/WorkQueue.cc#L137
[1] https://github.com/ceph/ceph/blob/v12.2.11/src/osd/OSD.cc#L10476