[NOTE]
* Patches will be sent to the kernel-team mailing list
once the test kernel has been verified by the reporter.
[Impact]
* Users may experience CPU hard lockups when performing
intensive writes to NVMe drives.
* The fix addresses a scheduling issue in the original
implementation of wbt/writeback throttling (a minimal
sketch of the wait/wake pattern follows this list).
* The fix is commit 2887e41b910b ("blk-wbt: Avoid lock
contention and thundering herd issue in wbt_wait"),
plus its fix commit 38cfb5a45ee0 ("blk-wbt: improve
waking of tasks").
* There are additional commits to help with a cleaner
backport and future maintenance. Of the 13 commits,
9 are clean cherry-picks and 4 are backports whose only
changes are to context lines (i.e. a refresh), with no
functional changes in the backports themselves.
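* For illustration only: a minimal userspace sketch of the
thundering-herd pattern, assuming a plain slot counter in
place of wbt's inflight accounting. The names writer,
completion and free_slots are invented for this sketch and
do not come from the kernel code. When a completion wakes
all throttled writers, they all contend for the lock even
though only one can claim the freed slot; waking a single
waiter, which is the spirit of the fix, avoids that:

  #include <pthread.h>
  #include <stdio.h>

  #define WRITERS 8

  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
  static int free_slots; /* stand-in for wbt's inflight budget */

  /* A throttled writer: sleep until a slot frees up, then claim it. */
  static void *writer(void *arg)
  {
      pthread_mutex_lock(&lock);
      while (free_slots == 0)
          pthread_cond_wait(&cond, &lock); /* rough wbt_wait() analogue */
      free_slots--; /* claim the freed slot */
      pthread_mutex_unlock(&lock);
      printf("writer %ld got a slot\n", (long)arg);
      return NULL;
  }

  /* An I/O completion frees one slot and wakes waiter(s). */
  static void completion(int wake_all)
  {
      pthread_mutex_lock(&lock);
      free_slots++; /* rough __wbt_done() analogue */
      pthread_mutex_unlock(&lock);
      if (wake_all)
          pthread_cond_broadcast(&cond); /* herd: all wake, one wins */
      else
          pthread_cond_signal(&cond); /* wake a single waiter instead */
  }

  int main(void)
  {
      pthread_t t[WRITERS];
      long i;

      for (i = 0; i < WRITERS; i++)
          pthread_create(&t[i], NULL, writer, (void *)i);
      for (i = 0; i < WRITERS; i++)
          completion(0); /* one targeted wakeup per freed slot */
      for (i = 0; i < WRITERS; i++)
          pthread_join(t[i], NULL);
      return 0;
  }

Build with 'gcc -pthread'. As written it issues one wakeup
per freed slot; changing completion(0) to completion(1)
demonstrates the broadcast (herd) behavior instead.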
[Test Case]
* This command has been reported to reproduce the problem:
$ sudo iozone -R -s 5G -r 1m -S 2048 -i 0 -G -c -o -l 128 -u 128 -t 128
* It generates stack traces as included below. (A quick
check that wbt is actually active on the device under test
is sketched right below.)
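* Before running the reproducer it can be useful to confirm
that writeback throttling is enabled on the queue under
test, via the wbt_lat_usec sysfs attribute (a value of 0
means wbt is disabled). A minimal sketch, assuming the
device under test is nvme0n1 (substitute the actual device):

  #include <stdio.h>

  int main(void)
  {
      /* nvme0n1 is an assumption; use the device iozone writes to */
      const char *path = "/sys/block/nvme0n1/queue/wbt_lat_usec";
      char buf[32];
      FILE *f = fopen(path, "r");

      if (!f) {
          perror(path); /* missing => no wbt on this kernel/device */
          return 1;
      }
      if (fgets(buf, sizeof(buf), f))
          printf("wbt_lat_usec = %s", buf); /* non-zero: wbt active */
      fclose(f);
      return 0;
  }

Reading the same sysfs file with cat works just as well.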
[Regression Potential]
* The commits have been checked against later fixes in
linux-next as of 2019-01-08, and all known fix commits
are included.
* The regression potential is mostly contained in the
writeback throttling code (block/blk-wbt.*), as almost all
of the 13 patches change only that; the 4 exceptions (2 of
which touch sysfs code) are:
- blk-rq-qos: refactor out common elements of blk-wbt (block/)
- block: Protect less code with sysfs_lock in blk_{un,}register_queue() (blk-sysfs.c)
- block: Protect less code with sysfs_lock in blk_{un,}register_queue() (blk-{mq-}sysfs.c)
- block: pass struct request instead of struct blk_issue_stat to wbt (block/, still mostly blk-wbt.*)
[Other Info]
* Alternatively, it would probably be possible to introduce
just the two fix commits, with some changes to their code
in the backport; but since the 'blk-rq-qos: refactor ..'
commit may become a dependency for additional/future fixes,
it seemed worthwhile to pull it into the 18.04 branch early.
* The problem was introduced with the blk-wbt mechanism in
v4.10-rc1, and the fix commits landed in v4.19-rc1 and
-rc2, so only Bionic and Cosmic need this.
[Stack Traces]
[ 393.628647] NMI watchdog: Watchdog detected hard LOCKUP on cpu 30
...
[ 393.628704] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu
...
[ 393.628720] Call Trace:
[ 393.628721] <IRQ>
[ 393.628724] enqueue_task_fair+0x6c/0x7f0
[ 393.628726] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150
[ 393.628728] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150
[ 393.628731] activate_task+0x57/0xc0
[ 393.628735] ? sched_clock+0x9/0x10
[ 393.628736] ? sched_clock+0x9/0x10
[ 393.628738] ttwu_do_activate+0x49/0x90
[ 393.628739] try_to_wake_up+0x1df/0x490
[ 393.628741] default_wake_function+0x12/0x20
[ 393.628743] autoremove_wake_function+0x12/0x40
[ 393.628744] __wake_up_common+0x73/0x130
[ 393.628745] __wake_up_common_lock+0x80/0xc0
[ 393.628746] __wake_up+0x13/0x20
[ 393.628749] __wbt_done.part.21+0xa4/0xb0
[ 393.628749] wbt_done+0x72/0xa0
[ 393.628753] blk_mq_free_request+0xca/0x1a0
[ 393.628755] blk_mq_end_request+0x48/0x90
[ 393.628760] nvme_complete_rq+0x23/0x120 [nvme_core]
[ 393.628763] nvme_pci_complete_rq+0x7a/0x130 [nvme]
[ 393.628764] __blk_mq_complete_request+0xd2/0x140
[ 393.628766] blk_mq_complete_request+0x18/0x20
[ 393.628767] nvme_process_cq+0xe1/0x1b0 [nvme]
[ 393.628768] nvme_irq+0x23/0x50 [nvme]
[ 393.628772] __handle_irq_event_percpu+0x44/0x1a0
[ 393.628773] handle_irq_event_percpu+0x32/0x80
[ 393.628774] handle_irq_event+0x3b/0x60
[ 393.628778] handle_edge_irq+0x7c/0x190
[ 393.628779] handle_irq+0x20/0x30
[ 393.628783] do_IRQ+0x46/0xd0
[ 393.628784] common_interrupt+0x84/0x84
[ 393.628785] </IRQ>
...
[ 393.628794] ? cpuidle_enter_state+0x97/0x2f0
[ 393.628796] cpuidle_enter+0x17/0x20
[ 393.628797] call_cpuidle+0x23/0x40
[ 393.628798] do_idle+0x18c/0x1f0
[ 393.628799] cpu_startup_entry+0x73/0x80
[ 393.628802] start_secondary+0x1a6/0x200
[ 393.628804] secondary_startup_64+0xa5/0xb0
[ 393.628805] Code: ...
[ 405.981597] nvme nvme1: I/O 393 QID 6 timeout, completion polled
[ 435.597209] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 435.602858] 30-...0: (1 GPs behind) idle=e26/1/0 softirq=6834/6834 fqs=4485
[ 435.610203] (detected by 8, t=15005 jiffies, g=6396, c=6395, q=146818)
[ 435.617025] Sending NMI from CPU 8 to CPUs 30:
[ 435.617029] NMI backtrace for cpu 30
[ 435.617031] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu
...
[ 435.617047] Call Trace:
[ 435.617048] <IRQ>
[ 435.617051] enqueue_entity+0x9f/0x6b0
[ 435.617053] enqueue_task_fair+0x6c/0x7f0
[ 435.617056] activate_task+0x57/0xc0
[ 435.617059] ? sched_clock+0x9/0x10
[ 435.617060] ? sched_clock+0x9/0x10
[ 435.617061] ttwu_do_activate+0x49/0x90
[ 435.617063] try_to_wake_up+0x1df/0x490
[ 435.617065] default_wake_function+0x12/0x20
[ 435.617067] autoremove_wake_function+0x12/0x40
[ 435.617068] __wake_up_common+0x73/0x130
[ 435.617069] __wake_up_common_lock+0x80/0xc0
[ 435.617070] __wake_up+0x13/0x20
[ 435.617073] __wbt_done.part.21+0xa4/0xb0
[ 435.617074] wbt_done+0x72/0xa0
[ 435.617077] blk_mq_free_request+0xca/0x1a0
[ 435.617079] blk_mq_end_request+0x48/0x90
[ 435.617084] nvme_complete_rq+0x23/0x120 [nvme_core]
[ 435.617087] nvme_pci_complete_rq+0x7a/0x130 [nvme]
[ 435.617088] __blk_mq_complete_request+0xd2/0x140
[ 435.617090] blk_mq_complete_request+0x18/0x20
[ 435.617091] nvme_process_cq+0xe1/0x1b0 [nvme]
[ 435.617093] nvme_irq+0x23/0x50 [nvme]
[ 435.617096] __handle_irq_event_percpu+0x44/0x1a0
[ 435.617097] handle_irq_event_percpu+0x32/0x80
[ 435.617098] handle_irq_event+0x3b/0x60
[ 435.617101] handle_edge_irq+0x7c/0x190
[ 435.617102] handle_irq+0x20/0x30
[ 435.617106] do_IRQ+0x46/0xd0
[ 435.617107] common_interrupt+0x84/0x84
[ 435.617108] </IRQ>
...
[ 435.617117] ? cpuidle_enter_state+0x97/0x2f0
[ 435.617118] cpuidle_enter+0x17/0x20
[ 435.617119] call_cpuidle+0x23/0x40
[ 435.617121] do_idle+0x18c/0x1f0
[ 435.617122] cpu_startup_entry+0x73/0x80
[ 435.617125] start_secondary+0x1a6/0x200
[ 435.617127] secondary_startup_64+0xa5/0xb0
[ 435.617128] Code: ...