cfs_bandwidth01 in sched from ubuntu_ltp_stable failed on B-4.15

Bug #1931325 reported by Po-Hsu Lin
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
ubuntu-kernel-tests
Confirmed
Undecided
Unassigned
linux (Ubuntu)
Fix Released
Undecided
Unassigned
Bionic
Confirmed
Undecided
Unassigned

Bug Description

[Impact]
Test case cfs_bandwidth01 in LTP sched test suite is a reproducer
of a CFS unthrottle_cfs_rq() issue (fe61468b2cbc2b sched/fair: Fix
enqueue_task_fair warning).

This test triggers a warning on our 4.15 kernel:
 LTP: starting cfs_bandwidth01 (cfs_bandwidth01 -i 5)
 ------------[ cut here ]------------
 rq->tmp_alone_branch != &rq->leaf_cfs_rq_list
 WARNING: CPU: 0 PID: 0 at /build/linux-fYK9kF/linux-4.15.0/kernel/sched/fair.c:393 unthrottle_cfs_rq+0x16f/0x200
 Modules linked in: input_leds joydev serio_raw mac_hid qemu_fw_cfg kvm_intel kvm irqbypass sch_fq_codel binfmt_misc ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm psmouse virtio_blk pata_acpi floppy virtio_net i2c_piix4
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-144-generic #148-Ubuntu
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
 RIP: 0010:unthrottle_cfs_rq+0x16f/0x200
 RSP: 0018:ffff989ebfc03e80 EFLAGS: 00010082
 RAX: 0000000000000000 RBX: ffff989eb4c6ac00 RCX: 0000000000000000
 RDX: 0000000000000005 RSI: ffffffffacb63c4d RDI: 0000000000000046
 RBP: ffff989ebfc03ea8 R08: 000000af39e61b33 R09: ffffffffacb63c20
 R10: 0000000000000000 R11: 0000000000000001 R12: ffff989eb57fe400
 R13: ffff989ebfc21900 R14: 0000000000000001 R15: 0000000000000001
 FS: 0000000000000000(0000) GS:ffff989ebfc00000(0000) knlGS:0000000000000000
 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000055593258d618 CR3: 000000007a044000 CR4: 00000000000006f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
 <IRQ>
 distribute_cfs_runtime+0xc3/0x110
 sched_cfs_period_timer+0xff/0x220
 ? sched_cfs_slack_timer+0xd0/0xd0
 __hrtimer_run_queues+0xdf/0x230
 hrtimer_interrupt+0xa0/0x1d0
 smp_apic_timer_interrupt+0x6f/0x140
 apic_timer_interrupt+0x90/0xa0
 </IRQ>
 RIP: 0010:native_safe_halt+0x12/0x20
 RSP: 0018:ffffffffac603e28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff11
 RAX: ffffffffabbc9280 RBX: 0000000000000000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
 RBP: ffffffffac603e28 R08: 000000af39850067 R09: ffff989e73749d00
 R10: 0000000000000000 R11: 7fffffffffffffff R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 ? __sched_text_end+0x1/0x1
 default_idle+0x20/0x100
 arch_cpu_idle+0x15/0x20
 default_idle_call+0x23/0x30
 do_idle+0x172/0x1f0
 cpu_startup_entry+0x73/0x80
 rest_init+0xae/0xb0
 start_kernel+0x4dc/0x500
 x86_64_start_reservations+0x24/0x26
 x86_64_start_kernel+0x74/0x77
 secondary_startup_64+0xa5/0xb0
 Code: 50 09 00 00 49 39 85 60 09 00 00 74 68 80 3d 3a 6e 54 01 00 75 5f 31 db 48 c7 c7 c0 3d 2d ac c6 05 28 6e 54 01 01 e8 11 36 fc ff <0f> 0b 48 85 db 74 43 49 8b 85 78 09 00 00 49 39 85 70 09 00 00
 ---[ end trace b6b9a70bc2945c0c ]---

[Fix]
Base on the test case description, we will need these fixes:
  * fe61468b2cbc2b sched/fair: Fix enqueue_task_fair warning
  * b34cb07dde7c23 sched/fair: Fix enqueue_task_fair() warning some more
  * 39f23ce07b9355 sched/fair: Fix unthrottle_cfs_rq() for leaf_cfs_rq list
  * 6d4d22468dae3d sched/fair: Reorder enqueue/dequeue_task_fair path
  * 5ab297bab98431 sched/fair: Fix reordering of enqueue/dequeue_task_fair()

Backport needed for Bionic since we're missing some new variables /
coding style changes introduced in the following commits (and their
corresponding fixes):
  * 97fb7a0a8944bd sched: Clean up and harmonize the coding style of the scheduler code base
  * 9f68395333ad7f sched/pelt: Add a new runnable average signal
  * 6212437f0f6043 sched/fair: Fix runnable_avg for throttled cfs
  * 43e9f7f231e40e sched/fair: Start tracking SCHED_IDLE tasks count in cfs_rq

I have also searched in the upstream tree to see if there is any other
commit claim to be a fix of these but didn't see any.

[Test]
Test kernel can be found here:
https://people.canonical.com/~phlin/kernel/lp-1931325-cfs_bandwidth01/

With these patches applied, the test can pass without triggering this
warning.

<<<test_start>>>
tag=cfs_bandwidth01 stime=1624260713
cmdline="cfs_bandwidth01 -i 5"
contacts=""
analysis=exit
<<<test_output>>>
incrementing stop
tst_test.c:1313: TINFO: Timeout per run is 0h 05m 00s
tst_buffers.c:55: TINFO: Test is using guarded buffers
cfs_bandwidth01.c:49: TINFO: Set 'worker1/cpu.max' = '3000 10000'
cfs_bandwidth01.c:49: TINFO: Set 'worker2/cpu.max' = '2000 10000'
cfs_bandwidth01.c:49: TINFO: Set 'worker3/cpu.max' = '3000 10000'
cfs_bandwidth01.c:113: TPASS: Scheduled bandwidth constrained workers
cfs_bandwidth01.c:49: TINFO: Set 'level2/cpu.max' = '5000 10000'
cfs_bandwidth01.c:125: TPASS: Workers exited
cfs_bandwidth01.c:113: TPASS: Scheduled bandwidth constrained workers
cfs_bandwidth01.c:49: TINFO: Set 'level2/cpu.max' = '5000 10000'
cfs_bandwidth01.c:125: TPASS: Workers exited
cfs_bandwidth01.c:113: TPASS: Scheduled bandwidth constrained workers
cfs_bandwidth01.c:49: TINFO: Set 'level2/cpu.max' = '5000 10000'
cfs_bandwidth01.c:125: TPASS: Workers exited
cfs_bandwidth01.c:113: TPASS: Scheduled bandwidth constrained workers
cfs_bandwidth01.c:49: TINFO: Set 'level2/cpu.max' = '5000 10000'
cfs_bandwidth01.c:125: TPASS: Workers exited
cfs_bandwidth01.c:113: TPASS: Scheduled bandwidth constrained workers
cfs_bandwidth01.c:49: TINFO: Set 'level2/cpu.max' = '5000 10000'
cfs_bandwidth01.c:125: TPASS: Workers exited

Summary:
passed 10
failed 0
broken 0
skipped 0
warnings 0

I have also run the whole sched test suite in LTP to make sure there
is no other issues caused by this patchset.

[Where problems could occur]
* CFS (Completely Fair Scheduler) is the process scheduling system in
the kernel, if the patch is incorrect it might affect the sched
functionality. Especially system with CONFIG_FAIR_GROUP_SCHED and
CONFIG_CFS_BANDWIDTH enabled.

[Other Info]
Test case description:
 * Creates a multi-level CGroup hierarchy with the cpu controller
 * enabled. The leaf groups are populated with "busy" processes which
 * simulate intermittent cpu load. They spin for some time then sleep
 * then repeat.
 *
 * Both the trunk and leaf groups are set cpu bandwidth limits. The
 * busy processes will intermittently exceed these limits. Causing
 * them to be throttled. When they begin sleeping this will then cause
 * them to be unthrottle.

[Original Bug Report]
Issue found on Azure 4.15.0-1116.129 Bionic

This is a new test case added 8 days ago. So this is not a regression:
https://github.com/linux-test-project/ltp/commit/d7b2d7cea71feaea1f27cae185abfe39f238d649

Test log:

 cfs_bandwidth01.c:49: TINFO: Set 'worker1/cpu.max' = '3000 10000'
 cfs_bandwidth01.c:49: TINFO: Set 'worker2/cpu.max' = '2000 10000'
 cfs_bandwidth01.c:49: TINFO: Set 'worker3/cpu.max' = '3000 10000'
 cfs_bandwidth01.c:113: TPASS: Scheduled bandwidth constrained workers
 cfs_bandwidth01.c:49: TINFO: Set 'level2/cpu.max' = '5000 10000'
 cfs_bandwidth01.c:125: TPASS: Workers exited
 cfs_bandwidth01.c:113: TPASS: Scheduled bandwidth constrained workers
 cfs_bandwidth01.c:49: TINFO: Set 'level2/cpu.max' = '5000 10000'
 cfs_bandwidth01.c:125: TPASS: Workers exited
 cfs_bandwidth01.c:113: TPASS: Scheduled bandwidth constrained workers
 cfs_bandwidth01.c:49: TINFO: Set 'level2/cpu.max' = '5000 10000'
 cfs_bandwidth01.c:125: TPASS: Workers exited
 cfs_bandwidth01.c:113: TPASS: Scheduled bandwidth constrained workers
 cfs_bandwidth01.c:49: TINFO: Set 'level2/cpu.max' = '5000 10000'
 cfs_bandwidth01.c:125: TPASS: Workers exited
 cfs_bandwidth01.c:113: TPASS: Scheduled bandwidth constrained workers
 cfs_bandwidth01.c:49: TINFO: Set 'level2/cpu.max' = '5000 10000'
 cfs_bandwidth01.c:125: TPASS: Workers exited
 tst_test.c:1349: TFAIL: Kernel is now tainted.

 HINT: You _MAY_ be missing kernel fixes, see:

 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=39f23ce07b93
 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b34cb07dde7c
 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fe61468b2cbc
 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5ab297bab984
 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d4d22468dae

 Summary:
 passed 10
 failed 1
 broken 0
 skipped 0
 warnings 0
 tag=cfs_bandwidth01 stime=1622734092 dur=15 exit=exited stat=1 core=no cu=478 cs=303

Note that this test has failed on the following instances:
  * Basic_A2
  * Standard_B1ms
  * Standard_D48_v3
  * Standard_F32s_v2

But passed with:
  * Standard_DS15_v2
  * Standard_DS5_v2

When this issue happens, kernel will be tainted with warning:
[ 694.761774] LTP: starting cfs_bandwidth01 (cfs_bandwidth01 -i 5)
[ 695.801637] ------------[ cut here ]------------
[ 695.801640] rq->tmp_alone_branch != &rq->leaf_cfs_rq_list
[ 695.801694] WARNING: CPU: 0 PID: 0 at /build/linux-fYK9kF/linux-4.15.0/kernel/sched/fair.c:393 unthrottle_cfs_rq+0x16f/0x200
[ 695.801695] Modules linked in: input_leds joydev serio_raw mac_hid qemu_fw_cfg kvm_intel kvm irqbypass sch_fq_codel binfmt_misc ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm psmouse virtio_blk pata_acpi floppy virtio_net i2c_piix4
[ 695.801726] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-144-generic #148-Ubuntu
[ 695.801727] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 695.801729] RIP: 0010:unthrottle_cfs_rq+0x16f/0x200
[ 695.801730] RSP: 0018:ffff989ebfc03e80 EFLAGS: 00010082
[ 695.801732] RAX: 0000000000000000 RBX: ffff989eb4c6ac00 RCX: 0000000000000000
[ 695.801732] RDX: 0000000000000005 RSI: ffffffffacb63c4d RDI: 0000000000000046
[ 695.801734] RBP: ffff989ebfc03ea8 R08: 000000af39e61b33 R09: ffffffffacb63c20
[ 695.801734] R10: 0000000000000000 R11: 0000000000000001 R12: ffff989eb57fe400
[ 695.801735] R13: ffff989ebfc21900 R14: 0000000000000001 R15: 0000000000000001
[ 695.801737] FS: 0000000000000000(0000) GS:ffff989ebfc00000(0000) knlGS:0000000000000000
[ 695.801738] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 695.801739] CR2: 000055593258d618 CR3: 000000007a044000 CR4: 00000000000006f0
[ 695.801742] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 695.801743] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 695.801744] Call Trace:
[ 695.801755] <IRQ>
[ 695.801765] distribute_cfs_runtime+0xc3/0x110
[ 695.801767] sched_cfs_period_timer+0xff/0x220
[ 695.801768] ? sched_cfs_slack_timer+0xd0/0xd0
[ 695.801775] __hrtimer_run_queues+0xdf/0x230
[ 695.801777] hrtimer_interrupt+0xa0/0x1d0
[ 695.801786] smp_apic_timer_interrupt+0x6f/0x140
[ 695.801789] apic_timer_interrupt+0x90/0xa0
[ 695.801789] </IRQ>
[ 695.801791] RIP: 0010:native_safe_halt+0x12/0x20
[ 695.801792] RSP: 0018:ffffffffac603e28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff11
[ 695.801793] RAX: ffffffffabbc9280 RBX: 0000000000000000 RCX: 0000000000000000
[ 695.801794] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 695.801795] RBP: ffffffffac603e28 R08: 000000af39850067 R09: ffff989e73749d00
[ 695.801796] R10: 0000000000000000 R11: 7fffffffffffffff R12: 0000000000000000
[ 695.801796] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 695.801801] ? __sched_text_end+0x1/0x1
[ 695.801804] default_idle+0x20/0x100
[ 695.801813] arch_cpu_idle+0x15/0x20
[ 695.801814] default_idle_call+0x23/0x30
[ 695.801821] do_idle+0x172/0x1f0
[ 695.801823] cpu_startup_entry+0x73/0x80
[ 695.801825] rest_init+0xae/0xb0
[ 695.801843] start_kernel+0x4dc/0x500
[ 695.801845] x86_64_start_reservations+0x24/0x26
[ 695.801847] x86_64_start_kernel+0x74/0x77
[ 695.801851] secondary_startup_64+0xa5/0xb0
[ 695.801852] Code: 50 09 00 00 49 39 85 60 09 00 00 74 68 80 3d 3a 6e 54 01 00 75 5f 31 db 48 c7 c7 c0 3d 2d ac c6 05 28 6e 54 01 01 e8 11 36 fc ff <0f> 0b 48 85 db 74 43 49 8b 85 78 09 00 00 49 39 85 70 09 00 00
[ 695.801875] ---[ end trace b6b9a70bc2945c0c ]---

Po-Hsu Lin (cypressyew)
tags: added: 4.15 azure bionic sru-20210531 ubuntu-ltp-stable
Po-Hsu Lin (cypressyew)
description: updated
Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

This can be found on generic 4.15.0-145.149 as well

summary: - cfs_bandwidth01 in sched from ubuntu_ltp_stable failed on B-azure-4.15
+ cfs_bandwidth01 in sched from ubuntu_ltp_stable failed on B-4.15
Po-Hsu Lin (cypressyew)
description: updated
Po-Hsu Lin (cypressyew)
description: updated
Po-Hsu Lin (cypressyew)
description: updated
description: updated
Po-Hsu Lin (cypressyew)
description: updated
description: updated
Po-Hsu Lin (cypressyew)
Changed in linux (Ubuntu):
status: New → Fix Released
description: updated
Po-Hsu Lin (cypressyew)
description: updated
Po-Hsu Lin (cypressyew)
description: updated
description: updated
tags: added: gt
Po-Hsu Lin (cypressyew)
Changed in ubuntu-kernel-tests:
assignee: nobody → Po-Hsu Lin (cypressyew)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Po-Hsu Lin (cypressyew)
status: New → In Progress
Changed in ubuntu-kernel-tests:
status: New → In Progress
Po-Hsu Lin (cypressyew)
description: updated
Po-Hsu Lin (cypressyew)
description: updated
description: updated
Po-Hsu Lin (cypressyew)
description: updated
tags: added: fips
Po-Hsu Lin (cypressyew)
description: updated
Revision history for this message
Po-Hsu Lin (cypressyew) wrote :
tags: added: sru-20210621
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Observed in B/aws (kernel 4.15), cycle sru-20210621.

tags: added: aws
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Observed in B/oracle (kernel 4.15), cycle sru-20210621.

tags: added: oracle
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Observed in B/aws-fips (kernel 4.15), cycle sru-20210621.

Revision history for this message
Krzysztof Kozlowski (krzk) wrote :

Found on B/azure 4.15.0-1120-azure

Revision history for this message
Kelsey Steele (kelsey-steele) wrote :

Found on hirsute/linux 5.13.0-9.9 host kili. passes on host vought

tags: added: hirsute sru-20210719
Revision history for this message
Kelsey Steele (kelsey-steele) wrote :

Sorry, the above is for Impish, not hirsute: Found on Impish/linux 5.13.0-9.9 host kili. passes on host vought

tags: added: impish
removed: hirsute
Po-Hsu Lin (cypressyew)
Changed in ubuntu-kernel-tests:
status: In Progress → Confirmed
Changed in linux (Ubuntu Bionic):
status: In Progress → Confirmed
Revision history for this message
Krzysztof Kozlowski (krzk) wrote :

Found on bionic/azure-4.15 4.15.0-1122.135

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

It looks like this is a bit risky to backport this.
I will hold off the patch unless it can make the way into stable upstream.

Revision history for this message
Krzysztof Kozlowski (krzk) wrote :

Found on bionic/linux-azure-fips/4.15.0-2035.39

tags: added: sru-20210816
Revision history for this message
Ian May (ian-may) wrote :

Found on: bionic/linux-aws: 4.15.0-1111.118

Po-Hsu Lin (cypressyew)
Changed in linux (Ubuntu Bionic):
assignee: Po-Hsu Lin (cypressyew) → nobody
Changed in ubuntu-kernel-tests:
assignee: Po-Hsu Lin (cypressyew) → nobody
Po-Hsu Lin (cypressyew)
tags: added: 4.4 sru-20230612 xenial
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.