sockpair stress test hangs on a low-performance arm64 device

Bug #2028281 reported by Talha Can Havadar
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Stress-ng | Triaged | Medium | Colin Ian King | |

Bug Description

Hello,

We were trying to stress test an embedded arm64 device with 2 GB of RAM.

Here is the installed version of `stress-ng` snap:
```
ubuntu@kria:~$ snap list stress-ng
Name Version Rev Tracking Publisher Notes
stress-ng V0.16.02-40-20230719-13238-g6df6 7628 latest/edge cking-kernel-tools devmode
```

We ran the command below. Although the timeout is set to 30 seconds, the run sometimes keeps going far beyond it. At some point the kernel may print `Out of memory` messages. The system is not completely frozen, but it is almost impossible to do anything else on it, and the test did not finish even after 3 hours. I had to send a keyboard interrupt (Ctrl+C) to end the job. It did not terminate immediately; the stressors were all terminated only after about 10 minutes.

First try, finished in 58 seconds.
```
ubuntu@kria:~$ stress-ng --sockpair 0 --timeout 30 --verbose --syslog
update.go:85: cannot change mount namespace according to change mount (/var/lib/snapd/hostfs/boot /boot none bind,ro 0 0): permission denied
stress-ng: debug: [1468] invoked with '/snap/stress-ng/7628/usr/bin/stress-ng --sockpair 0 --timeout 30 --verbose --syslog' by user 1000
stress-ng: debug: [1468] stress-ng 0.16.02 g6df67425a58d
stress-ng: debug: [1468] system: Linux kria 5.15.0-9000-xilinx-zynqmp #1-Ubuntu SMP Tue Jul 11 02:56:05 UTC 2023 aarch64, gcc, glibc 2.35
stress-ng: debug: [1468] RAM total: 1.9G, RAM free: 1.4G, swap free: 0.0
stress-ng: debug: [1468] temporary file path: '/home/ubuntu', filesystem type: ext2 (2852022 blocks available)
stress-ng: debug: [1468] 4 processors online, 4 processors configured
stress-ng: info: [1468] setting to a 30 secs run per stressor
stress-ng: debug: [1468] cache allocate: using defaults, cannot determine cache level details
stress-ng: debug: [1468] cache allocate: shared cache buffer size: 2048K
stress-ng: info: [1468] dispatching hogs: 4 sockpair
stress-ng: debug: [1468] starting stressors
stress-ng: debug: [1503] sockpair: [1503] started (instance 0 on CPU 3)
stress-ng: debug: [1505] sockpair: [1505] started (instance 2 on CPU 0)
stress-ng: debug: [1506] sockpair: [1506] started (instance 3 on CPU 3)
stress-ng: debug: [1468] 4 stressors started
stress-ng: debug: [1504] sockpair: [1504] started (instance 1 on CPU 1)
stress-ng: debug: [1504] sockpair: [1504] exited (instance 1 on CPU 0)
stress-ng: debug: [1506] sockpair: [1506] exited (instance 3 on CPU 2)
stress-ng: debug: [1503] sockpair: [1503] exited (instance 0 on CPU 3)
stress-ng: debug: [1468] sockpair: [1503] terminated (success)
stress-ng: debug: [1468] sockpair: [1504] terminated (success)
stress-ng: debug: [1505] sockpair: [1505] exited (instance 2 on CPU 3)
stress-ng: debug: [1468] sockpair: [1505] terminated (success)
stress-ng: debug: [1468] sockpair: [1506] terminated (success)
stress-ng: metrc: [1468] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s CPU used per RSS Max
stress-ng: metrc: [1468] (secs) (secs) (secs) (real time) (usr+sys time) instance (%) (KB)
stress-ng: metrc: [1468] sockpair 81515 57.95 1.21 29.31 1406.55 2671.07 13.16 1460
stress-ng: metrc: [1468] miscellaneous metrics:
stress-ng: metrc: [1468] sockpair 26119.42 socketpair calls sec (geometric mean of 4 instances)
stress-ng: metrc: [1468] sockpair 8.77 MB written per sec (geometric mean of 4 instances)
stress-ng: debug: [1468] metrics-check: all stressor metrics validated and sane
stress-ng: info: [1468] passed: 4: sockpair (4)
stress-ng: info: [1468] failed: 0
stress-ng: info: [1468] skipped: 0
stress-ng: info: [1468] metrics untrustworthy: 0
stress-ng: info: [1468] successful run completed in 58.23 secs
```

Second try: got out-of-memory errors and had to terminate with Ctrl+C.
```
ubuntu@kria:~$ stress-ng --sockpair 0 --timeout 30 --verbose --syslog
stress-ng: debug: [1541] invoked with '/snap/stress-ng/7628/usr/bin/stress-ng --sockpair 0 --timeout 30 --verbose --syslog' by user 1000
stress-ng: debug: [1541] stress-ng 0.16.02 g6df67425a58d
stress-ng: debug: [1541] system: Linux kria 5.15.0-9000-xilinx-zynqmp #1-Ubuntu SMP Tue Jul 11 02:56:05 UTC 2023 aarch64, gcc, glibc 2.35
stress-ng: debug: [1541] RAM total: 1.9G, RAM free: 1.6G, swap free: 0.0
stress-ng: debug: [1541] temporary file path: '/home/ubuntu', filesystem type: ext2 (2852022 blocks available)
stress-ng: debug: [1541] 4 processors online, 4 processors configured
stress-ng: info: [1541] setting to a 30 secs run per stressor
stress-ng: debug: [1541] cache allocate: using defaults, cannot determine cache level details
stress-ng: debug: [1541] cache allocate: shared cache buffer size: 2048K
stress-ng: info: [1541] dispatching hogs: 4 sockpair
stress-ng: debug: [1541] starting stressors
stress-ng: debug: [1541] 4 stressors started
stress-ng: debug: [1573] sockpair: [1573] started (instance 2 on CPU 3)
stress-ng: debug: [1571] sockpair: [1571] started (instance 0 on CPU 0)
stress-ng: debug: [1574] sockpair: [1574] started (instance 3 on CPU 0)
stress-ng: debug: [1572] sockpair: [1572] started (instance 1 on CPU 1)
[ 165.655530] Out of memory: Killed process 1582 (stress-ng-sockp) total-vm:13316kB, anon-rss:396kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:52kB oom_score_adj:1000
^C^C^C^C^C^C^C[ 359.140885] Out of memory: Killed process 1581 (stress-ng-sockp) total-vm:13316kB, anon-rss:396kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:52kB oom_score_adj:1000
[ 361.209060] Out of memory: Killed process 1580 (stress-ng-sockp) total-vm:13316kB, anon-rss:396kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:52kB oom_score_adj:1000
[ 363.288462] Out of memory: Killed process 1579 (stress-ng-sockp) total-vm:13316kB, anon-rss:396kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:52kB oom_score_adj:1000
[ 363.386682] INFO: task kcompactd0:46 blocked for more than 120 seconds.
[ 363.393342] Not tainted 5.15.0-9000-xilinx-zynqmp #1-Ubuntu
[ 363.399460] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.407580] INFO: task stress-ng-sockp:1572 blocked for more than 120 seconds.
[ 363.414814] Not tainted 5.15.0-9000-xilinx-zynqmp #1-Ubuntu
[ 363.420926] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.428964] INFO: task stress-ng-sockp:1575 blocked for more than 120 seconds.
[ 363.436205] Not tainted 5.15.0-9000-xilinx-zynqmp #1-Ubuntu
[ 363.442300] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.450332] INFO: task stress-ng-sockp:1576 blocked for more than 120 seconds.
[ 363.457561] Not tainted 5.15.0-9000-xilinx-zynqmp #1-Ubuntu
[ 363.463652] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.471676] INFO: task stress-ng-sockp:1579 blocked for more than 120 seconds.
[ 363.478932] Not tainted 5.15.0-9000-xilinx-zynqmp #1-Ubuntu
[ 363.485050] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.493190] INFO: task stress-ng-sockp:1580 blocked for more than 120 seconds.
[ 363.500438] Not tainted 5.15.0-9000-xilinx-zynqmp #1-Ubuntu
[ 363.506549] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.514661] INFO: task stress-ng-sockp:1581 blocked for more than 120 seconds.
[ 363.521891] Not tainted 5.15.0-9000-xilinx-zynqmp #1-Ubuntu
[ 363.528009] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.536067] INFO: task stress-ng-sockp:1582 blocked for more than 120 seconds.
[ 363.543296] Not tainted 5.15.0-9000-xilinx-zynqmp #1-Ubuntu
[ 363.549416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 365.331460] Out of memory: Killed process 1578 (stress-ng-sockp) total-vm:13316kB, anon-rss:324kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:52kB oom_score_adj:1000
[ 365.356832] Out of memory: Killed process 1576 (stress-ng-sockp) total-vm:13316kB, anon-rss:324kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:52kB oom_score_adj:1000
^C^C^C[ 484.158296] INFO: task kcompactd0:46 blocked for more than 241 seconds.
[ 484.165001] Not tainted 5.15.0-9000-xilinx-zynqmp #1-Ubuntu
[ 484.171114] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 484.179256] INFO: task stress-ng-sockp:1572 blocked for more than 241 seconds.
[ 484.186492] Not tainted 5.15.0-9000-xilinx-zynqmp #1-Ubuntu
[ 484.192614] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
```

After a reboot it finishes again, so I suspect the hang happens when the test is run a second time in a row:
```
ubuntu@kria:~$ stress-ng --sockpair 0 --timeout 30 --verbose --syslog
update.go:85: cannot change mount namespace according to change mount (/var/lib/snapd/hostfs/boot /boot none bind,ro 0 0): permission denied
stress-ng: debug: [1333] invoked with '/snap/stress-ng/7628/usr/bin/stress-ng --sockpair 0 --timeout 30 --verbose --syslog' by user 1000
stress-ng: debug: [1333] stress-ng 0.16.02 g6df67425a58d
stress-ng: debug: [1333] system: Linux kria 5.15.0-9000-xilinx-zynqmp #1-Ubuntu SMP Tue Jul 11 02:56:05 UTC 2023 aarch64, gcc, glibc 2.35
stress-ng: debug: [1333] RAM total: 1.9G, RAM free: 1.4G, swap free: 0.0
stress-ng: debug: [1333] temporary file path: '/home/ubuntu', filesystem type: ext2 (2845821 blocks available)
stress-ng: debug: [1333] 4 processors online, 4 processors configured
stress-ng: info: [1333] setting to a 30 secs run per stressor
stress-ng: debug: [1333] cache allocate: using defaults, cannot determine cache level details
stress-ng: debug: [1333] cache allocate: shared cache buffer size: 2048K
stress-ng: info: [1333] dispatching hogs: 4 sockpair
stress-ng: debug: [1333] starting stressors
stress-ng: debug: [1333] 4 stressors started
stress-ng: debug: [1472] sockpair: [1472] started (instance 0 on CPU 0)
stress-ng: debug: [1475] sockpair: [1475] started (instance 3 on CPU 2)
stress-ng: debug: [1474] sockpair: [1474] started (instance 2 on CPU 1)
stress-ng: debug: [1473] sockpair: [1473] started (instance 1 on CPU 2)
stress-ng: debug: [1475] sockpair: [1475] exited (instance 3 on CPU 1)
stress-ng: debug: [1474] sockpair: [1474] exited (instance 2 on CPU 2)
stress-ng: debug: [1473] sockpair: [1473] exited (instance 1 on CPU 2)
stress-ng: debug: [1472] sockpair: [1472] exited (instance 0 on CPU 2)
stress-ng: debug: [1333] sockpair: [1472] terminated (success)
stress-ng: debug: [1333] sockpair: [1473] terminated (success)
stress-ng: debug: [1333] sockpair: [1474] terminated (success)
stress-ng: debug: [1333] sockpair: [1475] terminated (success)
stress-ng: metrc: [1333] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s CPU used per RSS Max
stress-ng: metrc: [1333] (secs) (secs) (secs) (real time) (usr+sys time) instance (%) (KB)
stress-ng: metrc: [1333] sockpair 170751 35.04 2.02 37.43 4872.35 4327.95 28.14 1456
stress-ng: metrc: [1333] miscellaneous metrics:
stress-ng: metrc: [1333] sockpair 18460.94 socketpair calls sec (geometric mean of 4 instances)
stress-ng: metrc: [1333] sockpair 6.80 MB written per sec (geometric mean of 4 instances)
stress-ng: debug: [1333] metrics-check: all stressor metrics validated and sane
stress-ng: info: [1333] passed: 4: sockpair (4)
stress-ng: info: [1333] failed: 0
stress-ng: info: [1333] skipped: 0
stress-ng: info: [1333] metrics untrustworthy: 0
stress-ng: info: [1333] successful run completed in 35.28 secs
```

Even the first run after a reboot is not consistent: sometimes it takes 58 seconds, sometimes 35 seconds.

Is there a good way to handle this on low-performance, low-memory devices? As things stand, we cannot predict the worst-case time the sockpair test will take to finish.
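
One possible way to bound the run time from the outside is to wrap the command in coreutils `timeout`, which sends SIGTERM at a deadline and SIGKILL after a grace period. The sketch below is only an illustration: `run_with_deadline` is a hypothetical helper name, the 90 s and 10 s values are arbitrary, and using `--sockpair 1` to lower memory pressure is an assumption, not something verified on this device. Note that tasks stuck in uninterruptible sleep (as the "blocked for more than 120 seconds" messages suggest) cannot be killed until they unblock, so this is not a guarantee.

```shell
#!/bin/sh
# Run a command under a hard wall-clock limit (assumes GNU coreutils timeout).
# Usage: run_with_deadline SOFT_SECS GRACE_SECS COMMAND [ARGS...]
run_with_deadline() {
    soft=$1; grace=$2; shift 2
    deadline_rc=0
    # SIGTERM after $soft seconds, then SIGKILL $grace seconds later if still alive.
    timeout --signal=TERM --kill-after="$grace" "$soft" "$@" || deadline_rc=$?
    case $deadline_rc in
        124) echo "deadline: command timed out after ${soft}s" >&2 ;;
        137) echo "deadline: command had to be killed with SIGKILL" >&2 ;;
    esac
    return "$deadline_rc"
}

# Example (not executed here): a single sockpair instance instead of one per CPU,
# the same 30 s stress-ng timeout, and a 90 s outer limit with 10 s of grace.
# run_with_deadline 90 10 stress-ng --sockpair 1 --timeout 30 --verbose --syslog
```

An exit status of 124 or 137 would then tell a test harness that the stressor overran its budget, rather than leaving the job running for hours.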

Do you have any suggestions that will help us get more information about this issue?
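
As one idea for gathering more data, available memory could be sampled in the background while the stressor runs, so the log shows how quickly memory drains before the OOM killer fires. This is a rough sketch: `mem_available_kb` and `log_memory` are hypothetical helper names, the log path is arbitrary, and the logger itself may stall once the system is under heavy memory pressure.

```shell
#!/bin/sh
# Print the MemAvailable value (in kB) from a meminfo-style file.
mem_available_kb() {
    awk '/^MemAvailable:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Append a timestamped MemAvailable sample every INTERVAL seconds until killed.
log_memory() {
    interval=${1:-1}; logfile=${2:-meminfo.log}
    while :; do
        printf '%s MemAvailable=%s kB\n' "$(date +%s)" "$(mem_available_kb)" >> "$logfile"
        sleep "$interval"
    done
}

# Example (not executed here):
# log_memory 1 /tmp/sockpair-mem.log &
# logger_pid=$!
# stress-ng --sockpair 0 --timeout 30 --verbose --syslog
# kill "$logger_pid"
```

Comparing the log from a run that finishes with one that hangs might show whether memory is already low when the second consecutive run starts.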

Best Regards,
Talha

Colin Ian King (colin-king) wrote :

This is an old version of stress-ng, and snap support for stress-ng is deprecated because the snap uses too many system resources and causes issues such as this one.

Please retry using a more recent version of stress-ng. The stress-ng PPA contains builds of recent versions for supported Ubuntu releases: https://launchpad.net/~colin-king/+archive/ubuntu/stress-ng
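
Switching from the snap to the PPA build might look roughly like the sketch below. The PPA name `ppa:colin-king/stress-ng` is derived from the URL above; the `run` and `install_stress_ng_from_ppa` helpers and the `DRY_RUN` variable are hypothetical conveniences added here so the steps can be previewed before anything is changed on the device.

```shell
#!/bin/sh
# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Remove the deprecated snap, then install stress-ng from the PPA.
install_stress_ng_from_ppa() {
    run sudo snap remove stress-ng
    run sudo add-apt-repository --yes ppa:colin-king/stress-ng
    run sudo apt-get update
    run sudo apt-get install --yes stress-ng
}

# Example (not executed here):
# DRY_RUN=1; install_stress_ng_from_ppa   # preview the commands
# DRY_RUN=0; install_stress_ng_from_ppa   # actually apply them
```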

Changed in stress-ng:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Colin Ian King (colin-king)