stress-ng: fail: [1379606] vm: detected 1694364648734976 bit errors while stressing memory
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Stress-ng | Fix Released | Undecided | Unassigned |
Bug Description
Thanks as always for this fantastic tool. It continues to be invaluable for us. (You can add bug 2058557 to your list of issues found btw!)
Recently, all of our arm64 certification regression tests have begun to fail in the memory vm test - I've attached a log below. The failure is independent of Ubuntu release, kernel variant, and SoC, so I suspect a test regression. There is nothing interesting in dmesg during this test on any system.
The failures seem to be correlated with the certification PPA updating stress-ng to version 0.17.06-
The last success before this was with 0.17.06-
I have no idea how to determine what commit hashes these releases correlate to, other than downloading the source package and comparing.
This looks like a promising potential fix:
commit 1d444cb8c76159e
Author: Colin Ian King <email address hidden>
Date: Mon Mar 25 19:21:59 2024 +0000
stress_
I verified that 0.17.06-
I verified that 0.17.06-
Our next set of tests should pick up a version with this fix, so I'll let you know if it appears to resolve this issue.
15 Mar 07:10: Running stress-ng vm stressor for 5315 seconds...
** stress-ng exited with code 2
stress-ng: info: [1379604] setting to a 1 hour, 28 mins, 35 secs run per stressor
stress-ng: info: [1379604] dispatching hogs: 256 vm
stress-ng: fail: [1379606] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379665] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379699] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379748] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379769] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379762] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379843] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379849] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379695] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379729] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379746] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379631] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379619] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379837] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379705] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379766] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379675] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379687] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379821] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379622] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379691] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379749] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379706] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379783] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379611] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379648] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379605] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379681] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: error: [1379604] vm: [1379605] terminated with an error, exit status=2 (stressor failed)
stress-ng: fail: [1379753] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379664] vm: detected 1694364648734976 bit errors while stressing memory
[...]
stress-ng: fail: [1379609] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: error: [1379604] vm: [1379609] terminated with an error, exit status=2 (stressor failed)
stress-ng: error: [1379604] vm: [1379610] terminated with an error, exit status=2 (stressor failed)
stress-ng: error: [1379604] vm: [1379611] terminated with an error, exit status=2 (stressor failed)
[...]
stress-ng: error: [1379604] vm: [1379859] terminated with an error, exit status=2 (stressor failed)
stress-ng: error: [1379604] vm: [1379860] terminated with an error, exit status=2 (stressor failed)
stress-ng: info: [1379604] skipped: 0
stress-ng: info: [1379604] passed: 0
stress-ng: info: [1379604] failed: 255: vm (255)
stress-ng: info: [1379604] metrics untrustworthy: 0
stress-ng: info: [1379604] unsuccessful run completed in 1 hour, 28 mins, 36.82 secs
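Every vm instance in the log above reports exactly the same count, 1694364648734976. A quick way to inspect a value like this is to look at its hexadecimal and byte-level form. The sketch below only examines the number from the log. Reading it as a little-endian 64-bit integer is an assumption made for illustration, and the result is an observation about the number, not a confirmed diagnosis.

```python
# Inspect the bit-error count reported by every vm instance in the log.
count = 1694364648734976

# 64-bit hexadecimal form, zero-padded to 16 hex digits.
as_hex = f"{count:#018x}"

# Byte view, assuming a little-endian 64-bit integer (an assumption
# made for illustration; the log does not state the layout).
le_bytes = count.to_bytes(8, "little")

print(as_hex)          # 0x0006050403020100
print(list(le_bytes))  # [0, 1, 2, 3, 4, 5, 6, 0]
```

The bytes run 00, 01, 02, ... 06 in sequence, which looks more like patterned data being read as a counter than a genuine tally of flipped bits. That would be consistent with the suspicion of a test regression rather than faulty memory.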
The last failure we saw was 3 days ago, running 0.17.06-0~202403251447~ubuntuXX.YY.Z.
We've seen 6 successes since then; the first one was using 0.17.06-0~202403261450~ubuntuXX.YY.Z.
I'll mark this fix released!
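As a rough cross-check on which builds could contain the candidate commit, the 12-digit field in the PPA version strings can be compared with the commit date quoted in the description (Mon Mar 25 19:21:59 2024 +0000). The sketch below assumes that field is a `YYYYMMDDHHMM` build timestamp in UTC. That is an assumption about the PPA's versioning scheme, not something confirmed in this report, and `build_time` is a hypothetical helper.

```python
import re
from datetime import datetime, timezone

# Assumption: the 12 digits between the tildes are a UTC build
# timestamp in YYYYMMDDHHMM form.
STAMP = re.compile(r"~(\d{12})~")

def build_time(version: str) -> datetime:
    """Hypothetical helper: extract the assumed build timestamp."""
    match = STAMP.search(version)
    if match is None:
        raise ValueError(f"no timestamp field in {version!r}")
    parsed = datetime.strptime(match.group(1), "%Y%m%d%H%M")
    return parsed.replace(tzinfo=timezone.utc)

# Commit date as quoted in the bug description.
commit_time = datetime(2024, 3, 25, 19, 21, 59, tzinfo=timezone.utc)

# Version strings from the last failing and first passing runs.
last_failure = build_time("0.17.06-0~202403251447~ubuntuXX.YY.Z")
first_success = build_time("0.17.06-0~202403261450~ubuntuXX.YY.Z")

print(last_failure < commit_time)   # True: built before the commit
print(commit_time < first_success)  # True: built after the commit
```

Under that assumption, the last failing build predates the commit and the first passing build follows it, which fits the commit being the fix. A build timestamp only bounds which commits a package can include, so comparing the source packages remains the definitive check.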