Memory Stress-ng Test Failing on a System with 881.46GB of RAM

Bug #2082743 reported by Michael Reed
Affects: Stress-ng
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I have a system with a large amount of memory that fails the stress-ng memory test. The test passed when the amount of memory for the test was reduced, but we typically like to stress the system with the maximum amount available. Once the memory was added back, the same failures occurred. We then tried increasing the base timeout, after which every stressor passed except malloc. I am unsure whether this is an actual bug where system resources cannot keep up, or whether we are being too aggressive with the test case.

CPU: AMD EPYC 9754 128-Core Processor (Bergamo)
Mem: 881 GB
OS: Ubuntu 22.04.4, 5.15 kernel

Steps to Reproduce
sudo add-apt-repository ppa:checkbox-dev/stable
sudo apt install canonical-certification-server
sudo /usr/lib/checkbox-provider-base/bin/stress_ng_test.py memory

Michael Reed (mreed8855)
description: updated
Michael Reed (mreed8855) wrote :

Initially the mlock, mremap, shm-sysv, vm-splice, numa, and malloc stressors failed:

03 Sep 07:34: Running stress-ng mlock stressor for 300 seconds...
** stress-ng timed out and was forcefully terminated

03 Sep 11:50: Running stress-ng mremap stressor for 300 seconds...
** stress-ng timed out and was forcefully terminated

03 Sep 12:00: Running stress-ng shm-sysv stressor for 300 seconds...
** stress-ng timed out and was forcefully terminated

03 Sep 12:10: Running stress-ng vm-splice stressor for 300 seconds...
** stress-ng timed out and was forcefully terminated

03 Sep 12:20: Running stress-ng numa stressor for 300 seconds...
** stress-ng timed out and was forcefully terminated

03 Sep 12:30: Running stress-ng malloc stressor for 9115 seconds...
** stress-ng timed out and was forcefully terminated

However, after doubling, tripling, and quadrupling the 300-second timeout, malloc is the only stressor that still has an issue.
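
The 9115-second malloc runtime in the log above is consistent with a runtime that scales with installed memory. The following is a minimal Python sketch of that arithmetic, assuming a formula of the form "base time plus a per-GiB allowance". The function name and both constants (300 s base, 10 s per GiB) are assumptions chosen to match the logged figure; they are not confirmed from stress_ng_test.py.

```python
def scaled_runtime(total_mem_gib, base_time=300, time_per_gib=10):
    """Return a runtime in seconds that grows with installed memory.

    base_time and time_per_gib are assumed values for illustration,
    not values taken from the checkbox script.
    """
    return round(base_time + time_per_gib * total_mem_gib)


# 881.46 is the memory size given in the bug title.
print(scaled_runtime(881.46))
```

Under these assumptions, 881.46 GiB gives 9115 seconds, which is why raising only the 300-second base has little effect on the memory-scaled stressors.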

Michael Reed (mreed8855) wrote :

After increasing the timeout:

02 Sep 12:30: Running stress-ng malloc stressor for 9115 seconds...
** stress-ng exited with code 3
stress-ng: info: [964793] setting to a 2 hours, 31 mins, 54 secs run per stressor
stress-ng: info: [964793] dispatching hogs: 512 malloc
stress-ng: info: [965806] malloc: failed to create counter lock. skipping stressor
stress-ng: info: [965809] malloc: failed to create counter lock. skipping stressor
stress-ng: info: [965811] malloc: failed to create counter lock. skipping stressor
stress-ng: info: [965810] malloc: failed to create counter lock. skipping stressor
stress-ng: info: [965812] malloc: failed to create counter lock. skipping stressor
stress-ng: warn: [964793] malloc: [965809] aborted early, out of system resources
stress-ng: warn: [964793] malloc: [965810] aborted early, out of system resources
stress-ng: warn: [964793] malloc: [965811] aborted early, out of system resources
stress-ng: warn: [964793] malloc: [965812] aborted early, out of system resources
stress-ng: info: [964793] skipped: 4: malloc (4)
stress-ng: info: [964793] passed: 507: malloc (507)
stress-ng: info: [964793] failed: 0
stress-ng: info: [964793] metrics untrustworthy: 0
stress-ng: info: [964793] successful run completed in 2 hours, 31 mins, 54.52 secs
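
Although the log above ends with "successful run completed", some malloc instances were skipped and stress-ng exited with code 3. A rough Python sketch for pulling the skipped/passed/failed counts out of such output follows. The helper name `summarize` is hypothetical, and the pattern is based only on the line format shown in this report, so it may need adjusting for other stress-ng versions.

```python
import re

# Matches summary lines such as
# "stress-ng: info: [964793] passed: 507: malloc (507)".
SUMMARY_RE = re.compile(
    r"stress-ng: info:\s+\[\d+\]\s+(skipped|passed|failed):\s+(\d+)"
)


def summarize(log_text):
    """Return a dict of skipped/passed/failed instance counts."""
    counts = {}
    for line in log_text.splitlines():
        match = SUMMARY_RE.search(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


sample = """\
stress-ng: info: [964793] skipped: 4: malloc (4)
stress-ng: info: [964793] passed: 507: malloc (507)
stress-ng: info: [964793] failed: 0
"""

result = summarize(sample)
print(result)
# Flag runs where some instances never ran or failed outright.
if result.get("skipped", 0) or result.get("failed", 0):
    print("run did not exercise every instance")
```

Checking the skipped count separately from the failed count is useful here because the failure mode in this report shows up only as skipped instances.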

Michael Reed (mreed8855) wrote :

This is what normal output for the malloc stressor should look like:

26 Sep 07:23: Running stress-ng malloc stressor for 1555 seconds...
stress-ng: info: [684586] setting to a 25 mins, 54 secs run per stressor
stress-ng: info: [684586] dispatching hogs: 192 malloc
stress-ng: info: [684586] skipped: 0
stress-ng: info: [684586] passed: 191: malloc (191)
stress-ng: info: [684586] failed: 0
stress-ng: info: [684586] metrics untrustworthy: 0
stress-ng: info: [684586] successful run completed in 25 mins, 54.12 secs

Michael Reed (mreed8855) wrote :

An older version of stress-ng was used.

stress-ng - 0.18.01-0202407131132ubuntu22.04.1

Colin suggested using an updated version from his PPA:
https://launchpad.net/~colin-king/+archive/ubuntu/stress-ng

sudo add-apt-repository ppa:colin-king/stress-ng
sudo apt update
sudo apt install stress-ng

Verify the new version
$ sudo apt-cache policy stress-ng
stress-ng:
  Installed: 0.18.05-1~n0
  Candidate: 0.18.05-1~n0
  Version table:
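
The version check above can also be scripted. The following is a small Python sketch, assuming the `apt-cache policy` output format shown in this report. The helper name `installed_upstream_version` is hypothetical, the minimum version in the comparison is only a placeholder, and the comparison looks only at the leading dotted-number part rather than implementing full Debian version ordering.

```python
import re


def installed_upstream_version(policy_output):
    """Extract the upstream version tuple from `apt-cache policy` output.

    Only the leading dotted-number part is kept (for example "0.18.05"
    from "0.18.05-1~n0"); returns None if no installed version is found.
    """
    match = re.search(r"Installed:\s*(\d+(?:\.\d+)*)", policy_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


policy = """\
stress-ng:
  Installed: 0.18.05-1~n0
  Candidate: 0.18.05-1~n0
"""

version = installed_upstream_version(policy)
print(version)
# Placeholder threshold: replace with the version you need.
print(version is not None and version >= (0, 18, 5))
```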


Michael Reed (mreed8855) wrote :

Just an FYI: on the older version, the test did pass when memory was reduced to 384GB.

Michael Reed (mreed8855) wrote :

This issue has been resolved in stress-ng version 0.18.06. See https://github.com/ColinIanKing/stress-ng/issues/433

Changed in stress-ng:
status: New → Fix Released