Default VM overcommit sysctls in Ubuntu lead to unnecessary oom-killer invocation

Bug #1666683 reported by Mike Pontillo
This bug affects 3 people
Affects: linux (Ubuntu)
Status: Triaged
Importance: Medium
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

On my system, running a couple of LXD containers and VMs (16 GB RAM, 16 GB swap) seems to cause the kernel oom-killer to be frequently triggered.

To try to resolve this, I first limited the memory my containers were allowed to use, for example:

    lxc config set <container> limits.memory 1024MB

... and restarting the containers for good measure. However, this didn't resolve the problem.

After looking deeper into what might trigger the oom-killer even though I seemed to have plenty of memory free, I found out that the kernel VM overcommit can be tuned with the `vm.overcommit_memory` sysctl.

The default value of `vm.overcommit_memory`, 0, uses a heuristic to determine whether or not requested memory is available. According to `man 5 proc`, if the value is set to zero:

"""
    calls of mmap(2) with MAP_NORESERVE are not checked, and the
    default check is very weak, leading to the risk of getting a
    process "OOM-killed".
"""

This seems to describe my problem exactly. However, when I set this value to 2, many of my open programs immediately aborted with out-of-memory errors. This is because, in mode 2, the default `vm.overcommit_ratio` of 50 limits the total committed address space to swap plus only 50% of physical RAM.
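For illustration, here is a minimal sketch of the commit limit the kernel documentation describes for `vm.overcommit_memory = 2`: (RAM minus huge pages) times `overcommit_ratio` / 100, plus swap. The function name and the round 16 GiB figures are assumptions made for this example; a real system's `MemTotal` is somewhat below the nominal RAM size.

```python
GIB = 1024 * 1024  # KiB per GiB


def commit_limit_kib(ram_kib, swap_kib, overcommit_ratio, hugetlb_kib=0):
    """Approximate CommitLimit (KiB) under vm.overcommit_memory=2.

    Follows the documented formula:
    (RAM - huge pages) * overcommit_ratio / 100 + swap.
    """
    return (ram_kib - hugetlb_kib) * overcommit_ratio // 100 + swap_kib


# Nominal figures from the report: 16 GB RAM, 16 GB swap.
for ratio in (50, 100, 150):
    limit = commit_limit_kib(16 * GIB, 16 * GIB, ratio)
    print(f"overcommit_ratio={ratio}: CommitLimit is about {limit // GIB} GiB")
```

With these nominal numbers the limit works out to about 24 GiB at the default ratio of 50, 32 GiB at 100, and 40 GiB at 150.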

I then found the following answer on Server Fault:

    http://serverfault.com/a/510857/15268

The answers to this question seem to make a good case that the overcommit_ratio should be set to 100.

In summary, I think the following sysctl values should be the new defaults:

    vm.overcommit_memory = 2
    vm.overcommit_ratio = 100
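Before switching a running system to mode 2, it may help to compare the `CommitLimit` and `Committed_AS` fields that `/proc/meminfo` already reports. The following is a rough sketch; the helper names are made up for this example, and the sample text in the usage note uses invented numbers.

```python
def parse_meminfo(text):
    """Map /proc/meminfo field names to integer values (KiB where the kernel reports kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def commit_headroom_kib(fields):
    """CommitLimit minus Committed_AS; negative means already over the limit."""
    return fields["CommitLimit"] - fields["Committed_AS"]


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

On a Linux host, `commit_headroom_kib(read_meminfo())` gives the headroom under the current ratio. A negative result suggests that new allocations would start failing as soon as `vm.overcommit_memory` is set to 2, which would be consistent with programs aborting immediately after the change.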

description: updated
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

    apport-collect 1666683

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Mike Pontillo (mpontillo) wrote :

Changing status to "Confirmed". I don't think there are any relevant logs for this issue. Here's some anecdotal evidence, though:

    # dmesg | grep oom-killer
    [1389379.248406] apt-mirror invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
    [1399428.772409] chrome invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=300
    [1421778.653435] java invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
    [1464982.048253] sh invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
    [1479431.535969] postgres invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=-900
    [1508936.131513] mount invoked oom-killer: gfp_mask=0x24040c0, order=3, oom_score_adj=0
    [1526250.556039] java invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0

    # cat /proc/uptime
    1551384.55 11908654.03

So in the ~7 hours since changing these sysctls, the oom-killer hasn't run, whereas it was running regularly before that.
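The "~7 hours" figure can be checked from the output above. This is a small sketch that assumes the bracketed `dmesg` timestamps and the first field of `/proc/uptime` both count seconds since boot; the function names are invented for the example, and the sample lines are the last two oom-killer entries from the log in this comment.

```python
import re

# Matches lines such as "[1526250.556039] java invoked oom-killer: ..."
OOM_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.+?) invoked oom-killer:")


def oom_events(dmesg_text):
    """Return (timestamp_seconds, process_name) for each oom-killer line."""
    events = []
    for line in dmesg_text.splitlines():
        match = OOM_RE.match(line.strip())
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


def hours_since_last_oom(dmesg_text, uptime_seconds):
    """Hours between the most recent oom-killer line and the given uptime."""
    events = oom_events(dmesg_text)
    if not events:
        return None
    return (uptime_seconds - events[-1][0]) / 3600


SAMPLE = """\
[1508936.131513] mount invoked oom-killer: gfp_mask=0x24040c0, order=3, oom_score_adj=0
[1526250.556039] java invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
"""

print(f"{hours_since_last_oom(SAMPLE, 1551384.55):.2f} hours since the last oom-killer run")
```

For these values the gap is just under 7 hours.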

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
tags: added: bot-stop-nagging
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Mike Pontillo (mpontillo) wrote :

Hm. Oddly enough, pulseaudio seemed to have problems for me until I set the overcommit ratio higher (to 150).

Seth Forshee (sforshee) wrote :

Keep in mind, overcommit considers *committed* address space, not used memory. So a process which creates a large anonymous mapping but doesn't touch any of that memory (so the address space is committed but no actual pages of RAM are assigned to the address space) will count against overcommit even though it consumes no RAM. Therefore overcommit_ratio=100 is actually pretty conservative, as it's common for processes to have some mappings which aren't backed by RAM (but could be).
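To make the committed-versus-used distinction concrete, here is a rough sketch that creates a private anonymous mapping, leaves it untouched, and compares the process's resident set size before and after. The helper names and the 256 MiB default are choices made for this example, and `demo()` only works on Linux because it reads `/proc/self/status`.

```python
import mmap


def vm_rss_kib(status_text):
    """Extract VmRSS (resident set size, KiB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("VmRSS not found")


def read_self_rss_kib():
    """Read this process's current resident set size (Linux only)."""
    with open("/proc/self/status") as f:
        return vm_rss_kib(f.read())


def map_untouched(size_bytes):
    """Create a private anonymous mapping and do not touch any of its pages."""
    return mmap.mmap(-1, size_bytes, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)


def demo(size_bytes=256 * 1024 * 1024):
    """Return RSS in KiB before and after mapping size_bytes without touching it."""
    before = read_self_rss_kib()
    region = map_untouched(size_bytes)  # may raise OSError if the commit limit is hit
    after = read_self_rss_kib()
    region.close()
    return before, after
```

Calling `demo()` should show the two RSS readings staying close together even though the mapping adds its full size to the committed address space. Under `vm.overcommit_memory = 2` with little headroom, the `mmap` call itself may fail instead.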

Deciding exactly what overcommit ratio is appropriate generally is likely to be difficult, since every system has a different mix of processes running which may use mmaps in different ways.

One big advantage to overcommit_ratio=0 is that it allows the system to protect critical processes from OOM situations by adjusting those processes' oom_score_adj values (systemd allows this value to be specified in unit configuration files). With overcommit_memory=2, once the limit is reached memory allocations and mmaps start to fail for all processes equally, which might result in critical processes terminating.

So I think overcommit_ratio=0 is better for a general use system. In situations where some other arrangement would work better the sysadmin can adjust the sysctl values.
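As a sketch of the per-process adjustment mentioned above: the kernel exposes the value at `/proc/<pid>/oom_score_adj`, with a range of -1000 to 1000, and systemd units can set it with `OOMScoreAdjust=`. The helper names below are invented for this example, and lowering a value generally requires elevated privileges.

```python
def oom_score_adj_path(pid):
    """Path of the per-process OOM adjustment file."""
    return f"/proc/{pid}/oom_score_adj"


def get_oom_score_adj(pid):
    """Read the current adjustment for a process (Linux only)."""
    with open(oom_score_adj_path(pid)) as f:
        return int(f.read().strip())


def set_oom_score_adj(pid, value):
    """Write a new adjustment; lower values make the process a less likely OOM victim."""
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    with open(oom_score_adj_path(pid), "w") as f:
        f.write(str(value))
```

For example, `set_oom_score_adj(pid, -900)` would match the value that the postgres process shows in the `dmesg` output earlier in this report.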

Mike Pontillo (mpontillo) wrote :

After a few days of uptime I still saw issues with the ratio set to 100. I'll give 0 a try.

Seth Forshee (sforshee) wrote :

Sorry, I mean overcommit_memory=0. overcommit_ratio=0 is likely to be a disaster.
