| 2021-04-20 22:39:12 |
Paul Friel |
description |
Ever since the "ubuntu-bionic-18.04-amd64-server-20200729" EC2 Ubuntu AMI was released, which has the "5.3.0-1032-aws" kernel, we have been hitting a 100% reproducible memory leak that causes our app, which runs under Docker, to be OOM-killed.
The scenario: our app runs in a Docker container and occasionally catches a crash within itself. When that happens, it creates another process that triggers a gdb dump of the parent app. Normally this works fine, but under these specific kernels memory usage keeps growing until it hits the container's memory limit, at which point the container is killed.
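For anyone trying to reproduce this, the crash-dump pattern described above might look roughly like the following. This is only a sketch, not our actual code: the helper names and the core file path are hypothetical. The gdb options it relies on (`-batch`, `-p`, `-ex`) and the `generate-core-file` command are standard gdb.

```python
import os
import subprocess
from typing import List


def build_gdb_dump_command(target_pid: int, core_path: str) -> List[str]:
    """Return an argv that attaches gdb to target_pid and writes a core file."""
    return [
        "gdb",
        "-batch",                # run the -ex commands, then exit
        "-p", str(target_pid),   # attach to the already-running process
        "-ex", f"generate-core-file {core_path}",
        "-ex", "detach",
    ]


def dump_parent(core_path: str = "/tmp/app.core") -> int:
    """Hypothetical helper: run from the child process to dump its parent.

    Assumes gdb is installed in the container and that the child is
    allowed to ptrace-attach to its parent.
    """
    cmd = build_gdb_dump_command(os.getppid(), core_path)
    return subprocess.run(cmd, check=False).returncode
```

The child process would call `dump_parent()` after the app detects the crash. The leak shows up while gdb is attached and dumping.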
I have tested several of the latest available Ubuntu AMIs, including the most recent, "ubuntu-bionic-18.04-amd64-server-20210415", which has the "5.4.0-1045-aws" kernel, and the bug still exists.
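To compare kernels, one way to watch the growth from inside the container is to poll the memory cgroup counters. The sketch below assumes cgroup v1 (the default on Ubuntu 18.04 with Docker), where the container sees its own `memory.usage_in_bytes` and `memory.limit_in_bytes` under `/sys/fs/cgroup/memory`. The function names are made up for illustration.

```python
from pathlib import Path
from typing import Tuple

# Assumption: cgroup v1 layout as seen from inside the container.
CGROUP_V1_MEMORY_DIR = Path("/sys/fs/cgroup/memory")


def read_memory_usage(cgroup_dir: Path = CGROUP_V1_MEMORY_DIR) -> Tuple[int, int]:
    """Return (usage_bytes, limit_bytes) for the given memory cgroup."""
    usage = int((cgroup_dir / "memory.usage_in_bytes").read_text().strip())
    limit = int((cgroup_dir / "memory.limit_in_bytes").read_text().strip())
    return usage, limit


def usage_fraction(usage: int, limit: int) -> float:
    """Fraction of the cgroup memory limit currently in use."""
    return usage / limit
```

Calling `read_memory_usage()` every few seconds while the gdb dump runs should show usage climbing toward the limit on an affected kernel.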
I also tested a number of mainline kernels and found that the fix for this memory leak was introduced in the v5.9-rc4 kernel (https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.9-rc4/CHANGES).
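As a rough triage aid, a host's kernel release string can be checked against the 5.9 series. This is a heuristic sketch only, with hypothetical function names: it compares major.minor, so it cannot tell v5.9-rc1 through rc3 apart from rc4, and it says nothing about distro kernels that carry a backport.

```python
import re
from typing import Tuple

# Mainline series in which the fix was observed (v5.9-rc4).
FIX_MAINLINE_SERIES = (5, 9)


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string like '5.4.0-1045-aws'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def predates_mainline_fix(release: str) -> bool:
    """True if the release's major.minor is older than the 5.9 series."""
    return parse_kernel_release(release) < FIX_MAINLINE_SERIES
```

On a live host the input could come from `platform.release()`, which returns the same string as `uname -r`.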
Do you all have any idea if or when that set of changes will be backported into a supported kernel for Ubuntu 18.04 or 20.04?
Release we are running:
root@<redacted>:~# lsb_release -rd
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Docker / containerd.io versions:
- containerd.io: 1.4.4-1
- docker-ce: 5:20.10.5~3-0~ubuntu-bionic
Latest supported kernel I tried, which still shows the memory leak:
root@hostname:~# apt-cache policy linux-aws
linux-aws:
  Installed: 5.4.0.1045.27
  Candidate: 5.4.0.1045.27
  Version table:
 *** 5.4.0.1045.27 500
        500 http://us-east-1.ec2.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
        100 /var/lib/dpkg/status
     4.15.0.1007.7 500
        500 http://us-east-1.ec2.archive.ubuntu.com/ubuntu bionic/main amd64 Packages
Thanks,
Paul |
|