Writeback not flushing to disk in 4.15.0-137-generic and above
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-signed-hwe (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
Hi!
We've come across some interesting behaviour in kernel 4.15.0-
After booting a fresh Ubuntu 16.04 instance on AWS, we replace the AWS kernel with "linux-
fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=64k --size=256m --numjobs=16 --iodepth=16 --runtime=3600 --time_based --end_fsync=1
It doesn't matter whether fio is run against the boot disk or an attached secondary disk. After stopping fio we notice that some pages are stuck in "writeback" and are apparently not flushing to disk:
# lsb_release -rd
Description: Ubuntu 16.04.7 LTS
Release: 16.04
# cat /proc/vmstat | grep "nr_writeback "
nr_writeback 80
# cat /proc/meminfo | grep Writeback:
Writeback: 320 kB
This doesn't clear, not even days later. Running more fio only increases the number of writeback pages.
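To check whether the counter ever drains, it can be sampled over time instead of read once. The following is only a minimal sketch: the function names `writeback_kb` and `watch_writeback` are made up for illustration, and the optional file argument exists purely so the parser can be pointed at a saved copy of /proc/meminfo.

```shell
#!/bin/sh
# Print the Writeback value (in kB) from a meminfo-style file.
# Matching the first field exactly skips the separate WritebackTmp: line.
writeback_kb() {
    awk '$1 == "Writeback:" { print $2 }' "${1:-/proc/meminfo}"
}

# Sample the counter COUNT times, INTERVAL seconds apart, with a timestamp.
# Arguments: COUNT INTERVAL [FILE]; FILE defaults to /proc/meminfo.
watch_writeback() {
    count="$1"
    interval="$2"
    file="${3:-/proc/meminfo}"
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s Writeback: %s kB\n' "$(date +%T)" "$(writeback_kb "$file")"
        i=$((i + 1))
        # Do not sleep after the final sample.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `watch_writeback 10 60` takes ten samples one minute apart. On an idle system with no pending I/O, a value that never returns to 0 would match the behaviour described here.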
Downgrading the kernel to 4.15.0-
Kernels 4.15.0-137-generic and above took down our Ceph cluster. It seems that once the amount of "writeback" reaches the "dirty_bytes" ceiling, all subsequent writes to the disk become extremely slow. This is from an idle production system (not on AWS) running 16.04 with kernel 4.15.0-139-generic:
# lsb_release -rd
Description: Ubuntu 16.04.4 LTS
Release: 16.04
# cat /proc/sys/
629145600
# cat /proc/sys/
314572800
# cat /proc/meminfo | grep Writeback:
Writeback: 572108 kB
# dd if=/dev/zero of=/test bs=1M count=10; rm /test
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 126.529 s, 82.9 kB/s
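The relationship between the stuck writeback and the limit can be expressed as a percentage. This is a rough sketch rather than a diagnostic tool: `writeback_pct` is a hypothetical helper, and it assumes that the larger value shown above (629145600) is the `dirty_bytes` limit, which the truncated paths in this report do not confirm.

```shell
#!/bin/sh
# Percentage of a byte limit consumed by a counter given in kB.
# Uses integer shell arithmetic, so the result is truncated, not rounded.
# Arguments: WRITEBACK_KB LIMIT_BYTES
writeback_pct() {
    wb_kb="$1"
    limit_bytes="$2"
    # dirty_bytes reads as 0 when the ratio-based limit (dirty_ratio) is in use.
    if [ "$limit_bytes" -eq 0 ]; then
        echo "limit is 0 (ratio-based limit in effect?)" >&2
        return 1
    fi
    echo $(( wb_kb * 1024 * 100 / limit_bytes ))
}
```

With the numbers from the production system, `writeback_pct 572108 629145600` prints 93, meaning the stuck writeback alone fills about 93% of that limit on an otherwise idle machine. On a live system the arguments could come from /proc/meminfo and, under the same assumption, /proc/sys/vm/dirty_bytes.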
Could there be a bug in kernel 4.15.0-137-generic and above?
Thank you!
Kind regards,
Christoph Dwertmann
ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-
ProcVersionSign
Uname: Linux 4.15.0-140-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.30
Architecture: amd64
Date: Sun Apr 4 03:39:25 2021
Ec2AMI: ami-041e1cc8f4c
Ec2AMIManifest: (unknown)
Ec2Availability
Ec2InstanceType: c5ad.xlarge
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
ProcEnviron:
TERM=xterm-
PATH=(custom, no user)
XDG_RUNTIME_
LANG=en_US.UTF-8
SHELL=/bin/bash
SourcePackage: linux-signed-hwe
UpgradeStatus: No upgrade log present (probably fresh install)
I'd like to add that this bug also affects 18.04 LTS (Bionic) as it uses the same kernel.