Random hangs after upgrade to 4.4.0-45, possibly cgroup related

Bug #1635891 reported by Pehr Söderman
This bug affects 1 person
Affects         Status     Importance  Assigned to  Milestone
linux (Ubuntu)  Confirmed  High        Unassigned

Bug Description

Apologies in advance if anything is missing from this bug report.

After upgrading to 4.4.0-45 we have started to experience random kernel lockups. I have attached a sample from the syslog which I managed to catch after one of these events. In other cases, nothing was written to the syslog.

I notice that the cgroup subsystem appears in several places in the log, and the hanging server uses cgroups and namespaces heavily, so I suspect it is related. Unfortunately, I don't know enough about the kernel to decipher the error any further than that.

Another unusual thing we do on this server, which may be related, is that we regularly clear the caches ("echo 3 > /proc/sys/vm/drop_caches").
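For reference, the value written to drop_caches selects what is freed: 1 frees the page cache, 2 frees reclaimable slab objects such as dentries and inodes, and 3 frees both. Below is a minimal sketch of the same operation in Python. The helper name `drop_caches` and its `path` parameter are illustrative assumptions, not part of the setup described in this report, and writing to the real file requires root.

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"

# Meaning of each value, per the kernel's vm sysctl documentation.
MODES = {
    1: "page cache",
    2: "dentries and inodes",
    3: "page cache, dentries and inodes",
}


def drop_caches(mode=3, path=DROP_CACHES):
    """Write `mode` to the drop_caches file (hypothetical helper; needs root)."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}, got {mode!r}")
    # Dirty objects are not freeable, so flush them to disk first.
    os.sync()
    with open(path, "w") as f:
        f.write(f"{mode}\n")
    return MODES[mode]
```

Calling `drop_caches(3)` as root would be equivalent to the echo command above, with a preceding sync.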

/Pehr

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-45-generic 4.4.0-45.66
ProcVersionSignature: Ubuntu 4.4.0-45.66-generic 4.4.21
Uname: Linux 4.4.0-45-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Oct 22 22:52 seq
 crw-rw---- 1 root audio 116, 33 Oct 22 22:52 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu2.1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Sat Oct 22 23:37:59 2016
HibernationDevice: RESUME=/dev/mapper/template--16--vg-swap_1
InstallationDate: Installed on 2016-04-22 (183 days ago)
InstallationMedia: Ubuntu-Server 16.04 LTS "Xenial Xerus" - Release amd64 (20160420.3)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: VMware, Inc. VMware Virtual Platform
PciMultimedia:

ProcFB: 0 svgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-45-generic root=/dev/mapper/template--16--vg-root ro cgroup_enable=memory swapaccount=1
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-45-generic N/A
 linux-backports-modules-4.4.0-45-generic N/A
 linux-firmware 1.157.4
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 09/17/2015
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: 6.00
dmi.board.name: 440BX Desktop Reference Platform
dmi.board.vendor: Intel Corporation
dmi.board.version: None
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 1
dmi.chassis.vendor: No Enclosure
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr6.00:bd09/17/2015:svnVMware,Inc.:pnVMwareVirtualPlatform:pvrNone:rvnIntelCorporation:rn440BXDesktopReferencePlatform:rvrNone:cvnNoEnclosure:ct1:cvrN/A:
dmi.product.name: VMware Virtual Platform
dmi.product.version: None
dmi.sys.vendor: VMware, Inc.

Pehr Söderman (pehrs-7) wrote :
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.9 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag: 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.9-rc2

Changed in linux (Ubuntu):
importance: Undecided → High
status: Confirmed → Incomplete
Pehr Söderman (pehrs-7) wrote :

I have tested with v4.9-rc2 and have been unable to reproduce the bug there. However, I haven't found any reliable way to reproduce the bug on the current kernel, so I am not completely certain that the bug is absent in v4.9-rc2.

For now, I have tagged the bug `kernel-fixed-upstream`; if this is incorrect in this situation, please correct the tag.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: kernel-fixed-upstream
Pehr Söderman (pehrs-7) wrote :

We just hit this bug again, so it is clearly not fixed yet. Attaching another syslog. It seems to be triggered by a combination of "echo 3 > /proc/sys/vm/drop_caches" and destroying cgroups, but I have yet to find a reliable way of reproducing it.
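The suspected trigger (cgroup destruction interleaved with cache drops) can be sketched as a small stress loop. This is only an illustration of the idea and not a confirmed reproducer: the function name `stress`, its parameters, and the example mount point /sys/fs/cgroup/memory are assumptions, and the real workload on the affected server is not described in this report.

```python
import os
import time


def stress(cgroup_root, drop_caches_path, iterations=100, pause=0.0):
    """Repeatedly create and destroy a cgroup, dropping caches each round.

    Hypothetical helper: on a real system cgroup_root would be a cgroup
    mount such as /sys/fs/cgroup/memory (an assumption) and
    drop_caches_path would be /proc/sys/vm/drop_caches; both need root.
    """
    for i in range(iterations):
        cg = os.path.join(cgroup_root, f"repro-{os.getpid()}-{i}")
        os.mkdir(cg)  # on a cgroup filesystem, mkdir creates a cgroup
        os.rmdir(cg)  # rmdir destroys it (it holds no tasks here)
        with open(drop_caches_path, "w") as f:
            f.write("3\n")  # same effect as: echo 3 > drop_caches
        time.sleep(pause)
    return iterations
```

Because both paths are parameters, the loop can be exercised harmlessly against a scratch directory before pointing it at a test machine that is safe to hang.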
