igb transmit queue times out

Bug #1581777 reported by Basic
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Triaged
Importance: High
Assigned to: Unassigned

Bug Description

I am using a built-in Intel I210AT PCI-E NIC, passed through from an ESXi 6 host to an Ubuntu 16.04 server virtual machine. The network is unusable due to constant "transmit queue 0 timed out" errors.

The issue is NOT present on an Ubuntu 14.04.4 machine running the "3.19.0-59-generic #65~14.04.1-Ubuntu SMP Tue Apr 19 18:57:09 UTC 2016" kernel.
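
To make "constant" measurable when comparing kernels, the timeout messages can be counted in the kernel log. The following is a minimal sketch, not part of the original report: it matches only the message fragment quoted above, and the function name `count_tx_timeouts` is a hypothetical helper introduced for illustration.

```python
import re

# Matches the message fragment quoted in the report and captures the queue number.
TIMEOUT_RE = re.compile(r"transmit queue (\d+) timed out")


def count_tx_timeouts(log_lines):
    """Return a dict mapping queue number -> number of timeout messages seen."""
    counts = {}
    for line in log_lines:
        match = TIMEOUT_RE.search(line)
        if match:
            queue = int(match.group(1))
            counts[queue] = counts.get(queue, 0) + 1
    return counts
```

The function accepts any iterable of log lines, for example an open file holding saved `dmesg` output. Comparing the counts from the 3.19 and 4.4 kernels after the same uptime would give a rough measure of the regression.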

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-22-generic 4.4.0-22.39
ProcVersionSignature: Ubuntu 4.4.0-22.39-generic 4.4.8
Uname: Linux 4.4.0-22-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 May 14 12:09 seq
 crw-rw---- 1 root audio 116, 33 May 14 12:09 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu2
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: [Errno 2] No such file or directory: 'fuser'
Date: Sat May 14 12:22:44 2016
HibernationDevice: RESUME=UUID=c31589d1-6ec7-471b-9073-180b6f2886f3
InstallationDate: Installed on 2016-05-13 (0 days ago)
InstallationMedia: Ubuntu-Server 16.04 LTS "Xenial Xerus" - Release amd64 (20160420.3)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 003: ID 0e0f:0002 VMware, Inc. Virtual USB Hub
 Bus 002 Device 002: ID 0e0f:0003 VMware, Inc. Virtual Mouse
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: VMware, Inc. VMware7,1
PciMultimedia:

ProcFB: 0 svgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-22-generic root=UUID=1c9de470-43c0-4955-80ab-85497792601e ro noplymouth crashkernel=384M-:128M
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-22-generic N/A
 linux-backports-modules-4.4.0-22-generic N/A
 linux-firmware 1.157
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 06/25/2015
dmi.bios.vendor: VMware, Inc.
dmi.bios.version: VMW71.00V.0.B64.1506250318
dmi.board.name: 440BX Desktop Reference Platform
dmi.board.vendor: Intel Corporation
dmi.board.version: None
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 1
dmi.chassis.vendor: No Enclosure
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnVMware,Inc.:bvrVMW71.00V.0.B64.1506250318:bd06/25/2015:svnVMware,Inc.:pnVMware7,1:pvrNone:rvnIntelCorporation:rn440BXDesktopReferencePlatform:rvrNone:cvnNoEnclosure:ct1:cvrN/A:
dmi.product.name: VMware7,1
dmi.product.version: None
dmi.sys.vendor: VMware, Inc.

Basic (basicxp) wrote :
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Basic (basicxp) wrote :

Tested with the latest mainline kernel (linux-image-4.6.0-040600rc7-generic); the problem persists. Appropriate tag added.

tags: added: kernel-bug-exists-upstream
Joseph Salisbury (jsalisbury) wrote :

This issue appears to be an upstream bug, since you tested the latest upstream kernel. Would it be possible for you to open an upstream bug report[0]? That will allow the upstream Developers to examine the issue, and may provide a quicker resolution to the bug.

Please follow the instructions on the wiki page[0]. The first step is to email the appropriate mailing list. If no response is received, then a bug may be opened on bugzilla.kernel.org.

Once this bug is reported upstream, please add the tag: 'kernel-bug-reported-upstream'.

[0] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
status: Incomplete → Triaged
Basic (basicxp) wrote :
tags: added: kernel-bug-reported-upstream
penalvch (penalvch) wrote :

Foster "Forst" Snowhill, the next step is to fully commit bisect from kernel 3.19 to 4.4 in order to identify the last good kernel commit, followed immediately by the first bad one. This will allow for a more expedited analysis of the root cause of your issue. Could you please do this following https://wiki.ubuntu.com/Kernel/KernelBisection ?

Please note, finding adjacent kernel versions is not fully commit bisecting.

After the offending commit (not kernel version) has been identified, then please mark this report Status Confirmed.

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

tags: added: kernel-bug-exists-upstream-4.6-rc7 needs-bisect regression-release
Changed in linux (Ubuntu):
status: Triaged → Incomplete
Basic (basicxp) wrote :

52f518a3a7c2f80551a38d38be28bc9f335e713c is the first bad commit

Full bisect log attached.

A couple of notes:

1. Often the bug is not reproduced the first time the system is booted with a given kernel version. Subsequent boots reproduce the bug.

2. To get bootable kernels from commit 9dda1658a9bd450d65da5153a2427955785d17c2 onwards while bisecting, I had to manually apply commit 425be5679fd292a3c36cb1fe423086708a99f11a. Otherwise the system gets stuck on "Loading initial ramdisk".
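
Note 1 matters for bisecting: if a kernel is judged after a single boot, a bad commit can be marked good and the search ends on the wrong commit. The sketch below illustrates that logic only. It is not the procedure actually used (the bisect was done with git), it assumes a linear list of commits ordered oldest first, and `boot_and_check` is a hypothetical callback that boots the given kernel once and returns True if the bug appeared.

```python
def is_bad(commit, boot_and_check, attempts=3):
    """Treat a commit as bad if the bug shows up in any of `attempts` boots."""
    return any(boot_and_check(commit) for _ in range(attempts))


def first_bad_commit(commits, boot_and_check, attempts=3):
    """Binary search over `commits` (oldest first) for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid], boot_and_check, attempts):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

With `attempts=1` and a bug that never reproduces on the first boot, every tested commit looks good and the search converges on the last commit in the range. Several boots per step avoid this at the cost of a slower bisect.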

Basic (basicxp) wrote :
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: kernel-bug-exists-upstream-4.7-rc7
removed: kernel-bug-exists-upstream-4.6-rc7 needs-bisect
Basic (basicxp) wrote :

Also, the bug is present in 4.7-rc7.

penalvch (penalvch)
tags: added: bisect-done
penalvch (penalvch) wrote :

Foster "Forst" Snowhill, the issue you are reporting is an upstream one. Could you please report this problem following the instructions verbatim at https://wiki.ubuntu.com/Bugs/Upstream/kernel to the appropriate mailing list (TO Jiang Liu, and Thomas Gleixner CC linux-kernel)?

Please provide a direct URL to your post to the mailing list when it becomes available so that it may be tracked.

Thank you for your understanding.

Changed in linux (Ubuntu):
importance: Medium → High
status: Confirmed → Triaged
Basic (basicxp) wrote :

Reported to Thomas Gleixner and linux-kernel: https://lkml.org/lkml/2016/7/23/93

Jiang Liu is no longer reachable under the address specified in MAINTAINERS: "Recipient address rejected: User unknown in virtual mailbox table".

Basic (basicxp) wrote :

This seems to be a problem in ESXi. A fix was promised for 6.0 Update 3, which has not been released yet.

However, there is now a workaround patch that will make it into the 4.4 and 4.7 stable trees:

http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

queue-4.4/genirq-msi-make-sure-pci-msis-are-activated-early.patch
queue-4.7/genirq-msi-make-sure-pci-msis-are-activated-early.patch

Basic (basicxp) wrote :

The patch is now upstream in 4.4.20 and 4.7.3, both released on 7th of September.

From what I know, it's not in the Ubuntu kernels yet.
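
As a rough way to check a given Ubuntu kernel against those releases, the upstream base version can be read from the last field of ProcVersionSignature (for example "Ubuntu 4.4.0-22.39-generic 4.4.8" in the description above). This is a heuristic sketch under that assumption, with hypothetical helper names. Ubuntu can also apply the patch on top of an older base version, so a negative result does not prove the fix is missing.

```python
# First upstream stable releases carrying the workaround, per the comment above.
FIXED_IN = {(4, 4): (4, 4, 20), (4, 7): (4, 7, 3)}


def upstream_version(proc_version_signature):
    """Parse the last field of ProcVersionSignature into a tuple of ints.

    Assumption: the last field is the upstream base version, e.g.
    "Ubuntu 4.4.0-22.39-generic 4.4.8" -> (4, 4, 8).
    """
    last_field = proc_version_signature.split()[-1]
    return tuple(int(part) for part in last_field.split("."))


def has_upstream_fix(proc_version_signature):
    """Return True or False for the 4.4 and 4.7 series, None for other series."""
    version = upstream_version(proc_version_signature)
    fixed = FIXED_IN.get(version[:2])
    if fixed is None:
        return None  # series not covered by the stable releases named above
    return version >= fixed
```

For the kernel in this report, `has_upstream_fix("Ubuntu 4.4.0-22.39-generic 4.4.8")` returns False, because its base version 4.4.8 is older than 4.4.20.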
