qede: Kubernetes Internal DNS Failure due to QL41xxx NIC not supporting IPIP tx csum offload
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Committed | Medium | Matthew Ruffell |
Focal | Fix Released | Medium | Matthew Ruffell |
Groovy | Fix Released | Medium | Matthew Ruffell |
Hirsute | Won't Fix | Medium | Matthew Ruffell |
Bug Description
BugLink: https:/
[Impact]
For users with QLogic QL41xxx series NICs, such as the FastLinQ QL41000 Series 10/25/40/50GbE Controller, upgrading from the 4.15 kernel to the 5.4 kernel causes Kubernetes internal DNS requests to fail, because the request packets are corrupted on transmit.
Kubernetes uses IPIP tunnelled packets for internal DNS resolution. This packet type is not supported for hardware tx checksum offload, and the packets end up corrupted when the qede driver attempts to offload their checksums.
This only affects internal Kubernetes DNS. Lookups of regular external domains succeed, because they do not use IPIP packets.
[Fix]
Marvell has developed a fix for the qede driver. It checks the packet type and, if it is IPPROTO_IPIP, disables csum offloads for that socket buffer.
commit 5d5647dad259bb4
Author: Manish Chopra <email address hidden>
Date: Mon Dec 21 06:55:30 2020 -0800
Subject: qede: fix offload for IPIP tunnel packets
Link: https:/
This commit landed in mainline in 5.11-rc3. The commit was accepted into upstream stable 4.14.215, 4.19.167, 5.4.89 and 5.10.7.
Note that this SRU is not targeted at Bionic: tx csum offload support only landed in 5.0 and onward, so the 4.15 kernel works even without this patch. Because of this, Bionic can pick the patch up naturally from upstream stable.
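As a rough illustration of the stable versions listed above, the following sketch checks whether a plain upstream stable version string already contains the fix. The function name `stable_has_fix` is made up for this example. It only understands upstream versions such as 5.4.89; Ubuntu kernel version strings (for example from `uname -r`) use a different numbering and cannot be compared this way.

```shell
# Hypothetical helper: succeed if an upstream stable version contains the fix.
# Thresholds are the stable releases named in this report.
# Returns 0 = fixed, 1 = not fixed, 2 = series not covered by this sketch.
# Assumes GNU sort, which provides -V (version sort).
stable_has_fix() {
    shf_ver=$1
    case $shf_ver in
        4.14.*) shf_min=4.14.215 ;;
        4.19.*) shf_min=4.19.167 ;;
        5.4.*)  shf_min=5.4.89 ;;
        5.10.*) shf_min=5.10.7 ;;
        *) return 2 ;;
    esac
    # The version is new enough if the threshold sorts first (or is equal).
    [ "$(printf '%s\n%s\n' "$shf_min" "$shf_ver" | sort -V | head -n 1)" = "$shf_min" ]
}
```

For example, `stable_has_fix 5.4.89` succeeds, while `stable_has_fix 5.4.88` fails.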
[Testcase]
The system must have a QLogic QL41xxx series NIC fitted, and needs to be a part of a Kubernetes cluster.
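Before running the test, it may be worth confirming that the interface is actually bound to the qede driver. A minimal sketch, assuming `ethtool` is installed; the helper name `uses_qede` and the interface name in the usage note are placeholders, not taken from this report:

```shell
# Hypothetical helper: report whether an interface is driven by qede.
# Relies on the "driver:" line that `ethtool -i` prints for a device.
uses_qede() {
    if ethtool -i "$1" 2>/dev/null | grep -q '^driver: qede$'; then
        echo yes
    else
        echo no
    fi
}
```

For example, `uses_qede ens1f0` prints `yes` when ens1f0 is handled by qede, and `no` for any other driver or an unknown interface.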
Firstly, get a list of all devices in the system:
$ sudo ifconfig
Next, set all devices down with:
$ sudo ifconfig <device> down
Next, bring up the QLogic QL41xxx device:
$ sudo ifconfig <qlogic nic device> up
Then, attempt to look up an internal Kubernetes domain:
$ nslookup <internal kubernetes domain address>
Without the patch, the connection will time out:
;; connection timed out; no servers could be reached
If we look at packet traces with tcpdump, we see the DNS query leave the source, but it never arrives at the destination.
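This report does not record the exact tcpdump invocation used. One way to narrow a capture to the tunnelled traffic is to filter on IP protocol number 4 (IPIP), as sketched below. `capture_ipip` is an illustrative name, and the interface argument is whatever the QLogic device is called on the node:

```shell
# Hypothetical helper: capture only IPIP-encapsulated packets (IP protocol 4)
# on the given interface. -n disables name resolution, so the capture itself
# does not depend on DNS.
capture_ipip() {
    sudo tcpdump -n -i "$1" ip proto 4
}
```

Running it on both the sending and the receiving node while repeating the nslookup should show whether packets that leave the source reach the destination.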
There is a test kernel available in the following ppa:
https:/
With the test kernel installed, Kubernetes internal DNS lookups will succeed.
[Where problems could occur]
If a regression were to occur, then users of the qede driver would be affected. This is limited to those with QLogic QL41xxx series NICs. The patch explicitly checks for IPIP type packets, so only those particular packets would be affected.
IPIP packets are uncommon, so a regression would not cause a total outage; most traffic is not IPIP tunnelled. It could, however, cause problems for users who frequently handle VPN or Kubernetes internal DNS traffic.
A workaround would be to use ethtool to disable tx csum offload for all packet types, or to revert to an older kernel.
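The ethtool workaround could look like the sketch below. The helper names are made up for illustration. Note that this turns off tx checksum offload for all traffic on the device, and that ethtool settings are not persistent on their own, so the command would need to be reapplied after a reboot.

```shell
# Hypothetical helper: turn off tx checksum offload on an interface.
disable_tx_csum() {
    sudo ethtool -K "$1" tx off
}

# Hypothetical helper: print the current tx checksumming state ("on" or "off")
# by reading the "tx-checksumming:" line of `ethtool -k`.
tx_csum_state() {
    ethtool -k "$1" 2>/dev/null | awk '/^tx-checksumming:/ {print $2}'
}
```

After `disable_tx_csum <device>`, `tx_csum_state <device>` should print `off`.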
affects: | ubuntu → linux (Ubuntu) |
description: | updated |
Changed in linux (Ubuntu Focal): | |
status: | New → In Progress |
Changed in linux (Ubuntu Groovy): | |
status: | New → In Progress |
Changed in linux (Ubuntu Focal): | |
importance: | Undecided → Medium |
Changed in linux (Ubuntu Groovy): | |
importance: | Undecided → Medium |
Changed in linux (Ubuntu Focal): | |
assignee: | nobody → Matthew Ruffell (mruffell) |
Changed in linux (Ubuntu Groovy): | |
assignee: | nobody → Matthew Ruffell (mruffell) |
summary: |
- Ubuntu kernel 5.x QL41xxx NIC (qede driver) Kubernetes internal DNS failure
+ qede: Kubernetes Internal DNS Failure due to QL41xxx NIC not supporting IPIP tx csum offload
description: | updated |
description: | updated |
Changed in linux (Ubuntu Hirsute): | |
status: | Confirmed → In Progress |
importance: | Undecided → Medium |
assignee: | nobody → Matthew Ruffell (mruffell) |
description: | updated |
Changed in linux (Ubuntu Groovy): | |
status: | In Progress → Fix Committed |
Changed in linux (Ubuntu Focal): | |
status: | In Progress → Fix Committed |
Changed in linux (Ubuntu Hirsute): | |
status: | In Progress → Fix Committed |
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1909062
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.