[SRU][22.04.2 & 23.10] OS cannot boot successfully when enabling VMD in UEFI setup

Bug #2020022 reported by Adrian Huang
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Fix Released   Undecided    Jeff Lane
  Kinetic        Won't Fix      Undecided    Unassigned
  Lunar          In Progress    Undecided    Jeff Lane
  Mantic         Fix Released   Undecided    Jeff Lane

Bug Description

[Impact]
When VMD is enabled in UEFI setup, the OS cannot boot successfully: the kernel panics and the system reboots. The following log is shown:

[ 166.605518] DMAR: VT-d detected Invalidation Queue Error: Reason f
[ 166.605522] DMAR: VT-d detected Invalidation Time-out Error: SID ffff
[ 166.612445] DMAR: VT-d detected Invalidation Completion Error: SID ffff
[ 166.612447] DMAR: QI HEAD: UNKNOWN qw0 = 0x0, qw1 = 0x0
[ 166.612449] DMAR: QI PRIOR: UNKNOWN qw0 = 0x0, qw1 = 0x0
...
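
To check a captured boot log (for example, from a serial console) for the same signature, the error lines above can be matched with a small script. This is only a sketch: the function name `find_dmar_errors` and the regular expression are illustrative and are based solely on the log lines quoted in this report, so other kernels may word these messages differently.

```python
import re

# Matches lines such as:
#   [ 166.605518] DMAR: VT-d detected Invalidation Queue Error: Reason f
# The pattern is derived only from the log excerpt in this report.
DMAR_ERROR = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+DMAR: VT-d detected "
    r"(?P<kind>Invalidation .+? Error): (?P<detail>.*)$"
)


def find_dmar_errors(log_text):
    """Return (timestamp, kind, detail) for each VT-d invalidation error line."""
    hits = []
    for line in log_text.splitlines():
        m = DMAR_ERROR.match(line.strip())
        if m:
            hits.append((float(m.group("ts")), m.group("kind"), m.group("detail")))
    return hits


SAMPLE = """\
[ 166.605518] DMAR: VT-d detected Invalidation Queue Error: Reason f
[ 166.605522] DMAR: VT-d detected Invalidation Time-out Error: SID ffff
[ 166.612445] DMAR: VT-d detected Invalidation Completion Error: SID ffff
[ 166.612447] DMAR: QI HEAD: UNKNOWN qw0 = 0x0, qw1 = 0x0
"""

if __name__ == "__main__":
    for ts, kind, detail in find_dmar_errors(SAMPLE):
        print(f"{ts:.6f}  {kind}  ({detail})")
```

The `QI HEAD` and `QI PRIOR` lines are deliberately not matched, since they are follow-up diagnostics rather than separate errors.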

Additional info:
  * The issue occurs on both the Lenovo SE350 and Lenovo SR850 v2 servers.

Debugging info and fix commit info:
  * `git bisect` indicates the offending commit is 6aab5622296b ("PCI: vmd: Clean up domain before enumeration"). The root cause is that the VMD driver tries to clear a PCI configuration space range when resetting a VMD domain (https://github.com/torvalds/linux/blob/master/drivers/pci/controller/vmd.c#L544), which leads to the failure.

[Fix]
  * Another `git bisect` indicates the fix commit is 20f3337d350c ("x86: don't use REP_GOOD or ERMS for small memory clearing"). I confirmed that this commit fixes the issue.

Would it be possible to include commit 20f3337d350c in the Ubuntu 22.04.2/23.10 kernels?
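
One way to check whether a given upstream kernel tree already contains the fix commit is `git merge-base --is-ancestor`. The following is a minimal sketch, assuming a local clone of a tree that shares history with Linus's tree; the helper names `is_ancestor_cmd` and `tree_contains` are hypothetical. Distro kernels usually carry backports under different commit IDs, so a negative result on such a tree does not prove the patch is missing.

```python
import subprocess

FIX_COMMIT = "20f3337d350c"  # fix commit ID as given in this report


def is_ancestor_cmd(commit, ref="HEAD"):
    """Build the git command that tests whether `commit` is an ancestor of `ref`."""
    return ["git", "merge-base", "--is-ancestor", commit, ref]


def tree_contains(repo_path, commit=FIX_COMMIT, ref="HEAD"):
    """Return True if `ref` in the repository at `repo_path` contains `commit`.

    git exits with 0 when the commit is an ancestor, 1 when it is not,
    and another status on errors (for example, an unknown commit ID).
    """
    result = subprocess.run(
        is_ancestor_cmd(commit, ref),
        cwd=repo_path,
        capture_output=True,
        text=True,
    )
    if result.returncode in (0, 1):
        return result.returncode == 0
    raise RuntimeError(result.stderr.strip() or "git merge-base failed")
```

For example, `tree_contains("/path/to/linux", ref="v6.4")` would be the call for a mainline clone, where the path is a placeholder.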

[Test Plan]

Steps to reproduce
1. Disable Intel VMD in BIOS settings
   System Settings --> Devices and I/O Ports --> Intel VMD technology --> Enable/Disable Intel VMD : Disabled

2. Install the OS

3. Enable Intel VMD in BIOS settings
   System Settings --> Devices and I/O Ports --> Intel VMD technology --> Enable/Disable Intel VMD : Enabled

4. Reboot. The issue reproduces on this boot.
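
On a kernel that does boot with VMD enabled (for example, one that already carries the fix), it can be useful to confirm that the BIOS setting actually took effect before comparing results. The sketch below lists PCI devices bound to the `vmd` driver through sysfs. The function name `vmd_bound_devices` and the `sysfs_root` parameter are illustrative, and the check assumes the driver registers under the name `vmd`.

```python
import os
import re

# PCI addresses in sysfs look like "0000:64:00.5" (domain:bus:device.function).
PCI_ADDR = re.compile(r"^[0-9a-fA-F]{4,}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def vmd_bound_devices(sysfs_root="/sys"):
    """Return the sorted PCI addresses currently bound to the vmd driver.

    An empty list means either the driver is not loaded or no VMD
    controller is exposed (for example, VMD is disabled in firmware).
    """
    driver_dir = os.path.join(sysfs_root, "bus", "pci", "drivers", "vmd")
    if not os.path.isdir(driver_dir):
        return []
    # Skip control files such as "bind", "unbind", and "new_id".
    return sorted(e for e in os.listdir(driver_dir) if PCI_ADDR.match(e))


if __name__ == "__main__":
    devices = vmd_bound_devices()
    print("VMD controllers bound:", ", ".join(devices) if devices else "none")
```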

[ Where problems could occur ]
* Lenovo SE350 server and Lenovo SR850 v2 server
* The regression leads to boot failure (the system cannot boot into the OS).

[ Other Info ]
https://code.launchpad.net/~bladernr/ubuntu/+source/linux/+git/lunar/+ref/LP2020022

Adrian Huang (ahuang12) wrote :
information type: Public → Private Security
information type: Private Security → Private
summary: - OS cannot boot successfully when enabling VMD in UEFI setup
+ [22.04.2] OS cannot boot successfully when enabling VMD in UEFI setup
summary: - [22.04.2] OS cannot boot successfully when enabling VMD in UEFI setup
+ [22.04.2 & 23.10] OS cannot boot successfully when enabling VMD in UEFI
+ setup
Adrian Huang (ahuang12) wrote : Re: [22.04.2 & 23.10] OS cannot boot successfully when enabling VMD in UEFI setup
description: updated
Adrian Huang (ahuang12)
affects: ubuntu → linux-hwe-5.19 (Ubuntu)
Jeff Lane  (bladernr) wrote :

Seen on kernels 5.19 and 6.2 so far.

Can you also try 5.15 (22.04 GA) and 5.4 (20.04 GA)? Both of those are certified on 22.04 and 20.04.

Changed in linux-hwe-5.19 (Ubuntu):
status: New → Incomplete
affects: linux-hwe-5.19 (Ubuntu) → linux (Ubuntu)
Adrian Huang (ahuang12) wrote :
Adrian Huang (ahuang12) wrote :
Jeff Lane  (bladernr) wrote :

Adrian, two things:

First, can you provide steps to recreate this on the SE350 (including whatever you're setting in BIOS) so I can try to reproduce the failure on a local sample?

Second:
20f3337d350c ("x86: don't use REP_GOOD or ERMS for small memory clearing")

Which kernel tree is this commit in? I could not find it in mainline (unless it has a different mainline commit ID from being merged).

Adrian Huang (ahuang12) wrote :

Jeff,

[Steps to Reproduce]
1. Disable Intel VMD in BIOS settings
   System Settings --> Devices and I/O Ports --> Intel VMD technology --> Enable/Disable Intel VMD : Disabled

2. Install the OS

3. Enable Intel VMD in BIOS settings
   System Settings --> Devices and I/O Ports --> Intel VMD technology --> Enable/Disable Intel VMD : Enabled

4. Reboot. The issue reproduces on this boot.

[Commit 20f3337d350c]
This commit is from Linus's tree (merged in 6.4-rc1): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/lib/memset_64.S?id=20f3337d350c4e1b4ac66d731fd4e98565bf6cc0
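
Since the commit was merged in 6.4-rc1, a rough first check is whether a running kernel's upstream base is 6.4 or later. The sketch below parses a release string such as the one returned by `uname -r`; the names `kernel_version` and `has_fix_upstream` are hypothetical. A `False` result only means the upstream base predates the fix, because a distro kernel may still carry it as a backport.

```python
import re

# Per the comment above, 20f3337d350c was merged in 6.4-rc1.
FIX_MAINLINE = (6, 4)


def kernel_version(release):
    """Extract (major, minor) from a release string like '6.2.0-20-generic'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(m.group(1)), int(m.group(2)))


def has_fix_upstream(release):
    """True if the upstream base version is new enough to include the fix."""
    return kernel_version(release) >= FIX_MAINLINE


if __name__ == "__main__":
    import platform

    rel = platform.release()
    print(rel, "->", "fix in upstream base" if has_fix_upstream(rel) else "needs backport check")
```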

Jeff Lane  (bladernr)
Changed in linux (Ubuntu Kinetic):
status: New → Won't Fix
Jeff Lane  (bladernr)
Changed in linux (Ubuntu Lunar):
assignee: nobody → Jeff Lane  (bladernr)
Changed in linux (Ubuntu Mantic):
assignee: nobody → Jeff Lane  (bladernr)
Changed in linux (Ubuntu Lunar):
status: New → In Progress
Changed in linux (Ubuntu Mantic):
status: Incomplete → In Progress
Jeff Lane  (bladernr) wrote :

No need for Mantic, which already contains the patch. Only need to pull this back to Lunar.

Changed in linux (Ubuntu Mantic):
status: In Progress → Invalid
Jeff Lane  (bladernr)
Changed in linux (Ubuntu Mantic):
status: Invalid → Fix Released
Michael Reed (mreed8855) wrote :

I have created a test kernel. Please test it and provide feedback.

https://people.canonical.com/~mreed/lenovo/lp_2020022_vmd/lunar/

description: updated
Michael Reed (mreed8855) wrote :

Adrian,

Can you add to the "Where problems could occur" section and describe the regression risk?

summary: - [22.04.2 & 23.10] OS cannot boot successfully when enabling VMD in UEFI
- setup
+ [SRU][22.04.2 & 23.10] OS cannot boot successfully when enabling VMD in
+ UEFI setup
Adrian Huang (ahuang12)
information type: Private → Private Security
Adrian Huang (ahuang12)
information type: Private Security → Public Security
Adrian Huang (ahuang12)
description: updated
Adrian Huang (ahuang12) wrote :

The test kernel still fails. I am not sure whether the patch was included correctly. Could you put the source deb package at your URL so that I can check?

Jeff Lane  (bladernr) wrote :

Can you install one of the daily ISOs for mantic (23.10) and test to see if this is an issue? The patches you mention are included in the Mantic kernel already, so that one should not see the failure.

We're still trying to figure out how to get you a working 6.2 kernel.

Adrian Huang (ahuang12) wrote :

Confirmed that the 23.10 kernel (v6.5) does not have the issue.

This report contains Public Security information  
Everyone can see this security related information.