VM boots slowly with large-BAR GPU passthrough due to redundant operations in pci/probe.c
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Invalid | Undecided | Unassigned |
Noble | Fix Committed | Medium | Mitchell Augustin |
Oracular | Fix Committed | Medium | Mitchell Augustin |
Bug Description
SRU Justification:
[ Impact ]
Without this patch, VM guests that have large-BAR GPUs passed through to them take about twice as long to initialize all device BARs as they do with the patch applied.
[ Test Plan ]
I verified that this patch applies cleanly to the Noble kernel
and resolves the bug on DGX H100 and DGX A100. I observed no regressions.
This can be verified on any machine with a sufficiently large BAR and the
capability to pass through to a VM using vfio.
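The report does not define how large a BAR has to be. One rough way to find candidate devices on a host is to read each device's sysfs `resource` file. That file has one `start end flags` line (in hex) per resource: BARs first, then the expansion ROM. A size is taken as `end - start + 1`. The sketch below assumes a 4 GiB threshold, and the helper names `bar_sizes` and `find_large_bars` are hypothetical, chosen only for illustration.

```python
import glob
import os


def bar_sizes(resource_text):
    """Parse a sysfs PCI 'resource' file into per-resource sizes in bytes.

    Each line is '<start> <end> <flags>' in hex; unused slots are all zeros.
    """
    sizes = []
    for line in resource_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end, _flags = (int(f, 16) for f in fields)
        sizes.append(end - start + 1 if (start or end) else 0)
    return sizes


def find_large_bars(min_bytes=1 << 32, root="/sys/bus/pci/devices"):
    """Yield (device, resource index, size) for resources >= min_bytes.

    The 4 GiB default threshold is an arbitrary assumption.
    """
    for path in sorted(glob.glob(os.path.join(root, "*", "resource"))):
        with open(path) as f:
            sizes = bar_sizes(f.read())
        device = os.path.basename(os.path.dirname(path))
        for index, size in enumerate(sizes):
            if size >= min_bytes:
                yield device, index, size
```

A 64-bit BAR occupies two BAR slots. Its full size normally appears on the first slot's line, and the following line is all zeros.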
To check for regressions, I applied this patch to the guest kernel, then
rebooted and confirmed that:
1. The measured PCI initialization time on boot was ~50% of that measured with the unmodified kernel
2. The relevant parts of the /proc/iomem mappings, the PCI init section of the dmesg output, and the lspci -vv output were unchanged between the unmodified and patched kernels
3. The NVIDIA driver still loaded successfully and was reported by nvidia-smi after the patch was applied
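For item 2, one simple approach is to save each output under both kernels and diff the matching captures. The sketch below is an assumption about how to do that, not the procedure used in this report. It strips the leading dmesg timestamps, which differ on every boot, before comparing. The helper names are hypothetical.

```python
import difflib
import re

# dmesg prefixes each line with "[seconds.micros]"; these differ between boots.
_DMESG_TS = re.compile(r"^\[\s*\d+\.\d+\]\s?", re.MULTILINE)


def strip_dmesg_timestamps(text):
    """Remove the leading timestamp from every dmesg line."""
    return _DMESG_TS.sub("", text)


def compare_captures(unpatched, patched, label):
    """Return unified-diff lines between two captures; [] means identical."""
    return list(difflib.unified_diff(
        unpatched.splitlines(),
        patched.splitlines(),
        fromfile=f"{label} (unmodified kernel)",
        tofile=f"{label} (patched kernel)",
        lineterm="",
    ))
```

Take every capture as root. Unprivileged reads of /proc/iomem show zeroed addresses, and unprivileged lspci -vv cannot read capability registers. Either limitation could hide a real difference.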
[ Fix ]
Roughly half of the time-consuming device configuration operations invoked by the
PCI probe code can be eliminated by rearranging the memory and I/O decode disable/enable
calls so that they occur once per device rather than once per BAR. This is what the
upstream patch does, and it reliably eliminates roughly half of the excess initialization
time during VM boot.
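The following Python model illustrates the per-BAR versus per-device difference. It is a hypothetical sketch, not kernel code. `MockDevice`, `size_bar`, `read_bases_per_bar`, and `read_bases_batched` are invented names. Each BAR is simplified to a single 64-bit memory register, whereas a real 64-bit BAR spans two slots. The only cost the model tracks is the number of command-register writes that toggle memory/I/O decoding. Both variants size each BAR with the standard write-all-ones/read-back procedure, so both report the same sizes; only the toggle count differs.

```python
# Hypothetical model of BAR sizing during PCI probe. Nothing here is a real
# kernel or VFIO API; it only counts how often memory/I/O decoding is toggled.
PCI_COMMAND_IO = 0x1
PCI_COMMAND_MEMORY = 0x2
DECODE_BITS = PCI_COMMAND_IO | PCI_COMMAND_MEMORY
MASK64 = (1 << 64) - 1
MEM64_PREFETCH_FLAGS = 0xC  # low BAR bits: 64-bit, prefetchable memory


class MockDevice:
    """Simplified device in which each BAR is one 64-bit memory register."""

    def __init__(self, bar_sizes):
        for size in bar_sizes:
            # BAR sizes are powers of two; memory BARs are at least 16 bytes.
            assert size >= 16 and size & (size - 1) == 0
        self.bar_sizes = list(bar_sizes)
        self.bars = [0] * len(bar_sizes)
        self.command = DECODE_BITS
        self.decode_toggles = 0  # command writes that change IO/MEM decode

    def write_command(self, value):
        if (value ^ self.command) & DECODE_BITS:
            self.decode_toggles += 1
        self.command = value

    def write_bar(self, i, value):
        # Address bits below the BAR size are hardwired to zero.
        self.bars[i] = value & ~(self.bar_sizes[i] - 1) & MASK64

    def read_bar(self, i):
        return self.bars[i] | MEM64_PREFETCH_FLAGS


def size_bar(dev, i):
    """Write all ones, read back, restore, and decode the size mask."""
    saved = dev.read_bar(i)
    dev.write_bar(i, MASK64)
    mask = dev.read_bar(i) & ~0xF & MASK64
    dev.write_bar(i, saved)
    return (~mask & MASK64) + 1


def read_bases_per_bar(dev):
    """Pre-patch behavior as described: decode toggled around every BAR."""
    sizes = []
    for i in range(len(dev.bars)):
        saved = dev.command
        dev.write_command(saved & ~DECODE_BITS)
        sizes.append(size_bar(dev, i))
        dev.write_command(saved)
    return sizes


def read_bases_batched(dev):
    """Post-patch behavior as described: decode toggled once per device."""
    saved = dev.command
    dev.write_command(saved & ~DECODE_BITS)
    sizes = [size_bar(dev, i) for i in range(len(dev.bars))]
    dev.write_command(saved)
    return sizes


if __name__ == "__main__":
    sizes = [1 << 24, 1 << 37, 1 << 25]  # hypothetical: 16 MiB, 128 GiB, 32 MiB
    before, after = MockDevice(sizes), MockDevice(sizes)
    print(read_bases_per_bar(before), before.decode_toggles)
    print(read_bases_batched(after), after.decode_toggles)
```

The model does not predict timing. It assumes, without measuring it, that each decode toggle is what is expensive for a large BAR under VFIO. The measured effect reported in the test plan is roughly a 50% reduction in PCI initialization time, not a reduction proportional to the toggle count.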
[ Where problems could occur ]
I do not expect any regressions. The only callers of the interfaces changed by this patch are adjusted within the same patch, and the functional change only removes entirely redundant calls that disable and re-enable PCI memory/I/O decoding.
[ Additional Context ]
Upstream patch: https://<email address hidden>/
Upstream bug report: https:/
Changed in linux (Ubuntu):
  assignee: nobody → Mitchell Augustin (mitchellaugustin)
  status: New → In Progress
Changed in linux (Ubuntu Noble):
  status: New → In Progress
Changed in linux (Ubuntu):
  status: In Progress → Invalid
Changed in linux (Ubuntu Noble):
  assignee: nobody → Mitchell Augustin (mitchellaugustin)
Changed in linux (Ubuntu):
  assignee: Mitchell Augustin (mitchellaugustin) → nobody
Changed in linux (Ubuntu Oracular):
  status: New → In Progress
  assignee: nobody → Mitchell Augustin (mitchellaugustin)
Changed in linux (Ubuntu Noble):
  status: In Progress → Fix Committed
Changed in linux (Ubuntu Oracular):
  status: In Progress → Fix Committed
Changed in linux (Ubuntu Noble):
  importance: Undecided → Medium
Changed in linux (Ubuntu Oracular):
  importance: Undecided → Medium
Upstream patch submitted to the kernel-team list with the subject "[SRU][N/O][PATCH 0/1] PCI: Batch BAR sizing operations".