PCI passthrough on AMD IOMMU fails with "VFIO_MAP_DMA failed: Invalid argument" for VMs with ~1 TiB RAM or more
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
qemu (Ubuntu) | Fix Released | Undecided | Unassigned |
Bionic | Fix Released | Wishlist | Mauricio Faria de Oliveira |
Focal | Fix Released | Wishlist | Mauricio Faria de Oliveira |
Jammy | Fix Released | Wishlist | Mauricio Faria de Oliveira |
Kinetic | Fix Released | Wishlist | Mauricio Faria de Oliveira |
Lunar | Fix Released | Undecided | Unassigned |
Bug Description
[ Impact ]
* PCI passthrough on systems with AMD IOMMU
for VMs with approximately 1 terabyte of RAM
or more fails with an error message that is
not trivial to relate to the root cause
(an AMD IOMMU limitation):
VFIO_MAP_DMA failed: Invalid argument.
failed to setup container for group ...:
memory listener initialization failed: Region pc.ram:
vfio_
* The usage of AMD-based systems with IOMMU
is increasingly common, and apparently so
are VMs with large amounts of RAM.
* The maximum memory (limitation) for QEMU VMs
with PCI passthrough on AMD IOMMU systems is:
- machine 'pc': 1035264 MiB (1011 GiB)
- machine 'q35': 1034240 MiB (1010 GiB)
* This issue is resolved in QEMU 7.1 (lunar
currently has QEMU 7.2 in -proposed) by a large
patch series that is not worth backporting for
a not-too-common case, given the regression risk.
* In order to improve the user experience on
this particular error, check for this case,
and provide the error message with a hint:
VFIO_MAP_DMA failed: Invalid argument. (hint: AMD IOMMU: reduce VM ram)
...
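The two limits above can be reproduced with a little arithmetic. This is a sketch under stated assumptions, not taken from the patch: it assumes that large guests keep 3 GiB of RAM below 4 GiB on 'pc' and 2 GiB on 'q35', that the remaining RAM is mapped starting at 4 GiB, and that it must end at or below the start of the reserved range (0xfd00000000) shown in the test steps. The function name max_ram_mib is hypothetical.

```shell
#!/bin/bash
# Start of the reserved range reported in reserved_regions (0xfd00000000 = 1012 GiB).
RESERVED_START=$((0xfd00000000))
MIB=$((1024 * 1024))
GIB=$((1024 * MIB))

# max_ram_mib LOWMEM_GIB
# LOWMEM_GIB: RAM kept below 4 GiB (assumption: 3 for 'pc', 2 for 'q35').
# The rest of RAM is assumed to start at 4 GiB and to end at or below RESERVED_START.
max_ram_mib() {
    local lowmem_gib=$1
    echo $(( (RESERVED_START - 4 * GIB + lowmem_gib * GIB) / MIB ))
}

echo "pc:  $(max_ram_mib 3) MiB"   # prints "pc:  1035264 MiB"
echo "q35: $(max_ram_mib 2) MiB"   # prints "q35: 1034240 MiB"
```

Under these assumptions the results match the 1011 GiB and 1010 GiB values listed for 'pc' and 'q35'.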
[ Test Plan ]
* Check for the error message/hint on QEMU VMs
in the command line and libvirt, on pc / q35,
with memory size at the limit values.
The only change must be the hint on errors.
Detailed steps in comment #1.
[ Regression Potential ]
* The code changes are contained within the
vfio_dma_map() function, and check for a
1) specific error code from the kernel
2) starting address match
3) length in excess of spec-defined limit
4) AMD CPU (for AMD IOMMU)
* Thus, theoretical regression potential is
restricted to PCI passthrough via VFIO,
and tooling that expects error messages
without the hint (should be unlikely to
be this strict on the error path).
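The four conditions listed above can be modelled as a single predicate. The sketch below is only an illustration in shell, not the actual C code in vfio_dma_map(): the function name should_hint, the 4 GiB starting address, and the use of the reserved range start (0xfd00000000) as the limit are assumptions made for this example.

```shell
#!/bin/bash
EINVAL=22                          # errno behind "Invalid argument" on Linux
ABOVE_4G_START=$((0x100000000))    # assumption: starting address of RAM above 4 GiB
RESERVED_START=$((0xfd00000000))   # assumption: limit taken from reserved_regions

# should_hint ERRNO IOVA SIZE CPU_VENDOR
# Succeeds only when all four conditions hold.
should_hint() {
    local err=$1 iova=$2 size=$3 vendor=$4
    [ "$err" -eq "$EINVAL" ] &&                       # 1) specific error code
    [ "$((iova))" -eq "$ABOVE_4G_START" ] &&          # 2) starting address match
    [ "$((iova + size))" -gt "$RESERVED_START" ] &&   # 3) mapping exceeds the limit
    [ "$vendor" = "AuthenticAMD" ]                    # 4) AMD CPU
}

# Example: a 1020 GiB mapping starting at 4 GiB on an AMD host triggers the hint.
if should_hint 22 0x100000000 $((1020 << 30)) AuthenticAMD; then
    echo "hint: AMD IOMMU: reduce VM ram"
fi
```

Because every condition must hold, any other error, address, size, or CPU vendor leaves the message unchanged, which is why the regression potential is limited to this one error path.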
[ Other Info ]
* Patch set:
https:/
* Git commits:
$ git describe --contains e5b6555fb8e8a91
v7.1.0-rc0~1
$ git show e5b6555fb8e8a91
...
i386/pc: restrict AMD only enforcing of 1Tb hole to new machine type
i386/pc: relocate 4g start to 1T where applicable
i386/pc: bounds check phys-bits against max used GPA
i386/pc: factor out device_memory base/size to helper
i386/pc: handle unitialized mr in pc_get_
i386/pc: factor out cxl range start to helper
i386/pc: factor out cxl range end to helper
i386/pc: factor out above-4g end to an helper
i386/pc: pass pci_hole64_size to pc_memory_init()
i386/pc: create pci-host qdev prior to pc_memory_init()
hw/i386: add 4g boundary start to X86MachineState
...
[ Original Description ]
$ virsh start vm
error: Failed to start domain 'vm'
error: internal error: qemu unexpectedly closed the monitor:
... qemu-system-x86_64: -device vfio-pci,...: VFIO_MAP_DMA failed: Invalid argument
... qemu-system-x86_64: -device vfio-pci,...: vfio 0000:a6:00.0:
failed to setup container for group 128: memory listener initialization failed: Region pc.ram:
vfio_dma_
Changed in qemu (Ubuntu Lunar): | |
status: | New → Fix Committed |
Changed in qemu (Ubuntu Kinetic): | |
status: | New → Triaged |
importance: | Undecided → Medium |
importance: | Medium → Wishlist |
assignee: | nobody → Mauricio Faria de Oliveira (mfo) |
Changed in qemu (Ubuntu Jammy): | |
status: | New → Triaged |
importance: | Undecided → Wishlist |
assignee: | nobody → Mauricio Faria de Oliveira (mfo) |
Changed in qemu (Ubuntu Focal): | |
status: | New → Triaged |
importance: | Undecided → Wishlist |
assignee: | nobody → Mauricio Faria de Oliveira (mfo) |
Changed in qemu (Ubuntu Bionic): | |
status: | New → Triaged |
importance: | Undecided → Wishlist |
assignee: | nobody → Mauricio Faria de Oliveira (mfo) |
description: | updated |
description: | updated |
Test Steps
---
1) Create cloud-init data ISO to initialize cloud images:
cat >meta-data <<EOF
instance-id: iid-local01
local-hostname: qemu-vm
EOF
cat >user-data <<EOF
#cloud-config
password: passw0rd
chpasswd: { expire: False }
ssh_pwauth: True
EOF
genisoimage -output cloud-init-data.iso -volid cidata -joliet -rock user-data meta-data
2) Download cloud image, and create larger disk image:
for RELEASE in kinetic jammy focal bionic; do
wget https://cloud-images.ubuntu.com/$RELEASE/current/$RELEASE-server-cloudimg-amd64.img
qemu-img create -f qcow2 -F qcow2 -b $RELEASE-server-cloudimg-amd64.img $RELEASE.qcow2 8G
done
Now, for each RELEASE:
3) Run QEMU to emulate an AMD IOMMU system, with an audio device to be used for PCI passthrough.
(We'll run QEMU inside it, to verify the changes, as if it were running on a bare-metal AMD system.)
qemu-system-x86_64 \
-accel kvm -machine q35 -smp 1 -m 4G \
-nodefaults -nographic -no-user-config \
-serial stdio \
\
-drive file=$RELEASE.qcow2,if=virtio \
-drive file=cloud-init-data.iso,if=virtio,read-only=on,driver=raw \
\
-netdev user,id=net0,hostfwd=tcp:127.0.0.1:2222-:22 \
-device virtio-net,netdev=net0 \
\
-cpu EPYC-v1 \
-device amd-iommu \
-device intel-hda
...
login: ubuntu
password: passw0rd
...
(watch out for ctrl-c)
or
ssh ubuntu@127.0.0.1 -p 2222 # passw0rd
...
4) We'll need qemu-system-x86_64 and virsh/libvirt
sudo apt update && sudo apt install --yes --no-install-recommends qemu-system libvirt-daemon-system
logout # login again
5) Check the emulated hardware (and reserved ranges just below 1TiB on all IOMMU groups)
$ grep 'AMD EPYC' /proc/cpuinfo
model name : AMD EPYC Processor
$ lspci | grep -i iommu
00:02.0 IOMMU: Advanced Micro Devices, Inc. [AMD] Device 0010
$ grep reserved /sys/kernel/iommu_groups/*/reserved_regions
/sys/kernel/iommu_groups/0/reserved_regions:0x000000fd00000000 0x000000ffffffffff reserved
/sys/kernel/iommu_groups/1/reserved_regions:0x000000fd00000000 0x000000ffffffffff reserved
/sys/kernel/iommu_groups/2/reserved_regions:0x000000fd00000000 0x000000ffffffffff reserved
/sys/kernel/iommu_groups/3/reserved_regions:0x000000fd00000000 0x000000ffffffffff reserved
/sys/kernel/iommu_groups/4/reserved_regions:0x000000fd00000000 0x000000ffffffffff reserved
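To see why this range sits "just below 1TiB", the start and end addresses from step 5 can be converted to GiB. A minimal sketch follows; the helper name describe_region is hypothetical, and it expects one line in the "START END TYPE" layout of reserved_regions.

```shell
#!/bin/bash
GIB=$((1024 * 1024 * 1024))

# describe_region "START END TYPE"
# Prints where the region starts and how large it is, in GiB.
describe_region() {
    local start end type
    read -r start end type <<< "$1"
    echo "$type: starts at $(( start / GIB )) GiB, spans $(( (end - start + 1) / GIB )) GiB"
}

describe_region "0x000000fd00000000 0x000000ffffffffff reserved"
# prints "reserved: starts at 1012 GiB, spans 12 GiB"
```

The range covers the last 12 GiB below 1 TiB (1024 GiB), so guest RAM mapped for DMA cannot extend past 1012 GiB.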
6) Configure the audio device for PCI passthrough
$ lspci | grep -i audio
00:03.0 Audio device: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) High Definition Audio Controller (rev 01)
$ ls /sys/bus/pci/devices/0000:00:03.0/iommu_group/devices/
0000:00:03.0
PCI=0000:00:03.0
sudo modprobe vfio-pci
echo vfio-pci | sudo tee /sys/bus/pci/devices/$PCI/driver_override
echo $PCI | sudo tee /sys/bus/pci/devices/$PCI/driver/unbind 2>/dev/null; sleep 1
echo $PCI | sudo tee /sys/bus/pci/drivers/vfio-pci/bind; sleep 1
echo 1 | sudo tee /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts
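Before moving on, it may help to confirm that the device really ended up bound to vfio-pci. The helper below is a sketch that is not part of the original steps: the name bound_driver and its optional SYSFS_ROOT argument (useful for trying it outside a real system) are assumptions for this example; it only reads the standard "driver" symlink of the PCI device in sysfs.

```shell
#!/bin/bash
# bound_driver PCI_ADDRESS [SYSFS_ROOT]
# Prints the name of the driver currently bound to the device,
# or "none" (with a non-zero exit status) if no driver is bound.
bound_driver() {
    local pci=$1 root=${2:-/sys}
    local link="$root/bus/pci/devices/$pci/driver"
    [ -L "$link" ] || { echo "none"; return 1; }
    # The symlink points at the driver directory; its last component is the driver name.
    basename "$(readlink "$link")"
}
```

After step 6, running `bound_driver "$PCI"` is expected to print vfio-pci; any other name suggests the unbind or bind write did not take effect.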
7) Enable memory overcommit to allow early start of VM w/ 1TB RAM:
echo 1 | sudo tee /proc/sys/vm/overcommit_memory
8) Verify the error with memory size above limit:
$ sudo qemu-system-x86_64 -nographic -device vfio-pci,host=$PCI ...