qemu-system-amd64 max cpus is too low for latest processors

Bug #2012763 reported by Jeff Lane 
Affects          Status         Importance   Assigned to             Milestone
lxd (Ubuntu)     New            Undecided    Unassigned
  Jammy          New            Undecided    Unassigned
  Lunar          Invalid        Undecided    Unassigned
  Mantic         Won't Fix      Undecided    Unassigned
  Noble          New            Undecided    Unassigned
qemu (Ubuntu)    Fix Released   Critical     Sergio Durigan Junior
  Jammy          Fix Released   Undecided    Sergio Durigan Junior
  Lunar          Invalid        Undecided    Sergio Durigan Junior
  Mantic         Fix Released   Critical     Sergio Durigan Junior
  Noble          Fix Released   Critical     Sergio Durigan Junior

Bug Description

[ Impact ]

QEMU users on Ubuntu Jammy/Mantic who try to spawn a VM with more than 288 vCPUs are unable to do so, because the available machine types do not support that many. The following error is printed:

qemu-system-x86_64: Invalid SMP CPUs 300. The max CPUs supported by machine 'pc-q35-jammy' is 288

[ Test Plan ]

Ideally, the test should be performed in a machine with more than 288 physical CPUs available. However, due to the difficulty in finding such systems, it is possible to emulate the usage of more than 288 vCPUs.

On a Jammy/Mantic machine, making sure to adjust the machine type accordingly, you can do:

$ sudo qemu-system-x86_64 -M pc-q35-jammy,accel=kvm,kernel-irqchip=split -device intel-iommu,intremap=on -smp cpus=300,maxcpus=300 -enable-kvm -net none -m 4096M -nographic -kernel /boot/vmlinuz -initrd /boot/initrd.img -chardev stdio,mux=on,id=char0 -mon chardev=char0,mode=readline -serial chardev:char0 -append "console=ttyS0"

You will notice that the command will fail, as expected.

The proposed fix is to create a new machine type on Jammy/Mantic, in order to minimize the possibility of regressions in deployments using the existing machine types. This new type is named pc-{q35,i440fx}-{jammy,mantic}-maxcpus. When doing the test, make sure to provide this new machine type (as part of the "-M" argument).
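
Before running the test it can help to confirm that the installed QEMU actually offers one of the new machine types. The following is only a sketch: the helper names `has_machine_type` and `maxcpus_machine_types` are hypothetical, and the first one simply scans text in the format printed by `qemu-system-x86_64 -M help` (machine name in the first column) read from standard input.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the machine type named in $1 appears in the
# first column of `qemu-system-x86_64 -M help` output supplied on stdin.
has_machine_type() {
    awk -v want="$1" '$1 == want { found = 1 } END { exit !found }'
}

# Expand the naming pattern pc-{q35,i440fx}-{jammy,mantic}-maxcpus.
maxcpus_machine_types() {
    for chipset in q35 i440fx; do
        for series in jammy mantic; do
            echo "pc-${chipset}-${series}-maxcpus"
        done
    done
}
```

One possible use is `qemu-system-x86_64 -M help | has_machine_type pc-q35-jammy-maxcpus && echo present`, which prints `present` only when the machine type is listed.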

[ Where problems could occur ]

As explained above, a new machine type was created in order to minimize the possibility of regressions. As such, the existing "pc-{q35,i440fx}-{jammy,mantic}" machine types should continue to work as before, without any change.

[ Original Description ]

During testing of an AMD Genoa CPU, it was discovered that qemu-system-amd64 doesn't support enough cpus.

The specific error the tester received was:

qemu-system-x86_64: Invalid SMP CPUs 384. The max supported by machine 'pc-q35-7.1' is 288

Looking at the source, that seems to be an easy fix at first glance:

https://github.com/qemu/qemu/blob/master/hw/i386/pc_q35.c
machine_class_allow_dynamic_sysbus_dev(m, TYPE_VMBUS_BRIDGE);
m->max_cpus = 288;


Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi Jeff,
thanks for the request, that is a known limit that is being worked on by various upstream projects.

The limit of 288 [1] was deliberately chosen for being the limits of testing at the time and limits of xapic [2].

Recently (around kernel 5.15, which is Jammy and later) the limit was lifted on the kernel side [3][4], but that is only the first step.

You also need other components to be ready, like the smbios 3.0 entry point which is in seabios 1.16 (Kinetic and later) and edk2 (there it is rather old and should be ok for longer).

The work / discussions in qemu is ongoing as you might see in [5], but those haven't completed or landed yet - it is work in progress that has to complete and stabilize. You see here that would be a post 7.2 change anyway.

There are more things in the stack which might need patching e.g. in libvirt or even higher parts, I haven't checked those yet - but overall this isn't a "change a number and done" change :-/

I hope that the upstream projects can continue their great work and complete it all, but right now despite looking like a simple number there is not enough confidence for all the implications yet to just bump up that number.

[1]: https://gitlab.com/qemu-project/qemu/-/commit/00d0f9fd6602a27b204f672ef5bc8e69736c7ff1
[2]: https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02266.html
[3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=074c82c8f7cf8a46c3b81965f122599e3a133450
[4]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=da1bfd52b930726288d58f066bd668df9ce15260
[5]: https://<email address hidden>/

Changed in qemu (Ubuntu):
importance: Undecided → Wishlist
status: New → Confirmed
Jeff Lane  (bladernr) wrote :

Thanks Christian. The tester reporting it was from one of the OEM labs during cert testing on the newer CPUs... I don't think this is really any sort of show-stopper, just one of those things noticed in the output that looked concerning to them (They report in anything that looks out of the ordinary).

So in the context of the details you provided I think it's safe on our end then to just know it's going to be a limitation and then wait for the various bits to update naturally.

Jeff Lane  (bladernr) wrote :

This causes QEMU to be unusable on systems with more than 288 cores, notably those with recent AMD CPUs, and is affecting certifications.

Christian Ehrhardt  (paelzer) wrote :

Hmm,
"unusable" - really. Isn't it just limiting you to have each guest at max 288 vcpus?
Or did I miss that, due to that, it won't work to create any guest at all?

Rod Smith (rodsmith) wrote :

Our test is failing to run, not simply running with fewer than the requested number of cores. From the test output (which includes test script output and formatting, not just QEMU output):

DEBUG:root:Start VM:
ERROR:root:Command lxc start testbed returned a code of 1
ERROR:root: STDOUT:
ERROR:root: STDERR: Error: Failed to run: forklimits limit=memlock:unlimited:unlimited fd=3 -- /snap/lxd/24322/bin/qemu-system-x86_64 -S -name testbed -uuid e149a6e6-ce67-4b5b-ab56-94c740521c0e -daemonize -cpu host,hv_passthrough -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=allow,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/testbed/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/testbed/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/testbed/qemu.pid -D /var/snap/lxd/common/lxd/logs/testbed/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd: : Process exited with non-zero value 1
Try `lxc info --show-log testbed` for more info

I know that may not be the error logs or output you need to fully debug this, but it's what I have on hand. (The system in question belongs to a Canonical partner.) We can work to produce more logs or output, but it would be helpful to know what you need.

Jeff Lane  (bladernr) wrote :

Also, I did some digging. We use LXD to kick off KVMs, and this exists in the LXD docs:
https://linuxcontainers.org/lxd/docs/stable-4.0/instances/

limits.cpu string - yes - Number or range of CPUs to expose to the instance (defaults to 1 CPU for VMs)

I had hoped that the issue was that kicking off that single VM was somehow going crazy and attaching to every CPU core.

BUT it looks like LXD defaults to 1 CPU for VMs, meaning it's not coming anywhere close to that limit of 288. If that's the case, that means QEMU itself is unusable on these new high-core-count CPUs.

We can try to explicitly use limits.cpu with LXD, but if that doesn't work, we need some help sorting out exactly what's happening here and how to work around it.

Jeff Lane  (bladernr) wrote :

I launched a VM via LXC using qemu and verified that it does only create / attach a single CPU core:
root@maximum-porpoise:~# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 165
Model name: Intel(R) Core(TM) i9-10900F CPU @ 2.80GHz

Note my machine has a single 10 core CPU with HT enabled:
Architecture: x86_64
  CPU op-mode(s): 32-bit, 64-bit
  Address sizes: 39 bits physical, 48 bits virtual
  Byte Order: Little Endian
CPU(s): 20
  On-line CPU(s) list: 0-19
Vendor ID: GenuineIntel
  Model name: Intel(R) Core(TM) i9-10900F CPU @ 2.80GHz
    CPU family: 6
    Model: 165
    Thread(s) per core: 2
    Core(s) per socket: 10

So I do suspect that qemu itself simply fails on systems with more than 288 cores regardless of the config of the VM...

The servers that are failing have dual AMD EPYC 9654 96-Core Processors, which, with hyperthreading, present 384 logical CPUs to the system.

I've gone back and asked them to disable hyperthreading to get the CPU count down to 192 cores to see if qemu works then or not... if the same test succeeds with that config, I think that would certainly confirm the issue.
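
The arithmetic behind those two figures can be written out as a small sketch. The function names `logical_cpus` and `exceeds_limit` are hypothetical and exist only for illustration; 288 is the limit quoted in the error message.

```shell
#!/bin/sh
# Logical CPUs = sockets * cores per socket * threads per core.
logical_cpus() {
    echo $(( $1 * $2 * $3 ))
}

# Succeed if count $1 is greater than limit $2.
exceeds_limit() {
    [ "$1" -gt "$2" ]
}

# Dual 96-core CPUs with hyperthreading on (2 threads) and off (1 thread).
for threads in 2 1; do
    total=$(logical_cpus 2 96 "$threads")
    if exceeds_limit "$total" 288; then
        echo "$total logical CPUs: above the 288 limit"
    else
        echo "$total logical CPUs: within the 288 limit"
    fi
done
```

With hyperthreading the host has 384 logical CPUs, and without it 192, which is why disabling it is a useful way to check whether the host CPU count is what triggers the failure.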

Jeff Lane  (bladernr) wrote :

So just to update/reconfirm something, qemu-system-amd64 fails on systems with more than 288 cores, regardless of how you've configured the KVM Guest.

We have had them test both the default (which defaults to 1 vCPU), and by explicitly setting the config to a single vCPU. We have NEVER launched a KVM guest that was handed more than 1 CPU core, as we have always used the default config for simplicity.

Currently, this causes certification tests on systems with high end AMD CPUs to fail, as those have far more than 288 cores.

We do not have a system currently in house to test this with, but we can get our OEM partner to test patched versions of packages that address this. I have also raised this directly with AMD.

Christian Ehrhardt  (paelzer) wrote :

Oh wow, Sorry but I didn't read that in between the lines of the report yet.
I expected that to only block extra large guests which is where we would have waited for upstream.

Indeed guests up to the size limit should work (almost) no matter how many CPUs the system has.
Could you please work with Sergio (assigned now) to give him access to the system, so that he can have a look and potentially debug this on the real thing?

tags: added: server-todo
Changed in qemu (Ubuntu):
importance: Wishlist → Critical
assignee: nobody → Sergio Durigan Junior (sergiodj)
Christian Ehrhardt  (paelzer) wrote :

> So just to update/reconfirm something, qemu-system-amd64 fails on systems with more than
> 288 cores, regardless of how you've configured the KVM Guest.

This really should be a guest size limit, I wonder if the system is picking up any default like "but it could be 384 via hotplugging" that one needs to configure.

@Jeff
Could you - in preparation - please provide the simplest libvirt XML or qemu command line that you expect to work but that fails when the host CPU count is >288?

> I launched a VM via LXC using qemu and verified that it does only create / attach a single CPU core

They also just use qemu, so that shouldn't be different...
Have you done that test
a) on a different system to check how many CPUs it configures by default?
b) on the 384 cpu system and you are saying "it works with the LXD snaps qemu, but not with the qemu in the Archive"?

If it was (a), that test isn't sufficient, as qemu has the concept of current and max CPUs (the latter available for hot-plug), and the limit counts against the max CPUs.

Christian Ehrhardt  (paelzer) wrote :

I checked LXD myself on my laptop

$ lxc launch ubuntu-minimal-daily:j j-vm --ephemeral --vm
$ lxc exec j-vm lscpu | grep '^CPU(s):'
CPU(s): 1
=> Yes it is one by default, but it just doesn't give any arguments at all
$ ps axlf | grep qemu | grep j-vm
7 999 2014958 1 20 0 1776840 480184 - Sl ? 0:33 /snap/lxd/24918/bin/qemu-system-x86_64 -S -name j-vm -uuid 6e58b1c8-9484-4131-b4f4-d61e32556d28 -daemonize -cpu host,hv_passthrough -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=allow,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/j-vm/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/j-vm/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/j-vm/qemu.pid -D /var/snap/lxd/common/lxd/logs/j-vm/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd

And at first it looks like LXD does limit things via cpusets only
https://linuxcontainers.org/lxd/docs/stable-4.0/instances/#cpu-limits

Even with that set explicitly it behaves the same:

$ lxc launch ubuntu-minimal-daily:j j-vm --ephemeral --vm -c limits.cpu=1
Creating j-vm
Starting j-vm
$ lxc exec j-vm lscpu | grep '^CPU(s):'
CPU(s): 1
$ ps axlf | grep qemu | grep j-vm
7 999 2033243 1 20 0 1777348 477060 - Sl ? 0:12 /snap/lxd/24918/bin/qemu-system-x86_64 -S -name j-vm -uuid 4c469ad8-136e-422a-9366-3503f072cddd -daemonize -cpu host,hv_passthrough -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=allow,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/j-vm/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/j-vm/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/j-vm/qemu.pid -D /var/snap/lxd/common/lxd/logs/j-vm/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd

$ lxc launch ubuntu-minimal-daily:j j-vm --ephemeral --vm -c limits.cpu=2
Creating j-vm
Starting j-vm
$ lxc exec j-vm lscpu | grep '^CPU(s):'
CPU(s): 2
$ ps axlf | grep qemu | grep j-vm
7 999 2036838 1 20 0 1984268 481300 - Sl ? 0:15 /snap/lxd/24918/bin/qemu-system-x86_64 -S -name j-vm -uuid 73ed3b5b-c1f9-4d8f-bed3-dc763a4329e2 -daemonize -cpu host,hv_passthrough -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=allow,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/j-vm/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/j-vm/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/j-vm/qemu.pid -D /var/snap/lxd/common/lxd/logs/j-vm/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd

Christian Ehrhardt  (paelzer) wrote :

For a start to rule out a real bug...
And to rule out any other smartness let us start a very very small qemu that does almost nothing. Does the following stumble over the 384 cpu error as well?

$ sudo qemu-system-x86_64 -smp cpus=1,maxcpus=1 -enable-kvm -net none -m 512M -nographic -kernel /boot/vmlinuz -initrd /boot/initrd.img -chardev stdio,mux=on,id=char0 -mon chardev=char0,mode=readline -serial chardev:char0 -append "console=ttyS0"

That will load a kernel from your host disk. After the kernel loads it will fail for lack of a root disk, but that is fine. This way we would quickly know if really "everything fails" (bug) or if there might just be an argument needed in the way you spawn guests (configuration).

Please let us know if this works.

Christian Ehrhardt  (paelzer) wrote :

AFAIC - if you insist/depend on LXD - you need to go all-in and use raw.qemu to add command-line parameters, bypassing LXD's intentionally opinionated usage:

$ lxc launch ubuntu-minimal-daily:j j-vm --ephemeral --vm -c raw.qemu="-smp cpus=1,maxcpus=1"
Creating j-vm
Starting j-vm
$ ps axlf | grep qemu | grep j-vm | grep smp
7 999 2043048 1 20 0 1777460 323388 - Sl ? 0:07 /snap/lxd/24918/bin/qemu-system-x86_64 -S -name j-vm -uuid 9346be46-67fa-4931-ba2d-529cbc268190 -daemonize -cpu host,hv_passthrough -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=allow,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/j-vm/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/j-vm/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/j-vm/qemu.pid -D /var/snap/lxd/common/lxd/logs/j-vm/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd -smp cpus=1,maxcpus=1

P.S. If only cpus is set, maxcpus is the same, and if nothing else is given, cpus is implied. So I know that raw.qemu="-smp 1" does the same, but I wanted to be explicit while debugging here.
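
The defaulting described in the P.S., together with the earlier point that the machine limit counts against maxcpus rather than the current CPU count, can be sketched as follows. This is a simplified illustration, not QEMU's actual parser: the helper names `effective_maxcpus` and `smp_fits` are hypothetical, and only a bare number, `cpus=` and `maxcpus=` are handled (real `-smp` values may also carry `sockets=`, `cores=`, `threads=` and others, which are ignored here).

```shell
#!/bin/sh
# Hypothetical helper: print the effective maxcpus for a simplified -smp value.
effective_maxcpus() {
    cpus=1
    maxcpus=
    old_ifs=$IFS
    IFS=,
    for opt in $1; do
        case "$opt" in
            maxcpus=*) maxcpus=${opt#maxcpus=} ;;
            cpus=*)    cpus=${opt#cpus=} ;;
            *=*)       ;;            # other keys are ignored in this sketch
            *)         cpus=$opt ;;  # a bare number is taken as cpus
        esac
    done
    IFS=$old_ifs
    # If maxcpus was not given, it takes the same value as cpus.
    echo "${maxcpus:-$cpus}"
}

# Succeed if the -smp value in $1 stays within the machine limit in $2.
smp_fits() {
    [ "$(effective_maxcpus "$1")" -le "$2" ]
}
```

For example, `smp_fits cpus=1,maxcpus=384 288` fails even though only one CPU is present at start, because the check is made against maxcpus.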

Jeff Lane  (bladernr) wrote :

There seems to have been some movement on this upstream:

https://lore.kernel<email address hidden>/T/#m4f61669a283a87623e4b8ce484e65c1bbaa76935

The exact commands we use typically are:
lxc init ubuntu:22.04 testbed --vm
# lxc config set testbed limits.cpu 1
lxc start testbed

and assume defaults on everything. (The commented config line was added later as an experiment.)

I don't have direct access to a system with that many cores, but I'll ask them to try all your suggestions and update the bug with results.

Mark Coskey (mcoskey) wrote :

On our XD225v AMD server with 2P 9754 Bergamo 128c (512 vcpus) on Ubuntu 22.04.2LTS, I ran the command from comment #12, see attached output comment12.txt.

Mark Coskey (mcoskey) wrote :
Jeff Lane  (bladernr) wrote :

So this was apparently fixed in qemu 8.1.0:

commit e0001297eb2f8569e950e55dbda8ad686e4155fb
Author: Suravee Suthikulpanit <email address hidden>
Date: Wed Jun 7 15:57:17 2023 -0500

    pc: q35: Bump max_cpus to 1024

    Since KVM_MAX_VCPUS is currently defined to 1024 for x86 as shown in
    arch/x86/include/asm/kvm_host.h, update QEMU limits to the same number.

    In case KVM could not support the specified number of vcpus, QEMU would
    return the following error message:

      qemu-system-x86_64: kvm_init_vcpu: kvm_get_vcpu failed (xxx): Invalid argument

    Also, keep max_cpus at 288 for machine version 8.0 and older.

    Cc: Igor Mammedov <email address hidden>
    Cc: Daniel P. Berrangé <email address hidden>
    Cc: Michael S. Tsirkin <email address hidden>
    Cc: Julia Suvorova <email address hidden>
    Reviewed-by: Igor Mammedov <email address hidden>
    Signed-off-by: Suravee Suthikulpanit <email address hidden>
    Message-Id: <email address hidden>
    Reviewed-by: Michael S. Tsirkin <email address hidden>
    Signed-off-by: Michael S. Tsirkin <email address hidden>
    Reviewed-by: Daniel P. Berrangé <email address hidden>

$ git tag --contains e0001297eb2
v8.1.0
v8.1.0-rc0
v8.1.0-rc1
v8.1.0-rc2
v8.1.0-rc3
v8.1.0-rc4

Looking at rmadison, mantic only has 8.0.4:

 qemu | 1:8.0.4+dfsg-1ubuntu1 | mantic | source
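
A rough way to compare a package version against the first upstream release containing the commit (v8.1.0) is sketched below. The helper names `upstream_version` and `version_ge` are hypothetical, and `sort -V` is assumed to be GNU sort. Note that this only looks at the upstream version number: a distribution package can carry the patch as a backport, so an older version does not prove the change is absent.

```shell
#!/bin/sh
# Strip a Debian epoch ("1:") and everything from the first "+", "~" or "-".
upstream_version() {
    v=${1#*:}
    echo "${v%%[+~-]*}"
}

# Succeed if version $1 is at least version $2 (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

pkg="1:8.0.4+dfsg-1ubuntu1"   # the mantic version shown by rmadison
if version_ge "$(upstream_version "$pkg")" 8.1.0; then
    echo "upstream version already includes the max_cpus bump"
else
    echo "upstream version predates 8.1.0; a backport would be needed"
fi
```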

Would it be possible to:

A: get mantic bumped to 8.1.0
B: work on getting this back to Jammy to unblock 22.04 certs? (Well, for now we are just accepting failed VM tests because these larger CPUs have no support in Jammy due to the qemu-system-x86_64 max_cpus limitation.)

Sergio Durigan Junior (sergiodj) wrote :

Thanks for the update.

It's not possible to bump QEMU to 8.1.0 on Mantic anymore (we're already on Feature Freeze), but it is possible to backport the patch above. It's also possible to backport this patch to Jammy as part of an SRU.

I'm assigning the bug to myself, but I'll likely only have time to work on this bug next week. Also, it's possible that I'll need your help to test the fix.

Thanks.

Changed in qemu (Ubuntu Jammy):
assignee: nobody → Sergio Durigan Junior (sergiodj)
Changed in qemu (Ubuntu Lunar):
assignee: nobody → Sergio Durigan Junior (sergiodj)
Sergio Durigan Junior (sergiodj) wrote :

Jeff et al,

I worked to create new machine types for Jammy which support up to 1024 CPUs, which is exactly what the upstream patch pointed to by Jeff does. We decided to implement this via new machine types because, as Christian said, it is not entirely clear what kind of side effects this (apparently simple) setting can have, and also (perhaps most importantly) because it is much easier to justify SRUing such a change if it's as contained as possible.

You can find a PPA with the proposed change here:

https://launchpad.net/~sergiodj/+archive/ubuntu/qemu

The qemu version is 1:6.2+dfsg-2ubuntu6.16~ppa2. The new machine types are named:

pc-i440fx-jammy-maxcpus Ubuntu 22.04 PC (i440FX + PIIX, maxcpus=1024, 1996)
pc-i440fx-jammy-hpb-maxcpus Ubuntu 22.04 PC (i440FX + PIIX +host-phys-bits=true, maxcpus=1024, 1996)
pc-q35-jammy-maxcpus Ubuntu 22.04 PC (Q35 + ICH9, maxcpus=1024, 2009)
pc-q35-jammy-hpb-maxcpus Ubuntu 22.04 PC (Q35 + ICH9, +host-phys-bits=true, maxcpus=1024, 2009)

Would it be possible for you to give this a try and let me know if it works? I still don't have access to a machine with that number of CPUs, so the amount of testing I can do is limited.

Thanks.

Robie Basak (racb) wrote :

Untagging server-todo since this is awaiting feedback. It can be re-added through triage if needed.

tags: removed: server-todo
Jeff Lane  (bladernr) wrote : Re: [Bug 2012763] Re: qemu-system-amd64 max cpus is too low for latest processors

I reached out to one of the server OEMs who has access to a failing system (AMD
Genoa 2S with 384 total cores, IIRC) to test the patched qemu packages.
Hopefully they'll respond with results in the next week or so.

Sergio Durigan Junior (sergiodj) wrote :

On Wednesday, October 25 2023, Jeff Lane  wrote:

> I reached to one of the server OEMS who has access to a failing system (AMD
> Genoa 2S with 384 total cores, IIRC) to test the patched qemu packages.
> Hopefully they'll respond with results in the next week or so

Thanks, Jeff.


Amy Gou (goujm1) wrote :

Hi, Sergio and Jeff, after upgrading QEMU and trying again, we got the same errors as before. Please refer to the attached screenshots for details.

The configuration of SUT:

Product name: ThinkSystem SR675 V3, which is based on AMD Genoa Platform

CPU: 2x AMD EPYC 9754 128-Core Processor, total 256 Cores and 512 threads.

Mem: 8x 128G DIMMs DDR5

Errors:

# lxc start testbed
Error: Failed to run: forklimits limit=memlock:unlimited:unlimited fd=3 -- /snap/lxd/24322/bin/qemu-system-x86_64 -S -name testbed -uuid 55914767-a334-4acb-aac1-9c544b05497e -daemonize -cpu host,hv_passthrough -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=allow,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/testbed/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/testbed/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/testbed/qemu.pid -D /var/snap/lxd/common/lxd/logs/testbed/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd: : Process exited with non-zero value 1
Try `lxc info --show-log testbed` for more info

# lxc info --show-log testbed
Name: testbed
Status: STOPPED
Type: virtual-machine
Architecture: x86_64

Log:

qemu-system-x86_64: Invalid SMP CPUs 512. The max CPUs supported by machine 'pc-q35-7.1' is 288.

Amy Gou (goujm1) wrote :

screenshot1

Amy Gou (goujm1) wrote :

screenshot2

Sergio Durigan Junior (sergiodj) wrote :

Hi Amy,

Thanks for the feedback. I see from the screenshots you posted that you're using the 'pc-q35-7.1' machine type when launching the VM. As explained in comment #19, you have to use the new machine types in order to enable support for more than 288 vCPUs. In this case, you can use the machine type 'pc-q35-jammy-maxcpus'.

On top of that, you're using LXD to create the VM which means that it'll use its own copy of qemu-system-x86_64 (/snap/lxd/current/bin/qemu-system-x86_64), and not the system one. Can you try invoking qemu directly with the machine type mentioned above?
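
When reading process listings such as the ones earlier in this report, the path of the binary tells which QEMU is in use. A minimal sketch follows, with the hypothetical helper name `qemu_origin`. Treating `/usr/bin/` as the location of the Ubuntu-packaged binary is an assumption made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: classify a qemu binary path as seen in `ps` output.
qemu_origin() {
    case "$1" in
        /snap/lxd/*) echo "lxd-snap" ;;   # QEMU copy shipped inside the LXD snap
        /usr/bin/*)  echo "system" ;;     # assumed location of the distro package
        *)           echo "unknown" ;;
    esac
}

qemu_origin /snap/lxd/24322/bin/qemu-system-x86_64
qemu_origin /usr/bin/qemu-system-x86_64
```

A patched system package only affects VMs whose process path falls in the second category.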

I'm almost sure it won't work out of the box, and there may be more patches that need to be backported in order to make this work on qemu 6.2, but I'd like to take a look at the output you get.

Thank you.

Amy Gou (goujm1) wrote :

Hi, Sergio, could you share how to switch the machine type from 'pc-q35-7.1' to 'pc-q35-jammy-maxcpus'?

Many thanks.

Jeff Lane  (bladernr) wrote :

Hi Amy,

I believe the option to add to qemu-system-x86_64 is "-M".

qemu-system-x86_64 -M help

will output the list of all the machine types you can use, and I believe you can specify that exact one like this:

qemu-system-x86_64 -M pc-q35-jammy-maxcpus

Now, that said, I don't believe you can test this using our test-virtualization launcher, nor the virtualization.py test script, as the test script uses LXD and AFAIK there's no way to specify the machine type to the lxc command. I'll follow up if I can find something different, but for now, I think just using the -M option with qemu-system-x86_64 will work.

Jeff Lane  (bladernr) wrote :

I've now added LXD to this, as today I learned that LXD (which is what we use for launching VMs) doesn't use Ubuntu's qemu but rather pulls directly from upstream when building the snap. So patching Ubuntu will help for those uses, but won't fix the broken certification test, as that will never pick up the patched Ubuntu qemu. So we'll have to sort out some sort of solution there as well.

JUNG GYUM KIM (junggyumkim) wrote :

Dear Jeff Lane,

I can't run the "qemu-system-x86_64 -M pc-q35-jammy-maxcpus" command because my system doesn't have the "pc-q35-jammy-maxcpus" machine type.

Can you provide it?

ubuntu@xd295v:~$ qemu-system-x86_64 -M help | grep pc-q35-jammy
ubuntu-q35 Ubuntu 22.04 PC (Q35 + ICH9, 2009) (alias of pc-q35-jammy)
pc-q35-jammy Ubuntu 22.04 PC (Q35 + ICH9, 2009)
pc-q35-jammy-hpb Ubuntu 22.04 PC (Q35 + ICH9, +host-phys-bits=true, 2009)

Thank you.
Jack Kim

Sergio Durigan Junior (sergiodj) wrote :

Thank you for replying to Amy, Jeff.

@Amy, you can specify the machine by using the -M option, as Jeff said. You can try running a quick&dirty test by doing what Christian said above:

$ sudo qemu-system-x86_64 -smp cpus=1,maxcpus=1 -enable-kvm -net none -m 512M -nographic -kernel /boot/vmlinuz -initrd /boot/initrd.img -chardev stdio,mux=on,id=char0 -mon chardev=char0,mode=readline -serial chardev:char0 -append "console=ttyS0"

Note that you have to adjust the -smp parameter accordingly.

@JUNG, you need to install the qemu package from https://launchpad.net/~sergiodj/+archive/ubuntu/qemu.

The qemu version is 1:6.2+dfsg-2ubuntu6.16~ppa2.

anil (anilchabba) wrote :

I have used 8.1.2 and I am still getting an error saying that no more than 255 CPUs can be added:

root@us-ash-r1-c1-m2:~# qemu-system-x86_64 -version
QEMU emulator version 8.1.2 (Debian 1:8.1.2+ds-1)
Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers
root@us-ash-r1-c1-m2:~#

Unable to complete install: 'unsupported configuration: more than 255 vCPUs require extended interrupt mode enabled on the iommu device'

Amy Gou (goujm1) wrote :

I changed the machine type to 'pc-q35-jammy-maxcpus' and set maxcpus to 512. The VM could start, but the log also had messages like 'smpboot: native_cpu_up: bad cpu 297'. Could you look into the attached log?

The command I used:

$ sudo qemu-system-x86_64 -M pc-q35-jammy-maxcpus -smp cpus=512,maxcpus=512 -enable-kvm -net none -m 128G -nographic -kernel /boot/vmlinuz -initrd /boot/initrd.img -chardev stdio,mux=on,id=char0 -mon chardev=char0,mode=readline -serial chardev:char0 -append "console=ttyS0"

Amy Gou (goujm1) wrote :
Jeff Lane  (bladernr) wrote :

Hi Amy...

When you specify -smp cpus=512,maxcpus=512 you're creating a single VM with 512 vCPUs from the start. Could you try this but maybe set it to something smaller like `-smp cpus=2,maxcpus=64` to see what happens?

Or maybe also `-smp cpus=1,maxcpus=1`, which is the default when launching these (single vCPU per VM)?

anil (anilchabba) wrote :

./src/qemu/qemu_validate.c:34:#define QEMU_MAX_VCPUS_WITHOUT_EIM 255

I think we need to change this value in libvirtd also to 1024

anil (anilchabba) wrote :

2023-11-02 14:26:06.060 7 ERROR nova.compute.manager [instance: 5d88d297-8c98-4dbc-b92f-b6a7f3ec882d] libvirt.libvirtError: unsupported configuration: more than 255 vCPUs require extended interrupt mode enabled on the iommu device
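
The libvirt message ties guests with more than 255 vCPUs to extended interrupt mode on the IOMMU device. The test plan in the bug description passes `kernel-irqchip=split` and `-device intel-iommu,intremap=on` on the QEMU command line. The sketch below only checks whether a command line carries those two options. The helper names are hypothetical, and whether the options are sufficient on a given QEMU build is an assumption that this report does not confirm.

```shell
#!/bin/sh
# Hypothetical check: does the command line in $1 contain the interrupt
# remapping options used by the test plan?
has_intremap_options() {
    case "$1" in
        *kernel-irqchip=split*) ;;
        *) return 1 ;;
    esac
    case "$1" in
        *intel-iommu*intremap=on*) return 0 ;;
        *) return 1 ;;
    esac
}

# $1 = requested vCPUs, $2 = full QEMU command line as one string.
check_cmdline() {
    if [ "$1" -le 255 ] || has_intremap_options "$2"; then
        echo "ok"
    else
        echo "missing interrupt remapping options for $1 vCPUs"
    fi
}

check_cmdline 300 "-M pc-q35-jammy-maxcpus,accel=kvm,kernel-irqchip=split -device intel-iommu,intremap=on -smp cpus=300,maxcpus=300"
```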

anil (anilchabba) wrote :

After fixing libvirtd I was able to launch the instance, but I am getting the following errors:
[ 3.694170] smpboot: native_kick_ap: bad cpu 477
[ 3.698119] smpboot: native_kick_ap: bad cpu 478
[ 3.702113] smpboot: native_kick_ap: bad cpu 479
[ 3.706216] smpboot: native_kick_ap: bad cpu 480
[ 3.710113] smpboot: native_kick_ap: bad cpu 481
[ 3.714111] smpboot: native_kick_ap: bad cpu 482
[ 3.718173] smpboot: native_kick_ap: bad cpu 483
[ 3.722113] smpboot: native_kick_ap: bad cpu 484
[ 3.726108] smpboot: native_kick_ap: bad cpu 485
[ 3.730225] smpboot: native_kick_ap: bad cpu 486
[ 3.734115] smpboot: native_kick_ap: bad cpu 487
[ 3.738110] smpboot: native_kick_ap: bad cpu 488
[ 3.742136] smpboot: native_kick_ap: bad cpu 489
[ 3.749814] smpboot: native_kick_ap: bad cpu 490
[ 3.754200] smpboot: native_kick_ap: bad cpu 491
[ 3.758115] smpboot: native_kick_ap: bad cpu 492
[ 3.762117] smpboot: native_kick_ap: bad cpu 493
[ 3.766202] smpboot: native_kick_ap: bad cpu 494
[ 3.770116] smpboot: native_kick_ap: bad cpu 495
[ 3.774108] smpboot: native_kick_ap: bad cpu 496
[ 3.778179] smpboot: native_kick_ap: bad cpu 497
[ 3.782117] smpboot: native_kick_ap: bad cpu 498
[ 3.786113] smpboot: native_kick_ap: bad cpu 499

Also, mpstat didn't show any CPU after 255:
root@test500:~# mpstat -P 255
Linux 6.5.7-vdx (test500) 11/03/2023 _x86_64_ (500 CPU)

01:05:56 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:05:56 AM 255 0.00 0.00 0.03 0.01 0.00 0.00 1.04 0.00 0.00 98.92
root@test500:~# mpstat -P 256
Linux 6.5.7-vdx (test500) 11/03/2023 _x86_64_ (500 CPU)

01:05:59 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
root@test500:~# mpstat -P 257
Linux 6.5.7-vdx (test500) 11/03/2023 _x86_64_ (500 CPU)

01:06:01 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
root@test500:~#

anil (anilchabba) wrote :

root@test500:~# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 52 bits physical, 57 bits virtual
CPU(s): 500
On-line CPU(s) list: 0-255
Off-line CPU(s) list: 256-499
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 256
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 25
Model: 160
Model name: AMD EPYC 9754 128-Core Processor

anil (anilchabba) wrote :

it says Off-line CPU(s) list: 256-499
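
To turn the CPU lists from the lscpu output into counts, a small sketch can help. The helper name `count_cpu_list` is hypothetical, and it assumes the list format seen here: comma-separated entries that are either single numbers or `first-last` ranges.

```shell
#!/bin/sh
# Count the CPUs named in a list such as "0-255" or "0-3,8,10-11".
count_cpu_list() {
    total=0
    old_ifs=$IFS
    IFS=,
    for part in $1; do
        case "$part" in
            *-*) total=$(( total + ${part#*-} - ${part%-*} + 1 )) ;;
            *)   total=$(( total + 1 )) ;;
        esac
    done
    IFS=$old_ifs
    echo "$total"
}

echo "online:  $(count_cpu_list 0-255)"
echo "offline: $(count_cpu_list 256-499)"
```

For the guest above this gives 256 online and 244 offline CPUs, which together account for the 500 CPUs that lscpu reports.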

Amy Gou (goujm1) wrote :

Hi, Jeff and all, here is my updated log. The parameters were changed as below:

$ sudo qemu-system-x86_64 -M pc-q35-jammy-maxcpus -smp cpus=2,maxcpus=64 -enable-kvm -net none -m 128G -nographic -kernel /boot/vmlinuz -initrd /boot/initrd.img -chardev stdio,mux=on,id=char0 -mon chardev=char0,mode=readline -serial chardev:char0 -append "console=ttyS0"

Thanks.

Amy Gou (goujm1) wrote :
Jeff Lane  (bladernr) wrote :

I got this separately from Lenovo:
We tested qemu 6.2+dfsg-2ubuntu6, creating a VM with only 1 vCPU, and the results are in the attachment.

Revision history for this message
Jeff Lane  (bladernr) wrote :

Here's a further test.
We tested two scenarios with qemu Debian 1:6.2+dfsg-2ubuntu6.16~ppa2:

1. qemu-system-x86_64 -smp cpus=1,maxcpus=1: the results are in Attachment 2 (qemu_1cpu_1maxcpu.log)

2. qemu-system-x86_64 -M pc-i440fx-jammy-maxcpus -smp cpus=300,maxcpus=300: the results are in Attachment 3 (qemu_300cpu_300maxcpu.log)

Revision history for this message
Jeff Lane  (bladernr) wrote :

And here's the log with cpus=300,maxcpus=300.

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Thanks for the feedback.

I was kind of expecting this change to *not* be enough, so that confirms my suspicions (unfortunately). I'll have to dive deeper and take a better look at what's going on.

Revision history for this message
Jeff Lane  (bladernr) wrote :

Hi Sergio, do you or anyone else on the team have time to look into this? It is currently causing problems with certification on current systems with high-core-count CPUs, and it will continue to be an issue as new generations of CPUs debut this year.

And arguably we'd need this resolved for at least Jammy and Noble as once Noble launches those two will be the active certification targets.

Revision history for this message
Jeff Lane  (bladernr) wrote :

Lunar will be EOL before this is resolved.

Changed in qemu (Ubuntu Lunar):
status: New → Invalid
Revision history for this message
Jeff Lane  (bladernr) wrote :

Lunar will be EOL before this is resolved

Changed in lxd (Ubuntu Lunar):
status: New → Invalid
Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Hi Jeff,

I'll get back to this bug next week. Sorry about the delay, a bunch of other things are popping up on qemu-land :-/.

I'll take another look at the logs you posted and see what's going on. The fact that the backported patch isn't enough to bump the number of max CPUs is a bit concerning, TBH. I don't remember seeing any other upstream patch touching this area, but I will take another look. I'll keep you updated.

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Hi,

Just an update before I call it a day. I was able to partially reproduce the failure with "-smp cpus=300,maxcpus=300". I say partially because the logs posted by Jeff clearly indicate that there has been some sort of out-of-memory scenario, which didn't happen here for some reason. But the most interesting part of the log (the "smpboot: native_cpu_up: bad cpu NNN" messages) happened here.

I found the following interesting discussions upstream:

https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03562.html
https://<email address hidden>/

So yeah, there's really more to the problem than we initially thought. I will start my day tomorrow backporting some of the patches mentioned in the threads above and seeing how far I can get. There may likely be other patches involved, but it's hard to say now.

Something else that I'm planning to do is building a version of qemu from Noble (currently at 8.2.0, which is pretty recent) with a new machine type changing the maxcpus thing (like I did for Jammy). @Jeff, I'll put it in a PPA tomorrow and let you know, in case you can give it a try. Hopefully it will have much better support for what we're trying to achieve here.

Revision history for this message
Michael Tokarev (mjt+launchpad-tls) wrote :

Looking at the severity of this bug report. Critical, sure it is?..

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

On Thursday, January 18 2024, Michael Tokarev wrote:

> Looking at the severity of this bug report. Critical, sure it is?..

The severity is set to Critical more because of the importance of having
this feature in Jammy/Noble. It also reflects the fact that it's at the
top of my TODO list.

--
Sergio
GPG key ID: E92F D0B3 6B14 F1F4 D8E0 EB2F 106D A1C8 C3CB BF14

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Hi Jeff,

The situation is not looking very good for Jammy right now. The number and complexity of the upstream patches we would need to backport seem to be borderline unrealistic, and I haven't finished investigating things.

It looks better for Noble, which already has qemu 8.2 in -proposed.

Christian is unavailable this week and the next, but I will sync with him and try to decide the next steps for this problem.

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Jeff et al,

After exchanging some emails with a QEMU developer (David Woodhouse; kudos to him), I tried a different incantation that seemed to work here. Unfortunately, I don't have a bare-metal Jammy machine ready to test this, so I used a Mantic one. Here is the command line I used:

$ sudo qemu-system-x86_64 -M pc-q35-mantic-maxcpus,accel=kvm,kernel-irqchip=split -device intel-iommu,intremap=on -smp cpus=300,maxcpus=300 -enable-kvm -net none -m 4096M -nographic -kernel /boot/vmlinuz -initrd /boot/initrd.img -chardev stdio,mux=on,id=char0 -mon chardev=char0,mode=readline -serial chardev:char0 -append "console=ttyS0"

The trick here is the use of the following options:

-M pc-q35-mantic-maxcpus,accel=kvm,kernel-irqchip=split -device intel-iommu,intremap=on

This exposes an emulated Intel IOMMU with interrupt remapping enabled to the guest, which (as David explained) is a required step to make QEMU work well with more than 288 vCPUs. It's also necessary to provide more memory to the VM (hence the "-m 4096M" option), because the kernel needs it in order to properly allocate enough data structures to represent the 300 vCPUs we're asking for (otherwise, you will see a kernel panic complaining that there's not enough memory).
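
To make it harder to forget one of these options when trying different machine types or vCPU counts, the command line can be assembled by a small helper. This is only a sketch: build_qemu_cmd is a hypothetical name, the default of 4096M is simply the value used above, and the kernel and initrd paths are the ones from the command in this comment.

```shell
#!/bin/sh
# build_qemu_cmd MACHINE VCPUS [MEM]
# Print (do not run) a QEMU command line with the split irqchip and
# intel-iommu interrupt remapping options used in this comment.
# (Hypothetical helper for illustration only.)
build_qemu_cmd() {
    machine=$1
    vcpus=$2
    mem=${3:-4096M}   # assumed default, taken from the command above
    printf '%s ' \
        qemu-system-x86_64 \
        -M "${machine},accel=kvm,kernel-irqchip=split" \
        -device intel-iommu,intremap=on \
        -smp "cpus=${vcpus},maxcpus=${vcpus}" \
        -enable-kvm -net none \
        -m "$mem" \
        -nographic \
        -kernel /boot/vmlinuz -initrd /boot/initrd.img \
        -chardev stdio,mux=on,id=char0 \
        -mon chardev=char0,mode=readline \
        -serial chardev:char0 \
        -append console=ttyS0
    echo
}

# Usage:
#   build_qemu_cmd pc-q35-mantic-maxcpus 300
#   build_qemu_cmd pc-q35-jammy-maxcpus 384 8192M
```

The helper only prints the command, so it can be reviewed before being pasted after "sudo".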

I would like to ask if you guys can give this a try on a *Mantic* system that *actually* has more than 288 CPUs. My test system "only" has 12 CPUs...

The PPA where you can find this new QEMU build is the same as before:

https://launchpad.net/~sergiodj/+archive/ubuntu/qemu

The QEMU package version is 1:8.0.4+dfsg-1ubuntu3.23.10.3~ppa2.

When launching the VM, you can use one of the following machine types:

pc-i440fx-mantic-maxcpus Ubuntu 23.10 PC (i440FX + PIIX, maxcpus=1024, 1996)
pc-q35-mantic-maxcpus Ubuntu 23.10 PC (Q35 + ICH9, maxcpus=1024, 2009)
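
To check that the installed package really provides these types, the list printed by "qemu-system-x86_64 -M help" can be filtered. A minimal sketch, assuming the machine name is the first column of that output (as in the two lines above); the function name is hypothetical:

```shell
#!/bin/sh
# maxcpus_machine_types
# Read `qemu-system-x86_64 -M help` output on stdin and print the names
# of the machine types whose name ends in "-maxcpus".
# (Hypothetical helper for illustration only.)
maxcpus_machine_types() {
    awk '$1 ~ /-maxcpus$/ { print $1 }'
}

# Usage:
#   qemu-system-x86_64 -M help | maxcpus_machine_types
```

An empty result would suggest that the QEMU build in use does not carry the new machine types.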

Please let me know how it goes. If you can, please also test it using a real disk image.

I believe this is good news; it seems that we won't need to patch Mantic/Noble QEMUs. I still need to check Jammy's.

Thanks.

Revision history for this message
Jeff Lane  (bladernr) wrote :

The following tests were done by running:

sudo add-apt-repository -y ppa:checkbox-dev/stable
sudo apt-get -y install canonical-certification-server
/usr/lib/checkbox-provider-base/bin/virtualization.py --debug lxdvm

Jammy (GA 5.15): as a baseline, the test script fails and the qemu log shows this:
qemu-system-x86_64: Invalid SMP CPUs 384. The max CPUs supported by machine 'pc-q35-7.1' is 288

Jammy (HWE 6.5)
qemu-system-x86_64: Invalid SMP CPUs 384. The max CPUs supported by machine 'pc-q35-7.1' is 288

Mantic (GA 6.5):
VM Boots and the test is successful as-is.
Also tried the PPA version and that worked too.

Jammy was done with the stock qemu 1:6.2+dfsg-2ubuntu6.16

I tried cheating by installing the Mantic packages on Jammy, but that ran into dependency issues. I got stuck in a nightmare of dependency tracking, so I gave up and installed Mantic to test that one.

Tried your command on mantic after adding your PPA and it worked fine.

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Thanks, Jeff.

This is good news for Mantic and Noble.

As for Jammy, I understand you were having trouble deploying it. Maybe we could try deploying Focal and then manually upgrading to Jammy instead?

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Hi there,

I just wanted to give a quick update. I still have to perform a "real world" test using the server with more than 300 CPUs, but meanwhile I was able to test the Jammy QEMU from my PPA using a 12-core machine and it seems like we won't need to backport any patches there either. Here's the command line I used:

$ sudo qemu-system-x86_64 -M pc-q35-jammy-maxcpus,accel=kvm,kernel-irqchip=split -device intel-iommu,intremap=on -smp cpus=300,maxcpus=300 -enable-kvm -net none -m 4096M -nographic -kernel /boot/vmlinuz -initrd /boot/initrd.img -chardev stdio,mux=on,id=char0 -mon chardev=char0,mode=readline -serial chardev:char0 -append "console=ttyS0"

Upon booting and getting dropped into the initramfs shell, I did a "cat /proc/cpuinfo" and verified that the 300 vCPUs were correctly listed there.
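
Rather than reading through the whole file, the check can be reduced to counting the "processor" entries. A minimal sketch, assuming the x86 /proc/cpuinfo layout with one "processor : N" line per CPU; count_vcpus is a hypothetical name:

```shell
#!/bin/sh
# count_vcpus [FILE]
# Count the "processor" entries in a cpuinfo-style file
# (defaults to /proc/cpuinfo).
# (Hypothetical helper for illustration only.)
count_vcpus() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Usage inside the guest (also works in the initramfs shell):
#   count_vcpus
```

For the command line above, the expected result is 300.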

So yeah, good news :-).

description: updated
description: updated
Revision history for this message
Jeff Lane  (bladernr) wrote :

Hi Sergio, is there anything we can help with testing here?

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Hi Jeff,

Thanks for the offer. Actually there is!

- If you could test Noble and double/triple check that we don't need to do anything else there in order to support maxcpus, that'd be great. Just use the "ubuntu" machine type and check if QEMU starts correctly with more than 288 vCPUs.

- Once the Jammy/Mantic SRUs are accepted, we will need to perform a verification for both uploads. I can do it on my side, but it would be very helpful if you could also verify the uploads on your side.

Thanks.

tags: added: server-todo
Revision history for this message
Jeff Lane  (bladernr) wrote :

So after we talked on MM, I think maybe you and Simon have been testing
this instead, so for now, do I need to do this?

Thanks
Jeff

On Tue, Mar 19, 2024 at 3:10 PM Sergio Durigan Junior <
<email address hidden>> wrote:

> Hi Jeff,
>
> Thanks for the offer. Actually there is!
>
> - If you could test Noble and double/triple check that we don't need to
> do anything else there in order to support maxcpus, that'd be great.
> Just use the "ubuntu" machine type and check if QEMU starts correctly with
> more than 288 vCPUs.
>
> - Once the Jammy/Mantic SRUs are accepted, we will need to perform a
> verification for both uploads. I can do it on my side, but it would be
> very helpful if you could also verify the uploads on your side.
>
> Thanks.

Changed in qemu (Ubuntu Jammy):
status: New → In Progress
Changed in qemu (Ubuntu Mantic):
status: Confirmed → In Progress
Changed in qemu (Ubuntu Noble):
status: Confirmed → Fix Committed
Revision history for this message
Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello Jeff, or anyone else affected,

Accepted qemu into mantic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:8.0.4+dfsg-1ubuntu3.23.10.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-mantic to verification-done-mantic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-mantic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in qemu (Ubuntu Mantic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-mantic
Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Performing the verification on Mantic.

First, reproduce the problem.

# apt policy qemu-system-x86
qemu-system-x86:
  Installed: 1:8.0.4+dfsg-1ubuntu3.23.10.3
  Candidate: 1:8.0.4+dfsg-1ubuntu3.23.10.3
  Version table:
 *** 1:8.0.4+dfsg-1ubuntu3.23.10.3 500
        500 http://archive.ubuntu.com/ubuntu mantic-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1:8.0.4+dfsg-1ubuntu3.23.10.2 500
        500 http://security.ubuntu.com/ubuntu mantic-security/main amd64 Packages
     1:8.0.4+dfsg-1ubuntu3 500
        500 http://archive.ubuntu.com/ubuntu mantic/main amd64 Packages

# qemu-system-x86_64 -M pc-q35-jammy,accel=kvm,kernel-irqchip=split -device intel-iommu,intremap=on -smp cpus=300,maxcpus=300 -enable-kvm -net none -m 4096M -nographic -kernel /boot/vmlinuz -initrd /boot/initrd.img -chardev stdio,mux=on,id=char0 -mon chardev=char0,mode=readline -serial chardev:char0 -append "console=ttyS0"
qemu-system-x86_64: Invalid SMP CPUs 300. The max CPUs supported by machine 'pc-q35-jammy' is 288

# qemu-system-x86_64 -M pc-q35-mantic,accel=kvm,kernel-irqchip=split -device intel-iommu,intremap=on -smp cpus=300,maxcpus=300 -enable-kvm -net none -m 4096M -nographic -kernel /boot/vmlinuz -initrd /boot/initrd.img -chardev stdio,mux=on,id=char0 -mon chardev=char0,mode=readline -serial chardev:char0 -append "console=ttyS0"
qemu-system-x86_64: Invalid SMP CPUs 300. The max CPUs supported by machine 'pc-q35-mantic' is 288

Now, verifying that the package from mantic-proposed fixes the issue:

# apt policy qemu-system-x86
qemu-system-x86:
  Installed: 1:8.0.4+dfsg-1ubuntu3.23.10.4
  Candidate: 1:8.0.4+dfsg-1ubuntu3.23.10.4
  Version table:
 *** 1:8.0.4+dfsg-1ubuntu3.23.10.4 100
        100 http://archive.ubuntu.com/ubuntu mantic-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     1:8.0.4+dfsg-1ubuntu3.23.10.3 500
        500 http://archive.ubuntu.com/ubuntu mantic-updates/main amd64 Packages
     1:8.0.4+dfsg-1ubuntu3.23.10.2 500
        500 http://security.ubuntu.com/ubuntu mantic-security/main amd64 Packages
     1:8.0.4+dfsg-1ubuntu3 500
        500 http://archive.ubuntu.com/ubuntu mantic/main amd64 Packages

# qemu-system-x86_64 -M pc-q35-jammy-maxcpus,accel=kvm,kernel-irqchip=split -device intel-iommu,intremap=on -smp cpus=300,maxcpus=300 -enable-kvm -net none -m 4096M -nographic -kernel /boot/vmlinuz -initrd /boot/initrd.img -chardev stdio,mux=on,id=char0 -mon chardev=char0,mode=readline -serial chardev:char0 -append "console=ttyS0"
< all worked as expected >

# qemu-system-x86_64 -M pc-q35-mantic-maxcpus,accel=kvm,kernel-irqchip=split -device intel-iommu,intremap=on -smp cpus=300,maxcpus=300 -enable-kvm -net none -m 4096M -nographic -kernel /boot/vmlinuz -initrd /boot/initrd.img -chardev stdio,mux=on,id=char0 -mon chardev=char0,mode=readline -serial chardev:char0 -append "console=ttyS0"
qemu-system-x86_64: Invalid SMP CPUs 300. The max CPUs supported by machine 'pc-q35-mantic-maxcpus' is 288

As can be seen above, there is a problem with the new pc-q35-mantic-maxcpus machine type. As such, I am tagging this bug as verification-failed-mantic and will upload a fix for the issue.
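
When repeating this kind of verification across several machine types, it can help to pull the limit out of the error message instead of comparing the output by eye. A minimal sketch, assuming the exact error wording shown above; machine_cpu_limit is a hypothetical name:

```shell
#!/bin/sh
# machine_cpu_limit
# Read QEMU's stderr output on stdin and print the vCPU limit reported in
# an "Invalid SMP CPUs" error. Prints nothing if no such error is present.
# (Hypothetical helper for illustration only.)
machine_cpu_limit() {
    sed -n "s/.*The max CPUs supported by machine '[^']*' is \([0-9][0-9]*\).*/\1/p"
}

# Usage on a saved log (the file name is only an example):
#   machine_cpu_limit < qemu-stderr.log
```

For the failing pc-q35-mantic-maxcpus run above this prints 288, which shows that the new machine type was still using the old limit.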

tags: added: verification-failed-mantic
removed: verification-needed verification-needed-mantic
Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Fixed package uploaded to Mantic: qemu_8.0.4+dfsg-1ubuntu3.23.10.5_source.changes

Revision history for this message
Timo Aaltonen (tjaalton) wrote :

Hello Jeff, or anyone else affected,

Accepted qemu into mantic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:8.0.4+dfsg-1ubuntu3.23.10.5 in a few hours, and then in the -proposed repository.

tags: added: verification-needed verification-needed-mantic
removed: verification-failed-mantic
Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Performing the verification on Mantic.

First, verifying that we can reproduce the problem.

# apt policy qemu-system-x86
qemu-system-x86:
  Installed: 1:8.0.4+dfsg-1ubuntu3.23.10.3
  Candidate: 1:8.0.4+dfsg-1ubuntu3.23.10.3
  Version table:
 *** 1:8.0.4+dfsg-1ubuntu3.23.10.3 500
        500 http://archive.ubuntu.com/ubuntu mantic-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1:8.0.4+dfsg-1ubuntu3.23.10.2 500
        500 http://security.ubuntu.com/ubuntu mantic-security/main amd64 Packages
     1:8.0.4+dfsg-1ubuntu3 500
        500 http://archive.ubuntu.com/ubuntu mantic/main amd64 Packages

# qemu-system-x86_64 -M pc-q35-jammy,accel=kvm,kernel-irqchip=split -device intel-iommu,intremap=on -smp cpus=300,maxcpus=300 -enable-kvm -net none -m 4096M -nographic -kernel /boot/vmlinuz -initrd /boot/initrd.img -chardev stdio,mux=on,id=char0 -mon chardev=char0,mode=readline -serial chardev:char0 -append "console=ttyS0"
qemu-system-x86_64: Invalid SMP CPUs 300. The max CPUs supported by machine 'pc-q35-jammy' is 288

# qemu-system-x86_64 -M pc-q35-mantic,accel=kvm,kernel-irqchip=split -device intel-iommu,intremap=on -smp cpus=300,maxcpus=300 -enable-kvm -net none -m 4096M -nographic -kernel /boot/vmlinuz -initrd /boot/initrd.img -chardev stdio,mux=on,id=char0 -mon chardev=char0,mode=readline -serial chardev:char0 -append "console=ttyS0"
qemu-system-x86_64: Invalid SMP CPUs 300. The max CPUs supported by machine 'pc-q35-mantic' is 288

Now, updating the package and verifying that the version from -proposed fixes the issue:

# apt policy qemu-system-x86
qemu-system-x86:
  Installed: 1:8.0.4+dfsg-1ubuntu3.23.10.5
  Candidate: 1:8.0.4+dfsg-1ubuntu3.23.10.5
  Version table:
 *** 1:8.0.4+dfsg-1ubuntu3.23.10.5 100
        100 http://archive.ubuntu.com/ubuntu mantic-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     1:8.0.4+dfsg-1ubuntu3.23.10.3 500
        500 http://archive.ubuntu.com/ubuntu mantic-updates/main amd64 Packages
     1:8.0.4+dfsg-1ubuntu3.23.10.2 500
        500 http://security.ubuntu.com/ubuntu mantic-security/main amd64 Packages
     1:8.0.4+dfsg-1ubuntu3 500
        500 http://archive.ubuntu.com/ubuntu mantic/main amd64 Packages

# qemu-system-x86_64 -M pc-q35-jammy-maxcpus,accel=kvm,kernel-irqchip=split -device intel-iommu,intremap=on -smp cpus=300,maxcpus=300 -enable-kvm -net none -m 4096M -nographic -kernel /boot/vmlinuz -initrd /boot/initrd.img -chardev stdio,mux=on,id=char0 -mon chardev=char0,mode=readline -serial chardev:char0 -append "console=ttyS0"
qemu-system-x86_64: warning: Number of SMP cpus requested (300) exceeds the recommended cpus supported by KVM (12)
qemu-system-x86_64: warning: Number of hotpluggable cpus requested (300) exceeds the recommended cpus supported by KVM (12)
...
<no QEMU error>

# qemu-system-x86_64 -M pc-q35-mantic-maxcpus,accel=kvm,kernel-irqchip=split -device intel-iommu,intremap=on -smp cpus=300,maxcpus=300 -enable-kvm -net none -m 4096M -nographic -kernel /boot/vmlinuz -initrd /boot/initrd.img -chardev stdio,mux=on,id=char0 -mon chardev=char0,mode=readline -serial chardev:char0 -append "console=ttyS0"
qemu-system-x86_64: warning: Number of SMP cpus requested (300) ex...

tags: added: verification-done verification-done-mantic
removed: verification-needed verification-needed-mantic
Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (qemu/1:8.0.4+dfsg-1ubuntu3.23.10.5)

All autopkgtests for the newly accepted qemu (1:8.0.4+dfsg-1ubuntu3.23.10.5) for mantic have finished running.
The following regressions have been reported in tests triggered by the package:

livecd-rootfs/23.10.57 (ppc64el)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/mantic/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:8.2.1+ds-1ubuntu8

---------------
qemu (1:8.2.1+ds-1ubuntu8) noble; urgency=medium

  * d/p/u/lp2012763-maxcpus-too-low.patch: Actually set the max_cpus
    property of the new Mantic machine types. (LP: #2012763)

 -- Sergio Durigan Junior <email address hidden> Mon, 25 Mar 2024 14:58:39 -0400

Changed in qemu (Ubuntu Noble):
status: Fix Committed → Fix Released
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Accepting this into jammy-proposed is pending verification of https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/2046439, which is already in jammy-proposed. Alternatively, you can do a new upload to jammy-unapproved covering both bugs in the changes file, but that will then require a new verification of https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/2046439

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Thank you, Andreas.

I will wait for bug #2046439 to clear -proposed.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:8.0.4+dfsg-1ubuntu3.23.10.5

---------------
qemu (1:8.0.4+dfsg-1ubuntu3.23.10.5) mantic; urgency=medium

  * d/p/u/lp2012763-maxcpus-too-low.patch: Actually set the max_cpus
    property of the new Mantic machine types. (LP: #2012763)

qemu (1:8.0.4+dfsg-1ubuntu3.23.10.4) mantic; urgency=medium

  * d/p/u/lp2012763-maxcpus-too-low.patch: Bump max_cpus to 1024 on
    amd64. (LP: #2012763)

 -- Sergio Durigan Junior <email address hidden> Mon, 25 Mar 2024 14:54:06 -0400

Changed in qemu (Ubuntu Mantic):
status: Fix Committed → Fix Released
Revision history for this message
Andreas Hasenack (ahasenack) wrote : Update Released

The verification of the Stable Release Update for qemu has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Andreas Hasenack (ahasenack) wrote : Please test proposed package

Hello Jeff, or anyone else affected,

Accepted qemu into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:6.2+dfsg-2ubuntu6.19 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in qemu (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-jammy
removed: verification-done
Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (qemu/1:6.2+dfsg-2ubuntu6.19)

All autopkgtests for the newly accepted qemu (1:6.2+dfsg-2ubuntu6.19) for jammy have finished running.
The following regressions have been reported in tests triggered by the package:

cinder/2:20.3.1-0ubuntu1.1 (amd64)
systemd/249.11-0ubuntu3.12 (arm64, armhf, s390x)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/jammy/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Performing the verification on Jammy.

First, verifying that we can reproduce the problem.

# apt policy qemu-system-x86
qemu-system-x86:
  Installed: 1:6.2+dfsg-2ubuntu6.18
  Candidate: 1:6.2+dfsg-2ubuntu6.18
  Version table:
 *** 1:6.2+dfsg-2ubuntu6.18 500
        500 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1:6.2+dfsg-2ubuntu6.16 500
        500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
     1:6.2+dfsg-2ubuntu6 500
        500 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages

# qemu-system-x86_64 -M pc-q35-jammy,accel=kvm,kernel-irqchip=split -device intel-iommu,intremap=on -smp cpus=300,maxcpus=300 -enable-kvm -net none -m 4096M -nographic -kernel /boot/vmlinuz -initrd /boot/initrd.img -chardev stdio,mux=on,id=char0 -mon chardev=char0,mode=readline -serial chardev:char0 -append "console=ttyS0"
qemu-system-x86_64: Invalid SMP CPUs 300. The max CPUs supported by machine 'pc-q35-jammy' is 288

Now, updating the package and verifying that the version from -proposed fixes the issue:

# apt policy qemu-system-x86
qemu-system-x86:
  Installed: 1:6.2+dfsg-2ubuntu6.19
  Candidate: 1:6.2+dfsg-2ubuntu6.19
  Version table:
 *** 1:6.2+dfsg-2ubuntu6.19 500
        500 http://archive.ubuntu.com/ubuntu jammy-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     1:6.2+dfsg-2ubuntu6.18 500
        500 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
     1:6.2+dfsg-2ubuntu6.16 500
        500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
     1:6.2+dfsg-2ubuntu6 500
        500 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages

# qemu-system-x86_64 -M pc-q35-jammy-maxcpus,accel=kvm,kernel-irqchip=split -device intel-iommu,intremap=on -smp cpus=300,maxcpus=300 -enable-kvm -net none -m 4096M -nographic -kernel /boot/vmlinuz -initrd /boot/initrd.img -chardev stdio,mux=on,id=char0 -mon chardev=char0,mode=readline -serial chardev:char0 -append "console=ttyS0"
...
<no QEMU error>

This concludes the verification on Jammy.

tags: added: verification-done verification-done-jammy
removed: verification-needed verification-needed-jammy
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:6.2+dfsg-2ubuntu6.19

---------------
qemu (1:6.2+dfsg-2ubuntu6.19) jammy; urgency=medium

  * d/p/u/lp2012763-maxcpus-too-low.patch: Bump max_cpus to 1024 on
    amd64. (LP: #2012763)

 -- Sergio Durigan Junior <email address hidden> Mon, 18 Mar 2024 16:38:25 -0400

Changed in qemu (Ubuntu Jammy):
status: Fix Committed → Fix Released
Revision history for this message
Michael Reed (mreed8855) wrote :

We have a certification submission on Jammy that still hits this issue. In this case lxd virtual machines are hitting this bug.

Version
qemu-system-x86 1:6.2+dfsg-2ubuntu6.19

The system is a dual-socket EPYC 9734: 112 cores and 224 threads per socket, for 448 threads in total.

https://certification.canonical.com/hardware/202403-33531/submission/367206/

ERROR:root:Command lxc start testbed returned a code of 1
ERROR:root: STDOUT:
ERROR:root: STDERR: Error: Failed to run: forklimits limit=memlock:unlimited:unlimited fd=3 fd=4 -- /snap/lxd/27037/bin/qemu-system-x86_64 -S -name testbed -uuid b6e33a92-bd24-4217-80a8-f2971626c7b3 -daemonize -cpu host,hv_passthrough -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=allow,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/testbed/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/testbed/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/testbed/qemu.pid -D /var/snap/lxd/common/lxd/logs/testbed/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd: : exit status 1
Try `lxc info --show-log testbed` for more info

Changed in qemu (Ubuntu Jammy):
status: Fix Released → In Progress
Revision history for this message
Simon Déziel (sdeziel) wrote :

@Michael, could you provide which LXD version you are running? The LXD snap rev you are using (27037) doesn't seem to be the latest available and we, in theory, have fixed the issue in LXD 5.0/stable so maybe the fix is just a refresh away.

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Thanks for the reply, Simon.

I was about to say exactly the same thing. To the extent of my knowledge, the issue has been (at least partially) addressed on LXD, so it should be possible to launch VMs with more than 288 vCPUs with it.

Either way, this bug has been fixed in QEMU so I am going to set its status back to Fix Released.

Thanks.

Changed in qemu (Ubuntu Jammy):
status: In Progress → Fix Released
Revision history for this message
Michael Reed (mreed8855) wrote :

Hi Simon

My apologies for the alarm, I will have the team that submitted this issue refresh the snap and re-run the test.

         VERSION REVISION
lxd 5.0.3-ffb17cf 27037
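
For certification logs it may be handy to record the snap revision automatically. A minimal sketch, assuming the column layout of "snap list" (name, version, revision, ...) matches the excerpt above; lxd_snap_revision is a hypothetical name:

```shell
#!/bin/sh
# lxd_snap_revision
# Read `snap list` output on stdin and print the revision of the lxd snap
# (third column of the row whose first column is "lxd").
# (Hypothetical helper for illustration only.)
lxd_snap_revision() {
    awk '$1 == "lxd" { print $3 }'
}

# Usage:
#   snap list lxd | lxd_snap_revision
```

Comparing the value before and after refreshing the snap shows whether the refresh actually picked up a newer revision.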

Revision history for this message
Brian Murray (brian-murray) wrote :

Ubuntu 23.10 (Mantic Minotaur) has reached end of life, so this bug will not be fixed for that specific release.

Changed in lxd (Ubuntu Mantic):
status: New → Won't Fix