This is actually a known upstream bug [1]; however, it affects OpenStack deployments that use the OVS L2 agent, so I am also opening this LP bug to provide more context and findings.
A massive stream of kernel routing table updates (and the resulting rtnetlink notifications) causes the ovs-vswitchd process to burn CPU. This in turn means ovs-vswitchd can't keep up with the updates, causing OpenStack VM connectivity issues. I have seen this problem on a customer's production compute nodes (Focal/Ussuri), where it was affecting VM connectivity.
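A quick, hypothetical way to observe the rtnetlink update stream that ovs-vswitchd has to process (this assumes iproute2 is installed; it is not part of the original report):

```shell
# Count the routes currently in the kernel tables, then count the rtnetlink
# route events (RTM_NEWROUTE/RTM_DELROUTE) seen over a 5-second window.
ip route show table all | wc -l
timeout 5 ip monitor route | wc -l
```

While the reproducer loop below is running, the second command prints a large number, matching the NETLINK_ROUTE wakeups seen in the log excerpt below.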
The /var/log/openvswitch/ovs-vswitchd.log contains the following messages indicating the problem:
2023-10-18T14:27:46.912Z|04626|poll_loop|INFO|Dropped 4 log messages in last 11 seconds (most recently, 11 seconds ago) due to excessive rate
2023-10-18T14:27:46.912Z|04627|poll_loop|INFO|wakeup due to [POLLIN] on fd 43 (127.0.0.1:49868<->127.0.0.1:6633) at ../lib/stream-fd.c:157 (97% CPU usage)
2023-10-18T14:27:46.912Z|04628|poll_loop|INFO|wakeup due to [POLLIN] on fd 17 (NETLINK_ROUTE<->NETLINK_ROUTE) at ../lib/netlink-socket.c:1401 (97% CPU usage)
2023-10-18T14:27:46.913Z|04629|rconn|INFO|br-int<->tcp:127.0.0.1:6633: connecting...
2023-10-18T14:27:46.914Z|04630|rconn|INFO|br-data<->tcp:127.0.0.1:6633: connecting...
2023-10-18T14:27:47.986Z|02093|ovs_rcu(urcu4)|WARN|blocked 1001 ms waiting for main to quiesce
2023-10-18T14:27:48.985Z|02094|ovs_rcu(urcu4)|WARN|blocked 2000 ms waiting for main to quiesce
2023-10-18T14:27:50.986Z|02095|ovs_rcu(urcu4)|WARN|blocked 4001 ms waiting for main to quiesce
2023-10-18T14:27:54.985Z|02096|ovs_rcu(urcu4)|WARN|blocked 8000 ms waiting for main to quiesce
2023-10-18T14:27:57.817Z|04631|timeval|WARN|Unreasonably long 10905ms poll interval (3817ms user, 6811ms system)
2023-10-18T14:27:57.817Z|04632|timeval|WARN|context switches: 35 voluntary, 2722 involuntary
2023-10-18T14:27:57.817Z|04633|poll_loop|INFO|Dropped 2 log messages in last 11 seconds (most recently, 11 seconds ago) due to excessive rate
2023-10-18T14:27:57.817Z|04634|poll_loop|INFO|wakeup due to [POLLIN] on fd 43 (127.0.0.1:49868<->127.0.0.1:6633) at ../lib/stream-fd.c:157 (97% CPU usage)
2023-10-18T14:27:57.817Z|04635|poll_loop|INFO|wakeup due to [POLLIN] on fd 17 (NETLINK_ROUTE<->NETLINK_ROUTE) at ../lib/netlink-socket.c:1401 (97% CPU usage)
2023-10-18T14:27:57.818Z|04636|rconn|INFO|br-int<->tcp:127.0.0.1:6633: connected
2023-10-18T14:27:57.818Z|04637|rconn|INFO|br-data<->tcp:127.0.0.1:6633: connected
2023-10-18T14:27:58.910Z|02097|ovs_rcu(urcu4)|WARN|blocked 1000 ms waiting for main to quiesce
2023-10-18T14:27:59.910Z|02098|ovs_rcu(urcu4)|WARN|blocked 2000 ms waiting for main to quiesce
2023-10-18T14:28:01.910Z|02099|ovs_rcu(urcu4)|WARN|blocked 4000 ms waiting for main to quiesce
2023-10-18T14:28:05.910Z|02100|ovs_rcu(urcu4)|WARN|blocked 8000 ms waiting for main to quiesce
2023-10-18T14:28:08.830Z|04638|timeval|WARN|Unreasonably long 11012ms poll interval (3972ms user, 6725ms system)
2023-10-18T14:28:08.830Z|04639|timeval|WARN|context switches: 32 voluntary, 2304 involuntary
2023-10-18T14:28:08.830Z|04640|poll_loop|INFO|Dropped 1 log messages in last 11 seconds (most recently, 11 seconds ago) due to excessive rate
2023-10-18T14:28:08.830Z|04641|poll_loop|INFO|wakeup due to [POLLIN] on fd 43 (127.0.0.1:49868<->127.0.0.1:6633) at ../lib/stream-fd.c:157 (97% CPU usage)
There is an ovs-discuss mailing list thread [2] that explains the reasons and mechanics of this behaviour. Even though [1] and [2] mention a BGP full table update as the trigger for the faulty situation, I was able to reproduce the issue without BGP at all, just by updating the kernel routing table in a loop.
The simplest reproducer, on pure Ubuntu 20.04.6 LTS, kernel 5.4.0-162-generic, openvswitch-switch 2.13.8-0ubuntu1.2, 1 vCPU:
1. sudo apt install openvswitch-switch
2. for i in {10..25}; do for k in {0..20}; do for j in {1..254}; do sudo ip route add 10.1$i.$k.$j/32 via <an IP on local network>; done; done; done
3. wait a few minutes while watching /var/log/openvswitch/ovs-vswitchd.log
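For scale, the nested loop in step 2 inserts (25-10+1) * (20-0+1) * 254 host routes, which is enough to keep the rtnetlink notification stream busy for minutes on a 1-vCPU machine:

```shell
# Total /32 routes inserted by the step-2 loop:
# 16 values of i  *  21 values of k  *  254 values of j
echo $(( 16 * 21 * 254 ))
# → 85344
```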
And the reproducer steps on an OpenStack nova compute host (Focal/Ussuri), Ubuntu 20.04.4 LTS, kernel 5.4.0-164-generic, openvswitch-switch 2.13.8-0ubuntu1.2, 4 vCPUs:
1. insert thousands of routing entries on the compute host (this had run for 90 minutes before I cancelled it):
for i in {10..25}; do for k in {0..255}; do for j in {1..254}; do sudo ip route add 10.1$i.$k.$j/32 via <an IP from local net>; done; done; done
2. in a second session, observe that ovs-vswitchd is consuming a lot of CPU cycles; also watch /var/log/openvswitch/ovs-vswitchd.log for the messages provided above.
3. sometimes, an attempt to schedule a new VM to this compute node fails because the VM's port can't be added to the bridge:
os server create --image cirros --boot-from-volume 3 --flavor m1.tiny --key-name testkey --network private --host juju-5ef7f4-octavia-20.cloud.sts --os-compute-api-version 2.74 cirr111
nova-compute.log contains:
2023-10-18 14:28:31.840 2612 ERROR nova.virt.libvirt.driver [req-7a6b767b-c7e0-4671-a1b9-48c0b68d397e 93eeeebc2920450faaaf2395443505fd d6761dacbb0649189a07a4a1a191a8c0 - 6a4439622e71431c8b96073e33c3b7e1 6a4439622e71431c8b96073e33c3b7e1] [instance: 8bbdc758-83c8-43d6-a3ab-d620c21c53a4] Failed to start libvirt guest: libvirt.libvirtError: internal error: Unable to add port tap32029b67-2a to OVS bridge br-int
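For scale, the insertion loop in step 1 above, if run to completion, would add roughly a million host routes (it was cancelled after about 90 minutes, so only a fraction was actually inserted):

```shell
# Total /32 routes the step-1 loop would insert:
# 16 values of i  *  256 values of k  *  254 values of j
echo $(( 16 * 256 * 254 ))
# → 1040384
```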
[1] https://github.com/openvswitch/ovs-issues/issues/185
[2] https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052092.html