Ubuntu
openvswitch package

Massive stream of kernel routing table updates causes ovs-vswitchd process to burn CPU

Bug #2039768 reported by Marcin Wilk on 2023-10-19

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	openvswitch (Ubuntu)	New	Undecided	Unassigned

Bug Description

This is actually a known upstream bug [1] however it affects OpenStack deployments with the OVS l2. Therefore I am opening also this LP bug to provide more context/findings.
Massive update stream of the kernel routing tables causes (the rtnetlink notifications) the ovs-vswitchd process to burn CPU. This in turn causes that the ovs-vswitchd can't keep up the the updates causing OpenStack VM connectivity issues. I have seen this problem on the customer's production compute nodes (Focal/Ussuri) and this was affecting VM connectivity.

There is the ovs mailing list discussion thread [2] that explains the reasons and mechanics of this behaviour. Even though [1] and [2] mention BGB full table update as a trigger for the faulty situation, I was able to reproduce the issue without BGB at all. Just updating the kernel routing table in the loop.

The simplest reproducer on pure Ubuntu 20.04.6 LTS, kernel 5.4.0-162-generic, openvswitch-switch 2.13.8-0ubuntu1.2, 1 vcpu:
1. sudo apt install openvswitch-switch
2. for i in {10..25}; do for k in {0..20}; do for j in {1..254}; do sudo ip route add 10.1$i.$k.$j/32 via <an IP on local network>; done; done; done
3. wait a few minutes and keep watching the /var/log/openvswitch/ovs-vswitchd.log and the ovs-vswitchd consuming CPU.

And the reproducer steps on the OpenStack nova compute host (Focal/Ussuri), Ubuntu 20.04.4 LTS, kernel 5.4.0-164-generic, openvswitch-switch 2.13.8-0ubuntu1.2, 4 vcpu
1. insert thousands of routing entries on a compute host (this took 90 minutes before I cancelled it):
for i in {10..25}; do for k in {0..255}; do for j in {1..254}; do sudo ip route add 10.1$i.$k.$j/32 via <an IP from local net>; done; done; done
2. on a second session observe the ovs-vswitchd is consuming a lots of CPU cycles, also watch the /var/log/openvswitch/ovs-vswitchd.log for the messages I provided above.
3. sometimes, an attempt to schedule a new VM to this compute node fails because it can't add the VM's port to the bridge.
os server create --image cirros --boot-from-volume 3 --flavor m1.tiny --key-name testkey --network private --host juju-5ef7f4-octavia-20.cloud.sts --os-compute-api-version 2.74 cirr111

nova-compute.log contains:
: libvirt.libvirtError: internal error: Unable to add port tap32029b67-2a to OVS bridge br-int
2023-10-18 14:28:31.840 2612 ERROR nova.virt.libvirt.driver [req-7a6b767b-c7e0-4671-a1b9-48c0b68d397e 93eeeebc2920450faaaf2395443505fd d6761dacbb0649189a07a4a1a191a8c0 - 6a4439622e71431c8b96073e33c3b7e1 6a4439622e71431c8b96073e33c3b7e1] [instance: 8bbdc758-83c8-43d6-a3ab-d620c21c53a4] Failed to start libvirt guest: libvirt.libvirtError: internal error: Unable to add port tap32029b67-2a to OVS bridge br-int

[1] https://github.com/openvswitch/ovs-issues/issues/185
[2] https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052092.html

See original description

Marcin Wilk (wilkmarcin) on 2023-10-19

description:

updated

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.

Ubuntuopenvswitch package

Massive stream of kernel routing table updates causes ovs-vswitchd process to burn CPU

Bug Description

Other bug subscribers

Remote bug watches

Ubuntu
openvswitch package