This is actually a known upstream bug [1]; however, it affects OpenStack deployments that use the OVS L2 agent, so I am also opening this LP bug to provide more context and findings.
A massive stream of kernel routing table updates (and the resulting rtnetlink notifications) causes the ovs-vswitchd process to burn CPU. This in turn means ovs-vswitchd can't keep up with the updates, causing OpenStack VM connectivity issues. I have seen this problem on a customer's production compute nodes (Focal/Ussuri), where it was affecting VM connectivity.
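A quick, hypothetical way to observe the rtnetlink update stream that ovs-vswitchd has to process (this assumes iproute2 is installed; it is not part of the original report):

```shell
# Count the routes currently in the kernel tables, then count the rtnetlink
# route events (RTM_NEWROUTE/RTM_DELROUTE) seen over a 5-second window.
ip route show table all | wc -l
timeout 5 ip monitor route | wc -l
```

While the reproducer loop below is running, the second command prints a large number, matching the NETLINK_ROUTE wakeups seen in the log excerpt below.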
The /var/log/openvswitch/ovs-vswitchd.log contains the following messages indicating the problem:
2023-10-18T14:27:46.912Z|04626|poll_loop|INFO|Dropped 4 log messages in last 11 seconds (most recently, 11 seconds ago) due to excessive rate
2023-10-18T14:27:46.912Z|04627|poll_loop|INFO|wakeup due to [POLLIN] on fd 43 (127.0.0.1:49868<->127.0.0.1:6633) at ../lib/stream-fd.c:157 (97% CPU usage)
2023-10-18T14:27:46.912Z|04628|poll_loop|INFO|wakeup due to [POLLIN] on fd 17 (NETLINK_ROUTE<->NETLINK_ROUTE) at ../lib/netlink-socket.c:1401 (97% CPU usage)
2023-10-18T14:27:46.913Z|04629|rconn|INFO|br-int<->tcp:127.0.0.1:6633: connecting...
2023-10-18T14:27:46.914Z|04630|rconn|INFO|br-data<->tcp:127.0.0.1:6633: connecting...
2023-10-18T14:27:47.986Z|02093|ovs_rcu(urcu4)|WARN|blocked 1001 ms waiting for main to quiesce
2023-10-18T14:27:48.985Z|02094|ovs_rcu(urcu4)|WARN|blocked 2000 ms waiting for main to quiesce
2023-10-18T14:27:50.986Z|02095|ovs_rcu(urcu4)|WARN|blocked 4001 ms waiting for main to quiesce
2023-10-18T14:27:54.985Z|02096|ovs_rcu(urcu4)|WARN|blocked 8000 ms waiting for main to quiesce
2023-10-18T14:27:57.817Z|04631|timeval|WARN|Unreasonably long 10905ms poll interval (3817ms user, 6811ms system)
2023-10-18T14:27:57.817Z|04632|timeval|WARN|context switches: 35 voluntary, 2722 involuntary
2023-10-18T14:27:57.817Z|04633|poll_loop|INFO|Dropped 2 log messages in last 11 seconds (most recently, 11 seconds ago) due to excessive rate
2023-10-18T14:27:57.817Z|04634|poll_loop|INFO|wakeup due to [POLLIN] on fd 43 (127.0.0.1:49868<->127.0.0.1:6633) at ../lib/stream-fd.c:157 (97% CPU usage)
2023-10-18T14:27:57.817Z|04635|poll_loop|INFO|wakeup due to [POLLIN] on fd 17 (NETLINK_ROUTE<->NETLINK_ROUTE) at ../lib/netlink-socket.c:1401 (97% CPU usage)
2023-10-18T14:27:57.818Z|04636|rconn|INFO|br-int<->tcp:127.0.0.1:6633: connected
2023-10-18T14:27:57.818Z|04637|rconn|INFO|br-data<->tcp:127.0.0.1:6633: connected
2023-10-18T14:27:58.910Z|02097|ovs_rcu(urcu4)|WARN|blocked 1000 ms waiting for main to quiesce
2023-10-18T14:27:59.910Z|02098|ovs_rcu(urcu4)|WARN|blocked 2000 ms waiting for main to quiesce
2023-10-18T14:28:01.910Z|02099|ovs_rcu(urcu4)|WARN|blocked 4000 ms waiting for main to quiesce
2023-10-18T14:28:05.910Z|02100|ovs_rcu(urcu4)|WARN|blocked 8000 ms waiting for main to quiesce
2023-10-18T14:28:08.830Z|04638|timeval|WARN|Unreasonably long 11012ms poll interval (3972ms user, 6725ms system)
2023-10-18T14:28:08.830Z|04639|timeval|WARN|context switches: 32 voluntary, 2304 involuntary
2023-10-18T14:28:08.830Z|04640|poll_loop|INFO|Dropped 1 log messages in last 11 seconds (most recently, 11 seconds ago) due to excessive rate
2023-10-18T14:28:08.830Z|04641|poll_loop|INFO|wakeup due to [POLLIN] on fd 43 (127.0.0.1:49868<->127.0.0.1:6633) at ../lib/stream-fd.c:157 (97% CPU usage)
There is an ovs-discuss mailing list thread [2] that explains the reasons and mechanics of this behaviour. Even though [1] and [2] mention a BGP full table update as the trigger for the faulty situation, I was able to reproduce the issue without BGP at all, just by updating the kernel routing table in a loop.
The simplest reproducer, on pure Ubuntu 20.04.6 LTS, kernel 5.4.0-162-generic, openvswitch-switch 2.13.8-0ubuntu1.2, 1 vCPU:
1. sudo apt install openvswitch-switch
2. for i in {10..25}; do for k in {0..20}; do for j in {1..254}; do sudo ip route add 10.1$i.$k.$j/32 via <an IP on local network>; done; done; done
3. wait a few minutes while watching /var/log/openvswitch/ovs-vswitchd.log
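For scale, the nested loop in step 2 inserts (25-10+1) * (20-0+1) * 254 host routes, which is enough to keep the rtnetlink notification stream busy for minutes on a 1-vCPU machine:

```shell
# Total /32 routes inserted by the step-2 loop:
# 16 values of i  *  21 values of k  *  254 values of j
echo $(( 16 * 21 * 254 ))
# → 85344
```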
And the reproducer steps on an OpenStack nova compute host (Focal/Ussuri), Ubuntu 20.04.4 LTS, kernel 5.4.0-164-generic, openvswitch-switch 2.13.8-0ubuntu1.2, 4 vCPUs:
1. insert thousands of routing entries on the compute host (this had run for 90 minutes before I cancelled it):
for i in {10..25}; do for k in {0..255}; do for j in {1..254}; do sudo ip route add 10.1$i.$k.$j/32 via <an IP from local net>; done; done; done
2. in a second session, observe that ovs-vswitchd is consuming a lot of CPU cycles; also watch /var/log/openvswitch/ovs-vswitchd.log for the messages provided above.
3. sometimes, an attempt to schedule a new VM to this compute node fails because the VM's port can't be added to the bridge:
os server create --image cirros --boot-from-volume 3 --flavor m1.tiny --key-name testkey --network private --host juju-5ef7f4-octavia-20.cloud.sts --os-compute-api-version 2.74 cirr111
nova-compute.log contains:
2023-10-18 14:28:31.840 2612 ERROR nova.virt.libvirt.driver [req-7a6b767b-c7e0-4671-a1b9-48c0b68d397e 93eeeebc2920450faaaf2395443505fd d6761dacbb0649189a07a4a1a191a8c0 - 6a4439622e71431c8b96073e33c3b7e1 6a4439622e71431c8b96073e33c3b7e1] [instance: 8bbdc758-83c8-43d6-a3ab-d620c21c53a4] Failed to start libvirt guest: libvirt.libvirtError: internal error: Unable to add port tap32029b67-2a to OVS bridge br-int
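For scale, the insertion loop in step 1 above, if run to completion, would add roughly a million host routes (it was cancelled after about 90 minutes, so only a fraction was actually inserted):

```shell
# Total /32 routes the step-1 loop would insert:
# 16 values of i  *  256 values of k  *  254 values of j
echo $(( 16 * 256 * 254 ))
# → 1040384
```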
[1] https://github.com/openvswitch/ovs-issues/issues/185
[2] https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052092.html