Comment 3 for bug 1978088

LIU Yulong (dragon889) wrote :

I guess you have hit the Neutron scaling issue. Bug https://bugs.launchpad.net/neutron/+bug/1813703 is a summary that covers many of the issues that appear when you have many compute nodes or many resources on one host. From my personal experience, the problem can be overcome by restarting the ovs-agent and L3-agent on fewer than 20 hosts at a time. The neutron-server side runs many time-consuming DB queries, which is probably the main cause of this. So please try restarting the agents in batches of up to 20 hosts during your agent restart procedure.
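A rough sketch of such a batched restart, in case it helps. This is only an illustration and not a tested procedure: the helper names (batches, restart_agents, rolling_restart), the SSH access, the systemd unit names and the 60 second pause between batches are all assumptions that will need adjusting for your deployment.

```python
import subprocess
import time
from typing import Callable, Iterator, List, Sequence


def batches(hosts: Sequence[str], size: int = 20) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` hosts."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(hosts), size):
        yield list(hosts[start:start + size])


def restart_agents(host: str) -> None:
    """Restart the OVS and L3 agents on one host.

    Assumption: passwordless SSH and these systemd unit names.
    Unit names differ between distributions and deployment tools.
    """
    subprocess.run(
        ["ssh", host, "systemctl", "restart",
         "neutron-openvswitch-agent", "neutron-l3-agent"],
        check=True,
    )


def rolling_restart(
    hosts: Sequence[str],
    restart: Callable[[str], None] = restart_agents,
    size: int = 20,
    pause_seconds: float = 60.0,
) -> None:
    """Restart agents batch by batch, pausing after each batch.

    The pause is an assumed placeholder that gives neutron-server time
    to finish its DB queries; a real procedure would rather wait until
    the agents in the batch report as alive again.
    """
    for group in batches(hosts, size):
        for host in group:
            restart(host)
        time.sleep(pause_seconds)
```

For example, rolling_restart(["compute-%03d" % i for i in range(200)]) would go through 200 hypothetical hosts in 10 batches of 20.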
Another problem is that I'm not sure why the L2 pop notifications are missing or out of order during the ovs-agent restart. RPC loop iterations after the first (rpc_loop > 0) will not start until the neutron-server side has sent all of the L2 pop information, because update_device_list is an RPC "call" function, which blocks until it returns devices_up, devices_down and so on, and notify_l2pop_port_wiring is invoked inside it. So it may be worth checking why the DB query lost the required information.