Losing connectivity to instance with FloatingIP randomly
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
neutron | New | Undecided | Unassigned | |
Bug Description
I have a problem with randomly losing connectivity to instances via their floating IPs.
The instances are fully functional and can ping each other by private IP.
The broken network state appears and disappears randomly, but the state often changes after creating or deleting an instance in this private network or rebooting neutron-
Usually I lose connectivity to all instances that are in the same private network and placed on the same host. Instances on the same host but in a different network, or in the same network but on another host, work properly.
Live migration of an instance from one host to another usually restores the connection to the instance.
There are no errors in the neutron/nova/ovs logs.
In the problem state of the network:
tcpdump of ICMP packets on the instance/tap port shows that requests reach the instance and the instance answers them.
tcpdump on the router/qr port shows only requests but no answers.
In the working state of the network I see both requests and answers on the router/qr port.
I tried to dump the OVS flows in the problem and working network states, but I did not find any differences in the flows.
ovs-appctl ofproto/trace for the ICMP answers looks the same in the working and problem states and shows the correct output router port.
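For anyone repeating the flow comparison described above, here is a minimal Python sketch of one way to diff two saved flow dumps. It assumes the dumps are plain text in the usual `ovs-ofctl dump-flows` layout, and it strips only the per-flow timers and counters (`duration`, `n_packets`, `n_bytes`, `idle_age`, `hard_age`) so that they do not mask or fake a difference. The function names are hypothetical and the field list is an assumption, not something taken from this report.

```python
import re

# Timers and counters that differ between any two dumps of the same flow table.
# Assumption: these are the only volatile fields worth ignoring; the cookie is
# kept on purpose, because it can change when the OVS agent restarts.
VOLATILE = re.compile(
    r"\b(?:duration|n_packets|n_bytes|idle_age|hard_age)=[^,\s]+,?\s*"
)


def normalize(dump):
    """Return the set of flow lines with volatile fields removed."""
    flows = set()
    for line in dump.splitlines():
        line = line.strip()
        # Skip reply headers and blank lines: real flow entries carry actions.
        if "actions=" not in line:
            continue
        flows.add(VOLATILE.sub("", line).strip())
    return flows


def diff_flows(working_dump, broken_dump):
    """Return (flows only in working state, flows only in broken state)."""
    working = normalize(working_dump)
    broken = normalize(broken_dump)
    return sorted(working - broken), sorted(broken - working)
```

Two empty lists mean the flow tables match apart from counters, which would be consistent with the observation above that the flows look identical in both states.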
Environment:
I use HA, DVR routers with OVS
OpenStack Train Release installed by kolla
Ubuntu Xenial
Neutron 15.0.2 (the problem first appeared after upgrading from Rocky to Train with neutron 15.0.0, and persisted through 15.0.1 and now 15.0.2)
OVS version 2.12.0
Changing the firewall driver from OVS Hybrid to OVS Native does not help.
Please help me localize and troubleshoot this bug!
I have some questions about the issue:
1. Does your deployment have centralized floating IPs? Are the L3 agents on your compute nodes all in "dvr" mode?
2. What's your tenant network type? vlan, vxlan, or something else?
3. What's your external network type?
4. What's the bond mode of the physical NICs that carry your floating IP traffic on the compute hosts? Mode 6?
And if the config could be pasted here, that would help the team a lot in finding the problem.
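To make the first two answers easy to paste, the following Python sketch pulls the relevant options out of the agent configuration. It assumes the contents of `l3_agent.ini` and `ml2_conf.ini` have already been read into strings, and that the options are `agent_mode` in `[DEFAULT]` and `tenant_network_types` in `[ml2]`; the function names are hypothetical. The external network type and the bond mode are not stored in these files, so they are not covered here.

```python
import configparser


def read_option(ini_text, section, option):
    """Return one option from INI text, or None if it is not set."""
    # interpolation=None keeps literal '%' characters in values from raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    return parser.get(section, option, fallback=None)


def summarize(l3_agent_ini, ml2_conf_ini):
    """Collect the settings asked about in questions 1 and 2."""
    return {
        "l3_agent_mode": read_option(l3_agent_ini, "DEFAULT", "agent_mode"),
        "tenant_network_types": read_option(
            ml2_conf_ini, "ml2", "tenant_network_types"
        ),
    }
```

A value of `None` only means the option is not set explicitly in that file, so the service default applies.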