[L3-DVR] l3-agent ARP table is not updated

Bug #1968860 reported by uchenily
This bug affects 1 person
Affects: neutron
Status: New
Importance: Medium
Assigned to: Unassigned
Milestone: (not set)

Bug Description

When ports are created, updated, or deleted, the PERMANENT ARP entries in the qrouter namespace are not updated.

Take deleting a port as an example. When a port that is not associated with a floating IP is deleted, the callback method _notify_routers_callback() is called on the port AFTER_DELETE event. Because router_ids == [], the RPC call routers_updated() is not triggered, so the router is not updated and neither are the ARP tables.
(https://opendev.org/openstack/neutron/src/branch/stable/yoga/neutron/db/l3_db.py#L1995)
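
The skipped notification can be illustrated with a small stand-alone model. This is only a sketch of the behaviour described above, not neutron's code: the function name `notify_routers_callback` and its parameters are hypothetical, and the real callback in l3_db.py has a different signature.

```python
from typing import Callable, List, Optional


def notify_routers_callback(
    router_ids: Optional[List[str]],
    routers_updated: Callable[[List[str]], None],
) -> bool:
    """Hypothetical stand-in for the callback behaviour described in this bug.

    Returns True when the notification was sent, False when it was skipped.
    """
    if not router_ids:
        # Empty (or missing) router list: no RPC is sent, so no L3 agent
        # gets a chance to refresh the ARP entries in its qrouter namespace.
        return False
    routers_updated(router_ids)
    return True


# Record every "RPC" that would have been sent.
sent: List[List[str]] = []
skipped = notify_routers_callback([], sent.append)
delivered = notify_routers_callback(["router-1"], sent.append)
```

With an empty list nothing is recorded in `sent`, which mirrors the reported case where the port deletion never reaches the agent.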

In short, if we add PERMANENT ARP entries in the qrouter namespace (https://opendev.org/openstack/neutron/src/tag/20.0.0.0rc2/neutron/agent/l3/dvr_local_router.py#L349), we must make sure the ARP cache is updated properly.
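
For context, the kernel never ages out a neighbour entry in the permanent state, so it stays until something replaces or deletes it explicitly. The sketch below only builds the `ip neigh` command lines that would do that by hand inside a router namespace; it does not run them. The helper name `permanent_arp_cmd` and the router ID, device name, and addresses used in the example are made up for illustration, and this is not how the l3-agent itself is implemented.

```python
from typing import List


def permanent_arp_cmd(
    router_id: str, device: str, ip: str, mac: str, delete: bool = False
) -> List[str]:
    """Build (but do not run) an `ip neigh` command for a qrouter namespace."""
    netns = f"qrouter-{router_id}"
    prefix = ["ip", "netns", "exec", netns, "ip", "neigh"]
    if delete:
        # A permanent entry has to be removed explicitly.
        return prefix + ["del", ip, "dev", device]
    # `replace` creates the entry or overwrites an existing one.
    return prefix + ["replace", ip, "lladdr", mac, "dev", device, "nud", "permanent"]


# Hypothetical values, for illustration only.
add_cmd = permanent_arp_cmd("r1", "qr-example", "10.0.0.5", "fa:16:3e:00:00:01")
del_cmd = permanent_arp_cmd("r1", "qr-example", "10.0.0.5", "", delete=True)
print(" ".join(add_cmd))
print(" ".join(del_cmd))
```

If the delete command is never issued after a port is removed, the stale entry keeps pointing at the old MAC address.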

I also have another question: why not rely on MAC learning directly? Even with tunnel networks, MAC learning works well because of l2population. Adding permanent ARP entries brings a series of problems.

Tags: l3-dvr
uchenily (uchenily) added the tag l3-dvr and updated the description.
LIU Yulong (dragon889) wrote :

This is a really old story: it was introduced in the very original DVR implementation [1]. I guess there was no L2 pop with ARP responder at that time, so these ARP entries were added in order to restrain ARP broadcasts. On the other hand, L2 pop is not enabled for provider network types (vlan/flat), and in that case the ARP entry for the DVR gateway is necessary.

So maybe we could add a config option that lets each cloud decide whether to do this ARP entry work.

[1] https://review.opendev.org/c/openstack/neutron/+/89413/28/neutron/agent/l3_agent.py#666

uchenily (uchenily) wrote (edited):

The control plane ARP updates for DVR have been removed [1]. I think that may be the reason why the ARP table is not updated as expected.

The ARP entry for the DVR gateway is necessary, but other ARP entries, such as those for DHCP servers and VMs, are not as important. So maybe we can change this method [2] and remove the unnecessary parts. For example, get_ports_by_subnet(subnet_id) is heavy, and maybe we can remove it.

```
for p in self.get_snat_interfaces():
   self._update_arp_entry(...)
```
The ARP entry for the DVR gateway is added in both _set_subnet_arp_info() and external_gateway_added() [3]; this is all we need to keep.

[1] https://review.opendev.org/c/openstack/neutron/+/653883/7
[2] https://opendev.org/openstack/neutron/src/tag/20.0.0/neutron/agent/l3/dvr_local_router.py#L349
[3] https://opendev.org/openstack/neutron/src/tag/20.0.0/neutron/agent/l3/dvr_local_router.py#L629

uchenily (uchenily) wrote :

"DVR: Remove control plane arp updates for DVR" has been reverted.

Related bug:
https://bugs.launchpad.net/neutron/+bug/1916761

Changed in neutron:
importance: Undecided → Medium
Slawek Kaplonski (slaweq) wrote :

IIRC we need to populate ARP entries in the DVR routers' namespaces because, e.g., when routing between two different private networks with VMs on different nodes, we basically can't send ARP requests, as those will not come back properly to the correct node.
So we need to populate ARP entries in the router namespaces on all hosts.

@uchenily can you give the exact steps to reproduce the issue? I will then try to reproduce and check it locally.

uchenily (uchenily) wrote :

@Slawek Kaplonski, I forgot to mention that our environment was not running the latest code. The commit "DVR: Remove control plane arp updates for DVR" caused the ARP entries in the DVR router namespace not to be updated. That commit has since been reverted, so after I updated the code everything works normally.

In the scenario you mentioned above, I think l2pop will work if we attach a tunnel network to the DVR router, so persistent ARP entries are not necessary.
Besides, creating ARP entries during router sync takes a lot of time if the router is associated with a large number of subnets.

Maybe we should consider adding ARP entries only when a flat/vlan network is used.
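
A rough sketch of that idea, assuming the agent knows the network type of each port's network. The helper `needs_permanent_arp`, its parameters, and the `PROVIDER_TYPES` set are hypothetical and do not exist in neutron; the gateway exception follows the earlier comment that the DVR gateway entry is still needed. Whether l2pop alone is sufficient for tunnel networks is questioned earlier in this thread.

```python
# Network types for which L2 pop is not enabled, per the comments above.
PROVIDER_TYPES = frozenset({"flat", "vlan"})


def needs_permanent_arp(network_type: str, is_dvr_gateway: bool = False) -> bool:
    """Hypothetical policy: decide whether to add a permanent ARP entry."""
    if is_dvr_gateway:
        # The gateway entry is kept regardless of network type.
        return True
    return network_type.lower() in PROVIDER_TYPES


decisions = {
    "vlan": needs_permanent_arp("vlan"),
    "flat": needs_permanent_arp("flat"),
    "vxlan": needs_permanent_arp("vxlan"),
    "vxlan-gateway": needs_permanent_arp("vxlan", is_dvr_gateway=True),
}
```

Under this policy a router attached only to tunnel networks would skip the per-port entries during router sync and keep only the gateway entry.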
