When we restart the neutron-l3-agent, we observe that backup routers start accepting router advertisements. This creates routes with an expiry time inside the router namespace.
e.g.:
$ ip netns exec qrouter-a5f7fb32-3e30-4e15-89f9-4ae888c2cac6 ip -6 r
x:x:1002:1::/64 dev qr-72f85121-ce proto kernel metric 256 expires 86355sec pref medium
x:x:1002:1::/64 dev qr-4e84792f-aa proto kernel metric 256 expires 86355sec pref medium
fe80::/64 dev ha-9d085c9d-15 proto kernel metric 256 pref medium
default via fe80::f816:3eff:fed3:3fa6 dev qr-4e84792f-aa proto ra metric 1024 expires 255sec hoplimit 64 pref medium
default via fe80::f816:3eff:fed3:3fa6 dev qr-72f85121-ce proto ra metric 1024 expires 255sec hoplimit 64 pref medium
When we now fail over to such a backup router, the kernel does not create the necessary directly attached routes because they already exist. The problem is that those existing routes carry an expiry time. Because the router is now the master, the routes are no longer refreshed by router advertisements, so they expire after 24h, which breaks IPv6 for those routers.
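To spot affected namespaces more easily, something like the following could be used to pick out the routes that carry an expiry from the `ip -6 r` output. This is only a minimal sketch: the helper name `expiring_routes` is made up for illustration, and it assumes the iproute2 output format shown above ("expires <N>sec").

```python
import re

# Matches the "expires <N>sec" attribute printed for routes with a lifetime.
EXPIRES_RE = re.compile(r"\bexpires (\d+)sec\b")


def expiring_routes(ip_route_output):
    """Return (destination, device, seconds_left) for each route with an expiry."""
    found = []
    for line in ip_route_output.splitlines():
        match = EXPIRES_RE.search(line)
        if not match:
            continue  # permanent route, nothing to report
        fields = line.split()
        # The device name follows the "dev" keyword.
        device = fields[fields.index("dev") + 1] if "dev" in fields else None
        found.append((fields[0], device, int(match.group(1))))
    return found


# Sample taken from the namespace output in this report.
sample = """\
x:x:1002:1::/64 dev qr-72f85121-ce proto kernel metric 256 expires 86355sec pref medium
fe80::/64 dev ha-9d085c9d-15 proto kernel metric 256 pref medium
default via fe80::f816:3eff:fed3:3fa6 dev qr-72f85121-ce proto ra metric 1024 expires 255sec hoplimit 64 pref medium
"""

for destination, device, seconds in expiring_routes(sample):
    print(destination, device, seconds)
```

On a healthy master router we would expect the directly attached /64 routes on the qr- devices not to show up in this list.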
After digging a bit deeper into this issue, we found that the function [1] that disables accept_ra on the backup routers always returns false. As a result, backup routers never have router advertisement acceptance disabled.
master router:
$ ip netns exec qrouter-92ed5c1f-c705-4ab9-a0e1-56e905d43abd sysctl net.ipv6.conf.qr-c7eb60ab-f1.accept_ra
net.ipv6.conf.qr-c7eb60ab-f1.accept_ra = 1
backup router:
$ ip netns exec qrouter-92ed5c1f-c705-4ab9-a0e1-56e905d43abd sysctl net.ipv6.conf.qr-c7eb60ab-f1.accept_ra
net.ipv6.conf.qr-c7eb60ab-f1.accept_ra = 1
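For checking many routers at once, the sysctl line can be parsed with a few lines of Python. This is a rough sketch, not anything from neutron: the function names are hypothetical, and it relies only on the kernel's documented meaning of accept_ra (0 means router advertisements are not accepted, 1 and 2 mean they can be).

```python
def accept_ra_value(sysctl_line):
    """Parse a 'key = value' line as printed by sysctl and return the value."""
    key, _, value = sysctl_line.strip().partition("=")
    if not key.strip().endswith(".accept_ra") or not value.strip():
        raise ValueError("not an accept_ra sysctl line: %r" % sysctl_line)
    return int(value)


def ra_not_disabled(sysctl_line):
    """True if accept_ra is non-zero, i.e. RA processing was not switched off."""
    return accept_ra_value(sysctl_line) != 0


# Line taken from the backup router output in this report.
backup_line = "net.ipv6.conf.qr-c7eb60ab-f1.accept_ra = 1"
print(ra_not_disabled(backup_line))
```

For a backup router we would expect this check to report False once accept_ra is actually being disabled.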
[1] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/ha_router.py#L318