Static routes in subnet are ignored in multihomed deployments with policy based routing
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Fix Released | Medium | Lee Trager |
2.6 | Won't Fix | Medium | Lee Trager |
2.7 | In Progress | Medium | Lee Trager |
2.8 | Fix Released | Medium | Adam Collard |
Bug Description
[Impact]
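Static routes defined in a subnet that is set up for policy routing (that is, a subnet with a secondary gateway) are ignored for traffic sourced from that subnet, because the policy rule pointing at the gateway's table has higher priority than the main table where the static route is installed.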
[Test case]
Having two subnets:
1. 192.168.122.0/24 with gateway 192.168.122.1
2. 10.0.1.0/24 with gateway 10.0.1.1 + static route: 10.0.2.0/24 via 10.0.1.2
Deploying a machine with a NIC in each subnet creates the following netplan config:
...
routes:
- metric: 0
  to: 10.0.2.0/24
  via: 10.0.1.2
- table: 1
  to: 0.0.0.0/0
  via: 10.0.1.1
routing-policy:
- from: 10.0.1.0/24
  priority: 100
  table: 1
- from: 10.0.1.0/24
  table: 254
  to: 10.0.1.0/24
...
Which ends up looking like this:
ubuntu@maas-node:~$ ip r
default via 192.168.122.1 dev ens3 proto static
10.0.1.0/24 dev ens1 proto kernel scope link src 10.0.1.252
10.0.2.0/24 via 10.0.1.2 dev ens1 proto static
192.168.122.0/24 dev ens3 proto kernel scope link src 192.168.122.210
ubuntu@maas-node:~$ ip rule
0: from all lookup local
0: from 10.0.1.0/24 to 10.0.1.0/24 lookup main
100: from 10.0.1.0/24 lookup 1
32766: from all lookup main
32767: from all lookup default
ubuntu@maas-node:~$ ip r show table 1
default via 10.0.1.1 dev ens1 proto static
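This setup gives the subnet's default gateway (10.0.1.1) higher priority than the static route's next hop (10.0.1.2) for any traffic sourced from 10.0.1.0/24: rule 100 sends that traffic to table 1, which contains only the default route, so the static route in the main table is never consulted. To test this, assume the machine has the following IPs: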
ubuntu@maas-node:~$ ip a| grep ens
2: ens3: <BROADCAST,
inet 192.168.122.210/24 brd 192.168.122.255 scope global ens3
3: ens1: <BROADCAST,
inet 10.0.1.252/24 brd 10.0.1.255 scope global ens1
Asking "ip" to resolve the route with and without an explicit source address, we can see how the next hop varies:
ubuntu@maas-node:~$ ip r get 10.0.2.1
10.0.2.1 via 10.0.1.2 dev ens1 src 10.0.1.252 uid 1000
cache
ubuntu@maas-node:~$ ip r get 10.0.2.1 from 10.0.1.252
10.0.2.1 from 10.0.1.252 via 10.0.1.1 dev ens1 table 1 uid 1000
cache
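The two lookups above can be reproduced with a small model of how policy rules are evaluated. This is only a sketch, not kernel or MAAS code: the `lookup` and `resolve` helpers and the tuple layouts are hypothetical, the `local` and `default` tables are left out, and on-link routes are represented by the placeholder string "direct". It assumes that rules are walked in priority order, that a "from" rule does not match when no source address has been chosen yet, and that the first table containing a matching route wins.

```python
import ipaddress

# Routing tables as (prefix, next_hop) pairs, copied from the "ip r" output above.
TABLES = {
    "main": [
        ("0.0.0.0/0", "192.168.122.1"),
        ("10.0.1.0/24", "direct"),
        ("10.0.2.0/24", "10.0.1.2"),
        ("192.168.122.0/24", "direct"),
    ],
    "1": [("0.0.0.0/0", "10.0.1.1")],
}

# Rules as (priority, from_prefix, to_prefix, table), copied from "ip rule".
RULES = [
    (0, "10.0.1.0/24", "10.0.1.0/24", "main"),
    (100, "10.0.1.0/24", None, "1"),
    (32766, None, None, "main"),
]


def lookup(table, dst):
    """Longest-prefix match of dst in one table; returns the next hop or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, via in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, via)
    return best[1] if best else None


def resolve(rules, tables, dst, src=None):
    """Walk the rules by priority; the first table with a matching route wins."""
    for _prio, from_net, to_net, table in sorted(rules, key=lambda r: r[0]):
        if from_net is not None:
            # A "from" selector cannot match while the source is still unset.
            if src is None:
                continue
            if ipaddress.ip_address(src) not in ipaddress.ip_network(from_net):
                continue
        if to_net is not None:
            if ipaddress.ip_address(dst) not in ipaddress.ip_network(to_net):
                continue
        via = lookup(tables.get(table, []), dst)
        if via is not None:
            return via
    return None


if __name__ == "__main__":
    print(resolve(RULES, TABLES, "10.0.2.1"))
    print(resolve(RULES, TABLES, "10.0.2.1", src="10.0.1.252"))
```

Without a source address the model falls through to the main table and picks the static route's next hop; with the source 10.0.1.252 it stops at rule 100 and returns the gateway from table 1, matching the two "ip r get" results.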
Results from userland applications also vary, depending on which interface they choose to use:
ubuntu@maas-node:~$ mtr 10.0.2.1 --report-cycles 1 --report
Start: 2020-05-
HOST: maas-node Loss% Snt Last Avg Best Wrst StDev
1.|-- 10.0.1.2 0.0% 1 1.0 1.0 1.0 1.0 0.0
2.|-- 10.0.2.1 0.0% 1 1.7 1.7 1.7 1.7 0.0
ubuntu@maas-node:~$ traceroute -n 10.0.2.1
traceroute to 10.0.2.1 (10.0.2.1), 30 hops max, 60 byte packets
1 10.0.1.1 0.719 ms 0.737 ms 0.729 ms
2 * * *
3 * * *
One way to fix this is to add the static route to the policy table (instead of main) and add a rule for it:
ubuntu@maas-node:~$ sudo ip rule add to 10.0.2.0/24 lookup 1 prio 100
ubuntu@maas-node:~$ sudo ip r add 10.0.2.0/24 via 10.0.1.2 dev ens1 table 1
ubuntu@maas-node:~$ traceroute -n 10.0.2.1
traceroute to 10.0.2.1 (10.0.2.1), 30 hops max, 60 byte packets
1 10.0.1.2 0.545 ms 0.487 ms 0.244 ms
2 10.0.2.1 0.990 ms 0.931 ms 0.862 ms
This leaves the configuration as follows:
ubuntu@maas-node:~$ ip r
default via 192.168.122.1 dev ens3 proto static
10.0.1.0/24 dev ens1 proto kernel scope link src 10.0.1.252
192.168.122.0/24 dev ens3 proto kernel scope link src 192.168.122.210
ubuntu@maas-node:~$ ip rule
0: from all lookup local
0: from 10.0.1.0/24 to 10.0.1.0/24 lookup main
100: from 10.0.1.0/24 lookup 1
100: from all to 10.0.2.0/24 lookup 1
32766: from all lookup main
32767: from all lookup default
ubuntu@maas-node:~$ ip r show table 1
default via 10.0.1.1 dev ens1 proto static
10.0.2.0/24 via 10.0.1.2 dev ens1
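The same idea can be expressed at the point where the netplan configuration is generated. The following is a minimal sketch of that approach, not the patch attached to this bug: `policy_routing_config` and its parameters are hypothetical names, and it assumes that every static route of a policy-routed subnet should go into that subnet's table together with a matching "to" rule. The dictionary keys ("to", "via", "table", "from", "priority") are netplan's `routes` and `routing-policy` keys, and 254 is the numeric ID of the main table.

```python
import json


def policy_routing_config(subnet_cidr, gateway, static_routes, table, priority=100):
    """Build netplan-style route and rule lists for one policy-routed subnet.

    static_routes is an iterable of (destination_cidr, next_hop) pairs.
    Hypothetical helper for illustration; not the MAAS implementation.
    """
    # The default route for the subnet lives in the dedicated table.
    routes = [{"to": "0.0.0.0/0", "via": gateway, "table": table}]
    policy = [
        # Traffic that stays inside the subnet keeps using the main table (254).
        {"from": subnet_cidr, "to": subnet_cidr, "table": 254},
        # Everything else sourced from the subnet uses the dedicated table.
        {"from": subnet_cidr, "table": table, "priority": priority},
    ]
    for dest, via in static_routes:
        # Put the static route in the policy table instead of main ...
        routes.append({"to": dest, "via": via, "table": table})
        # ... and add a rule so traffic to that destination consults the table.
        policy.append({"to": dest, "table": table, "priority": priority})
    return {"routes": routes, "routing-policy": policy}


if __name__ == "__main__":
    config = policy_routing_config(
        "10.0.1.0/24", "10.0.1.1", [("10.0.2.0/24", "10.0.1.2")], table=1
    )
    print(json.dumps(config, indent=2))
```

For the test case in this report, the generated lists correspond to the manually fixed state: table 1 holds both the default route and 10.0.2.0/24 via 10.0.1.2, and a priority 100 rule sends traffic to 10.0.2.0/24 to table 1.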
Related branches
- Adam Collard (community): Approve
- Diff: 160 lines (+57/-25), 2 files modified
  src/maasserver/preseed_network.py (+45/-20)
  src/maasserver/tests/test_preseed_network.py (+12/-5)
- Adam Collard (community): Approve
- MAAS Lander: Needs Fixing
- Diff: 160 lines (+57/-25), 2 files modified
  src/maasserver/preseed_network.py (+45/-20)
  src/maasserver/tests/test_preseed_network.py (+12/-5)
Changed in maas:
assignee: nobody → Lee Trager (ltrager)
Changed in maas:
status: New → In Progress
importance: Undecided → Medium
milestone: none → 2.8.0rc1
Changed in maas:
milestone: 2.8.0rc1 → 2.8.0
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
milestone: 2.8.0 → 2.9.0b1
Changed in maas:
status: Fix Committed → Fix Released
I forgot to mention that this happens from 2.6 up to 2.8. I'm attaching a patch for 2.8 that returns the expected results:
ubuntu@final:~$ ip r
default via 192.168.122.1 dev ens4 proto static
10.0.1.0/24 dev ens7 proto kernel scope link src 10.0.1.254
192.168.122.0/24 dev ens4 proto kernel scope link src 192.168.122.227
ubuntu@final:~$ ip r show table 1
default via 10.0.1.1 dev ens7 proto static
10.0.2.0/24 via 10.0.1.2 dev ens7 proto static
ubuntu@final:~$ ip rule
0: from all lookup local
0: from 10.0.1.0/24 to 10.0.1.0/24 lookup main
100: from 10.0.1.0/24 lookup 1
100: from all to 10.0.2.0/24 lookup 1
32766: from all lookup main
32767: from all lookup default
ubuntu@final:~$ ping 10.0.2.1 -c1
PING 10.0.2.1 (10.0.2.1) 56(84) bytes of data.
64 bytes from 10.0.2.1: icmp_seq=1 ttl=63 time=1.89 ms
--- 10.0.2.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.895/1.895/1.895/0.000 ms
ubuntu@final:~$ ping -I 10.0.1.254 10.0.2.1 -c1
PING 10.0.2.1 (10.0.2.1) from 10.0.1.254 : 56(84) bytes of data.
64 bytes from 10.0.2.1: icmp_seq=1 ttl=63 time=1.43 ms
--- 10.0.2.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.439/1.439/1.439/0.000 ms
ubuntu@final:~$ traceroute -n 10.0.2.1
traceroute to 10.0.2.1 (10.0.2.1), 30 hops max, 60 byte packets
1 10.0.1.2 0.521 ms 0.471 ms 0.436 ms
2 10.0.2.1 1.460 ms 1.435 ms 1.395 ms