Comment 1 for bug 1781856

Dmitrii Shcherbakov (dmitriis) wrote :

Since you linked the bug I created about multi-homing, you might know the workarounds already, but I will summarize them just in case.

One way to work around the problem is to use source-based policy routing (as you mentioned) for receiving TCP traffic. For sending traffic, static routes have to be used, because the destination host has to be known in order to direct traffic to the right outbound hop.

I suspect that using a charm to set policy rules can be a problem if the "first space" is not the one needed to reach the Juju controller from a machine/unit agent and the controller is not on the same L2 segment (i.e. it is in a different subnet). This case could be quite relevant for L3 leaf-spine deployments with Juju HA enabled.

Regardless of how this is applied (cloud-init or charm), the following could be used:

1) With TCP (even when using an unbound listening socket - 0.0.0.0/INADDR_ANY, see man 2 listen and man 7 ip), you can rely on the fact that a connected socket of your TCP server will use as its source address the address that the client specified as the destination. When the client creates its own socket (5-tuple) to establish a TCP connection, it does not expect the source address of a response packet to magically change. Unless there is a broken NAT configuration, the receiving host with the TCP server uses received_packet.destination_addr as connected_socket.source_addr.

This allows you to avoid static routes and handle "unknown sender" scenarios correctly for receiving traffic with the following rules:

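CIDR=192.168.1.0/24   # e.g. if you have eth1 <-> 192.168.1.10
GATEWAY=192.168.1.1   # example value: the next hop on that subnet
TABLE=101             # example value: any unused routing table id
PRIORITY=1001         # example value: lower than 32766 so the rule is checked before the main table
ip route add default via $GATEWAY table $TABLE # add a default route to a separate table
# add a policy rule to use per-interface-subnet routing tables, avoiding the asymmetric routing that would trip rp_filter
ip rule add from $CIDR table $TABLE priority $PRIORITY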

The trick is that a request will come to 192.168.1.10 from, say, 1.1.1.1, and the response source address will be selected as 192.168.1.10. The TCP server's kernel will then match the source address of the response packet against the policy rule and route it via $GATEWAY in $TABLE. This might be counter-intuitive, as it is the locally generated response that is subject to the policy rule, not the request.

A simple charm that could be used for that lives here (it can be improved to avoid hard-coding the interface):
https://git.launchpad.net/~canonical-bootstack/charm-policy-routing/tree/hooks/config-changed
https://jujucharms.com/u/canonical-bootstack/policy-routing
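
The rules from item 1 can be repeated for every interface, one routing table per subnet. Below is a minimal sketch of that idea, not the charm's actual code: the helper name add_policy_route, the RUN variable, and all subnets, gateways, table ids and priorities are hypothetical. By default it only prints the commands (dry run).

```shell
#!/bin/sh
# Sketch: per-interface policy routing. RUN defaults to "echo" (dry run);
# set RUN="" and run as root to actually apply the commands.
RUN=${RUN-echo}

# add_policy_route CIDR GATEWAY TABLE PRIORITY
add_policy_route() {
    # default route in a dedicated table for this subnet
    $RUN ip route add default via "$2" table "$3"
    # packets sourced from this subnet consult that table first
    $RUN ip rule add from "$1" table "$3" priority "$4"
}

# Hypothetical subnets, gateways, table ids and priorities.
add_policy_route 192.168.1.0/24 192.168.1.1 101 1001
add_policy_route 192.168.2.0/24 192.168.2.1 102 1002
```

Once applied, "ip rule show" and "ip route show table 101" should list the new entries.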

2) For UDP and unbound sockets (INADDR_ANY), the problem is that you only have one receiving (listening) socket and no connected socket. The kernel on your UDP server picks the source address during sendto(2) execution, based on the route to the destination rather than on the address the request arrived at (getsockname(2) would return the result once the socket is connected). This is nicely summarized here: http://laforge.gnumonks.org/blog/20171020-local_ip_unbound_udp/
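
To check which source address the kernel would pick for such an unbound socket, the route lookup can be queried with "ip route get". The following is a rough sketch; the function names parse_src and route_src are made up for illustration, and it assumes iproute2 prints the chosen address after the "src" keyword.

```shell
#!/bin/sh
# parse_src: read `ip route get` output on stdin and print the "src" address.
parse_src() {
    sed -n 's/.* src \([^ ]*\).*/\1/p'
}

# route_src DEST: source address the kernel would choose for DEST when the
# sending socket has no bound address (requires iproute2 on the host).
route_src() {
    ip -o route get "$1" | parse_src
}
```

For example, "route_src 1.1.1.1" prints the address a reply to 1.1.1.1 would carry. If that differs from the address the client originally sent its request to, the client will most likely not match the reply to its request.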

Fortunately, most of our workloads are TCP, so we do not hit that problem very often. For OpenStack deployments, designate-bind might be problematic if multiple interfaces are used for its container.

3) For sending traffic, either static routes or VRF + SO_BINDTODEVICE have to be used: you either have to know exactly how to route to a given end host, or you have to bind the sending socket to a certain interface that is associated with a routing table through a VRF.
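
Both options from item 3 can be sketched with iproute2 as follows. This is only an illustration: the helper names, the RUN variable, the VRF name vrf-eth1, the table id and all addresses are hypothetical. By default the commands are printed rather than executed.

```shell
#!/bin/sh
# RUN defaults to "echo" (dry run); set RUN="" and run as root to apply.
RUN=${RUN-echo}

# Option A: static route for a known destination.
# add_static_route DEST_CIDR GATEWAY DEV
add_static_route() {
    $RUN ip route add "$1" via "$2" dev "$3"
}

# Option B: create a VRF tied to a routing table and enslave an interface.
# add_vrf VRF_NAME TABLE DEV
add_vrf() {
    $RUN ip link add "$1" type vrf table "$2"
    $RUN ip link set dev "$1" up
    $RUN ip link set dev "$3" master "$1"
}

# Hypothetical values for illustration only.
add_static_route 10.20.0.0/16 192.168.1.1 eth1
add_vrf vrf-eth1 101 eth1
```

With option B, the application would then bind its sending socket to the VRF device with SO_BINDTODEVICE, or be started through "ip vrf exec vrf-eth1 <command>".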