Default gateway for LXD containers cannot be influenced/changed
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | High | Unassigned |
Bug Description
Sometimes you may wish to use a different interface as your default gateway, but this is not currently possible for LXD containers. In addition, the default gateway of the "default space" is not used: Juju appears to use the first space in the database, or some other arbitrary ordering.
Currently, with the MAAS provider, you can specify a different interface as the default gateway using the following command (this is not exposed in the Web UI):
$ maas skc interface set-default-gateway wftkqx 914
You specify the machine ID (wftkqx) and the ID of the link that you want to be the default gateway (914). For testing purposes, the following command makes it easy to identify the link IDs for a given machine's interfaces:
maas skc nodes read | jq '.[] | . as $parent | .interface_set[] | "system_
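The jq command above is truncated in this report. As a rough illustration of the same idea, here is a hypothetical Python sketch that walks the kind of JSON `maas <profile> nodes read` returns and prints each interface's link IDs. The sample structure (`system_id`, `interface_set`, `links`) follows the MAAS 2.x API as I understand it, but verify the field names against your MAAS version:

```python
import json

# Hypothetical sample resembling `maas <profile> nodes read` output;
# field names are assumptions, check them against your MAAS version.
sample = json.loads("""
[{"system_id": "wftkqx",
  "interface_set": [
    {"name": "eth0", "links": [{"id": 914, "subnet": {"cidr": "10.0.0.0/24"}}]},
    {"name": "eth1", "links": [{"id": 915, "subnet": {"cidr": "10.0.1.0/24"}}]}
  ]}]
""")

# Print one line per link so the link ID to pass to
# `interface set-default-gateway` is easy to spot.
for node in sample:
    for iface in node["interface_set"]:
        for link in iface["links"]:
            print(node["system_id"], iface["name"],
                  "link_id:", link["id"],
                  "subnet:", link["subnet"]["cidr"])
```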
Unfortunately this logic does not extend to LXD containers deployed by Juju. Juju appears to always use the "first" space as the default gateway; by "first" I am guessing it is the order the space appears in the database, or similar.
In my setup I have spaces "vsw0" and "vsw1" (in that order in the MAAS interface and in "juju spaces" output). When deploying a container with both spaces on a host for which MAAS has the default gateway set to vsw1, vsw0 is always chosen as the container gateway. This happens even when you set a different "default space":
juju deploy percona-cluster --bind "vsw1 shared-db=vsw0" -n2 --to lxd,lxd
The main reason this causes real problems is that containers currently don't attempt any kind of source routing. So when a container is contacted on one interface, the default gateway of another interface is used for the reply, which breaks networking in some setups.
This could potentially also be solved by Bug #1737428, which allows different default routes based on the traffic's "Source IP". Even without that support, however, it makes sense to simply be able to specify the default route for the container: at the very least by honouring the default space, or, if that is a problem for some reason, through some other option.
tags: added: sts
description: updated
Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.5.1
Changed in juju:
milestone: 2.5.1 → 2.5.2
Changed in juju:
milestone: 2.5.2 → 2.5.3
Changed in juju:
milestone: 2.5.3 → 2.5.4
Changed in juju:
milestone: 2.5.4 → 2.5.5
Changed in juju:
milestone: 2.5.6 → 2.5.8
Changed in juju:
milestone: 2.5.8 → 2.5.9
Since you linked the bug I created about multi-homing, you might already know the workarounds, but I will summarize them just in case.
One way to work around the problem is using source-based policy routing (as you mentioned) for receiving TCP traffic. For sending traffic, static routes have to be used, as the destination host has to be known to direct traffic to the right outbound hop.
I suspect that using a charm to set policy rules can be a problem if the "first space" is not the one that needs to be used to contact the Juju controller from a machine/unit agent, and the controller is not on the same L2 segment (i.e. in a different subnet). This case could be quite relevant for L3 leaf-spine deployments with Juju HA enabled.
Regardless of how this is applied (cloud-init or charm), the following could be used:
1) With TCP (even with an unbound listening socket, 0.0.0.0/INADDR_ANY; see man 2 listen and man 7 ip), you can rely on the fact that a connected socket of your TCP server will use the source address that the client specified as the destination address. When the client creates its own socket (5-tuple) to establish a TCP connection, it does not expect the source address of a response packet to magically change. Unless there is a broken NAT configuration, the receiving host with the TCP server uses received_packet.destination_addr as connected_socket.source_addr.
This allows you to avoid static routes and handle "unknown sender" scenarios correctly for receiving traffic with the following rules:
CIDR=192.168.1.0/24  # e.g. if you have eth1 <-> 192.168.1.10
# add a default route to a different table
ip route add default via $GATEWAY table $TABLE
# add a policy rule to use per-interface routing tables
# without hitting rp_filter by using asymmetric routing
ip rule add from $CIDR table $TABLE priority $PRIORITY
The trick is that a request will come to 192.168.1.10 from, say, 1.1.1.1, and the response source address will be selected as 192.168.1.10. The TCP server's kernel will then inspect the response packet's source address and forward it via the $GATEWAY in $TABLE. This might be counter-intuitive, as it is the locally generated response that is subject to the policy rule, not the request.
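The source-address behaviour this relies on is easy to check locally. Below is a minimal Python sketch (loopback only, no routing involved): a TCP server listens on 0.0.0.0, yet the accepted (connected) socket reports the exact address the client dialed as its local source address:

```python
import socket
import threading

# Server bound to INADDR_ANY on a kernel-chosen port.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 0))
srv.listen(1)
port = srv.getsockname()[1]

# Client connects to a concrete address, not 0.0.0.0.
t = threading.Thread(
    target=lambda: socket.create_connection(("127.0.0.1", port)).close())
t.start()
conn, peer = srv.accept()
t.join()

# The connected socket's local (source) address is the destination the
# client used, not the wildcard the listener was bound to.
local_addr = conn.getsockname()[0]
print(local_addr)  # 127.0.0.1
conn.close()
srv.close()
```

With real interfaces, that source address is what the `ip rule from $CIDR` match keys on when routing the response.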
A simple charm that could be used for that lives here (it can be improved to avoid hard-coding the interface):
https://git.launchpad.net/~canonical-bootstack/charm-policy-routing/tree/hooks/config-changed
https://jujucharms.com/u/canonical-bootstack/policy-routing
2) For UDP and unbound sockets (INADDR_ANY), the problem is that you only have one receiving (listening) socket and no connected socket. Your UDP server's kernel figures out which source address to use during sendto(2) execution (getsockname(2) would show the result). This is nicely summarized here: http://laforge.gnumonks.org/blog/20171020-local_ip_unbound_udp/
Fortunately, most of our workloads are TCP, so we do not hit that problem very often. For OpenStack deployments, designate-bind might be problematic if multiple interfaces are used for its container.
3) For sending traffic, either static routes or VRF + SO_BINDTODEVICE have to be used, as you either have to know exactly how to route to a given end h...