MicroStack Servers Have No Network Egress

Bug #1865892 reported by Joseph Phillips
This bug affects 3 people
Affects      Status     Importance  Assigned to  Milestone
MicroStack   Confirmed  High        Unassigned

Bug Description

After running a dist-upgrade (on Bionic), MicroStack servers can be created, but they have no network egress.

I observed this issue with both the beta and edge channels.

Current Snap info:
https://pastebin.ubuntu.com/p/FRYrk9d3dc/

systemctl status 'snap.microstack.*':
https://pastebin.ubuntu.com/p/7FSQ2RPDs3/

Current networks:
https://pastebin.ubuntu.com/p/NwZb2tcn9D/

Subnets:
https://pastebin.ubuntu.com/p/WqcBrfcBBq/

Pen Gale (pengale) wrote :

Thank you for filing the bug report!

Marking as confirmed, as I've also observed the issue, though I don't have a reproducer set up in our testing environment just yet.

Changed in microstack:
importance: Undecided → High
status: New → Confirmed
Joseph Phillips (manadart) wrote :

I did a fresh installation using the latest edge Snap and --devmode.

I am still experiencing the issue, though the service issues look somewhat different.

systemctl status 'snap.microstack.*':
https://pastebin.ubuntu.com/p/ZkXghyN5yh/

mariole (keller-eric) wrote :

I have the same issue.
Basically, a fresh install of MicroStack followed by init seems to work fine, but after I power off the PC and start the VM again, the VM boots successfully yet lacks egress access: it cannot ping google.com or run apt update.

I checked the services; they seem to run fine, and there is no obvious error in the logs.

```
ip a

3: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ea:68:ad:69:5e:67 brd ff:ff:ff:ff:ff:ff
4: br-tun: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 5a:15:ac:4f:1c:49 brd ff:ff:ff:ff:ff:ff
5: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 76:69:92:6c:23:4d brd ff:ff:ff:ff:ff:ff
    inet 10.20.20.1/24 scope global br-ex
       valid_lft forever preferred_lft forever
    inet6 fe80::7469:92ff:fe6c:234d/64 scope link
       valid_lft forever preferred_lft forever
6: br-int: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default qlen 1000
    link/ether 52:94:ec:32:f6:4c brd ff:ff:ff:ff:ff:ff
16: tapfd4e7dc3-6d: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel master ovs-system state UNKNOWN group default qlen 1000
    link/ether fe:16:3e:ac:d5:6a brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc16:3eff:feac:d56a/64 scope link
       valid_lft forever preferred_lft forever
```

It seems that `brctl show` returns empty output:
```
sudo brctl show
bridge name bridge id STP enabled interfaces
```
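`brctl` only lists Linux kernel bridges, while br-ex, br-int and br-tun are Open vSwitch bridges, so an empty `brctl show` is expected here and does not by itself point to a problem. Interfaces attached to the Open vSwitch kernel datapath instead show `master ovs-system` in `ip link` output, as the instance's tap device does in the `ip a` output above. A minimal sketch that lists such interfaces from `ip -o link show`; the function name `list_ovs_ports` is hypothetical:

```shell
#!/bin/bash
# Print the names of interfaces attached to the Open vSwitch datapath.
# Reads `ip -o link show` output (one interface per line) on stdin.
list_ovs_ports() {
  awk '/ master ovs-system / {
    name = $2
    sub(/:$/, "", name)   # "tapfd4e7dc3-6d:" -> "tapfd4e7dc3-6d"
    sub(/@.*/, "", name)  # drop "@peer" suffixes on veth-style names
    print name
  }'
}

# Usage on the host:
#   ip -o link show | list_ovs_ports
```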

Additionally, the iptables rules look like this:
```
# Generated by iptables-save v1.6.1 on Mon Apr 20 22:52:16 2020
*raw
:PREROUTING ACCEPT [164339:67231967]
:OUTPUT ACCEPT [161102:69855489]
COMMIT
# Completed on Mon Apr 20 22:52:16 2020
# Generated by iptables-save v1.6.1 on Mon Apr 20 22:52:16 2020
*mangle
:PREROUTING ACCEPT [164399:67246606]
:INPUT ACCEPT [162120:67082522]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [161160:69870024]
:POSTROUTING ACCEPT [161160:69870024]
COMMIT
# Completed on Mon Apr 20 22:52:16 2020
# Generated by iptables-save v1.6.1 on Mon Apr 20 22:52:16 2020
*filter
:INPUT ACCEPT [162122:67082626]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [161163:69870408]
COMMIT
# Completed on Mon Apr 20 22:52:16 2020
# Generated by iptables-save v1.6.1 on Mon Apr 20 22:52:16 2020
*nat
:PREROUTING ACCEPT [2310:170391]
:INPUT ACCEPT [33:6411]
:OUTPUT ACCEPT [1580:95288]
:POSTROUTING ACCEPT [1580:95288]
-A POSTROUTING -s 10.20.20.0/24 ! -d 10.20.20.0/24 -j MASQUERADE
COMMIT
# Completed on Mon Apr 20 22:52:16 2020
```
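The dump contains the MASQUERADE rule that setup-br-ex adds, and the FORWARD policy is ACCEPT. iptables stores source and destination in network form (10.20.20.0/24) even though the script passes the interface address (br-ex has 10.20.20.1/24), so a check has to use the network form. A minimal sketch for confirming the rule is still present, for example after a reboot; `has_masquerade` is a hypothetical helper name:

```shell
#!/bin/bash
# Succeed if the iptables-save output on stdin contains the NAT rule
# that setup-br-ex adds for the given network (e.g. 10.20.20.0/24).
has_masquerade() {
  local net=$1
  # -x: whole-line match, -F: fixed string, --: pattern starts with "-"
  grep -qxF -- "-A POSTROUTING -s $net ! -d $net -j MASQUERADE"
}

# Usage on the host:
#   sudo iptables-save -t nat | has_masquerade 10.20.20.0/24 \
#     && echo "MASQUERADE rule present" || echo "MASQUERADE rule missing"
```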

Any hints on how we can debug this further?

mariole (keller-eric) wrote :

Following the network troubleshooting guide: https://wiki.openstack.org/wiki/OpsGuide-Network-Troubleshooting

```
Finding a Failure in the Path
Use ping to quickly find where a failure exists in the network path. In an instance, first see whether you can ping an external host, such as google.com. If you can, then there shouldn't be a network problem at all.

If you can't, try pinging the IP address of the compute node where the instance is hosted. If you can ping this IP, then the problem is somewhere between the compute node and that compute node's gateway.

If you can't ping the IP address of the compute node, the problem is between the instance and the compute node. This includes the bridge connecting the compute node's main NIC with the vnet NIC of the instance.

One last test is to launch a second instance and see whether the two instances can ping each other. If they can, the issue might be related to the firewall on the compute node.
```
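The ping checks from the guide can be run in order from inside an instance. A minimal sketch, assuming `ping` (iputils) is available in the guest; `find_failure` and `probe` are hypothetical names, and the targets in the usage comment are placeholders:

```shell
#!/bin/bash
# Default probe: one ICMP echo request with a 2-second timeout.
probe() {
  ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Probe each target in order (most distant first, as in the guide) and
# stop at the first one that answers; the last unreachable target and
# the first reachable one bracket the failing segment.
find_failure() {
  local target
  for target in "$@"; do
    if probe "$target"; then
      echo "reachable: $target"
      return 0
    fi
    echo "unreachable: $target"
  done
  return 1
}

# Usage inside the instance (replace the placeholders):
#   find_failure google.com <compute-node-ip> <other-instance-ip>
```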

I can ping between two instances, and I can ping the compute node where I run MicroStack. Everything seems to point to "then the problem is somewhere between the compute node and that compute node's gateway."

So the next step was to use tcpdump:
```
tcpdump -i any -n -v 'icmp[icmptype] = icmp-echoreply or icmp[icmptype] = icmp-echo'
...
10.20.20.2 > 192.168.178.36: ICMP echo request, id 1588, seq 1, length 64
23:04:09.838803 IP (tos 0x0, ttl 64, id 8866, offset 0, flags [none], proto ICMP (1), length 84)
...
```

On the compute node hosting MicroStack, I could track the ICMP messages from the VM (10.20.20.2) to the node (192.168.178.36) when pinging by IP address.

In another test, a ping to 8.8.8.8 from the node was also visible in the capture:
```
192.168.222.6 > 8.8.8.8: ICMP echo request, id 1638, seq 1, length 64
23:07:05.330433 IP (tos 0x0, ttl 63, id 16531, offset 0, flags [DF], proto ICMP (1), length 84)
```

It seems this could also be related to DNS relaying, since pinging the IP address goes through but pinging google.com does not.

I also changed the VM's /etc/resolv.conf to use `nameserver 8.8.8.8`, but this did not change much.

```
sudo killall dnsmasq
sudo systemctl restart 'snap.microstack*'
```

This did not solve the issue either. :(
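Name resolution and routing can be told apart by resolving a name and pinging a literal IP address as two independent checks. A minimal sketch, assuming `getent` and `ping` exist in the guest; `diagnose_egress`, `resolves` and `pings` are hypothetical names, and google.com/8.8.8.8 are simply the targets used in this thread:

```shell
#!/bin/bash
# Wrappers so each check can be swapped out (e.g. in tests).
resolves() { getent hosts "$1" >/dev/null 2>&1; }
pings()    { ping -c 1 -W 2 "$1" >/dev/null 2>&1; }

# Classify an egress failure using one hostname and one literal IP.
diagnose_egress() {
  local name=${1:-google.com} ip=${2:-8.8.8.8}
  local dns=no route=no
  resolves "$name" && dns=yes
  pings "$ip" && route=yes
  case "$dns/$route" in
    yes/yes) echo "ok: DNS and routing both work" ;;
    no/yes)  echo "dns: $ip answers but $name does not resolve" ;;
    yes/no)  echo "route: $name resolves but $ip does not answer" ;;
    *)       echo "route: neither DNS nor $ip works (check routing/NAT first)" ;;
  esac
}
```

If the literal IP does not answer either, fixing resolv.conf cannot help, because queries to an external resolver such as 8.8.8.8 travel the same route.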

mariole (keller-eric) wrote :

It seems that quite a lot of code here is silently ignored if it does not work properly:

```
cat microstack/snap-overlay/bin/setup-br-ex
#!/bin/bash
#
# Oneshot daemon which creates a networking bridge.
#
# Creates br-ex, and sets up an ip address for it. We put this in a
# oneshot so that the ip address persists after reboot, without
# needing to add networking entries to the host system. (We want this
# to work well when we turn off classic confinement.)

set -ex

extcidr=$(snapctl get config.network.ext-cidr)

# Create external integration bridge
ovs-vsctl --retry --may-exist add-br br-ex

# Configure br-ex
ip address add $extcidr dev br-ex || :
ip link set br-ex up || :

sudo iptables -w -t nat -A POSTROUTING -s $extcidr ! \
     -d $extcidr -j MASQUERADE || :

exit 0
```

The `|| :` is equivalent to `|| true`: if the command fails, `:` runs and returns 0, so the failure is ignored.
https://superuser.com/questions/1022374/what-does-mean-in-the-context-of-a-shell-script
Is this really what we want here if part of the br-ex network setup fails?
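`:` is a shell builtin that does nothing and always exits 0, so `cmd || :` discards any failure of `cmd`. A minimal sketch of one alternative that keeps the script running (so it stays re-runnable) but still reports what failed; the `try` helper and the exit policy in the comments are hypothetical, not part of MicroStack:

```shell
#!/bin/bash
# `false || :` -> `:` runs after the failure, so the list exits 0.
false || :
echo "status after '|| :': $?"

failures=0

# Run a command; on failure, log it to stderr and count it instead of
# silently swallowing the error. Always returns 0, so a `set -e` script
# keeps going just as it does with `|| :`.
try() {
  "$@" && return 0
  local rc=$?
  echo "warning: '$*' failed with status $rc" >&2
  failures=$((failures + 1))
  return 0
}

# In setup-br-ex, idempotent forms would avoid counting "already exists"
# errors on re-runs, e.g.:
#   try ip address replace "$extcidr" dev br-ex
#   try ip link set br-ex up
# (`iptables -A` appends a duplicate rule on every run; `iptables -C`
# can check for the rule first.) The script could then end with
#   exit $(( failures > 0 ))
# so that systemd marks the oneshot as failed.
```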

Joseph Phillips (manadart) wrote :

I am experiencing this again.

I only use my MicroStack occasionally for testing, but it was fine for some time. Last week, bootstrapping to it failed and I noticed that I no longer had any of the interfaces: br-int, br-ex, br-tun.

I re-installed the Snap, stein 2020-07-09 (206), with --devmode and it is exhibiting the lack of egress again.

Let me know if there are particulars I can provide.

Dmitrii Shcherbakov (dmitriis) wrote :

Taking a shallow look, I can see that setup-br-ex is only called once during init:

https://opendev.org/x/microstack/blame/commit/e59d15eb587619c18619aaf3ec2dce55efe1f8ed/tools/init/init/questions/__init__.py#L187

The setup-br-ex script contains the code to apply non-persistent configuration to the system:

https://opendev.org/x/microstack/blame/commit/e59d15eb587619c18619aaf3ec2dce55efe1f8ed/snap-overlay/bin/setup-br-ex#L12-L24

Therefore, on reboot this will definitely cause egress connectivity to disappear.

Running this will likely work around the issue after reboot (the second command is run inside the snap shell started by the first):

    sudo snap run --shell microstack.openstack
    setup-br-ex
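Because the br-ex address is not persisted, a quick post-reboot check is whether br-ex still carries it. A minimal sketch that inspects `ip -o -4 address show dev br-ex` output; `br_ex_has_addr` is a hypothetical name. setup-br-ex reads the configured value with `snapctl get config.network.ext-cidr`; the usage comment assumes `snap get microstack config.network.ext-cidr` returns the same value from outside the snap:

```shell
#!/bin/bash
# Succeed if `ip -o -4 address show dev br-ex` output (on stdin) lists
# the given address, e.g. 10.20.20.1/24. With -o, each address is on
# a single line, surrounded by spaces.
br_ex_has_addr() {
  grep -qF " inet $1 "
}

# Usage on the host:
#   cidr=$(sudo snap get microstack config.network.ext-cidr)
#   ip -o -4 address show dev br-ex | br_ex_has_addr "$cidr" \
#     || echo "br-ex has lost its address; setup-br-ex did not (re)apply it"
```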

Joseph Phillips (manadart) wrote :

Running "setup-br-ex" appears to exit 0, but did not fix the issue.

Pen Gale (pengale) wrote :

setup-br-ex runs every time the snap starts, as a oneshot daemon. It's specified in snapcraft.yaml like so:

  external-bridge:
    command: wait-on-init setup-br-ex
    daemon: oneshot
    after: [ovs-vswitchd]
    plugs:
      - network
      - network-control

The script fails silently when something goes wrong. The intent was to allow it to be idempotent without any fuss, but it is quite possible that it is masking a failure.

You might check that IPv4 forwarding is turned on. The snap used to do this by itself, but the hook that did so got dropped as part of the strict confinement work. You can check this with the sysctl command; the value should be 1:

    ~$ sudo sysctl net.ipv4.ip_forward
    net.ipv4.ip_forward = 1
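The same check as a script, reading the value from /proc (what `sysctl net.ipv4.ip_forward` reports), so it does not need root. A minimal sketch; `check_ip_forward` is a hypothetical name, the optional path argument exists only so it can be tested against a file, and the sysctl.d file name in the comments is an assumption:

```shell
#!/bin/bash
# Return 0 if IPv4 forwarding is enabled, 1 if disabled, 2 if unreadable.
check_ip_forward() {
  local path=${1:-/proc/sys/net/ipv4/ip_forward}
  local value
  value=$(cat "$path" 2>/dev/null) || return 2
  if [ "$value" = "1" ]; then
    echo "net.ipv4.ip_forward = 1 (enabled)"
    return 0
  fi
  echo "net.ipv4.ip_forward = $value (disabled)" >&2
  echo "enable now with: sudo sysctl -w net.ipv4.ip_forward=1" >&2
  return 1
}

# To persist across reboots (hypothetical file name):
#   echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/60-microstack-forward.conf
#   sudo sysctl --system
```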
