Stop and delete default libvirt network after installation

Bug #1387390 reported by Jill Rouleau
This bug affects 5 people
Affects: nova-compute (Juju Charms Collection)
Status: Fix Released
Importance: Medium
Assigned to: Trent Lloyd
Milestone:

Bug Description

libvirt-bin (a dependency of nova-compute) brings in unnecessary MASQs, while nova-compute actually uses its own openvswitch. The charm should delete or disable the extraneous MASQs, or at least provide the option to do so.

~$ sudo dpkg -S /etc/libvirt/qemu/networks/default.xml
libvirt-bin: /etc/libvirt/qemu/networks/default.xml

~$ sudo apt-cache rdepends libvirt-bin
libvirt-bin
Reverse Depends:
  nova-compute-libvirt
  libvirt-dev
  python-libvirt
  nova-compute-libvirt
  maas-cluster-controller
  libvirt-dev
  virt-goodies
  uvtool-libvirt
 |opennebula-node
  libsys-virt-perl
  koan
  gnome-boxes
  virtinst
  virt-manager
  ubuntu-virt-server
  python-libvirt
  nova-compute-libvirt
  maas-cluster-controller
  libvirt-dev
  apparmor

~$ sudo iptables -L -n -t nat
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
neutron-openvswi-PREROUTING all -- 0.0.0.0/0 0.0.0.0/0

Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination
neutron-openvswi-OUTPUT all -- 0.0.0.0/0 0.0.0.0/0

Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
neutron-openvswi-POSTROUTING all -- 0.0.0.0/0 0.0.0.0/0
neutron-postrouting-bottom all -- 0.0.0.0/0 0.0.0.0/0
RETURN all -- 192.168.122.0/24 224.0.0.0/24
RETURN all -- 192.168.122.0/24 255.255.255.255
MASQUERADE tcp -- 192.168.122.0/24 !192.168.122.0/24 masq ports: 1024-65535
MASQUERADE udp -- 192.168.122.0/24 !192.168.122.0/24 masq ports: 1024-65535
MASQUERADE all -- 192.168.122.0/24 !192.168.122.0/24

JuanJo Ciarlante (jjo) wrote :

FYI, this triggered a service outage (swift-storage) from conntrack exhaustion
on a deployment where we aggregate nova-compute and swift-storage services
into the same units, with the default net.nf_conntrack_max = 65536.
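
For reference, conntrack table pressure can be checked from procfs. The following is a minimal sketch, assuming the standard /proc/sys/net/netfilter files are present on the unit; the function names and the 0.8 threshold are illustrative and do not come from this report.

```python
from pathlib import Path

# Standard procfs locations for the netfilter connection-tracking counters.
CONNTRACK_COUNT = Path("/proc/sys/net/netfilter/nf_conntrack_count")
CONNTRACK_MAX = Path("/proc/sys/net/netfilter/nf_conntrack_max")


def conntrack_usage(count_path=CONNTRACK_COUNT, max_path=CONNTRACK_MAX):
    """Return (entries, limit, fraction_used) read from the given files."""
    entries = int(Path(count_path).read_text().strip())
    limit = int(Path(max_path).read_text().strip())
    return entries, limit, entries / limit


def is_near_exhaustion(entries, limit, threshold=0.8):
    """True when the table is at or above `threshold` of its limit.

    The 0.8 default is an arbitrary illustrative value, not a recommendation.
    """
    return entries >= threshold * limit
```

The paths are parameters so the functions can be exercised against sample files without touching a live system.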

tags: added: bootstack
tags: added: canonical-bootstack
removed: bootstack
tags: added: openstack
James Page (james-page) wrote :

@jjo

Would increasing the net.nf_conntrack_max configuration option make sense as a good practice default? I could see how that might get exhausted in a number of ways.

James Page (james-page) wrote :

I guess we could also stop and disable the default libvirt network; that would be a sensible option.

Changed in nova-compute (Juju Charms Collection):
importance: Undecided → Medium
James Page (james-page) wrote :

We could also delete the default libvirt network which should have the right result as well.

Changed in nova-compute (Juju Charms Collection):
status: New → Triaged
milestone: none → 15.04
summary: - provide a way to remove unnecessary libvirt MASQ
+ Stop and delete default libvirt network after installation
Changed in nova-compute (Juju Charms Collection):
importance: Medium → Low
James Page (james-page)
Changed in nova-compute (Juju Charms Collection):
milestone: 15.04 → 15.07
James Page (james-page)
Changed in nova-compute (Juju Charms Collection):
milestone: 15.07 → 15.10
James Page (james-page)
Changed in nova-compute (Juju Charms Collection):
milestone: 15.10 → 16.01
Felipe Reyes (freyes)
Changed in nova-compute (Juju Charms Collection):
assignee: nobody → Felipe Reyes (freyes)
James Page (james-page)
Changed in nova-compute (Juju Charms Collection):
milestone: 16.01 → 16.04
James Page (james-page)
Changed in nova-compute (Juju Charms Collection):
milestone: 16.04 → 16.07
Trent Lloyd (lathiat) wrote :

I hit this today on a production system. One of the tenant networks was using 192.168.0.0/17 and eventually started allocating from 192.168.122.0. All traffic from those IPs gets masqueraded, which breaks all tenant traffic and the metadata service.
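
The overlap described here can be checked ahead of time. This is a small sketch using Python's standard ipaddress module; the function names are hypothetical, and the 192.168.122.0/24 range is the one shown in the iptables output in the bug description.

```python
import ipaddress

# libvirt's default NAT network, as shown in the iptables output in this report.
LIBVIRT_DEFAULT = ipaddress.ip_network("192.168.122.0/24")


def conflicts_with_libvirt_default(tenant_cidr):
    """True if any address in tenant_cidr would match the MASQUERADE source."""
    return ipaddress.ip_network(tenant_cidr).overlaps(LIBVIRT_DEFAULT)


def is_masqueraded_source(address):
    """True if a single instance address falls inside 192.168.122.0/24."""
    return ipaddress.ip_address(address) in LIBVIRT_DEFAULT
```

With the tenant range from this comment, conflicts_with_libvirt_default("192.168.0.0/17") is true, because a /17 starting at 192.168.0.0 extends up to 192.168.127.255.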

The traffic makes it over the GRE tunnel because the rule is in POSTROUTING, but the packet carries the public IP of the compute node as its source, and the return traffic never gets back to the compute node.

Deleting the default network with virsh will immediately delete the MASQUERADE rules and prevent it occurring on reboot. I suggest that the charm checks for existence of /etc/libvirt/qemu/networks/default.xml and then executes virsh net-destroy default if it does.

See also related bug about having a default in libvirt at all: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1567466

tags: added: sts
Trent Lloyd (lathiat) wrote :

Change submitted for review: https://review.openstack.org/340085

Changed in nova-compute (Juju Charms Collection):
assignee: Felipe Reyes (freyes) → Trent Lloyd (lathiat)
importance: Low → Medium
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (master)

Reviewed: https://review.openstack.org/340085
Committed: https://git.openstack.org/cgit/openstack/charm-nova-compute/commit/?id=b533a3fbfb96bd5939b3e790f67964ce3ba8f7fd
Submitter: Jenkins
Branch: master

commit b533a3fbfb96bd5939b3e790f67964ce3ba8f7fd
Author: Trent Lloyd <email address hidden>
Date: Sun Jul 10 10:59:47 2016 +0800

    Delete libvirt-bin default network to avoid MASQUERADE rules

    libvirt-bin installs a 192.168.122.0/24 default network and creates
    MASQUERADE rules for it on boot. These rules will effect and break
    instance traffic including GRE tenant networks.

    Check if the network exists and then destroy it with virsh net-destroy
    which both immediately removes the MASQUERADE rules and the network so
    it is not applied after reboot.

    Change-Id: Ia79aea6ef889d1ef58f903f967bea37dc07fd160
    Closes-Bug: #1387390

Changed in nova-compute (Juju Charms Collection):
status: In Progress → Fix Committed
Liam Young (gnuoy)
Changed in nova-compute (Juju Charms Collection):
status: Fix Committed → Fix Released
JuanJo Ciarlante (jjo)
Changed in nova-compute (Juju Charms Collection):
status: Fix Released → Confirmed
JuanJo Ciarlante (jjo) wrote :

Reopening; today we had a tenant outage because of this:
the libvirt-bin package was (auto)upgraded on 2017-03-20,
reinstating the 192.168.122.0/24 network and interfering
with any tenant VM with a neutron port IP inside that CIDR.

Observed behavior FTR: packets leaving the neutron bridge
thru qvb interface getting MASQ'd by libvirt nat rules.

Details from the debugging session:
"af06e846-37" is the common id of the VMs ports, for iface purposes

1) OK: tap iface egress from the VM:
root@compute-031:~# tcpdump -n -i tapaf06e846-37 icmp
18:05:52.656501 IP 192.168.122.10 > 8.8.8.8: ICMP echo request, id 1, seq 387, length 40

2) OK: as seen at qbr neutron bridge
root@compute-031:~# tcpdump -n -i qbraf06e846-37 icmp
18:06:22.656489 IP 192.168.122.10 > 8.8.8.8: ICMP echo request, id 1, seq 393, length 40

3) BAD!: getting SNATted/MASQ-d when leaving the bridge:
root@compute-031:~# tcpdump -n -i qvbaf06e846-37 icmp
18:06:52.656471 IP 10.182.255.57 > 8.8.8.8: ICMP echo request, id 1, seq 399, length 40
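
The comparison across the three capture points can also be done mechanically. The sketch below is illustrative only: it assumes `tcpdump -n` IPv4 ICMP lines in the format shown above (no port suffix on the addresses), and the function names are hypothetical. It compares source addresses between capture points, not individual packets.

```python
import re

# Matches the "IP <src> > <dst>:" part of a tcpdump -n line for IPv4 ICMP.
_LINE = re.compile(r"\bIP (\d+\.\d+\.\d+\.\d+) > (\d+\.\d+\.\d+\.\d+):")


def source_address(tcpdump_line):
    """Return the source IPv4 address from one tcpdump -n line, or None."""
    match = _LINE.search(tcpdump_line)
    return match.group(1) if match else None


def first_rewritten_hop(captures):
    """Find where the source address first changes along the path.

    `captures` is an ordered list of (interface, tcpdump_line) pairs.
    Returns the first interface whose source differs from the first
    parsed capture, or None if the source never changes.
    """
    original = None
    for interface, line in captures:
        src = source_address(line)
        if src is None:
            continue  # skip lines that do not parse
        if original is None:
            original = src
        elif src != original:
            return interface
    return None
```

Fed the three lines from this session in order (tap, qbr, qvb), the function returns the qvb interface, which matches step 3.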

Sticky workaround: after removing the network,
use dpkg-divert to prevent default.xml from being put back:

juju run --application=nova-compute 'rm -f /etc/libvirt/qemu/networks/default.xml;
  virsh net-destroy default;
  dpkg-divert --package libvirt-bin --add --rename --divert /etc/libvirt/qemu/networks/default.xml{.disabled,}'

Nobuto Murata (nobuto) wrote :

It looks like the original patch only does `virsh net-destroy default`, but not `virsh net-autostart --disable default`. That could be why 192.168.122.0/24 shows up again after a daemon restart or machine reboot.

Trent Lloyd (lathiat) wrote :

Yeah my original patch was faulty and only did net-destroy and not net-undefine.
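
As a rough illustration of the difference, the sketch below runs both net-destroy and net-undefine when the definition file exists. It is not the charm's actual code: the function name and the injectable `run` parameter are assumptions made so the logic can be exercised without libvirt installed.

```python
import os
import subprocess

DEFAULT_NETWORK_XML = "/etc/libvirt/qemu/networks/default.xml"


def remove_default_network(xml_path=DEFAULT_NETWORK_XML, run=subprocess.call):
    """Stop and undefine libvirt's 'default' network if its definition exists.

    Returns the list of commands that were attempted.
    """
    if not os.path.exists(xml_path):
        return []
    commands = [
        # Stops the running network; libvirt removes its NAT rules.
        ["virsh", "net-destroy", "default"],
        # Removes the persistent definition so it is not started again.
        ["virsh", "net-undefine", "default"],
    ]
    for command in commands:
        # Exit status is ignored here: net-destroy reports an error
        # when the network is already inactive.
        run(command)
    return commands
```

This only covers the stop-and-undefine step; it does nothing about the package upgrade case reported earlier in this thread.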

I thought it had been fixed since but it turns out I started fixing it in my local branch and never pushed it up and I was looking at that code. I should probably look to submit that change for review.

I assume this is a conffile and dpkg shouldn't reinstall the file on upgrade?

Alvaro Uria (aluria) wrote :

On a xenial-ocata environment, a nova-compute unit shows:
"""
$ sudo iptables -S -t nat
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-A POSTROUTING -s 192.168.122.0/24 -d 224.0.0.0/24 -j RETURN
-A POSTROUTING -s 192.168.122.0/24 -d 255.255.255.255/32 -j RETURN
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p tcp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p udp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -j MASQUERADE
-A POSTROUTING -s 10.0.121.0/24 ! -d 10.0.121.0/24 -m comment --comment "managed by lxd-bridge" -j MASQUERADE
"""

Pools are from virbr0 and lxdbr0.
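
A check for leftover rules like these could be scripted. The following is a sketch only, assuming rule lines in the `iptables -S -t nat` format shown above; the helper names are hypothetical, and negated sources (`! -s`) are not treated specially.

```python
import shlex


def _value_after(tokens, flag):
    """Return the token following `flag`, or None if absent."""
    if flag in tokens:
        i = tokens.index(flag)
        if i + 1 < len(tokens):
            return tokens[i + 1]
    return None


def masquerade_sources(rule_lines):
    """Return the distinct source CIDRs of POSTROUTING MASQUERADE rules.

    `rule_lines` is an iterable of lines as printed by `iptables -S -t nat`.
    Order of first appearance is preserved.
    """
    sources = []
    for line in rule_lines:
        tokens = shlex.split(line)  # handles the quoted --comment text
        if tokens[:2] != ["-A", "POSTROUTING"]:
            continue
        if _value_after(tokens, "-j") != "MASQUERADE":
            continue
        source = _value_after(tokens, "-s")
        if source is not None and source not in sources:
            sources.append(source)
    return sources
```

Applied to the output above, this yields the two ranges mentioned: 192.168.122.0/24 (virbr0) and 10.0.121.0/24 (lxdbr0).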

Alvaro Uria (aluria) wrote :

Should we file a new bug to track the need for a new fix?

Trent Lloyd (lathiat) wrote :

For 192.168.122.0/24 this is the same issue, I'll investigate that in this bug.

For the lxd-bridge 10.0.121.0/24 that may need a new bug but I'll take an initial look at that one.

Alvaro Uria (aluria) wrote :

Is there any update on this bug? The milestone seems to need updating, and future outages could happen as they have in the past.

Trent Lloyd (lathiat) wrote :

The net-undefine was fixed in Bug #1800160 - so putting this bug back into fix released

Changed in nova-compute (Juju Charms Collection):
status: Confirmed → Fix Released