neutron l2 to dhcp lost when migrating in stable/stein 14.0.2

Bug #1849479 reported by Marek Grudzinski
This bug affects 3 people
Affects: neutron
Status: New
Importance: Medium
Assigned to: Slawek Kaplonski

Bug Description

Info about the environment:

3x controller nodes
50+ compute nodes

all on stable/stein; neutron is 14.0.2 using OVS 2.11.0

neutron settings:
  - max_l3_agents_per_router = 3
  - dhcp_agents_per_network = 2
  - router_distributed = true
  - interface_driver = openvswitch
  - l3_ha = true

l3 agent:
  - agent_mode = dvr

ml2:
  - type_drivers = flat,vlan,vxlan
  - tenant_network_types = vxlan
  - mechanism_drivers = openvswitch,l2population
  - extension_drivers = port_security,dns
  - external_network_type = vlan
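For reference, a minimal sketch of how these options would typically be laid out in the configuration files of an ML2/OVS deployment (the file names and section placement are assumptions, not taken from this environment):

 # neutron.conf
 [DEFAULT]
 router_distributed = true
 l3_ha = true
 max_l3_agents_per_router = 3
 dhcp_agents_per_network = 2

 # ml2_conf.ini
 [ml2]
 type_drivers = flat,vlan,vxlan
 tenant_network_types = vxlan
 mechanism_drivers = openvswitch,l2population
 extension_drivers = port_security,dns
 external_network_type = vlan

 # l3_agent.ini
 [DEFAULT]
 agent_mode = dvr
 interface_driver = openvswitch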

tenants may have multiple external networks
instances may have multiple interfaces

tests have been performed on 10 instances launched in a tenant network connected to a router on an external network. All instances have floating IPs assigned. These instances had only one interface. This particular testing tenant has RBACs for 4 external networks, of which only one is used.

migrations have been done via the CLI as admin:
openstack server migrate --live <new_host> <instance_uuid>
Evacuate has also been tested, with the same results.
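For the bulk tests, something along these lines can be used to migrate a whole list of instances at once; a sketch, with placeholder host names:

# live-migrate every instance currently on SRC_HOST to DST_HOST (hypothetical names)
for uuid in $(openstack server list --all-projects --host SRC_HOST -f value -c ID); do
    openstack server migrate --live DST_HOST "$uuid"
done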

expected behavior:
when _multiple_ (in the range of 10+) instances are migrated simultaneously from one compute host to another, they should come up with only a minor network service drop, and all L2 connectivity should be restored.

what actually happens:
instances are migrated, some errors pop up in neutron/nova, and the instances come up with a minor network service drop. However, L2 towards the DHCP servers is completely severed in OVS. As expected, the migrated instances start trying to renew their lease halfway through the current lease period and drop the IP when it expires. An easy test is to attempt a lease renewal on an instance, or to ICMP any DHCP server in that VXLAN L2 segment.

current workaround:
once the instances are migrated, L2 to the DHCP servers can be re-established by restarting neutron-openvswitch-agent on the destination host.
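A sketch of the restart; the exact command depends on how the agent is deployed (the paths in the traceback further down suggest a kolla-based install, so the container name below is an assumption):

# package-based install
systemctl restart neutron-openvswitch-agent

# containerized (kolla) install, assumed container name
docker restart neutron_openvswitch_agent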

how to test:
create instances (10+), migrate them, and then try to ping the neutron dhcp-server in the VXLAN (tenant-created network), or simply renew the DHCP leases.
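A sketch of those checks (the interface name and DHCP port address are placeholders):

# inside a migrated instance: force a lease release/renew
sudo dhclient -r eth0 && sudo dhclient -v eth0

# or ping a DHCP port of the tenant network from the instance
ping <dhcp_port_ip>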

error messages:

Exception during message handling: TooManyExternalNetworks: More than one external network exists. TooManyExternalNetworks: More than one external network exists.

other oddities:
when migrating a small number of instances (i.e. 1-4), the migrations succeed and L2 to the DHCP servers is not lost.

when looking through the debug logs I can't really find anything of relevance. No other significant errors/warnings occur other than the one above.

i will perform more tests once migrations are successful and/or neutron-openvswitch-agent has been restarted, and see if L2 to the DHCP servers survives 24h.

This resembles a 14.0.0 regression bug which should have been fixed in 14.0.2 (this bug report is for 14.0.2), but the fix possibly does not work with this combination of settings(?).

Please let me know if any API/service versions, configuration details, or other info is required.

Revision history for this message
Marek Grudzinski (ivve) wrote :

Here is a full flow dump of br-int from the host the 10 instances are on (freshly migrated, with the issue at hand). In this dump they cannot communicate with DHCP.
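(For reference, a dump like the one below can be collected on the compute host with something along these lines; the exact invocation is not recorded in this report:)

ovs-ofctl dump-flows br-int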

 cookie=0xfc433b151081cc9d, duration=11448.791s, table=0, n_packets=0, n_bytes=0, priority=65535,vlan_tci=0x0fff/0x1fff actions=drop
 cookie=0xfc433b151081cc9d, duration=511.432s, table=0, n_packets=0, n_bytes=0, priority=10,icmp6,in_port="qvo34809f80-73",icmp_type=136 actions=resubmit(,24)
 cookie=0xfc433b151081cc9d, duration=497.129s, table=0, n_packets=0, n_bytes=0, priority=10,icmp6,in_port="qvo4a20f841-b3",icmp_type=136 actions=resubmit(,24)
 cookie=0xfc433b151081cc9d, duration=477.756s, table=0, n_packets=0, n_bytes=0, priority=10,icmp6,in_port="qvoe855f573-8f",icmp_type=136 actions=resubmit(,24)
 cookie=0xfc433b151081cc9d, duration=469.122s, table=0, n_packets=0, n_bytes=0, priority=10,icmp6,in_port="qvo1fa8144f-c6",icmp_type=136 actions=resubmit(,24)
 cookie=0xfc433b151081cc9d, duration=456.811s, table=0, n_packets=0, n_bytes=0, priority=10,icmp6,in_port="qvoba71a357-ea",icmp_type=136 actions=resubmit(,24)
 cookie=0xfc433b151081cc9d, duration=440.239s, table=0, n_packets=0, n_bytes=0, priority=10,icmp6,in_port="qvoc711c404-f0",icmp_type=136 actions=resubmit(,24)
 cookie=0xfc433b151081cc9d, duration=425.874s, table=0, n_packets=0, n_bytes=0, priority=10,icmp6,in_port="qvo697908ee-00",icmp_type=136 actions=resubmit(,24)
 cookie=0xfc433b151081cc9d, duration=413.444s, table=0, n_packets=0, n_bytes=0, priority=10,icmp6,in_port="qvo3efbc4cf-ad",icmp_type=136 actions=resubmit(,24)
 cookie=0xfc433b151081cc9d, duration=400.885s, table=0, n_packets=0, n_bytes=0, priority=10,icmp6,in_port="qvo0634d6e7-a1",icmp_type=136 actions=resubmit(,24)
 cookie=0xfc433b151081cc9d, duration=388.523s, table=0, n_packets=0, n_bytes=0, priority=10,icmp6,in_port="qvo46770478-be",icmp_type=136 actions=resubmit(,24)
 cookie=0xfc433b151081cc9d, duration=511.429s, table=0, n_packets=7, n_bytes=294, priority=10,arp,in_port="qvo34809f80-73" actions=resubmit(,24)
 cookie=0xfc433b151081cc9d, duration=497.126s, table=0, n_packets=7, n_bytes=294, priority=10,arp,in_port="qvo4a20f841-b3" actions=resubmit(,24) ...

Revision history for this message
Marek Grudzinski (ivve) wrote :

And here is a full flow dump after neutron-openvswitch-agent has been restarted and all 10 instances have verified working L2 towards both DHCP servers.

This is br-int again (like above).

 cookie=0x83eaa08cc0cf3e0a, duration=35.638s, table=0, n_packets=0, n_bytes=0, priority=65535,vlan_tci=0x0fff/0x1fff actions=drop
 cookie=0x83eaa08cc0cf3e0a, duration=18.914s, table=0, n_packets=0, n_bytes=0, priority=10,icmp6,in_port="qvoe855f573-8f",icmp_type=136 actions=resubmit(,24)
 cookie=0x83eaa08cc0cf3e0a, duration=18.902s, table=0, n_packets=0, n_bytes=0, priority=10,icmp6,in_port="qvo697908ee-00",icmp_type=136 actions=resubmit(,24)
 cookie=0x83eaa08cc0cf3e0a, duration=18.890s, table=0, n_packets=0, n_bytes=0, priority=10,icmp6,in_port="qvoba71a357-ea",icmp_type=136 actions=resubmit(,24)
 cookie=0x83eaa08cc0cf3e0a, duration=18.877s, table=0, n_packets=0, n_bytes=0, priority=10,icmp6,in_port="qvo1fa8144f-c6",icmp_type=136 actions=resubmit(,24)
 cookie=0x83eaa08cc0cf3e0a, duration=18.865s, table=0, n_packets=0, n_bytes=0, priority=10,icmp6,in_port="qvo0634d6e7-a1",icmp_type=136 actions=resubmit(,24)
 cookie=0x83eaa08cc0cf3e0a, duration=18.852s, table=0, n_packets=0, n_bytes=0, priority=10,icmp6,in_port="qvo3efbc4cf-ad",icmp_type=136 actions=resubmit(,24)
 cookie=0x83eaa08cc0cf3e0a, duration=18.839s, table=0, n_packets=0, n_bytes=0, priority=10,icmp6,in_port="qvo46770478-be",icmp_type=136 actions=resubmit(,24)
 cookie=0x83eaa08cc0cf3e0a, duration=18.826s, table=0, n_packets=0, n_bytes=0, priority=10,icmp6,in_port="qvo34809f80-73",icmp_type=136 actions=resubmit(,24)
 cookie=0x83eaa08cc0cf3e0a, duration=18.812s, table=0, n_packets=0, n_bytes=0, priority=10,icmp6,in_port="qvoc711c404-f0",icmp_type=136 actions=resubmit(,24)
 cookie=0x83eaa08cc0cf3e0a, duration=18.786s, table=0, n_packets=0, n_bytes=0, priority=10,icmp6,in_port="qvo4a20f841-b3",icmp_type=136 actions=resubmit(,24)
 cookie=0x83eaa08cc0cf3e0a, duration=18.911s, table=0, n_packets=0, n_bytes=0, priority=10,arp,in_port="qvoe855f573-8f" actions=resubmit(,24)
 cookie=0x83eaa08cc0cf3e0a, duration=18.899s, table=0, n_packets=0, n_bytes=0, priority=10,arp,in_port="qvo697908ee-00" actions=resubmit(,24) ...

Revision history for this message
Marek Grudzinski (ivve) wrote :

A full flow dump of br-tun when the instances have just been migrated (non-working state).

 cookie=0x12519a1ea10f5355, duration=4882.076s, table=0, n_packets=9994, n_bytes=691494, priority=1,in_port="patch-int" actions=resubmit(,1)
 cookie=0x12519a1ea10f5355, duration=4882.104s, table=0, n_packets=18, n_bytes=756, priority=0 actions=drop
 cookie=0x12519a1ea10f5355, duration=1.005s, table=1, n_packets=5, n_bytes=210, priority=3,arp,dl_vlan=1,arp_tpa=10.13.37.1 actions=drop
 cookie=0x12519a1ea10f5355, duration=1.004s, table=1, n_packets=0, n_bytes=0, priority=2,dl_vlan=1,dl_dst=fa:16:3e:36:2d:37 actions=drop
 cookie=0x12519a1ea10f5355, duration=1.003s, table=1, n_packets=18, n_bytes=1568, priority=1,dl_vlan=1,dl_src=fa:16:3e:36:2d:37 actions=mod_dl_src:fa:16:3f:59:25:a3,resubmit(,2)
 cookie=0x12519a1ea10f5355, duration=4882.073s, table=1, n_packets=9867, n_bytes=682052, priority=0 actions=resubmit(,2)
 cookie=0x12519a1ea10f5355, duration=4882.103s, table=2, n_packets=7157, n_bytes=446146, priority=1,arp,dl_dst=ff:ff:ff:ff:ff:ff actions=resubmit(,21)
 cookie=0x12519a1ea10f5355, duration=4882.102s, table=2, n_packets=1702, n_bytes=148830, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
 cookie=0x12519a1ea10f5355, duration=4882.100s, table=2, n_packets=1112, n_bytes=95456, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,22)
 cookie=0x12519a1ea10f5355, duration=4882.099s, table=3, n_packets=0, n_bytes=0, priority=0 actions=drop
 cookie=0x12519a1ea10f5355, duration=124.326s, table=4, n_packets=7, n_bytes=326, priority=1,tun_id=0x79 actions=mod_vlan_vid:1,resubmit(,9)
 cookie=0x12519a1ea10f5355, duration=4882.098s, table=4, n_packets=10885, n_bytes=466470, priority=0 actions=drop
 cookie=0x12519a1ea10f5355, duration=4882.097s, table=6, n_packets=0, n_bytes=0, priority=0 actions=drop
 cookie=0x12519a1ea10f5355, duration=4882.046s, table=9, n_packets=0, n_bytes=0, priority=1,dl_src=fa:16:3f:0b:a6:26 actions=output:"patch-int"
 cookie=0x12519a1ea10f5355, duration=4882.040s, table=9, n_packets=0, n_bytes=0, priority=1,dl_src=fa:16:3f:0c:37:75 actions=output:"patch-int"
 cookie=0x12519a1ea10f5355, duration=4882.034s, table=9, n_packets=0, n_bytes=0, priority=1,dl_src=fa:16:3f:11:20:a8 actions=output:"patch-int"
 cookie=0x12519a1ea10f5355, duration=4882.027s, table=9, n_packets=0, n_bytes=0, priority=1,dl_src=fa:16:3f:12:46:02 actions=output:"patch-int"
 cookie=0x12519a1ea10f5355, duration=4882.021s, table=9, n_packets=0, n_bytes=0, priority=1,dl_src=fa:16:3f:14:fe:6f actions=output:"patch-int"
 cookie=0x12519a1ea10f5355, duration=4882.015s, table=9, n_packets=0, n_bytes=0, priority=1,dl_src=fa:16:3f:24:b1:35 actions=output:"patch-int"
 cookie=0x12519a1ea10f5355, duration=4882.009s, table=9, n_packets=0, n_bytes=0, priority=1,dl_src=fa:16:3f:26:98:2f actions=output:"patch-int"
 cookie=0x12519a1ea10f5355, duration=4882.002s, table=9, n_packets=0, n_bytes=0, priority=1,dl_src=fa:16:3f:27:c1:89 actions=output:"patch-int"
 cookie=0x12519a1ea10f5355, duration=4881.995s, table=9, n_packets=1, n_bytes=74, priority=1,dl_src=fa:16:3f:29:7c:63 actions=output:"patch-int"
 cookie=0x12519a1ea10f5355, duration=4881.989s, ...

Revision history for this message
Marek Grudzinski (ivve) wrote :

Fully functional DHCP after neutron-openvswitch-agent has been restarted (br-tun again).

 cookie=0x3addc1687d9fcdd6, duration=204.668s, table=0, n_packets=10244, n_bytes=709730, priority=1,in_port="patch-int" actions=resubmit(,1)
 cookie=0x3addc1687d9fcdd6, duration=198.314s, table=0, n_packets=0, n_bytes=0, priority=1,in_port="vxlan-ac100e47" actions=resubmit(,4)
 cookie=0x3addc1687d9fcdd6, duration=198.308s, table=0, n_packets=0, n_bytes=0, priority=1,in_port="vxlan-ac100e46" actions=resubmit(,4)
 cookie=0x3addc1687d9fcdd6, duration=198.303s, table=0, n_packets=747, n_bytes=31374, priority=1,in_port="vxlan-ac100e49" actions=resubmit(,4)
 cookie=0x3addc1687d9fcdd6, duration=198.298s, table=0, n_packets=0, n_bytes=0, priority=1,in_port="vxlan-ac100e39" actions=resubmit(,4)
 cookie=0x3addc1687d9fcdd6, duration=198.292s, table=0, n_packets=0, n_bytes=0, priority=1,in_port="vxlan-ac100e2e" actions=resubmit(,4)
 cookie=0x3addc1687d9fcdd6, duration=198.287s, table=0, n_packets=47, n_bytes=4046, priority=1,in_port="vxlan-ac100e2b" actions=resubmit(,4)
 cookie=0x3addc1687d9fcdd6, duration=198.277s, table=0, n_packets=0, n_bytes=0, priority=1,in_port="vxlan-ac100e42" actions=resubmit(,4)
 cookie=0x3addc1687d9fcdd6, duration=198.272s, table=0, n_packets=48, n_bytes=4088, priority=1,in_port="vxlan-ac100e29" actions=resubmit(,4)
 cookie=0x3addc1687d9fcdd6, duration=198.262s, table=0, n_packets=0, n_bytes=0, priority=1,in_port="vxlan-ac100e2a" actions=resubmit(,4)
 cookie=0x3addc1687d9fcdd6, duration=198.255s, table=0, n_packets=0, n_bytes=0, priority=1,in_port="vxlan-ac100e32" actions=resubmit(,4)
 cookie=0x3addc1687d9fcdd6, duration=204.700s, table=0, n_packets=18, n_bytes=756, priority=0 actions=drop
 cookie=0x3addc1687d9fcdd6, duration=192.241s, table=1, n_packets=5, n_bytes=210, priority=3,arp,dl_vlan=1,arp_tpa=10.13.37.1 actions=drop
 cookie=0x3addc1687d9fcdd6, duration=192.240s, table=1, n_packets=0, n_bytes=0, priority=2,dl_vlan=1,dl_dst=fa:16:3e:36:2d:37 actions=drop
 cookie=0x3addc1687d9fcdd6, duration=192.239s, table=1, n_packets=41, n_bytes=3694, priority=1,dl_vlan=1,dl_src=fa:16:3e:36:2d:37 actions=mod_dl_src:fa:16:3f:59:25:a3,resubmit(,2)
 cookie=0x3addc1687d9fcdd6, duration=204.665s, table=1, n_packets=10094, n_bytes=698162, priority=0 actions=resubmit(,2)
 cookie=0x3addc1687d9fcdd6, duration=204.698s, table=2, n_packets=7287, n_bytes=453982, priority=1,arp,dl_dst=ff:ff:ff:ff:ff:ff actions=resubmit(,21)
 cookie=0x3addc1687d9fcdd6, duration=204.697s, table=2, n_packets=1820, n_bytes=159090, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
 cookie=0x3addc1687d9fcdd6, duration=204.695s, table=2, n_packets=1114, n_bytes=95596, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,22)
 cookie=0x3addc1687d9fcdd6, duration=204.694s, table=3, n_packets=0, n_bytes=0, priority=0 actions=drop
 cookie=0x3addc1687d9fcdd6, duration=201.387s, table=4, n_packets=102, n_bytes=8460, priority=1,tun_id=0x79 actions=mod_vlan_vid:1,resubmit(,9)
 cookie=0x3addc1687d9fcdd6, duration=204.693s, table=4, n_packets=11632, n_bytes=497844, priority=0 actions=drop
 cookie=0x3addc1687d9fcdd6, duration=204.692s, table=6, n_packets=0, n_bytes=...

Revision history for this message
Marek Grudzinski (ivve) wrote :

It's been over 18 hours and the instances are still OK. Renewals and L2 to the DHCP servers remain stable after restarting neutron-openvswitch-agent.

Revision history for this message
Marek Grudzinski (ivve) wrote :

I don't know how clear I was about it, but migrating 1 or 2 instances at a time does not cause this problem, only migrating a bunch of instances (like emptying a host with evacuate, or using openstack server migrate with a list of instances).

Revision history for this message
Brian Haley (brian-haley) wrote :

We talked about this bug in the L3 meeting this week; Slawek was going to take a look, he has just been busy with other things.

Changed in neutron:
importance: Undecided → High
importance: High → Medium
Revision history for this message
Brian Haley (brian-haley) wrote :

Oops, we actually talked about https://bugs.launchpad.net/neutron/+bug/1849392, which seems related. I'll assign this one to Slawek as well, since if they are related he will know once the other is triaged.

Changed in neutron:
assignee: nobody → Slawek Kaplonski (slaweq)
tags: added: l3-dvr-backlog
Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Hi Marek,

Can you also provide some additional information about the logs you attached?

1. What is the MAC address of the port of a VM which can't reach the DHCP port?
2. What is the MAC address of the DHCP port?
3. Does connectivity between instances in the same network as the non-working DHCP port work fine through the tunnels?
4. Are you using the l2population mechanism driver?
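(A sketch of how these could be pulled from the API; the network and port IDs are placeholders:)

# MAC addresses of the DHCP ports in the tenant network
openstack port list --network <network_id> --device-owner network:dhcp -c ID -c "MAC Address"

# MAC address of a specific instance port
openstack port show <port_id> -c mac_address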

Revision history for this message
Marek Grudzinski (ivve) wrote :

Hello Slawek,

1. I will provide the MACs later today.

2. Same as 1.

3. Yes, full L2 to everything else works. I can manually assign the same IP and get L3 connectivity to other instances in the same VXLAN or to the router, getting full connectivity basically. And as mentioned, restarting neutron-openvswitch-agent and refreshing the DHCP client on the instances gets everything back up and running. This can also be done proactively to avoid the loss of the lease, which is otherwise inevitable unless the agent is restarted and a DHCP client refresh is forced.

4. Yes, as seen in the configuration in the initial post: mechanism_drivers = openvswitch,l2population.

Revision history for this message
Marek Grudzinski (ivve) wrote :

Hi again Slawek,

I was delayed with some other work/off-time.

Here are the MACs (the DHCP servers for the 10 test instances):

fa:16:3e:4b:f8:2d
fa:16:3e:51:f4:84

MACs for all 10 instances:

fa:16:3e:e5:f5:48
fa:16:3e:58:d7:f2
fa:16:3e:de:60:d0
fa:16:3e:ef:03:f9
fa:16:3e:cd:75:35
fa:16:3e:cd:45:2b
fa:16:3e:5f:6a:a1
fa:16:3e:52:05:a1
fa:16:3e:0a:ea:d7
fa:16:3e:0e:3f:c2

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Thanks Marek for this data. I checked the flows and I think some flows are missing in the br-tun bridge before you restart neutron-ovs-agent.

For sure these flows are missing:

table=0, priority=1,in_port="vxlan-ac100e47" actions=resubmit(,4)
table=0, priority=1,in_port="vxlan-ac100e46" actions=resubmit(,4)
table=0, priority=1,in_port="vxlan-ac100e49" actions=resubmit(,4)
table=0, priority=1,in_port="vxlan-ac100e39" actions=resubmit(,4)
table=0, priority=1,in_port="vxlan-ac100e2e" actions=resubmit(,4)
table=0, priority=1,in_port="vxlan-ac100e2b" actions=resubmit(,4)
table=0, priority=1,in_port="vxlan-ac100e42" actions=resubmit(,4)
table=0, priority=1,in_port="vxlan-ac100e29" actions=resubmit(,4)
table=0, priority=1,in_port="vxlan-ac100e2a" actions=resubmit(,4)
table=0, priority=1,in_port="vxlan-ac100e32" actions=resubmit(,4)

These are responsible for packets coming in to the host from the VXLAN tunnels. And the next flow, which is:

table=0, priority=0 actions=drop

has some packets in its counter.

There are also missing flows like:

table=20, priority=2,dl_vlan=1,dl_dst=fa:16:3e:51:f4:84 actions=strip_vlan,load:0x79->NXM_NX_TUN_ID[],output:"vxlan-ac100e2b"
table=20, priority=2,dl_vlan=1,dl_dst=fa:16:3e:4b:f8:2d actions=strip_vlan,load:0x79->NXM_NX_TUN_ID[],output:"vxlan-ac100e29"

table=21, priority=1,arp,dl_vlan=1,arp_tpa=10.13.37.11 actions=load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163e51f484->NXM_NX_ARP_SHA[],load:0xa0d250b->NXM_OF_ARP_SPA[],move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],mod_dl_src:fa:16:3e:51:f4:84,IN_PORT
table=21, priority=1,arp,dl_vlan=1,arp_tpa=10.13.37.10 actions=load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163e4bf82d->NXM_NX_ARP_SHA[],load:0xa0d250a->NXM_OF_ARP_SPA[],move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],mod_dl_src:fa:16:3e:4b:f8:2d,IN_PORT

table=22, priority=1,dl_vlan=1 actions=strip_vlan,load:0x79->NXM_NX_TUN_ID[],output:"vxlan-ac100e47",output:"vxlan-ac100e46",output:"vxlan-ac100e49",output:"vxlan-ac100e39",output:"vxlan-ac100e2e",output:"vxlan-ac100e2b",output:"vxlan-ac100e42",output:"vxlan-ac100e29",output:"vxlan-ac100e2a",output:"vxlan-ac100e32"
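(These table 20/21/22 entries can be checked directly on the affected compute node; a sketch, using the DHCP MACs listed earlier:)

# per-MAC unicast flows towards the DHCP ports
ovs-ofctl dump-flows br-tun table=20 | grep -E 'fa:16:3e:4b:f8:2d|fa:16:3e:51:f4:84'

# ARP responder entries and the per-VLAN flood flow
ovs-ofctl dump-flows br-tun table=21
ovs-ofctl dump-flows br-tun table=22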

So what I suspect here is some issue with the l2population mechanism driver, but I don't know exactly what the issue is.

As a next step, I think you should enable debug everywhere (on neutron-server and the ovs-agents), then try to reproduce the issue and check what may be missing or wrong there.
You can also check whether the DHCP requests are going out from the compute node into the VXLAN tunnel at all, or whether the requests are sent properly and the replies are dropped somewhere. That would also help us understand exactly which missing flow is causing this problem.
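(One way to do that last check is to capture on both ends; a sketch, with interface names assumed:)

# inside the instance: watch the DHCP requests go out
sudo tcpdump -ni eth0 'udp port 67 or udp port 68'

# on the compute node's tunnel NIC: see whether they leave encapsulated in VXLAN (UDP 4789)
tcpdump -ni <tunnel_interface> 'udp port 4789'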

Revision history for this message
Marek Grudzinski (ivve) wrote :

Hello, we decided to do a full restart of all controller nodes. So far migrations _seem_ to work, but we have only done smaller moves (<10 instances). We will test some larger evacuations & migrations this week.

However, this error still keeps appearing, with no noticeable impact:

Exception during message handling: TooManyExternalNetworks: More than one external network exists.
2019-11-26 15:09:01.263 26 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2019-11-26 15:09:01.263 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 166, in _process_incoming
2019-11-26 15:09:01.263 26 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2019-11-26 15:09:01.263 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
2019-11-26 15:09:01.263 26 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2019-11-26 15:09:01.263 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
2019-11-26 15:09:01.263 26 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2019-11-26 15:09:01.263 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/api/rpc/handlers/l3_rpc.py", line 254, in get_external_network_id
2019-11-26 15:09:01.263 26 ERROR oslo_messaging.rpc.server net_id = self.plugin.get_external_network_id(context)
2019-11-26 15:09:01.263 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/db/external_net_db.py", line 149, in get_external_network_id
2019-11-26 15:09:01.263 26 ERROR oslo_messaging.rpc.server raise n_exc.TooManyExternalNetworks()
2019-11-26 15:09:01.263 26 ERROR oslo_messaging.rpc.server TooManyExternalNetworks: More than one external network exists.
2019-11-26 15:09:01.263 26 ERROR oslo_messaging.rpc.server
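(The traceback shows the exception being raised by the l3_rpc get_external_network_id handler, which only succeeds when exactly one external network exists; with the multiple external networks in this deployment, the condition can be confirmed with:)

openstack network list --external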

Revision history for this message
Piotr Misiak (pmisiak) wrote :

We also have a similar issue. I suppose l2pop is responsible.
In our case, table 22 of the OVS br-tun bridge is not properly populated:

cookie=0x161e6c4bed4713c7, duration=8446.269s, table=22, n_packets=508, n_bytes=29903, priority=1,dl_vlan=35 actions=strip_vlan,load:0x9e->NXM_NX_TUN_ID[],output:"vxlan-0ad30035",output:"vxlan-0ad30030",output:"vxlan-0ad3004c",output:"vxlan-0ad3001e",output:"vxlan-0ad30025",output:"vxlan-0ad30010",output:"vxlan-0ad3000f",output:"vxlan-0ad3000e",output:"vxlan-0ad30016",output:"vxlan-0ad3001c",output:"vxlan-0ad3003f",output:"vxlan-0ad3003c",output:"vxlan-0ad30040",output:"vxlan-0ad30041"

The output port list is not complete: it lacks some of the compute nodes where VMs attached to the same virtual network are running.
Sometimes it also lacks the output ports towards the network nodes where the DHCP servers are running.

Restarting neutron-openvswitch-agent fixes the OVS entry and resolves the issue.
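(A quick way to spot the gap is to compare the output ports of the table 22 flood entry against the full set of VXLAN ports on the bridge; a sketch:)

# all VXLAN tunnel ports present on br-tun
ovs-vsctl list-ports br-tun | grep '^vxlan'

# flood entries actually installed per local VLAN
ovs-ofctl dump-flows br-tun table=22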
