Neutron port leak when connection is dropped during port create in instance boot.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | High | Matthew Gilliard |
Icehouse | Fix Released | Undecided | Unassigned |
neutron | Invalid | Undecided | Unassigned |
Bug Description
Sometimes an instance fails to boot because the call to neutron to allocate a port fails. However, we see cases where this happens but the port is actually created and allocated to the now-deleted instance.
The same problem has been reported regarding hpcloud internal monitoring tools, and the openstack-infra nodepool tenant. There seems to be a port leak.
Evidence
========
Sometimes instances fail to boot with the following error:
2014-05-27 00:09:23 ERROR : [NOV58] NovaServers.add Failed: OverLimit - Maximum number of ports exceeded (HTTP 413) (Request-ID: req-e05525c3-
How did we run out of ports? Investigating further, starting with the neutron database:
mysql> select * from ports where device_owner like 'comput%';
This gives a table which shows neutron ports and the instance uuids that they are allocated to (example: http://
Matching neutron's `device_id` with `uuid` in nova's instances table, we found that approximately 50% of the ports were allocated to instances that had been deleted. As far as we know this must be a bug, as there is no way to create a port without linking it to an instance, and deleting an instance should delete its ports atomically.
The effect is that the user has unused ports counting toward their port quota, which will prevent them from booting instances when the quota is fully allocated.
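The cross-check described above (matching neutron's `device_id` against `uuid` in nova's instances table) can be sketched roughly as follows. This is only an illustration over in-memory rows, not the query we actually ran: the function name `find_leaked_ports`, the dict layout, and the sample IDs are hypothetical, and it assumes a non-zero `deleted` column marks a deleted instance.

```python
def find_leaked_ports(ports, instances):
    """Return compute-owned ports whose device_id is not a live instance.

    `ports` and `instances` are lists of dicts standing in for rows from
    neutron's ports table and nova's instances table (hypothetical layout).
    """
    # Assumption: a non-zero `deleted` value means the instance is deleted.
    live_uuids = {row["uuid"] for row in instances if not row["deleted"]}
    return [
        port
        for port in ports
        # Mirrors the `device_owner like 'comput%'` filter used in the query.
        if port["device_owner"].startswith("comput")
        and port["device_id"] not in live_uuids
    ]


# Made-up sample rows, for illustration only.
ports = [
    {"id": "port-a", "device_id": "inst-1", "device_owner": "compute:None"},
    {"id": "port-b", "device_id": "inst-2", "device_owner": "compute:None"},
    {"id": "port-c", "device_id": "router-1",
     "device_owner": "network:router_interface"},
]
instances = [
    {"uuid": "inst-1", "deleted": 0},
    {"uuid": "inst-2", "deleted": 42},
]

leaked = find_leaked_ports(ports, instances)
leaked_ids = [port["id"] for port in leaked]
```

With these sample rows only `port-b` is reported, since `inst-2` is marked deleted and `port-c` is not owned by a compute instance.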
Logs
====
The nova-compute log for an instance that fails to boot because of port starvation is not interesting here. However, we have a case where an instance fails to boot with "Neutron error creating port", but a port is actually created:
nova-compute.log:
2014-05-28 08:08:53.413 16699 DEBUG neutronclient.
2014-05-28 08:08:53.417 16699 ERROR nova.network.
(fuller section of log: http://
0.2s later, nova-compute.log:
2014-05-28 08:08:53.664 16699 DEBUG neutronclient.
(this is repeated once more, another 0.2s later. Slightly longer log section: http://
But eventually the port is present in the neutron database:
+------
| tenant_id | id | name | network_id | mac_address | admin_state_up | status | device_id | device_owner |
+------
| 10409882459003 | 916cba73-
It looks like this port has been leaked by neutron. Our guess is that the "Failed to create port" is spuriously caused by the neutronclient's connection being dropped. In fact the port is being created, but it takes some time, and during that time neutron reports that there are no ports on that instance, so nothing is cleaned up when the instance is deleted. Then, the port details are actually written to the db and the port is leaked.
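To make the suspected sequence concrete, here is a toy model of it. This is a sketch of our guess only, not of real neutron or nova code: `FakeNeutron`, `boot_and_cleanup`, and `flush` are hypothetical names, and the "late write" is simulated by an explicit call rather than by real timing.

```python
class FakeNeutron:
    """Toy stand-in for neutron: accepts a port create, commits it late."""

    def __init__(self):
        self.ports = []      # ports already written to the "db"
        self._pending = []   # ports accepted but not yet written

    def create_port(self, device_id):
        # The request is accepted, but the client's connection drops
        # before any response arrives.
        self._pending.append({"device_id": device_id})
        raise ConnectionError("connection dropped during port create")

    def list_ports(self, device_id):
        # Only committed ports are visible to callers.
        return [p for p in self.ports if p["device_id"] == device_id]

    def delete_port(self, port):
        self.ports.remove(port)

    def flush(self):
        # Simulates the delayed write of the port details to the db.
        self.ports.extend(self._pending)
        self._pending.clear()


def boot_and_cleanup(neutron, instance_id):
    """Try to allocate a port; on failure delete whatever ports are visible."""
    try:
        neutron.create_port(instance_id)
    except ConnectionError:
        # At this point neutron reports no ports for the instance,
        # so there is nothing to clean up.
        for port in neutron.list_ports(instance_id):
            neutron.delete_port(port)
        return False
    return True


neutron = FakeNeutron()
booted = boot_and_cleanup(neutron, "inst-1")  # boot fails, instance deleted
neutron.flush()                               # port is written afterwards
leaked = neutron.list_ports("inst-1")         # port now exists, owner is gone
```

In this model the boot fails and cleanup finds nothing, yet one port for `inst-1` exists after the delayed write, which matches what we see in the database.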
Openstack-infra's nodepool was unable to boot instances recently, and found several hundred ports in this state.
Solutions
=========
Neither nova nor neutron has enough information to determine which ports are leaked - so a periodic task in either of those two services would not be possible.
A user can free up their ports with a script like https:/
Neutron could synchronize get_ports calls with create_port (NB: I don't know the neutron codebase well enough to know how feasible this is).
no longer affects: | neutron |
tags: | added: network |
Changed in nova: | |
status: | New → In Progress |
description: | updated |
summary: |
- port leak when instance is deleted before ACTIVE
+ Neutron port leak when connection is dropped during port create in
+ instance boot.
description: | updated |
Changed in nova: | |
importance: | Undecided → High |
assignee: | nobody → Aaron Rosen (arosen) |
Changed in nova: | |
milestone: | none → juno-1 |
status: | Fix Committed → Fix Released |
tags: | removed: icehouse-backport-potential |
Changed in neutron: | |
status: | In Progress → Invalid |
Changed in neutron: | |
assignee: | Matthew Gilliard (matthew-gilliard-u) → nobody |
Changed in nova: | |
milestone: | juno-1 → 2014.2 |
Changed in nova: | |
assignee: | Aaron Rosen (arosen) → Matthew Gilliard (matthew-gilliard-u) |
Run as a logged-in user, this script will clean up ports allocated to deleted instances: https://gist.github.com/moorryan/93fa4be65fc5ea60b3ed But we should be preventing it from happening in the first place.