agentschedulers: concurrent port delete on unscheduling may cause unscheduling to fail

Bug #1775496 reported by Kailun Qin
This bug affects 1 person
Affects    Status         Importance    Assigned to    Milestone
neutron    Fix Released   Medium        Kailun Qin

Bug Description

When a network is removed from a DHCP agent and the agent releases its port concurrently, the removal of the network from the agent can fail because the target port can no longer be found.

The issue can be reproduced on the latest devstack.
Steps to reproduce:
1. neutron port-list, identify the port to delete
2. Remove one network from a dhcp agent: neutron dhcp-agent-network-remove --dhcp_agent xxx --network xxx
   AND at the same time delete the port associated: neutron port-delete xxx
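
The two steps above depend on timing, so a small driver can help make the overlap more likely. The following is only a sketch, not part of the original report: `run_concurrently` and `REPRO_COMMANDS` are hypothetical names, the command arguments are copied from the steps above with their `xxx` placeholders left in place, and the race may still need several attempts to trigger.

```python
import subprocess
import threading

# Commands copied from the reproduction steps; "xxx" are the report's
# placeholders and must be replaced with real agent, network and port IDs.
REPRO_COMMANDS = [
    ["neutron", "dhcp-agent-network-remove",
     "--dhcp_agent", "xxx", "--network", "xxx"],
    ["neutron", "port-delete", "xxx"],
]


def run_concurrently(commands, runner=subprocess.run):
    """Run each command in its own thread, releasing all threads together.

    `runner` defaults to subprocess.run; it is a parameter only so the
    helper can be exercised without a neutron deployment.
    """
    barrier = threading.Barrier(len(commands))
    results = [None] * len(commands)

    def worker(index, argv):
        # Wait until every thread is ready so the calls overlap in time.
        barrier.wait()
        results[index] = runner(argv, capture_output=True, text=True)

    threads = [
        threading.Thread(target=worker, args=(index, argv))
        for index, argv in enumerate(commands)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Calling `run_concurrently(REPRO_COMMANDS)` on a devstack node returns one `subprocess.CompletedProcess` per command, in the same order as the input list, so the output of the `dhcp-agent-network-remove` call can be checked for the "could not be found" error.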

Failed CLI:
vagrant@control:~/devstack$ neutron dhcp-agent-network-remove --dhcp_agent 73721261-41c6-4f82-b0f4-ef9a750c7f70 --network net
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
Port 6089a77e-1975-40a5-9d4d-819e0d9e8fd5 could not be found.
Neutron server returns request_ids: ['req-dfecf6a3-8d61-435b-a6a2-919ac6ca972f']
Failed Log:
DEBUG oslo.privsep.daemon [-] privsep: Exception during request[140677005388208]: Network interface tap83924265-3e not found in namespace qdhcp-d57b2982-69e8-4e62-8dd0-6241f204e132. {{(pid=10686) loop /usr/local/lib/python2.7/dist-packages/oslo_privsep/daemon.py:449}}
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/oslo_privsep/daemon.py", line 445, in loop
    reply = self._process_cmd(*msg)
  File "/usr/local/lib/python2.7/dist-packages/oslo_privsep/daemon.py", line 428, in _process_cmd
    ret = func(*f_args, **f_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/oslo_privsep/priv_context.py", line 209, in _wrap
    return func(*args, **kwargs)
  File "/opt/stack/neutron/neutron/privileged/agent/linux/ip_lib.py", line 272, in get_link_attributes
    link = _run_iproute_link("get", device, namespace)[0]
  File "/opt/stack/neutron/neutron/privileged/agent/linux/ip_lib.py", line 130, in _run_iproute_link
    idx = _get_link_id(device, namespace)
  File "/opt/stack/neutron/neutron/privileged/agent/linux/ip_lib.py", line 124, in _get_link_id
    raise NetworkInterfaceNotFound(device=device, namespace=namespace)
NetworkInterfaceNotFound: Network interface tap83924265-3e not found in namespace qdhcp-d57b2982-69e8-4e62-8dd0-6241f204e132.

Kailun Qin (kailun.qin)
Changed in neutron:
assignee: nobody → Kailun Qin (kailun.qin)
summary: - agentschedulers: concurrent port delete on unscheduling may cause port
- not found
+ agentschedulers: concurrent port delete on unscheduling may cause
+ unscheduling to fail
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/573097

Changed in neutron:
status: New → In Progress
Changed in neutron:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/573097
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=fe907b7fc62bbf0e5d30dcfb885848bd29469a50
Submitter: Zuul
Branch: master

commit fe907b7fc62bbf0e5d30dcfb885848bd29469a50
Author: Kailun Qin <email address hidden>
Date: Thu Jun 7 20:29:48 2018 +0800

    Fix unscheduling fail when concurrent port delete

    When a network is removed from a dhcp agent, in some scenarios if the
    agent releases its port concurrently, there is chance that the
    unscheduling will fail due to that the target port is not found.

    Catch the PortNotFound exception as an expected error under this type of
    concurrent circumstance and logs it to move forward.

    Closes-Bug: #1775496
    Change-Id: Ib51b364f6ced0de7685c8ee07c1d292308d919f5
    Signed-off-by: Kailun Qin <email address hidden>
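
The approach described in the commit message (treat a missing port as an expected outcome of the race, log it, and continue) can be illustrated with a minimal sketch. This is not the merged patch: `PortNotFound` here is a local stand-in for the exception named in the commit, and `unschedule_ports` and the `release_port` callback are hypothetical names introduced only for illustration.

```python
import logging

LOG = logging.getLogger(__name__)


class PortNotFound(Exception):
    """Local stand-in for the PortNotFound exception named in the commit."""

    def __init__(self, port_id):
        super().__init__("Port %s could not be found." % port_id)
        self.port_id = port_id


def unschedule_ports(ports, device_id, release_port):
    """Release every port owned by `device_id`, tolerating vanished ports.

    `release_port` is a hypothetical callback that raises PortNotFound
    when the port was already deleted by a concurrent request.
    Returns the IDs of the ports that were actually released.
    """
    released = []
    for port in ports:
        if port["device_id"] != device_id:
            continue
        try:
            release_port(port["id"])
        except PortNotFound:
            # Expected under the race: the agent already released the
            # port, so log it and carry on with the unscheduling.
            LOG.debug("Port %s was deleted concurrently, skipping",
                      port["id"])
            continue
        released.append(port["id"])
    return released
```

The point of the design is that the missing port no longer aborts the whole operation: the remaining ports are still processed and the network removal can complete.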

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.0.0b3

This issue was fixed in the openstack/neutron 13.0.0.0b3 development milestone.
