port-delete fails

Bug #1517480 reported by Wim De Clercq
Affects: networking-bgpvpn
Status: Fix Released
Importance: Critical
Assigned to: Thomas Morin

Bug Description

Reproduce:
- BGPVPN A
- Network B associated to A
- Port C in network B

Now delete Port C. neutronclient reports success, but the neutron logs show the traceback below.

2015-11-18 11:51:01.527 DEBUG neutron.plugins.ml2.plugin [req-c7bc9eb4-9412-47ec-852c-3182b8cd15bf admin fcd217bcd44e40298d9920720e66d790] Calling delete_port for 8780f4d0-f482-4d6f-8494-78effbc3c1e8 owned by from (pid=1110) delete_port /data/repos/os-neutron-bgpvpn/neutron/plugins/ml2/plugin.py:1360
2015-11-18 11:51:01.650 DEBUG neutron.callbacks.manager [req-c7bc9eb4-9412-47ec-852c-3182b8cd15bf admin fcd217bcd44e40298d9920720e66d790] Notify callbacks for port, after_delete from (pid=1110) _notify_loop /data/repos/os-neutron-bgpvpn/neutron/callbacks/manager.py:133
2015-11-18 11:51:01.651 DEBUG neutron.callbacks.manager [req-c7bc9eb4-9412-47ec-852c-3182b8cd15bf admin fcd217bcd44e40298d9920720e66d790] Calling callback networking_bgpvpn.neutron.services.service_drivers.bagpipe.bagpipe.BaGPipeBGPVPNDriver.registry_port_deleted from (pid=1110) _notify_loop /data/repos/os-neutron-bgpvpn/neutron/callbacks/manager.py:140
2015-11-18 11:51:01.652 INFO networking_bgpvpn.neutron.services.service_drivers.bagpipe.bagpipe [req-c7bc9eb4-9412-47ec-852c-3182b8cd15bf admin fcd217bcd44e40298d9920720e66d790] remove_port_from_bgpvpn_agent port updated on port 8780f4d0-f482-4d6f-8494-78effbc3c1e8 status DOWN
2015-11-18 11:51:01.739 ERROR neutron.callbacks.manager [req-c7bc9eb4-9412-47ec-852c-3182b8cd15bf admin fcd217bcd44e40298d9920720e66d790] Error during notification for networking_bgpvpn.neutron.services.service_drivers.bagpipe.bagpipe.BaGPipeBGPVPNDriver.registry_port_deleted port, after_delete
2015-11-18 11:51:01.739 TRACE neutron.callbacks.manager Traceback (most recent call last):
2015-11-18 11:51:01.739 TRACE neutron.callbacks.manager File "/data/repos/os-neutron-bgpvpn/neutron/callbacks/manager.py", line 141, in _notify_loop
2015-11-18 11:51:01.739 TRACE neutron.callbacks.manager callback(resource, event, trigger, **kwargs)
2015-11-18 11:51:01.739 TRACE neutron.callbacks.manager File "/data/repos/networking-bgpvpn/networking_bgpvpn/neutron/services/service_drivers/bagpipe/bagpipe.py", line 392, in registry_port_deleted
2015-11-18 11:51:01.739 TRACE neutron.callbacks.manager self.remove_port_from_bgpvpn_agent(context, port_dict)
2015-11-18 11:51:01.739 TRACE neutron.callbacks.manager File "/data/repos/networking-bgpvpn/networking_bgpvpn/neutron/services/service_drivers/bagpipe/bagpipe.py", line 353, in remove_port_from_bgpvpn_agent
2015-11-18 11:51:01.739 TRACE neutron.callbacks.manager agent_host = self._get_port_host(port['id'])
2015-11-18 11:51:01.739 TRACE neutron.callbacks.manager File "/data/repos/networking-bgpvpn/networking_bgpvpn/neutron/services/service_drivers/bagpipe/bagpipe.py", line 300, in _get_port_host
2015-11-18 11:51:01.739 TRACE neutron.callbacks.manager port_id)
2015-11-18 11:51:01.739 TRACE neutron.callbacks.manager File "/data/repos/os-neutron-bgpvpn/neutron/db/db_base_plugin_v2.py", line 1252, in get_port
2015-11-18 11:51:01.739 TRACE neutron.callbacks.manager port = self._get_port(context, id)
2015-11-18 11:51:01.739 TRACE neutron.callbacks.manager File "/data/repos/os-neutron-bgpvpn/neutron/db/db_base_plugin_common.py", line 218, in _get_port
2015-11-18 11:51:01.739 TRACE neutron.callbacks.manager raise n_exc.PortNotFound(port_id=id)
2015-11-18 11:51:01.739 TRACE neutron.callbacks.manager PortNotFound: Port 8780f4d0-f482-4d6f-8494-78effbc3c1e8 could not be found
2015-11-18 11:51:01.739 TRACE neutron.callbacks.manager

Revision history for this message
Thomas Morin (tmmorin-orange) wrote :

The issue lies in the fact that the service plugin subscribes to port AFTER_DELETE events rather than BEFORE_DELETE events.
By the time the callback is triggered it is too late: the port does not exist anymore.
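The ordering problem can be illustrated with a minimal, self-contained sketch (plain Python, not real neutron code; the callback manager, port id, and host name are all invented for illustration):

```python
# Toy stand-ins for the neutron db and callback registry.
PORTS = {"8780f4d0": {"id": "8780f4d0", "binding:host_id": "compute-1"}}
_callbacks = {"BEFORE_DELETE": [], "AFTER_DELETE": []}

def subscribe(event, fn):
    _callbacks[event].append(fn)

def notify(event, port_id):
    for fn in _callbacks[event]:
        fn(port_id)

seen = {}

def before_delete(port_id):
    # At BEFORE_DELETE the row still exists, so the lookup succeeds.
    seen["before"] = PORTS[port_id]["binding:host_id"]

def after_delete(port_id):
    # At AFTER_DELETE the row is gone; the lookup fails
    # (PortNotFound in neutron, KeyError in this toy version).
    try:
        seen["after"] = PORTS[port_id]["binding:host_id"]
    except KeyError:
        seen["after"] = None

subscribe("BEFORE_DELETE", before_delete)
subscribe("AFTER_DELETE", after_delete)

def delete_port(port_id):
    # Mirrors ml2's delete_port dispatch order.
    notify("BEFORE_DELETE", port_id)
    del PORTS[port_id]          # the db row disappears here
    notify("AFTER_DELETE", port_id)

delete_port("8780f4d0")
print(seen)  # {'before': 'compute-1', 'after': None}
```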

Changed in bgpvpn:
status: New → Confirmed
importance: Undecided → Critical
milestone: none → liberty
Revision history for this message
Wim De Clercq (wim-de-clercq) wrote :

Note that, according to the ml2 code, even at BEFORE_DELETE time there is still a chance that the port isn't in the database.

It seems to me that the port dict passed with AFTER_DELETE already contains all the required data; maybe we don't need the lookup at all?

Revision history for this message
Thomas Morin (tmmorin-orange) wrote :

See [1], line 260, on why we need to do a db lookup.

I would be surprised if anything in the DB were touched before the BEFORE_DELETE event is fired: is there something in particular that makes you believe that?

[1] https://github.com/openstack/networking-bgpvpn/blob/master/networking_bgpvpn/neutron/services/service_drivers/bagpipe/bagpipe.py#L260

Revision history for this message
Wim De Clercq (wim-de-clercq) wrote :

Pretty much nothing is done at before_delete time.

I can only assume ml2 has code for the port not existing because of concurrency reasons.

Revision history for this message
Thomas Morin (tmmorin-orange) wrote :

I agree. Using the BEFORE_DELETE event as a trigger will only work if ML2 does no further processing before all event consumers have acted on the event. Otherwise we'll have races.

Need to check how the callback framework is supposed to work...

Revision history for this message
Thomas Morin (tmmorin-orange) wrote :

The callback registry is supposed to dispatch to all consumers, giving each a chance to raise an exception and abort the processing. So we shouldn't have races.

However, assuming the bagpipe bgpvpn driver listens to BEFORE_DELETE, in the case where the port delete process is aborted because of something external to bgpvpn, bagpipe bgpvpn should be ready to abort/cancel or revert its port-delete work.

Reverting should be doable (handle it like a port coming up), but will result in the agent seeing a "down" followed by an "up".

The alternative is to do the db lookup (to find the host_id) during BEFORE_DELETE, but delay the real action until AFTER_DELETE arrives. For this to work we need a way to propagate the host_id to the place where AFTER_DELETE is processed.

Revision history for this message
Wim De Clercq (wim-de-clercq) wrote :

How would bagpipe be notified if some other external thing causes the PORT_DELETE to be aborted?

--

As far as I know, the only way to carry something from BEFORE_DELETE to AFTER_DELETE is to put it in the context object. I'm not 100% sure that's the intended usage of the context, but as far as I know it works.
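A minimal sketch of stashing data on the context between the two events; `Context` here is just a stand-in for neutron's request context, and the attribute name is invented:

```python
class Context:
    """Stand-in for neutron's request context object."""
    pass

def before_delete(ctx, port):
    # Record the host while the db row still exists; the attribute
    # name is ad hoc, which is exactly the "unsupported usage" worry.
    ctx._bgpvpn_port_host = port["binding:host_id"]

def after_delete(ctx, port_id):
    # Retrieve what BEFORE_DELETE stashed; no db lookup needed.
    return getattr(ctx, "_bgpvpn_port_host", None)

ctx = Context()
before_delete(ctx, {"id": "p1", "binding:host_id": "compute-3"})
host = after_delete(ctx, "p1")
print(host)  # compute-3
```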

Revision history for this message
Thomas Morin (tmmorin-orange) wrote :

I've also thought about putting data in the context object, but, like you, I'm not sure that approach is guaranteed to keep working in the future.

On abort:

bagpipe would be notified that something external aborted the PORT_DELETE by subscribing to and receiving an ABORT_DELETE event.

See http://docs.openstack.org/developer/neutron/devref/callbacks.html

Looking at the source, this does not seem to be implemented though.

The simple solution, for now, is I believe to simply act on the BEFORE_DELETE event.

Revision history for this message
Wim De Clercq (wim-de-clercq) wrote :

Thinking about it... I actually wonder what causes the port dict that comes in with AFTER_DELETE to lack a binding:host_id.
I'm not sure where policy.json comes into the picture.

I mean, the port is retrieved from the database, and I don't think the database layer has any knowledge of which columns are admin-only.
After it is pulled out of the DB it goes into
self._make_port_dict(port_db)
This method doesn't take a tenant, session or context, so it can't know whether the caller is admin or not.

And after that it's sent to AFTER_DELETE...

Revision history for this message
Thomas Morin (tmmorin-orange) wrote :

My assumption had been that port state update processing was not done in an admin context, but maybe I was wrong.
Maybe the reason the port dict is incomplete lies somewhere else.

Revision history for this message
Thomas Morin (tmmorin-orange) wrote :

ML2 in fact returns a port dict built from models_v2.Port, but the host binding information lives in another db object (models.PortBinding).

Revision history for this message
Wim De Clercq (wim-de-clercq) wrote :

Yeah, I've seen that. When debugging, though, I also see the binding info as part of that port from models_v2...
I didn't look deeper into why that happens, though.

Changed in bgpvpn:
assignee: nobody → Thomas Morin (tmmorin-orange)
summary: - port-delete succeeds but gives a traceback in neutron logs
+ port-delete fails
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-bgpvpn (master)

Fix proposed to branch: master
Review: https://review.openstack.org/249081

Changed in bgpvpn:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-bgpvpn (master)

Reviewed: https://review.openstack.org/249081
Committed: https://git.openstack.org/cgit/openstack/networking-bgpvpn/commit/?id=2aebe3da2d16578574ab8f8c2a86aeafdd16c0ca
Submitter: Jenkins
Branch: master

commit 2aebe3da2d16578574ab8f8c2a86aeafdd16c0ca
Author: Thomas Morin <email address hidden>
Date: Tue Nov 24 09:51:47 2015 +0100

    bagpipe: port delete action on BEFORE_DELETE event

    Details on the issue in bug #1517480.

    We need to subscribe to port before_delete events, because
    on after_delete events the port is already removed from the db
    and we can't retrieve the binding:host_id information (which
    is not passed in the event either).

    This change also includes a unit test against this bug, which also
    happens to go through the ML2 RPC-to-registry codepaths, so that
    the effect of real registry events on our code is tested, thus
    testing the validity of the expectations that the bagpipe driver
    code has of ML2. This addresses bug #1493901.

    One test is not passing, and thus commented out: see bug 1519258.

    Closes-Bug: 1517480
    Closes-Bug: 1493901

    Change-Id: Icce89d98bcc4424529dcbe661fed3c1c8191418c

Changed in bgpvpn:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-bgpvpn (backport/kilo)

Fix proposed to branch: backport/kilo
Review: https://review.openstack.org/251780

Changed in bgpvpn:
status: Fix Committed → Fix Released