Live migration artifacts are not cleaned up properly when queued live migration is aborted

Bug #1960412 reported by Alexey Stupnikov
Affects                   Status         Importance  Assigned to       Milestone
OpenStack Compute (nova)  Fix Released   Medium      Alexey Stupnikov
Wallaby                   Fix Committed  Undecided   Unassigned
Xena                      Fix Released   Undecided   Unassigned

Bug Description

Bug #1949808 describes one of the problems affecting aborted queued live migrations: the VM's status is not reverted to ACTIVE, and the VM is left in the MIGRATING state.

However, that is not the only problem with aborted queued live migrations: the Nova control plane introduces some changes (port bindings on the destination host, resource allocations, and possibly the instance's pci_requests) before the source compute host actually starts the live migration. Those leftovers should also be removed.

description: updated
Changed in nova:
assignee: nobody → Alexey Stupnikov (astupnikov)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/830010

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

I used the following patch to confirm that the instance's pci_requests are not actually changed by update_pci_request_spec_with_allocated_interface_name() when it is called by the conductor's _find_destination() (conductor/tasks/live_migrate.py):

  diff --git a/nova/tests/functional/test_servers_resource_request.py b/nova/tests/functional/test_servers_resource_request.py
  index 1fb39ac98a..ae65df740c 100644
  --- a/nova/tests/functional/test_servers_resource_request.py
  +++ b/nova/tests/functional/test_servers_resource_request.py
  @@ -2725,6 +2725,7 @@ class LiveMigrateAbortWithPortResourceRequestTest(

           # wait for the migration to start
           migration = self._wait_for_migration_status(server, ['running'])
  +        self._assert_pci_request_pf_device_name(server, 'host2-ens2')

           # delete the migration to abort it
           self.api.delete_migration(server['id'], migration['id'])

Changed in nova:
status: New → In Progress
Changed in nova:
importance: Undecided → Medium
tags: added: live-migration yoga-rc-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/830010
Committed: https://opendev.org/openstack/nova/commit/1ad287bf9a8f65ce68c14f4634775f58abda15c2
Submitter: "Zuul (22348)"
Branch: master

commit 1ad287bf9a8f65ce68c14f4634775f58abda15c2
Author: Alexey Stupnikov <email address hidden>
Date: Sat Feb 19 21:38:44 2022 +0100

    Add functional tests to reproduce bug #1960412

    Instance would be affected by problems described in bug #1949808
    and bug #1960412 when queued live migration is aborted.

    This change adds functional test to reproduce problems with
    placement allocations (record for aborted live migration is not
    removed when queued live migration is aborted) and with Neutron port
    bindings (INACTIVE port binding records for destination host are not
    removed when queued live migration is aborted).

    It looks like there are no other modifications introduced by Nova
    control plane which should be reverted when queued live migration is
    aborted.

    This patch also changes libvirt and neutron fixtures:

    - libvirt fixture was changed to support live migrations of
      instances with regular ports: without this change
      _update_vif_xml() complains about lack of address element in VIF's
      XML.
    - neutron fixture was changed to improve active port binding's
      tracking during live migration: without this change port's
      binding:host_id is not updated when activate_port_binding() is
      called. As a result, list_ports() function returns empty list
      when constants.BINDING_HOST_ID is used in search_opts, which is
      the case for setup_networks_on_host() called with teardown=True.

    Related-bug: #1960412
    Related-bug: #1949808
    Change-Id: I152581deb6e659c551f78eed66e4b0b958b20c53
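The neutron fixture behaviour described in the commit message above can be illustrated with a minimal, self-contained sketch. This is not Nova's NeutronFixture: the class `TinyNeutronFixture`, its storage layout, and its method signatures are hypothetical stand-ins chosen for illustration. It only shows why a host-filtered port listing comes back empty unless activating a binding also updates the port's binding:host_id.

```python
class TinyNeutronFixture:
    """Hypothetical, simplified stand-in for a Neutron test fixture."""

    def __init__(self, update_host_on_activate=True):
        self._ports = {}     # port_id -> port dict
        self._bindings = {}  # (port_id, host) -> "ACTIVE" or "INACTIVE"
        # Toggle used only to demonstrate behaviour with and without the fix.
        self._update_host_on_activate = update_host_on_activate

    def create_port(self, port_id, host):
        self._ports[port_id] = {"id": port_id, "binding:host_id": host}
        self._bindings[(port_id, host)] = "ACTIVE"

    def create_port_binding(self, port_id, host):
        # A binding prepared for the destination host starts out INACTIVE.
        self._bindings[(port_id, host)] = "INACTIVE"

    def activate_port_binding(self, port_id, host):
        # Exactly one binding per port is ACTIVE at a time.
        for pid, bound_host in list(self._bindings):
            if pid == port_id:
                status = "ACTIVE" if bound_host == host else "INACTIVE"
                self._bindings[(pid, bound_host)] = status
        if self._update_host_on_activate:
            # The key step: keep the port's host in sync with the binding.
            self._ports[port_id]["binding:host_id"] = host

    def list_ports(self, search_opts):
        # Return ports whose fields match every key/value in search_opts.
        return [
            port for port in self._ports.values()
            if all(port.get(k) == v for k, v in search_opts.items())
        ]
```

With `update_host_on_activate=False`, a lookup filtered on the new host finds nothing after activation, which mirrors the empty list_ports() result the commit message describes.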

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/828570
Committed: https://opendev.org/openstack/nova/commit/219520d9cec6a204e0d0f75881d75c8db48e7f56
Submitter: "Zuul (22348)"
Branch: master

commit 219520d9cec6a204e0d0f75881d75c8db48e7f56
Author: Alexey Stupnikov <email address hidden>
Date: Mon Mar 7 16:57:39 2022 +0100

    Clean up when queued live migration aborted

    This patch solves bug #1949808 and bug #1960412 by tuning
    live_migration_abort() function and adding calls to:

    - remove placement allocations for live migration;
    - remove INACTIVE port bindings against destination compute node;
    - restore instance's state.

    Related unit test was adjusted and related functional tests were
    fixed.

    Closes-bug: #1949808
    Closes-bug: #1960412

    Change-Id: Ic97eff86f580bff67b1f02c8eeb60c4cf4181e6a
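The three cleanup steps listed in the commit message above can be sketched roughly as follows. This is not the code from the patch: `FakePlacement`, `FakeNeutron`, and `abort_queued_live_migration` are hypothetical in-memory stand-ins, and keying allocation records by a migration UUID and representing the instance and migration as plain dicts are assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FakePlacement:
    """Hypothetical stand-in: allocation records keyed by consumer ID."""
    allocations: dict = field(default_factory=dict)

    def delete_allocations(self, consumer_id):
        # Deleting a missing record is treated as a no-op.
        self.allocations.pop(consumer_id, None)


@dataclass
class FakeNeutron:
    """Hypothetical stand-in: bindings stored as (port_id, host) -> status."""
    bindings: dict = field(default_factory=dict)

    def delete_inactive_bindings(self, host):
        stale = [
            key for key, status in self.bindings.items()
            if key[1] == host and status == "INACTIVE"
        ]
        for key in stale:
            del self.bindings[key]


def abort_queued_live_migration(instance, migration, placement, neutron):
    """Undo control-plane changes made before the migration started."""
    # 1. Remove the placement allocation record held for the live migration.
    placement.delete_allocations(migration["uuid"])
    # 2. Remove INACTIVE port bindings against the destination host.
    neutron.delete_inactive_bindings(migration["dest_host"])
    # 3. Restore the instance's state (MIGRATING -> ACTIVE).
    instance["status"] = "ACTIVE"


if __name__ == "__main__":
    placement = FakePlacement({"mig-1": {"VCPU": 1}, "inst-1": {"VCPU": 1}})
    neutron = FakeNeutron({("p1", "host1"): "ACTIVE",
                           ("p1", "host2"): "INACTIVE"})
    instance = {"uuid": "inst-1", "status": "MIGRATING"}
    migration = {"uuid": "mig-1", "dest_host": "host2"}
    abort_queued_live_migration(instance, migration, placement, neutron)
    print(instance["status"], placement.allocations, neutron.bindings)
```

The sketch leaves the source host's ACTIVE binding and the instance's own allocation record untouched, since only the artifacts created for the queued migration need to be reverted.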

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 25.0.0.0rc1

This issue was fixed in the openstack/nova 25.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/xena)

Related fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/835854

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/835855

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/xena)

Related fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/836146

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/836147

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/xena)

Change abandoned by "Alexey Stupnikov <email address hidden>" on branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/835855
Reason: Incorrect change ID

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Alexey Stupnikov <email address hidden>" on branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/835854
Reason: Incorrect change ID

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/nova/+/836146
Committed: https://opendev.org/openstack/nova/commit/479b8db3ab07dd1f50c029904cca17f3a5708685
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 479b8db3ab07dd1f50c029904cca17f3a5708685
Author: Alexey Stupnikov <email address hidden>
Date: Sat Feb 19 21:38:44 2022 +0100

    Add functional tests to reproduce bug #1960412

    Instance would be affected by problems described in bug #1949808
    and bug #1960412 when queued live migration is aborted.

    This change adds functional test to reproduce problems with
    placement allocations (record for aborted live migration is not
    removed when queued live migration is aborted) and with Neutron port
    bindings (INACTIVE port binding records for destination host are not
    removed when queued live migration is aborted).

    It looks like there are no other modifications introduced by Nova
    control plane which should be reverted when queued live migration is
    aborted.

    This patch also changes libvirt and neutron fixtures:

    - libvirt fixture was changed to support live migrations of
      instances with regular ports: without this change
      _update_vif_xml() complains about lack of address element in VIF's
      XML.
    - neutron fixture was changed to improve active port binding's
      tracking during live migration: without this change port's
      binding:host_id is not updated when activate_port_binding() is
      called. As a result, list_ports() function returns empty list
      when constants.BINDING_HOST_ID is used in search_opts, which is
      the case for setup_networks_on_host() called with teardown=True.

    Related-bug: #1960412
    Related-bug: #1949808
    Change-Id: I152581deb6e659c551f78eed66e4b0b958b20c53
    (cherry picked from commit 1ad287bf9a8f65ce68c14f4634775f58abda15c2)

tags: added: in-stable-xena
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/nova/+/836147
Committed: https://opendev.org/openstack/nova/commit/8670ca8bb290d7b434437ebf6c65d2e396498df8
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 8670ca8bb290d7b434437ebf6c65d2e396498df8
Author: Alexey Stupnikov <email address hidden>
Date: Mon Mar 7 16:57:39 2022 +0100

    Clean up when queued live migration aborted

    This patch solves bug #1949808 and bug #1960412 by tuning
    live_migration_abort() function and adding calls to:

    - remove placement allocations for live migration;
    - remove INACTIVE port bindings against destination compute node;
    - restore instance's state.

    Related unit test was adjusted and related functional tests were
    fixed.

    Closes-bug: #1949808
    Closes-bug: #1960412
    Change-Id: Ic97eff86f580bff67b1f02c8eeb60c4cf4181e6a
    (cherry picked from commit 219520d9cec6a204e0d0f75881d75c8db48e7f56)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/841760

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/841736

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/victoria)

Related fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/845753

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/845754

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 24.1.1

This issue was fixed in the openstack/nova 24.1.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/nova/+/841760
Committed: https://opendev.org/openstack/nova/commit/3d698040a17f39954fe095502dafb2b193120243
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 3d698040a17f39954fe095502dafb2b193120243
Author: Alexey Stupnikov <email address hidden>
Date: Sat Feb 19 21:38:44 2022 +0100

    Add functional tests to reproduce bug #1960412

    Instance would be affected by problems described in bug #1949808
    and bug #1960412 when queued live migration is aborted.

    This change adds functional test to reproduce problems with
    placement allocations (record for aborted live migration is not
    removed when queued live migration is aborted) and with Neutron port
    bindings (INACTIVE port binding records for destination host are not
    removed when queued live migration is aborted).

    It looks like there are no other modifications introduced by Nova
    control plane which should be reverted when queued live migration is
    aborted.

    This patch also changes neutron fixture:

    - neutron fixture was changed to improve active port binding's
      tracking during live migration: without this change port's
      binding:host_id is not updated when activate_port_binding() is
      called. As a result, list_ports() function returns empty list
      when constants.BINDING_HOST_ID is used in search_opts, which is
      the case for setup_networks_on_host() called with teardown=True.

    Conflicts:
    - nova/tests/fixtures/libvirt.py
    - nova/tests/fixtures/neutron.py

    NOTE. There is no need to change libvirt fixture because original
    problem with lack of address element is no longer there (I also
    removed related note from commit message itself). NeutronFixture
    class is defined in different place in stable/wallaby, but code
    stays the same.

    Related-bug: #1960412
    Related-bug: #1949808
    Change-Id: I152581deb6e659c551f78eed66e4b0b958b20c53
    (cherry picked from commit 1ad287bf9a8f65ce68c14f4634775f58abda15c2)
    (cherry picked from commit 479b8db3ab07dd1f50c029904cca17f3a5708685)

tags: added: in-stable-wallaby
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/nova/+/841736
Committed: https://opendev.org/openstack/nova/commit/750b3640c2fc05b8a5183804cf15d90a7ebd9e8f
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 750b3640c2fc05b8a5183804cf15d90a7ebd9e8f
Author: Alexey Stupnikov <email address hidden>
Date: Mon Mar 7 16:57:39 2022 +0100

    Clean up when queued live migration aborted

    This patch solves bug #1949808 and bug #1960412 by tuning
    live_migration_abort() function and adding calls to:

    - remove placement allocations for live migration;
    - remove INACTIVE port bindings against destination compute node;
    - restore instance's state.

    Related unit test was adjusted and related functional tests were
    fixed.

    Closes-bug: #1949808
    Closes-bug: #1960412
    Change-Id: Ic97eff86f580bff67b1f02c8eeb60c4cf4181e6a
    (cherry picked from commit 219520d9cec6a204e0d0f75881d75c8db48e7f56)
    (cherry picked from commit 8670ca8bb290d7b434437ebf6c65d2e396498df8)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/873576

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/873577

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/873579

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/873580

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/train)

Change abandoned by "Alexey Stupnikov <email address hidden>" on branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/873579
Reason: pushed change to incorrect branch

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/ussuri)

Change abandoned by "Alexey Stupnikov <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/873577
Reason: Patch became too huge and ugly during backport

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Alexey Stupnikov <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/873576
Reason: Patch became too huge and ugly during backport

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/train)

Change abandoned by "Alexey Stupnikov <email address hidden>" on branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/873580
Reason: Patch became too huge and ugly during backport

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/victoria)

Change abandoned by "Alexey Stupnikov <email address hidden>" on branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/845754

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Alexey Stupnikov <email address hidden>" on branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/845753
