live migration does not clean up at target node if a failure occurs during post migration
- Series zed
- Bug #1628606
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | In Progress | Low | Artom Lifshitz |
Wallaby | New | Undecided | Unassigned |
Xena | New | Undecided | Unassigned |
Yoga | New | Undecided | Unassigned |
Zed | New | Undecided | Unassigned |
Bug Description
If a live migration fails during post-processing on the source (e.g. a failure to disconnect volumes), the instance can be shut down on the source node and left in a migrating task state. The copy of the instance on the target node is also left running, although it is not usable because neutron networking has not yet been switched to the target and nova still records the instance as being on the source node.
This situation can be resolved as follows:
On the target:
    virsh destroy <instance domain id>
If the compute nodes are NOT using shared storage:
    sudo rm -rf <instance uuid directory>
Then use the nova client as admin to restart the instance on the source node:
    nova reset-state --active <instance uuid>
    nova reboot --hard <instance uuid>
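The manual recovery steps above can be sketched as a small helper that only builds the command lines, so they can be reviewed before anyone runs them. The function name and all of its parameters are hypothetical. The commands themselves are the ones quoted in this description, and nothing is executed.

```python
from typing import Dict, List


def build_recovery_commands(
    domain_id: str,
    instance_uuid: str,
    instance_dir: str,
    shared_storage: bool,
) -> Dict[str, List[List[str]]]:
    """Return the recovery commands from the description, grouped by where
    they are meant to run.

    Illustrative only: this helper is not part of nova and runs nothing.
    """
    # On the target node: stop the unusable copy of the instance.
    target: List[List[str]] = [["virsh", "destroy", domain_id]]
    if not shared_storage:
        # The description removes the instance directory only when the
        # compute nodes are NOT using shared storage.
        target.append(["sudo", "rm", "-rf", instance_dir])

    # With the nova client, as admin: restart the instance on the source.
    admin_client = [
        ["nova", "reset-state", "--active", instance_uuid],
        ["nova", "reboot", "--hard", instance_uuid],
    ]
    return {"target": target, "admin_client": admin_client}
```

Keeping the shared-storage check in one place makes it harder to run the `rm -rf` step by accident on a deployment where both nodes see the same instance directory.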
I will investigate how to address this issue
Changed in nova: | |
assignee: | nobody → Paul Carlton (paul-carlton2) |
Changed in nova: | |
importance: | Undecided → Low |
tags: | added: live-migration |
Paul Carlton (paul-carlton2) wrote : | #1 |
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master) | #2 |
Fix proposed to branch: master
Review: https:/
Changed in nova: | |
status: | New → In Progress |
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master) | #3 |
Change abandoned by Paul Carlton (<email address hidden>) on branch: master
Review: https:/
Reason: Leaving for someone else to fix as they see fit
Sivasathurappan Radhakrishnan (siva-radhakrishnan) wrote : | #4 |
Since Paul Carlton abandoned his patch, removing him as assignee.
Changed in nova: | |
assignee: | Paul Carlton (paul-carlton2) → nobody |
status: | In Progress → Confirmed |
Matthew Booth (mbooth-9) wrote : | #5 |
I think this bug is pretty serious. Say we get a Cinder error in driver.
ComputeManager.
...
self.
...
self.
The above code runs on the source compute. We update instance.host to the destination in post_live_
However, _post_live_
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master) | #6 |
Fix proposed to branch: master
Review: https:/
Changed in nova: | |
assignee: | nobody → Artom Lifshitz (notartom) |
status: | Confirmed → In Progress |
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky) | #7 |
Fix proposed to branch: stable/rocky
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens) | #8 |
Fix proposed to branch: stable/queens
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike) | #9 |
Fix proposed to branch: stable/pike
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master) | #10 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit 5513f48dea529fe
Author: Artom Lifshitz <email address hidden>
Date: Wed Oct 10 14:53:14 2018 -0400
Handle volume API failure in _post_live_
Previously, if the call to Cinder in _post_live_
exception went unhandled and prevented us from calling
post_
host and task state. This left the system in an inconsistent state,
with the instance actually running on the destination, but
with instance.host still set to the source. This patch simply wraps
the Cinder API calls in a try/except, and logs the exception instead
of blowing up. While "dumb", this has the virtue of being simple and
minimizing potential side effects. A comprehensive refactoring of
when, where and how we set instance host and task state to try to
guarantee consistency is left as a TODO.
Partial-bug: 1628606
Change-Id: Icb0bdaf454935b
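The "wrap in a try/except and log" approach described in this commit message can be illustrated with a minimal sketch. It is not the nova code: the function and both callables are hypothetical placeholders, with `disconnect_volumes` standing in for the Cinder API calls and `update_host_and_task_state` for the later step that records the instance on the destination.

```python
import logging

LOG = logging.getLogger(__name__)


class VolumeAPIError(Exception):
    """Stand-in for whatever error the volume service call raises."""


def post_live_migration(disconnect_volumes, update_host_and_task_state):
    """Run the volume cleanup, then always continue to the final step.

    Both arguments are illustrative callables, not nova functions.
    """
    try:
        disconnect_volumes()
    except Exception:
        # Log the failure instead of letting it abort the method, so the
        # step that updates host and task state is still reached.
        LOG.exception("Volume cleanup failed during post live migration")
    update_host_and_task_state()
```

The trade-off is the one the commit message names: the failure is only visible in the logs, but the instance record no longer ends up pointing at the wrong host.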
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky) | #11 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/rocky
commit cf3c2f391ad0f9d
Author: Artom Lifshitz <email address hidden>
Date: Wed Oct 10 14:53:14 2018 -0400
Handle volume API failure in _post_live_
Previously, if the call to Cinder in _post_live_
exception went unhandled and prevented us from calling
post_
host and task state. This left the system in an inconsistent state,
with the instance actually running on the destination, but
with instance.host still set to the source. This patch simply wraps
the Cinder API calls in a try/except, and logs the exception instead
of blowing up. While "dumb", this has the virtue of being simple and
minimizing potential side effects. A comprehensive refactoring of
when, where and how we set instance host and task state to try to
guarantee consistency is left as a TODO.
Partial-bug: 1628606
Change-Id: Icb0bdaf454935b
(cherry picked from commit 5513f48dea529fe
tags: | added: in-stable-rocky |
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens) | #12 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/queens
commit 53f9c8e51040768
Author: Artom Lifshitz <email address hidden>
Date: Wed Oct 10 14:53:14 2018 -0400
Handle volume API failure in _post_live_
Previously, if the call to Cinder in _post_live_
exception went unhandled and prevented us from calling
post_
host and task state. This left the system in an inconsistent state,
with the instance actually running on the destination, but
with instance.host still set to the source. This patch simply wraps
the Cinder API calls in a try/except, and logs the exception instead
of blowing up. While "dumb", this has the virtue of being simple and
minimizing potential side effects. A comprehensive refactoring of
when, where and how we set instance host and task state to try to
guarantee consistency is left as a TODO.
Partial-bug: 1628606
Change-Id: Icb0bdaf454935b
(cherry picked from commit 5513f48dea529fe
(cherry picked from commit cf3c2f391ad0f9d
tags: | added: in-stable-queens |
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike) | #13 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/pike
commit 28bc3c8221c6f12
Author: Artom Lifshitz <email address hidden>
Date: Wed Oct 10 14:53:14 2018 -0400
Handle volume API failure in _post_live_
Previously, if the call to Cinder in _post_live_
exception went unhandled and prevented us from calling
post_
host and task state. This left the system in an inconsistent state,
with the instance actually running on the destination, but
with instance.host still set to the source. This patch simply wraps
the Cinder API calls in a try/except, and logs the exception instead
of blowing up. While "dumb", this has the virtue of being simple and
minimizing potential side effects. A comprehensive refactoring of
when, where and how we set instance host and task state to try to
guarantee consistency is left as a TODO.
Conflicts in nova/compute/
conditional (and corresponding modifications to tests).
Partial-bug: 1628606
Change-Id: Icb0bdaf454935b
(cherry picked from commit 5513f48dea529fe
(cherry picked from commit cf3c2f391ad0f9d
(cherry picked from commit 53f9c8e51040768
tags: | added: in-stable-pike |
Matt Riedemann (mriedem) wrote : | #14 |
Bug 1818873 is related, possibly a duplicate.
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike) | #15 |
Fix proposed to branch: stable/pike
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/pike) | #16 |
Change abandoned by huanhongda (<email address hidden>) on branch: stable/pike
Review: https:/
Reason: Maybe cherry-pick 013f421bca4067b
OpenStack Infra (hudson-openstack) wrote : | #17 |
Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/pike
Review: https:/
Reason: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master) | #18 |
Fix proposed to branch: master
Review: https:/
sean mooney (sean-k-mooney) wrote : | #19 |
Downstream, we are tracking this as
https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master) | #20 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit a20baeca1f5ebb0
Author: Amit Uniyal <email address hidden>
Date: Thu Aug 25 05:08:44 2022 +0000
Adds a reproducer for post live migration fail
Adds a regression test or reproducer for post live migration
failure at the destination; a possible cause is a failure to get
instance network info or block device info
Related-Bug: #1628606
Change-Id: I48dbe0aae8a394
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master) | #21 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit 8449b7caefa4a5c
Author: Sean Mooney <email address hidden>
Date: Thu May 13 12:48:21 2021 +0100
[compute] always set instance.host in post_livemigration
This change adds a new _post_live_
function that wraps _post_live_
that if we exit due to an exception instance.host is set
to the destination host.
When we are in _post_live_
started running on the destination host and we cannot revert.
Sometimes admins or users will hard reboot the instance expecting
that to fix everything when the vm enters the error state after
the failed migrations. Previously this would end up recreating the
instance on the source node leading to possible data corruption if
the instance used shared storage.
Change-Id: Ibc4bc7edf1c8d1
Partial-Bug: #1628606
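The wrapper idea in this commit message can be sketched as follows. This is an illustration under assumptions, not the merged change: `FakeInstance` is a stand-in for the instance record, the function names are hypothetical, and re-raising the original error after updating the host is a choice made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeInstance:
    """Minimal stand-in for an instance record; not nova's Instance object."""
    host: str
    saved: int = 0

    def save(self) -> None:
        # A real record would persist to the database; here we only count.
        self.saved += 1


def post_live_migration_update_host(
    instance: FakeInstance,
    dest: str,
    post_live_migration: Callable[[FakeInstance, str], None],
) -> None:
    """Call the wrapped post step; on failure, still point the record at dest."""
    try:
        post_live_migration(instance, dest)
    except Exception:
        # The guest is already running on the destination at this point,
        # so record that before re-raising the original error.
        if instance.host != dest:
            instance.host = dest
            instance.save()
        raise
```

With the host updated even on failure, a later hard reboot acts on the destination node rather than recreating the guest on the source.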
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/zed) | #22 |
Related fix proposed to branch: stable/zed
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/zed) | #23 |
Fix proposed to branch: stable/zed
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/yoga) | #24 |
Related fix proposed to branch: stable/yoga
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/yoga) | #25 |
Fix proposed to branch: stable/yoga
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/zed) | #26 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/zed
commit 74a618a8118642c
Author: Amit Uniyal <email address hidden>
Date: Thu Aug 25 05:08:44 2022 +0000
Adds a reproducer for post live migration fail
Adds a regression test or reproducer for post live migration
failure at the destination; a possible cause is a failure to get
instance network info or block device info
changes:
adds updating server after _live_migrate in reproducer
test (missed in main commit)
Related-Bug: #1628606
Change-Id: I48dbe0aae8a394
(cherry picked from commit a20baeca1f5ebb0
tags: | added: in-stable-zed |
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/xena) | #27 |
Fix proposed to branch: stable/xena
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/xena) | #28 |
Change abandoned by "Amit Uniyal <email address hidden>" on branch: stable/xena
Review: https:/
Reason: Test from GUI
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/xena) | #29 |
Related fix proposed to branch: stable/xena
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/zed) | #30 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/zed
commit 643b0c7d35752b2
Author: Sean Mooney <email address hidden>
Date: Thu May 13 12:48:21 2021 +0100
[compute] always set instance.host in post_livemigration
This change adds a new _post_live_
function that wraps _post_live_
that if we exit due to an exception instance.host is set
to the destination host.
When we are in _post_live_
started running on the destination host and we cannot revert.
Sometimes admins or users will hard reboot the instance expecting
that to fix everything when the vm enters the error state after
the failed migrations. Previously this would end up recreating the
instance on the source node leading to possible data corruption if
the instance used shared storage.
Change-Id: Ibc4bc7edf1c8d1
Partial-Bug: #1628606
(cherry picked from commit 8449b7caefa4a5c
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/wallaby) | #31 |
Related fix proposed to branch: stable/wallaby
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby) | #32 |
Fix proposed to branch: stable/wallaby
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/victoria) | #33 |
Related fix proposed to branch: stable/victoria
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/victoria) | #34 |
Fix proposed to branch: stable/victoria
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/yoga) | #35 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/yoga
commit 71e5a1dbcc22aea
Author: Amit Uniyal <email address hidden>
Date: Thu Aug 25 05:08:44 2022 +0000
Adds a reproducer for post live migration fail
Adds a regression test or reproducer for post live migration
failure at the destination; a possible cause is a failure to get
instance network info or block device info
changes:
adds return server from _live_migrate in _integrated_helpers
Related-Bug: #1628606
Change-Id: I48dbe0aae8a394
(cherry picked from commit a20baeca1f5ebb0
(cherry picked from commit 74a618a8118642c
tags: | added: in-stable-yoga |
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/ussuri) | #36 |
Related fix proposed to branch: stable/ussuri
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri) | #37 |
Fix proposed to branch: stable/ussuri
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/train) | #38 |
Related fix proposed to branch: stable/train
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train) | #39 |
Fix proposed to branch: stable/train
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/yoga) | #40 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/yoga
commit 17ae907569e45cc
Author: Sean Mooney <email address hidden>
Date: Thu May 13 12:48:21 2021 +0100
[compute] always set instance.host in post_livemigration
This change adds a new _post_live_
function that wraps _post_live_
that if we exit due to an exception instance.host is set
to the destination host.
When we are in _post_live_
started running on the destination host and we cannot revert.
Sometimes admins or users will hard reboot the instance expecting
that to fix everything when the vm enters the error state after
the failed migrations. Previously this would end up recreating the
instance on the source node leading to possible data corruption if
the instance used shared storage.
Change-Id: Ibc4bc7edf1c8d1
Partial-Bug: #1628606
(cherry picked from commit 8449b7caefa4a5c
(cherry picked from commit 643b0c7d35752b2
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/train) | #41 |
Related fix proposed to branch: stable/train
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train) | #42 |
Fix proposed to branch: stable/train
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/train) | #43 |
Change abandoned by "Amit Uniyal <email address hidden>" on branch: stable/train
Review: https:/
Reason: Added because I thought it's a good idea to have more test cases; abandoning because it's not really required from a backport perspective
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/train) | #44 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/train
commit d3b46af01b7afa1
Author: Artom Lifshitz <email address hidden>
Date: Fri May 1 13:47:44 2020 -0400
func: Add _live_migrate helper to InstanceHelperMixin
This is a partial backport of I70c4715de05d64
that introduced this helper while addressing feedback on
Ia3d7351c18
I78e79112a9
Follow-up for NUMA live migration functional tests
This patch addresses outstanding feedback on
Ia3d7351c18
I78e79112a9
Related-Bug: #1628606
Change-Id: I70c4715de05d64
(cherry picked from commit ca8f1f422298b0a
(cherry picked from commit 726ca4aec5ccea9
tags: | added: in-stable-train |
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/xena) | #45 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/xena
commit 5efcc3f695e02d6
Author: Amit Uniyal <email address hidden>
Date: Thu Aug 25 05:08:44 2022 +0000
Adds a reproducer for post live migration fail
Adds a regression test or reproducer for post live migration
failure at the destination; a possible cause is a failure to get
instance network info or block device info
changes:
adds return server from _live_migrate in _integrated_helpers
Related-Bug: #1628606
Change-Id: I48dbe0aae8a394
(cherry picked from commit a20baeca1f5ebb0
(cherry picked from commit 74a618a8118642c
(cherry picked from commit 71e5a1dbcc22aea
tags: | added: in-stable-xena |
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/xena) | #46 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/xena
commit 15502ddedc23e65
Author: Sean Mooney <email address hidden>
Date: Thu May 13 12:48:21 2021 +0100
[compute] always set instance.host in post_livemigration
This change adds a new _post_live_
function that wraps _post_live_
that if we exit due to an exception instance.host is set
to the destination host.
When we are in _post_live_
started running on the destination host and we cannot revert.
Sometimes admins or users will hard reboot the instance expecting
that to fix everything when the vm enters the error state after
the failed migrations. Previously this would end up recreating the
instance on the source node leading to possible data corruption if
the instance used shared storage.
Change-Id: Ibc4bc7edf1c8d1
Partial-Bug: #1628606
(cherry picked from commit 8449b7caefa4a5c
(cherry picked from commit 643b0c7d35752b2
(cherry picked from commit 17ae907569e45cc
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/wallaby) | #47 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/wallaby
commit ed1ea71489b60c0
Author: Amit Uniyal <email address hidden>
Date: Thu Aug 25 05:08:44 2022 +0000
Adds a reproducer for post live migration fail
Adds a regression test or reproducer for post live migration
failure at the destination; a possible cause is a failure to get
instance network info or block device info
changes:
adds return server from _live_migrate in _integrated_helpers
Related-Bug: #1628606
Change-Id: I48dbe0aae8a394
(cherry picked from commit a20baeca1f5ebb0
(cherry picked from commit 74a618a8118642c
(cherry picked from commit 71e5a1dbcc22aea
(cherry picked from commit 5efcc3f695e02d6
tags: | added: in-stable-wallaby |
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/wallaby) | #48 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/wallaby
commit 43c0e40d2889607
Author: Sean Mooney <email address hidden>
Date: Thu May 13 12:48:21 2021 +0100
[compute] always set instance.host in post_livemigration
This change adds a new _post_live_
function that wraps _post_live_
that if we exit due to an exception instance.host is set
to the destination host.
When we are in _post_live_
started running on the destination host and we cannot revert.
Sometimes admins or users will hard reboot the instance expecting
that to fix everything when the vm enters the error state after
the failed migrations. Previously this would end up recreating the
instance on the source node leading to possible data corruption if
the instance used shared storage.
Change-Id: Ibc4bc7edf1c8d1
Partial-Bug: #1628606
(cherry picked from commit 8449b7caefa4a5c
(cherry picked from commit 643b0c7d35752b2
(cherry picked from commit 17ae907569e45cc
(cherry picked from commit 15502ddedc23e65
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/victoria) | #49 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/victoria
commit 6dda4f7ca3f25a1
Author: Amit Uniyal <email address hidden>
Date: Thu Aug 25 05:08:44 2022 +0000
Adds a reproducer for post live migration fail
Adds a regression test or reproducer for post live migration
failure at the destination; a possible cause is a failure to get
instance network info or block device info
changes:
adds return server from _live_migrate in _integrated_helpers
Related-Bug: #1628606
Change-Id: I48dbe0aae8a394
(cherry picked from commit a20baeca1f5ebb0
(cherry picked from commit 74a618a8118642c
(cherry picked from commit 71e5a1dbcc22aea
(cherry picked from commit 5efcc3f695e02d6
(cherry picked from commit ed1ea71489b60c0
tags: | added: in-stable-victoria |
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/victoria) | #50 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/victoria
commit 0ac64bba8b7aba2
Author: Sean Mooney <email address hidden>
Date: Thu May 13 12:48:21 2021 +0100
[compute] always set instance.host in post_livemigration
This change adds a new _post_live_
function that wraps _post_live_
that if we exit due to an exception instance.host is set
to the destination host.
When we are in _post_live_
started running on the destination host and we cannot revert.
Sometimes admins or users will hard reboot the instance expecting
that to fix everything when the vm enters the error state after
the failed migrations. Previously this would end up recreating the
instance on the source node leading to possible data corruption if
the instance used shared storage.
Change-Id: Ibc4bc7edf1c8d1
Partial-Bug: #1628606
(cherry picked from commit 8449b7caefa4a5c
(cherry picked from commit 643b0c7d35752b2
(cherry picked from commit 17ae907569e45cc
(cherry picked from commit 15502ddedc23e65
(cherry picked from commit 43c0e40d2889607
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/ussuri) | #51 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/ussuri
commit 5e955b62fa63b72
Author: Amit Uniyal <email address hidden>
Date: Thu Aug 25 05:08:44 2022 +0000
Adds a reproducer for post live migration fail
Adds a regression test or reproducer for post live migration
failure at the destination; a possible cause is a failure to get
instance network info or block device info
changes:
adds return server from _live_migrate in _integrated_helpers
NOTE(auniyal): Differences
* Replaced GlanceFixture with fake.stub_
Related-Bug: #1628606
Change-Id: I48dbe0aae8a394
(cherry picked from commit a20baeca1f5ebb0
(cherry picked from commit 74a618a8118642c
(cherry picked from commit 71e5a1dbcc22aea
(cherry picked from commit 5efcc3f695e02d6
(cherry picked from commit ed1ea71489b60c0
(cherry picked from commit 6dda4f7ca3f25a1
tags: | added: in-stable-ussuri |
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ussuri) | #52 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/ussuri
commit 3885f983c358e5a
Author: Sean Mooney <email address hidden>
Date: Thu May 13 12:48:21 2021 +0100
[compute] always set instance.host in post_livemigration
This change adds a new _post_live_
function that wraps _post_live_
that if we exit due to an exception instance.host is set
to the destination host.
When we are in _post_live_
started running on the destination host and we cannot revert.
Sometimes admins or users will hard reboot the instance expecting
that to fix everything when the vm enters the error state after
the failed migrations. Previously this would end up recreating the
instance on the source node leading to possible data corruption if
the instance used shared storage.
Change-Id: Ibc4bc7edf1c8d1
Partial-Bug: #1628606
(cherry picked from commit 8449b7caefa4a5c
(cherry picked from commit 643b0c7d35752b2
(cherry picked from commit 17ae907569e45cc
(cherry picked from commit 15502ddedc23e65
(cherry picked from commit 43c0e40d2889607
(cherry picked from commit 0ac64bba8b7aba2
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/train) | #53 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/train
commit 186db1751080eff
Author: Lee Yarwood <email address hidden>
Date: Wed Jul 29 10:51:34 2020 +0100
func: Introduce a server_
Useful when testing live migration failures that leave the server in a
non-ACTIVE state. This change also renames the migration_
arg to migration_
with _create_server.
NOTE(artom): This is to facilitate subsequent backports of live
migration regression tests and bug fixes.
Partial-Bug: #1628606
Change-Id: Ie0852a89fc9423
(cherry picked from commit e70ddd621cb59a8
(cherry picked from commit 2b0cf8edf88c5f8
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/train) | #54 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/train
commit 3969228eebac197
Author: Amit Uniyal <email address hidden>
Date: Thu Aug 25 05:08:44 2022 +0000
Adds a reproducer for post live migration fail
Adds a regression test or reproducer for post live migration
failure at the destination; a possible cause is a failure to get
instance network info or block device info
changes:
adds return server from _live_migrate in _integrated_helpers
NOTE(auniyal): Differences
* Replaced GlanceFixture with fake.stub_
NOTE(auniyal): Differences from ussuri to train
* integrated_helpers: Added self.api parameter while calling
* regression: imported mock module, as unittest.mock is added post
train release.
as _create_server is not present in train used
Related-Bug: #1628606
Change-Id: I48dbe0aae8a394
(cherry picked from commit a20baeca1f5ebb0
(cherry picked from commit 74a618a8118642c
(cherry picked from commit 71e5a1dbcc22aea
(cherry picked from commit 5efcc3f695e02d6
(cherry picked from commit ed1ea71489b60c0
(cherry picked from commit 6dda4f7ca3f25a1
(cherry picked from commit 5e955b62fa63b72
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/train) | #55 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/train
commit ec31d4d22e4163a
Author: Sean Mooney <email address hidden>
Date: Thu May 13 12:48:21 2021 +0100
[compute] always set instance.host in post_livemigration
This change adds a new _post_live_
function that wraps _post_live_
that if we exit due to an exception instance.host is set
to the destination host.
When we are in _post_live_
started running on the destination host and we cannot revert.
Sometimes admins or users will hard reboot the instance expecting
that to fix everything when the vm enters the error state after
the failed migrations. Previously this would end up recreating the
instance on the source node leading to possible data corruption if
the instance used shared storage.
NOTE(auniyal): Differences from ussuri to train
* nova/tests/
* Added instance.
Change-Id: Ibc4bc7edf1c8d1
Partial-Bug: #1628606
(cherry picked from commit 8449b7caefa4a5c
(cherry picked from commit 643b0c7d35752b2
(cherry picked from commit 17ae907569e45cc
(cherry picked from commit 15502ddedc23e65
(cherry picked from commit 43c0e40d2889607
(cherry picked from commit 0ac64bba8b7aba2
(cherry picked from commit 3885f983c358e5a
Thinking about this, it is not that simple. Once the instance has been started on the target, it could do work that would be lost if we destroy it and resurrect the instance on the source. As we found out when Matt Booth was fixing the post copy network bug, with certain neutron providers the instance at the target becomes accessible to the network as soon as it starts up (due to ARPing), so once libvirt has un-paused the instance on the target and destroyed the instance on the source we are effectively beyond the point of no return.
The trouble is that the instance host does not get updated until the end of the post migration processing, so it still looks like it is on the source in a migrating state. If any step in post migration gives rise to an exception, it skips the rest of the post migration and updates the migration as failed but leaves the instance as is.
The best solution I can think of is to wrap the call to the post method in a try/except that will set the instance to the target host if any exception occurs. In some circumstances the source instance could still be present (i.e. not cleaned up) and the networking to the target might not be set up correctly, so I'm thinking maybe the instance on the target should be placed in error state to indicate that there may be an issue? Alternatively, is the fact that the migration status will be failed enough to indicate that some further operator action might be needed?