libvirt live migration sometimes fails with "libvirt.libvirtError: internal error: migration was active, but no RAM info was set"

Bug #1982284 reported by melanie witt
This bug affects 3 people
Affects                     Status        Importance  Assigned to   Milestone
OpenStack Compute (nova)    Fix Released  Undecided   melanie witt
  Train                     In Progress   Undecided   Unassigned
  Ussuri                    In Progress   Undecided   Unassigned
  Victoria                  In Progress   Undecided   Unassigned
  Wallaby                   In Progress   Undecided   Unassigned
  Xena                      Fix Released  Undecided   Unassigned
  Yoga                      Fix Released  Undecided   Unassigned
  Zed                       Fix Released  Undecided   Unassigned
Ubuntu Cloud Archive        New           Undecided   Unassigned
  Ussuri                    New           Undecided   Unassigned
  Victoria                  New           Undecided   Unassigned
  Wallaby                   New           Undecided   Unassigned
  Xena                      Fix Released  Undecided   Unassigned
  Yoga                      Fix Released  Undecided   Unassigned
  Zed                       Fix Released  Undecided   Unassigned

Bug Description

We have seen this downstream where live migration randomly fails with the following error [1]:

  libvirt.libvirtError: internal error: migration was active, but no RAM info was set

Discussion on [1] gravitated toward a possible race condition issue in qemu around the query-migrate command [2]. The query-migrate command is used (indirectly) by the libvirt driver during monitoring of live migrations [3][4][5].

While searching for info about this error, I found an earlier thread on libvir-list [6] where someone else encountered the same error; for them it happened when they called query-migrate *after* a live migration had completed.

Based on this, it seemed possible that our live migration monitoring thread sometimes races and calls jobStats() after the migration has completed, resulting in this error being raised and the migration being considered failed when it was actually complete.

A patch has since been proposed and committed to QEMU [7] to address the possible issue.

Meanwhile, on our side in nova, we can mitigate this problematic behavior by catching the specific error from libvirt and ignoring it so that a live migration in this situation will be considered completed by the libvirt driver.

Doing this would improve the experience for users that are hitting this error and getting erroneous live migration failures.
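
The mitigation described above can be sketched roughly as follows. This is a minimal illustration, not the actual nova patch: `FakeLibvirtError` is a stand-in for `libvirt.libvirtError` so the snippet runs without libvirt-python installed, and `get_job_state()`, `JOB_COMPLETED`, and `racy_job_stats()` are hypothetical names invented for the example.

```python
# Hypothetical stand-in for libvirt.libvirtError, so that this sketch
# runs without libvirt-python. The real exception also exposes
# get_error_message().
class FakeLibvirtError(Exception):
    def get_error_message(self):
        return str(self)


# The error text reported in this bug.
NO_RAM_INFO = "migration was active, but no RAM info was set"

# Hypothetical marker meaning "treat the migration job as completed".
JOB_COMPLETED = "completed"


def get_job_state(job_stats):
    """Call job_stats() and tolerate the 'no RAM info' race.

    job_stats is any callable that returns the current job state or
    raises FakeLibvirtError (an assumption made for this sketch).
    """
    try:
        return job_stats()
    except FakeLibvirtError as ex:
        message = ex.get_error_message() or ""
        if NO_RAM_INFO in message:
            # The monitoring thread probably raced with the end of the
            # migration, so report completion instead of failure.
            return JOB_COMPLETED
        # Any other libvirt error is still a real failure.
        raise


def racy_job_stats():
    # Simulates the stats call being made after the migration finished.
    raise FakeLibvirtError("internal error: " + NO_RAM_INFO)


print(get_job_state(racy_job_stats))
```

Matching on the message text is narrow on purpose: only this one error is swallowed, and every other libvirt error still propagates and fails the migration as before.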

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2074205
[2] https://qemu.readthedocs.io/en/latest/interop/qemu-qmp-ref.html#qapidoc-1848
[3] https://github.com/openstack/nova/blob/bcb96f362ab12e297f125daa5189fb66345b4976/nova/virt/libvirt/driver.py#L10123
[4] https://github.com/openstack/nova/blob/bcb96f362ab12e297f125daa5189fb66345b4976/nova/virt/libvirt/guest.py#L655
[5] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainGetJobStats
[6] https://listman.redhat.com/archives/libvir-list/2021-January/213631.html
[7] https://github.com/qemu/qemu/commit/552de79bfdd5e9e53847eb3c6d6e4cd898a4370e

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/852002

Revision history for this message
Brett Milford (brettmilford) wrote :

Hi Melanie,

We have a large OpenStack Ussuri environment that encounters this issue with some frequency.
I've proposed the fix above, looking forward to your feedback.

Revision history for this message
Tyler Stachecki (tstachecki) wrote :

Hello,

I am able to successfully reproduce this issue in a development cluster. It does take a few hours to trigger, so not something I can easily provide steps for... but happy to test candidate fixes.

Taking the opendev PR and running it through the gauntlet now.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/859358

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by "melanie witt <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/nova/+/842687
Reason: This patch does not work properly, so we will proceed with a different patch: https://review.opendev.org/c/openstack/nova/+/852002

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/852002
Committed: https://opendev.org/openstack/nova/commit/9fea934c71d3c2fa7fdd80c67d94e18466c5cf9a
Submitter: "Zuul (22348)"
Branch: master

commit 9fea934c71d3c2fa7fdd80c67d94e18466c5cf9a
Author: Brett Milford <email address hidden>
Date: Thu Aug 4 16:52:33 2022 +1000

    Handle "no RAM info was set" migration case

    This handles the case where the live migration monitoring thread may
    race and call jobStats() after the migration has completed resulting in
    the following error:

        libvirt.libvirtError: internal error: migration was active, but no
        RAM info was set

    Closes-Bug: #1982284
    Change-Id: I77fdfa9cffbd44b2889f49f266b2582bcc6a4267

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/zed)

Fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/nova/+/860732

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/nova/+/860733

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/860734

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/860735

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/860736

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/860737

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/860739

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/nova/+/860732
Committed: https://opendev.org/openstack/nova/commit/00396fa9396324780c09161ed57a86b7e458c26f
Submitter: "Zuul (22348)"
Branch: stable/zed

commit 00396fa9396324780c09161ed57a86b7e458c26f
Author: Brett Milford <email address hidden>
Date: Thu Aug 4 16:52:33 2022 +1000

    Handle "no RAM info was set" migration case

    This handles the case where the live migration monitoring thread may
    race and call jobStats() after the migration has completed resulting in
    the following error:

        libvirt.libvirtError: internal error: migration was active, but no
        RAM info was set

    Closes-Bug: #1982284
    Change-Id: I77fdfa9cffbd44b2889f49f266b2582bcc6a4267
    (cherry picked from commit 9fea934c71d3c2fa7fdd80c67d94e18466c5cf9a)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/nova/+/860733
Committed: https://opendev.org/openstack/nova/commit/4316234e63b76e4f9877ec6e924b5c54ea761bbb
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 4316234e63b76e4f9877ec6e924b5c54ea761bbb
Author: Brett Milford <email address hidden>
Date: Thu Aug 4 16:52:33 2022 +1000

    Handle "no RAM info was set" migration case

    This handles the case where the live migration monitoring thread may
    race and call jobStats() after the migration has completed resulting in
    the following error:

        libvirt.libvirtError: internal error: migration was active, but no
        RAM info was set

    Closes-Bug: #1982284
    Change-Id: I77fdfa9cffbd44b2889f49f266b2582bcc6a4267
    (cherry picked from commit 9fea934c71d3c2fa7fdd80c67d94e18466c5cf9a)
    (cherry picked from commit 00396fa9396324780c09161ed57a86b7e458c26f)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/nova/+/860734
Committed: https://opendev.org/openstack/nova/commit/98d9936e54b900db1bd2d5477a9a1d7e5a7be104
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 98d9936e54b900db1bd2d5477a9a1d7e5a7be104
Author: Brett Milford <email address hidden>
Date: Thu Aug 4 16:52:33 2022 +1000

    Handle "no RAM info was set" migration case

    This handles the case where the live migration monitoring thread may
    race and call jobStats() after the migration has completed resulting in
    the following error:

        libvirt.libvirtError: internal error: migration was active, but no
        RAM info was set

    Closes-Bug: #1982284
    Change-Id: I77fdfa9cffbd44b2889f49f266b2582bcc6a4267
    (cherry picked from commit 9fea934c71d3c2fa7fdd80c67d94e18466c5cf9a)
    (cherry picked from commit 00396fa9396324780c09161ed57a86b7e458c26f)
    (cherry picked from commit 4316234e63b76e4f9877ec6e924b5c54ea761bbb)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/859358
Committed: https://opendev.org/openstack/nova/commit/ca9b7defe857404162bfd3909652bbd98514ffa8
Submitter: "Zuul (22348)"
Branch: master

commit ca9b7defe857404162bfd3909652bbd98514ffa8
Author: melanie witt <email address hidden>
Date: Mon Sep 26 22:43:51 2022 +0000

    Unit test exceptions raised during live migration monitoring

    This adds some test coverage to verify the expected handling of
    exceptions raised in Guest.get_job_info() during live migration
    monitoring.

    In the case of the related bug fix, this testing verifies that the post
    live migration method is called after handling the special case and
    workaround of libvirt error "migration was active, but no RAM info was
    set".

    Related-Bug: #1982284

    Change-Id: Ibab54aa4f2e929b384b75e7e0a21c8b059b680c5

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 24.2.0

This issue was fixed in the openstack/nova 24.2.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 25.1.0

This issue was fixed in the openstack/nova 25.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 26.1.0

This issue was fixed in the openstack/nova 26.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 27.0.0.0rc1

This issue was fixed in the openstack/nova 27.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/train)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/860739
Reason: The stable/train branch of the nova project has been tagged as End of Life. All open patches have to be abandoned so that the branch can be deleted.
