libvirtd randomly crashes on xenial nodes with "realloc(): invalid next size"

Bug #1638982 reported by Matt Riedemann
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
OpenStack-Gate | Fix Released | Medium | Clark Boylan |
libvirt (Ubuntu) | Confirmed | Undecided | Unassigned |

Bug Description

Matt Riedemann (mriedem) wrote :

Removed nova from this bug, as there is likely nothing nova can do about it; it's a bug in libvirt 1.3.1 on xenial.

no longer affects: nova
Changed in openstack-gate:
status: New → Confirmed
Christian Ehrhardt  (paelzer) wrote :

I don't know why I got a notification about this today, but here I am looking at it.

I've found similar bugs (the signatures differ only slightly) reported very sporadically going back to 2011. So this seems to be a long-standing bug, at least in the overall exit path (there might be different root causes).
But it was never reproducible enough to debug.

Seeing 16 occurrences last week in your Kibana brings us closer to reproducing it than ever.

I was going to write up what is needed, but while checking for progress already being made I found another bug that covers more or less what I wanted to write.
So I'd like to ask: is this a dup of bug 1643911?

Christian Ehrhardt  (paelzer) wrote :

Hi (cross-post, I know),
there is a bit of a "somewhat dup" relationship among the following bugs:
- bug 1646779
- bug 1643911
- bug 1638982
- bug 1673483

Unfortunately, I've never hit these bugs in any of my libvirt/qemu runs, and there are literally thousands of those every day due to all the automated tests. I also checked with our Ubuntu OpenStack team, and they haven't seen them so far either. As you will all understand, that makes it hard to debug them in depth.

But while the "signature" of each bug is different, they still share a lot (let's call it the bug "meta signature"):
- there is no way to recreate them yet, other than watching gate test crash statistics
- they seem to haunt OpenStack gate testing more than anything else
- most are directly or indirectly related to memory corruption

As I'm unable to reproduce any of these bugs myself, I'd like help from anyone who can recreate them. Therefore I ask everyone affected (mostly the same people on all these bugs anyway) to test the PPAs I created for bug 1673483. That is the one bug where, thanks to the great help of Kashyap, Matt and Dan (and others), at least a potential fix was identified.

That means two PPAs for you to try:
- Backport of the minimal recommended fix: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/2620
- Backport of the full series of related fixes: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/2619

Especially since the errors these fixes potentially address range from memory leaks to deadlocks, there is a chance that all of these bugs share the same root cause.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in libvirt (Ubuntu):
status: New → Confirmed
Clark Boylan (cboylan) wrote :

It should be noted that after switching to the Ubuntu Cloud Archive, which includes a newer libvirt, this issue went away in the gate.

Changed in openstack-gate:
status: Confirmed → Fix Released
importance: Undecided → Medium
assignee: nobody → Clark Boylan (cboylan)