Mir

/usr/sbin/unity-system-compositor:*** Error in `unity-system-compositor': free(): invalid pointer: ADDR ***

Bug #1376324 reported by errors.ubuntu.com bug bridge
This bug affects 4 people
Affects                           Status        Importance  Assigned to      Milestone
Canonical System Image            Fix Released  High        Unassigned
Mir                               Fix Released  High        Alan Griffiths
0.8                               Fix Released  High        Daniel van Vugt
Unity System Compositor           Invalid       High        Unassigned
mir (Ubuntu)                      Fix Released  High        Unassigned
mir (Ubuntu RTM)                  Fix Released  High        Unassigned
unity-system-compositor (Ubuntu)  Invalid       High        Unassigned

Bug Description

The Ubuntu Error Tracker has been receiving reports about a problem regarding unity-system-compositor. This problem was most recently seen with version 0.0.5+14.10.20140917-0ubuntu1. The problem page at https://errors.ubuntu.com/problem/bfb855ebc60f251bf495a2ffb24f2924c50bdbf8 contains more details.

*** WARNING ***
Due to the limited nature of the stack trace comparison in errors.ubuntu.com, ALL heap corruption bugs in unity-system-compositor will land here. That does NOT make them the same as this bug.

Tags: utopic

Alan Griffiths (alan-griffiths) wrote :

This looks as though the send_response_result member of ProtobufResponder is no longer valid (i.e. that the ProtobufResponder at this=0xac900574 has likely been freed).

I'm not sure what sequence of events would lead to a send_response() message being passed to a no-longer-valid ProtobufResponder, but it is likely to be a race within Mir between closing the connection and compositor::BufferQueue::give_buffer_to_client() sending a buffer to the client.

I'll audit the Mir code in this area.
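
The suspected hazard, a delayed buffer delivery reaching a responder that has already been destroyed, can be illustrated with a minimal single-threaded sketch. All names below (Responder, BufferQueue, request_buffer) are hypothetical stand-ins and are not Mir's actual classes or its eventual fix; the sketch only assumes that the delivery callback can hold a weak reference instead of a raw pointer.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-connection responder (not Mir's ProtobufResponder).
struct Responder {
    std::vector<int> sent;  // buffer ids delivered to the client
    void send_response(int buffer_id) { sent.push_back(buffer_id); }
};

// Hypothetical stand-in for a queue that hands a buffer to the client later.
class BufferQueue {
public:
    void on_buffer_ready(std::function<void(int)> cb) { pending_ = std::move(cb); }

    // Called from the compositor side, possibly after the connection closed.
    void give_buffer_to_client(int buffer_id) {
        if (!pending_) return;
        auto cb = std::move(pending_);
        pending_ = nullptr;
        cb(buffer_id);
    }

private:
    std::function<void(int)> pending_;
};

// Registers a delivery callback that holds only a weak reference, so a
// responder destroyed in the meantime is skipped instead of dereferenced.
inline void request_buffer(BufferQueue& queue,
                           const std::shared_ptr<Responder>& responder,
                           int& dropped) {
    std::weak_ptr<Responder> weak = responder;
    queue.on_buffer_ready([weak, &dropped](int buffer_id) {
        if (auto alive = weak.lock())
            alive->send_response(buffer_id);
        else
            ++dropped;  // connection closed first: drop the buffer
    });
}
```

If the callback captured a raw Responder* instead, the second delivery in a close-then-deliver sequence would touch freed memory, which is the kind of access that can surface later as a free() error.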

Changed in mir (Ubuntu):
status: New → In Progress
assignee: nobody → Alan Griffiths (alan-griffiths)
status: In Progress → New
assignee: Alan Griffiths (alan-griffiths) → nobody
Changed in mir:
status: New → In Progress
assignee: nobody → Alan Griffiths (alan-griffiths)
importance: Undecided → High
Alan Griffiths (alan-griffiths) wrote :

I've not been able to reproduce yet[1], but looking at the code I can't see anything that prevents a buffer stream being owned by the compositor (for composition) after the corresponding Surface and Session have been closed. If that then attempts to "complete" a buffer swap then we would see the above result.

If I'm right then adding code to ~BasicSurface() to prevent the surface_buffer_stream completing pending swaps is all that is needed.

[1] I have to leave myself something to do tomorrow.
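
The proposal above, having the surface destructor cancel any swap that has not completed, might look roughly like the following. This is a sketch under assumptions: BufferStream, Surface, drop_pending_swap and compositor_release are hypothetical names chosen for illustration, not the Mir API, and the change that actually landed in Mir may differ.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical buffer stream that may outlive its surface because the
// compositor still holds a reference to it.
class BufferStream {
public:
    void swap_buffers(std::function<void(int)> on_complete) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_ = std::move(on_complete);
    }

    // Discards a swap that has not completed yet.
    void drop_pending_swap() {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_ = nullptr;
    }

    // The compositor releases a buffer; completes a pending swap if one remains.
    // The callback runs under the lock so drop_pending_swap() cannot return
    // while a completion is still executing (callbacks must not re-enter).
    bool compositor_release(int buffer_id) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!pending_) return false;
        auto cb = std::move(pending_);
        pending_ = nullptr;
        cb(buffer_id);
        return true;
    }

private:
    std::mutex mutex_;
    std::function<void(int)> pending_;
};

// Hypothetical surface whose completion callback captures `this`.
class Surface {
public:
    explicit Surface(std::shared_ptr<BufferStream> stream) : stream_(std::move(stream)) {}
    ~Surface() { stream_->drop_pending_swap(); }  // no completion after destruction

    void request_buffer() {
        stream_->swap_buffers([this](int id) { last_buffer_ = id; });
    }
    int last_buffer() const { return last_buffer_; }

private:
    std::shared_ptr<BufferStream> stream_;
    int last_buffer_ = -1;
};
```

Without the destructor call, a later compositor_release() would invoke a callback that writes through a dangling `this`.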

Daniel van Vugt (vanvugt) wrote :

Crashing in free() like this means something else in the USC process has corrupted the heap. And the effect is delayed, so you won't find the root cause of the corruption in the stack traces. We'll have to do some heap checking on live unity-system-compositor processes instead.

Changed in mir:
status: In Progress → Incomplete
Changed in unity-system-compositor:
status: New → Incomplete
Changed in mir (Ubuntu):
status: New → Incomplete
Changed in unity-system-compositor (Ubuntu):
status: New → Incomplete
Daniel van Vugt (vanvugt) wrote :

Actually, no. There is a chance it's crashing where the error occurred, but it's usually unlikely. :)

Alan Griffiths (alan-griffiths) wrote :

Our experience clearly differs.

I find that it is often the second free of a block that triggers the error. As the occasions when the free list has previously been corrupted take disproportionately more time to solve, I'm hoping this is an "easy" one. ;)
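
The distinction drawn in the last few comments can be made concrete with a toy tracker. A repeated free of the same block is detectable at the call that commits it, whereas a stray write into heap metadata is only noticed by some later, unrelated free. The sketch below covers only the first case; TrackedHeap is a hypothetical illustration and not how glibc's allocator detects errors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_set>

// Toy allocation tracker (hypothetical): remembers which blocks are live so
// that an invalid or repeated free is reported at the offending call.
class TrackedHeap {
public:
    void* allocate(std::size_t size) {
        void* p = std::malloc(size);
        if (p) live_.insert(p);
        return p;
    }

    // Returns true if the block was live and has now been freed.
    // Returns false for an unknown or already freed pointer, and skips the free.
    bool release(void* p) {
        if (live_.erase(p) == 0) return false;
        std::free(p);
        return true;
    }

    std::size_t live_blocks() const { return live_.size(); }

private:
    std::unordered_set<void*> live_;
};
```

In this model the stack trace at the failing release() points at the second free, so the bug is local to the crash site. Corruption caused by an out-of-bounds write would not be caught here at all.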

Daniel van Vugt (vanvugt) wrote :

It would depend entirely on how much the code touches the heap. In a reasonably efficient project, the heap (structure, not content) doesn't change that much. You always have to assume malloc/free/new/delete are slow and avoid them. So the offending corruption could have happened some time ago.

Changed in mir:
status: Incomplete → In Progress
Changed in mir:
milestone: none → 0.8.0
Changed in mir (Ubuntu):
status: Incomplete → Triaged
importance: Undecided → High
Changed in mir:
milestone: 0.8.0 → 0.9.0
no longer affects: mir
no longer affects: mir/0.8
Changed in mir:
assignee: nobody → Alan Griffiths (alan-griffiths)
importance: Undecided → High
milestone: none → 0.9.0
status: New → In Progress
PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:mir at revision None, scheduled for release in mir, milestone 0.8.0

Changed in mir:
status: In Progress → Fix Committed
Changed in unity-system-compositor:
status: Incomplete → Invalid
Changed in unity-system-compositor (Ubuntu):
status: Incomplete → Invalid
Changed in mir (Ubuntu RTM):
importance: Undecided → High
status: New → Triaged
Changed in mir:
status: Fix Committed → In Progress
PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:mir at revision None, scheduled for release in mir, milestone 0.9.0

Changed in mir:
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mir - 0.9.0+15.04.20141125-0ubuntu1

---------------
mir (0.9.0+15.04.20141125-0ubuntu1) vivid; urgency=medium

  [ Alberto Aguirre ]
  * New upstream release 0.9.0 (https://launchpad.net/mir/+milestone/0.9.0)
    - Enhancements:
      . New simpler API to configure and run a mir server.
      . The event loop is now based on GLib's main loop library instead of
        Boost.Asio.
      . For Android platforms, the server now sends buffer fence fds to its
        clients instead of potentially stalling the compositor thread waiting
        for them to be signalled.
      . New client debug interface to translate from surface to screen
        coordinates.
    - ABI summary: Servers need rebuilding, but clients do not;
      . Mirclient ABI unchanged at 8
      . Mircommon ABI bumped to 3
      . Mirplatform ABI bumped to 4
      . Mirserver ABI bumped to 27
    - Bug fixes:
      . Add a debug interface to translate from surface to screen coordinates
        (LP: #1346633)
      . Ensure a buffer requested by a surface is not delivered
        after the surface is deleted (LP: #1376324)
      . Overlays are not displayed onscreen in some positions (LP: #1378326)
      . Server aborts when an exception is thrown from the main thread
        (LP: #1378740)
      . Fix race causing lost alarm notifications (LP: #1381925)
      . Avoid lifecycle notifications racing with connection release
        (LP: #1386646)
      . Improve error checking and reporting for the client library
        (LP: #1390388)
      . Mir demo-shell now detects power button using proper Linux scan codes
        (LP: #1303817)
      . A prompt session with an invalid application pid should be an error
        (LP: #1377968)
      . When XDG_RUNTIME_DIR is defined but pointing to a non-existing
        directory use "/tmp" (LP: #1304873)
      . [regression] demo-shell bypass is not used on fullscreen surfaces if
        there are windowed surfaces behind (LP: #1378706)
      . Mir upgrade through dist-upgrade installs incorrect platform
        (LP: #1378995)
      . Fix Mir progressbar example using internal glibc defines (LP: #239272)
      . Stop the default_lifecycle_event_handler raising SIGHUP while
        disconnecting (LP: #1386185)
      . [regression] Mir fails to build with MIR_ENABLE_TESTS=OFF (LP: #1388539)
      . [regression] mir_demo_server_basic does not start (LP: #1391923)

  [ Ubuntu daily release ]
  * New rebuild forced
 -- Ubuntu daily release <email address hidden> Tue, 25 Nov 2014 17:49:24 +0000

Changed in mir (Ubuntu):
status: Triaged → Fix Released
Changed in mir:
status: Fix Committed → Fix Released
Daniel van Vugt (vanvugt) wrote :

This bug is still an issue. If you look at:
https://errors.ubuntu.com/problem/bfb855ebc60f251bf495a2ffb24f2924c50bdbf8

You will see Ubuntu Touch devices (including RTM) hitting this crash every day.

Daniel van Vugt (vanvugt) wrote :

I expect the fix is correct; we just haven't backported it to 0.8 yet.

Daniel van Vugt (vanvugt) wrote :

Sadly, not actually fixed yet. The same crash has occurred 52 times and counting in vivid images that contain the "fix" (with Mir 0.9.0):
https://errors.ubuntu.com/problem/bfb855ebc60f251bf495a2ffb24f2924c50bdbf8

Changed in mir:
milestone: 0.9.0 → 0.11.0
status: Fix Released → Triaged
assignee: Alan Griffiths (alan-griffiths) → nobody
Changed in mir (Ubuntu):
status: Fix Released → Triaged
Daniel van Vugt (vanvugt) wrote :
Daniel van Vugt (vanvugt) wrote :

BTW, errors.ubuntu.com's idea of "StacktraceAddressSignature" looks like it could be subject to collisions. In the cases mentioned in comment #13, they're only grouped together because they all have crashes that are identical to depth 4:

#0 __libc_do_syscall () at ../ports/sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:44
No locals.
#1 0xb6808e5e in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        _a1 = 0
        _a3tmp = 6
        _a1tmp = 0
        _a3 = 6
        _nametmp = 268
        _a2tmp = 2338
        _a2 = 2338
        _name = 268
        _sys_result = <optimized out>
        pd = 0xae2ff3d0
        pid = 0
        selftid = 2338
#2 0xb6809b4e in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {2922372744, 19, 2922379408, 3061935093, 0, 28, 2922374040, 2923454184, 0, 3061940382, 3061940382, 3061940382, 3062538872, 0, 2922373148, 3010790933, 37, 128, 1, 0, 1, 0, 2922372840, 3010933547, 144, 2923450104, 0, 0, 0, 0, 0, 0}}, sa_flags = 0, sa_restorer = 0xae2fdff0}
        sigs = {__val = {32, 0 <repeats 31 times>}}
#3 0xb68323f8 in __libc_message (do_abort=<optimized out>, fmt=0xb68b0518 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
        ap = {__ap = 0xae2fe03c}
        fd = 2
        on_2 = <optimized out>
        list = <optimized out>
        nlist = <optimized out>
        cp = <optimized out>
        written = <optimized out>
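
To show why a depth-limited signature groups unrelated crashes, here is a minimal sketch. It assumes the signature is simply the first few frame names joined together, which is an illustration and not errors.ubuntu.com's actual algorithm. The four shared frames come from the trace above; the deeper caller names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical signature: join the names of the first `depth` frames.
inline std::string signature(const std::vector<std::string>& frames, std::size_t depth) {
    std::string sig;
    for (std::size_t i = 0; i < frames.size() && i < depth; ++i) {
        if (i != 0) sig += '|';
        sig += frames[i];
    }
    return sig;
}

// The four frames that every glibc abort from free() shares (frames #0 to #3 above).
inline std::vector<std::string> abort_prefix() {
    return {"__libc_do_syscall", "__GI_raise", "__GI_abort", "__libc_message"};
}

// Builds a trace that aborts via glibc from the given (hypothetical) caller.
inline std::vector<std::string> trace_from(const std::string& caller) {
    std::vector<std::string> frames = abort_prefix();
    frames.push_back(caller);
    return frames;
}
```

With this model, trace_from("hypothetical_caller_a") and trace_from("hypothetical_caller_b") produce the same signature at depth 4 and different signatures at depth 5, so any two heap-corruption aborts land in the same bucket unless deeper frames are compared.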

Changed in unity-system-compositor (Ubuntu):
status: Invalid → Confirmed
importance: Undecided → High
Changed in unity-system-compositor:
status: Invalid → Confirmed
importance: Undecided → High
Daniel van Vugt (vanvugt) wrote :

Oooh, we might be getting bug 1401488, which would have a similar signature and could be accidentally grouped in here from Mir 0.9 devices.

Changed in mir:
milestone: 0.11.0 → 0.9.0
status: Triaged → Fix Released
Changed in mir (Ubuntu):
status: Triaged → Fix Released
Changed in unity-system-compositor (Ubuntu):
status: Confirmed → Invalid
Changed in unity-system-compositor:
status: Confirmed → Invalid
Changed in mir:
assignee: nobody → Alan Griffiths (alan-griffiths)
kevin gunn (kgunn72) wrote :

Adding canonical-devices-system-image in order to determine if we may add this to our rtm branches.
No one is complaining, but we are seeing this bug, which has a solution, show up in some crash logs.
Please target for an appropriate ww##.

Pat McGowan (pat-mcgowan) wrote :

This could explain some reports; we would definitely land it.

Changed in canonical-devices-system-image:
importance: Undecided → High
milestone: none → ww05-2015
status: New → Confirmed
Daniel van Vugt (vanvugt) wrote :

Suspicious still. There has been one similar (heap corruption?) crash on Mir 0.10.0 since it got released:
https://errors.ubuntu.com/oops/73fadb24-9ca5-11e4-a40f-fa163e78b027

Not sure what that is because we can't see the full stack trace without it getting grouped under:
https://errors.ubuntu.com/problem/bfb855ebc60f251bf495a2ffb24f2924c50bdbf8
which *might* be a different full stack trace... ?

description: updated
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mir - 0.8.2+15.04.20150115~rtm-0ubuntu1

---------------
mir (0.8.2+15.04.20150115~rtm-0ubuntu1) 14.09; urgency=medium

  [ Daniel van Vugt ]
  * Bug fix release 0.8.2 (https://launchpad.net/mir/+milestone/0.8.2)
    - ABIs all unchanged. No downstream projects need rebuilding.
    - Fixes bug: Crash in /usr/sbin/unity-system-compositor:*** Error in
      `unity-system-compositor': free(): invalid pointer: ADDR ***
      (LP: #1376324)
 -- Ubuntu daily release <email address hidden> Thu, 15 Jan 2015 15:55:24 +0000

Changed in mir (Ubuntu RTM):
status: Triaged → Fix Released
Changed in canonical-devices-system-image:
status: Confirmed → Fix Released