retracer attaching incomplete backtrace

Bug #705572 reported by Bryce Harrington
This bug affects 1 person
Affects          Status        Importance  Assigned to  Milestone
Apport           Fix Released  Undecided   Unassigned
apport (Ubuntu)  Fix Released  High        Martin Pitt
  Natty          Fix Released  High        Martin Pitt

Bug Description

Binary package hint: apport

I was pleasantly surprised this morning to see xserver crash reports collected by apport once again. (Did something get fixed in the plumbing layer or apport to make this start working again?)

Anyway, as can be seen in bugs #705078, #705089, and #705295, something is still not quite right. It appears the retracer decodes the first few calls (which are just cleanup from the crash) and then stops after call #2.

Thread 1 (process 4692):
#0 XISendDeviceHierarchyEvent (flags=0xbfcccf9c)
  ...
#1 0x08067984 in DisableDevice (dev=0xaaa8e70, sendevent=1 '\001')
  ...
#2 0xbfcccf9c in ?? ()
No symbol table info available.
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

What we need to see is the bits after call #9, I think.

Bryce Harrington (bryce) wrote :

Maybe this isn't apport but rather something lower down in the toolchain but I'm unsure what. Martin, please direct this to a better package accordingly.

I'd really like to see this solved since it'd enable us to work on xserver crashes.

Changed in apport (Ubuntu):
importance: Undecided → High
assignee: nobody → Martin Pitt (pitti)
milestone: none → natty-alpha-2
Sebastien Bacher (seb128) wrote :

Hello Bryce,

I've just been reading some of those crashes, let's take bug #705295

the stacktrace has

"#2 0xbfa682cc in ?? ()
No symbol table info available."

Which according to the ProcMaps.txt table is
"bfa48000-bfa6a000 rw-p 00000000 00:00 0 [stack]"

So those addresses are in the stack, which looks more like stack corruption or a broken crash file than an apport or gdb issue...
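
The lookup described in this comment can be sketched in a few lines of Python. This is only an illustration, assuming ProcMaps.txt uses the usual /proc/<pid>/maps line layout; the helper names parse_maps_line and find_mapping are hypothetical and not part of apport.

```python
def parse_maps_line(line):
    """Split one /proc/<pid>/maps-style line into (start, end, perms, name)."""
    # At most 6 fields; the last one (pathname) may contain spaces.
    fields = line.split(None, 5)
    start_hex, end_hex = fields[0].split("-")
    # The pathname column is optional; anonymous mappings leave it empty.
    name = fields[5].strip() if len(fields) > 5 else ""
    return int(start_hex, 16), int(end_hex, 16), fields[1], name


def find_mapping(address, maps_text):
    """Return the parsed mapping that contains address, or None."""
    for line in maps_text.splitlines():
        if not line.strip():
            continue
        start, end, perms, name = parse_maps_line(line)
        # The end address of a mapping is exclusive.
        if start <= address < end:
            return start, end, perms, name
    return None


# Values quoted in this comment (bug #705295).
maps_text = "bfa48000-bfa6a000 rw-p 00000000 00:00 0 [stack]"
mapping = find_mapping(0xBFA682CC, maps_text)
print(mapping[3] if mapping else "unmapped")  # prints "[stack]"
```

A frame whose return address resolves to [stack] rather than to an executable mapping is a hint that the unwinder picked up a data value instead of a real return address.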

Sebastien Bacher (seb128) wrote :

the issue is the same in the other bugs you listed

Martin Pitt (pitti)
Changed in apport (Ubuntu Natty):
milestone: natty-alpha-2 → natty-alpha-3
Martin Pitt (pitti) wrote :

Unfortunately the retracer logs don't have these bugs any more, presumably they got lost in the ronne -> osageorange server move for the retracer?

Unfortunately I'm not able to synthetically reproduce an X.org crash with apport info.

$ sudo X :1
[... letting this start, start DISPLAY=:1 xeyes ; then sending a SIGSEGV]

Backtrace:
0: X (xorg_backtrace+0x26) [0x4a1586]
1: X (0x400000+0x6078a) [0x46078a]
2: /lib/libpthread.so.0 (0x7f277e917000+0xfc80) [0x7f277e926c80]
3: /lib/libc.so.6 (__select+0x13) [0x7f277d92b423]
4: X (WaitForSomething+0x19b) [0x45ad7b]
5: X (0x400000+0x2d7c2) [0x42d7c2]
6: X (0x400000+0x21abe) [0x421abe]
7: /lib/libc.so.6 (__libc_start_main+0xfe) [0x7f277d871d1e]
8: X (0x400000+0x21669) [0x421669]
Recieved signal 11 sent by process 27616, uid 0

Caught signal 11 (Segmentation fault). Server aborting

but apport doesn't even get called (according to /var/log/apport.log).

Martin Pitt (pitti) wrote :

Is there a more realistic way to synthesize X.org segfaults somehow?

Martin Pitt (pitti) wrote :

(unmarking as alpha 3 blocker)

Changed in apport (Ubuntu Natty):
milestone: natty-alpha-3 → none
status: New → Incomplete
Bryce Harrington (bryce) wrote :

Look at bug #718365 as an example. The trace that apport attaches is a full backtrace (although it is missing the symbols I actually care about, but never mind that):

Thread 1 (Thread 1328):
#0 0x00007f22a1200b45 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
        resultvar = 0
        pid = <value optimized out>
        selftid = <value optimized out>
#1 0x00007f22a1204496 in abort () at abort.c:92
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x200000009, sa_sigaction = 0x200000009}, sa_mask = {__val = {140736973547920, 140736973553062, 10, 139786709954340, 3, 140736973547914, 6, 139786709954344, 2, 140736973547902, 2, 139786709945365, 1, 139786709954340, 3, 140736973547908}}, sa_flags = 12, sa_restorer = 0x7f22a130ef28}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2 0x00007f22a12395db in __libc_message (do_abort=2, fmt=0x7f22a1310790 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:189
        ap = {{gp_offset = 40, fp_offset = 48, overflow_arg_area = 0x7fffe150b300, reg_save_area = 0x7fffe150b210}}
        ap_copy = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0x7fffe150b300, reg_save_area = 0x7fffe150b210}}
        fd = 2
        on_2 = <value optimized out>
        list = <value optimized out>
        nlist = <value optimized out>
        cp = <value optimized out>
        written = <value optimized out>
#3 0x00007f22a1243416 in malloc_printerr (action=3, str=0x7f22a130d95d "corrupted double-linked list", ptr=<value optimized out>) at malloc.c:6283
        buf = "0000000003f282e0"
        cp = <value optimized out>
#4 0x00007f22a12453fc in _int_free (av=0x7f22a154a1a0, p=0x3f282e0) at malloc.c:4964
        size = 1328
        fb = <value optimized out>
        nextchunk = 0x3f28810
        nextsize = 1120
        nextinuse = <value optimized out>
        prevsize = <value optimized out>
        bck = <value optimized out>
        fwd = 0x3bbb370
        errstr = 0x0
        __func__ = "_int_free"
#5 0x00007f22a1249153 in __libc_free (mem=<value optimized out>) at malloc.c:3738
        ar_ptr = 0x7f22a154a1a0
        p = <value optimized out>
        hook = <value optimized out>
#6 0x00007f229fea9c94 in ?? () from /usr/lib/xorg/modules/extensions/libglx.so
No symbol table info available.
#7 0x000000000044bdec in FreeClientResources ()
No symbol table info available.
#8 0x000000000044bea9 in FreeAllResources ()
No symbol table info available.
#9 0x0000000000421af4 in _start ()
No symbol table info available.

And here is what the retracer attaches:

.
Thread 1 (process 1328):
#0 0x00007f22a1200b45 in *__GI_raise (sig=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
 resultvar = 0
 pid = <value optimized out>
 selftid = <value optimized out>
#1 0x00007f22a1204496 in *__GI_abort () at abort.c:59
 save_stage = Unhandled dwarf expression opcode 0x9f

Changed in apport (Ubuntu Natty):
milestone: none → natty-alpha-3
Bryce Harrington (bryce) wrote :

Frankly it would be more helpful if the retracer were not running, and let me just analyze the cores myself. Currently the retracer is giving erroneous results and deleting the core file, which is making it very hard to diagnose this widely reported (but hard to reproduce) error.

Bryce Harrington (bryce) wrote :

Re-reading the last comment, that was more bitchy than intended!

But... is there a way to get the core file from one of these "Xorg assert failure: *** glibc detected *** /usr/bin/X: corrupted double-linked list" bugs? We seem to be getting half a dozen reports a day but the backtraces apport is collecting are incomplete, and it'd be easier to diagnose if I had a core file in hand.

I'm setting this to alpha-3 mainly because this issue is inhibiting solving an X issue I want solved by alpha-3; if I can get a core file and if it helps solve the bug, then the milestone isn't needed.

Bryce Harrington (bryce) wrote :

Never mind, finally found one amongst the dupes.

Changed in apport (Ubuntu Natty):
milestone: natty-alpha-3 → none
Bryce Harrington (bryce) wrote :

Hmm, I'm running into more xserver crash bugs with this issue. The retracer posts a truncated (unhelpful) stacktrace (printing 2-3 levels of stack, then saying something about DWARF opcodes), then deletes the core file from the bug report. If I can get to a dupe bug before the retracer hits it, I can get a useful backtrace out of it by running gdb on it by hand. So it's not that the crash file was corrupted, as theorized in comment #2; the retracer itself is broken.

Pitti, could you please disable the retracer from operating on the xorg-server bug reports for now? There seem to be few enough crashers that I can handle them by hand (actually, I'm finding using gdb on the core file directly gives me more interesting info than the retracer gives anyway). Thanks!

Changed in apport (Ubuntu Natty):
status: Incomplete → Triaged
milestone: none → natty-alpha-3
Martin Pitt (pitti) wrote :

Ah, bug 718365 is indeed a different kind, thanks for pointing this out! There, the retraced stack trace breaks at "Unhandled dwarf expression opcode 0x9f"; the reason likely is that this would require a newer gdb than we have installed in our retracer chroots. Unfortunately newer gdbs act up on the retracers, so it's not easy to update them.

However, it should at least keep the core dump in this case, and not mark bugs as duplicates based on just the short, broken trace (which is just an assert failure, probably even from gdb itself). I'll update the stack unwinding logic to include this case as well (which should result in an empty stack trace top); this will then count as a failed retrace, and the bot will keep the core dump attachment. As a second step, I'll try to update the retracer chroots.
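
The rule proposed here (treat a trace that gdb gave up on as a failed retrace and keep the core dump) might look roughly like the sketch below. It is not apport's code: the function names, the list of error markers, and the minimum of three frames are assumptions made for illustration. Only the two gdb messages are taken from the traces quoted in this bug.

```python
import re

# gdb messages quoted in this bug report that indicate unwinding gave up.
GDB_FAILURE_MARKERS = (
    "Unhandled dwarf expression opcode",
    "Backtrace stopped:",
)

# A frame line starts with "#<number>" at the beginning of a line.
FRAME_RE = re.compile(r"^#(\d+)\s", re.MULTILINE)


def count_frames(stacktrace):
    """Count the numbered frames in gdb backtrace text."""
    return len(FRAME_RE.findall(stacktrace))


def retrace_looks_broken(stacktrace, min_frames=3):
    """Heuristic: True if the trace is too short or gdb reported an error.

    min_frames is an arbitrary threshold chosen for this sketch.
    """
    if count_frames(stacktrace) < min_frames:
        return True
    return any(marker in stacktrace for marker in GDB_FAILURE_MARKERS)


# The retraced output quoted earlier in this bug (bug #718365).
retraced = """Thread 1 (process 1328):
#0 0x00007f22a1200b45 in *__GI_raise (sig=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007f22a1204496 in *__GI_abort () at abort.c:59
 save_stage = Unhandled dwarf expression opcode 0x9f
"""

if retrace_looks_broken(retraced):
    print("failed retrace: keep the core dump attached")
```

Checking both the frame count and the error markers is deliberate in this sketch: a trace can be long enough and still end in "Backtrace stopped:", as in the first example in the bug description.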

Martin Pitt (pitti) wrote :

Adding an upstream task for the retracer part, as this is not related to the Ubuntu package.

Changed in apport:
status: New → Triaged
Martin Pitt (pitti) wrote :

BTW, if you have a bug report with the apport data, you might find it useful to use "apport-retrace -g <bug number>" yourself. This will use the local gdb and produce better results.

Martin Pitt (pitti) wrote :

apport (1.18-0ubuntu2) natty; urgency=low

  * Merge from trunk:
    - Update stack unwind patterns for current glib (slightly changed function
      names), and also ignore a preceding '*'. (LP: #716251)
    - Fix crash_signature() to fail if there is an empty or too short
      StacktraceTop.

 -- Martin Pitt <email address hidden> Sun, 20 Feb 2011 20:31:02 +0100
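
To illustrate the two changelog items, here is a minimal sketch of how a preceding '*' could be ignored when reading frames and how a signature could be refused for a too-short StacktraceTop. The frame format (gdb lines as quoted in this bug), the helper names, the signature layout, and the threshold of three functions are all assumptions; this is not the actual crash_signature() implementation.

```python
import re

# Matches the function name in a gdb frame line such as
# "#1 0x00007f22a1204496 in *__GI_abort () at abort.c:59".
# The optional "\*?" drops the preceding '*' seen in the retraced frames.
FRAME_FUNC_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?\*?(\S+)\s*\(")


def function_names(stacktrace):
    """Return function names from gdb frame lines, skipping unknown '??'."""
    names = []
    for line in stacktrace.splitlines():
        match = FRAME_FUNC_RE.match(line)
        if match and match.group(1) != "??":
            names.append(match.group(1))
    return names


def signature_or_none(executable, stacktrace_top, min_functions=3):
    """Build a signature string, or return None if the top is too short.

    min_functions is a hypothetical threshold chosen for this sketch.
    """
    names = function_names(stacktrace_top)
    if len(names) < min_functions:
        return None
    return ":".join([executable] + names)


# The two frames the retracer produced for bug #718365.
top = """#0 0x00007f22a1200b45 in *__GI_raise (sig=6)
#1 0x00007f22a1204496 in *__GI_abort () at abort.c:59
"""
print(signature_or_none("/usr/bin/X", top))  # prints "None"
```

Returning None for a short top means two unrelated crashes that both end in raise/abort are not matched against each other on that basis alone.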

Changed in apport (Ubuntu Natty):
status: Triaged → Fix Released
Martin Pitt (pitti) wrote :

I tried installing a newer gdb into the chroots (7.2 instead of 6.8), and it still fails with an error, so this doesn't work for now.

Martin Pitt (pitti)
Changed in apport:
status: Triaged → Fix Released