current trunk, on Windows crashes at startup 50% of the time

Bug #987962 reported by David Mathog
This bug affects 1 person
Affects           Status    Importance  Assigned to  Milestone
Inkscape          Invalid   High        Unassigned
Inkscape Devlibs  Triaged   High        Unassigned

Bug Description

Trunk downloaded and built today (Revision 11289 (?) ) crashes on startup on Windows about half the time. The DOS window shows this just before it goes down:

** (inkscape.exe:2760): WARNING **: Inkscape currently only supports color-interpolation-filters = sRGB

(inkscape.exe:2760): GLib-GObject-WARNING **: gsignal.c:2924: signal id `1' is invalid for instance `042965D8'

(inkscape.exe:2760): GLib-GObject-WARNING **: gsignal.c:2924: signal id `1' is invalid for instance `042965D8'

(inkscape.exe:2760): GLib-GObject-WARNING **: gsignal.c:2924: signal id `1' is invalid for instance `042965D8'

(inkscape.exe:2760): GLib-GObject-WARNING **: gsignal.c:2924: signal id `1' is invalid for instance `042965D8'

(inkscape.exe:2760): GLib-GObject-WARNING **: gsignal.c:2924: signal id `1' is invalid for instance `042965D8'

(inkscape.exe:2760): GLib-GObject-WARNING **: gsignal.c:2924: signal id `1' is invalid for instance `042965D8'

When it works the sRGB line is there but the gsignal lines never appear.

David Mathog (mathog) wrote :

Downloaded a second set of devlibs (version 29 as shown by revno, previous version was downloaded Feb 29 2012). diff -r on the two - no differences. The gsignal.c message seems to depend on what else is going on in the system. At some point a different enough environment came into effect that that message went away (no recompilation or anything, just opening and closing other applications.) It crashes at the same rate, just without the gsignal.c messages.

This is Windows XP SP3, 32 bit.

su_v (suv-lp)
tags: added: crash regression win32
su_v (suv-lp) wrote :

Can you reproduce the same issue with recent development snapshot builds provided by UweSch (latest available is r11265)?
<https://skydrive.live.com/?cid=09706D11303FA52A&id=9706D11303FA52A!128>

> Trunk downloaded and built today (Revision 11289 (?) )
Are these kinds of crashes limited to this revision?
(i.e. what was the last revision you built and used on your system - before updating today - which didn't expose this issue?)

Changed in inkscape:
importance: Undecided → High
David Mathog (mathog) wrote :

Pretty much. r11265 also fails about 50% of the time. The one from skydrive is being started differently - mine is started through a .bat file, this one starts by just double clicking inkscape.exe or in a Command Prompt shell giving the path to the executable. In any case, when this one fails it (sometimes) pops up a window saying: "Glib-ERROR **:gmem.239 failed to allocate <some memory> aborting.."

Will try a few other versions in there, to see if I can triangulate a bit.

David Mathog (mathog) wrote :

R11066 OK
R11167 OK
R11265 bad

So the problem is somewhere after R11167 and at or before R11265.
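
A minimal sketch of narrowing such a range by bisection. It assumes (hypothetically) that every revision after the first bad one is also bad and that a good revision never crashes. Because the crash is intermittent, each candidate is started several times before it is called good. `crashes_once` is a placeholder callback, not an Inkscape or bzr facility; it would build and start one revision and report whether that start crashed.

```python
def first_bad_revision(revisions, crashes_once, trials=10):
    """Return the earliest revision in sorted `revisions` judged bad, or None.

    A revision is judged bad if any of `trials` starts crashes. A flaky
    crash can still slip through, so a larger `trials` is safer.
    """
    def is_bad(rev):
        return any(crashes_once(rev) for _ in range(trials))

    # Invariant: everything before index lo is good, everything from hi on is bad.
    lo, hi = 0, len(revisions)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid + 1
    return revisions[lo] if lo < len(revisions) else None


# Hypothetical stand-in for "build this revision, start it once, did it crash?"
def fake_crashes_once(rev):
    return rev >= 11200  # made-up boundary, for illustration only


candidates = list(range(11168, 11266))  # after R11167, up to and including R11265
print(first_bad_revision(candidates, fake_crashes_once))
```

With roughly 100 candidate revisions this needs about 7 builds rather than 100, at the cost of several starts per build.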

David Mathog (mathog) wrote :

Glancing through the descriptions of ~100 changes, and noting that this crash involves the brief flash of the outline of a window, it looks like the numerous GDK/GTK changes put in by Alex Valavanis in that period might be a good place to start. Especially if he doesn't normally build on Windows. If he does build on Windows, can somebody please contact him and find out which was the most recent version he tested there?

su_v (suv-lp) wrote :

Confirming the report on behalf of ScislaC (local build r11290 Win7 64bit):
«(…) the HEAP stuff at the very beginning looks like it's probably the issue.»

warning: HEAP[inkscape.exe]:
warning: HEAP: Free Heap block 886f9e0 modified at 886fb60 after it was freed

Program received signal SIGTRAP, Trace/breakpoint trap.
0x777e04e5 in ntdll!TpWaitForAlpcCompletion ()
   from C:\Windows\system32\ntdll.dll
(gdb) bt full
#0 0x777e04e5 in ntdll!TpWaitForAlpcCompletion ()
   from C:\Windows\system32\ntdll.dll
No symbol table info available.
#1 0x0028e9e4 in ?? ()
No symbol table info available.
#2 0x7779b023 in ntdll!AlpcMaxAllowedMessageLength ()
   from C:\Windows\system32\ntdll.dll
No symbol table info available.
#3 0x088730c8 in ?? ()
No symbol table info available.
#4 0x6278a571 in ?? ()
No symbol table info available.
#5 0x00000188 in ?? ()
No symbol table info available.
#6 0x00000000 in ?? ()
No symbol table info available.

Changed in inkscape:
status: New → Confirmed
David Mathog (mathog) wrote :

This may not just be one bug.

Built and tested R11217, it was bad, but not as bad. This revision crashes on about 1 in 8 starts, and runs OK on the other 7 in 8. That suggests that other bug(s) between this and R11265 account for the other 3/8 crashes that revision suffers. Great.

Looks like it will take a -g version and some runs in gdb to figure this out.
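
As a rough cross-check of the 3/8 figure: the subtraction 1/2 - 1/8 treats crash rates as additive. If instead the later crashes came from a second cause that fires independently of the first (an assumption, nothing in this report establishes it), the rates would combine multiplicatively. The function name below is illustrative only.

```python
from fractions import Fraction


def additional_crash_rate(earlier, later):
    """Rate q of a second, independent cause with 1 - (1 - earlier)(1 - q) == later."""
    return 1 - (1 - later) / (1 - earlier)


# R11217 crashed on about 1/8 of starts, R11265 on about 1/2.
print(additional_crash_rate(Fraction(1, 8), Fraction(1, 2)))  # 3/7
```

Either way the estimates rest on a small number of starts, so they only indicate that R11217 does not account for all of the crashes.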

su_v (suv-lp) wrote :

Update from ScislaC:

Crashes no longer occur after upgrading cairo to latest stable (1.12.0) - on a system with cairo, freetype, and libpng from <http://download.opensuse.org/repositories/windows:/mingw:/win32/openSUSE_Factory/noarch/> (the other two libs were necessary when swapping out the cairo dll) - the rest of the devlibs stayed unchanged.

David Mathog (mathog) wrote :

Until such time as devlibs is updated with these changes, could we possibly have slightly more specific instructions for doing this? Specifically, which of the many, many packages from that directory were used?

Presumably these were unpacked using rpm on a linux system (since there is no rpm for mingw) and then copied over. Not all of the pieces in each of those packages are currently in release 29 of devlibs. For instance, share/doc/cairomm-1.0 is absent, yet that is present in mingw32-cairomm-devel-1.10.0-1.94.noarch.rpm. Going through these things directory by directory to make sure we get the right pieces is going to be pretty time consuming.

--------------

I also tried running 11289 in gdb, but could not get the darn thing to crash when run in the debugger, even though it fell all over itself when run directly.

ScislaC (scislac) wrote :

These libraries are not built in the same environment as our regular devlibs, so I don't recommend that others try this unless they're comfortable with experimenting. These libraries were placed in the build directory post-build, NOT in my devlibs directory.

Now that you know it was not including the devel packages, the libraries at hand should be pretty straightforward from what I asked ~suv to include (as I went for most recent).

cairo = mingw32-libcairo2-1.12.0-1.5.noarch.rpm
freetype = mingw32-freetype-2.4.9-1.12.noarch.rpm
libpng = mingw32-libpng-1.5.10-1.2.noarch.rpm

Note: Your presumption was incorrect, I used 7-zip on Windows to extract the files (they're ridiculously nested in directories).

David Mathog (mathog) wrote :

Followed your instructions in comment 10 and the inkscape binary started without crashing 10/10 times. Looks good.

Good to learn that 7-zip can unpack RPM on windows - that will come in handy.

It is probably time for revision 30 of devlibs, with the newer Cairo included.

What led you to the conclusion this was a Cairo issue?

Thanks.

Side note - upgrading the mingw build environment to the latest versions of everything did not independently resolve this issue. However, the mingw build environment does not include libcairo, which is only in devlibs.
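
For a sense of how much weight the 10/10 result above carries: if the old build crashes on about half of its starts and starts are treated as independent (an assumption), ten clean starts in a row by luck alone are unlikely. The helper names below are illustrative.

```python
from fractions import Fraction


def chance_of_clean_streak(crash_rate, starts):
    """Probability that `starts` independent starts all succeed."""
    return (1 - crash_rate) ** starts


def starts_needed(crash_rate, risk):
    """Smallest number of clean starts whose chance under `crash_rate` is <= risk."""
    starts = 0
    while chance_of_clean_streak(crash_rate, starts) > risk:
        starts += 1
    return starts


print(chance_of_clean_streak(Fraction(1, 2), 10))  # 1/1024
```

This says only that the crash rate dropped. It does not say whether the underlying fault is gone or merely no longer triggered.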

ScislaC (scislac) wrote :

Don't ask about what led me to that conclusion. The best I can tell you is that a handful of recent commits switched us to using cairo in more places and a ton of bugs have been fixed (and apparently also introduced) in the recent release of cairo 1.12. In the end it was a really lucky first guess. :D

I'm going to update this report to correctly reflect against the devlibs given your confirmation.

Changed in inkscape-devlibs:
status: New → Confirmed
importance: Undecided → High
Changed in inkscape:
status: Confirmed → Invalid
David Mathog (mathog) wrote :

Hate to play devil's advocate, but I've been programming for well over 30 years, and could not count the number of times I have seen programs where making minor changes would move incorrect memory accesses from safe to unsafe locations (or vice versa). So, since we still have no actual data on where things were really going wrong, it is entirely possible that all that using the new library did was (temporarily) move the illegal access point(s) to safe locations.

If I get some time today I will attempt to get inkscape running in wine under valgrind.

David Mathog (mathog) wrote :

Regarding valgrind, wine, and inkscape - see bug 989201.

David Mathog (mathog) wrote :

To characterize this problem a little better ran inkscape.exe with the old libcairo within GDB until it crashed. Which took a while, since it seemed to run better in gdb than without it. Anyway:

Program received signal SIGSEGV, Segmentation fault.
0x68df0c36 in _cairo_path_fixed_fini () from C:\progs\inkscape3\inkscape\libcairo-2.dll
(gdb) bt
#0 0x68df0c36 in _cairo_path_fixed_fini () from C:\progs\inkscape3\inkscape\libcairo-2.dll
#1 0x68dd1f3a in cairo_fill () from C:\progs\inkscape3\inkscape\libcairo-2.dll
#2 0x6c35d6ab in gdk_window_clear_backing_region () from C:\progs\inkscape3\inkscape\libgdk-win32-2.0-0.dll
#3 0x6c35fab6 in gdk_window_begin_paint_region () from C:\progs\inkscape3\inkscape\libgdk-win32-2.0-0.dll
#4 0x0eeda0b3 in gtk_main_do_event () from C:\progs\inkscape3\inkscape\libgtk-win32-2.0-0.dll
#5 0x6c3641cd in _gdk_window_process_updates_recurse () from C:\progs\inkscape3\inkscape\libgdk-win32-2.0-0.dll
#6 0x6c364181 in _gdk_window_process_updates_recurse () from C:\progs\inkscape3\inkscape\libgdk-win32-2.0-0.dll
#7 0x6c35f6a6 in gdk_window_process_updates_internal () from C:\progs\inkscape3\inkscape\libgdk-win32-2.0-0.dll
#8 0x6c361630 in gdk_window_process_updates () from C:\progs\inkscape3\inkscape\libgdk-win32-2.0-0.dll
#9 0x004f782b in sp_desktop_widget_update_zoom (dtw=0x11ed20c0)
#10 0x004c025d in SPDesktop::set_display_area (this=0x1159ed80, x0=<value optimized out>, y0=154.28571428571425,
    x1=896.42857142857144, y1=<value optimized out>, border=0, log=true) at src/desktop.cpp:1656
#11 0x004c046c in SPDesktop::zoom_absolute_keep_point (this=0x1159ed80, cx=375, cy=520, px=0.5, py=0.5,
    zoom=<value optimized out>) at src/desktop.cpp:1656
#12 0x1159ed80 in ?? ()
#13 0xdb6db6dc in ?? ()
#14 0xc0624db6 in ?? ()
#15 0x92492491 in ?? ()
#16 0x40634924 in ?? ()
#17 0xb6db6db7 in ?? ()
#18 0x408c036d in ?? ()
#19 0xdb6db6dc in ?? ()
#20 0x408badb6 in ?? ()
#21 0x00000000 in ?? ()

So it is definitely crashing in libcairo. The comments about some values being optimized out (at stack frame 10) strike me as odd, though.
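
A small sketch for listing which module each frame of a gdb backtrace like the one above belongs to. The regular expressions are assumptions fitted to the frame lines quoted in this report; this is not a general gdb output parser.

```python
import re
from collections import Counter
from ntpath import basename  # handles Windows-style paths on any host OS

FRAME_START = re.compile(r"^#(\d+)\s+0x[0-9a-fA-F]+ in (\S+) \(")
MODULE = re.compile(r"\) from (.+)$")


def parse_frames(backtrace):
    """Return (frame number, function, module file name or None) per frame."""
    frames = []
    for raw in backtrace.splitlines():
        line = raw.strip()
        start = FRAME_START.match(line)
        if start:
            module = MODULE.search(line)
            frames.append([int(start.group(1)), start.group(2),
                           basename(module.group(1)) if module else None])
        elif line.startswith("from ") and frames and frames[-1][2] is None:
            # gdb sometimes wraps "from <module>" onto the following line.
            frames[-1][2] = basename(line[len("from "):])
    return [tuple(frame) for frame in frames]


def module_counts(frames):
    """Count frames per module, skipping frames with no module reported."""
    return Counter(module for _, _, module in frames if module)
```

The module of frame 0 is where the fault was raised. Frames reported as `??` have no symbol information and may be unwinder noise rather than real callers.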

David Mathog (mathog) wrote :

Traced the problem down to the call to gdk_window_process_updates() in sp_desktop_widget_update_zoom() in
src/widgets/desktop-widget.cpp. Looked up what that gdk function does here:

  http://developer.gnome.org/gdk/stable/gdk-Windows.html#gdk-window-process-updates

where it says:

> Normally GDK calls gdk_window_process_all_updates() on your behalf, so there's no need to call this function unless you want to force expose events to be delivered immediately and synchronously (vs. the usual case, where GDK delivers them in an idle handler). Occasionally this is useful to produce nicer scrolling behavior, for example.

So I removed this one, and the one other instance in the same file. These were the only two places in the whole source tree where this function was called. Rebuilt, and no more crashes (this is with the original cairo, freetype and png libraries).
If there is a difference in the behavior of the program it is very subtle.

So it looks like there is a bug in some underlying library, but if we don't call this function, we don't expose it. And there doesn't seem to be any reason to ever call it. Perhaps whoever put these calls in in the first place has a different view.

Patch attached.

jazzynico (jazzynico)
Changed in inkscape-devlibs:
status: Confirmed → Triaged
David Mathog (mathog) wrote :

This is one of those bugs that hops around every time you change something. In order to get the program to crash reliably (there's an oxymoron for you) it was necessary to:

1. compile with -mwindows instead of -mconsole
2. use the version of libglib without the 15% cpu fix (bug 871968)

Even then, it was only 50% crashes, and that seemed to change depending on the state of the rest of the system. (Memory is OK, lengthy memtest86+ was run recently with zero errors.)

Anyway, once it would mostly fail reliably (another oxymoron) I started looking at the differences in src/widgets/desktop-widget.cpp when compared to an older version without this problem. And there's the problem - the differences were in unrelated areas. Just to be sure, copied the older one into the newer release, made two small changes to get it to build:

  GTK::ANCHOR_SOUTH

to

  SP_ANCHOR_SOUTH

and

  spinbutton_defocus (GTK_OBJECT (spin));

to

  spinbutton_defocus (GTK_WIDGET (spin));

Rebuilt, and that crashed too, but not quite as often. I think this may be a case where shuffling the code moves some illegal memory access into/out of harm's way. So while the "fix" in comment 16 works, it may be just because it reliably moves some sensitive area of data to safety.
