Comment 10 for bug 1406220

Christian Ehrhardt  (paelzer) wrote :

Hi,
This came up in our checker for bug inactivity. I beg your pardon (all of you), but acting on this is hard without a way to reproduce it.

The failures at the top of the stack trace are irrelevant: what we would have to spot is the (mis)use by openvpn, and that sits further down the trace.

Comparing the linked stack trace with the one here shows that they do not seem to be the same.
The one here occurs while initializing a tunnel instance, in the subsequent cleanup, which fails when freeing resources:
#9 0x00007fa766263f57 in do_close_tun (c=c@entry=0x7fffc95a38d0, force=force@entry=false) at init.c:1546
        tuntap_actual = 0x7fa766df4228 "tun0"
        local = 168305898
        remote_netmask = 168305897
        gc = {list = 0x7fa766df4220}
#10 0x00007fa7662674f3 in close_instance (c=c@entry=0x7fffc95a38d0) at init.c:3512
No locals.
#11 0x00007fa7662679a8 in close_context (c=c@entry=0x7fffc95a38d0, sig=sig@entry=-1, flags=flags@entry=4) at init.c:3687
No locals.
#12 0x00007fa76626863a in init_instance (c=c@entry=0x7fffc95a38d0, env=env@entry=0x7fa766de6a38, flags=flags@entry=4) at init.c:3473
        options = 0x7fffc95a38d0
        child = false
        link_socket_mode = <optimized out>
#13 0x00007fa766269250 in init_instance_handle_signals (c=0x7fffc95a38d0, env=0x7fa766de6a38, flags=4) at init.c:3233
No locals.

The linked bug in the crash tracker is close, but it is more about freeing a key schedule, including the openssl bits:
#20 0x00005583969a4586 in tls_ctx_free (ctx=ctx@entry=0x7ffc77eb8170) at ssl_openssl.c:141
No locals.
#21 0x0000558396957528 in key_schedule_free (ks=ks@entry=0x7ffc77eb8138, free_ssl_ctx=<optimized out>) at init.c:1933
No locals.
#22 0x000055839695a567 in do_close_free_key_schedule (free_ssl_ctx=<optimized out>, c=0x7ffc77eb7980) at init.c:2819
No locals.
#23 close_instance (c=c@entry=0x7ffc77eb7980) at init.c:3506
No locals.
#24 0x000055839695ab08 in close_context (c=c@entry=0x7ffc77eb7980, sig=sig@entry=-1, flags=flags@entry=4) at init.c:3687
No locals.
#25 0x000055839695b79a in init_instance (c=c@entry=0x7ffc77eb7980, env=env@entry=0x5583982c9ba8, flags=flags@entry=4) at init.c:3473
        options = 0x7ffc77eb7980
        child = false
        link_socket_mode = <optimized out>

They go down very different paths, free different resources, and therefore do not seem directly related to me. Of course they could share the same root cause, a memory overwrite, but that cannot be confirmed right now.
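
To make that comparison less of an eyeball exercise, here is a minimal Python sketch that parses the frames quoted above and reports where the two traces diverge. The regular expression and the helper names (`parse_frames`, `split_common`) are my own assumptions, not part of gdb or openvpn, and frames are compared by function, file and line only, since the addresses differ between runs anyway.

```python
import re

# Matches gdb frame lines such as
#   "#9 0x... in do_close_tun (...) at init.c:1546"
# and the address-less form "#23 close_instance (...) at init.c:3506".
FRAME_RE = re.compile(
    r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\w+)\s*\(.*\)\s+at\s+([^\s:]+):(\d+)"
)

def parse_frames(trace):
    """Return (function, file, line) tuples, innermost frame first."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((m.group(2), m.group(3), int(m.group(4))))
    return frames

def split_common(a, b):
    """Walk both traces from the outermost frame and split at the first difference."""
    outer_a, outer_b = a[::-1], b[::-1]
    n = 0
    while n < min(len(outer_a), len(outer_b)) and outer_a[n] == outer_b[n]:
        n += 1
    return outer_a[:n], outer_a[n:], outer_b[n:]

# Frames copied from this comment ("No locals." and variable dumps omitted).
THIS_BUG = """
#9 0x00007fa766263f57 in do_close_tun (c=c@entry=0x7fffc95a38d0, force=force@entry=false) at init.c:1546
#10 0x00007fa7662674f3 in close_instance (c=c@entry=0x7fffc95a38d0) at init.c:3512
#11 0x00007fa7662679a8 in close_context (c=c@entry=0x7fffc95a38d0, sig=sig@entry=-1, flags=flags@entry=4) at init.c:3687
#12 0x00007fa76626863a in init_instance (c=c@entry=0x7fffc95a38d0, env=env@entry=0x7fa766de6a38, flags=flags@entry=4) at init.c:3473
"""

LINKED_BUG = """
#20 0x00005583969a4586 in tls_ctx_free (ctx=ctx@entry=0x7ffc77eb8170) at ssl_openssl.c:141
#21 0x0000558396957528 in key_schedule_free (ks=ks@entry=0x7ffc77eb8138, free_ssl_ctx=<optimized out>) at init.c:1933
#22 0x000055839695a567 in do_close_free_key_schedule (free_ssl_ctx=<optimized out>, c=0x7ffc77eb7980) at init.c:2819
#23 close_instance (c=c@entry=0x7ffc77eb7980) at init.c:3506
#24 0x000055839695ab08 in close_context (c=c@entry=0x7ffc77eb7980, sig=sig@entry=-1, flags=flags@entry=4) at init.c:3687
#25 0x000055839695b79a in init_instance (c=c@entry=0x7ffc77eb7980, env=env@entry=0x5583982c9ba8, flags=flags@entry=4) at init.c:3473
"""

common, rest_this, rest_linked = split_common(
    parse_frames(THIS_BUG), parse_frames(LINKED_BUG)
)
print("shared outer frames:", [f[0] for f in common])
print("this bug continues :", rest_this)
print("linked bug continues:", rest_linked)
```

On the frames quoted here, both traces share init_instance and close_context and then split inside close_instance (init.c:3512 here versus init.c:3506 in the linked report), which matches the reading above that they free different things.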

The only good thing I can see here is that the auto-reported crash has ceased to show up since Xenial. Yet it could just as well have changed its signature enough that it is no longer detected as the same issue.

I have read through the code a bit, but nothing obvious stood out, so without better debugging data I can't attempt a fix. Since it might have been fixed in a later version, as the crash reports suggest, I checked the git repo, but found no change pointing directly at this or a similar issue that I considered likely enough to be worth a try.

As bad as the issue is, unless someone is willing to read, read, read code with a high chance of still not finding anything, what is really needed is a reliable reproducer.
Something like: configure it with A, B, C and then run D, which triggers the issue.
That would then have to be run under a tool like valgrind to detect the spurious memory overwrite that is likely happening.
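
Once such a reproducer exists, the valgrind run could look roughly like the sketch below. It only assembles the command line. The config path is a placeholder for whatever the reproducer uses, the helper name `valgrind_command` is made up for this example, and the selection of memcheck options is my assumption of a reasonable starting point rather than a tested recipe for this bug.

```python
import shlex

def valgrind_command(config_path, openvpn_bin="openvpn",
                     log_file="valgrind-openvpn.%p.log"):
    """Build an argv list that runs openvpn under valgrind's memcheck tool."""
    return [
        "valgrind",
        "--tool=memcheck",         # reports invalid reads/writes and invalid frees
        "--track-origins=yes",     # show where uninitialised values came from
        "--num-callers=40",        # deeper stacks than the default
        f"--log-file={log_file}",  # %p is expanded by valgrind to the process id
        openvpn_bin,
        "--config",
        config_path,               # placeholder: the reproducer's config file
    ]

# Print a copy-and-paste friendly command line; the path is hypothetical.
print(shlex.join(valgrind_command("/etc/openvpn/client.conf")))
```

The printed command would then be run as root with the reproducer's steps applied, and the resulting log attached to the bug.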