Hi,
This came up via our checker for bug inactivity, and I beg (all) your pardon, but acting on this is hard without a way to reproduce it.
The failing frames at the top of the stacktrace are irrelevant; it is the (mis)usage by openvpn that we would have to spot, and that is further down the trace.
Comparing the linked stacktrace with the one here shows that they do not seem to be the same.
The one here occurs while initializing a tunnel instance, during the subsequent cleanup that fails when freeing resources.
#9 0x00007fa766263f57 in do_close_tun (c=c@entry=0x7fffc95a38d0, force=force@entry=false) at init.c:1546
tuntap_actual = 0x7fa766df4228 "tun0"
local = 168305898
remote_netmask = 168305897
gc = {list = 0x7fa766df4220}
#10 0x00007fa7662674f3 in close_instance (c=c@entry=0x7fffc95a38d0) at init.c:3512
No locals.
#11 0x00007fa7662679a8 in close_context (c=c@entry=0x7fffc95a38d0, sig=sig@entry=-1, flags=flags@entry=4) at init.c:3687
No locals.
#12 0x00007fa76626863a in init_instance (c=c@entry=0x7fffc95a38d0, env=env@entry=0x7fa766de6a38, flags=flags@entry=4) at init.c:3473
options = 0x7fffc95a38d0
child = false
link_socket_mode = <optimized out>
#13 0x00007fa766269250 in init_instance_handle_signals (c=0x7fffc95a38d0, env=0x7fa766de6a38, flags=4) at init.c:3233
No locals.
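To make the suspected class of failure a bit more concrete, here is a minimal, purely illustrative C sketch; it is NOT openvpn code, the structs and functions below are my own hypothetical stand-ins that only mimic the shape of frames #12..#9: init fails midway, the error path still runs the full teardown, and the teardown then touches a member that was never set up (or that got clobbered in the meantime).

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical, heavily simplified stand-ins for the frames above */
struct context {
    char *tun_dev;   /* only set once the tun device has been opened */
};

static int do_open_tun(struct context *c)
{
    (void)c;
    return -1;       /* pretend opening the device fails before tun_dev is set */
}

static void do_close_tun(struct context *c)
{
    /* the close path has to tolerate tun_dev never having been assigned;
     * if the context was not zeroed, or was overwritten later, this
     * frees whatever garbage the pointer happens to hold */
    printf("closing %s\n", c->tun_dev ? c->tun_dev : "(never opened)");
    free(c->tun_dev);
    c->tun_dev = NULL;
}

static void close_instance(struct context *c)
{
    do_close_tun(c);
}

static int init_instance(struct context *c)
{
    if (do_open_tun(c) < 0) {
        close_instance(c);   /* cleanup runs even though init only half happened */
        return -1;
    }
    return 0;
}

int main(void)
{
    struct context c;
    memset(&c, 0, sizeof(c));   /* with garbage here instead of zeroes, do_close_tun blows up */
    init_instance(&c);
    return 0;
}

The real code is of course far more involved; the sketch is only meant to show why a crash inside the close path usually points at earlier state corruption rather than at the close code itself.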
While the linked bug in the crash tracker is close, it is more around freeing a key schedule, including the openssl bits:
#20 0x00005583969a4586 in tls_ctx_free (ctx=ctx@entry=0x7ffc77eb8170) at ssl_openssl.c:141
No locals.
#21 0x0000558396957528 in key_schedule_free (ks=ks@entry=0x7ffc77eb8138, free_ssl_ctx=<optimized out>) at init.c:1933
No locals.
#22 0x000055839695a567 in do_close_free_key_schedule (free_ssl_ctx=<optimized out>, c=0x7ffc77eb7980) at init.c:2819
No locals.
#23 close_instance (c=c@entry=0x7ffc77eb7980) at init.c:3506
No locals.
#24 0x000055839695ab08 in close_context (c=c@entry=0x7ffc77eb7980, sig=sig@entry=-1, flags=flags@entry=4) at init.c:3687
No locals.
#25 0x000055839695b79a in init_instance (c=c@entry=0x7ffc77eb7980, env=env@entry=0x5583982c9ba8, flags=flags@entry=4) at init.c:3473
options = 0x7ffc77eb7980
child = false
link_socket_mode = <optimized out>
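The same idea applied to the linked trace: a crash inside tls_ctx_free / SSL_CTX_free typically means the context pointer is already stale or corrupted by the time the free runs. A simplified sketch of the defensive pattern, again NOT openvpn's actual key_schedule, just my own reduced illustration (assumes OpenSSL 1.1+ headers, build with -lssl -lcrypto):

#include <openssl/ssl.h>
#include <stddef.h>

/* reduced stand-in for the real key schedule, illustration only */
struct key_schedule {
    SSL_CTX *ssl_ctx;
};

static void key_schedule_free(struct key_schedule *ks)
{
    SSL_CTX_free(ks->ssl_ctx);   /* SSL_CTX_free(NULL) is a documented no-op */
    ks->ssl_ctx = NULL;          /* without this, a second cleanup pass double-frees */
}

int main(void)
{
    struct key_schedule ks = { .ssl_ctx = SSL_CTX_new(TLS_method()) };
    key_schedule_free(&ks);
    key_schedule_free(&ks);      /* harmless only because the pointer was NULLed */
    return 0;
}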
They go down very different paths and free different resources, and therefore do not seem directly related to me. Of course they could share the same root cause, a memory overwrite, but that is not confirmable right now.
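To illustrate why a single stray write could still explain both, here is a deliberately broken toy program (again not openvpn code, the fields are invented): one out-of-bounds copy corrupts whichever pointer happens to sit next to it, and the crash then surfaces in whatever cleanup path frees that pointer first; the bad free is also exactly what a tool like valgrind would report.

#include <stdlib.h>
#include <string.h>

/* toy context: a fixed-size field followed by two pointers that two
 * different cleanup paths are responsible for freeing */
struct ctx {
    char name[8];
    char *tun_dev;   /* freed by the tun close path */
    char *tls_key;   /* freed by the key-schedule path */
};

int main(void)
{
    struct ctx *c = calloc(1, sizeof(*c));
    if (!c) {
        return 1;
    }
    c->tun_dev = strdup("tun0");
    c->tls_key = strdup("key");

    /* the single root cause: 12 bytes copied into an 8 byte field clobber
     * whatever sits behind it (here the tun_dev pointer) */
    memcpy(c->name, "way-too-long", 12);

    free(c->tun_dev);   /* aborts here with an invalid free ... */
    free(c->tls_key);   /* ... or, with a different layout, in this path instead */
    free(c);
    return 0;
}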
The only good thing I can see here is that the auto-reported crash has ceased to show up since Xenial. Yet it could just as well have changed its signature and no longer be detected as the same issue.
I have read through the code a bit, but nothing obvious stood out, so without better debugging information I can't attempt a fix. Since it might have been fixed in a later version, as the crash report indicates, I checked the git repo, but there was no change directly pointing to this or similar issues that I considered to have a high chance of helping and to be worth a try.
As bad as the issue is, unless one is willing to read, read, and read code with a high chance of still finding nothing, what would really be needed is a reliable reproducer.
Something like: configure it with A, B, C, then run D, which will trigger the issue.
That could then be run under tools like valgrind to detect the spurious memory overwrite that is likely happening.