collector cores in TCP Forwarder

Bug #1719990 reported by Anish Mehta
This bug affects 1 person

Affects            Status         Importance  Assigned to           Milestone
Juniper Openstack  Status tracked in Trunk
  R4.0             Fix Committed  High        Prashanth Nageshappa
  R4.1             Fix Committed  High        Prashanth Nageshappa
  Trunk            Fix Committed  High        Prashanth Nageshappa

Bug Description

These collector cores were seen in a CSP V3.1 FRS deployment.

The contrail analytics version was 4.0.0-15.

root@canvm(analytics):/# gdb /var/crashes/vizd /var/crashes/core.contrail-collec.2880.canvm.1506359705
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.3) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /var/crashes/vizd...done.

warning: core file may not match specified executable file.
[New LWP 3317]
[New LWP 3330]
[New LWP 3318]
[New LWP 2890]
[New LWP 2888]
[New LWP 3329]
[New LWP 3304]
[New LWP 3331]
[New LWP 2884]
[New LWP 2880]
[New LWP 3303]
[New LWP 3314]
[New LWP 3312]
[New LWP 2886]
[New LWP 3328]
[New LWP 3319]
[New LWP 3313]
[New LWP 3316]
[New LWP 3302]
[New LWP 3315]
[New LWP 2887]
[New LWP 2885]
[New LWP 2889]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-collector'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000673d0c in structured_syslog::StructuredSyslogTcpForwarder::Connected (this=0x2ad9980fcbb0) at controller/src/analytics/structured_syslog_server.cc:905
905 controller/src/analytics/structured_syslog_server.cc: No such file or directory.
(gdb) bt
#0 0x0000000000673d0c in structured_syslog::StructuredSyslogTcpForwarder::Connected (this=0x2ad9980fcbb0) at controller/src/analytics/structured_syslog_server.cc:905
#1 0x00000000006813cd in structured_syslog::StructuredSyslogForwarder::PollTcpForwarder (this=0xfa3b40) at controller/src/analytics/structured_syslog_server.cc:975
#2 0x000000000048a5e9 in operator() (this=<optimized out>) at /usr/include/boost/function/function_template.hpp:767
#3 Timer::TimerTask::Run (this=0xfa8410) at controller/src/base/timer.cc:44
#4 0x0000000000478757 in TaskImpl::execute (this=0x2ad95a1ddc40) at controller/src/base/task.cc:279
#5 0x00002ad951f09b3a in ?? () from /usr/lib/libtbb.so.2
#6 0x00002ad951f05816 in ?? () from /usr/lib/libtbb.so.2
#7 0x00002ad951f04f4b in ?? () from /usr/lib/libtbb.so.2
#8 0x00002ad951f010ff in ?? () from /usr/lib/libtbb.so.2
#9 0x00002ad951f012f9 in ?? () from /usr/lib/libtbb.so.2
#10 0x00002ad951cd3184 in start_thread (arg=0x2ad97a209700) at pthread_create.c:312
#11 0x00002ad9533f037d in eventfd (count=-1744249640, flags=0) at ../sysdeps/unix/sysv/linux/eventfd.c:42
#12 0x0000000000000000 in ?? ()
(gdb)

The other one is:

value has been optimized out
(gdb) bt
#0 0x00002b7120000258 in ?? ()
#1 0x00000000007476af in TcpServer::ConnectHandler (this=<optimized out>, server=..., session=..., error=...) at controller/src/io/tcp_server.cc:406
#2 0x000000000074beea in operator() (a3=..., a2=..., a1=..., p=<optimized out>, this=0x7ffed3135bf0) at /usr/include/boost/bind/mem_fn_template.hpp:393
#3 operator()<boost::_mfi::mf3<void, TcpServer, boost::intrusive_ptr<TcpServer>, boost::intrusive_ptr<TcpSession>, const boost::system::error_code&>, boost::_bi::list1<const boost::system::error_code&> > (a=<synthetic pointer>, f=..., this=0x7ffed3135c00) at /usr/include/boost/bind/bind.hpp:457
#4 operator()<boost::system::error_code> (a1=..., this=0x7ffed3135bf0) at /usr/include/boost/bind/bind_template.hpp:47
#5 operator() (this=0x7ffed3135bf0) at /usr/include/boost/asio/detail/bind_handler.hpp:47
#6 asio_handler_invoke<boost::asio::detail::binder1<boost::_bi::bind_t<void, boost::_mfi::mf3<void, TcpServer, boost::intrusive_ptr<TcpServer>, boost::intrusive_ptr<TcpSession>, boost::system::error_code const&>, boost::_bi::list4<boost::_bi::value<TcpServer*>, boost::_bi::value<boost::intrusive_ptr<TcpServer> >, boost::_bi::value<boost::intrusive_ptr<TcpSession> >, boost::arg<1> (*)()> >, boost::system::error_code> > (function=<error reading variable: access outside bounds of object referenced via synthetic pointer>) at /usr/include/boost/asio/handler_invoke_hook.hpp:64
#7 invoke<boost::asio::detail::binder1<boost::_bi::bind_t<void, boost::_mfi::mf3<void, TcpServer, boost::intrusive_ptr<TcpServer>, boost::intrusive_ptr<TcpSession>, boost::system::error_code const&>, boost::_bi::list4<boost::_bi::value<TcpServer*>, boost::_bi::value<boost::intrusive_ptr<TcpServer> >, boost::_bi::value<boost::intrusive_ptr<TcpSession> >, boost::arg<1> (*)()> >, boost::system::error_code>, boost::_bi::bind_t<void, boost::_mfi::mf3<void, TcpServer, boost::intrusive_ptr<TcpServer>, boost::intrusive_ptr<TcpSession>, boost::system::error_code const&>, boost::_bi::list4<boost::_bi::value<TcpServer*>, boost::_bi::value<boost::intrusive_ptr<TcpServer> >, boost::_bi::value<boost::intrusive_ptr<TcpSession> >, boost::arg<1> (*)()> > > (context=..., function=...) at /usr/include/boost/asio/detail/handler_invoke_helpers.hpp:37
#8 asio_handler_invoke<boost::asio::detail::binder1<boost::_bi::bind_t<void, boost::_mfi::mf3<void, TcpServer, boost::intrusive_ptr<TcpServer>, boost::intrusive_ptr<TcpSession>, boost::system::error_code const&>, boost::_bi::list4<boost::_bi::value<TcpServer*>, boost::_bi::value<boost::intrusive_ptr<TcpServer> >, boost::_bi::value<boost::intrusive_ptr<TcpSession> >, boost::arg<1> (*)()> >, boost::system::error_code>, boost::_bi::bind_t<void, boost::_mfi::mf3<void, TcpServer, boost::intrusive_ptr<TcpServer>, boost::intrusive_ptr<TcpSession>, boost::system::error_code const&>, boost::_bi::list4<boost::_bi::value<TcpServer*>, boost::_bi::value<boost::intrusive_ptr<TcpServer> >, boost::_bi::value<boost::intrusive_ptr<TcpSession> >, boost::arg<1> (*)()> >, boost::system::error_code> (this_handler=0x7ffed3135bb0, function=...) at /usr/include/boost/asio/detail/bind_handler.hpp:88
#9 invoke<boost::asio::detail::binder1<boost::_bi::bind_t<void, boost::_mfi::mf3<void, TcpServer, boost::intrusive_ptr<TcpServer>, boost::intrusive_ptr<TcpSession>, boost::system::error_code const&>, boost::_bi::list4<boost::_bi::value<TcpServer*>, boost::_bi::value<boost::intrusive_ptr<TcpServer> >, boost::_bi::value<boost::intrusive_ptr<TcpSession> >, boost::arg<1> (*)()> >, boost::system::error_code>, boost::asio::detail::binder1<boost::_bi::bind_t<void, boost::_mfi::mf3<void, TcpServer, boost::intrusive_ptr<TcpServer>, boost::intrusive_ptr<TcpSession>, boost::system::error_code const&>, boost::_bi::list4<boost::_bi::value<TcpServer*>, boost::_bi::value<boost::intrusive_ptr<TcpServer> >, boost::_bi::value<boost::intrusive_ptr<TcpSession> >, boost::arg<1> (*)()> >, boost::system::error_code> > (context=..., function=...) at /usr/include/boost/asio/detail/handler_invoke_helpers.hpp:37
#10 boost::asio::detail::reactive_socket_connect_op<boost::asio::ip::tcp, boost::_bi::bind_t<void, boost::_mfi::mf3<void, TcpServer, boost::intrusive_ptr<TcpServer>, boost::intrusive_ptr<TcpSession>, boost::system::error_code const&>, boost::_bi::list4<boost::_bi::value<TcpServer*>, boost::_bi::value<boost::intrusive_ptr<TcpServer> >, boost::_bi::value<boost::intrusive_ptr<TcpSession> >, boost::arg<1> (*)()> > >::do_complete (owner=0x18fc4d0, base=<optimized out>) at /usr/include/boost/asio/detail/reactive_socket_connect_op.hpp:100
#11 0x000000000074b7ff in complete (bytes_transferred=0, ec=..., owner=..., this=<optimized out>) at /usr/include/boost/asio/detail/task_io_service_operation.hpp:37
#12 boost::asio::detail::epoll_reactor::descriptor_state::do_complete (owner=0x18fc4d0, base=0x2b715038d890, ec=..., bytes_transferred=<optimized out>) at /usr/include/boost/asio/detail/impl/epoll_reactor.ipp:651
#13 0x000000000072dea4 in complete (bytes_transferred=4, ec=..., owner=..., this=0x2b715038d890) at /usr/include/boost/asio/detail/task_io_service_operation.hpp:37
#14 boost::asio::detail::task_io_service::do_run_one (this=0x18fc4d0, lock=..., this_thread=..., ec=...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:384
#15 0x000000000072e061 in boost::asio::detail::task_io_service::run (this=0x18fc4d0, ec=...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:153
#16 0x000000000072c221 in run (this=0x18ba040, ec=...) at /usr/include/boost/asio/impl/io_service.ipp:66
#17 EventManager::Run (this=0x18ba040) at controller/src/io/event_manager.cc:35
#18 0x00000000004334cb in main (argc=<optimized out>, argv=<optimized out>) at controller/src/analytics/main.cc:448
(gdb)

Tags: analytics csp
Anish Mehta (amehta00)
Changed in juniperopenstack:
importance: Undecided → High
Anish Mehta (amehta00) wrote :

Please access the cores from the 172.30.200.176 systems, as per the access details provided by Hartmut.

Megh Bhatt (meghb)
information type: Proprietary → Public
Prashanth Nageshappa (nprashanth) wrote :

After moving to 4.0.1 FRS, three crashes were seen: one SIGABRT in _quicksort() and two SIGSEGVs in TcpServer::ConnectHandler().

root@canvm(analytics):/etc/contrailctl/sangupta# gdb vizd /var/crashes/core.contrail-collec.14026.canvm.1506609811

[New LWP 14049]
[New LWP 14047]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-collector'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000000000077996d in TcpServer::ConnectHandler (this=0x2b72701afd10, server=..., session=..., error=...) at controller/src/io/tcp_server.cc:410
410 controller/src/io/tcp_server.cc: No such file or directory.
(gdb) bt
#0 0x000000000077996d in TcpServer::ConnectHandler (this=0x2b72701afd10, server=..., session=..., error=...) at controller/src/io/tcp_server.cc:410
#1 0x000000000077e19a in operator() (a3=..., a2=..., a1=..., p=<optimized out>, this=0x7fff8b82e070) at /usr/include/boost/bind/mem_fn_template.hpp:393
#2 operator()<boost::_mfi::mf3<void, TcpServer, boost::intrusive_ptr<TcpServer>, boost::intrusive_ptr<TcpSession>, const boost::system::error_code&>, boost::_bi::list1<const boost::system::error_code&> > (
    a=<synthetic pointer>, f=..., this=0x7fff8b82e080) at /usr/include/boost/bind/bind.hpp:457
#3 operator()<boost::system::error_code> (a1=..., this=0x7fff8b82e070) at /usr/include/boost/bind/bind_template.hpp:47
#4 operator() (this=0x7fff8b82e070) at /usr/include/boost/asio/detail/bind_handler.hpp:47
#5 asio_handler_invoke<boost::asio::detail::binder1<boost::_bi::bind_t<void, boost::_mfi::mf3<void, TcpServer, boost::intrusive_ptr<TcpServer>, boost::intrusive_ptr<TcpSession>, boost::system::error_code const&>, boost::_bi::list4<boost::_bi::value<TcpServer*>, boost::_bi::value<boost::intrusive_ptr<TcpServer> >, boost::_bi::value<boost::intrusive_ptr<TcpSession> >, boost::arg<1> (*)()> >, boost::system::error_code> > (function=<error reading variable: access outside bounds of object referenced via synthetic pointer>) at /usr/include/boost/asio/handler_invoke_hook.hpp:64
#6 invoke<boost::asio::detail::binder1<boost::_bi::bind_t<void, boost::_mfi::mf3<void, TcpServer, boost::intrusive_ptr<TcpServer>, boost::intrusive_ptr<TcpSession>, boost::system::error_code const&>, boost::_bi::list4<boost::_bi::value<TcpServer*>, boost::_bi::value<boost::intrusive_ptr<TcpServer> >, boost::_bi::value<boost::intrusive_ptr<TcpSession> >, boost::arg<1> (*)()> >, boost::system::error_code>, boost::_bi::bind_t<void, boost::_mfi::mf3<void, TcpServer, boost::intrusive_ptr<TcpServer>, boost::intrusive_ptr<TcpSession>, boost::system::error_code const&>, boost::_bi::list4<boost::_bi::value<TcpServer*>, boost::_bi::value<boost::intrusive_ptr<TcpServer> >, boost::_bi::value<boost::intrusive_ptr<TcpSession> >, boost::arg<1> (*)()> > > (context=..., function=...)
    at /usr/include/boost/asio/detail/handler_invoke_helpers.hpp:37
#7 asio_handler_invoke<boost::asio::detail::binder1<boost::_bi::bind_t<void, boost::_mfi::mf3<void, TcpServer, boost::intrusive_ptr<TcpServer>, boost::intrusive_ptr<TcpSession>, boost::system::error_code const&>, boost::_bi:...

Prashanth Nageshappa (nprashanth) wrote :

I see another crash in the 4.0.1 FRS installation:

root@canvm(analytics):/etc/contrailctl/sangupta# gdb vizd /var/crashes/core.contrail-collec.28561.canvm.1506649563
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-collector'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00002b87ce5d3c37 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00002b87ce5d3c37 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00002b87ce5d7028 in __GI_abort () at abort.c:89
#2 0x00002b87ce6102a4 in __libc_message (do_abort=1,
    fmt=fmt@entry=0x2b87ce722310 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/posix/libc_fatal.c:175
#3 0x00002b87ce61be23 in malloc_printerr (ptr=0x2b88080d3cf0,
    str=0x2b87ce71e424 "corrupted size vs. prev_size", action=<optimized out>)
    at malloc.c:4998
#4 malloc_consolidate (av=av@entry=0x2b8808000020) at malloc.c:4167
#5 0x00002b87ce61d8b8 in _int_malloc (av=0x2b8808000020, bytes=20448) at malloc.c:3425
#6 0x00002b87ce61fae0 in __GI___libc_malloc (bytes=20448) at malloc.c:2893
#7 0x00002b87cdddbdad in operator new(unsigned long) ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8 0x00002b87cde37249 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9 0x00002b87cde37e0b in std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00002b87cde37ea4 in std::string::reserve(unsigned long) ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00000000008288e6 in ReplaceBuf (str="", this=0x22cfe40)
    at tools/sandesh/library/cpp/sandesh_session.cc:505
#12 SetBuf (str="", this=0x22cfe40) at tools/sandesh/library/cpp/sandesh_session.cc:496
#13 SandeshReader::ExtractMsg (this=0x22cfe40, buffer=..., result=0x2b87f57358d0,
    NewBuf=<optimized out>) at tools/sandesh/library/cpp/sandesh_session.cc:556
#14 0x000000000082896f in SandeshReader::OnRead (this=0x22cfe40, buffer=...)
    at tools/sandesh/library/cpp/sandesh_session.cc:585
#15 0x000000000076b3f0 in call<SslSession*, boost::asio::const_buffer> (u=<optimized out>,
    b1=<synthetic pointer>, this=<optimized out>)
    at /usr/include/boost/bind/mem_fn_template.hpp:156
#16 operator()<SslSession*> (u=<optimized out>, a1=..., this=<optimized out>)
    at /usr/include/boost/bind/mem_fn_template.hpp:171
#17 operator()<boost::_mfi::mf1<void, TcpSession, boost::asio::const_buffer>, boost::_bi::list1<boost::asio::const_buffer&> > (a=<synthetic pointer>, f=..., this=<optimized out>)
    at /usr/include/boost/bind/bind.hpp:313
#18 operator()<boost::asio::const_buffer> (a1=<synthetic pointer>, this=<optimized out>)
    at /usr/include/boost/bind/bind_template.hpp:32
#19 boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void, boost::_mfi::mf1<void, TcpSession, boost::...

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/36182
Submitter: Prashanth Nageshappa (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.0

Review in progress for https://review.opencontrail.org/36183
Submitter: Prashanth Nageshappa (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.1

Review in progress for https://review.opencontrail.org/36184
Submitter: Prashanth Nageshappa (<email address hidden>)

Prashanth Nageshappa (nprashanth) wrote :

A fix has been checked in for this crash issue.

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/36183
Committed: http://github.com/Juniper/contrail-controller/commit/b3189161ff722726e7c9d9904ae926ee73e002c8
Submitter: Zuul (<email address hidden>)
Branch: R4.0

commit b3189161ff722726e7c9d9904ae926ee73e002c8
Author: Prashanth Nageshappa <email address hidden>
Date: Tue Oct 3 03:15:28 2017 -0700

collector TCP Forwarder crashes while reconnecting

do not use shared ptr for StructuredSyslogTcpForwarder
so that we can call explicit DeleteServer

Change-Id: I26356d941d2e62f299ec45b8dc981043aa2dc3e6
Closes-Bug: #1719990

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/36184
Committed: http://github.com/Juniper/contrail-controller/commit/926653311825fa2b1f1d9f557ce842daa12aed8d
Submitter: Zuul (<email address hidden>)
Branch: R4.1

commit 926653311825fa2b1f1d9f557ce842daa12aed8d
Author: Prashanth Nageshappa <email address hidden>
Date: Tue Oct 3 03:15:28 2017 -0700

collector TCP Forwarder crashes while reconnecting

do not use shared ptr for StructuredSyslogTcpForwarder
so that we can call explicit DeleteServer

Change-Id: I26356d941d2e62f299ec45b8dc981043aa2dc3e6
Closes-Bug: #1719990

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/36182
Committed: http://github.com/Juniper/contrail-controller/commit/4306d6cfc13d304e2fa834da80048323526cfc04
Submitter: Zuul (<email address hidden>)
Branch: master

commit 4306d6cfc13d304e2fa834da80048323526cfc04
Author: Prashanth Nageshappa <email address hidden>
Date: Tue Oct 3 03:15:28 2017 -0700

collector TCP Forwarder crashes while reconnecting

do not use shared ptr for StructuredSyslogTcpForwarder
so that we can call explicit DeleteServer

Change-Id: I26356d941d2e62f299ec45b8dc981043aa2dc3e6
Closes-Bug: #1719990

Prashanth Nageshappa (nprashanth) wrote :

Using a shared ptr for the TCP forwarder was causing memory corruption, leading to crashes in different locations. The fix for this bug addresses the crashes listed in 1720289 and 1720290 as well.
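
For readers following the root cause: the backtraces show the pending connect handler bound to a boost::intrusive_ptr<TcpServer>, so one plausible reading is that the forwarder was being managed by two ownership schemes at once. The following is only a minimal, single-threaded sketch of the lifetime pattern the commit message describes (raw pointer plus an explicit delete call that drops the owner's reference). `Server`, `ServerRef`, `ServerManager` and `RunReconnectDemo` are hypothetical stand-ins written for illustration, not the Contrail classes; the actual change is in the commits linked above.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for an intrusively reference-counted server object.
// Not thread-safe; a real implementation would use an atomic counter.
class Server {
public:
    explicit Server(int* live) : refs_(0), connected_(false), live_(live) { ++*live_; }
    void AddRef() { ++refs_; }
    void Release() {
        assert(refs_ > 0);
        if (--refs_ == 0) delete this;  // the only place the object is destroyed
    }
    void OnConnect() { connected_ = true; }
    bool connected() const { return connected_; }

private:
    ~Server() { --*live_; }  // private: no second owner can delete it directly
    int refs_;
    bool connected_;
    int* live_;              // counts live Server objects for the demo
};

// Minimal RAII handle playing the role of an intrusive pointer.
class ServerRef {
public:
    explicit ServerRef(Server* s) : s_(s) { if (s_) s_->AddRef(); }
    ServerRef(const ServerRef& o) : s_(o.s_) { if (s_) s_->AddRef(); }
    ServerRef& operator=(const ServerRef&) = delete;
    ~ServerRef() { if (s_) s_->Release(); }
    Server* get() const { return s_; }

private:
    Server* s_;
};

// Hypothetical manager: holds and drops the owner's reference explicitly.
struct ServerManager {
    static void AddServer(Server* s) { s->AddRef(); }
    static void DeleteServer(Server* s) { s->Release(); }
};

struct DemoResult {
    int live_after_delete;   // servers alive right after DeleteServer
    int live_after_handler;  // servers alive after the pending handler is gone
    bool handler_ran;
};

DemoResult RunReconnectDemo() {
    int live = 0;
    bool handler_ran = false;
    std::vector<std::function<void()> > pending;  // stand-in for the event loop

    Server* fwd = new Server(&live);  // raw pointer, no shared_ptr
    ServerManager::AddServer(fwd);

    {
        // An asynchronous connect is in flight: its handler keeps a reference.
        ServerRef ref(fwd);
        pending.push_back([ref, &handler_ran]() {
            ref.get()->OnConnect();
            handler_ran = true;
        });
    }

    // Reconnect path: the owner gives up the old forwarder explicitly.
    ServerManager::DeleteServer(fwd);
    fwd = nullptr;

    DemoResult r;
    r.live_after_delete = live;  // the handler's reference keeps it alive

    for (auto& h : pending) h();  // handler runs against a still-valid object
    pending.clear();              // last reference dropped, object destroyed
    r.live_after_handler = live;
    r.handler_ran = handler_ran;
    return r;
}
```

Calling RunReconnectDemo() reports one live server immediately after DeleteServer, because the in-flight handler still holds a reference, and none once the handler has run and been discarded. If a separate shared pointer had also been deleting the same object, the handler could have run against freed memory, which would be consistent with crashes appearing in unrelated locations.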
