contrail-query-engine crashed with split-db config on scaled setup

Bug #1699861 reported by manishkn
This bug affects 5 people
Affects                  Status          Importance   Assigned to   Milestone
Juniper Openstack (status tracked in Trunk)
  R3.1                   Fix Committed   High         Arvind
  R3.2                   In Progress     High         Arvind
  R4.0                   Invalid         High         Arvind
  Trunk                  Invalid         High         Arvind

Bug Description

Contrail version: 3.1.3.0-75

Setup: split DB
VNs: ~6k
Interfaces: ~12k

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-query-engine --conf_file /etc/contrail/contrail-query-engine.'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fad522bcc37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007fad522bcc37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fad522c0028 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007fad522b5bf6 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007fad522b5ca2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00000000004ec61b in FindMember (name=0x738125 "name", this=<optimized out>) at build/include/rapidjson/document.h:620
#5 rapidjson::GenericValue<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> >::operator[] (
    this=<optimized out>, name=0x738125 "name") at build/include/rapidjson/document.h:233
#6 0x00000000004da7a6 in operator[] (name=0x738125 "name", this=<optimized out>) at build/include/rapidjson/document.h:240
Python Exception <class 'IndexError'> list index out of range:
#7 PostProcessingQuery::PostProcessingQuery (this=0x2acfd30, json_api_data=std::map with 10 elements, main_query=0x2acd880)
    at controller/src/query_engine/query.cc:242
#8 0x00000000004e08e0 in AnalyticsQuery::Init (this=this@entry=0x2acd880, qid="af3b2a28-5769-11e7-9739-0000ac115a02",
Python Exception <class 'IndexError'> list index out of range:
    json_api_data=std::map with 10 elements, or_number=or_number@entry=-1) at controller/src/query_engine/query.cc:549
#9 0x00000000004e2fec in AnalyticsQuery::AnalyticsQuery (this=0x2acd880,
    qid="讥\002\000\000\000\000\060\000\000\000\000\000\000\000\000H\245\002\000\000\000\000\300ؤ\002\000\000\000\000x\023\345R\255\1 77\000\000\000\000\000\000\000\000\000\000\200Ω\002\000\000\000\000@ޭ\002\000\000\000\000 l\250\002\000\000\000\000\n\000\000\000\000\ 000\000\000\371\005\255\002\000\000\000\000\370\005\255\002\000\000\000\000:\006\255\002\000\000\000\000\370\a\255\002\000\000\000\00 0\340\235\346R\255\177\000\000\030\000\000\000\255\177\000\000\370\005\255\002\000\000\000\000P\357\344R\255\177\000\000\006", '\000' <repeats 15 times>, "\002\020\000\000\000\000\000\000\000\000\000\000\255\177", '\000' <repeats 26 times>..., dbif_ptr=...,
    json_api_data=std::map with 518 elements<error reading variable: Cannot access memory at address 0xfefefeff092d637c>,
Python Exception <class 'IndexError'> list index out of range:
    or_number=-1, where_info=<optimized out>, ttlmap=std::map with 4 elements, batch=0, total_batches=16)
    at controller/src/query_engine/query.cc:977
#10 0x00000000004eb334 in QueryEngine::QueryPrepare (this=0x2a526e0, qp=..., chunk_size=std::vector of length 0, capacity 0,
    need_merge=@0x7ffcfbd65840: true, map_output=@0x7ffcfbd6584a: true, where="", wterms=@0x7ffcfbd6584c: 1, select="", post="",
    time_period=@0x7ffcfbd65a40: 8118339, table="") at controller/src/query_engine/query.cc:1193
#11 0x000000000049f38c in QEOpServerProxy::QEOpServerImpl::StartPipeline (this=this@entry=0x2a4d200,
    qid="af3b2a28-5769-11e7-9739-0000ac115a02") at controller/src/query_engine/QEOpServerProxy.cc:833
#12 0x00000000004a1781 in QEOpServerProxy::QEOpServerImpl::CallbackProcess (this=0x2a4d200, cnum=<optimized out>,
    c=<optimized out>, r=<optimized out>, privdata=<optimized out>) at controller/src/query_engine/QEOpServerProxy.cc:998
#13 0x000000000055dfba in operator() (a2=0x0, a1=0x2a7c7e0, a0=0x2a52ec0, this=0x7ffcfbd66490)
    at /usr/include/boost/function/function_template.hpp:767
#14 RedisAsyncConnection::RAC_AsyncCmdCallback (c=0x2a52ec0, r=0x2a7c7e0, privdata=0x0)
    at controller/src/analytics/redis_connection.cc:239
#15 0x00000000006ffe43 in __redisRunCallback (cb=0x7ffcfbd665b0, cb=0x7ffcfbd665b0, reply=<optimized out>, ac=0x2a52ec0)
    at build/third_party/hiredis/src/async.c:219
#16 redisProcessCallbacks (ac=0x2a52ec0) at build/third_party/hiredis/src/async.c:417
#17 0x0000000000701269 in redisBoostClient::handle_read (this=0x2a5b020, ec=...)
    at build/third_party/hiredis/hiredis-boostasio-adapter/boostasio.cpp:62
#18 0x0000000000701c34 in call<boost::shared_ptr<redisBoostClient>, boost::system::error_code> (b1=<synthetic pointer>, u=...,
    this=<optimized out>) at /usr/include/boost/bind/mem_fn_template.hpp:156
#19 operator()<boost::shared_ptr<redisBoostClient> > (a1=..., u=..., this=<optimized out>)
    at /usr/include/boost/bind/mem_fn_template.hpp:171
#20 operator()<boost::_mfi::mf1<void, redisBoostClient, boost::system::error_code>, boost::_bi::list2<const boost::system::error_code &, long unsigned int const&> > (a=<synthetic pointer>, f=..., this=<optimized out>) at /usr/include/boost/bind/bind.hpp:313
#21 operator()<boost::system::error_code, long unsigned int> (a2=<optimized out>, a1=..., this=<optimized out>)
    at /usr/include/boost/bind/bind_template.hpp:102
#22 operator() (this=<optimized out>) at /usr/include/boost/asio/detail/bind_handler.hpp:127
#23 asio_handler_invoke<boost::asio::detail::binder2<boost::_bi::bind_t<void, boost::_mfi::mf1<void, redisBoostClient, boost::system::error_code>, boost::_bi::list2<boost::_bi::value<boost::shared_ptr<redisBoostClient> >, boost::arg<1> (*)()> >, boost::system::error_code, unsigned long> > (function=...) at /usr/include/boost/asio/handler_invoke_hook.hpp:64
#24 invoke<boost::asio::detail::binder2<boost::_bi::bind_t<void, boost::_mfi::mf1<void, redisBoostClient, boost::system::error_code>, boost::_bi::list2<boost::_bi::value<boost::shared_ptr<redisBoostClient> >, boost::arg<1> (*)()> >, boost::system::error_code, unsigned long>, boost::_bi::bind_t<void, boost::_mfi::mf1<void, redisBoostClient, boost::system::error_code>, boost::_bi::list2<boost::_bi::value<boost::shared_ptr<redisBoostClient> >, boost::arg<1> (*)()> > > (context=..., function=...)
    at /usr/include/boost/asio/detail/handler_invoke_helpers.hpp:37

#25 boost::asio::detail::reactive_null_buffers_op<boost::_bi::bind_t<void, boost::_mfi::mf1<void, redisBoostClient, boost::system::error_code>, boost::_bi::list2<boost::_bi::value<boost::shared_ptr<redisBoostClient> >, boost::arg<1> (*)()> > >::do_complete (
    owner=<optimized out>, base=<optimized out>) at /usr/include/boost/asio/detail/reactive_null_buffers_op.hpp:75
#26 0x00000000005cafef in complete (bytes_transferred=0, ec=..., owner=..., this=<optimized out>)
    at /usr/include/boost/asio/detail/task_io_service_operation.hpp:37
#27 boost::asio::detail::epoll_reactor::descriptor_state::do_complete (owner=0x29de3d0, base=0x2a4e0b0, ec=...,
    bytes_transferred=<optimized out>) at /usr/include/boost/asio/detail/impl/epoll_reactor.ipp:651
#28 0x00000000005cc7d7 in complete (bytes_transferred=5, ec=..., owner=..., this=0x2a4e0b0)
    at /usr/include/boost/asio/detail/task_io_service_operation.hpp:37
#29 do_run_one (ec=..., this_thread=..., lock=..., this=0x29de3d0) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:384
#30 boost::asio::detail::task_io_service::run (this=0x29de3d0, ec=...)
    at /usr/include/boost/asio/detail/impl/task_io_service.ipp:153
#31 0x00000000005dad81 in run (this=0x7ffcfbd66f20, ec=...) at /usr/include/boost/asio/impl/io_service.ipp:66
#32 EventManager::Run (this=this@entry=0x7ffcfbd66f20) at controller/src/io/event_manager.cc:35
#33 0x000000000041e8c9 in main (argc=<optimized out>, argv=<optimized out>) at controller/src/query_engine/qed.cc:350

core is stored in /auto/cores/#

host1 ='root@10.87.121.77'
host2 ='root@10.87.121.78'
host3 ='root@10.87.121.79'
host4 ='root@10.87.121.80'
host5 ='root@10.87.121.81'
host6 ='root@10.87.121.82'
host7 ='root@10.87.121.83'
host8 ='root@10.87.121.84'
host9 ='root@10.87.121.85'
host10 ='root@10.87.121.86'

host11 ='root@10.87.121.70'
host12 ='root@10.87.121.71'
host13 ='root@10.87.121.72'
host14 ='root@10.87.121.73'
host15 ='root@10.87.121.74'
host16 ='root@10.87.121.75'
host17 ='root@10.87.121.76'

env.roledefs = {
    'all': [host1,host2,host3,host4,host5,host6,host7,host8,host9,host10,host11,host12,host13,host14,host15,host16,host17],
    'cfgm': [host1,host4,host7],
    'openstack': [host10],
    'webui': [host1,host4,host7],
    'control': [host1,host4,host7],
    'compute': [host11,host12,host13,host14,host15,host16,host17],
    'tsn': [host11,host12,host13,host14],
    'toragent': [host11,host12,host13,host14],
    'collector': [host2,host5,host8],
    'database': [host3,host6,host9],
    'build': [host_build],
}

Anish Mehta (amehta00) wrote :

Manish: Is this readily reproducible?

Arvind (arvindv) wrote :

Can you also change the access permissions on the core file? I am unable to copy it.

Jeba Paulaiyan (jebap) wrote :

Changed the permissions on the core file.

Arvind (arvindv) wrote :

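In the JSON that gets passed to the QE, the key "filter" has an invalid value: the inner list contains a null element in addition to the filter object. PostProcessingQuery then looks up "name" on that element (frames #4 to #7), which trips the rapidjson assertion. Since the QE is being handed a bad value, the fix is to handle this error condition.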

(gdb) f 7
#7 PostProcessingQuery::PostProcessingQuery (this=0x1bbadc0, json_api_data=std::map with 10 elements = {...},
    main_query=0x1bbbdf0) at controller/src/query_engine/query.cc:242
242 controller/src/query_engine/query.cc: No such file or directory.
(gdb) p json_api_data
$1 = std::map with 10 elements = {["end_time"] = "\"now\"", ["enqueue_time"] = "1498147208691206",
  ["filter"] = "[[{\"name\": \"Type\", \"value\": \"1\", \"op\": 1}, null]]", ["limit"] = "10",
  ["query_metadata"] = "{\"enqueue_time\": 1498147208691036}",
  ["select_fields"] = "[\"MessageTS\", \"Type\", \"Source\", \"ModuleId\", \"Messagetype\", \"Xmlmessage\", \"Level\", \"Category\"]", ["sort"] = "2", ["sort_fields"] = "[\"MessageTS\"]", ["start_time"] = "\"now-10m\"",
  ["table"] = "\"MessageTable\""}
(gdb) f 11
#11 0x000000000049f38c in QEOpServerProxy::QEOpServerImpl::StartPipeline (this=this@entry=0x1b83740,
    qid="dd74c670-5763-11e7-8e23-0000ac115a08") at controller/src/query_engine/QEOpServerProxy.cc:833
833 controller/src/query_engine/QEOpServerProxy.cc: No such file or directory.
(gdb) p qp
$2 = {qid = "dd74c670-5763-11e7-8e23-0000ac115a08", terms = std::map with 10 elements = {["end_time"] = "\"now\"",
    ["enqueue_time"] = "1498147208691206",
    ["filter"] = "[[{\"name\": \"Type\", \"value\": \"1\", \"op\": 1}, null]]", ["limit"] = "10",
    ["query_metadata"] = "{\"enqueue_time\": 1498147208691036}",
    ["select_fields"] = "[\"MessageTS\", \"Type\", \"Source\", \"ModuleId\", \"Messagetype\", \"Xmlmessage\", \"Level\", \"Category\"]", ["sort"] = "2", ["sort_fields"] = "[\"MessageTS\"]", ["start_time"] = "\"now-10m\"",
    ["table"] = "\"MessageTable\""}, maxChunks = 16, query_starttm = 1498147208691845}
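
The shape of the "filter" value in the dump above can be checked outside gdb. The following is a minimal Python sketch, not the QE code: it parses the string shown for ["filter"] (with the gdb escaping removed) and reports the type of each term. The helper name describe_filter is hypothetical and exists only for this illustration.

```python
import json

# The "filter" value as printed by gdb in json_api_data, with escapes removed.
FILTER_JSON = '[[{"name": "Type", "value": "1", "op": 1}, null]]'


def describe_filter(filter_json):
    """Return (and_index, term_index, kind) for every term in the filter.

    Hypothetical helper for illustration; it is not part of contrail.
    """
    rows = []
    for i, and_terms in enumerate(json.loads(filter_json)):
        for k, term in enumerate(and_terms):
            # A usable term is a JSON object (a dict once parsed).
            kind = "object" if isinstance(term, dict) else type(term).__name__
            rows.append((i, k, kind))
    return rows


if __name__ == "__main__":
    for row in describe_filter(FILTER_JSON):
        print(row)
```

The second term in the only AND-list is reported as NoneType, that is, JSON null, so it has no "name" member to look up.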

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.1

Review in progress for https://review.opencontrail.org/33188
Submitter: Arvind (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/33188
Committed: http://github.com/Juniper/contrail-controller/commit/959c2801a0018f2e17be95ea9df1b77006a03140
Submitter: Zuul (<email address hidden>)
Branch: R3.1

commit 959c2801a0018f2e17be95ea9df1b77006a03140
Author: arvindvis <email address hidden>
Date: Mon Jun 26 15:02:42 2017 -0700

The json passed to QE has incorrect values populated for the
keys "Filter", so handling that case
Closes-Bug:#1699861

Change-Id: I5ddbce80cb53e536585de6543dde6c01e2147d47

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.2

Review in progress for https://review.opencontrail.org/33203
Submitter: Arvind (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/33325
Submitter: Arvind (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/33325
Committed: http://github.com/Juniper/contrail-controller/commit/0ca2f477c6f708fa75fdccd89e7d69571f002193
Submitter: Zuul (<email address hidden>)
Branch: R3.2

commit 0ca2f477c6f708fa75fdccd89e7d69571f002193
Author: arvindvis <email address hidden>
Date: Thu Jun 29 15:34:59 2017 -0700

The json passed to QE has incorrect values populated for the
keys "Filter", so handling that case
Closes-Bug:#1699861

Change-Id: I2780acfbc70ba484acca99eab3657c07ba3533f7

information type: Proprietary → Public
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.2

Review in progress for https://review.opencontrail.org/33203
Submitter: Arvind (<email address hidden>)

Gavril Ioan Florian (gflorian) wrote :

Hi Arvind,
I was trying to add a comment on the review, but it seems that I am the only one who can see it.
Could you also check whether each element is an object type? I suggest adding one more check, something like this: QE_INVALIDARG_ERROR(json_filter_and[k].IsObject());

Regards,
Gabi Florian
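
As an illustration of the kind of check suggested above, here is a minimal Python sketch that rejects any filter term that is not a JSON object before it is indexed by "name". It is not the committed C++ fix; the names InvalidFilterError and validate_filter are hypothetical, and the expected shape (a list of AND-lists of objects carrying a "name" key) is an assumption drawn from the gdb dump in this report.

```python
import json


class InvalidFilterError(ValueError):
    """Raised when the filter does not have the assumed shape (hypothetical)."""


def validate_filter(filter_json):
    """Parse a filter string and reject any term that is not a JSON object.

    Assumed shape: a list of AND-lists, each holding objects with a "name" key.
    """
    parsed = json.loads(filter_json)
    if not isinstance(parsed, list):
        raise InvalidFilterError("filter must be a list")
    for i, and_terms in enumerate(parsed):
        if not isinstance(and_terms, list):
            raise InvalidFilterError("filter[%d] must be a list" % i)
        for k, term in enumerate(and_terms):
            # Same intent as the suggested IsObject() check: refuse null,
            # numbers, strings, and lists before any member lookup.
            if not isinstance(term, dict):
                raise InvalidFilterError("filter[%d][%d] is not an object" % (i, k))
            if "name" not in term:
                raise InvalidFilterError("filter[%d][%d] has no 'name'" % (i, k))
    return parsed
```

With this check, the filter from the core dump is rejected with an error instead of reaching the member lookup, while a filter without the null element passes through unchanged.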

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/33203
Committed: http://github.com/Juniper/contrail-controller/commit/77ec58d2fa087e6f29d0dd928f69657c43a07312
Submitter: Zuul (<email address hidden>)
Branch: R3.2

commit 77ec58d2fa087e6f29d0dd928f69657c43a07312
Author: arvindvis <email address hidden>
Date: Mon Jun 26 15:02:42 2017 -0700

The json passed to QE has incorrect values populated for the
keys "Filter", so handling that case
Closes-Bug: 1699861

Change-Id: I5ddbce80cb53e536585de6543dde6c01e2147d47

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.2

Review in progress for https://review.opencontrail.org/40431
Submitter: Gavril Ioan Florian (<email address hidden>)
