k8s: observing agent core while running k8s sanity

Bug #1798371 reported by Venkatesh Velpula
This bug affects 1 person
Affects            Status                   Importance  Assigned to           Milestone
Juniper Openstack  Status tracked in Trunk
  R5.0             New                      High        Sachchidanand Vaidya
  Trunk            New                      High        Sachchidanand Vaidya

Bug Description

I hit this core only once on my setup and could not reproduce it with the same test.

The test creates two namespaces, one of them custom isolated, spawns a few pods in each namespace, and performs reachability checks within each namespace and across namespaces.
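The scenario can be outlined roughly as follows. This is only a sketch, not the actual sanity test: the helper names `namespace_manifest` and `reachability_pairs` are hypothetical, the namespace and pod names are copied from the `kubectl` output later in this report, and the annotation is the one shown by `kubectl describe ns`. The expected result of each check is not stated in this report, so the sketch only enumerates the pairs.

```python
from itertools import permutations

# Annotation value copied from the `kubectl describe ns` output in this report.
CUSTOM_NETWORK = {
    "project": "k8s-default",
    "domain": "default-domain",
    "name": "TestVNNamespace",
}

# Pods observed in each namespace (from `kubectl get pods` in this report).
PODS_BY_NS = {
    "ctest-ns1-62653991": [
        "ctest-busybox-pod-13860984", "ctest-busybox-pod-62655884",
        "ctest-busybox-pod-66339756", "ctest-nginx-pod-10597171",
        "ctest-nginx-pod-95616880",
    ],
    "ctest-ns2-83384090": [
        "ctest-busybox-pod-35297384", "ctest-busybox-pod-70715253",
        "ctest-nginx-pod-22667117", "ctest-nginx-pod-57507451",
    ],
}


def namespace_manifest(name, network=None):
    """Build a Namespace manifest; attach the network annotation if given."""
    metadata = {"name": name}
    if network is not None:
        metadata["annotations"] = {"opencontrail.org/network": str(network)}
    return {"apiVersion": "v1", "kind": "Namespace", "metadata": metadata}


def reachability_pairs(pods_by_ns):
    """Yield (src, dst, same_namespace) for every ordered pair of pods."""
    all_pods = [(ns, pod) for ns, pods in pods_by_ns.items() for pod in pods]
    for (src_ns, src), (dst_ns, dst) in permutations(all_pods, 2):
        yield (src_ns, src), (dst_ns, dst), src_ns == dst_ns


# ns1 has no annotation; ns2 carries the custom network annotation.
NAMESPACES = [
    namespace_manifest("ctest-ns1-62653991"),
    namespace_manifest("ctest-ns2-83384090", CUSTOM_NETWORK),
]
PAIRS = list(reachability_pairs(PODS_BY_NS))
```

Each entry of `PAIRS` names one source pod, one destination pod, and whether the two share a namespace, which separates the within-namespace checks from the cross-namespace ones.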
The core file, binary, and symbols are kept at nodem4:/cs-shared/bugs/1798371/
[root@nodem4 1798371]# ls -ltrh
total 511M
-rwxr-xr-x 1 fedora fedora 25M Oct 17 2018 contrail-vrouter-agent
-r--r--r-- 1 fedora fedora 324M Oct 17 2018 contrail-vrouter-agent.debug
-rw------- 1 fedora fedora 163M Oct 17 2018 core.contrail-vroute.5010.nodec61.1539766364
[root@nodem4 1798371]# pwd
/cs-shared/bugs/1798371
[root@nodem4 1798371]#

Orchestrator: Kubernetes
Host OS: CentOS 7.5
SKU: queens
Build: 5.0-291
Deployer: contrail-ansible-deployer
========================================
Topology
========================================
vrouter + k8s_node:

      ip: nodec60
      ip: nodec61

config + control + kubemanager:

      ip: nodeg12(k8s_master)
      ip: nodeg31
      ip: nodec58
========================================
backtrace
===============================================================================================
[root@nodec61 crashes]# gdb contrail-vrouter-agent core.contrail-vroute.5010.nodec61.1539766364
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /var/crashes/contrail-vrouter-agent...Reading symbols from /var/crashes/contrail-vrouter-agent.debug...done.
done.

warning: core file may not match specified executable file.
[New LWP 5065]
[New LWP 5061]
[New LWP 5068]
[New LWP 5066]
[New LWP 5010]
[New LWP 5064]
[New LWP 5063]
[New LWP 5062]
[New LWP 5067]

warning: Could not load shared library symbols for 14 libraries, e.g. /lib64/libtcmalloc.so.4.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-vrouter-agent'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007fb39ee46350 in std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) () from /lib64/libstdc++.so.6
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.26-23.el7.x86_64 glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64 libcurl-7.29.0-46.el7.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64 libidn-1.28-4.el7.x86_64 libselinux-2.5-12.el7.x86_64 libssh2-1.4.3-10.el7_2.1.x86_64 libstdc++-4.8.5-28.el7_5.1.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64 nspr-4.19.0-1.el7_5.x86_64 nss-3.36.0-7.el7_5.x86_64 nss-softokn-freebl-3.36.0-5.el7_5.x86_64 nss-util-3.36.0-1.el7_5.x86_64 openldap-2.4.44-15.el7_5.x86_64 openssl-libs-1.0.2k-12.el7.x86_64 pcre-8.32-17.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 0x00007fb39ee46350 in std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) () from /lib64/libstdc++.so.6
#1 0x0000000000ec685d in _M_insert_ (__v=..., __p=0xa2ccf00, __x=0x0, this=0xa2a3808) at /usr/include/c++/4.8.2/bits/stl_tree.h:1025
#2 std::_Rb_tree<boost::intrusive_ptr<DBTableWalk>, boost::intrusive_ptr<DBTableWalk>, std::_Identity<boost::intrusive_ptr<DBTableWalk> >, std::less<boost::intrusive_ptr<DBTableWalk> >, std::allocator<boost::intrusive_ptr<DBTableWalk> > >::_M_insert_unique (this=0xa2a3808, __v=...) at /usr/include/c++/4.8.2/bits/stl_tree.h:1382
#3 0x0000000000ec619b in insert (__x=..., this=<optimized out>) at /usr/include/c++/4.8.2/bits/stl_set.h:463
#4 AppendWalkReq (ref=..., this=<optimized out>) at controller/src/db/db_table_walk_mgr.h:133
#5 DBTableWalkMgr::WalkTable (this=0x2c9e2c0, walk=...) at controller/src/db/db_table_walk_mgr.cc:104
#6 0x0000000000ec6564 in DBTableWalkMgr::WalkAgain (this=<optimized out>, ref=...) at controller/src/db/db_table_walk_mgr.cc:86
#7 0x0000000000ec00ae in DBTable::WalkAgain (this=this@entry=0x3466d80, walk=...) at controller/src/db/db_table.cc:625
#8 0x0000000000be4132 in AgentSandesh::DoSandeshInternal (this=0xa2b2390, sandesh=..., first=<optimized out>, first@entry=0, last=<optimized out>, last@entry=99)
    at controller/src/vnsw/agent/oper/agent_sandesh.cc:963
#9 0x0000000000be4462 in AgentSandesh::DoSandesh (sandesh=..., first=first@entry=0, last=last@entry=99) at controller/src/vnsw/agent/oper/agent_sandesh.cc:967
#10 0x0000000000be44da in AgentSandesh::DoSandesh (sandesh=...) at controller/src/vnsw/agent/oper/agent_sandesh.cc:971
#11 0x0000000000cf19fd in VmListReq::HandleRequest (this=<optimized out>) at controller/src/vnsw/agent/oper/vm.cc:210
#12 0x0000000000dab46d in Sandesh::ProcessRecv (rsnh=0xa20fc00) at src/contrail-common/sandesh/library/cpp/sandesh.cc:566
#13 0x0000000000dbf8c4 in operator() (a0=0xa20fc00, this=0x7fb397a05af0) at /usr/include/boost/function/function_template.hpp:767
#14 RunQueue (this=0x89abb60) at src/contrail-common/base/queue_task.h:67
#15 QueueTaskRunner<SandeshRequest*, WorkQueue<SandeshRequest*> >::Run (this=0x89abb60) at src/contrail-common/base/queue_task.h:42
#16 0x0000000000e9ad5f in TaskImpl::execute (this=0x89912c0) at src/contrail-common/base/task.cc:281
#17 0x00007fb39f31a66a in ?? ()
#18 0x0000000000000001 in ?? ()
#19 0x0000000000000000 in ?? ()
(gdb) q
===============================================================================================
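To see the agent's own call chain without the libstdc++ and Boost frames, the backtrace can be reduced to the frames that carry a source location under `controller/src/`. The following is a minimal sketch, assuming gdb frame lines shaped like the ones above; the regular expression and the helper names `parse_frames` and `project_frames` are illustrative and not part of any Contrail tooling.

```python
import re

# Matches frames that end in "at <file>:<line>", with or without a leading
# "0x... in " address. Frames without source info (e.g. "?? ()") do not match.
FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+(?:0x[0-9a-f]+ in )?(?P<func>.+?) \(.*\) "
    r"at (?P<file>\S+):(?P<line>\d+)$"
)

# A few frames copied from the backtrace in this report.
SAMPLE = """\
#4 AppendWalkReq (ref=..., this=<optimized out>) at controller/src/db/db_table_walk_mgr.h:133
#5 DBTableWalkMgr::WalkTable (this=0x2c9e2c0, walk=...) at controller/src/db/db_table_walk_mgr.cc:104
#8 0x0000000000be4132 in AgentSandesh::DoSandeshInternal (this=0xa2b2390, sandesh=..., first=<optimized out>, first@entry=0, last=<optimized out>, last@entry=99)
    at controller/src/vnsw/agent/oper/agent_sandesh.cc:963
#17 0x00007fb39f31a66a in ?? ()
"""


def parse_frames(backtrace):
    """Return (frame number, function, file, line) for frames with source info."""
    joined = []
    for raw in backtrace.splitlines():
        if raw.startswith("#"):
            joined.append(raw.strip())
        elif joined and raw.strip():
            # gdb wraps long frames; glue the continuation onto its frame.
            joined[-1] += " " + raw.strip()
    frames = []
    for text in joined:
        m = FRAME_RE.match(text)
        if m:
            frames.append((int(m["num"]), m["func"], m["file"], int(m["line"])))
    return frames


def project_frames(frames, prefix="controller/src/"):
    """Keep only frames whose source file lives under the given prefix."""
    return [f for f in frames if f[2].startswith(prefix)]


FRAMES = project_frames(parse_frames(SAMPLE))
```

Applied to the full backtrace, this would leave the path from `VmListReq::HandleRequest` through `AgentSandesh::DoSandeshInternal` and `DBTable::WalkAgain` down to `AppendWalkReq`, which is where the `std::set` insert is made.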

[root@nodec61 crashes]# contrail-status
Pod Service Original Name State Status
vrouter agent contrail-vrouter-agent running Up About an hour
vrouter nodemgr contrail-nodemgr running Up 3 hours

vrouter kernel module is PRESENT
== Contrail vrouter ==
nodemgr: active
agent: active

[root@nodec61 crashes]#
===============================================================================================
[root@nodeg12 ~]#
[root@nodeg12 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
nodec60 Ready <none> 4h v1.9.2
nodec61 Ready <none> 4h v1.9.2
nodeg12 NotReady master 4h v1.9.2
[root@nodeg12 ~]# cntrail
-bash: cntrail: command not found
[root@nodeg12 ~]# contrail-status
Pod Service Original Name State Status
                 redis contrail-external-redis running Up 4 hours
analytics alarm-gen contrail-analytics-alarm-gen running Up 4 hours
analytics api contrail-analytics-api running Up 4 hours
analytics collector contrail-analytics-collector running Up 4 hours
analytics nodemgr contrail-nodemgr running Up 4 hours
analytics query-engine contrail-analytics-query-engine running Up 4 hours
analytics snmp-collector contrail-analytics-snmp-collector running Up 4 hours
analytics topology contrail-analytics-topology running Up 4 hours
config api contrail-controller-config-api running Up 3 hours
config device-manager contrail-controller-config-devicemgr running Up 4 hours
config nodemgr contrail-nodemgr running Up 4 hours
config schema contrail-controller-config-schema running Up 4 hours
config svc-monitor contrail-controller-config-svcmonitor running Up 4 hours
config-database cassandra contrail-external-cassandra running Up 4 hours
config-database nodemgr contrail-nodemgr running Up 4 hours
config-database rabbitmq contrail-external-rabbitmq running Up 4 hours
config-database zookeeper contrail-external-zookeeper running Up 4 hours
control control contrail-controller-control-control running Up 4 hours
control dns contrail-controller-control-dns running Up 4 hours
control named contrail-controller-control-named running Up 4 hours
control nodemgr contrail-nodemgr running Up 4 hours
database cassandra contrail-external-cassandra running Up 4 hours
database kafka contrail-external-kafka running Up 4 hours
database nodemgr contrail-nodemgr running Up 4 hours
database zookeeper contrail-external-zookeeper running Up 4 hours
kubernetes kube-manager contrail-kubernetes-kube-manager running Up 29 minutes
webui job contrail-controller-webui-job running Up 4 hours
webui web contrail-controller-webui-web running Up 4 hours

WARNING: container with original name 'contrail-external-redis' have Pod or Service empty. Pod: '' / Service: 'redis'. Please pass NODE_TYPE with pod name to container's env

== Contrail control ==
control: active
nodemgr: active
named: active
dns: active

== Contrail config-database ==
nodemgr: initializing (Disk for DB is too low. )
zookeeper: active
rabbitmq: active
cassandra: active

== Contrail kubernetes ==
kube-manager: backup

== Contrail database ==
kafka: active
nodemgr: initializing (Disk for DB is too low. )
zookeeper: active
cassandra: active

== Contrail analytics ==
snmp-collector: active
query-engine: active
api: active
alarm-gen: active
nodemgr: active
collector: active
topology: active

== Contrail webui ==
web: active
job: active

== Contrail config ==
svc-monitor: backup
nodemgr: active
device-manager: backup
api: active
schema: backup

[root@nodeg12 ~]#
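The per-role summary at the end of the `contrail-status` output can be scanned for services that are not in a steady state. This is a minimal sketch that parses only the `== Contrail <role> ==` sections as they appear above; the helper names are hypothetical, and treating `backup` as a healthy state alongside `active` is an assumption.

```python
# A fragment of the contrail-status output copied from this report.
SAMPLE = """\
== Contrail config-database ==
nodemgr: initializing (Disk for DB is too low. )
zookeeper: active
rabbitmq: active
cassandra: active

== Contrail kubernetes ==
kube-manager: backup
"""


def parse_status(output):
    """Map each '== Contrail <role> ==' section to {service: state}."""
    sections = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("== ") and line.endswith(" =="):
            current = line[3:-3].replace("Contrail ", "", 1)
            sections[current] = {}
        elif current is not None and ": " in line:
            service, state = line.split(": ", 1)
            sections[current][service] = state
    return sections


def not_steady(sections, steady=("active", "backup")):
    """List (role, service, state) for services outside the steady states.

    Assumption: 'backup' counts as healthy, like 'active'.
    """
    return [
        (role, service, state)
        for role, services in sections.items()
        for service, state in services.items()
        if state not in steady
    ]


STATUS = parse_status(SAMPLE)
PROBLEMS = not_steady(STATUS)
```

On the output above, the only entries this would flag are the two `nodemgr` services reporting "Disk for DB is too low", under config-database and database.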

==============================================================

[root@nodeg12 ~]# kubectl get pods -n ctest-ns1-62653991
NAME READY STATUS RESTARTS AGE
ctest-busybox-pod-13860984 1/1 Running 0 41s
ctest-busybox-pod-62655884 1/1 Running 0 41s
ctest-busybox-pod-66339756 1/1 Running 0 41s
ctest-nginx-pod-10597171 1/1 Running 0 41s
ctest-nginx-pod-95616880 1/1 Running 0 41s
[root@nodeg12 ~]#
[root@nodeg12 ~]# kubectl get pods -n ctest-ns2-83384090
NAME READY STATUS RESTARTS AGE
ctest-busybox-pod-35297384 1/1 Running 0 52s
ctest-busybox-pod-70715253 1/1 Running 0 52s
ctest-nginx-pod-22667117 1/1 Running 0 52s
ctest-nginx-pod-57507451 1/1 Running 0 52s
[root@nodeg12 ~]# kubectl describe ns ctest-ns2-83384090
Name: ctest-ns2-83384090
Labels: <none>
Annotations: opencontrail.org/network={'project': 'k8s-default', 'domain': 'default-domain', 'name': 'TestVNNamespace'}
Status: Active

No resource quota.

No resource limits.
[root@nodeg12 ~]# kubectl describe ns ctest-ns1-62653991
Name: ctest-ns1-62653991
Labels: <none>
Annotations: <none>
Status: Active

No resource quota.

Tags: k8s vrouter
description: updated
Changed in juniperopenstack:
milestone: none → r5.0.3
description: updated