Hi All,

I need your help and expertise debugging the k8s sanity setup, which is in a really bad state. Things have been messier since build 15. I observed multiple problems on the current attempt and am not sure whether they are linked or all different. I have kept the setup in the same state so that you can debug the failures live.

K8s HA setup details:

3 controllers + kube managers:
   10.204.217.52 (nodeg12)
   10.204.217.71 (nodeg31)
   10.204.217.98 (nodec58)

2 agents / k8s slaves:
   10.204.217.100 (nodec60)
   10.204.217.101 (nodec61)

Multi-interface setup.

Following are the key observations:

1. A RabbitMQ cluster formed between nodeg12 and nodeg31, but nodec58 shows rabbitmq as inactive:

   rabbitmq: inactive

   Docker logs for the rabbitmq container on nodec58:

   {"init terminating in do_boot",{error,{inconsistent_cluster,"Node contrail@nodec58 thinks it's clustered with node contrail@nodeg31, but contrail@nodeg31 disagrees"}}}

2. On all 3 controllers, the Cassandra connection was not established for 2 hours after provisioning. The issue seems to flap over time, and sometimes I see the services as active too:

   control: initializing (Database:Cassandra connection down)
   collector: initializing (Database:Cassandra connection down)

3. If I create a k8s Pod, it often results in Pod creation failure, and a vrouter crash happens immediately afterwards. The trace is below. Whether or not the crash happens, Pod creation fails.

4. In the CNI logs on both agents, I see this error:

   I : 24646 : 2018/04/17 17:35:44 vrouter.go:79: VRouter request. Operation : GET Url : http://127.0.0.1:9091/vm/7a271412-4237-11e8-8997-002590c55f6a
   E : 24646 : 2018/04/17 17:35:44 vrouter.go:147: Failed HTTP Get operation. Return code 404
   I : 24646 : 2018/04/17 17:35:44 vrouter.go:181: Iteration 14 : Get vrouter failed
   E : 24633 : 2018/04/17 17:35:49 vrouter.go:287: Error in polling VRouter
   I : 24633 : 2018/04/17 17:35:49 cni.go:175: Error in Add to VRouter
   E : 24633 : 2018/04/17 17:35:49 contrail-kube-cni.go:67: Failed processing Add command.
   E : 24646 : 2018/04/17 17:35:49 vrouter.go:287: Error in polling VRouter
   I : 24646 : 2018/04/17 17:35:49 cni.go:175: Error in Add to VRouter
   E : 24646 : 2018/04/17 17:35:49 contrail-kube-cni.go:67: Failed processing Add command.

NOTE: Most of the issues are observed on the k8s HA multi-interface setup. Things are better with the non-HA / single-interface setup.

Agent crash trace:

(gdb) bt full
#0  0x00007fb9817761f7 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fb9817778e8 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007fb98176f266 in __assert_fail_base () from /lib64/libc.so.6
No symbol table info available.
#3  0x00007fb98176f312 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4  0x0000000000c15440 in AgentOperDBTable::ConfigEventHandler(IFMapNode*, DBEntry*) ()
No symbol table info available.
#5  0x0000000000c41714 in IFMapDependencyManager::ProcessChangeList() ()
No symbol table info available.
#6  0x0000000000ea4a57 in TaskTrigger::WorkerTask::Run() ()
No symbol table info available.
#7  0x0000000000e9e64f in TaskImpl::execute() ()
No symbol table info available.
#8  0x00007fb9823458ca in tbb::internal::custom_scheduler::local_wait_for_all(tbb::task&, tbb::task*) () from /lib64/libtbb.so.2
No symbol table info available.
#9  0x00007fb9823415b6 in tbb::internal::arena::process(tbb::internal::generic_scheduler&) () from /lib64/libtbb.so.2
No symbol table info available.
#10 0x00007fb982340c8b in tbb::internal::market::process(rml::job&) () from /lib64/libtbb.so.2
No symbol table info available.
#11 0x00007fb98233e67f in tbb::internal::rml::private_worker::run() () from /lib64/libtbb.so.2
No symbol table info available.
#12 0x00007fb98233e879 in tbb::internal::rml::private_worker::thread_routine(void*) () from /lib64/libtbb.so.2
No symbol table info available.
#13 0x00007fb982560e25 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#14 0x00007fb98183934d in clone () from /lib64/libc.so.6

Thanks!
Pulkit Tandon