Contrail-vrouter-agent status times out on both TSN nodes
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Juniper Openstack | Status tracked in Trunk | | |
R3.1 | Fix Committed | Critical | Hari Prasad Killi |
R3.1.1.x | Invalid | Critical | Hari Prasad Killi |
R3.2 | Fix Committed | Critical | Hari Prasad Killi |
R3.2.3.x | In Progress | Critical | Hari Prasad Killi |
R4.0 | Invalid | Critical | Hari Prasad Killi |
Trunk | Invalid | Critical | Hari Prasad Killi |
Bug Description
Hi Team,
Issue: Contrail-
Contrail-version: 3.1.3 Build81
Setup: LAB
Impact: The customer can no longer test in the LAB and asks that this issue be treated as high priority.
This issue occurred on Aug-31 at 00:57 UTC on both TSN nodes in the customer setup.
Please find the output of contrail-status on the TSN node.
contrail-status
root@lab3adp-
== Contrail vRouter ==
supervisor-vrouter: active
~~~ snip ~~~
contrail-
contrail-
Timeout error : HTTPConnectionP
contrail-
contrail-
contrail-
TCP connection:
tcp connection of vrouter-agent
root@lab3adp-
tcp 0 0 0.0.0.0:8085 0.0.0.0:* LISTEN
tcp 205 0 127.0.0.1:8085 127.0.0.1:47110 CLOSE_WAIT
tcp 205 0 127.0.0.1:8085 127.0.0.1:48204 CLOSE_WAIT
tcp 205 0 127.0.0.1:8085 127.0.0.1:47615 CLOSE_WAIT
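For reference, a socket in CLOSE_WAIT means the peer has already closed the connection while the local process has not yet closed its end, and a non-zero Recv-Q means bytes have arrived that the process has not read. The sketch below tallies both from the lines above. It is only an illustration: the sample text is copied from this report, the function name `summarize_sockets` is hypothetical, and treating port 8085 as the port of interest is taken from the output above.

```python
from collections import Counter

# Sample lines copied from the netstat output in this report.
NETSTAT_OUTPUT = """\
tcp 0 0 0.0.0.0:8085 0.0.0.0:* LISTEN
tcp 205 0 127.0.0.1:8085 127.0.0.1:47110 CLOSE_WAIT
tcp 205 0 127.0.0.1:8085 127.0.0.1:48204 CLOSE_WAIT
tcp 205 0 127.0.0.1:8085 127.0.0.1:47615 CLOSE_WAIT
"""


def summarize_sockets(text, port=8085):
    """Count socket states on a local port and sum unread bytes in CLOSE_WAIT."""
    states = Counter()
    unread = 0
    for line in text.splitlines():
        fields = line.split()
        # Expected columns: proto, Recv-Q, Send-Q, local, remote, state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        recv_q, local, state = fields[1], fields[3], fields[5]
        if not local.endswith(f":{port}"):
            continue
        states[state] += 1
        if state == "CLOSE_WAIT":
            unread += int(recv_q)
    return states, unread


states, unread = summarize_sockets(NETSTAT_OUTPUT)
print(dict(states), unread)
```

Three connections with unread data in CLOSE_WAIT would be consistent with the agent no longer servicing requests on that port, which may explain the timeout reported by contrail-status, but this is an inference and not a confirmed cause.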
There is no update after 16:07 UTC in the contrail-
2017-08-31 Thu 16:07:33:948.952 UTC lab3adp-00004nn [Thread 139830359521024, Pid 29146]: XMPP [SYS_NOTICE]: XmppEventLog: Mode Client: Event: Tcp Connected peer ip: 10.1.135.96 ( <email address hidden> ) controller/
2017-08-31 Thu 16:07:33:952.634 UTC lab3adp-00004nn [Thread 139830346925824, Pid 29146]: XMPP [SYS_NOTICE]: XmppEventLog: Mode Client: Event: Tcp Connected peer ip: 10.1.135.97 ( <email address hidden> ) controller/
2017-08-31 Thu 16:07:33:956.338 UTC lab3adp-00004nn [Thread 139830372116224, Pid 29146]: XMPP [SYS_NOTICE]: XmppEventLog: Mode Client: Event: Tcp Connected peer ip: 10.1.135.98 ( <email address hidden> ) controller/
2017-08-31 Thu 16:07:33:959.951 UTC lab3adp-00004nn [Thread 139830367917824, Pid 29146]: XMPP [SYS_NOTICE]: XmppEventLog: Mode Client: Event: Tcp Connected peer ip: 10.1.135.97 ( <email address hidden> ) controller/
2017-08-31 Thu 16:17:03:901.242 UTC lab3adp-00004nn [Thread 139830359521024, Pid 29146]: DiscoveryClient [SYS_NOTICE]: DiscoveryClient
During this time, they also observed loss of BUM traffic. To recover from this issue, they restarted supervisor-config (Aug-31 00:57 UTC). Core files were generated for the tor-agent at the same time, and after that vrouter core files were generated frequently. The output is below.
root@lab3adp-
-rw------- 1 root root 2058956800 Aug 31 00:57 core.contrail-
-rw------- 1 root root 1850613760 Aug 31 02:33 core.contrail-
-rw------- 1 root root 1766981632 Aug 31 02:54 core.contrail-
-rw------- 1 root root 1847115776 Aug 31 04:52 core.contrail-
-rw------- 1 root root 1787453440 Aug 31 06:53 core.contrail-
-rw------- 1 root root 1743376384 Aug 31 07:14 core.contrail-
-rw------- 1 root root 1854148608 Aug 31 08:37 core.contrail-
-rw------- 1 root root 1752174592 Aug 31 09:20 core.contrail-
-rw------- 1 root root 1770536960 Aug 31 10:11 core.contrail-
-rw------- 1 root root 1739558912 Aug 31 11:57 core.contrail-
-rw------- 1 root root 1805520896 Aug 31 13:19 core.contrail-
-rw------- 1 root root 1739997184 Aug 31 13:38 core.contrail-
-rw------- 1 root root 1770528768 Aug 31 15:05 core.contrail-
-rw------- 1 root root 1751678976 Aug 31 16:07 core.contrail-
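To put a number on "frequently", the gaps between consecutive core files can be computed from the timestamps in the listing. This is a minimal sketch: the times are copied by hand from the listing above, all are assumed to fall on Aug 31, and `gaps_in_minutes` is a hypothetical helper name.

```python
from datetime import datetime

# Modification times copied from the core file listing (all on Aug 31).
CORE_TIMES = [
    "00:57", "02:33", "02:54", "04:52", "06:53", "07:14", "08:37",
    "09:20", "10:11", "11:57", "13:19", "13:38", "15:05", "16:07",
]


def gaps_in_minutes(times):
    """Return the minutes elapsed between each pair of consecutive times."""
    parsed = [datetime.strptime(t, "%H:%M") for t in times]
    return [int((b - a).total_seconds() // 60) for a, b in zip(parsed, parsed[1:])]


gaps = gaps_in_minutes(CORE_TIMES)
print("core files:", len(CORE_TIMES))
print("shortest gap (min):", min(gaps))
print("longest gap (min):", max(gaps))
print("mean gap (min):", sum(gaps) / len(gaps))
```

The listing covers 14 core files over roughly 15 hours, so the process appears to crash about once every one to two hours, sometimes within about 20 minutes of the previous crash.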
The customer has shared the backtraces from the first, second, and third occurrences of the issue.
1. First time
root@lab3adp-
GNU gdb (Ubuntu 7.7-0ubuntu3) 7.7
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://
Find the GDB manual and other documentation resources online at:
<http://
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from contrail-
warning: core file may not match specified executable file.
[New LWP 7074]
[New LWP 7078]
[New LWP 7076]
[New LWP 13887]
[New LWP 7077]
[New LWP 7058]
[New LWP 7081]
[New LWP 7075]
[New LWP 7592]
[New LWP 13888]
[New LWP 7080]
[New LWP 7591]
[New LWP 7079]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_
Core was generated by `/usr/bin/
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000e37b66 in VrfExport:
(gdb) bt
#0 0x0000000000e37b66 in VrfExport:
#1 0x0000000000e362d8 in ControllerRoute
#2 0x00000000011d28bf in DBTableWalker:
#3 0x00000000013039e7 in TaskImpl::execute() ()
#4 0x00007f68666b1b3a in ?? () from /usr/lib/
#5 0x00007f68666ad816 in ?? () from /usr/lib/
#6 0x00007f68666acf4b in ?? () from /usr/lib/
#7 0x00007f68666a90ff in ?? () from /usr/lib/
#8 0x00007f68666a92f9 in ?? () from /usr/lib/
#9 0x00007f68668cd182 in start_thread () from /lib/x86_
#10 0x00007f6865ba647d in clone () from /lib/x86_
(gdb) quit
2. Second time
root@lab3adp-
GNU gdb (Ubuntu 7.7-0ubuntu3) 7.7
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://
Find the GDB manual and other documentation resources online at:
<http://
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from contrail-
warning: core file may not match specified executable file.
[New LWP 22106]
[New LWP 22100]
[New LWP 22103]
[New LWP 22616]
[New LWP 22102]
[New LWP 22615]
[New LWP 22105]
[New LWP 22083]
[New LWP 22107]
[New LWP 22104]
[New LWP 22101]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_
Core was generated by `/usr/bin/
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f33b4e49cc9 in raise () from /lib/x86_
(gdb) bt
#0 0x00007f33b4e49cc9 in raise () from /lib/x86_
#1 0x00007f33b4e4d0d8 in abort () from /lib/x86_
#2 0x00007f33b4e42b86 in ?? () from /lib/x86_
#3 0x00007f33b4e42c32 in __assert_fail () from /lib/x86_
#4 0x00007f33b5c36510 in pthread_mutex_lock () from /lib/x86_
#5 0x0000000000d470f3 in VnUveEntry:
#6 0x0000000000d36033 in VnUveTable:
#7 0x0000000000c71cb3 in FlowStatsCollec
#8 0x0000000000c746b3 in FlowStatsCollec
#9 0x0000000000c74a6e in FlowStatsCollec
#10 0x0000000000c775bc in boost::
#11 0x0000000000c7c614 in QueueTaskRunner
#12 0x00000000013039e7 in TaskImpl::execute() ()
#13 0x00007f33b5a18b3a in ?? () from /usr/lib/
#14 0x00007f33b5a14816 in ?? () from /usr
3. Third time
root@lab3adp-
GNU gdb (Ubuntu 7.7-0ubuntu3) 7.7
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://
Find the GDB manual and other documentation resources online at:
<http://
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from contrail-
warning: core file may not match specified executable file.
[New LWP 33716]
[New LWP 33719]
[New LWP 33696]
[New LWP 33717]
[New LWP 33807]
[New LWP 33808]
[New LWP 33718]
[New LWP 33713]
[New LWP 33715]
[New LWP 33720]
[New LWP 33714]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_
Core was generated by `/usr/bin/
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f85443f1cc9 in raise () from /lib/x86_
(gdb) bt
#0 0x00007f85443f1cc9 in raise () from /lib/x86_
#1 0x00007f85443f50d8 in abort () from /lib/x86_
#2 0x00007f85443eab86 in ?? () from /lib/x86_
#3 0x00007f85443eac32 in __assert_fail () from /lib/x86_
#4 0x00007f85451de510 in pthread_mutex_lock () from /lib/x86_
#5 0x0000000000d470f3 in VnUveEntry:
#6 0x0000000000d36033 in VnUveTable:
#7 0x0000000000c71cb3 in FlowStatsCollec
#8 0x0000000000c746b3 in FlowStatsCollec
#9 0x0000000000c74a6e in FlowStatsCollec
#10 0x0000000000c775bc in boost::
#11 0x0000000000c7c614 in QueueTaskRunner
#12 0x00000000013039e7 in TaskImpl::execute() ()
#13 0x00007f8544fc0b3a in ?? () from /usr/lib/
#14 0x00007f8544fbc816 in ?? () from /usr
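The three backtraces can be compared by symbol name while ignoring addresses, which differ between runs. The sketch below does this for the top frames. The frame text is copied from the traces above, including the truncated symbol names exactly as they appear in this report, and the helper name `signature` is hypothetical.

```python
import re

# Matches gdb frames such as "#5 0x0000000000d470f3 in VnUveEntry:"
FRAME_RE = re.compile(r"^#\d+\s+0x[0-9a-f]+ in (\S+)")

FIRST = """\
#0 0x0000000000e37b66 in VrfExport:
#1 0x0000000000e362d8 in ControllerRoute
#2 0x00000000011d28bf in DBTableWalker:
#3 0x00000000013039e7 in TaskImpl::execute() ()
"""

SECOND = """\
#0 0x00007f33b4e49cc9 in raise () from /lib/x86_
#1 0x00007f33b4e4d0d8 in abort () from /lib/x86_
#2 0x00007f33b4e42b86 in ?? () from /lib/x86_
#3 0x00007f33b4e42c32 in __assert_fail () from /lib/x86_
#4 0x00007f33b5c36510 in pthread_mutex_lock () from /lib/x86_
#5 0x0000000000d470f3 in VnUveEntry:
#6 0x0000000000d36033 in VnUveTable:
#7 0x0000000000c71cb3 in FlowStatsCollec
"""

THIRD = """\
#0 0x00007f85443f1cc9 in raise () from /lib/x86_
#1 0x00007f85443f50d8 in abort () from /lib/x86_
#2 0x00007f85443eab86 in ?? () from /lib/x86_
#3 0x00007f85443eac32 in __assert_fail () from /lib/x86_
#4 0x00007f85451de510 in pthread_mutex_lock () from /lib/x86_
#5 0x0000000000d470f3 in VnUveEntry:
#6 0x0000000000d36033 in VnUveTable:
#7 0x0000000000c71cb3 in FlowStatsCollec
"""


def signature(backtrace, depth=8):
    """Return the first `depth` symbol names of a backtrace, ignoring addresses."""
    symbols = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            symbols.append(match.group(1))
    return tuple(symbols[:depth])


print("second == third:", signature(SECOND) == signature(THIRD))
print("first == second:", signature(FIRST) == signature(SECOND))
```

By this comparison the second and third crashes share one signature (SIGABRT from an assertion inside pthread_mutex_lock, reached through VnUveEntry and the flow stats collector), while the first crash (SIGSEGV in VrfExport) is different. This suggests at least two distinct crash paths, though whether they share a single underlying cause is not established here.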
I have also researched this and found a related bug: https:/
The customer's expectations are as follows:
1. What is the proper way to recover from this issue?
2. What is the root cause why both contrail-
The logs and core files are located at the following location:
IP:10.219.48.123, root/Jtaclab123
Path: /home/mehul/
-Regards,
Mehul Patel
Changed in juniperopenstack:
importance: Undecided → Critical
tags: added: vrouter
information type: Proprietary → Public
tags: added: nttc
Hi Team,
This issue occurred while a migration was in progress (that is, config sync from the V1 controller to the V2 controller).
According to the customer, CRUD operations were being carried out in parallel.
-Regards,
Mehul Patel