Agent crashed in TraceBuffer<SandeshTrace>::TraceWrite(SandeshTrace*) ()

Bug #1629012 reported by Vinod Nair
This bug affects 1 person
Affects             Status          Importance   Assigned to         Milestone
Juniper Openstack   (status tracked in Trunk)
  R3.0              Fix Committed   High         Hari Prasad Killi
  R3.0.2.x          Fix Committed   High         Hari Prasad Killi
  R3.0.3.x          Fix Committed   High         Hari Prasad Killi
  R3.1              Fix Committed   High         Hari Prasad Killi
  R3.2              Fix Committed   High         Hari Prasad Killi
  Trunk             Fix Committed   High         Hari Prasad Killi

Bug Description

On a node running 3.0.3B 65, the agent crashes in TraceBuffer<SandeshTrace>::TraceWrite(SandeshTrace*) ().

The backtrace is below:

Program terminated with signal SIGABRT, Aborted.
#0 0x00007fad5c0edcc9 in __GI___libc_sigaction (sig=28285, act=0x2416, oact=0x6) at ../sysdeps/unix/sysv/linux/x86_64/sigaction.c:53
53 ../sysdeps/unix/sysv/linux/x86_64/sigaction.c: No such file or directory.
(gdb) bt
#0 0x00007fad5c0edcc9 in __GI___libc_sigaction (sig=28285, act=0x2416, oact=0x6) at ../sysdeps/unix/sysv/linux/x86_64/sigaction.c:53
#1 0x00000000008781e7 in TraceBuffer<SandeshTrace>::TraceWrite(SandeshTrace*) ()
#2 0x00007fad5c233d1c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x0000000001215775 in ?? ()
#4 0x00007fa5eebf9880 in ?? ()
#5 0x0000000000000000 in ?? ()
(gdb)

Cores in /cs-shared/bugs/

new cores:
ls -ltr
total 41695684
-rwxrwxrwx 1 naveenn hurricane 37070767095 Sep 29 10:25 core.contrail-vroute.28285.cs-scale-7.1474956941.gz
-rwxrwxrwx 1 vinodnair slt 2303885312 Nov 15 23:47 core.contrail-vroute.15454.cs-scale-6.1479282080
-rwxrwxrwx 1 vinodnair slt 1676009472 Nov 15 23:47 core.contrail-vroute.15510.cs-scale-6.1479282097
-rwxrwxrwx 1 vinodnair slt 1478246400 Nov 15 23:47 core.contrail-vroute.35527.cs-scale-6.1479280297

Tags: vrouter
Vinod Nair (vinodnair)
Changed in juniperopenstack:
importance: Undecided → High
Hari Prasad Killi (haripk) wrote :

controller/src/vnsw/agent/oper/vrf.cc:348: bool VrfEntry::DeleteTimeout(): Assertion `0' failed.

flow_table_entries_count_ = 629760
The flow entries are marked for deletion but have not been deleted. Need to check whether dpdk-vrouter was slow to delete them.

Flow table limit is set to 80M.
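
The failure mode described in this comment can be sketched as a simple predicate. This is only an illustrative model, not the agent's code: `PendingVrfDelete`, `DeleteWatchdogWouldFire`, and the timeout value used with it are hypothetical names and numbers. The only facts taken from this report are that `VrfEntry::DeleteTimeout()` asserts when it runs and that 629760 flow entries were still pending.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for a VRF whose delete has been triggered.
struct PendingVrfDelete {
    std::uint64_t delete_marked_flows;  // flows marked for delete but not yet freed
    std::chrono::seconds waited;        // time elapsed since the delete trigger
};

// Returns true when a delete watchdog would fire: flow entries still
// reference the VRF once the timeout has elapsed. In the agent, reaching
// VrfEntry::DeleteTimeout() ends in an assert instead of a return value.
bool DeleteWatchdogWouldFire(const PendingVrfDelete &vrf,
                             std::chrono::seconds timeout) {
    return vrf.delete_marked_flows > 0 && vrf.waited >= timeout;
}
```

With the count from the core (629760 pending entries) the predicate is true for any timeout that has already elapsed, which matches the reported assert. Whether the entries were leaked or merely slow to drain cannot be told from this model.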

Jeba Paulaiyan (jebap) wrote :

Removing 3.0.3.1 milestone as per following comment from Hari:

Couldn’t find any issue checking the core. The flow table seems to be set to 80M! Not sure how many flows were created, but the VRF delete trigger is seen and the flow entries weren’t cleaned up (about 600K were still pending). This results in the VRF delete timeout assert. We have to recreate and check if the deletion was slow.

Vinod Nair (vinodnair) wrote :

Did not see the original issue, but saw the crash below once.
Core is in /cs-shared/bugs/1629012/core.contrail-vroute.35527.cs-scale-6.1479280297

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-vrouter-agent'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f6c27995c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007f6c27995c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f6c27999028 in __GI_abort () at abort.c:89
#2 0x00007f6c2798ebf6 in __assert_fail_base (fmt=0x7f6c27adf3b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x12030f8 "req.oper == DBRequest::DB_ENTRY_DELETE", file=file@entry=0x12030c8 "controller/src/vnsw/agent/oper/oper_db.h", line=line@entry=131,
    function=function@entry=0x1203c80 "virtual void AgentOperDBTable::ConfigEventHandler(IFMapNode*, DBEntry*)") at assert.c:92
#3 0x00007f6c2798eca2 in __GI___assert_fail (assertion=0x12030f8 "req.oper == DBRequest::DB_ENTRY_DELETE", file=0x12030c8 "controller/src/vnsw/agent/oper/oper_db.h",
    line=131, function=0x1203c80 "virtual void AgentOperDBTable::ConfigEventHandler(IFMapNode*, DBEntry*)") at assert.c:101
#4 0x0000000000a0711d in AgentOperDBTable::ConfigEventHandler(IFMapNode*, DBEntry*) ()
#5 0x0000000000a0ac64 in IFMapDependencyManager::ProcessChangeList() ()
#6 0x00000000011c9b87 in TaskTrigger::WorkerTask::Run() ()
#7 0x00000000011c5d0f in TaskImpl::execute() ()
#8 0x00007f6c28564b3a in ?? () from /usr/lib/libtbb.so.2
#9 0x00007f6c28560816 in ?? () from /usr/lib/libtbb.so.2
#10 0x00007f6c2855ff4b in ?? () from /usr/lib/libtbb.so.2
#11 0x00007f6c2855c0ff in ?? () from /usr/lib/libtbb.so.2
#12 0x00007f6c2855c2f9 in ?? () from /usr/lib/libtbb.so.2
#13 0x00007f6c28780184 in start_thread (arg=0x7f6c13fff700) at pthread_create.c:312
#14 0x00007f6c27a5937d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb)

Vinod Nair (vinodnair)
description: updated
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/26267
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.2

Review in progress for https://review.opencontrail.org/26268
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.1

Review in progress for https://review.opencontrail.org/26269
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/26270
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/26267
Committed: http://github.org/Juniper/contrail-controller/commit/d55a6d84f043b665d1469cf60c717859ca6c62bb
Submitter: Zuul
Branch: master

commit d55a6d84f043b665d1469cf60c717859ca6c62bb
Author: Hari <email address hidden>
Date: Fri Nov 18 20:23:38 2016 +0530

Do DB enqueue from VMI delete

For instance delete from config, we convert it to an add change event
and do a resync. Due to this the assert in case of UUID change is not
valid. Avoiding it by doing an enqueue inside the IFNodeToReq.

Change-Id: Ied32959a86acefda344bc14832d751428b9868ab
closes-bug: #1629012
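
One reading of this commit message, sketched as a toy model: the caller expects the conversion hook to leave a delete request untouched and asserts on that, but for a config delete the hook rewrites the request into an add/change (resync). Letting the hook enqueue the request itself and report that nothing is left for the caller keeps the caller's check from running. All names here (`Oper`, `Request`, `Table`, `NodeToRequest`, `HandleUuidChange`) are hypothetical, and the `false` return convention is an assumption; this is not the contrail-controller code.

```cpp
#include <cassert>
#include <vector>

enum class Oper { AddChange, Delete };
struct Request { Oper oper; };

struct Table {
    std::vector<Request> queue;
    void Enqueue(const Request &r) { queue.push_back(r); }

    // Hypothetical conversion hook. A config delete becomes an add/change
    // (resync); the hook enqueues it directly and returns false so the
    // caller does not inspect or enqueue the rewritten request.
    bool NodeToRequest(Request &req) {
        if (req.oper == Oper::Delete) {
            req.oper = Oper::AddChange;
            Enqueue(req);
            return false;
        }
        return true;
    }

    // Hypothetical caller for the UUID-change path. Returns false where
    // the delete-only invariant would be violated (the point at which the
    // backtrace above shows an assert).
    bool HandleUuidChange() {
        Request req{Oper::Delete};
        if (NodeToRequest(req)) {
            if (req.oper != Oper::Delete) return false;
            Enqueue(req);
        }
        return true;
    }
};
```

In this model the request is queued exactly once, as an add/change, and the caller's delete-only check is never reached.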

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/26268
Committed: http://github.org/Juniper/contrail-controller/commit/e45ed0ff54ac08e2cd5839ea87f1156605c88d52
Submitter: Zuul
Branch: R3.2

commit e45ed0ff54ac08e2cd5839ea87f1156605c88d52
Author: Hari <email address hidden>
Date: Fri Nov 18 20:23:38 2016 +0530

Do DB enqueue from VMI delete

For instance delete from config, we convert it to an add change event
and do a resync. Due to this the assert in case of UUID change is not
valid. Avoiding it by doing an enqueue inside the IFNodeToReq.

Change-Id: Ied32959a86acefda344bc14832d751428b9868ab
closes-bug: #1629012

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/26269
Committed: http://github.org/Juniper/contrail-controller/commit/f2d012bc65e89df2d709a9c8094c26dbc9f7fbe6
Submitter: Zuul
Branch: R3.1

commit f2d012bc65e89df2d709a9c8094c26dbc9f7fbe6
Author: Hari <email address hidden>
Date: Fri Nov 18 20:23:38 2016 +0530

Do DB enqueue from VMI delete

For instance delete from config, we convert it to an add change event
and do a resync. Due to this the assert in case of UUID change is not
valid. Avoiding it by doing an enqueue inside the IFNodeToReq.

Change-Id: Ied32959a86acefda344bc14832d751428b9868ab
closes-bug: #1629012

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/26270
Committed: http://github.org/Juniper/contrail-controller/commit/cb0a18d718f8499ebba40d48f4e002edfb37a206
Submitter: Zuul
Branch: R3.0

commit cb0a18d718f8499ebba40d48f4e002edfb37a206
Author: Hari <email address hidden>
Date: Fri Nov 18 20:23:38 2016 +0530

Do DB enqueue from VMI delete

For instance delete from config, we convert it to an add change event
and do a resync. Due to this the assert in case of UUID change is not
valid. Avoiding it by doing an enqueue inside the IFNodeToReq.

Change-Id: Ied32959a86acefda344bc14832d751428b9868ab
closes-bug: #1629012

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0.3.x

Review in progress for https://review.opencontrail.org/26323
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/26323
Committed: http://github.org/Juniper/contrail-controller/commit/75ab9aca1180928667682017e5ed7ea66574f342
Submitter: Zuul
Branch: R3.0.3.x

commit 75ab9aca1180928667682017e5ed7ea66574f342
Author: Hari <email address hidden>
Date: Fri Nov 18 20:23:38 2016 +0530

Do DB enqueue from VMI delete

For instance delete from config, we convert it to an add change event
and do a resync. Due to this the assert in case of UUID change is not
valid. Avoiding it by doing an enqueue inside the IFNodeToReq.

Change-Id: Ied32959a86acefda344bc14832d751428b9868ab
closes-bug: #1629012

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0.2.x

Review in progress for https://review.opencontrail.org/28339
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/28339
Committed: http://github.org/Juniper/contrail-controller/commit/1b9cc8b2f8614300908051fbac9ec9f3a941c43b
Submitter: Zuul (<email address hidden>)
Branch: R3.0.2.x

commit 1b9cc8b2f8614300908051fbac9ec9f3a941c43b
Author: Hari <email address hidden>
Date: Fri Nov 18 20:23:38 2016 +0530

Do DB enqueue from VMI delete

For instance delete from config, we convert it to an add change event
and do a resync. Due to this the assert in case of UUID change is not
valid. Avoiding it by doing an enqueue inside the IFNodeToReq.

Change-Id: Ied32959a86acefda344bc14832d751428b9868ab
closes-bug: #1629012
(cherry picked from commit 75ab9aca1180928667682017e5ed7ea66574f342)
