[ubuntu-16.04~ocata-R4.1~5] : contrail-collector cored at sigandset with SIGABRT

Bug #1734860 reported by Pavana
This bug affects 1 person
Affects: Juniper Openstack (status tracked in Trunk)
Series | Status | Importance | Assigned to
R4.1 | Invalid | Low | Ananth Suryanarayana
Trunk | Fix Committed | Low | Ananth Suryanarayana

Bug Description

The above build was running on a 3-controller multi-node vcenter-only setup, and this core was seen on one of the controllers.
Let me know if setup details are needed. The core file is copied to /cs-shared/bugs/<this bug-id>

Core was generated by `/usr/bin/contrail-collector'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f09a246e428 in sigandset (dest=0xc9a, left=0xccb, right=0x6) at sigandset.c:33
33 sigandset.c: No such file or directory.
[Current thread is 1 (Thread 0x7f0911bf7700 (LWP 3275))]

contrail-version
Package Version Build-ID | Repo | Package Name
-------------------------------------- ------------------------------ ----------------------------------
contrail-config 4.1.0.0-5 5
contrail-config-openstack 4.1.0.0-5 5
contrail-control 4.1.0.0-5 5
contrail-database-common 4.1.0.0-5 5
contrail-dns 4.1.0.0-5 5
contrail-f5 4.1.0.0-5 5
contrail-lib 4.1.0.0-5 5
contrail-nodemgr 4.1.0.0-5 5
contrail-openstack-control 4.1.0.0-5 5
contrail-openstack-webui 4.1.0.0-5 5
contrail-setup 4.1.0.0-5 5
contrail-utils 4.1.0.0-5 5
contrail-web-controller 4.1.0.0-5 5
contrail-web-core 4.1.0.0-5 5
contrail-web-storage 4.1.0.0-5 5
python-contrail 4.1.0.0-5 5
python-neutronclient 1:6.1.0-0ubuntu2~cloud0.1contrail 5

contrail-status
== Contrail Analytics ==
contrail-collector: active
contrail-analytics-api: active
contrail-query-engine: active
contrail-alarm-gen: active
contrail-snmp-collector: active
contrail-topology: active
contrail-analytics-nodemgr: active
========Run time service failures=============
/var/crashes/core.contrail-collec.3226.nodei28.1511845356

Revision history for this message
Pavana (pavanap) wrote :

The SM JSON file used and /etc/contrailctl are also attached in the same location: /cs-shared/bugs/1734860

information type: Proprietary → Public
Zhiqiang Cui (zcui) wrote :

I cannot find the folder: /cs-shared/bugs/1734860

Please also share /var/log/redis/redis-server.log and /var/log/contrail/contrail-collector* from the node on which the core file was generated.

Jeba Paulaiyan (jebap) wrote :

It is in anamika.englab.juniper.net - Ping me if you do not have anamika credentials.

Zhiqiang Cui (zcui) wrote :

We cannot get a backtrace from the core file, but the strings in it show that an assertion failed:

controller/src/database/cassandra/cql/cql_if.cc:2010: bool cass::cql::CqlIfImpl::IsTableStatic(const string&): Assertion `impl::GetCassTableClusteringKeyCount(cci_, session_.get(), keyspace_, table, &ck_count)' failed.

But this is not enough to fix the problem.

Please share the setup, and also share /var/log/contrail/* and /var/log/cassandra/* from the time the problem happens.

Thank you very much.
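
As background for the analysis above, here is a minimal sketch (not contrail code) of why a failed assert() produces a core with SIGABRT: assert() prints its message and calls abort(), which raises SIGABRT. The names LookupTable and SignalFromFailedAssert are hypothetical, and the sketch assumes a POSIX system.

```cpp
#undef NDEBUG  // keep assert() active even if the build defines NDEBUG
#include <cassert>
#include <csignal>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for a lookup that fails, e.g. because a table is missing.
static bool LookupTable() { return false; }

// Runs a failing assert() in a child process and returns the signal that
// terminated the child, or -1 if the child was not killed by a signal.
int SignalFromFailedAssert() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        // Child: disable core dumps so the demo leaves no core file behind.
        struct rlimit no_core = {0, 0};
        setrlimit(RLIMIT_CORE, &no_core);
        assert(LookupTable());  // prints "Assertion ... failed." and calls abort()
        _exit(0);               // not reached
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling SignalFromFailedAssert() should return SIGABRT, which corresponds to the "Program terminated with signal SIGABRT" line that gdb reports for the core.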

Santosh Gupta (sangupta) wrote :

If cqlsh was used at all for querying or debugging, please paste a screen dump of all commands and their output.
From the strings in the core file, it looks like the keyspace or tables were deleted from Cassandra.

Pavana (pavanap)
tags: added: sanity
removed: sanityblocker
Pavana (pavanap) wrote :

The issue has not been seen in recent builds; it was last checked on build 8. Removing the sanityblocker tag but keeping the bug open for now. I will check on newer builds and close the bug if it is not seen again.

Sudheendra Rao (sudheendra-k) wrote :

The same core (but this time in contrail-control) is seen on the mainline build 56 Ocata sanity setup.

root@nodem16(controller):/# gdb /usr/bin/contrail-control /var/crashes/saved/core.contrail-contro.6887.nodem16.1512811848
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-control'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f5d7f315428 in sigandset (dest=0x1ae7, left=0x1b07, right=0x6) at sigandset.c:33
33 sigandset.c: No such file or directory.
[Current thread is 1 (Thread 0x7f5d463f8700 (LWP 6919))]
(gdb)

The logs and core files are copied to anamika.englab.juniper.net:
/cs-shared/bugs/1734860/logs

Zhiqiang Cui (zcui) wrote :

[stack@anamika logs]$ strings core.contrail-contro.6887.nodem16.1512811848 | grep Assert
: src/contrail-common/base/index_map.h:74: void IndexMap<KeyType, ValueType, BitsetType>::Remove(const KeyType&, int, bool) [with KeyType = std::__cxx11::basic_string<char>; ValueType = BgpIfmapInstanceConfig; BitsetType = BitSet]: Assertion `loc->second == values_[index]' failed.
urity-group>8000003</seccontrail-control: src/contrail-common/base/index_map.h:74: void IndexMap<KeyType, ValueType, BitsetType>::Remove(const KeyType&, int, bool) [with KeyType = std::__cxx11::basic_string<char>; ValueType = BgpIfmapInstanceConfig; BitsetType = BitSet]: Assertion `loc->second == values_[index]' failed.
contrail-control: src/contrail-common/base/index_map.h:74: void IndexMap<KeyType, ValueType, BitsetType>::Remove(const KeyType&, int, bool) [with KeyType = std::__cxx11::basic_string<char>; ValueType = BgpIfmapInstanceConfig; BitsetType = BitSet]: Assertion `loc->second == values_[index]' failed.
[stack@anamika logs]$

This looks like a totally different problem. Why do you think it is the same problem as 1734860?
Last time, I pasted that the core was caused by a contrail-collector assertion that failed because a table could not be found, which is why Santosh wanted to know whether somebody had manually operated on Cassandra.

This one is totally different from that one, so I think we should open a new bug for it. Or would you mind explaining why you think they are the same?

Sudheendra Rao (sudheendra-k) wrote :

I will log a separate bug for the control core, as the issues are different.
We will monitor this bug (collector core) for a few more days before closing it.

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/38350
Submitter: Ananth Suryanarayana (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/38350
Submitter: Nikhil Bansal (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/38376
Submitter: Nikhil Bansal (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/38350
Committed: http://github.com/Juniper/contrail-common/commit/a7e8831a06050f9047025a22621cba033e28822f
Submitter: Zuul (<email address hidden>)
Branch: master

commit a7e8831a06050f9047025a22621cba033e28822f
Author: Ananth Suryanarayana <email address hidden>
Date: Thu Dec 14 11:29:02 2017 -0800

Do not free entry from values_ vector in index map until bit is freed

IndexMap stores values inside a vector values_ with the positions in the array
managed via a bitmap. IOW, a bit in the bitmap corresponds to an index/position
in the vector. Hence, the value inside the vector must be reset only when the
corresponding bit in the bitmap is reset.

With mvpn changes, bit reset was postponed to Resetbit() but the value was
reset from the vector inline in Remove() API itself. This can cause the
bit-map and vector to go out of sync if new insertions and deletions come in
between.

Verified using existing bgp_xmpp_deferq_test itself. More specific tests for
class IndexMap shall be added in a subsequent commit

Change-Id: I5b6c9d2ba52994d0ad9817480b0ae7f75a190824
Partial-Bug: 1734860
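
The idea in this commit message can be illustrated with a minimal sketch. This is not the contrail-common IndexMap: SimpleIndexMap, InSync, and the use of std::vector<bool> as the bitmap are assumptions made for illustration only. The point it shows is that a value leaves the vector only when its bit is reset, so the two structures cannot drift apart when Remove() defers the bit reset.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical index map; NOT the contrail-common IndexMap.
// Invariant: used_[i] is true exactly when values_[i] holds a value.
template <typename Value>
class SimpleIndexMap {
public:
    // Stores the value in the lowest free slot and returns that slot's index.
    std::size_t Insert(const std::string &key, std::unique_ptr<Value> value) {
        assert(value && map_.find(key) == map_.end());
        std::size_t index = 0;
        while (index < used_.size() && used_[index]) ++index;
        if (index == used_.size()) {
            used_.push_back(false);
            values_.emplace_back();
        }
        values_[index] = std::move(value);
        used_[index] = true;
        map_[key] = index;
        return index;
    }

    // Removes the key. With clear_bit == false the slot stays reserved and
    // the value stays in the vector until ResetBit() is called later.
    void Remove(const std::string &key, bool clear_bit = true) {
        auto loc = map_.find(key);
        assert(loc != map_.end());
        const std::size_t index = loc->second;
        map_.erase(loc);
        if (clear_bit) ResetBit(index);
    }

    // Frees the slot: the value and its bit are released together.
    void ResetBit(std::size_t index) {
        assert(index < used_.size() && used_[index]);
        values_[index].reset();
        used_[index] = false;
    }

    // Returns true if every slot satisfies the invariant above.
    bool InSync() const {
        for (std::size_t i = 0; i < used_.size(); ++i) {
            if (used_[i] != (values_[i] != nullptr)) return false;
        }
        return true;
    }

private:
    std::map<std::string, std::size_t> map_;       // key -> slot index
    std::vector<std::unique_ptr<Value> > values_;  // slot index -> value
    std::vector<bool> used_;                       // stand-in for the bitmap
};
```

In this sketch, removing a key with clear_bit == false keeps its slot reserved, so an insertion that arrives before ResetBit() gets a different index, and the freed index is reused only after ResetBit() has released both the value and the bit.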

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/38376
Committed: http://github.com/Juniper/contrail-common/commit/fb32a788c9f56fa32ad0ce3d2d8c86404c91948e
Submitter: Zuul (<email address hidden>)
Branch: master

commit fb32a788c9f56fa32ad0ce3d2d8c86404c91948e
Author: Nikhil B <email address hidden>
Date: Fri Dec 15 14:16:35 2017 +0530

Adding testcases for IndexMap

Adding basic testcase for IndexMap.
Also adding a testcase for the failure case we saw recently where
add following by 2 deletes was causing issues. More testcases may
be added later on

Change-Id: I4f526d768c1564b96c91ba2d65f10fd361783d1d
Partial-Bug: #1734860
