RHOSP13 - R5.0-162 - collector is not coming up after stopping it to check alarm

Bug #1786042 reported by alok kumar
This bug affects 1 person
Affects            Status                   Importance  Assigned to  Milestone
Juniper Openstack  Status tracked in Trunk
  R5.0             Invalid                  High        Arvind
  Trunk            Invalid                  High        Arvind

Bug Description

After stopping the collector (to test alarm cases), it does not come back up.

It came up once after the broken rabbitmq cluster was reset, but now it again fails to come up even though the rabbitmq cluster is fine.

The database nodemgr also shows the message "Cassandra state detected DOWN", although cassandra appears to be up.

== Contrail database ==
kafka: active
nodemgr: initializing (Cassandra state detected DOWN. )
zookeeper: active
cassandra: active

== Contrail analytics ==
snmp-collector: active
query-engine: active
api: active
alarm-gen: active
nodemgr: active
collector: initializing (Database:overcloud-contrailcontroller-1:Global connection down)
topology: active
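
To pick out the unhealthy services from a status dump like the one above, a small parser can help. This is a minimal sketch, assuming the "== section ==" and "service: state (detail)" layout shown in this report; the function name `non_active_services` is hypothetical and not part of any Contrail tooling.

```python
def non_active_services(status_text):
    """Return (section, service, state, detail) for each service that is not 'active'."""
    problems = []
    section = None
    for raw in status_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Section headers look like "== Contrail database ==".
        if line.startswith("==") and line.endswith("=="):
            section = line.strip("= ")
            continue
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "service: state" line
        state, _, detail = rest.strip().partition(" ")
        if state != "active":
            # Drop the parentheses around the optional detail text.
            detail = detail.strip().strip("()").strip()
            problems.append((section, name.strip(), state, detail))
    return problems


# Status output copied from this report.
SAMPLE = """\
== Contrail database ==
kafka: active
nodemgr: initializing (Cassandra state detected DOWN. )
zookeeper: active
cassandra: active

== Contrail analytics ==
snmp-collector: active
query-engine: active
api: active
alarm-gen: active
nodemgr: active
collector: initializing (Database:overcloud-contrailcontroller-1:Global connection down)
topology: active
"""

for row in non_active_services(SAMPLE):
    print(row)
```

On the sample, only the database nodemgr and the analytics collector are reported, which matches the two symptoms described in this bug.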

Errors seen in collector log:

2018-08-08 Wed 13:39:31:302.587 UTC overcloud-contrailcontroller-1 [Thread 139775151621888, Pid 1]: overcloud-contrailcontroller-1:Global: Initialize: Create/Set KEYSPACE: ContrailAnalyticsCql FAILED
2018-08-08 Wed 13:39:32:858.496 UTC overcloud-contrailcontroller-1 [Thread 139775151621888, Pid 1]: overcloud-contrailcontroller-1:Global: ObjectTableInsert: Addition of overcloud-contrailcontroller-1:Analytics:contrail-collector:0, message UUID 579b6a5f-75cd-41e1-a603-b92797fbc029 ObjectGeneratorInfo into table ObjectValueTable FAILED
2018-08-08 Wed 13:39:32:858.664 UTC overcloud-contrailcontroller-1 [Thread 139775151621888, Pid 1]: overcloud-contrailcontroller-1:Global: MessageTableOnlyInsert: Addition of message: SandeshModuleClientTrace, message UUID: 579b6a5f-75cd-41e1-a603-b92797fbc029 COLUMN FAILED
2018-08-08 Wed 13:39:32:859.743 UTC overcloud-contrailcontroller-1 [Thread 139775155820288, Pid 1]: overcloud-contrailcontroller-1:Global: ObjectTableInsert: Addition of overcloud-contrailcontroller-1:Analytics:contrail-collector:0, message UUID d396ee52-967a-4d4e-8fd4-946cd6bf9a95 ObjectGeneratorInfo into table ObjectValueTable FAILED
2018-08-08 Wed 13:39:32:859.936 UTC overcloud-contrailcontroller-1 [Thread 139775155820288, Pid 1]: overcloud-contrailcontroller-1:Global: MessageTableOnlyInsert: Addition of message: SandeshModuleClientTrace, message UUID: d396ee52-967a-4d4e-8fd4-946cd6bf9a95 COLUMN FAILED
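
The failures above can be tallied per operation to see which database calls are failing. This is a rough sketch: the regular expression is inferred only from the five log lines quoted in this report, not from a documented collector log format, and `count_failed_operations` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed line shape: "... [Thread N, Pid N]: <host>:Global: <Operation>: ... FAILED"
FAILED_RE = re.compile(r"\]:\s*\S+?:Global:\s*(\w+):.*\bFAILED\s*$")


def count_failed_operations(log_text):
    """Count FAILED log lines, grouped by the operation name after ':Global:'."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILED_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Collector log lines copied from this report.
LOG = """\
2018-08-08 Wed 13:39:31:302.587 UTC overcloud-contrailcontroller-1 [Thread 139775151621888, Pid 1]: overcloud-contrailcontroller-1:Global: Initialize: Create/Set KEYSPACE: ContrailAnalyticsCql FAILED
2018-08-08 Wed 13:39:32:858.496 UTC overcloud-contrailcontroller-1 [Thread 139775151621888, Pid 1]: overcloud-contrailcontroller-1:Global: ObjectTableInsert: Addition of overcloud-contrailcontroller-1:Analytics:contrail-collector:0, message UUID 579b6a5f-75cd-41e1-a603-b92797fbc029 ObjectGeneratorInfo into table ObjectValueTable FAILED
2018-08-08 Wed 13:39:32:858.664 UTC overcloud-contrailcontroller-1 [Thread 139775151621888, Pid 1]: overcloud-contrailcontroller-1:Global: MessageTableOnlyInsert: Addition of message: SandeshModuleClientTrace, message UUID: 579b6a5f-75cd-41e1-a603-b92797fbc029 COLUMN FAILED
2018-08-08 Wed 13:39:32:859.743 UTC overcloud-contrailcontroller-1 [Thread 139775155820288, Pid 1]: overcloud-contrailcontroller-1:Global: ObjectTableInsert: Addition of overcloud-contrailcontroller-1:Analytics:contrail-collector:0, message UUID d396ee52-967a-4d4e-8fd4-946cd6bf9a95 ObjectGeneratorInfo into table ObjectValueTable FAILED
2018-08-08 Wed 13:39:32:859.936 UTC overcloud-contrailcontroller-1 [Thread 139775155820288, Pid 1]: overcloud-contrailcontroller-1:Global: MessageTableOnlyInsert: Addition of message: SandeshModuleClientTrace, message UUID: d396ee52-967a-4d4e-8fd4-946cd6bf9a95 COLUMN FAILED
"""

for operation, count in count_failed_operations(LOG).items():
    print(operation, count)
```

The keyspace creation fails first and the inserts fail afterwards, which suggests the collector never obtained a usable Cassandra session.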

Setup info:
This is a virtualized setup with all the bms (VMs) running on the nodes below:
Login for all hypervisors: root
Undercloud: 192.168.122.179 on 10.204.217.133
Controllers hypervisor: 10.204.217.134
Computes hypervisors: 10.204.217.135, 10.204.217.137, 10.204.217.138

Target bms (VMs):

(undercloud) [stack@queensa ~]$ openstack server list
+--------------------------------------+--------------------------------+--------+------------------------+----------------+---------------------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+--------------------------------+--------+------------------------+----------------+---------------------+
| 58f85277-04ca-4aec-91ce-d5a59ba9e609 | overcloud-contrailcontroller-2 | ACTIVE | ctlplane=192.168.24.14 | overcloud-full | contrail-controller |
| 20d626ff-d15e-48b0-ad06-fba82fa1e5fa | overcloud-contrailcontroller-0 | ACTIVE | ctlplane=192.168.24.19 | overcloud-full | contrail-controller |
| c79d6bfd-4c73-452c-aaf8-03fe08beca1e | overcloud-contrailcontroller-1 | ACTIVE | ctlplane=192.168.24.24 | overcloud-full | contrail-controller |
| 9e65dd37-4e32-466d-900c-014cbed49ee2 | overcloud-novacompute-1 | ACTIVE | ctlplane=192.168.24.20 | overcloud-full | compute |
| 2e2c2b82-c296-4c43-9b36-c1d30859e794 | overcloud-controller-0 | ACTIVE | ctlplane=192.168.24.23 | overcloud-full | control |
| 3106a491-420c-4830-a7c0-a4668305ea16 | overcloud-novacompute-0 | ACTIVE | ctlplane=192.168.24.6 | overcloud-full | compute |
| 541c1f09-31b5-421f-a176-aa3ea137ba90 | overcloud-controller-1 | ACTIVE | ctlplane=192.168.24.13 | overcloud-full | control |
| 077c293a-320f-4ec3-9678-ee774d2dfb92 | overcloud-controller-2 | ACTIVE | ctlplane=192.168.24.18 | overcloud-full | control |
| 29615d3b-c9ca-4375-8113-d8339151321a | overcloud-novacompute-2 | ACTIVE | ctlplane=192.168.24.15 | overcloud-full | compute |
+--------------------------------------+--------------------------------+--------+------------------------+----------------+---------------------+

To connect to any bms: ssh root@10.204.217.133 -> ssh root@192.168.122.179 -> su - stack -> source stackrc -> ssh heat-admin@192.168.24.19 (this connects to cfgm0)

Jeba Paulaiyan (jebap)
tags: added: contrail-cloud
Arvind (arvindv) wrote :

The controller VMs are provisioned with too little memory.
[heat-admin@overcloud-contrailcontroller-1 ~]$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G         13G        273M        4.8M        1.9G        1.6G
Swap: 0B 0B

This is not enough to run analytics_cassandra and config_cassandra in the same VM.
I am also seeing OutOfMemoryError entries in the cassandra logs:
INFO [ScheduledTasks:1] 2018-08-08 14:27:36,704 MessagingService.java:1238 - MUTATION messages were dropped in last 5000 ms: 125 internal and 121 cross node. Mean internal dropped latency: 545919 ms and Mean cross-node dropped latency: 537531 ms
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid1.hprof ...
Unable to create java_pid1.hprof: Permission denied

#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
# Executing /bin/sh -c "kill -9 1"...
os::fork_and_exec failed: Cannot allocate memory (12)
INFO [ScheduledTasks:1] 2018-08-08 14:35:22,034 MessagingService.java:1238 - REQUEST_RESPONSE messages were dropped in last 5000 ms: 0 internal and 3 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 0 ms
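
The two signals in the log above (dropped messages and heap exhaustion) can be extracted with a short script. This is a sketch under the assumption that the log lines look like the excerpt in this comment; the regular expression is inferred from those lines, and `summarize_cassandra_log` is a hypothetical helper name.

```python
import re

# Assumed shape: "<TYPE> messages were dropped in last N ms: A internal and B cross node."
DROPPED_RE = re.compile(
    r"(\w+) messages were dropped in last (\d+) ms: "
    r"(\d+) internal and (\d+) cross node"
)


def summarize_cassandra_log(log_text):
    """Return (dropped, oom_lines): dropped is a list of (type, internal, cross_node)."""
    dropped = []
    oom_lines = 0
    for line in log_text.splitlines():
        match = DROPPED_RE.search(line)
        if match:
            dropped.append((match.group(1), int(match.group(3)), int(match.group(4))))
        if "java.lang.OutOfMemoryError" in line:
            oom_lines += 1
    return dropped, oom_lines


# A subset of the Cassandra log lines quoted in this comment.
LOG = """\
INFO [ScheduledTasks:1] 2018-08-08 14:27:36,704 MessagingService.java:1238 - MUTATION messages were dropped in last 5000 ms: 125 internal and 121 cross node. Mean internal dropped latency: 545919 ms and Mean cross-node dropped latency: 537531 ms
java.lang.OutOfMemoryError: Java heap space
# java.lang.OutOfMemoryError: Java heap space
INFO [ScheduledTasks:1] 2018-08-08 14:35:22,034 MessagingService.java:1238 - REQUEST_RESPONSE messages were dropped in last 5000 ms: 0 internal and 3 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 0 ms
"""

dropped, oom_lines = summarize_cassandra_log(LOG)
print(dropped)
print(oom_lines)
```

Dropped MUTATION messages alongside OutOfMemoryError lines are consistent with the low-memory diagnosis given in this comment.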
