Bootstrapped nodes many times rebooted without reasons

Bug #1401603 reported by Sergey Galkin
This bug affects 1 person
Affects             Status   Importance  Assigned to          Milestone
Fuel for OpenStack  Invalid  High        Vladimir Kozhukalov
6.0.x               Invalid  High        Vladimir Kozhukalov
6.1.x               Invalid  High        Vladimir Kozhukalov

Bug Description

api: '1.0'
astute_sha: 16b252d93be6aaa73030b8100cf8c5ca6a970a91
auth_required: true
build_id: 2014-12-09_22-41-06
build_number: '49'
feature_groups:
- mirantis
fuellib_sha: 2c99931072d951301d395ebd5bf45c8d401301bb
fuelmain_sha: 3aab16667f47dd8384904e27f70f7a87ba15f4ee
nailgun_sha: 22bd43b89a17843f9199f92d61fc86cb0f8772f1
ostf_sha: a9afb68710d809570460c29d6c3293219d3624d4
production: docker
release: '6.0'

During deployment of a cluster of 98 HW nodes on Ubuntu with mongo+cinder+lvm+neutron-gre, all 100 nodes rebooted many times for no visible reason.
Log from the Jenkins job:

00:24:56.089 2014-12-11 17:37:31,421 - __main__ - INFO - Fuel discovered 74 nodes
00:25:16.603 2014-12-11 17:37:51,935 - __main__ - INFO - Fuel discovered 86 nodes
00:25:37.229 2014-12-11 17:38:12,560 - __main__ - INFO - Fuel discovered 91 nodes
00:25:57.725 2014-12-11 17:38:33,056 - __main__ - INFO - Fuel discovered 93 nodes
00:26:18.280 2014-12-11 17:38:53,612 - __main__ - INFO - Fuel discovered 96 nodes
00:26:38.854 2014-12-11 17:39:14,186 - __main__ - INFO - Fuel discovered 97 nodes
00:26:59.486 2014-12-11 17:39:34,817 - __main__ - INFO - Fuel discovered 97 nodes
00:27:20.202 2014-12-11 17:39:55,534 - __main__ - INFO - Fuel discovered 97 nodes
00:27:40.826 2014-12-11 17:40:16,158 - __main__ - INFO - Fuel discovered 97 nodes
00:28:01.423 2014-12-11 17:40:36,755 - __main__ - INFO - Fuel discovered 97 nodes
00:28:22.031 2014-12-11 17:40:57,363 - __main__ - INFO - Fuel discovered 84 nodes
00:28:42.719 2014-12-11 17:41:18,051 - __main__ - INFO - Fuel discovered 72 nodes
00:29:03.370 2014-12-11 17:41:38,702 - __main__ - INFO - Fuel discovered 28 nodes
00:29:24.053 2014-12-11 17:41:59,385 - __main__ - INFO - Fuel discovered 28 nodes
00:29:44.748 2014-12-11 17:42:20,080 - __main__ - INFO - Fuel discovered 22 nodes
00:30:05.402 2014-12-11 17:42:40,733 - __main__ - INFO - Fuel discovered 11 nodes
00:30:26.088 2014-12-11 17:43:01,420 - __main__ - INFO - Fuel discovered 16 nodes
00:30:46.621 2014-12-11 17:43:21,953 - __main__ - INFO - Fuel discovered 19 nodes
00:31:07.302 2014-12-11 17:43:42,634 - __main__ - INFO - Fuel discovered 22 nodes
00:31:27.987 2014-12-11 17:44:03,319 - __main__ - INFO - Fuel discovered 28 nodes
00:31:48.653 2014-12-11 17:44:23,985 - __main__ - INFO - Fuel discovered 34 nodes
00:32:09.216 2014-12-11 17:44:44,548 - __main__ - INFO - Fuel discovered 44 nodes
00:32:29.695 2014-12-11 17:45:05,027 - __main__ - INFO - Fuel discovered 73 nodes
00:32:50.321 2014-12-11 17:45:25,653 - __main__ - INFO - Fuel discovered 90 nodes
00:33:10.965 2014-12-11 17:45:46,297 - __main__ - INFO - Fuel discovered 84 nodes
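
The drops in the discovered-node count can be picked out of such a log mechanically. The following is a minimal sketch, assuming the line format shown above; the names `LOG`, `LINE_RE` and `find_drops` are hypothetical helpers for illustration and are not part of the Jenkins job or of Fuel.

```python
import re

# Sample lines copied from the Jenkins log above.
LOG = """\
00:28:01.423 2014-12-11 17:40:36,755 - __main__ - INFO - Fuel discovered 97 nodes
00:28:22.031 2014-12-11 17:40:57,363 - __main__ - INFO - Fuel discovered 84 nodes
00:28:42.719 2014-12-11 17:41:18,051 - __main__ - INFO - Fuel discovered 72 nodes
00:29:03.370 2014-12-11 17:41:38,702 - __main__ - INFO - Fuel discovered 28 nodes
00:29:24.053 2014-12-11 17:41:59,385 - __main__ - INFO - Fuel discovered 28 nodes
"""

# Captures the wall-clock timestamp (without milliseconds) and the node count.
LINE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ .* Fuel discovered (\d+) nodes"
)


def find_drops(text):
    """Return (timestamp, previous_count, current_count) for each decrease."""
    drops = []
    previous = None
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # ignore lines that are not discovery reports
        timestamp, count = match.group(1), int(match.group(2))
        if previous is not None and count < previous:
            drops.append((timestamp, previous, count))
        previous = count
    return drops


for timestamp, before, after in find_drops(LOG):
    print(f"{timestamp}: {before} -> {after} ({before - after} nodes lost)")
```

Run against the full log, this would show the window in which nodes disappeared from discovery, which can then be compared with the timestamps in the per-node logs.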

On the nodes' IPMI consoles I saw SysRq kill messages (see screenshots 1, 2 and 3). After screenshot 3 the nodes rebooted.

After about 1 hour this behavior stopped on its own.

Tags: scale
Sergey Galkin (sgalkin) wrote :

Strange logs from
/var/log/docker-logs/remote/10.20.1.93
grep Magra *
kernel.log:2014-12-11T13:31:25.832266+00:00 warning: [ 38.434357] Request for unknown module key 'Magrathea: Glacier signing key: 00842a0bce71d630cf9f9099e1af6973cd2d4cc8' err -11
kernel.log:2014-12-11T13:31:25.844259+00:00 warning: [ 38.443407] Request for unknown module key 'Magrathea: Glacier signing key: 00842a0bce71d630cf9f9099e1af6973cd2d4cc8' err -11
kernel.log:2014-12-11T13:31:25.844259+00:00 warning: [ 38.443872] Request for unknown module key 'Magrathea: Glacier signing key: 00842a0bce71d630cf9f9099e1af6973cd2d4cc8' err -11
kernel.log:2014-12-11T13:31:50.331982+00:00 warning: [ 62.950304] Request for unknown module key 'Magrathea: Glacier signing key: 00842a0bce71d630cf9f9099e1af6973cd2d4cc8' err -11
kernel.log:2014-12-11T13:31:50.336025+00:00 warning: [ 62.957478] Request for unknown module key 'Magrathea: Glacier signing key: 00842a0bce71d630cf9f9099e1af6973cd2d4cc8' err -11
kernel.log:2014-12-11T13:31:50.336025+00:00 warning: [ 62.957657] Request for unknown module key 'Magrathea: Glacier signing key: 00842a0bce71d630cf9f9099e1af6973cd2d4cc8' err -11
kernel.log:2014-12-11T13:31:50.376020+00:00 warning: [ 62.993924] Request for unknown module key 'Magrathea: Glacier signing key: 00842a0bce71d630cf9f9099e1af6973cd2d4cc8' err -11
kernel.log:2014-12-11T13:31:50.580094+00:00 warning: [ 63.198601] Request for unknown module key 'Magrathea: Glacier signing key: 00842a0bce71d630cf9f9099e1af6973cd2d4cc8' err -11
kernel.log:2014-12-11T13:31:50.592116+00:00 warning: [ 63.213995] Request for unknown module key 'Magrathea: Glacier signing key: 00842a0bce71d630cf9f9099e1af6973cd2d4cc8' err -11
kernel.log:2014-12-11T13:31:50.600010+00:00 warning: [ 63.221362] Request for unknown module key 'Magrathea: Glacier signing key: 00842a0bce71d630cf9f9099e1af6973cd2d4cc8' err -11
kernel.log:2014-12-11T13:31:50.611987+00:00 warning: [ 63.230050] Request for unknown module key 'Magrathea: Glacier signing key: 00842a0bce71d630cf9f9099e1af6973cd2d4cc8' err -11

Sergey Galkin (sgalkin) wrote :

Snapshot

description: updated
Sergey Galkin (sgalkin) wrote :
information type: Private Security → Public Security
Alexei Sheplyakov (asheplyakov) wrote :

The error message looks similar to that in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1253155

Changed in fuel:
milestone: none → 6.0
assignee: nobody → Vladimir Kozhukalov (kozhukalov)
Alexei Sheplyakov (asheplyakov) wrote :

However, that bug was fixed in linux-image-*-3.12.0-4.12, and we use Linux 3.13 in MOS 6.0.

Changed in fuel:
importance: Undecided → High
status: New → Confirmed
Roman Prykhodchenko (romcheg) wrote :

Moving to 6.1 because of HCF.

Changed in fuel:
milestone: 6.0 → 6.1
Sergey Galkin (sgalkin)
information type: Public Security → Public
Vladimir Kuklin (vkuklin) wrote :

I do not think that this bug is kernel-related. It could be related to AMQP issues: the RabbitMQ server does not receive an ACK packet and resends the message again and again. This looks like a very unlikely scenario, though, so I would set the bug to Incomplete until it is reproduced again.
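
To illustrate the hypothesis only: with at-least-once delivery, a command whose acknowledgement is lost can be executed more than once. The following is a toy model, not a confirmed root cause. `ToyBroker` and `make_node` are hypothetical names; this is neither the RabbitMQ API nor Fuel/Astute code, and a real broker decides when to redeliver under its own rules.

```python
from collections import deque


class ToyBroker:
    """Hypothetical in-memory stand-in for an AMQP broker (not RabbitMQ's API)."""

    def __init__(self):
        self.queue = deque()

    def publish(self, message):
        self.queue.append(message)

    def deliver(self, consumer, max_deliveries):
        """Deliver messages; requeue any message whose ACK never arrives."""
        deliveries = 0
        while self.queue and deliveries < max_deliveries:
            message = self.queue.popleft()
            deliveries += 1
            acked = consumer(message)
            if not acked:
                # The broker saw no ACK, so the message goes back on the queue.
                self.queue.append(message)
        return deliveries


def make_node(ack_lost_times):
    """Build a consumer whose first `ack_lost_times` ACKs are lost in transit."""
    state = {"reboots": 0, "lost": ack_lost_times}

    def consumer(message):
        if message == "reboot":
            state["reboots"] += 1  # the command runs on every delivery
        if state["lost"] > 0:
            state["lost"] -= 1
            return False  # the ACK never reaches the broker
        return True

    return consumer, state


broker = ToyBroker()
broker.publish("reboot")
node, state = make_node(ack_lost_times=3)
deliveries = broker.deliver(node, max_deliveries=10)
print(f"deliveries={deliveries}, reboots={state['reboots']}")
```

In this model a single published reboot command with three lost ACKs is executed four times, which is the kind of repeated reboot the hypothesis describes.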

Changed in fuel:
milestone: 6.1 → 6.0
status: Confirmed → Incomplete
Sergey Galkin (sgalkin) wrote :

It seems I cannot reproduce it on the third deployment.

Alexei Sheplyakov (asheplyakov) wrote :

> It seems I cannot reproduce it on the third deployment.

A failure to load kernel modules due to missing/incorrect signatures should have been 100% reproducible.

Sergey Galkin (sgalkin) wrote :

Alexei, yes, you are right:
cd /var/log/docker-logs/remote/10.20.1.96/
[root@fuel 10.20.1.96]# grep Magra *
kernel.log:2014-12-11T18:27:35.496467+00:00 warning: [ 38.887740] Request for unknown module key 'Magrathea: Glacier signing key: 00842a0bce71d630cf9f9099e1af6973cd2d4cc8' err -11
kernel.log:2014-12-11T18:27:35.500258+00:00 warning: [ 38.894566] Request for unknown module key 'Magrathea: Glacier signing key: 00842a0bce71d630cf9f9099e1af6973cd2d4cc8' err -11

The issue with the modules is a separate issue.

Anastasia Palkina (apalkina) wrote :

I saw this issue 2-3 times

Vladimir Kozhukalov (kozhukalov) wrote :

I agree with Vladimir. The only plausible reason for that is AMQP problems.

summary: - Bootsrtaped nodes many times reboted without reasons
+ Bootstrapped nodes many times reboted without reasons
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 6.0 → 6.1
Changed in fuel:
status: Incomplete → Invalid
Changed in fuel:
status: Invalid → Incomplete
Bogdan Dobrelya (bogdando) wrote :

Please do not change the milestone of a bug; it makes backporting harder and fixes could be lost.

summary: - Bootstrapped nodes many times reboted without reasons
+ Bootstrapped nodes many times rebooted without reasons
Vladimir Kuklin (vkuklin) wrote :

This bug has been Incomplete for more than 4 weeks. We cannot investigate it further, so we are setting the status to Invalid. If you think this is not correct, please feel free to provide the requested information and reopen the bug, and we will look into it further.
