Fuel for OpenStack

Controller + CephOS node deployment failed.

Bug #1253594 reported by Nikolay Fedotov on 2013-11-21

This bug affects 1 person

Affects             Status        Importance  Assigned to                Milestone
Fuel for OpenStack  Fix Released  High        Aleksandr Didenko          Fuel for OpenStack 4.0
5.0.x               Invalid       High        Fuel Library (Deprecated)  Fuel for OpenStack 5.0.2

Bug Description

ISO: {"release": "3.2.1", "nailgun_sha": "23734e71a448faaa5bbbdb3da4525665987f01c2", "ostf_sha": "c70535553616d3c2f8b0bccced54361c06f76f97", "astute_sha": "df6ddea3abc93fbe1cab9b4534d4d5e9508c95d6", "fuellib_sha": "651f943cdac9d2084000f405125d11f1a99bd22b"}

*steps*
- create environment: CentOS HA, nova flat, ceph for images, ceph for volumes
- add nodes: 3 controller + ceph, 2 compute + ceph, 1 cinder + ceph
- deploy changes

*result*
second controller + ceph failed
puppet errors:
2013-11-21 09:46:59 ERR
Could not find a suitable provider for nova_floating_range
2013-11-21 09:46:50 ERR
(/Stage[main]/Galera/Exec[wait-for-synced-state]/returns) change from notrun to 0 failed: /usr/bin/mysql -Nbe "show status like 'wsrep_local_state_comment'" | /bin/grep -q Synced && sleep 10 returned 1 instead of one of [0] at /etc/puppet/modules/galera/manifests/init.pp:260
2013-11-21 09:36:31 ERR
Could not find a suitable provider for nova_floating_range
2013-11-21 09:36:22 ERR
(/Stage[main]/Galera/Exec[wait-for-synced-state]/returns) change from notrun to 0 failed: /usr/bin/mysql -Nbe "show status like 'wsrep_local_state_comment'" | /bin/grep -q Synced && sleep 10 returned 1 instead of one of [0] at /etc/puppet/modules/galera/manifests/init.pp:260
2013-11-21 09:25:56 ERR
Could not find a suitable provider for nova_floating_range
2013-11-21 09:25:52 ERR
(/Stage[main]/Glance::Registry/Exec[glance-manage db_sync]) Failed to call refresh: glance-manage db_sync returned 1 instead of one of [0] at /etc/puppet/modules/glance/manifests/registry.pp:141
2013-11-21 09:25:42 ERR
(/Stage[main]/Galera/Exec[wait-for-synced-state]/returns) change from notrun to 0 failed: /usr/bin/mysql -Nbe "show status like 'wsrep_local_state_comment'" | /bin/grep -q Synced && sleep 10 returned 1 instead of one of [0] at /etc/puppet/modules/galera/manifests/init.pp:260
2013-11-21 09:20:36 ERR
(/Stage[main]/Galera/Exec[wait-initial-sync]) Failed to call refresh: /usr/bin/mysql -Nbe "show status like 'wsrep_local_state_comment'" | /bin/grep -q -e Synced -e Initialized && sleep 10 returned 1 instead of one of [0] at /etc/puppet/modules/galera/manifests/init.pp:249
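
The failing Galera Exec resources in the log above all gate on `wsrep_local_state_comment` reporting `Synced` (or `Initialized` for the initial check). For reproducing that wait outside Puppet, a minimal sketch of the same check with bounded retries might look like the following. The function names, retry count, and delay are hypothetical and not taken from fuel-library; only the mysql command line comes from the log.

```python
import subprocess
import time
from typing import Callable, Iterable

# Same query the Puppet Exec resources run (copied from the log above).
MYSQL_CMD = ["/usr/bin/mysql", "-Nbe", "show status like 'wsrep_local_state_comment'"]


def run_mysql_query() -> str:
    """Run the status query; return stdout, or '' if mysql is missing or fails."""
    try:
        result = subprocess.run(MYSQL_CMD, capture_output=True, text=True, check=False)
    except OSError:
        return ""
    return result.stdout if result.returncode == 0 else ""


def is_synced(output: str, accepted: Iterable[str] = ("Synced",)) -> bool:
    """Return True if the reported wsrep state is one of `accepted`.

    Assumes whitespace-separated "name value" lines with no header row.
    """
    allowed = set(accepted)
    for line in output.splitlines():
        parts = line.split(None, 1)  # variable name, then the rest as the value
        if len(parts) == 2 and parts[0] == "wsrep_local_state_comment":
            return parts[1].strip() in allowed
    return False


def wait_for_synced(query: Callable[[], str] = run_mysql_query,
                    retries: int = 30,           # hypothetical default
                    delay: float = 10.0,         # hypothetical default, in seconds
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the node reports Synced or the retries are exhausted."""
    for attempt in range(retries):
        if is_synced(query()):
            return True
        if attempt < retries - 1:
            sleep(delay)
    return False
```

Note that this sketch compares the state value exactly, which is stricter than the substring match done by `grep -q Synced` in the Puppet manifest. The `query` and `sleep` parameters are injectable so the loop can be exercised without a running MySQL instance.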

Nikolay Fedotov (nfedotov) wrote on 2013-11-21:

fuel-snapshot-2013-11-21_10-53-01.tgz (6.6 MiB, application/x-tar)

Vladimir Kuklin (vkuklin) on 2013-11-21

Changed in fuel:
milestone:	none → 3.2.1
assignee:	nobody → Andrey Korolyov (xdeller)

Mike Scherbakov (mihgen) on 2013-11-25

Changed in fuel:
milestone:	3.2.1 → 4.0

Andrey Korolyov (xdeller) on 2013-11-28

Changed in fuel:
status:	New → Won't Fix

Mike Scherbakov (mihgen) on 2013-11-28

Changed in fuel:
milestone:	4.0 → 3.2.1
status:	Won't Fix → Confirmed

Andrey Korolyov (xdeller) wrote on 2013-11-28:

The problem occurs mainly on low-performance storage and is specific to Ceph deployments on virtual nodes. Closing as Won't Fix, since it only requires faster storage.

Mike Scherbakov (mihgen) wrote on 2013-11-28:

Andrey, is it fixed? Why did you close it? If it is fixed, let's provide more information.

Mike Scherbakov (mihgen) wrote on 2013-11-28:

According to Andrey K. and Vladimir Kuklin, it happens only on virtual environments, due to high load under which Galera can't sync. So it is more of a Galera issue, which is tracked separately.

Changed in fuel:
status:	Confirmed → Won't Fix

Mike Scherbakov (mihgen) on 2013-11-29

Changed in fuel:
status:	Won't Fix → Triaged

Mike Scherbakov (mihgen) on 2013-11-29

Changed in fuel:
assignee:	Andrey Korolyov (xdeller) → Alexander Didenko (adidenko)

Aleksandr Didenko (adidenko) wrote on 2013-11-29:

fuel-snapshot-2013-11-29_14-09-45.tgz (5.7 MiB, application/x-tar)

Reproduced on bare-metal (servers with Intel E3-1270 V2 @ 3.50GHz CPUs and 16G RAM)

{"release": "3.2.1", "nailgun_sha": "523b673edadb41f076284f2ca389b6724c64693c", "ostf_sha": "9d8c437198ee051f236c4874e3fd69a985317de2", "astute_sha": "dba7c2b9af67cd81c3f4564c60ca9bf0dac35d8f", "fuellib_sha": "555081c25d85250c04e568f2287768387614ed5d"}

4 hardware nodes: 3 (controller + ceph), 1 (compute + ceph)
Mode: HA
OS: CentOS
Network: Nova-Network
Ceph for images, ceph for volumes

Diagnostic snapshot attached.

Changed in fuel:
status:	Triaged → Confirmed

Aleksandr Didenko (adidenko) wrote on 2013-12-02:

The problem is intermittent. Sometimes the "crmd" daemon dies on the second or third controller node during deployment, which breaks MySQL Galera sync and leads to the puppet errors provided in the initial post.

Logs on the failed controller node:
<28>Nov 29 13:38:34 node-4 crmd[11704]: warning: do_exit: Inhibiting respawn by Heartbeat

Logs on the first controller node:
<29>Nov 29 13:38:27 node-2 crmd[4366]: notice: peer_update_callback: do_shutdown of node-4.domain.tld (op 34) is complete
<29>Nov 29 13:38:27 node-2 crmd[4366]: notice: crm_update_peer_state: crm_update_ais_node: Node node-4.domain.tld[100706496] - state is now lost
<29>Nov 29 13:38:28 node-2 crmd[4366]: notice: crm_update_peer_state: crm_update_ais_node: Node node-4.domain.tld[100706496] - state is now member
<29>Nov 29 13:38:35 node-2 crmd[4366]: notice: peer_update_callback: Stonith/shutdown of node-4.domain.tld not matched

This happens during "corosync_setup" puppet agent run stage.
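
To spot the same pattern in other snapshots, the crmd lines above can be scanned for peer state transitions. The following is a rough sketch that assumes the log lines keep the exact shape quoted in this comment. The function names and the "lost then member" flap heuristic are illustrative assumptions, not part of Pacemaker or Fuel.

```python
import re
from typing import Iterable, List, Tuple

# Matches lines such as:
# <29>Nov 29 13:38:27 node-2 crmd[4366]: notice: crm_update_peer_state:
#   crm_update_ais_node: Node node-4.domain.tld[100706496] - state is now lost
STATE_LINE = re.compile(
    r"^(?:<\d+>)?(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}) (\S+) crmd\[\d+\]:"
    r".*Node (\S+?)\[\d+\] - state is now (\w+)"
)

# (timestamp, reporting host, affected node, new state)
Change = Tuple[str, str, str, str]


def peer_state_changes(lines: Iterable[str]) -> List[Change]:
    """Extract peer state changes reported by crmd, in log order."""
    changes = []
    for line in lines:
        match = STATE_LINE.match(line.strip())
        if match:
            changes.append(match.groups())
    return changes


def find_flaps(changes: Iterable[Change]) -> List[str]:
    """Return nodes that went from 'lost' straight back to 'member'."""
    last_state = {}
    flapped = []
    for _timestamp, _reporter, node, state in changes:
        if last_state.get(node) == "lost" and state == "member" and node not in flapped:
            flapped.append(node)
        last_state[node] = state
    return flapped
```

Feeding the four node-2 lines quoted above into `peer_state_changes` and then `find_flaps` should flag `node-4.domain.tld`, since it is reported lost and then member again within a second.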

Mike Scherbakov (mihgen) on 2013-12-02

Changed in fuel:
importance:	Critical → High

Mike Scherbakov (mihgen) wrote on 2013-12-03:

Moving to 4.0. It looks like this happens rarely, and we can't keep waiting on it for 3.2.1.

Changed in fuel:
milestone:	3.2.1 → 4.0

Vladimir Kuklin (vkuklin) on 2013-12-13

Changed in fuel:
status:	Confirmed → Fix Committed

Dmitry Pyzhov (dpyzhov) on 2014-01-16

Changed in fuel:
status:	Fix Committed → Fix Released

Anastasia Palkina (apalkina) wrote on 2014-07-09:

Reproduced on ISO #107, version 5.0.1

"build_id": "2014-07-08_13-57-45",
"mirantis": "yes",
"build_number": "107",
"ostf_sha": "09b6bccf7d476771ac859bb3c76c9ebec9da9e1f",
"nailgun_sha": "c0082e3a0e8544bad7bd45c15c5dd8632ea045b5",
"production": "docker",
"api": "1.0",
"fuelmain_sha": "b0f5151d12751b9b55dcd69bd1445d0d480012d6",
"astute_sha": "a4edb51661f50c66e247e0b8d00f2d01e0658fe6",
"release": "5.0.1",
"fuellib_sha": "d4cb36208efaf51a7c0ca012fa63d596d4ee2e29"

1. Create new environment (Ubuntu, HA mode)
2. Choose Nova network, vlan manager
3. Choose Ceph for images
4. Choose Ceilometer
5. Add 3 controller+mongo, compute, cinder, 2 ceph
6. Start deployment. It completed successfully.
7. However, there is an error on the first controller (node-15) in puppet.log:

2014-07-08 15:58:29 ERR

Could not find a suitable provider for nova_floating_range
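
Error entries pasted in this report appear as a timestamp-plus-`ERR` line followed by the message on the next non-empty line. To check how often a given message recurs across a deployment, a small sketch along these lines could group them. It assumes that pasted layout, which may differ from the raw puppet.log on a node, and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple

# Header line as pasted in this report, e.g. "2014-07-08 15:58:29 ERR"
HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ERR\s*$")


def parse_errors(text: str) -> List[Tuple[str, str]]:
    """Pair each ERR header with the next non-empty line as its message."""
    entries = []
    pending_timestamp = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines may separate header and message
        header = HEADER.match(line)
        if header:
            pending_timestamp = header.group(1)
        elif pending_timestamp is not None:
            entries.append((pending_timestamp, line))
            pending_timestamp = None
    return entries


def summarize(entries: List[Tuple[str, str]]) -> Counter:
    """Count occurrences of each distinct error message."""
    return Counter(message for _timestamp, message in entries)
```

Calling `summarize(parse_errors(log_text)).most_common()` lists the messages from most to least frequent, which makes a recurring entry such as the `nova_floating_range` provider error easy to separate from one-off failures.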

Andrew Woodward (xarses) wrote on 2014-08-12:

No log bundle attached; marking as Incomplete.

From the look of it, I would think that 1536 MB of RAM and 1 vCPU is too small for the controller, ceph-monitor (implicit), and mongo roles.

Nastya Urlapova (aurlapova) wrote on 2014-08-13:

#10

Works fine for {
build_id: "2014-08-11_12-45-06",
mirantis: "yes",
build_number: "169",
ostf_sha: "09b6bccf7d476771ac859bb3c76c9ebec9da9e1f",
nailgun_sha: "04ada3cd7ef14f6741a05fd5d6690260f9198095",
production: "docker",
api: "1.0",
fuelmain_sha: "43374c706b4fdce28aeb4ef11e69a53f41646740",
astute_sha: "6db5f5031b74e67b92fcac1f7998eaa296d68025",
release: "5.0.1",
fuellib_sha: "a31dbac8fff9cf6bc4cd0d23459670e34b27a9ab"
}
