Controller + CephOS node deployment failed.

Bug #1253594 reported by Nikolay Fedotov
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Fix Released
High
Aleksandr Didenko
5.0.x
Invalid
High
Fuel Library (Deprecated)

Bug Description

ISO: {"release": "3.2.1", "nailgun_sha": "23734e71a448faaa5bbbdb3da4525665987f01c2", "ostf_sha": "c70535553616d3c2f8b0bccced54361c06f76f97", "astute_sha": "df6ddea3abc93fbe1cab9b4534d4d5e9508c95d6", "fuellib_sha": "651f943cdac9d2084000f405125d11f1a99bd22b"}

*steps*
- create environment: CentOS HA, nova flat, ceph for images, ceph for volumes
- add nodes: 3 controller + ceph, 2 compute + ceph, 1 cinder + ceph
- deploy changes

*result*
second controller + ceph node failed
puppet errors:
2013-11-21 09:46:59 ERR
 Could not find a suitable provider for nova_floating_range
2013-11-21 09:46:50 ERR
 (/Stage[main]/Galera/Exec[wait-for-synced-state]/returns) change from notrun to 0 failed: /usr/bin/mysql -Nbe "show status like 'wsrep_local_state_comment'" | /bin/grep -q Synced && sleep 10 returned 1 instead of one of [0] at /etc/puppet/modules/galera/manifests/init.pp:260
2013-11-21 09:36:31 ERR
 Could not find a suitable provider for nova_floating_range
2013-11-21 09:36:22 ERR
 (/Stage[main]/Galera/Exec[wait-for-synced-state]/returns) change from notrun to 0 failed: /usr/bin/mysql -Nbe "show status like 'wsrep_local_state_comment'" | /bin/grep -q Synced && sleep 10 returned 1 instead of one of [0] at /etc/puppet/modules/galera/manifests/init.pp:260
2013-11-21 09:25:56 ERR
 Could not find a suitable provider for nova_floating_range
2013-11-21 09:25:52 ERR
 (/Stage[main]/Glance::Registry/Exec[glance-manage db_sync]) Failed to call refresh: glance-manage db_sync returned 1 instead of one of [0] at /etc/puppet/modules/glance/manifests/registry.pp:141
2013-11-21 09:25:42 ERR
 (/Stage[main]/Galera/Exec[wait-for-synced-state]/returns) change from notrun to 0 failed: /usr/bin/mysql -Nbe "show status like 'wsrep_local_state_comment'" | /bin/grep -q Synced && sleep 10 returned 1 instead of one of [0] at /etc/puppet/modules/galera/manifests/init.pp:260
2013-11-21 09:20:36 ERR
 (/Stage[main]/Galera/Exec[wait-initial-sync]) Failed to call refresh: /usr/bin/mysql -Nbe "show status like 'wsrep_local_state_comment'" | /bin/grep -q -e Synced -e Initialized && sleep 10 returned 1 instead of one of [0] at /etc/puppet/modules/galera/manifests/init.pp:249
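Both Galera `Exec` resources in the log run the same kind of shell check. They ask MySQL for `wsrep_local_state_comment` and `grep` the answer for an accepted state: `wait-initial-sync` accepts `Synced` or `Initialized`, while `wait-for-synced-state` accepts only `Synced`. The following is a minimal Python sketch of that check wrapped in a bounded retry loop. The function names, the retry count and the delay are assumptions for illustration, not values taken from `galera/manifests/init.pp`. The MySQL query is replaced by an injectable callable so the logic can be exercised without a database, and the trailing `sleep 10` after a successful match is left out.

```python
import time
from typing import Callable, Iterable


def state_matches(status_output: str, accepted: Iterable[str]) -> bool:
    """Mimic `grep -q -e A -e B`: true if any line contains any accepted pattern.

    status_output stands for what
    `mysql -Nbe "show status like 'wsrep_local_state_comment'"` prints,
    e.g. "wsrep_local_state_comment\tSynced". Matching is case-sensitive,
    like grep, so "Donor/Desynced" does not count as "Synced".
    """
    patterns = list(accepted)
    return any(p in line for line in status_output.splitlines() for p in patterns)


def wait_for_state(
    query: Callable[[], str],
    accepted: Iterable[str],
    tries: int = 30,        # hypothetical retry budget
    delay: float = 10.0,    # hypothetical pause between attempts, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the wsrep state matches, or give up after `tries` attempts."""
    patterns = list(accepted)
    for attempt in range(tries):
        if state_matches(query(), patterns):
            return True
        if attempt < tries - 1:
            sleep(delay)  # no pause after the final failed attempt
    return False


if __name__ == "__main__":
    # Simulated node that is still joining for two polls, then reports Synced.
    replies = iter([
        "wsrep_local_state_comment\tJoining",
        "wsrep_local_state_comment\tJoined",
        "wsrep_local_state_comment\tSynced",
    ])
    print(wait_for_state(lambda: next(replies), ["Synced"], tries=5, sleep=lambda s: None))
```

If the node never reaches an accepted state within the retry budget, the loop returns `False`. That roughly corresponds to the `returned 1 instead of one of [0]` failures above, which appear when a controller cannot join the Galera cluster in time.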

Tags: library
Nikolay Fedotov (nfedotov) wrote :
Changed in fuel:
milestone: none → 3.2.1
assignee: nobody → Andrey Korolyov (xdeller)
Mike Scherbakov (mihgen)
Changed in fuel:
milestone: 3.2.1 → 4.0
Changed in fuel:
status: New → Won't Fix
Mike Scherbakov (mihgen)
Changed in fuel:
milestone: 4.0 → 3.2.1
status: Won't Fix → Confirmed
Andrey Korolyov (xdeller) wrote :

The problem occurs mainly on low-performance storage and is specific to Ceph deployments on virtual nodes. Closing as Won't Fix, since it only requires faster storage.

Mike Scherbakov (mihgen) wrote :

Andrey, is it fixed? Why did you close it? If it is fixed, let's provide more information.

Mike Scherbakov (mihgen) wrote :

According to Andrey K. and Vladimir Kuklin, it happens only in virtual environments because of high load, under which Galera cannot sync. So it is more of a Galera issue, which is tracked separately.

Changed in fuel:
status: Confirmed → Won't Fix
Mike Scherbakov (mihgen)
Changed in fuel:
status: Won't Fix → Triaged
Mike Scherbakov (mihgen)
Changed in fuel:
assignee: Andrey Korolyov (xdeller) → Alexander Didenko (adidenko)
Aleksandr Didenko (adidenko) wrote :

Reproduced on bare metal (servers with Intel E3-1270 V2 @ 3.50GHz CPUs and 16 GB RAM).

{"release": "3.2.1", "nailgun_sha": "523b673edadb41f076284f2ca389b6724c64693c", "ostf_sha": "9d8c437198ee051f236c4874e3fd69a985317de2", "astute_sha": "dba7c2b9af67cd81c3f4564c60ca9bf0dac35d8f", "fuellib_sha": "555081c25d85250c04e568f2287768387614ed5d"}

4 hardware nodes: 3 (controller + ceph), 1 (compute + ceph)
Mode: HA
OS: CentOS
Network: Nova-Network
Ceph for images, ceph for volumes

Diagnostic snapshot attached.

Changed in fuel:
status: Triaged → Confirmed
Aleksandr Didenko (adidenko) wrote :

The problem is intermittent. Sometimes the "crmd" daemon dies on the 2nd or 3rd controller node during deployment, which breaks MySQL Galera sync and leads to the Puppet errors shown in the initial post.

Logs on the failed controller node:
<28>Nov 29 13:38:34 node-4 crmd[11704]: warning: do_exit: Inhibiting respawn by Heartbeat

Logs on the first controller node:
<29>Nov 29 13:38:27 node-2 crmd[4366]: notice: peer_update_callback: do_shutdown of node-4.domain.tld (op 34) is complete
<29>Nov 29 13:38:27 node-2 crmd[4366]: notice: crm_update_peer_state: crm_update_ais_node: Node node-4.domain.tld[100706496] - state is now lost
<29>Nov 29 13:38:28 node-2 crmd[4366]: notice: crm_update_peer_state: crm_update_ais_node: Node node-4.domain.tld[100706496] - state is now member
<29>Nov 29 13:38:35 node-2 crmd[4366]: notice: peer_update_callback: Stonith/shutdown of node-4.domain.tld not matched

This happens during the "corosync_setup" stage of the Puppet agent run.
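The node-2 log shows node-4 dropping to `lost` and rejoining as `member` within about a second. When scanning a diagnostic snapshot for the same symptom, a small Python sketch like the one below can extract those transitions from the `crmd` notices. The regular expression is fitted only to the `crm_update_peer_state` message format quoted above, and `peer_transitions` and `flapping_peers` are hypothetical helper names, not part of Pacemaker or Fuel.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Fitted to crm_update_peer_state notices such as
# "... crm_update_ais_node: Node node-4.domain.tld[100706496] - state is now lost"
PEER_STATE = re.compile(r"Node (?P<node>\S+)\[\d+\] - state is now (?P<state>\w+)")


def peer_transitions(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Return, per peer, the membership states in the order they were logged."""
    history: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        match = PEER_STATE.search(line)
        if match:
            history[match.group("node")].append(match.group("state"))
    return dict(history)


def flapping_peers(lines: Iterable[str]) -> List[str]:
    """Return peers reported as 'lost' and later reported as 'member' again."""
    flapping = []
    for node, states in peer_transitions(lines).items():
        if "lost" in states and "member" in states[states.index("lost") + 1:]:
            flapping.append(node)
    return flapping


if __name__ == "__main__":
    node2_log = [
        "<29>Nov 29 13:38:27 node-2 crmd[4366]: notice: peer_update_callback: "
        "do_shutdown of node-4.domain.tld (op 34) is complete",
        "<29>Nov 29 13:38:27 node-2 crmd[4366]: notice: crm_update_peer_state: "
        "crm_update_ais_node: Node node-4.domain.tld[100706496] - state is now lost",
        "<29>Nov 29 13:38:28 node-2 crmd[4366]: notice: crm_update_peer_state: "
        "crm_update_ais_node: Node node-4.domain.tld[100706496] - state is now member",
    ]
    print(peer_transitions(node2_log))  # {'node-4.domain.tld': ['lost', 'member']}
    print(flapping_peers(node2_log))    # ['node-4.domain.tld']
```

A `lost` followed by `member` for the same peer is the pattern to look for. It does not explain why `crmd` exited on node-4; that still requires the node-4 logs.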

Mike Scherbakov (mihgen)
Changed in fuel:
importance: Critical → High
Mike Scherbakov (mihgen) wrote :

Moving to 4.0. It looks like this happens rarely, and we can't keep holding it in 3.2.1 any longer.

Changed in fuel:
milestone: 3.2.1 → 4.0
Changed in fuel:
status: Confirmed → Fix Committed
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
status: Fix Committed → Fix Released
Anastasia Palkina (apalkina) wrote :

Reproduced on ISO #107, version 5.0.1

{
"build_id": "2014-07-08_13-57-45",
"mirantis": "yes",
"build_number": "107",
"ostf_sha": "09b6bccf7d476771ac859bb3c76c9ebec9da9e1f",
"nailgun_sha": "c0082e3a0e8544bad7bd45c15c5dd8632ea045b5",
"production": "docker",
"api": "1.0",
"fuelmain_sha": "b0f5151d12751b9b55dcd69bd1445d0d480012d6",
"astute_sha": "a4edb51661f50c66e247e0b8d00f2d01e0658fe6",
"release": "5.0.1",
"fuellib_sha": "d4cb36208efaf51a7c0ca012fa63d596d4ee2e29"
}

1. Create new environment (Ubuntu, HA mode)
2. Choose Nova network, vlan manager
3. Choose Ceph for images
4. Choose Ceilometer
5. Add 3 controller+mongo, compute, cinder, 2 ceph
6. Start deployment. It was successful.
7. However, there is an error on the first controller (node-15) in puppet.log:

2014-07-08 15:58:29 ERR

 Could not find a suitable provider for nova_floating_range

Andrew Woodward (xarses) wrote :

No log bundle attached. Incomplete.

From appearances, I would think that 1536 MB of RAM and 1 vCPU are too small for the controller, ceph-monitor (implicit) and mongo roles.

Nastya Urlapova (aurlapova) wrote :

Works fine for {
build_id: "2014-08-11_12-45-06",
mirantis: "yes",
build_number: "169",
ostf_sha: "09b6bccf7d476771ac859bb3c76c9ebec9da9e1f",
nailgun_sha: "04ada3cd7ef14f6741a05fd5d6690260f9198095",
production: "docker",
api: "1.0",
fuelmain_sha: "43374c706b4fdce28aeb4ef11e69a53f41646740",
astute_sha: "6db5f5031b74e67b92fcac1f7998eaa296d68025",
release: "5.0.1",
fuellib_sha: "a31dbac8fff9cf6bc4cd0d23459670e34b27a9ab"
}
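Build #107 (failing) and build #169 (working) share the same `release` and `ostf_sha`, so diffing the two version blocks narrows down which components changed between them. Below is a short Python sketch that assumes the blocks have been loaded as plain dictionaries. `changed_fields` is a hypothetical helper, and a changed SHA only marks a candidate for closer inspection, not the fix itself.

```python
from typing import Dict, Optional, Tuple

# Version info of the failing build #107 and the working build #169,
# copied from the two comments above.
BUILD_107 = {
    "build_id": "2014-07-08_13-57-45",
    "mirantis": "yes",
    "build_number": "107",
    "ostf_sha": "09b6bccf7d476771ac859bb3c76c9ebec9da9e1f",
    "nailgun_sha": "c0082e3a0e8544bad7bd45c15c5dd8632ea045b5",
    "production": "docker",
    "api": "1.0",
    "fuelmain_sha": "b0f5151d12751b9b55dcd69bd1445d0d480012d6",
    "astute_sha": "a4edb51661f50c66e247e0b8d00f2d01e0658fe6",
    "release": "5.0.1",
    "fuellib_sha": "d4cb36208efaf51a7c0ca012fa63d596d4ee2e29",
}
BUILD_169 = {
    "build_id": "2014-08-11_12-45-06",
    "mirantis": "yes",
    "build_number": "169",
    "ostf_sha": "09b6bccf7d476771ac859bb3c76c9ebec9da9e1f",
    "nailgun_sha": "04ada3cd7ef14f6741a05fd5d6690260f9198095",
    "production": "docker",
    "api": "1.0",
    "fuelmain_sha": "43374c706b4fdce28aeb4ef11e69a53f41646740",
    "astute_sha": "6db5f5031b74e67b92fcac1f7998eaa296d68025",
    "release": "5.0.1",
    "fuellib_sha": "a31dbac8fff9cf6bc4cd0d23459670e34b27a9ab",
}


def changed_fields(
    old: Dict[str, str], new: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


if __name__ == "__main__":
    for key, (before, after) in changed_fields(BUILD_107, BUILD_169).items():
        print(f"{key}: {before} -> {after}")
```

For these two builds it reports six differing keys: `build_id` and `build_number`, plus `astute_sha`, `fuellib_sha`, `fuelmain_sha` and `nailgun_sha`.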
