Swift-ring-builder balance is more than 100.00 after adding two new controllers to cluster

Bug #1409783 reported by Andrey Sledzinskiy
Affects              Status         Importance   Assigned to                  Milestone
Fuel for OpenStack   Fix Released   High         Andrey Sledzinskiy
6.0.x                Won't Fix      High         Fuel Library (Deprecated)

Bug Description

{

    "build_id": "2015-01-05_11-20-53",
    "ostf_sha": "249574cdda0279dc8ec4957a5979651439476e8a",
    "build_number": "44",
    "auth_required": true,
    "api": "1.0",
    "nailgun_sha": "4b325a95b0217a26a17f526cb734b3748cb03e12",
    "production": "docker",
    "fuelmain_sha": "cc47eef01622b8fdf2d8f290cb8dfb46738dc7f5",
    "astute_sha": "18be5cd3b819f3cad4c970ce5f72d3fb211a0969",
    "feature_groups": [
        "mirantis"
    ],
    "release": "6.1",
    "release_versions": {
        "2014.2-6.0": {
            "VERSION": {
                "build_id": "2015-01-05_11-20-53",
                "ostf_sha": "249574cdda0279dc8ec4957a5979651439476e8a",
                "build_number": "44",
                "api": "1.0",
                "nailgun_sha": "4b325a95b0217a26a17f526cb734b3748cb03e12",
                "production": "docker",
                "fuelmain_sha": "cc47eef01622b8fdf2d8f290cb8dfb46738dc7f5",
                "astute_sha": "18be5cd3b819f3cad4c970ce5f72d3fb211a0969",
                "feature_groups": [
                    "mirantis"
                ],
                "release": "6.1",
                "fuellib_sha": "42df19509c40e2cdc9ede9d89b42188ea27c1b7e"
            }
        }
    },
    "fuellib_sha": "42df19509c40e2cdc9ede9d89b42188ea27c1b7e"

}

Steps:
1. Create the following cluster: 1 controller, Flat nova-network, HA, Ubuntu
2. Deploy cluster
3. Check swift-ring-builder health with
swift-ring-builder /etc/swift/object.builder
512 partitions, 3.000000 replicas, 1 regions, 1 zones, 2 devices, 0.00 balance
The minimum number of hours before a partition can be reassigned is 1
Devices:  id  region  zone  ip address   port  replication ip  replication port  name  weight  partitions  balance  meta
           0       1     2  10.108.4.2   6000  10.108.4.2                  6000     1    1.00         768     0.00
           1       1     2  10.108.4.2   6000  10.108.4.2                  6000     2    1.00         768     0.00
4. Add 2 controllers
5. Re-deploy cluster
6. After re-deployment, check swift-ring-builder health again with
swift-ring-builder /etc/swift/object.builder

Expected - balance is less than 100.00
Actual result - balance is 105.47
512 partitions, 3.000000 replicas, 1 regions, 3 zones, 6 devices, 105.47 balance
The minimum number of hours before a partition can be reassigned is 1
Devices:  id  region  zone  ip address   port  replication ip  replication port  name  weight  partitions  balance  meta
           0       1     2  10.108.4.2   6000  10.108.4.2                  6000     1    1.00         498    94.53
           1       1     2  10.108.4.2   6000  10.108.4.2                  6000     2    1.00         526   105.47
           2       1     3  10.108.4.4   6000  10.108.4.4                  6000     2    1.00         128   -50.00
           3       1     1  10.108.4.3   6000  10.108.4.3                  6000     2    1.00         128   -50.00
           4       1     1  10.108.4.3   6000  10.108.4.3                  6000     1    1.00         128   -50.00
           5       1     3  10.108.4.4   6000  10.108.4.4                  6000     1    1.00         128   -50.00
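
For readers unfamiliar with the balance column, the figures above can be reproduced with a small sketch. It assumes the balance of a device is its percentage deviation from its weight-proportional share of all partition replicas, and that the ring balance is the largest absolute deviation. The function name `device_balances` is made up for illustration and is not part of Swift.

```python
# Partition counts per device id, copied from the "Actual result" output.
partitions = {0: 498, 1: 526, 2: 128, 3: 128, 4: 128, 5: 128}
# All devices in the report have weight 1.00.
weights = {dev: 1.0 for dev in partitions}


def device_balances(partitions, weights):
    """Return each device's percent deviation from its desired share.

    Assumption: desired share = total replicas * weight / total weight.
    """
    total_parts = sum(partitions.values())
    total_weight = sum(weights.values())
    balances = {}
    for dev, parts in partitions.items():
        desired = total_parts * weights[dev] / total_weight
        balances[dev] = 100.0 * parts / desired - 100.0
    return balances


balances = device_balances(partitions, weights)
# Assumption: the ring balance is the worst per-device deviation.
ring_balance = max(abs(b) for b in balances.values())

for dev, balance in sorted(balances.items()):
    print(f"device {dev}: {balance:.2f}")
print(f"ring balance: {ring_balance:.2f}")
```

With 1536 replicas and 6 equal devices the desired share is 256 each, so the two original devices hold roughly twice their share while the four new ones hold half. The new devices hold 4 x 128 = 512 replicas in total, which equals the partition count and would be consistent with one replica of each partition having been moved so far.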

Logs are attached

Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :
Changed in fuel:
status: New → Confirmed
assignee: Fuel Library Team (fuel-library) → Aleksandr Didenko (adidenko)
Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

Discussion of the solutions for this bug:
https://etherpad.openstack.org/p/swift_ring_rebalance_problem

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/167726

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Aleksandr Didenko (<email address hidden>) on branch: master
Review: https://review.openstack.org/147865
Reason: We decided to go with another solution:
https://review.openstack.org/#/c/167726/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/167726
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=8b9e7ece48804ed700e733e57d3434c1fcf1dff2
Submitter: Jenkins
Branch: master

commit 8b9e7ece48804ed700e733e57d3434c1fcf1dff2
Author: Aleksandr Didenko <email address hidden>
Date: Wed Mar 25 19:11:12 2015 +0200

    Setup swift rings rebalance and repush cronjobs

    We should try to make sure the balance is 0, because only 0 balance
    means that all the devices got exact amount of partitions they
    wanted to and our ring is balanced.

    In order to reduce network load we setup cronjobs in a separate
    deployment task. Those cronjobs will rebalance rings periodically
    and rsync them from primary controller to secondaries.

    DocImpact

    Change-Id: I141c7cb581da2da0ef1b47cefaf5b9c485509ecd
    Closes-bug: #1409783

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Anastasia Palkina (apalkina) wrote :

Verified on ISO #304

"build_id": "2015-04-10_22-54-31", "ostf_sha": "c2a76a60ec4ebbd78e508216c2e12787bf25e423", "build_number": "304", "release_versions": {"2014.2-6.1": {"VERSION": {"build_id": "2015-04-10_22-54-31", "ostf_sha": "c2a76a60ec4ebbd78e508216c2e12787bf25e423", "build_number": "304", "api": "1.0", "nailgun_sha": "69547a71abb4696df7e6f44b1f7864b0535f2df7", "openstack_version": "2014.2-6.1", "production": "docker", "python-fuelclient_sha": "9208ff4a08dcb674ce2df132399a5aa3ddfac21c", "astute_sha": "d96a80b63198a578b2c159edbd76048819039eb0", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "8daac234aea6ac0a98f27871deec039f74f6fdab", "fuellib_sha": "867028fe78837dc2e4635a2cbb976782856964d0"}}}, "auth_required": true, "api": "1.0", "nailgun_sha": "69547a71abb4696df7e6f44b1f7864b0535f2df7", "openstack_version": "2014.2-6.1", "production": "docker", "python-fuelclient_sha": "9208ff4a08dcb674ce2df132399a5aa3ddfac21c", "astute_sha": "d96a80b63198a578b2c159edbd76048819039eb0", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "8daac234aea6ac0a98f27871deec039f74f6fdab", "fuellib_sha": "867028fe78837dc2e4635a2cbb976782856964d0"

With 1 controller

root@node-5:~# swift-ring-builder /etc/swift/object.builder
/etc/swift/object.builder, build version 2
512 partitions, 3.000000 replicas, 1 regions, 1 zones, 2 devices, 0.00 balance
The minimum number of hours before a partition can be reassigned is 1
Devices:  id  region  zone  ip address   port  replication ip  replication port  name  weight  partitions  balance  meta
           0       1     5  192.168.1.2  6000  192.168.1.2                 6000     2    1.00         768     0.00
           1       1     5  192.168.1.2  6000  192.168.1.2                 6000     1    1.00         768     0.00
root@node-5:~# exit

After adding 2 controllers and redeployment

root@node-5:~# swift-ring-builder /etc/swift/object.builder
/etc/swift/object.builder, build version 8
512 partitions, 3.000000 replicas, 1 regions, 3 zones, 6 devices, 0.00 balance
The minimum number of hours before a partition can be reassigned is 1
Devices:  id  region  zone  ip address   port  replication ip  replication port  name  weight  partitions  balance  meta
           0       1     5  192.168.1.2  6000  192.168.1.2                 6000     2    1.00         256     0.00
           1       1     5  192.168.1.2  6000  192.168.1.2                 6000     1    1.00         256     0.00
           2       1     9  192.168.1.6  6000  192.168.1.6                 6000     2    1.00         256     0.00
           3       1    10  192.168.1.7  6000  192.168.1.7                 6000     1    1.00         256     0.00
           4       1     9  192.168.1.6  6000  192.168.1.6                 6000     1    1.00         256     0.00
           5       1    10  192.168.1.7  6000  192.168.1.7                 6000     2    1.00         256     0.00
root@node-5:~#

Changed in fuel:
status: Fix Committed → Fix Released
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

Reproduced on Ubuntu swarm:
http://jenkins-product.srt.mirantis.net:8080/job/6.1.system_test.ubuntu.thread_4/92/testReport/%28root%29/ha_flat_scalability/ha_flat_scalability/?
[root@nailgun log]# cat /etc/fuel/version.yaml
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  openstack_version: "2014.2-6.1"
  api: "1.0"
  build_number: "310"
  build_id: "2015-04-13_22-54-31"
  nailgun_sha: "d22c074dec091e5ddd8ea3003c37665058303cd5"
  python-fuelclient_sha: "9208ff4a08dcb674ce2df132399a5aa3ddfac21c"
  astute_sha: "d96a80b63198a578b2c159edbd76048819039eb0"
  fuellib_sha: "8b80657e9ceed8d59c2dff1c11e1481c7e69380e"
  ostf_sha: "c2a76a60ec4ebbd78e508216c2e12787bf25e423"
  fuelmain_sha: "335d3ed09ed79bd37e1f7a90442c4831c8845582"
Check HA mode on scalability

Scenario:
1. Create cluster
2. Add 1 controller node
3. Deploy the cluster
4. Add 2 controller nodes
5. Deploy changes
6. Run network verification
7. Add 2 controller nodes
8. Deploy changes
9. Run network verification
10. Run OSTF

Changed in fuel:
status: Fix Released → Confirmed
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :
Revision history for this message
Aleksandr Didenko (adidenko) wrote :

We should modify the system tests to check that the ring balance is less than 10 instead of checking that it is equal to 0.
Under some circumstances the ring balance can be greater than 0.00, and that is OK. For example, with 512 partitions and 3 replicas on 10 devices, there are 1536 partition replicas in total that should be spread equally among the 10 devices. That is not possible, since 1536 / 10 = 153.6. Some devices will get 154 partitions and others 153 or 152, so in the end the ring will have a non-zero balance.
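
The arithmetic in this comment can be checked with a short sketch. It assumes equally weighted devices and the most even integer split possible, which the real ring builder may not reach exactly (the comment mentions devices with 152 partitions). `best_case_balance` is a hypothetical helper, not a Swift function.

```python
def best_case_balance(part_count, replicas, devices):
    """Smallest achievable balance (percent) for equally weighted devices.

    Assumption: balance is the largest absolute percent deviation of any
    device from the ideal share of total replicas / devices.
    """
    total = part_count * replicas
    desired = total / devices
    # Most even integer split: `extra` devices get one more than `base`.
    base, extra = divmod(total, devices)
    counts = [base + 1] * extra + [base] * (devices - extra)
    return max(abs(100.0 * count / desired - 100.0) for count in counts)


# 512 partitions, 3 replicas, 10 devices: 1536 / 10 = 153.6 per device.
print(f"{best_case_balance(512, 3, 10):.2f}")
# 512 partitions, 3 replicas, 6 devices: 1536 / 6 = 256 exactly.
print(f"{best_case_balance(512, 3, 6):.2f}")
```

Under these assumptions the 10-device case cannot go below roughly 0.39, so a test that requires exactly 0.00 would fail on a healthy ring, while a threshold of 10 leaves ample margin.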

Changed in fuel:
status: Confirmed → Triaged
assignee: Aleksandr Didenko (adidenko) → Andrey Sledzinskiy (asledzinskiy)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-qa (master)

Fix proposed to branch: master
Review: https://review.openstack.org/173789

Changed in fuel:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-qa (master)

Reviewed: https://review.openstack.org/173789
Committed: https://git.openstack.org/cgit/stackforge/fuel-qa/commit/?id=a7848a46c7eeacbd46ce76aaa2e65a8bb5acf367
Submitter: Jenkins
Branch: master

commit a7848a46c7eeacbd46ce76aaa2e65a8bb5acf367
Author: asledzinskiy <email address hidden>
Date: Wed Apr 15 15:52:13 2015 +0300

    Increase expected swift balance to 10

    - Under some circumstances we can have ring balance greater than 0.00
    and it's ok so expected number was changed to 10

    Change-Id: I03f9c8831869e2d6aa77d46806da6d6b6f1c0c48
    Closes-Bug: #1409783
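
As an illustration of the relaxed check described in this commit, here is a minimal sketch that reads the balance from the summary line of `swift-ring-builder` output and compares it with a threshold. It is not the fuel-qa implementation; `parse_ring_balance` and `ring_is_acceptable` are hypothetical names, and the sample lines are copied from this report.

```python
import re

# Matches the tail of the summary line, e.g. "6 devices, 105.47 balance".
SUMMARY_RE = re.compile(r"(\d+) devices, ([\d.]+) balance")


def parse_ring_balance(output):
    """Return the ring balance found in swift-ring-builder output text."""
    for line in output.splitlines():
        match = SUMMARY_RE.search(line)
        if match:
            return float(match.group(2))
    raise ValueError("no summary line with a balance value found")


def ring_is_acceptable(output, max_balance=10.0):
    """True when the reported balance is below the threshold (10 by default)."""
    return parse_ring_balance(output) < max_balance


before_fix = (
    "512 partitions, 3.000000 replicas, 1 regions, 3 zones, "
    "6 devices, 105.47 balance"
)
after_fix = (
    "512 partitions, 3.000000 replicas, 1 regions, 3 zones, "
    "6 devices, 0.00 balance"
)
print(ring_is_acceptable(before_fix))
print(ring_is_acceptable(after_fix))
```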

Changed in fuel:
status: In Progress → Fix Committed
Changed in fuel:
status: Fix Committed → Fix Released