Sahara

[Spark] Failed to scale cluster

Bug #1376228 reported by Yaroslav Lobankov on 2014-10-01

Affects  Status        Importance  Assigned to    Milestone
Sahara   Fix Released  High        Andrey Pavlov  Sahara 2015.1.0 "kilo"

Bug Description

Environment:

Devstack with Heat and Neutron.

How to reproduce:

1. Create three node group templates. For example,

{
    "name": "slave-datanode",
    "description": "Test template for worker node",
    "flavor_id": "2",
    "plugin_name": "spark",
    "hadoop_version": "1.0.0",
    "node_processes": ["slave", "datanode"],
    "floating_ip_pool": "<ID of public network>"
}

{
    "name": "slave",
    "description": "Test template for worker node",
    "flavor_id": "2",
    "plugin_name": "spark",
    "hadoop_version": "1.0.0",
    "node_processes": ["slave"],
    "floating_ip_pool": "<ID of public network>"
}

{
    "name": "datanode",
    "description": "Test template for worker node",
    "flavor_id": "2",
    "plugin_name": "spark",
    "hadoop_version": "1.0.0",
    "node_processes": ["datanode"],
    "floating_ip_pool": "<ID of public network>"
}
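The three templates differ only in name and process list, so they can be generated rather than written by hand. The following is a minimal sketch, not part of the original report: the helper name `node_group_template` is hypothetical, and the placeholder for the floating IP pool is left unfilled exactly as above.

```python
import json

# Placeholder kept from the report; substitute the real network ID.
PUBLIC_NETWORK_ID = "<ID of public network>"


def node_group_template(name, node_processes):
    """Build one node group template body (hypothetical helper)."""
    return {
        "name": name,
        "description": "Test template for worker node",
        "flavor_id": "2",
        "plugin_name": "spark",
        "hadoop_version": "1.0.0",
        "node_processes": list(node_processes),
        "floating_ip_pool": PUBLIC_NETWORK_ID,
    }


# The same three templates as in step 1.
templates = [
    node_group_template("slave-datanode", ["slave", "datanode"]),
    node_group_template("slave", ["slave"]),
    node_group_template("datanode", ["datanode"]),
]

if __name__ == "__main__":
    for body in templates:
        print(json.dumps(body, indent=4))
```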

2. Create a cluster template. For example,

{
    "name": "some-cluster-template",
    "description": "Test cluster template",
    "plugin_name": "spark",
    "hadoop_version": "1.0.0",
    "cluster_configs": {
      "HDFS": {
        "dfs.replication": 1
      }
    },
    "node_groups": [
        {
            "name": "master-node",
            "flavor_id": "2",
            "node_processes": ["master", "namenode"],
            "floating_ip_pool": "<ID of public network>",
            "count": 1
        },
        {
            "name": "worker-node-1",
            "node_group_template_id": "<ID of the first node group template>",
            "count": 1
        },
        {
            "name": "worker-node-2",
            "node_group_template_id": "<ID of the second node group template>",
            "count": 1
        },
        {
            "name": "worker-node-3",
            "node_group_template_id": "<ID of the third node group template>",
            "count": 1
        }
    ],
    "neutron_management_network": "<ID of private network>"
}

3. Create a cluster. For example,

{
    "name": "spark-cluster",
    "plugin_name": "spark",
    "hadoop_version": "1.0.0",
    "cluster_template_id": "<ID of the cluster template>",
    "default_image_id": "<ID of an image>",
    "user_keypair_id": "<your key pair name>",
    "description": "Test cluster by Spark plugin"
}

4. Wait for the cluster to reach "Active" status.
5. Try to scale the cluster. For example,

{
  "resize_node_groups": [
      {
        "name": "worker-node-2",
        "count": 0
      },
      {
        "name": "worker-node-3",
        "count": 0
      }
  ],
  "add_node_groups": [
    {
      "node_group_template_id": "<ID of the second node group template>",
      "count": 1,
      "name": "new-worker-node-2"
    },
    {
      "node_group_template_id": "<ID of the third node group template>",
      "count": 1,
      "name": "new-worker-node-3"
    }
  ]
}
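To make the intent of this request easier to see, the sketch below applies it to the node groups from steps 1 to 3 and compares the per-process instance counts before and after. It is an illustration only and does not reproduce Sahara's own logic: `apply_scaling`, `process_counts`, and the `templates` lookup are hypothetical names, and no API call is made.

```python
from collections import Counter

# Node groups of the cluster from step 3: name -> (processes, count).
cluster = {
    "master-node": (["master", "namenode"], 1),
    "worker-node-1": (["slave", "datanode"], 1),
    "worker-node-2": (["slave"], 1),
    "worker-node-3": (["datanode"], 1),
}

# Hypothetical stand-in for resolving node_group_template_id values.
templates = {
    "<ID of the second node group template>": ["slave"],
    "<ID of the third node group template>": ["datanode"],
}

scale_request = {
    "resize_node_groups": [
        {"name": "worker-node-2", "count": 0},
        {"name": "worker-node-3", "count": 0},
    ],
    "add_node_groups": [
        {"node_group_template_id": "<ID of the second node group template>",
         "count": 1, "name": "new-worker-node-2"},
        {"node_group_template_id": "<ID of the third node group template>",
         "count": 1, "name": "new-worker-node-3"},
    ],
}


def apply_scaling(cluster, request, templates):
    """Return the node groups that the request asks for (no side effects)."""
    result = dict(cluster)
    for group in request.get("resize_node_groups", []):
        processes, _ = result[group["name"]]
        result[group["name"]] = (processes, group["count"])
    for group in request.get("add_node_groups", []):
        processes = templates[group["node_group_template_id"]]
        result[group["name"]] = (processes, group["count"])
    return result


def process_counts(cluster):
    """Count running instances of each process across all node groups."""
    counts = Counter()
    for processes, count in cluster.values():
        for process in processes:
            counts[process] += count
    return dict(counts)


after = apply_scaling(cluster, scale_request, templates)

if __name__ == "__main__":
    print("before:", process_counts(cluster))
    print("after: ", process_counts(after))
```

Under these assumptions the request removes one "slave" instance and one "datanode" instance and adds one of each back, so the totals per process are unchanged while two instances have to be decommissioned.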

Expected result:

The cluster is scaled successfully.

Observed result:

The cluster hangs in "Decommissioning" status.
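When reproducing this from a script, a bounded wait turns the hang into an explicit failure. The following is a minimal sketch, assuming the caller supplies a `get_status` callable that returns the current cluster status string; `wait_for_status`, its parameters, and the default timeout are hypothetical and not taken from Sahara.

```python
import time


def wait_for_status(get_status, target="Active", timeout=600.0, interval=5.0):
    """Poll get_status() until it returns target or timeout seconds pass.

    get_status is a caller-supplied callable (an assumption of this sketch),
    for example a function that fetches the cluster and returns its status.
    """
    deadline = time.monotonic() + timeout
    status = get_status()
    while status != target:
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "cluster still in %r after %s seconds" % (status, timeout)
            )
        time.sleep(interval)
        status = get_status()
    return status
```

With the behaviour described in this report, a call made after step 5 would be expected to raise `TimeoutError` with the last status seen, "Decommissioning", instead of blocking indefinitely.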

Sergey Reshetnyak (sreshetniak) on 2014-10-08

Changed in sahara:
status:	New → Confirmed
importance:	Undecided → High
milestone:	none → kilo-1

Sergey Lukjanov (slukjanov) on 2014-12-12

Changed in sahara:
milestone:	kilo-1 → kilo-2

Sergey Lukjanov (slukjanov) on 2015-01-26

Changed in sahara:
milestone:	kilo-2 → kilo-3

Andrey Pavlov (apavlov-n) on 2015-02-06

Changed in sahara:
assignee:	nobody → Andrey Pavlov (apavlov-n)

OpenStack Infra (hudson-openstack) wrote on 2015-02-09: Fix proposed to sahara (master)

Fix proposed to branch: master
Review: https://review.openstack.org/153993

Changed in sahara:
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2015-02-12: Fix merged to sahara (master)

Reviewed: https://review.openstack.org/153993
Committed: https://git.openstack.org/cgit/openstack/sahara/commit/?id=b94a0570192af332d00bbe0ad57e26375d3490ac
Submitter: Jenkins
Branch: master

commit b94a0570192af332d00bbe0ad57e26375d3490ac
Author: Andrey Pavlov <email address hidden>
Date: Mon Feb 9 14:10:06 2015 +0300

Fixed bug with spark scaling

    * scaling down now works correctly
    * renaming _start_slave_datanode_processes to
      _start_datanode_processes to avoid confusion

Change-Id: I1bf3dea47793e7358d9ba3d632828639cd128c40
Closes-bug: #1376228

Changed in sahara:
status:	In Progress → Fix Committed

Thierry Carrez (ttx) on 2015-03-19

Changed in sahara:
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2015-04-30

Changed in sahara:
milestone:	kilo-3 → 2015.1.0
