[Spark] Failed to scale cluster

Bug #1376228 reported by Yaroslav Lobankov
This bug affects 1 person
Affects: Sahara
Status: Fix Released
Importance: High
Assigned to: Andrey Pavlov

Bug Description

Environment:

Devstack with Heat and Neutron.

How to reproduce:

1. Create three node group templates. For example:

{
    "name": "slave-datanode",
    "description": "Test template for worker node",
    "flavor_id": "2",
    "plugin_name": "spark",
    "hadoop_version": "1.0.0",
    "node_processes": ["slave", "datanode"],
    "floating_ip_pool": "<ID of public network>"
}

{
    "name": "slave",
    "description": "Test template for worker node",
    "flavor_id": "2",
    "plugin_name": "spark",
    "hadoop_version": "1.0.0",
    "node_processes": ["slave"],
    "floating_ip_pool": "<ID of public network>"
}

{
    "name": "datanode",
    "description": "Test template for worker node",
    "flavor_id": "2",
    "plugin_name": "spark",
    "hadoop_version": "1.0.0",
    "node_processes": ["datanode"],
    "floating_ip_pool": "<ID of public network>"
}
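The three templates above differ only in "name" and "node_processes". As a minimal sketch (not part of the original report), the request bodies could be generated from one shared base so the common fields stay in sync. The names BASE_TEMPLATE, PROCESS_SETS and build_node_group_templates are hypothetical helpers introduced here for illustration; the field values are copied from the report.

```python
import json

# Fields shared by all three node group templates (values from the report).
BASE_TEMPLATE = {
    "description": "Test template for worker node",
    "flavor_id": "2",
    "plugin_name": "spark",
    "hadoop_version": "1.0.0",
    "floating_ip_pool": "<ID of public network>",
}

# Template name -> processes that run on nodes of that group.
PROCESS_SETS = {
    "slave-datanode": ["slave", "datanode"],
    "slave": ["slave"],
    "datanode": ["datanode"],
}


def build_node_group_templates():
    """Return the three request bodies as a list of dicts."""
    return [
        dict(BASE_TEMPLATE, name=name, node_processes=list(processes))
        for name, processes in PROCESS_SETS.items()
    ]


if __name__ == "__main__":
    for template in build_node_group_templates():
        print(json.dumps(template, indent=4))
```

The placeholder "<ID of public network>" still has to be replaced with a real network ID before the bodies are sent.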

2. Create a cluster template. For example:

{
    "name": "some-cluster-template",
    "description": "Test cluster template",
    "plugin_name": "spark",
    "hadoop_version": "1.0.0",
    "cluster_configs": {
      "HDFS": {
        "dfs.replication": 1
      }
    },
    "node_groups": [
        {
            "name": "master-node",
            "flavor_id": "2",
            "node_processes": ["master", "namenode"],
            "floating_ip_pool": "<ID of public network>",
            "count": 1
        },
        {
            "name": "worker-node-1",
            "node_group_template_id": "<ID of the first node group template>",
            "count": 1
        },
        {
            "name": "worker-node-2",
            "node_group_template_id": "<ID of the second node group template>",
            "count": 1
        },
        {
            "name": "worker-node-3",
            "node_group_template_id": "<ID of the third node group template>",
            "count": 1
        }
    ],
    "neutron_management_network": "<ID of private network>"
}

3. Create a cluster. For example:

{
    "name": "spark-cluster",
    "plugin_name": "spark",
    "hadoop_version": "1.0.0",
    "cluster_template_id": "<ID of the cluster template>",
    "default_image_id": "<ID of an image>",
    "user_keypair_id": "<your key pair name>",
    "description": "Test cluster by Spark plugin"
}

4. Wait for the cluster to reach "Active" status.
5. Try to scale the cluster. For example:

{
  "resize_node_groups": [
      {
        "name": "worker-node-2",
        "count": 0
      },
      {
        "name": "worker-node-3",
        "count": 0
      }
  ],
  "add_node_groups": [
    {
      "node_group_template_id": "<ID of the second node group template>",
      "count": 1,
      "name": "new-worker-node-2"
    },
    {
      "node_group_template_id": "<ID of the third node group template>",
      "count": 1,
      "name": "new-worker-node-3"
    }
  ]
}
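To make the intended outcome of this request concrete, the sketch below applies a scale request to a simple name-to-count map of node groups. It only models the expected end state and is not Sahara's implementation; apply_scaling is a hypothetical helper, and the starting counts are taken from the cluster template in step 2.

```python
def apply_scaling(node_groups, scale_request):
    """Return a new {group name: instance count} map after scaling.

    node_groups: current {name: count} map of the cluster.
    scale_request: dict shaped like the scale request in the report.
    """
    result = dict(node_groups)

    # Resizing only makes sense for groups that already exist.
    for group in scale_request.get("resize_node_groups", []):
        if group["name"] not in result:
            raise KeyError("unknown node group: " + group["name"])
        result[group["name"]] = group["count"]

    # Added groups must not clash with existing names.
    for group in scale_request.get("add_node_groups", []):
        if group["name"] in result:
            raise ValueError("node group already exists: " + group["name"])
        result[group["name"]] = group["count"]

    return result


current = {
    "master-node": 1,
    "worker-node-1": 1,
    "worker-node-2": 1,
    "worker-node-3": 1,
}
request = {
    "resize_node_groups": [
        {"name": "worker-node-2", "count": 0},
        {"name": "worker-node-3", "count": 0},
    ],
    "add_node_groups": [
        {"node_group_template_id": "<ID of the second node group template>",
         "count": 1, "name": "new-worker-node-2"},
        {"node_group_template_id": "<ID of the third node group template>",
         "count": 1, "name": "new-worker-node-3"},
    ],
}
expected_state = apply_scaling(current, request)
```

In this model the request removes the single instance from both "worker-node-2" and "worker-node-3" and adds one instance each in two new groups, so the total instance count stays at four.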

Expected result:

The cluster is scaled successfully and returns to "Active" status.

Observed result:

The cluster hangs in "Decommissioning" status.
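When reproducing this from a script, a hang like this is easier to catch with a bounded poll than by watching the status manually. The following is a minimal sketch: wait_for_status is a hypothetical helper, get_status is any caller-supplied function that returns the current cluster status string, and treating "Error" as a terminal failure status is an assumption.

```python
import time


def wait_for_status(get_status, target="Active", timeout=600.0,
                    interval=5.0, sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns target or timeout seconds pass.

    Raises RuntimeError if the status becomes "Error" (assumed terminal)
    and TimeoutError if the target is not reached in time, which is how
    a cluster stuck in "Decommissioning" would show up.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == target:
            return status
        if status == "Error":
            raise RuntimeError("cluster entered Error status")
        if clock() >= deadline:
            raise TimeoutError(
                "cluster still in %r after %s seconds" % (status, timeout))
        sleep(interval)
```

The sleep and clock parameters exist only so the helper can be exercised without real delays; in normal use the defaults apply.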

Changed in sahara:
status: New → Confirmed
importance: Undecided → High
milestone: none → kilo-1
Changed in sahara:
milestone: kilo-1 → kilo-2
Changed in sahara:
milestone: kilo-2 → kilo-3
Changed in sahara:
assignee: nobody → Andrey Pavlov (apavlov-n)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to sahara (master)

Fix proposed to branch: master
Review: https://review.openstack.org/153993

Changed in sahara:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to sahara (master)

Reviewed: https://review.openstack.org/153993
Committed: https://git.openstack.org/cgit/openstack/sahara/commit/?id=b94a0570192af332d00bbe0ad57e26375d3490ac
Submitter: Jenkins
Branch: master

commit b94a0570192af332d00bbe0ad57e26375d3490ac
Author: Andrey Pavlov <email address hidden>
Date: Mon Feb 9 14:10:06 2015 +0300

    Fixed bug with spark scaling

    * scaling down now works correctly
    * renaming _start_slave_datanode_processes to
      _start_datanode_processes to avoid confusion

    Change-Id: I1bf3dea47793e7358d9ba3d632828639cd128c40
    Closes-bug: #1376228

Changed in sahara:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in sahara:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in sahara:
milestone: kilo-3 → 2015.1.0