Fuel for OpenStack

When scale up, new computes should be disabled by default

Bug #1398817 reported by Mike Scherbakov on 2014-12-03

This bug affects 2 people

Affects		Status	Importance	Assigned to	Milestone
	Fuel for OpenStack	Fix Committed	High	Bogdan Dobrelya	Fuel for OpenStack 6.1
	6.0.x	Won't Fix	High	Fuel Library (Deprecated)	Fuel for OpenStack 6.0-updates

Bug Description

enable_new_services should be False in nova.conf when you scale the environment (add more computes to an already deployed environment).
For details on how it works, see: https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L507-L508
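
To illustrate the behavior this option controls, here is a minimal Python sketch. It is not Nova's actual code: it only assumes that a service registering for the first time gets a record that starts out disabled when enable_new_services is False. The ServiceRecord class and the register_service function are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServiceRecord:
    """Hypothetical stand-in for a row in Nova's services table."""
    host: str
    binary: str
    disabled: bool = False


def register_service(host: str, binary: str, enable_new_services: bool) -> ServiceRecord:
    """Create the record for a service that registers for the first time."""
    record = ServiceRecord(host=host, binary=binary)
    if not enable_new_services:
        # With the option set to False, a newly created service starts
        # disabled and gets no work until an operator enables it.
        record.disabled = True
    return record
```

Services that were registered before the option was changed keep their existing state in this model; only newly created records are affected.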

The story is the following:
When we deploy a new cluster, we want all services to be started, and we want an operational environment right after it is deployed. Then, before running production workloads, we test it with the HealthCheck feature of Fuel, create a test tenant, and run some test workloads first. Then we start running production workloads.

After some time, the user needs to scale up by adding more compute hosts. If you simply add new computes with Fuel and deploy, they will be automatically registered in the Nova DB. If a user starts a new VM at that time, it will very likely go to the new host, as the least loaded one. The new host, however, might not be ready to accept production load. Before moving any production load onto it, the cloud administrator has to ensure that the new compute is ready for it. How to do that is a separate story.
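
The scheduling concern can be shown with a toy model. This is a simplified sketch, not the Nova scheduler: it assumes placement prefers the enabled host with the most free RAM. The Host type, the pick_host function, and the host names are made up for illustration.

```python
from typing import List, NamedTuple, Optional


class Host(NamedTuple):
    name: str
    free_ram_mb: int
    disabled: bool


def pick_host(hosts: List[Host], required_ram_mb: int) -> Optional[str]:
    """Pick the enabled host with the most free RAM, or None if none fits."""
    candidates = [
        h for h in hosts
        if not h.disabled and h.free_ram_mb >= required_ram_mb
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.free_ram_mb).name


loaded = Host("node-1", free_ram_mb=4096, disabled=False)
fresh_enabled = Host("node-9", free_ram_mb=65536, disabled=False)
fresh_disabled = fresh_enabled._replace(disabled=True)

print(pick_host([loaded, fresh_enabled], 2048))   # node-9: the new, empty host wins
print(pick_host([loaded, fresh_disabled], 2048))  # node-1: the new host is skipped
```

In this model, disabling the freshly added host is what keeps new VMs on the already verified hosts.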

So, a simple step for now would be:
1) enable computes when you deploy a new environment
2) disable new computes when you add them to an already deployed environment
3) update the Operations Guide: deploy the new compute, check that it is ready for production, and manually enable it via the nova-manage command on a controller host.

Please consider the same approach for other OpenStack projects where possible. Config option for Cinder: https://github.com/openstack/cinder/blob/master/cinder/db/api.py#L58

Mike Scherbakov (mihgen) on 2014-12-03

Changed in fuel:
assignee:	nobody → Fuel Library Team (fuel-library)
tags:	added: customer-found

Bogdan Dobrelya (bogdando) wrote on 2014-12-03:

I think it makes sense to backport this to 6.x as well

Changed in fuel:
assignee:	Fuel Library Team (fuel-library) → Fuel Astute Team (fuel-astute)
status:	New → Triaged

Bogdan Dobrelya (bogdando) wrote on 2014-12-03:

Looks like enable_new_services should be set to *False* at the *pre deploy* hook in Astute orchestration. Otherwise we will end up with all computes disabled after the very first deployment is done. Also, it could be reasonable to put it back to *True* at the post deploy hook, so the only job left to do would be to manually enable the new services.

Bogdan Dobrelya (bogdando) wrote on 2014-12-03:

Well, I'm thinking about this the wrong way :) Looks like we will get all computes disabled anyway...

Bogdan Dobrelya (bogdando) wrote on 2014-12-03:

Although comment #3 could be relevant if we introduced some new 'scale' action in the orchestration logic. The standard deploy action should not manipulate enable_new_services.

Mike Scherbakov (mihgen) wrote on 2014-12-03:

Bogdan, sorry, looks like I was not clear enough:
> when we deploy new cluster, we want all services to be started
> for now would be: disable new computes by default - ONLY when you add new computes to an existing, already deployed env. When you deploy a fresh env, computes should be enabled.
I'll fix the description to make it clearer.

Please also see this email thread for reference: http://<email address hidden>/msg41238.html
We need a similar thing for adding new Cinder nodes: https://github.com/openstack/cinder/blob/master/cinder/db/api.py#L58

description:	updated

Vladimir Sharshov (vsharshov) wrote on 2014-12-29:

I suppose we can fix it using granular deployment. We can deploy every new env with 'enable_new_services' set to false. After the deployment we will set this option to true. We are working on it at the moment.

Bogdan Dobrelya (bogdando) wrote on 2015-01-29:

@Vladimir, could you please update the current status of this issue?

Bogdan Dobrelya (bogdando) wrote on 2015-01-29:

If it will be resolved as part of the granular deployment feature, please provide references to the patches.

Dima Shulyak (dshulyak) wrote on 2015-02-23:

#10

I see two options here:

1. Based on cluster.status == 'operational', make compute services disabled directly in the puppet manifests which deploy them
2. Based on the same status, run a separate task on all computes which are deployed after the cluster is operational (it can be a part of the compute group, or a post_deployment task)

I will change the assignee of this task, because either of these variants is doable without any changes to astute
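
Both options hinge on the same check. A minimal sketch of it follows, assuming the cluster status is available as a plain string. The function name is hypothetical, and 'operational' is the only status value taken from the comment above.

```python
def new_computes_start_disabled(cluster_status: str) -> bool:
    """Return True when computes being added should come up disabled.

    A cluster that is already 'operational' is being scaled up, so its new
    computes are held back. Any other status is treated here as a fresh
    deployment, where computes should be enabled right away.
    """
    return cluster_status == "operational"
```

Option 1 would evaluate this inside the manifests that deploy the compute service, while option 2 would use it to decide whether to run a separate task afterwards.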

Dima Shulyak (dshulyak) wrote on 2015-02-23:

#11

If it needs to be fixed in 6.0.1 as well, there should be a separate change in osnailyfacter/site.pp

Changed in fuel:
assignee:	Fuel Astute Team (fuel-astute) → Fuel Library Team (fuel-library)

Bogdan Dobrelya (bogdando) on 2015-02-23

Changed in fuel:
assignee:	Fuel Library Team (fuel-library) → Bogdan Dobrelya (bogdando)

Bogdan Dobrelya (bogdando) wrote on 2015-02-23:

#12

This bug should be tracked as a blueprint due to the changes to the deployment tasks and some implications. For example, nova host-aggregates could be used to achieve the desired behavior, but that would imply an admin-only user context for the related pre-/post-deployment tasks, see http://docs.openstack.org/havana/config-reference/content/host-aggregates.html

Also, enable_new_services could be used, but that would still imply an additional deployment task plus end-user, developer, and docs impacts, and cannot be tracked as a bug.

Note that due to the deployment changes and granular deploy requirements, this improvement cannot be backported to 6.0.

Changed in fuel:
status:	Triaged → Invalid

Bogdan Dobrelya (bogdando) wrote on 2015-02-23:

#13

superseded by bp https://blueprints.launchpad.net/fuel/+spec/disable-new-computes

Bogdan Dobrelya (bogdando) wrote on 2015-02-23:

#14

I returned it to Confirmed for 6.1, as a quick fix for 6.1 could be possible (w/o nova host aggregates)

Changed in fuel:
status:	Invalid → Confirmed

Bogdan Dobrelya (bogdando) wrote on 2015-02-27:

#16

related upstream bug https://bugs.launchpad.net/nova/+bug/1426332

Since we cannot use the enable_new_services option yet, the w/a for the 6.1 release should be based on
nova-manage service disable/enable
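
For reference, such a workaround could assemble its commands as sketched below. This assumes the nova-manage service enable/disable subcommands with --host and --service options, as shipped in Nova releases of that period. The helper name and the host name are hypothetical, and the sketch only builds the argument list without running anything.

```python
from typing import List


def nova_manage_service_command(
    action: str, host: str, service: str = "nova-compute"
) -> List[str]:
    """Build the argument list for enabling or disabling one service."""
    if action not in ("enable", "disable"):
        raise ValueError(f"unsupported action: {action!r}")
    return ["nova-manage", "service", action, "--host", host, "--service", service]


# Hypothetical host name, for illustration only.
print(" ".join(nova_manage_service_command("disable", "node-9")))
```

The returned list can be handed to subprocess.run on a controller host once the new compute has been checked.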

OpenStack Infra (hudson-openstack) wrote on 2015-03-05: Fix proposed to fuel-library (master)

#17

Fix proposed to branch: master
Review: https://review.openstack.org/161664

Changed in fuel:
status:	Confirmed → In Progress

Bogdan Dobrelya (bogdando) wrote on 2015-03-05:

#18

The deployment orchestration has no ability to trigger actions on the controller(s) after every new compute is launched. That means we should drop the enable_new_services parameter, issue nova service-disable locally on the computes at the deployment stage, and enable them back at post deploy.

Bogdan Dobrelya (bogdando) wrote on 2015-03-06:

#20

I tested the w/a for 6.1 https://review.openstack.org/#/c/161664/ and the results are good: new compute services are disabled within a few seconds after they have started http://paste.openstack.org/show/190436/

Bogdan Dobrelya (bogdando) wrote on 2015-04-07:

#22

Updated the fix to follow a simpler approach, which is:
- deploy nova-compute and cinder-volume services stopped and disabled;
- run and enable them back as a post-deploy hook
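
The two-stage sequence can be modeled as follows. This is an illustrative sketch only, not the fuel-library implementation, which does this through Puppet manifests and deployment tasks. The Service class and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Service:
    name: str
    running: bool = False
    enabled: bool = False


def deploy(names: List[str]) -> List[Service]:
    """Deployment stage: services are installed but left stopped and disabled."""
    return [Service(name) for name in names]


def post_deploy_hook(services: List[Service]) -> None:
    """Post-deploy stage: start and enable every service."""
    for service in services:
        service.running = True
        service.enabled = True


services = deploy(["nova-compute", "cinder-volume"])
# Nothing can be scheduled onto the node while deployment is in progress.
assert not any(s.running or s.enabled for s in services)

post_deploy_hook(services)
assert all(s.running and s.enabled for s in services)
```

Keeping the enable step in a separate hook means a node never accepts work from the schedulers before the rest of its deployment has finished.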

OpenStack Infra (hudson-openstack) wrote on 2015-04-14: Fix merged to fuel-library (master)

#23

Reviewed: https://review.openstack.org/161664
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=ca54e745d5096fd1eaf9beb4ed654e8c24542c5e
Submitter: Jenkins
Branch: master

commit ca54e745d5096fd1eaf9beb4ed654e8c24542c5e
Author: Bogdan Dobrelya <email address hidden>
Date: Mon Feb 23 16:29:48 2015 +0100

    Manage new compute/cinder services state

    * When deploying or scaling an Openstack environment,
      disable all of the nova-compute/cinder-volume services and use
      a separate post-deploy tasks to re-enable them back.
      That should be done to prohibit nova/cinder schedulers to assign
      tasks for compute/cinder nodes until the deployment or scaling
      is finished.
      (Note, that nova-computes fix may be re-implemented later with
      host-aggregates)

    * Add enable_volumes parameter (default true) for openstack::cinder

    DocImpact
    Related blueprint disable-new-computes
    Closes-bug: #1398817

    Change-Id: Ia63d043753693360a008ec89924cdcdd93c007f3
    Signed-off-by: Bogdan Dobrelya <email address hidden>

Changed in fuel:
status:	In Progress → Fix Committed
