Can't start deploy for selected nodes

Bug #1592868 reported by Dmitry Guryanov
This bug affects 2 people
Affects            | Status        | Importance | Assigned to     | Milestone
Fuel for OpenStack | Fix Committed | Critical   | Bulat Gaifullin |
Mitaka             | Fix Released  | Critical   | Bulat Gaifullin |

Bug Description

I added a node to a cluster and provisioned it. But when I try to start deployment with a separate command, nothing happens:

fuel node --node-id 1 --deploy

The following message appeared in the logs:

2016-06-15 15:25:14.493 ERROR [7fe36436c880] (manager) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nailgun/task/manager.py", line 58, in _call_silently
    to_return = method(task, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nailgun/task/task.py", line 230, in message
    objects.NodeCollection.lock_nodes(nodes)
  File "/usr/lib/python2.7/site-packages/nailgun/objects/node.py", line 1243, in lock_nodes
    instances_ids = [instance.id for instance in instances]
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/attributes.py", line 237, in __get__
    return self.impl.get(instance_state(instance), dict_)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/attributes.py", line 578, in get
    value = state._load_expired(state, passive)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/state.py", line 474, in _load_expired
    self.manager.deferred_scalar_loader(self, toload)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/loading.py", line 610, in load_scalar_attributes
    (state_str(state)))
DetachedInstanceError: Instance <Node at 0x2220ed0> is not bound to a Session; attribute refresh operation cannot proceed

The importance of this bug is critical because:
1. In large OpenStack environments, it is important that customers can run deployment on only a subset of nodes without deploying the whole cluster. This lets users deploy large and complex environments incrementally. This bug breaks that functionality completely, and there is no workaround.
2. This is a regression compared to previous releases: it breaks a feature that was implemented in Fuel 4.0 [1] and has been available via the CLI since then.
3. In Fuel 9.0 we extended this functionality by adding UI support for it, see [2]. That UI feature is also broken.

RCA:
1. This bug was introduced by https://review.openstack.org/#/c/314701/ in an attempt to fix https://bugs.launchpad.net/fuel/+bug/1569859.
2. This bug was detected neither by BVT nor by SWARM because of another issue in the 9.0 MOS repos: the missing uwsgidecorators package, see https://bugs.launchpad.net/fuel/+bug/1572998

References:
[1] https://blueprints.launchpad.net/fuel/+spec/nailgun-separate-provisioning-and-deployment-handlers
[2] https://specs.openstack.org/openstack/fuel-specs/specs/9.0/allow-choosing-nodes-for-provisioning-and-deployment.html

Changed in fuel:
assignee: nobody → Bulat Gaifullin (bgaifullin)
Changed in fuel:
importance: Undecided → Critical
Bug Checker Bot (bug-checker) wrote : Autochecker

(This check was performed automatically.)
Please make sure that the bug description contains the following sections, filled in with the appropriate data for the bug you are describing:

actual result

version

expected result

steps to reproduce

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
Dmitry Klenov (dklenov)
tags: added: area-python
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/330023

Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/330053

Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/newton
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/330023
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=1e85a771467fda37415a65aa469ab9a68057f2ad
Submitter: Jenkins
Branch: master

commit 1e85a771467fda37415a65aa469ab9a68057f2ad
Author: Bulat Gaifullin <email address hidden>
Date: Wed Jun 15 18:44:38 2016 +0300

    Fixed deployment of selected nodes

    The database session cannot be shared between processes, so
    ORM object cannot be passed to worker process.
    We should pass only ids of nodes
    and re-load the list of selected nodes in worker process.

    Change-Id: I272d9dca9c9c1cb125cf7ad4d2381a056b8d4198
    Closes-Bug: 1592868
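
The approach described in the commit message above (pass only node ids to the worker and re-load the nodes there) can be illustrated with a minimal sketch. This is not the fuel-web code: the standard-library sqlite3 module stands in for the SQLAlchemy session, and the `nodes` table and the names `create_demo_db`, `load_nodes_in_worker` and `demo` are hypothetical, chosen only for this illustration.

```python
import os
import sqlite3
import tempfile


def create_demo_db(path):
    """Create a throwaway 'nodes' table (hypothetical schema, not Nailgun's)."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany(
            "INSERT INTO nodes (id, name) VALUES (?, ?)",
            [(1, "node-1"), (2, "node-2"), (3, "node-3")],
        )
        conn.commit()
    finally:
        conn.close()


def load_nodes_in_worker(db_path, node_ids):
    """Worker-side entry point (hypothetical).

    It receives only plain ids, never objects tied to the caller's
    connection, and opens its own connection to re-load the rows.
    """
    if not node_ids:
        return []
    conn = sqlite3.connect(db_path)
    try:
        # One "?" placeholder per id keeps the query parameterized.
        placeholders = ",".join("?" for _ in node_ids)
        query = (
            "SELECT id, name FROM nodes WHERE id IN (%s) ORDER BY id"
            % placeholders
        )
        return conn.execute(query, list(node_ids)).fetchall()
    finally:
        conn.close()


def demo():
    """Caller side: select nodes, but hand over only their ids."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        db_path = os.path.join(tmp_dir, "demo.db")
        create_demo_db(db_path)
        selected_ids = [1, 3]
        return load_nodes_in_worker(db_path, selected_ids)
```

Because the worker function takes only a path and a list of integers, its arguments can be serialized and handed to another process without carrying any session or connection state along with them.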

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/mitaka)

Reviewed: https://review.openstack.org/330053
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=e2b85bafb68c348f25cb7cceda81edc668ba2e64
Submitter: Jenkins
Branch: stable/mitaka

commit e2b85bafb68c348f25cb7cceda81edc668ba2e64
Author: Bulat Gaifullin <email address hidden>
Date: Wed Jun 15 18:44:38 2016 +0300

    Fixed deployment of selected nodes

    The database session cannot be shared between processes, so
    ORM object cannot be passed to worker process.
    We should pass only ids of nodes
    and re-load the list of selected nodes in worker process.

    Change-Id: I272d9dca9c9c1cb125cf7ad4d2381a056b8d4198
    Closes-Bug: 1592868

Andrey Maximov (maximov)
description: updated
Andrey Maximov (maximov) wrote :

update:
 We merged the fix to upstream master and stable, and created a backport for the downstream repository.
next steps:
 1. wait until the backport passes CI (~1 hr)
 2. merge it to the downstream repository
 3. run SWARM against it.

Andrey Maximov (maximov) wrote :

update:
 1. the backport passed CI successfully
 2. the fix was merged to the downstream repository
 3. downstream ISO build and BVT have been started (ISO 490)
next steps:
 1. wait until the downstream ISO build and BVT complete
 2. run SWARM against it.

Andrey Maximov (maximov) wrote :

update:
 1. downstream ISO build and BVT passed successfully.
 2. SWARM has been started.
next steps:
 1. wait for and analyze the SWARM results for ISO 490.

tags: added: on-verification
dkravchenko (dkravchenko) wrote :

Verified on ISO #477
(I added a node to a cluster and provisioned it, and deployment by a separate command started successfully: http://paste.openstack.org/show/516806/)

tags: removed: on-verification
tags: added: swarm-fail
Andrey Maximov (maximov)
description: updated
description: updated