Stop deployment on redeployment after reset failed with Orchestrator error

Bug #1282065 reported by Anastasiia Naboikina
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Fix Released
Importance: Medium
Assigned to: Vladimir Sharshov

Bug Description

{"build_id": "2014-02-19_02-05-26", "mirantis": "no", "build_number": "161", "nailgun_sha": "f97f3edcd8056aba3d4863a93d0d6ea917e23657", "ostf_sha": "f86abe5544b5ffcf621e0c450bca15737c92361f", "fuelmain_sha": "0b9ba969d1cff3d9de78d9feb4fb0f4539fc74de", "astute_sha": "581643fb9ace27282150fa3951660a9796acb867", "release": "4.1", "fuellib_sha": "8f5fc7f397646933ffba3acab8bb665756caa58b"}

Steps to reproduce:
1. Install iso 161 on kvm.
2. Create cluster with the following parameters:
  - CentOS simple;
  - nova network DHCP flat;
  - choose Ceilometer;
  - choose Ceph for images;
  - change common setting for usage of common scheduler;
3. Add the following nodes:
   - 1 controller;
   - 1 compute + ceph;
   - 1 cinder + ceph;
4. Deploy cluster, wait until cluster successfully deploys;
5. Click reset cluster.
6. After reset, change network settings to VLAN.
7. Re-deploy cluster.
8. When controller starts to install OpenStack, stop cluster deployment.
9. Wait until stop finishes.

Expected result:
Cluster is successfully stopped, nodes are not in error state.

Actual result:
Cluster is in failed state, all nodes are in error state. There is an orchestrator error:

2014-02-19 12:20:47 ERR
[1560] Error running RPC method stop_deploy_task: killed thread, trace: ["/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/dispatcher.rb:190:in `run'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/dispatcher.rb:190:in `stop_current_task'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/dispatcher.rb:156:in `stop_deploy_task'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:132:in `dispatch_message'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:85:in `block in dispatch'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:83:in `each'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:83:in `each_with_index'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:83:in `dispatch'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:78:in `block in perform_service_job'"]

Tags: library
Evgeniy L (rustyrobot)
Changed in fuel:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Vladimir Sharshov (vsharshov)
Mike Scherbakov (mihgen) wrote :

I'm curious whether this is really a High priority issue. It looks like the issue happens in a tricky situation... Vladimir, what's the status of it? We need to triage this.

Mike Scherbakov (mihgen)
Changed in fuel:
assignee: Vladimir Sharshov (vsharshov) → Nikolay Markov (nmarkov)
Nikolay Markov (nmarkov)
Changed in fuel:
importance: High → Medium
Nikolay Markov (nmarkov) wrote :

I had a discussion with Vladimir Sharshov about this bug; he said it is a really rare case. We'll discuss possible solutions, but this is definitely not a blocker, so I updated its importance to "Medium" until further discussion.

Vladimir Sharshov (vsharshov) wrote :

Problem was here:
https://github.com/stackforge/fuel-web/blob/master/naily/lib/naily/dispatcher.rb#L190

This code just runs the main deploy process, which should perform some cleanup actions. The 'killed thread' error can be raised only if the main deploy process is already dead. This is a really rare situation (the cancel task runs just as the deploy task is almost done), but I can easily provide a one-line fix for it.
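
For readers unfamiliar with this Ruby behaviour, here is a minimal sketch of the failure mode described above. It is not the code from dispatcher.rb and not the fix that was eventually merged; the helper name wake_if_alive is hypothetical and exists only for illustration. It assumes the dispatcher wakes the deploy thread with Thread#run, as the trace suggests.

```ruby
# Hypothetical helper (not part of naily): wake a thread only if it can
# still be scheduled, instead of letting ThreadError reach the caller.
def wake_if_alive(thread)
  return false unless thread.alive?
  thread.run
  true
rescue ThreadError
  # The thread finished between the alive? check and the run call.
  false
end

# A thread that has already finished, like a deploy task that is done
# by the time the stop request arrives.
finished = Thread.new { :done }
finished.join

# Calling run on a dead thread raises ThreadError, which is the error
# shown in the orchestrator log.
dead_thread_error = nil
begin
  finished.run
rescue ThreadError => e
  dead_thread_error = e.message
end

# The guarded variant reports failure instead of raising.
woke = wake_if_alive(finished)
```

The rescue inside the helper covers the race that remains even with the alive? check, since the thread may exit between the check and the call.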

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/76098

Changed in fuel:
assignee: Nikolay Markov (nmarkov) → Vladimir Sharshov (vsharshov)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/76098
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=511153a10a8e1d5bbc0bbfd9078eebed04bb22a1
Submitter: Jenkins
Branch: master

commit 511153a10a8e1d5bbc0bbfd9078eebed04bb22a1
Author: Vladimir Sharshov <email address hidden>
Date: Mon Feb 24 13:57:23 2014 +0400

    New way to stop a main thread

    Use the kill instead of raise a custom exception.
    For some reason mcollective capture all exceptions
    if one of node becames inaccessible.

    Bug 1282065 closes because the problem condition was deleted.

    Change-Id: Ia7b9ef9734883a470bea592c398359f75b807d45
    Closes-Bug: #1283812
    Closes-Bug: #1282065

Changed in fuel:
status: In Progress → Fix Committed
tags: added: in progress
Anastasia Palkina (apalkina) wrote :

Verified on ISO #211
"build_id": "2014-02-26_13-39-45",
"mirantis": "yes",
"build_number": "211",
"nailgun_sha": "ea08cef3e06a72f47cfaa8cd8fe6d034e2cf722e",
"ostf_sha": "8e6681b6d06c7cb20a84c1cc740d5f2492fb9d85",
"fuelmain_sha": "baa8bb07393698f1186cb67bb65f1b93907c59bd",
"astute_sha": "10cccc87f2ee35510e43c8fa19d2bf916ca1fced",
"release": "4.1",
"fuellib_sha": "0a2e5bdc01c1e3bb285acb7b39125101e950ac72"

tags: removed: in progress
Changed in fuel:
status: Fix Committed → Fix Released