Volumes stay in 'creating' status after a node with keystone was shut down

Bug #1496000 reported by Tatyanka
This bug affects 5 people
Affects              Status         Importance   Assigned to            Milestone
Fuel for OpenStack   Fix Released   High         Dmitry Mescheryakov
6.0.x                Fix Released   High         Sergii Rizvan
6.1.x                Fix Released   High         Sergii Rizvan
7.0.x                Fix Released   High         Dmitry Mescheryakov

Bug Description

https://product-ci.infra.mirantis.net/job/7.0.system_test.ubuntu.plugins.thread_keystone_separate_services/6/console

Steps:
 Scenario:
            1. Set up the master node, install the SEPARATE_SERVICE_KEYSTONE_PLUGIN plugin, and create a cluster
            2. Add 3 nodes with the controller role
            3. Add 3 nodes with the keystone role
            4. Add 1 compute and 1 cinder node
            5. Verify networks
            6. Deploy the cluster
            7. Verify networks
            8. Run OSTF
            9. Destroy a keystone node
           10. Wait until HA is working
           11. Run OSTF
Actual Result:
The volume tests from step 11 fail by timeout.
If we SSH to a controller node, we can see that the volumes hang in 'creating' status; new volumes created via the cinder CLI also hang in 'creating' status, as do attempts to delete them with the force-delete command.

At the same time, it looks like cinder-volume does not receive any data about the volumes scheduled to it.

Workaround: restarting the cinder-volume service on the volume node helps, and the OSTF tests then pass.

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2015.1.0-7.0"
  api: "1.0"
  build_number: "295"
  build_id: "295"
  nailgun_sha: "16a39d40120dd4257698795f12de4ae8200b1778"
  python-fuelclient_sha: "2864459e27b0510a0f7aedac6cdf27901ef5c481"
  fuel-agent_sha: "082a47bf014002e515001be05f99040437281a2d"
  fuel-nailgun-agent_sha: "d7027952870a35db8dc52f185bb1158cdd3d1ebd"
  astute_sha: "6c5b73f93e24cc781c809db9159927655ced5012"
  fuel-library_sha: "8e9a9ae51abbbd4edef1432809311004461eec94"
  fuel-ostf_sha: "1f08e6e71021179b9881a824d9c999957fcc7045"
  fuelmain_sha: "6b83d6a6a75bf7bca3177fcf63b2eebbf1ad0a85"
/etc/fuel/version.yaml

Upstream bug: https://bugs.launchpad.net/oslo.utils/+bug/1502092

Revision history for this message
Tatyanka (tatyana-leontovich) wrote :
Ivan Kolodyazhny (e0ne)
Changed in fuel:
assignee: MOS Cinder (mos-cinder) → Ivan Kolodyazhny (e0ne)
Revision history for this message
Ivan Kolodyazhny (e0ne) wrote :
Changed in fuel:
assignee: Ivan Kolodyazhny (e0ne) → Dmitry Mescheryakov (dmitrymex)
Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

Talked with Ivan about the issue. From the exception Ivan found it is clear that RPC Executor in cinder-volume died because of unhandled exception. We need to fix this on Oslo side.
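
A toy illustration (not cinder code) of why this failure mode is hard to spot: when the thread consuming RPC messages dies from an uncaught exception, the service process keeps running and looks healthy, while every new request (such as a volume create) just sits in the queue and the volume stays in 'creating'.

    import threading
    import time

    def fragile_consumer():
        # Stand-in for an RPC executor loop: an uncaught exception ends the thread.
        raise RuntimeError('unhandled error in the message loop')

    t = threading.Thread(target=fragile_consumer, daemon=True)
    t.start()
    time.sleep(0.5)
    print('consumer thread alive:', t.is_alive())   # False: requests now queue up forever
    print('main process still running, so the service still looks "up"')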

Changed in fuel:
status: New → Confirmed
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

Reproduced again, so adding one more snapshot here.

tags: added: customer-found
tags: added: operations
description: updated
Revision history for this message
Roman Rufanov (rrufanov) wrote :

Customer found on 6.1 - please provide a fix.

tags: added: support
Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/oslo.utils (openstack-ci/fuel-7.0/2015.1.0)

Fix proposed to branch: openstack-ci/fuel-7.0/2015.1.0
Change author: Dmitry Mescheryakov <email address hidden>
Review: https://review.fuel-infra.org/12417

Changed in fuel:
status: Confirmed → In Progress
description: updated
Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/oslo.utils (openstack-ci/fuel-7.0/2015.1.0)

Reviewed: https://review.fuel-infra.org/12417
Submitter: Vitaly Sedelnik <email address hidden>
Branch: openstack-ci/fuel-7.0/2015.1.0

Commit: 318c0f14ca00c96127aabd7c9773f018bec8dcde
Author: Dmitry Mescheryakov <email address hidden>
Date: Fri Oct 2 10:18:48 2015

Make forever_retry_uncaught_exceptions handle its own failures

When an exception occurs inside 'except' clause, it is not handled.
As a result, forever_retry_uncaught_exceptions fails with exception,
while by definition it should not.

For instance, oslo.messaging's RPC server relies on that
function to process any exception. When forever_retry_... fails
to do so, the server thread dies. An example could be
found in referenced bug.

Change-Id: I415a0f49b25b80a264f0bc951f4b926d57a9c9a8
Closes-Bug: #1496000
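
For reference, here is a minimal sketch (simplified, not the actual oslo.utils source) of the pattern the commit describes and the shape of the fix: the retry loop has to guard the code in its own 'except' clause, otherwise an error raised while handling the original exception escapes the loop and kills the calling RPC server thread.

    import logging
    import time

    LOG = logging.getLogger(__name__)

    def forever_retry_uncaught_exceptions(func):
        # Simplified sketch of the decorator's intent: call func forever,
        # logging any uncaught exception instead of letting it escape.
        def wrapper(*args, **kwargs):
            last_exc_message = None
            while True:
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # The fix: protect the handling code itself. Before the
                    # change, an error raised here (e.g. while formatting or
                    # logging the exception) terminated the whole loop.
                    try:
                        if str(exc) != last_exc_message:
                            LOG.exception('Unexpected exception occurred')
                            last_exc_message = str(exc)
                    except Exception:
                        pass
                    time.sleep(1)
        return wrapper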

Changed in fuel:
status: In Progress → Fix Committed
Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/8.0.x
tags: added: on-verification
tags: removed: on-verification
tags: added: on-verification
Dmitry Pyzhov (dpyzhov)
tags: added: area-mos
Revision history for this message
Dmitriy Kruglov (dkruglov) wrote :

Verified on MOS 7.0, custom ISO. The issue is not reproduced.

ISO info:
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2015.1.0-7.0"
  api: "1.0"
  build_number: "17"
  build_id: "17"
  nailgun_sha: "b43ba03685dff284d77aa05f480cde4ce2a5cca8"
  python-fuelclient_sha: "486bde57cda1badb68f915f66c61b544108606f3"
  fuel-agent_sha: "50e90af6e3d560e9085ff71d2950cfbcca91af67"
  fuel-nailgun-agent_sha: "d7027952870a35db8dc52f185bb1158cdd3d1ebd"
  astute_sha: "4703e7333c89116f4cc0b2ccee5f981683825c54"
  fuel-library_sha: "ba38e4116c23a4b76c826221872d8bd614ea455b"
  fuel-ostf_sha: "b33e3950fda73f5fd8ebe0453121ca7cc0137c40"
  fuelmain_sha: "8ac0749d30aab906282361f2e6daa0a961a0bf6a"

tags: removed: on-verification
Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/oslo.utils (openstack-ci/fuel-8.0/liberty)

Fix proposed to branch: openstack-ci/fuel-8.0/liberty
Change author: Dmitry Mescheryakov <email address hidden>
Review: https://review.fuel-infra.org/13675

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/oslo.utils (openstack-ci/fuel-8.0/liberty)

Reviewed: https://review.fuel-infra.org/13675
Submitter: Pkgs Jenkins <email address hidden>
Branch: openstack-ci/fuel-8.0/liberty

Commit: b03d285eaad1fea64edf4bdb07f241bf28d75c0c
Author: Dmitry Mescheryakov <email address hidden>
Date: Thu Nov 5 11:25:02 2015

Make forever_retry_uncaught_exceptions handle its own failures

When an exception occurs inside 'except' clause, it is not handled.
As a result, forever_retry_uncaught_exceptions fails with exception,
while by definition it should not.

For instance, oslo.messaging's RPC server relies on that
function to process any exception. When forever_retry_... fails
to do so, the server thread dies. An example could be
found in referenced bug.

Change-Id: I415a0f49b25b80a264f0bc951f4b926d57a9c9a8
Closes-Bug: #1496000
(cherry picked from commit 318c0f14ca00c96127aabd7c9773f018bec8dcde)

Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

The fix is merged into Mitaka (https://review.openstack.org/#/c/230367/), so no need to assign the bug to 9.0.

Changed in fuel:
status: Confirmed → Fix Committed
Revision history for this message
Sergii Rizvan (srizvan) wrote :

Andrey, Roman, please provide us with a description of how to reproduce this bug on 6.1.

Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

@Sergii the steps to reproduce look like:
1. Create an HA cluster with 3 controllers, Neutron as the network provider, Cinder LVM for storage (just leave the storage defaults)
2. Add a compute node, add cinder nodes, set up networks, run network verification
3. Deploy the cluster
4. When the cluster is ready, run a health check - all tests should pass here
5. Destroy a controller (I believe it could be the primary)
6. Wait while the HA services on the controllers recover after the failure (quorums are re-built, etc.)
7. Run OSTF one more time (the volume tests fail)
8. SSH to a controller and try to create a volume manually with the cinder CLI - volumes stay in 'creating' status; a programmatic check is sketched below (a duplicate report with steps to reproduce is here: https://bugs.launchpad.net/fuel/+bug/1513089, and the steps look valid for 6.1 as well)
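
A hypothetical sketch of the check from step 8 (all credentials and the endpoint are placeholders, and it assumes the python-cinderclient v2 API shipped with that release): create a small volume and poll its status; on an affected environment it never leaves 'creating'.

    import time

    from cinderclient.v2 import client

    # Placeholder credentials/endpoint for the target cloud.
    cinder = client.Client('admin', 'admin_password', 'admin',
                           'http://192.168.0.2:5000/v2.0')

    vol = cinder.volumes.create(1, name='repro-check')
    for _ in range(30):
        status = cinder.volumes.get(vol.id).status
        print(status)
        if status in ('available', 'error'):
            break
        time.sleep(10)
    else:
        print("volume still '%s' after ~5 minutes - symptom reproduced" % status)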

Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

Verified for VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  openstack_version: "2015.1.0-8.0"
  api: "1.0"
  build_number: "152"
  build_id: "152"
  fuel-nailgun_sha: "e72e94138d159308e85a16c382e90b54c7bc7c79"
  python-fuelclient_sha: "e685d68c1c0d0fa0491a250f07d9c3a8d0f9608c"
  fuel-agent_sha: "07560a9fc3ce5301ace04d2d3e5d68db6ee4f8d5"
  fuel-nailgun-agent_sha: "3e9d17211d65c80bf97c8d83979979f6c7feb687"
  astute_sha: "959b06c5ef8143125efd1727d350c050a922eb12"
  fuel-library_sha: "31f6ae4ced72927287b513e9c4e3a24d367e7736"
  fuel-ostf_sha: "f169d495691ea3d40d3d6d0278265698d3f6ed14"
  fuel-createmirror_sha: "a034dcb06520df58a7338816900a431a6b61d83f"
  fuelmenu_sha: "8a32c53c1fa13b036000f589f96e876277dbd071"
  shotgun_sha: "25dd78a3118267e3616df0727ce746e7dead2d67"
  network-checker_sha: "a57e1d69acb5e765eb22cab0251c589cd76f51da"
  fuel-upgrade_sha: "1e894e26d4e1423a9b0d66abd6a79505f4175ff6"
  fuelmain_sha: "b5eb33ca7147dfda7a943a7f8f58c28e86d63992"

Changed in fuel:
status: Fix Committed → Fix Released
Revision history for this message
Sergii Rizvan (srizvan) wrote :

I have reproduced the steps described in https://bugs.launchpad.net/fuel/+bug/1496000/comments/13 on MOS 7.0 and MOS 6.1.

In the first case (MOS 7.0) the issue was indeed reproduced: after the HA services recovered from the failure, we could not create any cinder volume because volumes stay in 'creating' status. On MOS 6.1 the issue was not reproduced: we could create volumes after the HA services recovered. Only if we try to create volumes before the RabbitMQ cluster recovers do we get cinder volumes stuck endlessly in 'creating' status, but after RabbitMQ recovers there is no such issue.

Also, from the upstream bug description https://bugs.launchpad.net/oslo.utils/+bug/1502092 I found out that the issue initially appeared after the following code change in oslo.messaging: http://paste.openstack.org/show/475157/. In MOS 6.1 oslo.messaging does not contain this change, so this bug is not related to 6.1.

Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

Guys, this bug should have been on the MOS-Oslo team, not on Fuel-library.

We need the fix for 6.1 as multiple customers have hit it, and it didn't even get onto the 6.x MU roadmap.

Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

Eugene, the issue was invalidated for 6.1 because Sergii was not able to reproduce it here. In my opinion 6.x and 5.x are all affected by this bug as I recall seeing similar symptoms in them.

Revision history for this message
Roman Rufanov (rrufanov) wrote :

Customer found on MOS 6.0

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/oslo.utils (openstack-ci/fuel-6.1/2014.2)

Fix proposed to branch: openstack-ci/fuel-6.1/2014.2
Change author: Dmitry Mescheryakov <email address hidden>
Review: https://review.fuel-infra.org/14422

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/oslo.utils (openstack-ci/fuel-6.0-updates/2014.2)

Fix proposed to branch: openstack-ci/fuel-6.0-updates/2014.2
Change author: Dmitry Mescheryakov <email address hidden>
Review: https://review.fuel-infra.org/14492

Revision history for this message
Miroslav Anashkin (manashkin) wrote :

Code review and automated CI passed for the 6.1 and 6.0 branches on Dec 08.
When may we expect the next move in the workflow?

Revision history for this message
Denis Meltsaykin (dmeltsaykin) wrote :

Miroslav, we don't expect an MU for 6.0 anymore, as MOS 6.0 is already out of active support. For 6.1 the best chance would be MU5, which is not scheduled yet.

Revision history for this message
Roman Rufanov (rrufanov) wrote :

Customer found the issue on 6.0 and needs a fix ASAP. Please re-open for MOS 6.0. The issue was reported while 6.0 was in active support.

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/oslo.utils (openstack-ci/fuel-6.0-updates/2014.2)

Reviewed: https://review.fuel-infra.org/14492
Submitter: Vitaly Sedelnik <email address hidden>
Branch: openstack-ci/fuel-6.0-updates/2014.2

Commit: 901a59a3367afff8c21dfe0b2eb2a65ecf381214
Author: Dmitry Mescheryakov <email address hidden>
Date: Tue Dec 8 13:09:47 2015

Make forever_retry_uncaught_exceptions handle its own failures

When an exception occurs inside 'except' clause, it is not handled.
As a result, forever_retry_uncaught_exceptions fails with exception,
while by definition it should not.

For instance, oslo.messaging's RPC server relies on that
function to process any exception. When forever_retry_... fails
to do so, the server thread dies. An example could be
found in referenced bug.

Change-Id: I415a0f49b25b80a264f0bc951f4b926d57a9c9a8
Closes-Bug: #1496000
(cherry picked from commit 318c0f14ca00c96127aabd7c9773f018bec8dcde)

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/oslo.utils (openstack-ci/fuel-6.1/2014.2)

Reviewed: https://review.fuel-infra.org/14422
Submitter: Vitaly Sedelnik <email address hidden>
Branch: openstack-ci/fuel-6.1/2014.2

Commit: 25d4a32adc93685f4288c8d8000ca3bc3c465f38
Author: Dmitry Mescheryakov <email address hidden>
Date: Fri Dec 4 16:21:58 2015

Make forever_retry_uncaught_exceptions handle its own failures

When an exception occurs inside 'except' clause, it is not handled.
As a result, forever_retry_uncaught_exceptions fails with exception,
while by definition it should not.

For instance, oslo.messaging's RPC server relies on that
function to process any exception. When forever_retry_... fails
to do so, the server thread dies. An example could be
found in referenced bug.

Change-Id: I415a0f49b25b80a264f0bc951f4b926d57a9c9a8
Closes-Bug: #1496000
(cherry picked from commit 318c0f14ca00c96127aabd7c9773f018bec8dcde)

Revision history for this message
Sergii Rizvan (srizvan) wrote :

Verified on MOS 6.0:
Package: python-oslo.utils
Version: 1.0.0-fuel6.0~mira17

Dmitry (dtsapikov)
tags: added: on-verification
Revision history for this message
Dmitry (dtsapikov) wrote :

Verified on 6.1 + MU5

tags: removed: on-verification