Network info instance cache gets corrupted when booting multiple instances at once

Bug #1422686 reported by Roman Podoliaka
This bug affects 4 people
Affects             Status        Importance  Assigned to       Milestone
Mirantis OpenStack  Fix Released  High        Roman Podoliaka
6.0.x               Fix Released  High        Denis Meltsaykin
6.1.x               Fix Released  High        Roman Podoliaka
7.0.x               Fix Released  High        MOS Nova

Bug Description

If a user boots many instances at once (e.g. 20 or more), some of them end up with a corrupted network_info value in the instance info cache: the port information is missing. nova show/list shows no port information for such instances, while e.g. Horizon displays it correctly.

Ports actually exist and are configured correctly - there is connectivity to instances (ICMP/SSH).

Steps to reproduce:

1) boot multiple instances at once, either using nova boot or by means of the Heat template provided below

2) check the nova list output - the Networks column will be empty for a few instances
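
Step 2 can be automated. The following is a minimal sketch, not part of the original report: it assumes the output of nova list has been captured as text in the usual ASCII table layout with a Name and a Networks column. The function name and the sample rows are made up for illustration.

```python
def instances_without_networks(table_text):
    """Return the Name of every row whose Networks cell is empty."""
    header = None
    missing = []
    for raw in table_text.splitlines():
        line = raw.strip()
        # Skip blank lines and +----+ border lines; data rows start with "|".
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if header is None:
            header = cells  # the first "|" row holds the column names
            continue
        row = dict(zip(header, cells))
        if not row.get("Networks"):
            missing.append(row.get("Name"))
    return missing


# Illustrative sample only; real output has more columns (e.g. Task State).
SAMPLE = """
+----+------+--------+-------------------+
| ID | Name | Status | Networks          |
+----+------+--------+-------------------+
| 1  | vm-1 | ACTIVE | net04=192.168.0.5 |
| 2  | vm-2 | ACTIVE |                   |
+----+------+--------+-------------------+
"""

print(instances_without_networks(SAMPLE))
```

A non-empty result right after a bulk boot would indicate that the cache of those instances is affected.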

Tags: murano nova
Roman Podoliaka (rpodolyaka) wrote :

A Heat template reproducing this issue.

tags: added: nova
description: updated
Changed in mos:
status: New → Confirmed
milestone: none → 6.1
importance: Undecided → High
Changed in mos:
assignee: MOS Nova (mos-nova) → Alexander Gubanov (ogubanov)
Changed in mos:
assignee: Alexander Gubanov (ogubanov) → Roman Podoliaka (rpodolyaka)
status: Confirmed → In Progress
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-6.1/2014.2)

Fix proposed to branch: openstack-ci/fuel-6.1/2014.2
Change author: Roman Podoliaka <email address hidden>
Review: https://review.fuel-infra.org/6586

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/nova (openstack-ci/fuel-6.1/2014.2)

Reviewed: https://review.fuel-infra.org/6586
Submitter: Artem Silenkov <email address hidden>
Branch: openstack-ci/fuel-6.1/2014.2

Commit: 8451d942c112dc2745e8564023a8f5e2c3b88f03
Author: Roman Podoliaka <email address hidden>
Date: Tue May 12 15:59:59 2015

Invalidate network info cache in periodic task

This is meant to be done in _heal_instance_info_cache() periodic
task, but the problem with it is that, if network info cache gets
corrupted due to a race condition, it won't ever be 'healed' again
as we use corrupted info to refresh the cache.

The idea here is to go to Neutron in _heal_instance_info_cache() and
get latest data to update our local cache.

Closes-Bug: #1422686

Change-Id: I390614246a3f03c815c43fe52de0473372480f1f
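
The reasoning in this commit message can be illustrated with a small sketch. It is not the actual Nova code: FakeNeutron, heal_from_cache and heal_from_source are hypothetical names, and the cache is modelled as a plain dict. It only shows why refreshing from the cached value cannot recover a corrupted entry, while asking the authoritative source can.

```python
class FakeNeutron:
    """Hypothetical stand-in for the service that owns the port data."""

    def __init__(self, ports_by_instance):
        self._ports = ports_by_instance

    def list_ports(self, instance_id):
        return list(self._ports.get(instance_id, []))


def heal_from_cache(cache, instance_id):
    # Refresh built on what is already cached: an empty (corrupted)
    # entry is simply written back and stays empty forever.
    cache[instance_id] = list(cache.get(instance_id, []))


def heal_from_source(cache, neutron, instance_id):
    # Refresh that ignores the cached value and asks the source of truth.
    cache[instance_id] = neutron.list_ports(instance_id)


neutron = FakeNeutron({"vm-1": ["port-a"]})
cache = {"vm-1": []}  # corrupted: the port exists but is not cached

heal_from_cache(cache, "vm-1")
after_cache_heal = list(cache["vm-1"])

heal_from_source(cache, neutron, "vm-1")
after_source_heal = list(cache["vm-1"])

print(after_cache_heal, after_source_heal)
```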

Changed in mos:
status: In Progress → Fix Committed
Alexander Gubanov (ogubanov) wrote :

I've verified it on MOS 6.1 (build 432) - fixed!
Proof: http://paste.mirantis.net/show/425/

Victor Ryzhenkin (vryzhenkin) wrote :

Unfortunately, I have encountered this issue again on build 432 =(
http://paste.openstack.org/show/229463/

A snapshot is available at this link:
https://drive.google.com/a/mirantis.com/file/d/0B3tqtKJGirwwX3ZvcjdkakdhZ2s/view?usp=sharing

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-6.1/2014.2)

Fix proposed to branch: openstack-ci/fuel-6.1/2014.2
Change author: Roman Podoliaka <email address hidden>
Review: https://review.fuel-infra.org/6840

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/nova (openstack-ci/fuel-6.1/2014.2)

Reviewed: https://review.fuel-infra.org/6840
Submitter: mos-infra-ci <>
Branch: openstack-ci/fuel-6.1/2014.2

Commit: 12ebcef48d739c9122f8850b2bcdec824fcdd7fc
Author: Roman Podoliaka <email address hidden>
Date: Thu May 21 14:14:01 2015

Fix healing of instance nw info cache for real

This was meant to be fixed in 8451d942c112dc2745e8564023a8f5e2c3b88f03,
but it contained a typo, which prevented cache invalidation.

Additionally, this fixes an annoying issue with `null` values instead
of network names in `nova show` output.

Closes-Bug: #1422686

Change-Id: I2dd74512e1fe44ae29d48fcbcc77bf86ad12b2b6

tags: added: on-verification
Sergey Novikov (snovikov) wrote :

I verified on fuel-6.1-466-2015-05-25_20-55-26.iso, but only for 15 instances with the minimal flavor (m1.micro). My virtual env does not have enough capacity to boot more instances.

Verification with a larger number of instances is still needed.

tags: removed: nova on-verification
Victor Ryzhenkin (vryzhenkin) wrote :

Looks like the bug is fixed, but not for a Murano environment.
Healing of the instance cache happens too slowly, and deployment of the Murano environment failed because one of the VMs had no network.
http://paste.openstack.org/show/245321/

Can we do something else?

tags: added: murano nova
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-6.1/2014.2)

Fix proposed to branch: openstack-ci/fuel-6.1/2014.2
Change author: Roman Podoliaka <email address hidden>
Review: https://review.fuel-infra.org/7207

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/nova (openstack-ci/fuel-6.1/2014.2)

Reviewed: https://review.fuel-infra.org/7207
Submitter: mos-infra-ci <>
Branch: openstack-ci/fuel-6.1/2014.2

Commit: 6afa9e5737764f315f37b790e618ad8e7a1377d0
Author: Roman Podoliaka <email address hidden>
Date: Mon Jun 1 13:35:09 2015

Invalidate nw info cache on `network-changed` event

Upon a port update neutron-server notifies nova-api by the means of
`network-changed` event (e.g. when a floating IP is associated with
a port, etc). These events are accumulated in neutron-server and sent
in bulk for multiple instances. In turn, nova-api does an RPC call
to nova-compute passing instances as objects.

The problem is that by the time instances are retrieved from a DB
network_info cache could still be empty. It is possible that
network_info will be set by nova-compute correctly first (upon
allocating a network) and then immediately overridden by the data
that came from nova-api. Thus, network_info cache will be empty
until it's `healed` by a periodic task in nova-compute (by default
every 60 seconds it takes another instance and updates its cache).

While this doesn't affect actual networking setup for instances, it
may cause problems when using Heat templates and relying on output
IP addresses.

Closes-Bug: #1422686

Change-Id: I85274129816991d831176f3db2ecb6f7097e1680
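
The ordering described in this commit message can be replayed deterministically. The sketch below is an illustration only, not the Nova implementation: simulate_race, the dict-based DB and the port names are hypothetical. With refetch_on_event=False the handler writes back the stale snapshot and empties the cache; with refetch_on_event=True it discards the snapshot and re-reads the ports, which is the behaviour the commit title describes as invalidating the cache on the event.

```python
import copy


def simulate_race(refetch_on_event):
    """Replay the three steps in a fixed order and return the final cache."""
    neutron_ports = {"vm-1": ["port-a"]}   # authoritative port data
    db = {"vm-1": {"network_info": []}}    # instance row with an empty cache

    # 1. The API side loads the instance before its network is allocated.
    stale_instance = copy.deepcopy(db["vm-1"])

    # 2. The compute side allocates the network and stores the correct cache.
    db["vm-1"]["network_info"] = list(neutron_ports["vm-1"])

    # 3. The event handler runs on the compute side with the object from step 1.
    if refetch_on_event:
        # Ignore the stale object and query the source of truth again.
        db["vm-1"]["network_info"] = list(neutron_ports["vm-1"])
    else:
        # Write back whatever the stale object carried: an empty cache.
        db["vm-1"]["network_info"] = list(stale_instance["network_info"])

    return db["vm-1"]["network_info"]


print(simulate_race(refetch_on_event=False))
print(simulate_race(refetch_on_event=True))
```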

tags: added: on-verification
Kyrylo Romanenko (kromanenko) wrote :

Verified on VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  openstack_version: "2014.2.2-6.1"
  api: "1.0"
  build_number: "499"
  build_id: "2015-06-02_22-09-35"
  nailgun_sha: "3830bdcb28ec050eed399fe782cc3dd5fbf31bde"
  python-fuelclient_sha: "4fc55db0265bbf39c369df398b9dc7d6469ba13b"
  astute_sha: "cbae24e9904be2ff8d1d49c0c48d1bdc33574228"
  fuel-library_sha: "938f033a5da90aca0c24c89c995cf01707d746d2"
  fuel-ostf_sha: "f899e16c4ce9a60f94e7128ecde1324ea41d09d4"
  fuelmain_sha: "bcc909ffc5dd5156ba54cae348b6a07c1b607b24"

Launched batches of 7-10 instances several times, though on environments very short on resources.
The Networks column was filled for all instances in all cases.

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-7.0/2015.1.0)

Fix proposed to branch: openstack-ci/fuel-7.0/2015.1.0
Change author: Roman Podoliaka <email address hidden>
Review: https://review.fuel-infra.org/8351

Fuel Devops McRobotson (fuel-devops-robot) wrote :

Fix proposed to branch: openstack-ci/fuel-7.0/2015.1.0
Change author: Roman Podoliaka <email address hidden>
Review: https://review.fuel-infra.org/8352

Fuel Devops McRobotson (fuel-devops-robot) wrote :

Fix proposed to branch: openstack-ci/fuel-7.0/2015.1.0
Change author: Roman Podoliaka <email address hidden>
Review: https://review.fuel-infra.org/8353

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/nova (openstack-ci/fuel-7.0/2015.1.0)

Change abandoned by Roman Podoliaka <email address hidden> on branch: openstack-ci/fuel-7.0/2015.1.0
Review: https://review.fuel-infra.org/8353
Reason: not needed in stable/kilo

Fuel Devops McRobotson (fuel-devops-robot) wrote :

Change abandoned by Roman Podoliaka <email address hidden> on branch: openstack-ci/fuel-7.0/2015.1.0
Review: https://review.fuel-infra.org/8352
Reason: not needed in stable/kilo

Fuel Devops McRobotson (fuel-devops-robot) wrote :

Change abandoned by Roman Podoliaka <email address hidden> on branch: openstack-ci/fuel-7.0/2015.1.0
Review: https://review.fuel-infra.org/8351
Reason: not needed in stable/kilo

wangwenjian (send001) wrote :

I want to view the code diff, but I don't have permission. When I open the link, the following error occurs. Can anyone help me?

"Code Review - Error
The page you requested was not found, or you do not have permission to view this page."

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-6.0-updates/2014.2)

Fix proposed to branch: openstack-ci/fuel-6.0-updates/2014.2
Change author: Roman Podoliaka <email address hidden>
Review: https://review.fuel-infra.org/11077

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/nova (openstack-ci/fuel-6.0-updates/2014.2)

Reviewed: https://review.fuel-infra.org/11077
Submitter: Denis V. Meltsaykin <email address hidden>
Branch: openstack-ci/fuel-6.0-updates/2014.2

Commit: a66a5c6c54ad578461b2400a421b07e302711445
Author: Roman Podoliaka <email address hidden>
Date: Wed Sep 2 15:27:31 2015

Invalidate network info cache in periodic task

This is meant to be done in _heal_instance_info_cache() periodic
task, but the problem with it is that, if network info cache gets
corrupted due to a race condition, it won't ever be 'healed' again
as we use corrupted info to refresh the cache.

The idea here is to go to Neutron in _heal_instance_info_cache() and
get latest data to update our local cache.

Closes-Bug: #1422686

Change-Id: I390614246a3f03c815c43fe52de0473372480f1f

tags: removed: on-verification
Alexander Gubanov (ogubanov) wrote :

Verified on MOS 7.0 (build 257)
Proof: http://pastebin.com/XpvpkijC

Vitaly Gusev (vgusev) wrote :

Verified on 6.0 with packages *nova*2014.2-fuel6.0~mira34_all.deb from mirror http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-6.0-updates-stable/ubuntu
