[L3 HA] After banning active l3-agent all healthy agents are still standby

Bug #1524822 reported by Kristina Berezovskaia
This bug affects 2 people
Affects            | Status       | Importance | Assigned to           | Milestone
Mirantis OpenStack | Fix Released | Medium     | Kristina Berezovskaia |
9.x                | Fix Released | Medium     | Ann Taraday           |

Bug Description

After banning the active l3 agent, all other agents remained standby:
root@node-4:~# neutron l3-agent-list-hosting-router router_EW
+--------------------------------------+-------------------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+-------------------+----------------+-------+----------+
| c51334df-5b50-404f-882f-ea74e717b44b | node-4.domain.tld | True | :-) | standby |
| 2af16bd2-3d4b-4a28-95c3-0a0cf5e248bd | node-5.domain.tld | True | xxx | active |
| 456c304e-7262-4206-a52d-4b87e4f2262b | node-3.domain.tld | True | :-) | standby |
+--------------------------------------+-------------------+----------------+-------+----------+

/var/lib/neutron/ha_confs/d20aa7e4-c009-4ae1-a7e5-88f7c209822a/state shows backup for all agents
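
A minimal check sketch for the remaining controllers, assuming the standard ha_confs layout and the usual qrouter-<router_id> namespace naming (keepalived is normally launched with its per-router config from ha_confs, so the router ID appears on its command line):
root@node-4:~# cat /var/lib/neutron/ha_confs/d20aa7e4-c009-4ae1-a7e5-88f7c209822a/state
root@node-4:~# ps ax | grep [k]eepalived | grep d20aa7e4
root@node-4:~# ip netns exec qrouter-d20aa7e4-c009-4ae1-a7e5-88f7c209822a ip -4 addr show
If no node's qr-/qg- ports carry the router addresses, no keepalived instance has actually taken over as master, which would match the broken ping.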

Steps (a CLI sketch follows the list):
1) Create net1 with a subnet
2) Create net2 with a subnet
3) Create a router, set the gateway and add interfaces to both nets
4) Boot vm1 in net1 and associate a floating IP
5) Boot vm2 in net2
6) Start pinging vm1 from vm2 by its floating and internal IPs
7) Ban the active l3 agent
8) Wait some time
9) Check ping
Expected result: ping is available
Current result: ping isn't available
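
A minimal CLI sketch of these steps for a Kilo-era neutron/nova CLI; the names net1/net2/subnet1/subnet2/vm1/vm2, the values in angle brackets, and the pacemaker resource name p_neutron-l3-agent are assumptions, not taken from this environment:
neutron net-create net1
neutron subnet-create net1 10.0.1.0/24 --name subnet1
neutron net-create net2
neutron subnet-create net2 10.0.2.0/24 --name subnet2
neutron router-create router_EW
neutron router-gateway-set router_EW <external-net>
neutron router-interface-add router_EW subnet1
neutron router-interface-add router_EW subnet2
nova boot --image <image> --flavor <flavor> --nic net-id=<net1-id> vm1
nova boot --image <image> --flavor <flavor> --nic net-id=<net2-id> vm2
neutron floatingip-create <external-net>
neutron floatingip-associate <floatingip-id> <vm1-port-id>
ping <vm1-floating-ip>   (run from vm2; also ping vm1's internal address)
pcs resource ban p_neutron-l3-agent <node-hosting-the-active-l3-agent>
neutron l3-agent-list-hosting-router router_EW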

Found on:
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  openstack_version: "2015.1.0-8.0"
  api: "1.0"
  build_number: "264"
  build_id: "264"
  fuel-nailgun_sha: "0e09dce510927f2cc490b898e5fe3f813bd791be"
  python-fuelclient_sha: "f033192b84263f0e699458a4274289a5198ae7e4"
  fuel-agent_sha: "660c6514caa8f5fcd482f1cc4008a6028243e009"
  fuel-nailgun-agent_sha: "a33a58d378c117c0f509b0e7badc6f0910364154"
  astute_sha: "48fd58676debcc85951db68df6d77c22daa55e52"
  fuel-library_sha: "ab7e51f345ffb7c256e0f61addcf86553d7c3867"
  fuel-ostf_sha: "23b7ae2a1a57de5a3e1861ffb7805394ca339cc2"
  fuel-mirror_sha: "6534117233a5bdc51d7d47361bc7d511e4b11e6f"
  fuelmenu_sha: "fcb15df4fd1a790b17dd78cf675c11c279040941"
  shotgun_sha: "a0bd06508067935f2ae9be2523ed0d1717b995ce"
  network-checker_sha: "a3534f8885246afb15609c54f91d3b23d599a5b1"
  fuel-upgrade_sha: "1e894e26d4e1423a9b0d66abd6a79505f4175ff6"
  fuelmain_sha: "26adf12c320936a97a9b0a84169a6e58c530e848"
(3 controllers, 2 compute, neutron+vxlan+l3 ha)

This problem is not always reproduced.
Attaching l3 agent and neutron-server logs from all controllers.

Revision history for this message
Kristina Berezovskaia (kkuznetsova) wrote :
Changed in mos:
status: New → Confirmed
Revision history for this message
Elena Ezhova (eezhova) wrote :

Seems this needs another repro and an env where the bug was reproduced.

Changed in mos:
assignee: MOS Neutron (mos-neutron) → Kristina Kuznetsova (kkuznetsova)
status: Confirmed → Incomplete
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

No longer fixing Medium bugs in 8.0. MOS Neutron team, please give it another try in 9.0

tags: added: area-neutron
removed: neutron
Revision history for this message
Kristina Berezovskaia (kkuznetsova) wrote :

Reproduced one more time on 8.0. In this case I destroyed the controller with the active l3 agent instead of banning the l3 agent.

root@node-18:~# neutron l3-agent-list-hosting-router adba5c35-f2f8-468e-8f45-4863232ec4f8
+--------------------------------------+--------------------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+--------------------+----------------+-------+----------+
| 2c3894f8-68c4-4865-a970-fc4f0ae50429 | node-17.domain.tld | True | xxx | active |
| 4eba6107-65bc-4616-8c00-29d427788504 | node-18.domain.tld | True | :-) | standby |
| e4e095cd-55c7-4007-8f7b-0c00008257d4 | node-16.domain.tld | True | :-) | standby |
+--------------------------------------+--------------------+----------------+-------+----------+

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "478"
  build_id: "478"
  fuel-nailgun_sha: "ae949905142507f2cb446071783731468f34a572"
  python-fuelclient_sha: "4f234669cfe88a9406f4e438b1e1f74f1ef484a5"
  fuel-agent_sha: "481ed135de2cb5060cac3795428625befdd1d814"
  fuel-nailgun-agent_sha: "b2bb466fd5bd92da614cdbd819d6999c510ebfb1"
  astute_sha: "b81577a5b7857c4be8748492bae1dec2fa89b446"
  fuel-library_sha: "420c6fa5f8cb51f3322d95113f783967bde9836e"
  fuel-ostf_sha: "ab5fd151fc6c1aa0b35bc2023631b1f4836ecd61"
  fuel-mirror_sha: "b62f3cce5321fd570c6589bc2684eab994c3f3f2"
  fuelmenu_sha: "fac143f4dfa75785758e72afbdc029693e94ff2b"
  shotgun_sha: "63645dea384a37dde5c01d4f8905566978e5d906"
  network-checker_sha: "9f0ba4577915ce1e77f5dc9c639a5ef66ca45896"
  fuel-upgrade_sha: "616a7490ec7199f69759e97e42f9b97dfc87e85b"
  fuelmain_sha: "6c6b088a3d52dd0eaf43d59f3a3a149c93a07e7e"
(l2+vxlan+l3)

Changed in mos:
status: Incomplete → Won't Fix
no longer affects: mos/8.0.x
Changed in mos:
milestone: 8.0 → 8.0-updates
Revision history for this message
Yury Tregubov (ytregubov) wrote :

The same symptoms are seen on 9.0 Mitaka builds. The l3 agent is stuck in the standby state even if all other l3 agents are banned:
root@node-1:~# neutron l3-agent-list-hosting-router b8da5b6c-3661-413f-b521-53d7347ddb61
+--------------------------------------+--------------------------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+--------------------------+----------------+-------+----------+
| f3c0db85-1fcf-44c6-8fdf-c83bf2c6761e | node-3.test.domain.local | True | xxx | standby |
| 5a94f772-6db7-4ac0-94a2-8e9fcbf20be4 | node-2.test.domain.local | True | xxx | standby |
| 24fa0b7d-39cc-46cd-84f7-b8b691436493 | node-1.test.domain.local | True | :-) | standby |
+--------------------------------------+--------------------------+----------------+-------+----------+

The problem is reproducible; already seen on several 9.0 Mitaka ISOs: 59, 79 and 89.

However, ping between the VMs connected to the affected router works fine.
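
Since ping still works, at least one keepalived instance is presumably a real master even though neutron reports standby for every agent, which points at the state reporting rather than at failover itself. A minimal cross-check sketch for each controller, using the router ID from the listing above and assuming the usual qrouter-<router_id> namespace naming:
root@node-1:~# cat /var/lib/neutron/ha_confs/b8da5b6c-3661-413f-b521-53d7347ddb61/state
root@node-1:~# ip netns exec qrouter-b8da5b6c-3661-413f-b521-53d7347ddb61 ip -4 addr show | grep -E 'qr-|qg-'
The node whose qr-/qg- ports still carry the router addresses is the de facto master, whatever the API reports.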

tags: added: keep-in-9.0
Revision history for this message
Ann Taraday (akamyshnikova) wrote :

The situation we see in MOS 9.0 (agents stuck in the standby state while connectivity works fine) is caused by the absence of a cleanup script, which is already on review at https://review.fuel-infra.org/#/c/18773/; as soon as it is merged this should be fixed.

Revision history for this message
Ann Taraday (akamyshnikova) wrote :
Revision history for this message
Kristina Berezovskaia (kkuznetsova) wrote :

Verified on 9.0
cat /etc/fuel_build_id:
 355
cat /etc/fuel_build_number:
 355
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-release-9.0.0-1.mos6345.noarch
 fuel-bootstrap-cli-9.0.0-1.mos282.noarch
 fuel-migrate-9.0.0-1.mos8383.noarch
 rubygem-astute-9.0.0-1.mos745.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8704.noarch
 network-checker-9.0.0-1.mos72.x86_64
 fuel-mirror-9.0.0-1.mos136.noarch
 fuel-openstack-metadata-9.0.0-1.mos8704.noarch
 fuel-notify-9.0.0-1.mos8383.noarch
 nailgun-mcagents-9.0.0-1.mos745.noarch
 python-fuelclient-9.0.0-1.mos315.noarch
 fuelmenu-9.0.0-1.mos270.noarch
 fuel-9.0.0-1.mos6345.noarch
 fuel-utils-9.0.0-1.mos8383.noarch
 fuel-setup-9.0.0-1.mos6345.noarch
 fuel-library9.0-9.0.0-1.mos8383.noarch
 shotgun-9.0.0-1.mos88.noarch
 fuel-agent-9.0.0-1.mos282.noarch
 fuel-ui-9.0.0-1.mos2695.noarch
 fuel-ostf-9.0.0-1.mos934.noarch
 fuel-misc-9.0.0-1.mos8383.noarch
 python-packetary-9.0.0-1.mos136.noarch
 fuel-nailgun-9.0.0-1.mos8704.noarch
(vxlan+l2+l3, 3 controllers and 2 computes)

Steps:
1) Create net1 with a subnet
2) Create net2 with a subnet
3) Create a router, set the gateway and add interfaces to both nets
4) Boot vm1 in net1 and associate a floating IP
5) Boot vm2 in net2
6) Start pinging vm1 from vm2 by its floating and internal IPs
7) Ban the active l3 agent (a CLI sketch of the ban/clear cycle follows this list)
8) Wait some time
9) Check ping and that one of the other agents has moved to ACTIVE
10) Ban one more active agent
11) Check ping and that another agent has moved to ACTIVE
12) Clear the 2 banned agents
13) Repeat steps 7-12 several times
Ping is available and one agent is always in the ACTIVE state
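
A minimal sketch of the ban/clear cycle from steps 7-12; the pacemaker resource name p_neutron-l3-agent and the placeholder values in angle brackets are assumptions for this environment:
pcs resource ban p_neutron-l3-agent <node-with-the-active-l3-agent>
neutron l3-agent-list-hosting-router <router-id>   (wait until another agent reports active, then check ping)
pcs resource ban p_neutron-l3-agent <node-with-the-new-active-l3-agent>
neutron l3-agent-list-hosting-router <router-id>
pcs resource clear p_neutron-l3-agent <node-with-the-active-l3-agent>
pcs resource clear p_neutron-l3-agent <node-with-the-new-active-l3-agent>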
