[system-tests] Need to fix destructive tests disconnect controllers

Bug #1386702 reported by Tatyanka
Affects             Status        Importance  Assigned to   Milestone
Fuel for OpenStack  Fix Released  High        Tatyanka
5.1.x               Won't Fix     High        Fuel QA Team
6.0.x               Won't Fix     High        Fuel QA Team

Bug Description

In the test
 def ha_disconnect_controllers(self):
        """Disconnect controllers and check pacemaker status is correct

        Scenario:
            1. Disconnect eth3 of the first controller
            2. Check pacemaker status
            3. Revert environment
            4. Disconnect eth3 of the second controller
            5. Check pacemaker status
            6. Run OSTF

we run the command
 remote.check_call('ifconfig eth2 down')

As a result, RabbitMQ cannot form a ready cluster (there is no connectivity to the node over the management network).

This is not a valid test case, so we need to change it.

Tags: system-tests
Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :

I unassigned it because I don't know what the valid case is. We need to discuss the purpose of this test.

Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

We can block traffic to the management interface via iptables.
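
A minimal sketch of that idea, assuming the management bridge is named `br-mgmt` (the name used in the later commit message) and that dropping all traffic on it in the INPUT and OUTPUT chains is an acceptable stand-in for a lost link. The helper name `mgmt_block_commands` is hypothetical; it only builds command strings, which a test could then pass to a call such as `remote.check_call`.

```python
def mgmt_block_commands(interface="br-mgmt", block=True):
    """Build iptables commands that drop (or restore) traffic on one interface.

    Hypothetical helper: the interface name and the rule shape are
    assumptions, not the rules used by the actual fix.
    """
    # -I inserts the rule at the top of the chain, -D deletes the same rule.
    action = "-I" if block else "-D"
    return [
        "iptables {0} INPUT -i {1} -j DROP".format(action, interface),
        "iptables {0} OUTPUT -o {1} -j DROP".format(action, interface),
    ]


# Example: print the commands that would isolate and then restore the node.
for cmd in mgmt_block_commands() + mgmt_block_commands(block=False):
    print(cmd)
```

Unlike `ifconfig eth2 down`, rules like these can be removed again with the matching `-D` commands, so connectivity can be restored without reverting the environment.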

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-main (master)

Fix proposed to branch: master
Review: https://review.openstack.org/136277

Changed in fuel:
status: New → In Progress
Revision history for this message
Dennis Dmitriev (ddmitriev) wrote : Re: [System tests] Need to fix destructive tests disconnect controllers

It looks like the tests 'ha_destroy_controllers' and 'ha_disconnect_controllers' don't cover the cases we want to test.
They only check that pacemaker can mark a controller as 'offline'; they don't check whether the cluster is still operational.

I suggest rewriting these tests to follow this scenario:

1) Revert a snapshot.
2) Destroy or disconnect (depending on the test) the first controller.
3) Use assert_pacemaker() to check that the controller is marked as 'offline'.
4) Wait on a different controller for the pacemaker resources to become operational and for the vip__* resources to migrate to the working controllers.
5) Run the 'smoke' OSTF tests to make sure that the cluster is still operational.
6) Start the first controller or restore connectivity to it.
7) Wait until pacemaker reports the controller as 'online' (with assert_pacemaker()).
8) Wait for the pacemaker resources to become operational on all controllers.
9) Run the 'sanity' and 'smoke' OSTF tests.
10) Repeat steps 1) to 9) for the second controller.

This will test:
1) How the cluster keeps working without connectivity to the primary or secondary controller.
2) How the cluster recovers when the lost controller comes back online after booting or restoring the connection, and how the returning controller may affect the cluster of the two remaining controllers.
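
Steps 2) to 9) above could be sketched roughly as follows. This is only an illustration under stated assumptions: `env` is a hypothetical object, and every method called on it (`disconnect`, `reconnect`, `pacemaker_state`, `resources_ok_on_survivors`, `resources_ok_everywhere`, `run_ostf`) is invented for the example and is not an existing fuel-qa API. The timeout and interval defaults are likewise placeholders.

```python
import time


def wait_until(predicate, timeout=300, interval=5, sleep=time.sleep):
    """Poll predicate() until it returns True or the timeout (seconds) expires."""
    waited = 0
    while not predicate():
        if waited >= timeout:
            raise AssertionError("condition not met within %s seconds" % timeout)
        sleep(interval)
        waited += interval
    return True


def check_controller_failover(env, controller, timeout=300, interval=5,
                              sleep=time.sleep):
    """Run steps 2-9 of the proposed scenario for one controller.

    `env` is a hypothetical object; every method called on it is an
    assumption made for illustration.
    """
    def wait(condition):
        return wait_until(condition, timeout, interval, sleep)

    env.disconnect(controller)                                   # step 2
    wait(lambda: env.pacemaker_state(controller) == "offline")   # step 3
    wait(env.resources_ok_on_survivors)                          # step 4
    env.run_ostf(["smoke"])                                      # step 5
    env.reconnect(controller)                                    # step 6
    wait(lambda: env.pacemaker_state(controller) == "online")    # step 7
    wait(env.resources_ok_everywhere)                            # step 8
    env.run_ostf(["sanity", "smoke"])                            # step 9
```

Steps 1) and 10) would then amount to reverting the snapshot and calling `check_controller_failover` once per controller. Polling with a timeout, rather than asserting immediately, reflects that pacemaker needs some time to notice the change and migrate resources.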

Revision history for this message
Kirill Omelchenko (komelchenko) wrote :

I guess it would be better to create a separate bug report for the purpose.

Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

http://jenkins-product.srt.mirantis.net:8080/view/5.1_swarm/job/5.1_fuelmain.system_test.ubuntu.ha_neutron_destructive/46/consoleFull

As a result:
root@node-1:~# ip a | grep eth2
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
10: br-eth2: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
root@node-1:~#
10: br-eth2: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
root@node-1:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-1' ...
Error: unable to connect to node 'rabbit@node-1': nodedown

DIAGNOSTICS
===========

attempted to contact: ['rabbit@node-1']

rabbit@node-1:
  * connected to epmd (port 4369) on node-1
  * epmd reports: node 'rabbit' not running at all
                  no other nodes on node-1
  * suggestion: start the node

current node details:
- node name: 'rabbitmqctl22820@node-1'
- home dir: /var/lib/rabbitmq
- cookie hash: soeIWU2jk2YNseTyDSlsEA==

root@node-1:~#
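
A test could detect this state from the command output instead of reading it by eye. The sketch below is a hypothetical helper (`rabbit_node_is_down` is not an existing function); it assumes the failure always appears as an `Error:` line ending in `nodedown`, which is based only on the output pasted above.

```python
def rabbit_node_is_down(cluster_status_output):
    """Return True if `rabbitmqctl cluster_status` output reports a down node.

    Assumption: a down node shows up as an "Error:" line ending in
    "nodedown", as in the output quoted in this comment.
    """
    return any(
        line.startswith("Error:") and line.rstrip().endswith("nodedown")
        for line in cluster_status_output.splitlines()
    )


# Two lines copied from the output above.
sample = (
    "Cluster status of node 'rabbit@node-1' ...\n"
    "Error: unable to connect to node 'rabbit@node-1': nodedown\n"
)
print(rabbit_node_is_down(sample))
```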

Changed in fuel:
status: In Progress → Triaged
assignee: Kirill Omelchenko (komelchenko) → Fuel QA Team (fuel-qa)
milestone: 6.0.1 → 6.1
summary: - [System tests] Need to fix destructive tests disconnect controllers
+ [system-tests] Need to fix destructive tests disconnect controllers
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-qa (master)

Fix proposed to branch: master
Review: https://review.openstack.org/168936

Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → Tatyanka (tatyana-leontovich)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-qa (master)

Reviewed: https://review.openstack.org/168936
Committed: https://git.openstack.org/cgit/stackforge/fuel-qa/commit/?id=0382f647dd3b9f3e567c12201a7706af56789cb9
Submitter: Jenkins
Branch: master

commit 0382f647dd3b9f3e567c12201a7706af56789cb9
Author: Tatyana Leontovich <email address hidden>
Date: Wed Mar 11 20:51:49 2015 +0200

    Remove shutdown of eth2 in disconnect scenario

    Replace shutdown of eth2 interface to block traffic br-mgmt

    Change-Id: I21dd9947add43a4395fc405ff1ec7689ddb8e92a
    Closes-bug: 1386702

Changed in fuel:
status: In Progress → Fix Committed
Changed in fuel:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-main (master)

Change abandoned by Nastya Urlapova (<email address hidden>) on branch: master
Review: https://review.openstack.org/136277

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

MOS5.1 and MOS6.0 are no longer supported, moving to Won't Fix.
