Default gateway from storage network is used if 'Assign public network to all nodes' feature is enabled

Bug #1404809 reported by Artem Panchenko
This bug affects 2 people
Affects             Status         Importance  Assigned to       Milestone
Fuel for OpenStack  Fix Released   High        Ihor Kalnytskyi
6.0.x               Fix Committed  High        Aleksey Kasatkin

Bug Description

api: '1.0'
astute_sha: 16b252d93be6aaa73030b8100cf8c5ca6a970a91
auth_required: true
build_id: 2014-12-18_01-32-01
build_number: '56'
feature_groups:
- mirantis
fuellib_sha: 73332192a257ea02c40a39885c502ad1ebdf3eda
fuelmain_sha: 45caacadb878abfbd9d60e134d72229698b469c9
nailgun_sha: 5f91157daa6798ff522ca9f6d34e7e135f150a90
ostf_sha: a9afb68710d809570460c29d6c3293219d3624d4
production: docker
release: '6.0'

This issue was reproduced on CI during system tests (Neutron HA with public network on computes):

http://jenkins-product.srt.mirantis.net:8080/view/6.0_swarm/job/6.0_fuelmain.system_test.ubuntu.ha_neutron/64/testReport/junit/%28root%29/deploy_neutron_gre_ha_with_public_network/deploy_neutron_gre_ha_with_public_network/
http://jenkins-product.srt.mirantis.net:8080/view/6.0_swarm/job/6.0_fuelmain.system_test.ubuntu.ha_neutron/64/testReport/junit/%28root%29/deploy_neutron_vlan_ha_with_public_network/deploy_neutron_vlan_ha_with_public_network/
http://jenkins-product.srt.mirantis.net:8080/view/6.0_swarm/job/6.0_fuelmain.system_test.centos.ha_neutron/65/#showFailuresLink
http://jenkins-product.srt.mirantis.net:8080/view/6.0_swarm/job/6.0_fuelmain.system_test.centos.ha_neutron/65/testReport/junit/%28root%29/deploy_neutron_gre_ha_with_public_network/deploy_neutron_gre_ha_with_public_network/

All tests failed with the following error:

'Check internet connectivity from a compute (failure)'

Steps to reproduce:

1. Create new environment (HA, Neutron, Cinder LVM for volumes). Add 3 controllers and 2 computes.
2. Enable 'Assign public network to all nodes' feature on settings tab
3. Deploy changes. Check internet connectivity from controllers and computes

Expected result:

- nodes are able to reach hosts on the Internet via the public network

Actual:

- internet hosts are unreachable

Here you can see that routing on nodes is configured incorrectly:

http://paste.openstack.org/show/153767/

Internet connection works fine when traffic is going via public interface:

http://paste.openstack.org/show/153768/

This is 'astute.yaml' file from controller:

http://paste.openstack.org/show/153770/

As you can see, 'default_gateway: true' is assigned to the br-storage interface. I guess this issue could be caused by this commit: https://github.com/stackforge/fuel-web/commit/5f91157daa6798ff522ca9f6d34e7e135f150a90 which fixed another bug: https://bugs.launchpad.net/fuel/+bug/1403560
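
A quick way to check which endpoint carries the flag is sketched below. It assumes the parsed 'astute.yaml' exposes endpoints under `network_scheme` → `endpoints`; that layout, the helper name `default_gateway_endpoints` and the sample addresses are illustrative assumptions, not copied from the paste.

```python
def default_gateway_endpoints(astute):
    """Return the names of endpoints flagged with default_gateway (sorted)."""
    # Assumed layout: astute["network_scheme"]["endpoints"][<bridge>] -> dict
    endpoints = astute.get("network_scheme", {}).get("endpoints", {})
    return sorted(
        name
        for name, attrs in endpoints.items()
        if isinstance(attrs, dict) and attrs.get("default_gateway")
    )


# Made-up data shaped like the situation described above; addresses are placeholders.
sample = {
    "network_scheme": {
        "endpoints": {
            "br-ex": {"IP": ["172.16.0.3/24"], "gateway": "172.16.0.1"},
            "br-storage": {
                "IP": ["192.168.1.2/24"],
                "gateway": "192.168.1.1",
                "default_gateway": True,
            },
            "br-prv": {"IP": "none"},
        }
    }
}

print(default_gateway_endpoints(sample))
```

On a correctly configured node with a public network, one would expect the result to name the public bridge rather than 'br-storage'.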

Diagnostic snapshot is attached.

Artem Panchenko (apanchenko-8) wrote :
description: updated
Ihor Kalnytskyi (ikalnytskyi) wrote :

> I guess this issue could be caused by this commit: https://github.com/stackforge/fuel-web/commit/5f91157daa6798ff522ca9f6d34e7e135f150a90

Nope, it was introduced some time ago. This commit just moves the 6.0-specific serialisation code to the specific serializer so that default_gateway won't be serialised for older releases.

As for this issue, we already have the fix - https://review.openstack.org/#/c/142136/

Artem Panchenko (apanchenko-8) wrote :

@Igor,

seems you are right. I wasn't able to reproduce the issue manually: when a gateway for storage network is undefined everything works fine. I think we can lower bug priority to medium and target it to 6.0.1

Ihor Kalnytskyi (ikalnytskyi) wrote :

OK, it looks like we have the following scenario:

1. Currently, we iterate over endpoints and set default_gateway=True for the first bridge that has a gateway.

https://github.com/stackforge/fuel-web/blob/29b5a9c8de15016757672a1a63cebd64dc76a408/nailgun/nailgun/orchestrator/deployment_serializers.py#L794-L797

2. With the 'Assign public network to all nodes' setting enabled, the endpoints dict contains the following bridges: 'br-prv', 'br-fw-admin', 'br-mgmt', 'br-ex', 'br-storage'. These are dict keys, so the iteration goes through .keys(), which returns them in the following order:

>>> x.keys()
['br-prv', 'br-storage', 'br-ex', 'br-mgmt', 'br-fw-admin']

'br-prv' is skipped, since it doesn't have a gateway
'br-storage' gets default_gateway, since it is the first bridge that has a gateway

3. With 'Assign public network to all nodes' disabled, the endpoints dict contains the following bridges: 'br-fw-admin', 'br-mgmt', 'br-storage', and the .keys() method returns the following list:

>>> y.keys()
['br-fw-admin', 'br-storage', 'br-mgmt']

Conclusion:

The cases that work only work by accident. We can't rely on dict order (which is unpredictable because it depends on key hashes).

Solution: apply this patch https://review.openstack.org/#/c/142136/
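
To illustrate the difference between order-dependent and order-independent selection, here is a minimal sketch. It is not the code from the patch: the preference list `PREFERRED_BRIDGES`, both helper names, the choice of which bridges have gateways and the addresses are all hypothetical. The sample dict is built in the order that x.keys() returned above; this relies on Python 3.7+ preserving insertion order.

```python
# Hypothetical preference order, chosen for illustration only.
PREFERRED_BRIDGES = ("br-ex", "br-fw-admin")


def first_with_gateway(endpoints):
    """Order-dependent choice: first bridge in iteration order with a gateway."""
    for name in endpoints:
        if endpoints[name].get("gateway"):
            return name
    return None


def pick_default_gateway(endpoints, preferred=PREFERRED_BRIDGES):
    """Order-independent choice: walk an explicit preference list instead."""
    for name in preferred:
        if endpoints.get(name, {}).get("gateway"):
            return name
    return None


# Keys inserted in the order x.keys() returned above; addresses are placeholders.
endpoints = {
    "br-prv": {},
    "br-storage": {"gateway": "192.168.1.1"},
    "br-ex": {"gateway": "172.16.0.1"},
    "br-mgmt": {},
    "br-fw-admin": {"gateway": "10.20.0.1"},
}

print(first_with_gateway(endpoints))    # br-storage
print(pick_default_gateway(endpoints))  # br-ex
```

With an explicit preference list the result no longer changes when the set of keys (and therefore the hash-based order) changes.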

Artem Panchenko (apanchenko-8) wrote :

Agree with Igor, the patch https://review.openstack.org/#/c/142136/ will help.
Also, common deployments without that fix aren't affected by the issue: it's impossible (and unnecessary) to set a gateway for the 'storage'/'management' networks in the UI.
But if someone attempts to use the 'multiple-cluster-networks' feature (available only for clouds with Neutron+GRE networks), they will get broken routing to the Internet on nodes with 'public' networks (by default, controller nodes): the nodes won't be accessible via their public IPs and will try to reach Internet hosts via the gateway in the 'storage' network. However, at the same time, Internet access from nova instances and floating IPs will work fine, because routing inside Neutron's router namespace will be correct.

Dmitry Pyzhov (dpyzhov)
tags: added: release-notes
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
status: Confirmed → Won't Fix
tags: added: on verification
Andrey Sledzinskiy (asledzinskiy) wrote :

verified on {

    "build_id": "2015-01-15_11-05-35",
    "ostf_sha": "92ad9f8e4c509c82e07ceb093b5d579205c76014",
    "build_number": "62",
    "auth_required": true,
    "api": "1.0",
    "nailgun_sha": "d243ec084d6ab230845541d0451ebb285f007a8e",
    "production": "docker",
    "fuelmain_sha": "",
    "astute_sha": "82125b0eef4e5a758fd4afa8917812e09a1f7dac",
    "feature_groups": [
        "mirantis"
    ],
    "release": "6.1",
    "release_versions": {
        "2014.2-6.0": {
            "VERSION": {
                "build_id": "2015-01-15_11-05-35",
                "ostf_sha": "92ad9f8e4c509c82e07ceb093b5d579205c76014",
                "build_number": "62",
                "api": "1.0",
                "nailgun_sha": "d243ec084d6ab230845541d0451ebb285f007a8e",
                "production": "docker",
                "fuelmain_sha": "",
                "astute_sha": "82125b0eef4e5a758fd4afa8917812e09a1f7dac",
                "feature_groups": [
                    "mirantis"
                ],
                "release": "6.1",
                "fuellib_sha": "89f7c94d65f75ebff01898b40aa3931bd52a8a61"
            }
        }
    },
    "fuellib_sha": "89f7c94d65f75ebff01898b40aa3931bd52a8a61"

}

tags: removed: on verification
Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/6.1.x
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Igor Kalnitsky (ikalnitsky)
milestone: 6.0 → 6.1
Andrey Sledzinskiy (asledzinskiy) wrote :

Need to backport it to 6.0.1
logs of failure are attached

Andrey Sledzinskiy (asledzinskiy) wrote :
Aleksey Kasatkin (alekseyk-ru) wrote :
tags: added: release-notes-done
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-docs (stable/6.1)

Related fix proposed to branch: stable/6.1
Review: https://review.openstack.org/194961

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-docs (stable/6.1)

Reviewed: https://review.openstack.org/194961
Committed: https://git.openstack.org/cgit/stackforge/fuel-docs/commit/?id=0e26e7d7cc153d179ec34985645dd23cdd239ddb
Submitter: Jenkins
Branch: stable/6.1

commit 5cc5f0c643aebecaf3bf4580535a3ea7c3334a6c
Author: Mike Scherbakov <email address hidden>
Date: Tue Jun 23 13:43:35 2015 -0700

    Removed streamlined patching backend pieces

    Change-Id: I955e76ccdbd12a9145f4e9b689f80bdf9fcaf929

commit 563c4b5c78ebfcb1f4f91047c2919f6270f9a1d4
Author: Mike Scherbakov <email address hidden>
Date: Tue Jun 23 13:30:30 2015 -0700

    Removed outdated patching guide

    Change-Id: I76180c277789ade9c5ebedd19fe2092847c0b7d9

commit 8d120c14bec1ab41d448683ad146a3053a57c4ee
Author: Irina Povolotskaya <email address hidden>
Date: Tue Jun 23 19:59:11 2015 +0300

    Add dual hypervisor ref arch into 6.1 docs

    Change-Id: I900c24c9de878eafadbfc995aa879b7f55737fac

commit feebd1592d3305b64bbdfd0bc5fe108190aef120
Author: OlgaGusarenko <email address hidden>
Date: Tue Jun 23 18:38:17 2015 +0300

    [OPs guide] Running Ceilometer section edits

    1. conf file extract is updated
    2. note is updated

    Closes-bug: 1467817
    Change-Id: I0217e164108e0ba6c1397045a5e57d13ff429223

commit 44a93f9dead7511a3461ec35248dbb689c81eafd
Author: OlgaGusarenko <email address hidden>
Date: Tue Jun 23 18:04:40 2015 +0300

    [RN6_1] Final changes

    1. capitalization
    2. 2014.2 to 2014.2.2
    3. general improvements

    Change-Id: I45057e90c90550559f66bc67ccdf97a559fd9000

commit bb41389cae58084285688853281516b659686422
Author: evkonstantinov <email address hidden>
Date: Tue Jun 23 16:45:35 2015 +0300

    Update patching decription

    Update patching description with
    the standard Linux commands.

    Change-Id: Ia1a8346639c468fdfce15a11d2430bf3a4731244

commit bf3018fae3f2e564413d33aba6cdebf8868f0b4e
Author: OlgaGusarenko <email address hidden>
Date: Tue Jun 23 15:55:49 2015 +0300

    [RN6_1] Clean up

    1. Rearranges sections
    2. Improves RST
    3. Changes titles order

    Change-Id: I6110bf515667d3d6ba08ad35ff5d593dbc96641e

commit 1c7e4457808e8f2d6c56fdf31252170972e444b9
Author: Maria Zlatkova <email address hidden>
Date: Tue Jun 23 15:26:28 2015 +0300

    Replaces VBOX screenshots

    This patch:
    - replaces VBOX screenshots
    - changes the link for Download Mirantis VirtualBox scripts
     to https://docs.mirantis.com/openstack/fuel/fuel-master/#downloads

    Change-Id: I58dede960c5c3355d39b07ff44b757403f6af02c
    Closes-Bug: #1467872

commit 0a568bf53fc0e25d1d692d5d74b4a7b4d983bbcc
Author: evkonstantinov <email address hidden>
Date: Tue Jun 23 14:01:55 2015 +0300

    6.1 --separate repos

    change wording and add links to the
    separate repos feature.

    Change-Id: Ib5d0778a0d8f1534f79ed2f553574cb69a3150b0

commit 95a188b21cbdd064d92696b7920e6a0105fe0c56
Author: Maria Zlatkova <email address hidden>
Date: Tue Jun 23 12:07:28 2015 +0300

    Corrects the output 'pcs status'

    Changes the example outputs to appropriate ones.

    Change-Id: Ib6d83...
