Fuel network check test reports failure for Neutron DHCP servers

Bug #1463935 reported by Chris Clason
Affects | Status | Importance | Assigned to | Milestone
Fuel for OpenStack | Fix Released | High | Łukasz Oleś |
6.0.x | Won't Fix | High | Fuel Python (Deprecated) |
6.1.x | Won't Fix | High | Fuel Python (Deprecated) |
7.0.x | Fix Released | High | Łukasz Oleś |

Bug Description

When adding additional nodes to an existing cloud deployed by Fuel, the end users may run the network validation check to make sure the new nodes and/or additional switches are properly configured. If the cloud has active tenants, the Fuel network check detects the tenant DHCP servers and reports a network check failure.

One suggestion is to filter out detected DHCP servers whose MAC addresses fall within the virtual MAC address range that Fuel sets for already-deployed clouds, so that they do not cause the check to fail. This does not require access to the OpenStack environment.

If access to the environment is permissible (the user provides admin credentials), Neutron can be queried to exclude the specific DHCP server MACs used by tenants. This is probably overkill, but it is one option.

Revision history for this message
Chris Clason (cclason) wrote :

Issue confirmed on 6.0; have not tested on 6.1 RC yet.

tags: added: module-netcheck
Changed in fuel:
milestone: none → 6.0.1
Changed in fuel:
assignee: nobody → Fuel Python Team (fuel-python)
tags: added: release-notes
Revision history for this message
Dima Shulyak (dshulyak) wrote :

Every packet that goes to the qdhcp namespace is tagged with a locally significant VLAN, e.g. 1, 2, or 3.
dhcpchecker, however, generates DHCP discover packets without any VLAN tags, so it is unclear why this problem occurs.
As a workaround, we can filter DHCP offers based on the Neutron MAC address mask (exactly as Chris Clason suggests).

Revision history for this message
Dima Shulyak (dshulyak) wrote :

Fix for this bug:

Add additional validation in nailgun, near:
https://github.com/stackforge/fuel-web/blob/master/nailgun/nailgun/rpc/receiver.py#L1010-1012

It will check that the MAC does not start with the pattern specified in https://github.com/stackforge/fuel-web/blob/master/nailgun/nailgun/fixtures/openstack.yaml#L432

That pattern gets copied to the network config for the cluster.
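
The MAC-prefix filtering proposed above can be sketched roughly as follows. This is an illustration only, not the nailgun code: the function names, the `"mac"` key on each offer, and the rule that trailing `00` octets of the base MAC are the variable part are all assumptions made for the example. The value `fa:16:3e:00:00:00` is Neutron's usual default base MAC; the value in a given cluster's network config may differ.

```python
def fixed_prefix(base_mac):
    """Return the leading octets of base_mac that are assumed to be fixed."""
    octets = base_mac.lower().split(":")
    # Assumption: trailing "00" octets are the part randomized per port.
    while octets and octets[-1] == "00":
        octets.pop()
    return octets


def is_neutron_mac(mac, base_mac):
    """True if mac starts with the fixed prefix derived from base_mac."""
    prefix = fixed_prefix(base_mac)
    # An all-zero base MAC gives an empty prefix; match nothing in that case.
    return bool(prefix) and mac.lower().split(":")[:len(prefix)] == prefix


def filter_offers(offers, base_mac):
    """Drop DHCP offers whose server MAC falls in the Neutron range."""
    # "mac" is a hypothetical key name for the offering server's MAC address.
    return [o for o in offers if not is_neutron_mac(o["mac"], base_mac)]


if __name__ == "__main__":
    seen = [{"mac": "FA:16:3E:12:34:56"}, {"mac": "52:54:00:aa:bb:cc"}]
    print(filter_offers(seen, "fa:16:3e:00:00:00"))
```

Only offers that survive the filter would then be reported as unexpected DHCP servers.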

Igor Marnat (imarnat)
tags: added: fuel-to-mos
Revision history for this message
Andrey Shestakov (ashestakov) wrote :

Neutron's dnsmasq should not respond to requests from unknown MAC addresses.
Can you provide a diagnostic snapshot with logs?

Revision history for this message
Andrey Shestakov (ashestakov) wrote :

I didn’t manage to find the root cause of the problem. Do you have more details?

Revision history for this message
Chris Clason (cclason) wrote : Re: [Bug 1463935] Re: Fuel network check test reports failure for Neutron DHCP servers

I don't have access to the environment where this happened, but it was just
a standard MOS 6.0 install using Neutron + VLAN. If you ran our network
check, it reported failures for DHCP servers seen on the network. I traced
them back and confirmed they were the DHCP servers for the running tenant
environments.

Revision history for this message
Ihor Kalnytskyi (ikalnytskyi) wrote :

My 5 cents on this.

I spent quite some time during 6.1 trying to debug this issue. The repro steps were the following:

* Deploy a Neutron + VLAN env (1 controller, 1 compute).
* Add one new node to this env (just discovery and bootstrap, nothing else).
* Launch a new VM instance, so that Neutron's DHCP is activated.
* Run the network checker and catch the bug with a probability of about 50%.

I don't remember for sure what was wrong, but we did indeed get a message from Neutron's DHCP. IIRC, what we caught was not even a DHCPOFFER but a BOOTP message (but I may be wrong; it was a long time ago).

The issue is still present, and I think it would be better to figure out exactly why we get a response from Neutron's DHCP instead of working around it in the network checker.

Revision history for this message
Łukasz Oleś (loles) wrote :

OK, I think I know what happens.

The lease time for Nova instances is 10 minutes, so every 10 minutes each instance sends a DHCPREQUEST. If an instance sends a DHCPREQUEST while network verification is running, we will catch it.

I can reproduce this issue every time by manually running a DHCP client on an instance once network verification has started.

The solution will be to ignore answers from these instances by filtering the MACs. We can use base_mac from the Neutron L2 Configuration settings.

I'm not sure this is High.

Revision history for this message
Chris Clason (cclason) wrote :

Yeah, I don't think it's High either, and I don't think I was the one who set it
to that priority. It's not causing many support calls; it's more of a medium
or low, nice-to-fix-eventually issue.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/212478

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/212478
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=5b5c8ef7fca8d46213f148581afe5ccef16d19c1
Submitter: Jenkins
Branch: master

commit 5b5c8ef7fca8d46213f148581afe5ccef16d19c1
Author: Łukasz Oleś <email address hidden>
Date: Thu Aug 13 13:06:18 2015 +0200

    Listen only for answers to a host

    Without this change listener could catch any DISCOVER-OFFER
    communication. Even if it is between two another servers.
    This may lead to false negativce.

    Now listener will only catch answers sent to its requests.

    Change-Id: I70821de6f7ffab441bd4cab4d8ec3ccb1351c10b
    Closes-bug: #1463935
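
This report does not show how the merged change restricts the listener, so the following is only one common way to accept "answers sent to its requests": compare the transaction ID (xid) and client hardware address (chaddr) in each BOOTP/DHCP reply with the values used in the checker's own DISCOVER. The function name and parameters are hypothetical; the field offsets follow the fixed BOOTP header layout (op, htype, hlen, hops, xid, secs, flags, four IPv4 addresses, then chaddr).

```python
import struct

BOOTREPLY = 2          # value of the BOOTP "op" field for server replies
CHADDR_OFFSET = 28     # chaddr follows op/htype/hlen/hops, xid, secs, flags, 4 addresses
MIN_HEADER_LEN = 44    # enough bytes to cover the 16-byte chaddr field


def is_reply_to_us(payload, our_xid, our_mac):
    """True if the raw BOOTP payload is a reply to our own request.

    payload -- bytes of the UDP payload (BOOTP message)
    our_xid -- integer transaction ID we put in our DISCOVER
    our_mac -- our hardware address as raw bytes (6 bytes for Ethernet)
    """
    if len(payload) < MIN_HEADER_LEN:
        return False
    op, _htype, hlen, _hops, xid = struct.unpack("!BBBBI", payload[:8])
    # Clamp hlen so a malformed packet cannot read past the chaddr field.
    chaddr = payload[CHADDR_OFFSET:CHADDR_OFFSET + min(hlen, 16)]
    return op == BOOTREPLY and xid == our_xid and chaddr == our_mac
```

With a check like this, a reply that a tenant DHCP server sends to an instance renewing its lease carries a different xid and chaddr, so a passive listener would ignore it.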

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-docs (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/223472

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-docs (master)

Reviewed: https://review.openstack.org/223472
Committed: https://git.openstack.org/cgit/stackforge/fuel-docs/commit/?id=d957edef0b2356c9a6d2810f26f5a682427098f5
Submitter: Jenkins
Branch: master

commit d957edef0b2356c9a6d2810f26f5a682427098f5
Author: evkonstantinov <email address hidden>
Date: Tue Sep 15 11:13:10 2015 +0300

    Add tenant DHCP network verification resolved issue to relnotes

    Change-Id: I3ff198f7475d0cf8524192cf6b96d9a0bb00844c
    Related-Bug:#1463935

Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

on verification

Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

{"build_id": "301", "build_number": "301", "release_versions": {"2015.1.0-7.0": {"VERSION": {"build_id": "301", "build_number": "301", "api": "1.0", "fuel-library_sha": "5d50055aeca1dd0dc53b43825dc4c8f7780be9dd", "nailgun_sha": "4162b0c15adb425b37608c787944d1983f543aa8", "feature_groups": ["mirantis"], "fuel-nailgun-agent_sha": "d7027952870a35db8dc52f185bb1158cdd3d1ebd", "openstack_version": "2015.1.0-7.0", "fuel-agent_sha": "50e90af6e3d560e9085ff71d2950cfbcca91af67", "production": "docker", "python-fuelclient_sha": "486bde57cda1badb68f915f66c61b544108606f3", "astute_sha": "6c5b73f93e24cc781c809db9159927655ced5012", "fuel-ostf_sha": "2cd967dccd66cfc3a0abd6af9f31e5b4d150a11c", "release": "7.0", "fuelmain_sha": "a65d453215edb0284a2e4761be7a156bb5627677"}}}, "auth_required": true, "api": "1.0", "fuel-library_sha": "5d50055aeca1dd0dc53b43825dc4c8f7780be9dd", "nailgun_sha": "4162b0c15adb425b37608c787944d1983f543aa8", "feature_groups": ["mirantis"], "fuel-nailgun-agent_sha": "d7027952870a35db8dc52f185bb1158cdd3d1ebd", "openstack_version": "2015.1.0-7.0", "fuel-agent_sha": "50e90af6e3d560e9085ff71d2950cfbcca91af67", "production": "docker", "python-fuelclient_sha": "486bde57cda1badb68f915f66c61b544108606f3", "astute_sha": "6c5b73f93e24cc781c809db9159927655ced5012", "fuel-ostf_sha": "2cd967dccd66cfc3a0abd6af9f31e5b4d150a11c", "release": "7.0", "fuelmain_sha": "a65d453215edb0284a2e4761be7a156bb5627677"} verified

Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

Steps to verify:
1. Prepare an env with 3 controllers + 2 computes.
2. Create an instance.
3. Add a new node to the env (without redeploying).
4. Run the network checker in a loop (50 times) and assert that it passes each time.
5. Provision the node from step 3 and repeat step 4.
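
Because the failure was intermittent, a single passing run proves little; the repeated run in step 4 can be sketched as a small helper. Here `check` is a hypothetical callable that starts one network verification and returns True when it passes; how it talks to Fuel is deliberately left out.

```python
def run_repeatedly(check, times=50):
    """Call check() `times` times and return the 1-based numbers of failed runs."""
    failures = []
    for attempt in range(1, times + 1):
        if not check():
            failures.append(attempt)
    return failures


if __name__ == "__main__":
    # Stand-in check that always passes; replace with a real verification call.
    failed = run_repeatedly(lambda: True, times=50)
    assert not failed, "network check failed on runs: %s" % failed
```

Collecting the failed run numbers, rather than stopping at the first failure, gives a rough failure rate to compare with the roughly 50% seen before the fix.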

Changed in fuel:
status: Fix Committed → Fix Released
no longer affects: fuel/6.0.1-updates
Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

Won't Fix for 6.0-updates as there is no delivery channel for Fuel fixes in 6.0

Revision history for this message
Denis Meltsaykin (dmeltsaykin) wrote :

Won't fix for 6.x series as they are unsupported and there was no progress on the issue.
