Verify Networks fails with unusable error message

Bug #1471085 reported by Rob Neff
18
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Fix Released
High
Vladimir Sharshov
6.1.x
Won't Fix
Medium
Denis Puchkin

Bug Description

Steps, this a new Fuel Master using Fuel 6.1

1. HA Environment
2. 3 Controllers, 6 Compute Nodes, and 6 Ceph Nodes
3. Run Verify Networks before Deploying (have not deployed yet)

Verify Networks Runs (for quite a long time, it should have a 5 minute timeout)

Then it fails with an error message and says to look in the Astute Log:

This is what the Astute Log says:

2015-07-03 03:49:18 ERR
[594] Error running RPC method verify_networks: c7c32b04-f976-44e6-82d4-b84911b98077: MCollective call failed in agent 'net_probe', method 'check_repositories_with_setup', failed nodes:
ID: 39 - Reason: execution expired
ID: 33 - Reason: execution expired
ID: 5 - Reason: execution expired
ID: 1 - Reason: execution expired
ID: 55 - Reason: execution expired
ID: 54 - Reason: execution expired
ID: 44 - Reason: execution expired
ID: 36 - Reason: execution expired
ID: 8 - Reason: execution expired
ID: 2 - Reason: execution expired
ID: 4 - Reason: execution expired
ID: 7 - Reason: execution expired
ID: 3 - Reason: execution expired
ID: 35 - Reason: execution expired
ID: 53 - Reason: execution expired
, trace:
["/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/mclient.rb:112:in `check_results_with_retries'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/mclient.rb:60:in `method_missing'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/network.rb:135:in `check_repositories_with_setup'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/orchestrator.rb:167:in `check_repositories_with_setup'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/dispatcher.rb:159:in `check_repositories_with_setup'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/dispatcher.rb:129:in `block in verify_networks'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/dispatcher.rb:127:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/dispatcher.rb:127:in `verify_networks'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:142:in `dispatch_message'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:103:in `block in dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:101:in `each_with_index'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:101:in `dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:85:in `block in perform_main_job'"]

Reason:
This error is because there is no connectivity to the repositories during network checker which means there is probably a misconfiguration in the networking. What is happening is that network checker is configuring the network interfaces as part and then attempts to request all the repositories defined for the environment. Unfortunately there is not timeout on the urlaccesscheck when it goes to make the url request.

The default we use on the fuel-library side for similar functionaility after we run l23network is to wait at most 180 seconds per url and at most a 60 second open timeout and a 60 second read timeout.

https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/osnailyfacter/lib/puppet/parser/functions/url_available.rb

Because there is no timeout on the urlaccesschecker it's using the defaults which is greater than the timeout on the mcollective task

Workaround:
Go to the Settings tab and remove repos that aren't available from the nodes.

Revision history for this message
Rob Neff (rob-neff) wrote :

The Fuel 6.0 error log gave a much better error message.

Revision history for this message
Rob Neff (rob-neff) wrote :

How do I match the ID to a server?

Also, how do I know which network test failed and which passed? I can ping all of my physical servers running the bootstrap.

Revision history for this message
Rob Neff (rob-neff) wrote :

This time I limited it to 3 HA Controller nodes and 1 Compute node. Same problem.

2015-07-03 05:17:53 ERR
[577] Error running RPC method verify_networks: edd0820c-9362-4bdc-aec0-98134657019b: MCollective call failed in agent 'net_probe', method 'check_repositories_with_setup', failed nodes:
ID: 39 - Reason: execution expired
ID: 33 - Reason: execution expired
ID: 8 - Reason: execution expired
ID: 54 - Reason: execution expired
, trace:
["/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/mclient.rb:112:in `check_results_with_retries'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/mclient.rb:60:in `method_missing'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/network.rb:135:in `check_repositories_with_setup'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/orchestrator.rb:167:in `check_repositories_with_setup'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/dispatcher.rb:159:in `check_repositories_with_setup'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/dispatcher.rb:129:in `block in verify_networks'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/dispatcher.rb:127:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/dispatcher.rb:127:in `verify_networks'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:142:in `dispatch_message'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:103:in `block in dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:101:in `each_with_index'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:101:in `dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:85:in `block in perform_main_job'"]
2015-07-03 05:17:53 ERR
[577] MCollective call failed in agent 'net_probe', method 'check_repositories_with_setup', failed nodes:
ID: 39 - Reason: execution expired
ID: 33 - Reason: execution expired
ID: 8 - Reason: execution expired
ID: 54 - Reason: execution expired

Revision history for this message
Rob Neff (rob-neff) wrote :

With simple one controller & one compute node:

2015-07-03 05:17:53 ERR
[577] Error running RPC method verify_networks: edd0820c-9362-4bdc-aec0-98134657019b: MCollective call failed in agent 'net_probe', method 'check_repositories_with_setup', failed nodes:
ID: 39 - Reason: execution expired
ID: 33 - Reason: execution expired
ID: 8 - Reason: execution expired
ID: 54 - Reason: execution expired
, trace:
["/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/mclient.rb:112:in `check_results_with_retries'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/mclient.rb:60:in `method_missing'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/network.rb:135:in `check_repositories_with_setup'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/orchestrator.rb:167:in `check_repositories_with_setup'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/dispatcher.rb:159:in `check_repositories_with_setup'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/dispatcher.rb:129:in `block in verify_networks'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/dispatcher.rb:127:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/dispatcher.rb:127:in `verify_networks'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:142:in `dispatch_message'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:103:in `block in dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:101:in `each_with_index'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:101:in `dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:85:in `block in perform_main_job'"]
2015-07-03 05:17:53 ERR
[577] MCollective call failed in agent 'net_probe', method 'check_repositories_with_setup', failed nodes:
ID: 39 - Reason: execution expired
ID: 33 - Reason: execution expired
ID: 8 - Reason: execution expired
ID: 54 - Reason: execution expired

Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

Hello, please attach diagnostic snapshot.

Changed in fuel:
milestone: none → 7.0
assignee: nobody → Fuel Library Team (fuel-library)
importance: Undecided → Medium
status: New → Incomplete
Revision history for this message
Rob Neff (rob-neff) wrote :

Btw, I accidentally pasted the same log twice. When I did just a controller and compute, there were only 2 ID's:

ID: 8 - Reason: execution expired
ID: 33 - Reason: execution expired

I've attached the Fuel Diagnostic Snapshot.

Revision history for this message
Boris (boris-mimeur) wrote :
Download full text (3.4 KiB)

Just wanted to report the same issue affecting Fuel 6.1 build from the 3rd of July.
I initially hit the issue on Fuel 6.1 released on the 23rd.

My setup is simple:
1 Controller (Controller + Mongo DB)
3 Compute nodes
1 Storage node

Network is setup the same way as on 6.0 where the same setup has a PASS for networking test so I know the physical network is not involved at this point.

It seems it fails on the controller node according to the astute logs.

- Log-

[632] Error running RPC method verify_networks: b91e8f64-99ac-41da-a681-8bd3e4d4d370: MCollective call failed in agent 'net_probe', method 'check_repositories_with_setup', failed nodes:
ID: 2 - Reason: execution expired
, trace:
["/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/mclient.rb:112:in `check_results_with_retries'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/mclient.rb:60:in `method_missing'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/network.rb:135:in `check_repositories_with_setup'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/orchestrator.rb:167:in `check_repositories_with_setup'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/dispatcher.rb:159:in `check_repositories_with_setup'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/dispatcher.rb:129:in `block in verify_networks'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/dispatcher.rb:127:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/dispatcher.rb:127:in `verify_networks'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:142:in `dispatch_message'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:103:in `block in dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:101:in `each_with_index'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:101:in `dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:85:in `block in perform_main_job'"]

[632] MCollective call failed in agent 'net_probe', method 'check_repositories_with_setup', failed nodes:
ID: 2 - Reason: execution expired

[root@fuel ~]# fuel node
id | status | name | cluster | ip | mac | roles | pending_roles | online | group_id
---|----------|------------------|---------|-------------|-------------------|-------|-------------------|--------|---------
2 | discover | PCTRL1 (6f:d0) | 1 | 10.110.0.8 | 0c:c4:7a:1f:6f:d0 | | controller, mongo | True | 1
5 | discover | Untitled (b2:b0) | 1 | 10.110.0.9 | 0c:c4:7a:3a:b2:b0 | | compute | True | 1
4 | discover | PSTOR1 (9d:bc) | 1 | 10.110.0.5 | 0c:c4:7a:31:9d:bc | ...

Read more...

Revision history for this message
Rob Neff (rob-neff) wrote :

I attached a diagnostic snapshot on 7/3, but the bug still shows Incomplete, so I am changing to "New" since I provided all of the information.

Please let me know if more information is needed.

Changed in fuel:
status: Incomplete → New
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Fuel Python Team (fuel-python)
status: New → Confirmed
importance: Medium → High
tags: added: customer-found
Revision history for this message
Rob Neff (rob-neff) wrote :

Is there a workaround?

tags: added: module-astute
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Vladimir Sharshov (vsharshov)
tags: added: tricky
Revision history for this message
Alex Schultz (alex-schultz) wrote :

This error is because there is no connectivity to the repositories during network checker which means there is probably a misconfiguration in the networking. What is happening is that network checker is configuring the network interfaces as part and then attempts to request all the repositories defined for the environment. Unfortunately there is not timeout on the urlaccesscheck when it goes to make the url request.

The default we use on the fuel-library side for similar functionaility after we run l23network is to wait at most 180 seconds per url and at most a 60 second open timeout and a 60 second read timeout.

https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/osnailyfacter/lib/puppet/parser/functions/url_available.rb

Because there is no timeout on the urlaccesschecker it's using the defaults which is greater than the timeout on the mcollective task

Revision history for this message
Alex Schultz (alex-schultz) wrote :

Also if you have a repository that is unavailable, you may need to remove that from your environment.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (master)

Fix proposed to branch: master
Review: https://review.openstack.org/206153

Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/207134

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-web (master)

Change abandoned by Vladimir Sharshov (<email address hidden>) on branch: master
Review: https://review.openstack.org/207134
Reason: Already done by: https://review.openstack.org/#/c/205698/

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/206153
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=745fea3968189801233f6d0ec9fe4b49df6f3a04
Submitter: Jenkins
Branch: master

commit 745fea3968189801233f6d0ec9fe4b49df6f3a04
Author: Vladimir Sharshov (warpc) <email address hidden>
Date: Mon Jul 27 19:53:07 2015 +0300

    New message about repository check fail

    Done:
    - change message about error;
    - add into mcollective log info about perfomed command;
    - add tests.

    Change-Id: I0c0c9b67157879e05fd4c43f67477d8de3b8f549
    Related-Bug: #1471085

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/205698
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=d02af0200e8a67a0597d5301f8281a536f60f413
Submitter: Jenkins
Branch: master

commit d02af0200e8a67a0597d5301f8281a536f60f413
Author: Alex Schultz <email address hidden>
Date: Fri Jul 24 15:21:06 2015 -0500

    Add timeout value to 'urlaccesscheck check'

    This change adds a configurable timeout to 'urlaccesscheck check' to
    ensure the script does not hang if there are connectivity issues with
    the URL that is being checked. The default timeout is 60 seconds but
    is configurable via a command line option.

    This change also adds netifaces to the network_checker requirements.txt
    because it was not added as part of
    I910c15c2b39a917eb8428bb69271b5dde364b639 so test and installation may
    fail on a clean environment.

    Change-Id: I363712a6501cc21b89601d5373b5633d5518ca81
    Closes-Bug: 1478138
    Related-Bug: 1471085
    Related-Bug: 1476831
    Closed-Bug: 1478622

Revision history for this message
Andrew Woodward (xarses) wrote :

customer found also applies to 6.1, backport required

tags: added: on-verification
Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

Verified

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2015.1.0-7.0"
  api: "1.0"
  build_number: "284"
  build_id: "284"
  nailgun_sha: "5c33995a2e6d9b1b8cdddfa2630689da5084506f"
  python-fuelclient_sha: "1ce8ecd8beb640f2f62f73435f4e18d1469979ac"
  fuel-agent_sha: "082a47bf014002e515001be05f99040437281a2d"
  fuel-nailgun-agent_sha: "d7027952870a35db8dc52f185bb1158cdd3d1ebd"
  astute_sha: "8283dc2932c24caab852ae9de15f94605cc350c6"
  fuel-library_sha: "f81fdabe6c05be7a3d11d88a7c3a8f3931921c73"
  fuel-ostf_sha: "1f08e6e71021179b9881a824d9c999957fcc7045"
  fuelmain_sha: "9ab01caf960013dc882825dc9b0e11ccf0b81cb0"

tags: removed: on-verification
Changed in fuel:
status: Fix Committed → Fix Released
Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

Won't Fix for 6.1 as there is no separate change to fix this particular issue.

Roman Rufanov (rrufanov)
tags: added: support
Revision history for this message
Luz Cazares (luz-cazares) wrote :

Hello I'm having this issue, I have a fuel master node with V.6.1 already deployed... is there a workaround?

description: updated
Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

Confirmed for 6.1-updated as now there is fix for 7.0 available

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-astute (stable/6.1)

Related fix proposed to branch: stable/6.1
Review: https://review.openstack.org/281757

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-web (stable/6.1)

Related fix proposed to branch: stable/6.1
Review: https://review.openstack.org/281789

Revision history for this message
Denis Puchkin (dpuchkin) wrote :

decrease to medium importance and won't fix because no data corruption or failure of a significant feature without workaround

tags: added: wontfix-low
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-astute (stable/6.1)

Change abandoned by Tony Breeds (<email address hidden>) on branch: stable/6.1
Review: https://review.openstack.org/281757
Reason: This branch (stable/6.1) is at End Of Life

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-web (stable/6.1)

Change abandoned by Tony Breeds (<email address hidden>) on branch: stable/6.1
Review: https://review.openstack.org/281789
Reason: This branch (stable/6.1) is at End Of Life

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.