Tripleo deployment fails "cannot connect to cluster"

Bug #1831841 reported by Adam Ratcliff
This bug affects 1 person
Affects   Status   Importance   Assigned to   Milestone
tripleo   New      Undecided    Unassigned

Bug Description

** Description
My new containerized overcloud deployment fails during the Ansible overcloud configuration with a Puppet error.
Each node throws the "Could not connect to cluster (is it running?)" error at Step 1 of the deployment playbook.

** Steps to reproduce

Deploy Rocky with network isolation (single NIC with VLANs) and ceph-ansible.

openstack overcloud deploy --templates \
  -r templates/roles_data.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-environment.yaml \
  --ntp-server 172.26.1.60 \
  --timeout 190 \
  --control-scale 1 \
  --compute-flavor compute --compute-scale 2 \
  --ceph-storage-flavor ceph-storage --ceph-storage-scale 1 \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
  -e templates/network-environment-overrides.yaml

Repeatable by re-running the config-download playbook directly:
ansible-playbook -i inventory.yaml --private-key .ssh/id_rsa --become config-download/deploy_steps_playbook.yaml

** Expected result

Successful overcloud deployment

** Actual result

From the overcloud deploy / Ansible console output:
...
2019-06-06 04:41:29,034 p=33230 u=mistral | TASK [Debug output for task: Run puppet host configuration for step 1] *********
2019-06-06 04:41:29,034 p=33230 u=mistral | Thursday 06 June 2019 04:41:29 +0000 (0:00:04.891) 0:06:30.104 *********
2019-06-06 04:41:29,120 p=33230 u=mistral | fatal: [overcloud-controller-0]: FAILED! => {
    "failed_when_result": true,
    "outputs.stdout_lines | default([]) | union(outputs.stderr_lines | default([]))": [
        "error: Could not connect to cluster (is it running?)"
    ]
}
2019-06-06 04:41:29,198 p=33230 u=mistral | fatal: [overcloud-novacompute-1]: FAILED! => {
    "failed_when_result": true,
    "outputs.stdout_lines | default([]) | union(outputs.stderr_lines | default([]))": [
        "error: Could not connect to cluster (is it running?)"
    ]
}
2019-06-06 04:41:29,287 p=33230 u=mistral | fatal: [overcloud-novacompute-0]: FAILED! => {
    "failed_when_result": true,
    "outputs.stdout_lines | default([]) | union(outputs.stderr_lines | default([]))": [
        "error: Could not connect to cluster (is it running?)"
    ]
}
2019-06-06 04:41:29,339 p=33230 u=mistral | fatal: [overcloud-cephstorage-0]: FAILED! => {
    "failed_when_result": true,
    "outputs.stdout_lines | default([]) | union(outputs.stderr_lines | default([]))": [
        "error: Could not connect to cluster (is it running?)"
    ]
}
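Because the console output interleaves results from every node, it can help to list exactly which hosts failed before logging in to one. The following is a minimal bash sketch, assuming the `fatal: [host]: FAILED!` line format shown above; the function name `failed_hosts` is hypothetical, and the log location in the usage comment is an assumption for a Rocky undercloud, not something stated in this report.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the hosts Ansible marked as
# "fatal: [host]: FAILED!" in a log read from stdin, sorted and deduplicated.
failed_hosts() {
  # Keep only matching lines, capturing the text between "fatal: [" and "]: FAILED!".
  sed -n 's/.*fatal: \[\([^]]*\)]: FAILED!.*/\1/p' | sort -u
}

# Example (the log path is an assumption; adjust for your undercloud):
#   failed_hosts < /var/lib/mistral/overcloud/ansible.log
```

Run against the excerpt above, it would list all four overcloud nodes, which shows the failure is not specific to the controller.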

Logging in to a node and running puppet apply locally (ref http://hardysteven.blogspot.com/2018/02/debugging-tripleo-revisited-heat.html) shows the underlying error:

...
Error: Evaluation Error: Error while evaluating a Function Call, Could not find class ::tripleo::profile::base::time::ntp for overcloud-novacompute-0.localdomain (file: /var/lib/tripleo-config/puppet_step_config.pp, line: 37, column: 1) on node overcloud-novacompute-0.localdomain
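The local error says Puppet could not find `::tripleo::profile::base::time::ntp` in any module path on the node, which suggests the Puppet modules on the overcloud image do not match the templates being deployed. Puppet's autoloader maps a class `module::a::b` to `module/manifests/a/b.pp` inside a module directory, so one quick check is to look for that file. Below is a minimal bash sketch of that check; the helper names are hypothetical, and the module directories in the usage comment (`/etc/puppet/modules`, `/usr/share/openstack-puppet/modules`) are assumptions about where a TripleO image installs its Puppet modules.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate a Puppet class name into the manifest path
# Puppet's autoloader expects, relative to a module directory.
#   ::foo           -> foo/manifests/init.pp
#   ::foo::bar::baz -> foo/manifests/bar/baz.pp
puppet_class_manifest() {
  local cls="${1#::}"          # strip a leading top-scope "::"
  local module="${cls%%::*}"   # first segment names the module
  if [ "$cls" = "$module" ]; then
    printf '%s/manifests/init.pp\n' "$module"
  else
    local rest="${cls#*::}"    # remaining segments become directories
    printf '%s/manifests/%s.pp\n' "$module" "${rest//:://}"
  fi
}

# Hypothetical helper: search the given module directories (in order) for
# the manifest of a class; print the first match, return 1 if none exists.
find_puppet_class() {
  local rel dir
  rel="$(puppet_class_manifest "$1")"
  shift
  for dir in "$@"; do
    if [ -f "$dir/$rel" ]; then
      printf '%s\n' "$dir/$rel"
      return 0
    fi
  done
  return 1
}

# Example on a node (module paths are assumptions):
#   find_puppet_class ::tripleo::profile::base::time::ntp \
#     /etc/puppet/modules /usr/share/openstack-puppet/modules \
#     || echo "class not found in any module path"
```

If the manifest is missing on the node, the image's Puppet modules are likely older than the templates, which would be consistent with the image-rebuild workaround reported below.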

** Environment

CentOS 7, SELinux enforcing

Cluster of 5 nodes: 1 director (a VM running on a dedicated node), 1 controller, 2 compute, 1 Ceph storage.

** Logs & configs
See attached

Adam Ratcliff (adamjr) wrote:

Apparently the workaround is to build the overcloud image(s) before deployment, because the Puppet dependencies are then somehow met. Testing now.
