tripleo-quickstart fails at overcloud_prep_images.sh with Exception introspecting nodes

Bug #1785089 reported by David Rabel
This bug affects 1 person
Affects | Status  | Importance | Assigned to | Milestone
tripleo | Invalid | High       | Unassigned  |

Bug Description

I did this on a fresh KVM virtual machine with nested virtualization enabled, 16 GB RAM, a 50 GB HDD, 4 cores, and CentOS 7 installed:

$ curl -O https://raw.githubusercontent.com/openstack/tripleo-quickstart/master/quickstart.sh
$ bash quickstart.sh --install-deps
$ bash quickstart.sh 127.0.0.2

After some hours it fails at overcloud_prep_images.sh with Exception introspecting nodes:

$ sudo -u stack virt-cat -d undercloud /home/stack/overcloud_prep_images.log

2018-08-02 15:01:54 | + source /home/stack/stackrc
2018-08-02 15:01:54 | +++ set
2018-08-02 15:01:54 | +++ awk '{FS="="} /^OS_/ {print $1}'
2018-08-02 15:01:54 | ++ NOVA_VERSION=1.1
2018-08-02 15:01:54 | ++ export NOVA_VERSION
2018-08-02 15:01:54 | ++ OS_PASSWORD=67012aa9c45bdce89538978188b1765e486be2ce
2018-08-02 15:01:54 | ++ export OS_PASSWORD
2018-08-02 15:01:54 | ++ OS_AUTH_TYPE=password
2018-08-02 15:01:54 | ++ export OS_AUTH_TYPE
2018-08-02 15:01:54 | ++ OS_AUTH_URL=https://192.168.24.2:13000/
2018-08-02 15:01:54 | ++ PYTHONWARNINGS='ignore:Certificate has no, ignore:A true SSLContext object is not available'
2018-08-02 15:01:54 | ++ export OS_AUTH_URL
2018-08-02 15:01:54 | ++ export PYTHONWARNINGS
2018-08-02 15:01:54 | ++ OS_USERNAME=admin
2018-08-02 15:01:54 | ++ OS_PROJECT_NAME=admin
2018-08-02 15:01:54 | ++ COMPUTE_API_VERSION=1.1
2018-08-02 15:01:54 | ++ IRONIC_API_VERSION=1.34
2018-08-02 15:01:54 | ++ OS_BAREMETAL_API_VERSION=1.34
2018-08-02 15:01:54 | ++ OS_NO_CACHE=True
2018-08-02 15:01:54 | ++ OS_CLOUDNAME=undercloud
2018-08-02 15:01:54 | ++ export OS_USERNAME
2018-08-02 15:01:54 | ++ export OS_PROJECT_NAME
2018-08-02 15:01:54 | ++ export COMPUTE_API_VERSION
2018-08-02 15:01:54 | ++ export IRONIC_API_VERSION
2018-08-02 15:01:54 | ++ export OS_BAREMETAL_API_VERSION
2018-08-02 15:01:54 | ++ export OS_NO_CACHE
2018-08-02 15:01:54 | ++ export OS_CLOUDNAME
2018-08-02 15:01:54 | ++ OS_IDENTITY_API_VERSION=3
2018-08-02 15:01:54 | ++ export OS_IDENTITY_API_VERSION
2018-08-02 15:01:54 | ++ OS_PROJECT_DOMAIN_NAME=Default
2018-08-02 15:01:54 | ++ export OS_PROJECT_DOMAIN_NAME
2018-08-02 15:01:54 | ++ OS_USER_DOMAIN_NAME=Default
2018-08-02 15:01:54 | ++ export OS_USER_DOMAIN_NAME
2018-08-02 15:01:54 | ++ '[' -z '' ']'
2018-08-02 15:01:54 | ++ export PS1=
2018-08-02 15:01:54 | ++ PS1=
2018-08-02 15:01:54 | ++ export 'PS1=${OS_CLOUDNAME:+($OS_CLOUDNAME)} '
2018-08-02 15:01:54 | ++ PS1='${OS_CLOUDNAME:+($OS_CLOUDNAME)} '
2018-08-02 15:01:54 | ++ export CLOUDPROMPT_ENABLED=1
2018-08-02 15:01:54 | ++ CLOUDPROMPT_ENABLED=1
2018-08-02 15:01:54 | + openstack overcloud image upload
2018-08-02 15:03:07 | Image "overcloud-full-vmlinuz" was uploaded.
2018-08-02 15:03:07 | +--------------------------------------+------------------------+-------------+---------+--------+
2018-08-02 15:03:07 | | ID | Name | Disk Format | Size | Status |
2018-08-02 15:03:07 | +--------------------------------------+------------------------+-------------+---------+--------+
2018-08-02 15:03:07 | | a940326a-7020-4f9a-9520-679e4fbf4773 | overcloud-full-vmlinuz | aki | 6234048 | active |
2018-08-02 15:03:07 | +--------------------------------------+------------------------+-------------+---------+--------+
2018-08-02 15:03:07 | Image "overcloud-full-initrd" was uploaded.
2018-08-02 15:03:07 | +--------------------------------------+-----------------------+-------------+----------+--------+
2018-08-02 15:03:07 | | ID | Name | Disk Format | Size | Status |
2018-08-02 15:03:07 | +--------------------------------------+-----------------------+-------------+----------+--------+
2018-08-02 15:03:07 | | b76caa28-d7d6-4654-96a6-cd710659a713 | overcloud-full-initrd | ari | 54712255 | active |
2018-08-02 15:03:07 | +--------------------------------------+-----------------------+-------------+----------+--------+
2018-08-02 15:03:07 | Image "overcloud-full" was uploaded.
2018-08-02 15:03:07 | +--------------------------------------+----------------+-------------+------------+--------+
2018-08-02 15:03:07 | | ID | Name | Disk Format | Size | Status |
2018-08-02 15:03:07 | +--------------------------------------+----------------+-------------+------------+--------+
2018-08-02 15:03:07 | | 60524702-fb98-4f67-aa2a-04348e1205a6 | overcloud-full | qcow2 | 1432289280 | active |
2018-08-02 15:03:07 | +--------------------------------------+----------------+-------------+------------+--------+
2018-08-02 15:03:07 | Image "bm-deploy-kernel" was uploaded.
2018-08-02 15:03:07 | +--------------------------------------+------------------+-------------+---------+--------+
2018-08-02 15:03:07 | | ID | Name | Disk Format | Size | Status |
2018-08-02 15:03:07 | +--------------------------------------+------------------+-------------+---------+--------+
2018-08-02 15:03:07 | | ea62719e-5f71-4cc2-901a-a739fbfa206b | bm-deploy-kernel | aki | 6234048 | active |
2018-08-02 15:03:07 | +--------------------------------------+------------------+-------------+---------+--------+
2018-08-02 15:03:07 | Image "bm-deploy-ramdisk" was uploaded.
2018-08-02 15:03:07 | +--------------------------------------+-------------------+-------------+-----------+--------+
2018-08-02 15:03:07 | | ID | Name | Disk Format | Size | Status |
2018-08-02 15:03:07 | +--------------------------------------+-------------------+-------------+-----------+--------+
2018-08-02 15:03:07 | | 448134e5-a07d-45c4-aef2-16573a0ea959 | bm-deploy-ramdisk | ari | 390504215 | active |
2018-08-02 15:03:07 | +--------------------------------------+-------------------+-------------+-----------+--------+
2018-08-02 15:03:07 | + openstack overcloud node import instackenv.json
2018-08-02 15:03:19 | Waiting for messages on queue 'tripleo' with no timeout.
2018-08-02 15:03:54 | Started Mistral Workflow tripleo.baremetal.v1.register_or_update. Execution ID: a5d74d3a-a8ba-4c06-884c-48c2f98a6483
2018-08-02 15:03:54 |
2018-08-02 15:03:54 |
2018-08-02 15:03:54 | 2 node(s) successfully moved to the "manageable" state.
2018-08-02 15:03:54 | Successfully registered node UUID e5a28fa9-b91c-49b7-81a7-840df045a1ea
2018-08-02 15:03:54 | Successfully registered node UUID 49ab1038-e3c2-4b5e-b6e1-f394d748d404
2018-08-02 15:03:54 | + openstack overcloud node introspect --all-manageable
2018-08-02 15:04:03 | Waiting for messages on queue 'tripleo' with no timeout.
2018-08-02 16:06:22 | Exception introspecting nodes: {u'status': u'RUNNING', u'node_uuids': [u'e5a28fa9-b91c-49b7-81a7-840df045a1ea', u'49ab1038-e3c2-4b5e-b6e1-f394d748d404'], u'failed_introspection': [u'e5a28fa9-b91c-49b7-81a7-840df045a1ea', u'49ab1038-e3c2-4b5e-b6e1-f394d748d404'], u'result': None, u'introspected_nodes': {u'49ab1038-e3c2-4b5e-b6e1-f394d748d404': {u'uuid': u'49ab1038-e3c2-4b5e-b6e1-f394d748d404', u'links': [{u'href': u'http://192.168.24.2:13050/v1/introspection/49ab1038-e3c2-4b5e-b6e1-f394d748d404', u'rel': u'self'}], u'finished_at': None, u'state': u'waiting', u'finished': False, u'error': None, u'started_at': u'2018-08-02T15:04:12'}, u'e5a28fa9-b91c-49b7-81a7-840df045a1ea': {u'uuid': u'e5a28fa9-b91c-49b7-81a7-840df045a1ea', u'links': [{u'href': u'http://192.168.24.2:13050/v1/introspection/e5a28fa9-b91c-49b7-81a7-840df045a1ea', u'rel': u'self'}], u'finished_at': None, u'state': u'waiting', u'finished': False, u'error': None, u'started_at': u'2018-08-02T15:04:09'}}, u'message': u'Retrying 2 nodes that failed introspection. Attempt 2 of 3 ', u'introspection_attempt': 2}
2018-08-02 16:06:22 | Waiting for introspection to finish...
2018-08-02 16:06:22 | Started Mistral Workflow tripleo.baremetal.v1.introspect_manageable_nodes. Execution ID: d7040ff7-9cb6-4843-bbc3-86edf247769a
2018-08-02 16:06:22 | Introspection of node 49ab1038-e3c2-4b5e-b6e1-f394d748d404 timed out.
2018-08-02 16:06:22 | Introspection of node e5a28fa9-b91c-49b7-81a7-840df045a1ea timed out.
2018-08-02 16:06:22 | Retrying 2 nodes that failed introspection. Attempt 2 of 3
2018-08-02 16:06:22 | Introspection of node e5a28fa9-b91c-49b7-81a7-840df045a1ea timed out.
2018-08-02 16:06:22 | Introspection of node 49ab1038-e3c2-4b5e-b6e1-f394d748d404 timed out.
2018-08-02 16:06:22 | Retrying 2 nodes that failed introspection. Attempt 3 of 3
2018-08-02 16:06:22 | Introspection of node 49ab1038-e3c2-4b5e-b6e1-f394d748d404 timed out.
2018-08-02 16:06:22 | Introspection of node e5a28fa9-b91c-49b7-81a7-840df045a1ea timed out.
2018-08-02 16:06:22 | Retry limit reached with 2 nodes still failing introspection
2018-08-02 16:06:22 | {u'status': u'RUNNING', u'node_uuids': [u'e5a28fa9-b91c-49b7-81a7-840df045a1ea', u'49ab1038-e3c2-4b5e-b6e1-f394d748d404'], u'failed_introspection': [u'e5a28fa9-b91c-49b7-81a7-840df045a1ea', u'49ab1038-e3c2-4b5e-b6e1-f394d748d404'], u'result': None, u'introspected_nodes': {u'49ab1038-e3c2-4b5e-b6e1-f394d748d404': {u'uuid': u'49ab1038-e3c2-4b5e-b6e1-f394d748d404', u'links': [{u'href': u'http://192.168.24.2:13050/v1/introspection/49ab1038-e3c2-4b5e-b6e1-f394d748d404', u'rel': u'self'}], u'finished_at': None, u'state': u'waiting', u'finished': False, u'error': None, u'started_at': u'2018-08-02T15:04:12'}, u'e5a28fa9-b91c-49b7-81a7-840df045a1ea': {u'uuid': u'e5a28fa9-b91c-49b7-81a7-840df045a1ea', u'links': [{u'href': u'http://192.168.24.2:13050/v1/introspection/e5a28fa9-b91c-49b7-81a7-840df045a1ea', u'rel': u'self'}], u'finished_at': None, u'state': u'waiting', u'finished': False, u'error': None, u'started_at': u'2018-08-02T15:04:09'}}, u'message': u'Retrying 2 nodes that failed introspection. Attempt 2 of 3 ', u'introspection_attempt': 2}
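
For a long log like the one above, it can help to reduce it to the set of nodes that timed out. The following is only a minimal sketch: `timed_out_nodes` is a hypothetical helper name, not part of tripleo-quickstart, and the sample lines are copied from the log above.

```shell
# Hypothetical helper: read a prep-images log on stdin and print each node UUID
# that was reported as timed out, once per node.
timed_out_nodes() {
    sed -n 's/.*Introspection of node \([0-9a-f-]*\) timed out\..*/\1/p' | sort -u
}

# Sample lines copied from the log above.
sample_log='2018-08-02 16:06:22 | Introspection of node 49ab1038-e3c2-4b5e-b6e1-f394d748d404 timed out.
2018-08-02 16:06:22 | Introspection of node e5a28fa9-b91c-49b7-81a7-840df045a1ea timed out.
2018-08-02 16:06:22 | Retrying 2 nodes that failed introspection. Attempt 2 of 3
2018-08-02 16:06:22 | Introspection of node e5a28fa9-b91c-49b7-81a7-840df045a1ea timed out.'

printf '%s\n' "$sample_log" | timed_out_nodes
```

On the virthost, the real log could presumably be piped in the same way it is read above, e.g. `sudo -u stack virt-cat -d undercloud /home/stack/overcloud_prep_images.log | timed_out_nodes`. Each UUID can then be checked individually on the undercloud with `openstack baremetal introspection status <uuid>`, assuming the ironic-inspector client plugin is installed there.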

Bogdan Dobrelya (bogdando) wrote :

What does the generated undercloud-parameter-defaults.yaml look like? (You can find it in the stack user's home dir, or in the tarball generated at the end of the undercloud deployment.) It should look like http://logs.openstack.org/18/589218/1/check/tripleo-ci-centos-7-undercloud-containers/21b7cda/logs/undercloud/home/zuul/undercloud-parameter-defaults.yaml.txt.gz

Does the undercloud have its provisioning interface (likely eth1) included in the br-ctlplane OVS bridge?
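
One way to answer the bridge question is to list the ports of br-ctlplane and look for the interface. This is a sketch under assumptions: `has_port` is a hypothetical helper, eth1 is only the likely interface name mentioned above, and the port list in the example is illustrative rather than real output.

```shell
# Hypothetical helper: succeed if the interface name given as $1 appears in the
# port list read from stdin (one port name per line, as `ovs-vsctl list-ports` prints it).
has_port() {
    grep -qxF -- "$1"
}

# Illustrative port list; the real list on an undercloud will differ.
printf 'eth1\nphy-br-ctlplane\n' | has_port eth1 && echo "eth1 is attached"
```

On the undercloud itself, the input would come from Open vSwitch, e.g. `sudo ovs-vsctl list-ports br-ctlplane | has_port eth1`.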

There have been a few fixes related to containerized Ironic in the tripleo-heat-templates master branch (Rocky), so you may want to retry with the latest t-h-t packages (once we have a promotion build with the recent t-h-t patches...).

Changed in tripleo:
status: New → Incomplete
milestone: none → rocky-rc1
importance: Undecided → High
David Rabel (rabel-b1) wrote :

So the current quickstart.sh is not working?

What would be the exact step to retry with the latest t-h-t packages?

Bogdan Dobrelya (bogdando) wrote :

The latest packages should be automatically picked up via quickstart. But you can also tweak it via -e release=current

Bogdan Dobrelya (bogdando) wrote :

Sorry, I think the right arguments to pick the most recent packages and container images are
quickstart.sh ... -R master -e dlrn_hash_tag=current

David Rabel (rabel-b1) wrote :

With those parameters I get a different error:

$ bash quickstart.sh -R master -e dlrn_hash_tag=current 127.0.0.2
[...]
TASK [fetch-images : Get image expected checksum] ******************************
task path: /home/centos/.quickstart/tripleo-quickstart/roles/fetch-images/tasks/fetch.yml:70
Monday 13 August 2018 14:11:41 +0000 (0:00:00.245) 0:03:59.092 *********
[DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of using
`result|failed` instead use `result is failed`. This feature will be removed in
 version 2.9. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
fatal: [127.0.0.2]: FAILED! => {"changed": true, "cmd": ["curl", "-sfL", "https://images.rdoproject.org/master/rdo_trunk/488107cdd0ae6fd9a6e51741c4bdd7cd5fb34cdb_d4217a48/overcloud-full.tar.md5"], "delta": "0:00:00.919391", "end": "2018-08-13 14:11:42.685405", "msg": "non-zero return code", "rc": 22, "start": "2018-08-13 14:11:41.766014", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
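
The task output above shows `curl -sfL` exiting with rc 22 and empty stdout/stderr. With `-f`, curl uses exit code 22 when the server answers with an HTTP status of 400 or higher, so the checksum file was most likely not served at that URL at the time. The sketch below maps a few well-known curl exit codes to hints; `curl_rc_hint` is a hypothetical helper, not part of quickstart.

```shell
# Hypothetical helper: translate a few well-known curl exit codes into hints.
curl_rc_hint() {
    case "$1" in
        0)  echo "success" ;;
        6)  echo "could not resolve host" ;;
        7)  echo "failed to connect to host" ;;
        22) echo "HTTP error (status >= 400) returned while -f/--fail was set" ;;
        28) echo "operation timed out" ;;
        *)  echo "see the EXIT CODES section of the curl man page" ;;
    esac
}

curl_rc_hint 22
```

To see the actual HTTP status instead of a silent failure, the same URL can be requested with `curl -sL -o /dev/null -w '%{http_code}\n' <url>`.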

David Rabel (rabel-b1) wrote :

Destroyed everything and ran quickstart.sh again without those parameters.

undercloud-parameter-defaults.yaml looks like this:

# sudo -u stack virt-cat -d undercloud /home/stack/undercloud-parameter-defaults.yaml
{
    "parameter_defaults": {},
    "resource_registry": {
        "OS::TripleO::Undercloud::Net::SoftwareConfig": "/usr/share/openstack-tripleo-heat-templates/net-config-undercloud.yaml"
    }
}

I'd have a closer look at the undercloud VM, but I can't SSH into it:
# ssh -i /home/centos/.quickstart/id_rsa_undercloud stack@192.168.23.30
ssh_exchange_identification: read: Connection reset by peer

Bogdan Dobrelya (bogdando) wrote :

Good news is that undercloud-parameter-defaults.yaml looks correct :)
I'll try to reproduce that on my local env.

Note, you can try virsh console or the virt-manager GUI to access the VMs with root creds (those can be set via -e modify_image_vc_root_password=r00tme or the like, I think).

David Rabel (rabel-b1) wrote :

:)

Something else seems to be wrong with my undercloud VM. It somehow crashed, and now when I start it, it stays in the "paused" state forever. I still have 4 GB of free memory, so that shouldn't be the problem.
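
When a domain stays paused, libvirt can usually report why. The following is a minimal sketch, assuming the domain is named `undercloud` as in the virt-cat commands above; `domain_state_reason` is a hypothetical wrapper around `virsh domstate --reason`.

```shell
# Hypothetical wrapper: print the state of a libvirt domain together with the
# reason libvirt records for it. Defaults to the "undercloud" domain.
domain_state_reason() {
    domain="${1:-undercloud}"
    if ! command -v virsh >/dev/null 2>&1; then
        echo "virsh not found; run this on the virthost" >&2
        return 127
    fi
    virsh domstate --reason "$domain"
}
```

It would be called as `domain_state_reason undercloud`, as the user that owns the VM (the stack user in this report). A paused state with a reason such as an I/O error may point to storage trouble on the virthost, for example a full disk, rather than to memory.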

Changed in tripleo:
milestone: rocky-rc1 → stein-1
David Rabel (rabel-b1) wrote :

Meanwhile: could you point me to any parameters or an older version of the quickstart script that would let me use it anyway?

Bogdan Dobrelya (bogdando) wrote :

I couldn't reproduce that on my local devbox libvirt setup, introspection passed:

http://paste.openstack.org/show/wVpkQp4hG78x3On3q0gI/

Note, I usually run quickstart from a wrapper centos7:latest container, as I do not have CentOS installed on my devbox, but that does not really matter. My setup basically repeats the command

quickstart.sh ... -R master -e dlrn_hash_tag=current-tripleo

but with the following additions:
* custom libvirt provisioning params,
* custom (non stack) user,
* custom local_working and working directories,
* custom patches to support SSH-less localhost deployments [0]
* config/environments/dev_privileged_libvirt.yml for privileged libvirt mode
* custom vbmc_libvirt_uri, which I needed in order to SSH from undercloud to my virthost with HOST_BREXT_IP=192.168.23.1

So you can probably just ignore all of that and keep using virthost 127.0.0.2 instead of localhost.

Anyway, here is the command and deployment logs I was testing with.

A) Libvirt provision finished, just interrupted due to the way I ran it from a container (omits direct editing of authorized_hosts of virthost) - see _quickstart.log tarball attached

B) Restarted with no teardown, also checks idempotency (after I manually updated virthost's authorized_hosts) - see _quickstart_continue.log tarball attached
...which is basically the original command from A with the following added:
<...>
-v -e undercloud_use_custom_boot_images=true \
-e undercloud_custom_initrd=${IMAGECACHE}/overcloud-full.initrd \
-e undercloud_custom_vmlinuz=${IMAGECACHE}/overcloud-full.vmlinuz \
-e force_cached_images=true -e image_cache_expire_days=300 \
-T none localhost

Step B actually failed to run overcloud-prep-images.sh because stackrc was missing from my custom working_dir. I fixed that manually by copying it to the needed path, then retried overcloud-prep-images.sh, and it passed.

[0] https://review.openstack.org/#/q/topic:localcon+(status:open+OR+status:merged)

Bogdan Dobrelya (bogdando) wrote :

I wonder if mismatching flavors could be the cause of timing out introspection? See https://bugs.launchpad.net/tripleo/+bug/1788875

What is the output of

 openstack flavor list
 openstack baremetal node list
 openstack baremetal node show <insert_controller/compute_name>
?

Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3
Changed in tripleo:
status: Incomplete → Invalid