[8.0] Deployment timeout when building ubuntu ibp image

Bug #1659823 reported by Vladimir Jigulin
This bug affects 1 person
Affects: Mirantis OpenStack
Status: Fix Released
Importance: High
Assigned to: Vladimir Jigulin

Bug Description

1. Create a cluster with the 'Install Ceilometer' option enabled
2. Add 1 node with the controller role
3. Add 1 node with the compute role
4. Deploy the cluster

Expected result:
Deployment should complete successfully

Actual result: (full trace: http://paste.openstack.org/show/6cWebHh9GYpNW4pWc2d8/)
Task 'deploy' has incorrect status. error != ready, 'Provision has failed. Failed to execute hook 'shell' command: cd / && fa_build_image --image_build_dir /var/lib/fuel/ibp --log-file /var/log/fuel-agent-env-1.log --data_driver nailgun_build_image --input_data '{"image_data": {"/boot": {"container": "gzip", "uri": "http://10.109.0.2:8080/targetimages/env_1_ubuntu_1404_amd64-boot.img.gz", "format": "ext2"}, "/": {"container": "gzip", "uri": "http://10.109.0.2:8080/targetimages/env_1_ubuntu_.............................../ubuntu/snapshots/8.0-latest", "priority": 1050, "suite": "mos8.0-security", "type": "deb"}], "codename": "trusty"}'
Task: 97755674-623e-47f9-8ce5-df527b296073: shell timeout error: execution expired
Task timeout: 3600, Retries: 1'

Reproducibility:
tempest 09.02.2017 (2/4)
tempest 26.01.2017 (4/4)
tempest 21.01.2017 (2/4)
swarm 26.01.2017 19 tests (5%) <- ssd disks

Part of /var/log/docker-logs/astute/astute.log: http://paste.openstack.org/show/596723/

Revision history for this message
Vladimir Jigulin (vjigulin) wrote :
description: updated
Changed in mos:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → MOS Maintenance (mos-maintenance)
Changed in mos:
assignee: MOS Maintenance (mos-maintenance) → Alexey Stupnikov (astupnikov)
Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

It looks like a problem with network connectivity or Fuel settings. I think we need to restart the test first and investigate further afterwards.

description: updated
Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

It turns out that rabbitmq can't set the correct permissions for mcollective in time, and mcollective fails to establish its connection (it just hangs waiting for rabbit's response). We start containers in a strict order based on their interdependencies, use puppet manifests to configure them correctly, and check services for every container started. There is an explicit check for mcollective permissions, but for some reason it doesn't work as it should. The next step is to compare logs in the /var/log/puppet/ and /var/log/docker-logs/puppet/ dirs and figure out what is going wrong out there.
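An explicit permissions check like the one described would need to block and retry rather than run once, since rabbit's permission setup races with mcollective startup. A minimal sketch, assuming nothing about Fuel's actual code: the `wait_for` helper is illustrative, and the commented `rabbitmqctl` invocation (including the mcollective user name) is an assumption.

```shell
# Generic retry helper: run a command until it succeeds or a timeout elapses.
# Usage: wait_for <timeout_seconds> <interval_seconds> <command...>
wait_for() {
  timeout=$1; interval=$2; shift 2
  deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    # give up once the deadline has passed
    [ "$(date +%s)" -ge "$deadline" ] && return 1
    sleep "$interval"
  done
}

# In the real check, the polled command would be something like
# (hypothetical; requires a running RabbitMQ with an mcollective user):
#   wait_for 600 5 sh -c "rabbitmqctl list_permissions | grep -q mcollective"
```

The key design point is that the helper fails loudly after the deadline instead of hanging forever, which is what the stalled mcollective connection does in the reported trace.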

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :
Revision history for this message
Vladimir Jigulin (vjigulin) wrote :

Reproduced again on 8.0 MU-4 iso #04/09/17 15:25 SWARM:
deploy_heat_ha
deploy_murano_ha_with_tun
deploy_sahara_ha_tun

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

It turns out that our QA code reboots the master node during its update [1]. As a result, containers are started by two different methods: the 'dockerctl start all' command and systemd. The mcollective container therefore sometimes starts before rabbit's init is finished and is unable to use it.

[1] https://github.com/openstack/fuel-qa/blob/stable-mu/8.0/fuelweb_test/models/environment.py#L726
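One way to avoid this double-start race would be to serialize the startup explicitly: start the dependent container only after its dependency's health check passes. A hedged sketch only, not the actual fix; the `dockerctl check`/`dockerctl start` subcommands and the container names are assumptions based on Fuel's dockerctl tooling.

```shell
# Start container $2 only after container $1's health check passes.
# Polls up to $3 times (default 60) at 5-second intervals.
start_after() {
  dep=$1; svc=$2; tries=${3:-60}; i=0
  while ! dockerctl check "$dep" >/dev/null 2>&1; do
    i=$((i + 1))
    [ "$i" -ge "$tries" ] && { echo "gave up waiting for $dep" >&2; return 1; }
    sleep 5
  done
  dockerctl start "$svc"
}

# Example (commented out; would require a Fuel master node):
#   start_after rabbitmq mcollective
```

Whichever startup path runs (dockerctl or systemd), wrapping the dependent start in a guard like this makes the ordering explicit instead of relying on boot timing.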

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :
tags: added: maintenance non-release
Changed in mos:
status: Confirmed → In Progress
Changed in mos:
status: In Progress → Fix Committed
assignee: Alexey Stupnikov (astupnikov) → Vladimir Jigulin (vjigulin)
Changed in mos:
status: Fix Committed → Fix Released