[8.0] Deployment timeout when building ubuntu ibp image

Bug #1659823 reported by Vladimir Jigulin
This bug affects 1 person
Affects: Mirantis OpenStack
Status: Fix Released
Importance: High
Assigned to: Vladimir Jigulin

Bug Description

1. Create a cluster with the 'Install Ceilometer' option enabled
2. Add 1 node with the controller role
3. Add 1 node with the compute role
4. Deploy the cluster

Expected result:
Deployment should complete successfully

Actual result: (full trace: http://paste.openstack.org/show/6cWebHh9GYpNW4pWc2d8/)
Task 'deploy' has incorrect status. error != ready, 'Provision has failed. Failed to execute hook 'shell' command: cd / && fa_build_image --image_build_dir /var/lib/fuel/ibp --log-file /var/log/fuel-agent-env-1.log --data_driver nailgun_build_image --input_data '{"image_data": {"/boot": {"container": "gzip", "uri": "http://10.109.0.2:8080/targetimages/env_1_ubuntu_1404_amd64-boot.img.gz", "format": "ext2"}, "/": {"container": "gzip", "uri": "http://10.109.0.2:8080/targetimages/env_1_ubuntu_.............................../ubuntu/snapshots/8.0-latest", "priority": 1050, "suite": "mos8.0-security", "type": "deb"}], "codename": "trusty"}'
Task: 97755674-623e-47f9-8ce5-df527b296073: shell timeout error: execution expired
Task timeout: 3600, Retries: 1'

Reproducibility:
tempest 09.02.2017 (2/4)
tempest 26.01.2017 (4/4)
tempest 21.01.2017 (2/4)
swarm 26.01.2017 19 tests (5%) <- ssd disks

Part of /var/log/docker-logs/astute/astute.log: http://paste.openstack.org/show/596723/

Revision history for this message
Vladimir Jigulin (vjigulin) wrote :
description: updated
Changed in mos:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → MOS Maintenance (mos-maintenance)
Changed in mos:
assignee: MOS Maintenance (mos-maintenance) → Alexey Stupnikov (astupnikov)
Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

It looks like a problem with network connectivity or Fuel settings. I think we need to restart the test first and investigate further afterwards.

description: updated
Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

It turns out that rabbitmq can't set the correct permissions for mcollective in time, and mcollective fails to establish its connection (it just hangs waiting for rabbit's response). We start containers in a strict order based on their interdependencies, use puppet manifests to configure them correctly, and check services for every container started. There is an explicit check for mcollective permissions, but for some reason it doesn't work as it should. The next step is to compare logs in the /var/log/puppet/ and /var/log/docker-logs/puppet/ dirs and figure out what is going wrong out there.
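An explicit permissions check like the one described would need to block and retry rather than run once, since rabbit's permission setup races with mcollective startup. A minimal sketch, assuming nothing about Fuel's actual code: the `wait_for` helper is illustrative, and the commented `rabbitmqctl` invocation (including the mcollective user name) is an assumption.

```shell
# Generic retry helper: run a command until it succeeds or a timeout elapses.
# Usage: wait_for <timeout_seconds> <interval_seconds> <command...>
wait_for() {
  timeout=$1; interval=$2; shift 2
  deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    # give up once the deadline has passed
    [ "$(date +%s)" -ge "$deadline" ] && return 1
    sleep "$interval"
  done
}

# In the real check, the polled command would be something like
# (hypothetical; requires a running RabbitMQ with an mcollective user):
#   wait_for 600 5 sh -c "rabbitmqctl list_permissions | grep -q mcollective"
```

The key design point is that the helper fails loudly after the deadline instead of hanging forever, which is what the stalled mcollective connection does in the reported trace.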

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :
Revision history for this message
Vladimir Jigulin (vjigulin) wrote :

Reproduced again on 8.0 MU-4 iso #04/09/17 15:25 SWARM:
deploy_heat_ha
deploy_murano_ha_with_tun
deploy_sahara_ha_tun

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

It turns out that our QA code reboots the master node during its update [1]. As a result, containers are started by two different methods: the 'dockerctl start all' command and systemd. The mcollective container therefore sometimes starts before rabbit's init is finished and is unable to use it.

[1] https://github.com/openstack/fuel-qa/blob/stable-mu/8.0/fuelweb_test/models/environment.py#L726
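One way to avoid this double-start race would be to serialize the startup explicitly: start the dependent container only after its dependency's health check passes. A hedged sketch only, not the actual fix; the `dockerctl check`/`dockerctl start` subcommands and the container names are assumptions based on Fuel's dockerctl tooling.

```shell
# Start container $2 only after container $1's health check passes.
# Polls up to $3 times (default 60) at 5-second intervals.
start_after() {
  dep=$1; svc=$2; tries=${3:-60}; i=0
  while ! dockerctl check "$dep" >/dev/null 2>&1; do
    i=$((i + 1))
    [ "$i" -ge "$tries" ] && { echo "gave up waiting for $dep" >&2; return 1; }
    sleep 5
  done
  dockerctl start "$svc"
}

# Example (commented out; would require a Fuel master node):
#   start_after rabbitmq mcollective
```

Whichever startup path runs (dockerctl or systemd), wrapping the dependent start in a guard like this makes the ordering explicit instead of relying on boot timing.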

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :
tags: added: maintenance non-release
Changed in mos:
status: Confirmed → In Progress
Changed in mos:
status: In Progress → Fix Committed
assignee: Alexey Stupnikov (astupnikov) → Vladimir Jigulin (vjigulin)
Changed in mos:
status: Fix Committed → Fix Released