Undercloud heat's max_json_body_size value is too low (again)

Bug #1741310 reported by Alan Bishop
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
tripleo   Fix Released   High         Luke Short

Bug Description

This appears to be a repeat of bug 1667697.

Scenario:

Downstream release based on stable/pike.
Error occurs when deploying a small overcloud (1 control, 1 compute)
- Overcloud nodes are real hardware
- Non-trivial network configuration
- Cinder services running in containers

The error:

 u'message': u"Failed to run action [action_ex_id=0bd20d27-42f0-46f2-9d6f-16c9ff6757df, action_cls='<class 'mistral.actions.action_factory.DeployStackAction'>', attributes='{}', params='{u'skip_deploy_identifier': False, u'container': u'overcloud', u'timeout': 240}']\n ERROR: Request limit exceeded: JSON body size (2099504 bytes) exceeds maximum allowed size (2097152 bytes).",
 u'status': u'FAILED'}
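
To put the numbers in that error in perspective, here is a minimal Python sketch that pulls the two sizes out of the message and computes how far over the limit the request was. The function name `parse_limit_error` is made up for illustration; only the error text comes from this report.

```python
import re

# Error text copied from the report above.
ERROR = ("ERROR: Request limit exceeded: JSON body size (2099504 bytes) "
         "exceeds maximum allowed size (2097152 bytes).")


def parse_limit_error(message):
    """Return (body_size, limit) in bytes parsed from the error text."""
    match = re.search(r"JSON body size \((\d+) bytes\) exceeds maximum "
                      r"allowed size \((\d+) bytes\)", message)
    if match is None:
        raise ValueError("not a JSON body size error")
    return int(match.group(1)), int(match.group(2))


body_size, limit = parse_limit_error(ERROR)
overage = body_size - limit

print(limit == 2 * 1024 * 1024)  # True: the limit is exactly 2 MiB
print(overage)                   # bytes over the limit
```

The request was over the 2 MiB limit by only 2352 bytes, so even a small change to the deployment inputs can push it to either side of the limit.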

Workaround:

I had only borrowed access to the systems and didn't have time to do much more than patch in a workaround. I increased max_json_body_size in the undercloud's /etc/heat/heat.conf, restarted the undercloud, and was then able to deploy the overcloud without the error.
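
For anyone wanting to script the same workaround, the following is a rough Python sketch of the edit, run against an in-memory sample rather than the real file. It rests on several assumptions: that the option lives in the `[DEFAULT]` section of heat.conf, that 4194304 is a suitable value (the value used for the workaround is not stated in this report; this one is borrowed from the later fix), and the helper name `raise_json_body_limit` is hypothetical.

```python
import configparser
import io

# Sample stand-in for /etc/heat/heat.conf; the real file has many more options.
SAMPLE = "[DEFAULT]\nmax_json_body_size = 2097152\n"


def raise_json_body_limit(conf_text, new_limit):
    """Return conf_text with [DEFAULT] max_json_body_size set to new_limit."""
    # interpolation=None keeps any '%' characters in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    parser["DEFAULT"]["max_json_body_size"] = str(new_limit)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


updated = raise_json_body_limit(SAMPLE, 4 * 1024 * 1024)
print(updated)
```

Note that `configparser` drops comments when it writes the file back out, so on a real system it is probably safer to edit the one line by hand and keep a backup of the original.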

Details:

% openstack overcloud deploy \
 --templates ~/pilot/templates/overcloud \
 -e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
 -e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
 -e ~/pilot/templates/overcloud/environments/network-isolation.yaml \
 -e ~/pilot/templates/network-environment.yaml \
 -e ~/pilot/templates/node-placement.yaml \
 -e ~/docker_registry_containerized_cinder.yaml \
 -e ~/containerized-cinder.yaml \
 --control-flavor baremetal \
 --compute-flavor baremetal \
 --control-scale 1 \
 --compute-scale 1 \
 --ntp-server 192.168.120.201

Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 0935a6c7-fe79-4bee-b3be-745d0de83502
Waiting for messages on queue '443bee40-4ea3-4414-9b3e-066355a917ce' with no timeout.
WARNINGS
[u"7 nodes with profile None won't be used for deployment now", u"7 nodes with profile None won't be used for deployment now"]
Configuration has 2 warnings, fix them before proceeding.
Removing the current plan files
Uploading new plan files
Started Mistral Workflow tripleo.plan_management.v1.update_deployment_plan. Execution ID: d28e093c-93d5-4f8a-9725-ce5deece2548
Plan updated.
Processing templates in the directory /tmp/tripleoclient-yZj8Ob/tripleo-heat-templates
Started Mistral Workflow tripleo.plan_management.v1.get_deprecated_parameters. Execution ID: fd697e56-3a57-44ef-b05c-0a05cd6a066b
Deploying templates in the directory /tmp/tripleoclient-yZj8Ob/tripleo-heat-templates
Started Mistral Workflow tripleo.deployment.v1.deploy_plan. Execution ID: acb28c92-5c08-4bb1-8304-3da672b3bca6
{u'execution': {u'created_at': u'2018-01-04 17:37:28',
                u'id': u'acb28c92-5c08-4bb1-8304-3da672b3bca6',
                u'input': {u'container': u'overcloud',
                           u'queue_name': u'dc65d443-2f0b-4123-b482-da5a8fa91e88',
                           u'run_validations': False,
                           u'skip_deploy_identifier': False,
                           u'timeout': 240},
                u'name': u'tripleo.deployment.v1.deploy_plan',
                u'params': {u'namespace': u''},
                u'spec': {u'description': u'Deploy the overcloud for a plan.\n',
                          u'input': [u'container',
                                     {u'run_validations': False},
                                     {u'timeout': 240},
                                     {u'skip_deploy_identifier': False},
                                     {u'queue_name': u'tripleo'}],
                          u'name': u'deploy_plan',
                          u'tags': [u'tripleo-common-managed'],
                          u'tasks': {u'add_validation_ssh_key': {u'input': {u'container': u'<% $.container %>',
                                                                            u'queue_name': u'<% $.queue_name %>'},
                                                                 u'name': u'add_validation_ssh_key',
                                                                 u'on-complete': [{u'run_validations': u'<% $.run_validations %>'},
                                                                                  {u'create_swift_rings_backup_plan': u'<% not $.run_validations %>'}],
                                                                 u'type': u'direct',
                                                                 u'version': u'2.0',
                                                                 u'workflow': u'tripleo.validations.v1.add_validation_ssh_key_parameter'},
                                     u'create_swift_rings_backup_plan': {u'input': {u'container': u'<% $.container %>',
                                                                                    u'queue_name': u'<% $.queue_name %>',
                                                                                    u'use_default_templates': True},
                                                                         u'name': u'create_swift_rings_backup_plan',
                                                                         u'on-error': u'create_swift_rings_backup_plan_set_status_failed',
                                                                         u'on-success': u'get_heat_stack',
                                                                         u'type': u'direct',
                                                                         u'version': u'2.0',
                                                                         u'workflow': u'tripleo.swift_rings_backup.v1.create_swift_rings_backup_container_plan'},
                                     u'create_swift_rings_backup_plan_set_status_failed': {u'name': u'create_swift_rings_backup_plan_set_status_failed',
                                                                                           u'on-success': u'send_message',
                                                                                           u'publish': {u'message': u'<% task(create_swift_rings_backup_plan).result %>',
                                                                                                        u'status': u'FAILED'},
                                                                                           u'type': u'direct',
                                                                                           u'version': u'2.0'},
                                     u'deploy': {u'action': u'tripleo.deployment.deploy',
                                                 u'input': {u'container': u'<% $.container %>',
                                                            u'skip_deploy_identifier': u'<% $.skip_deploy_identifier %>',
                                                            u'timeout': u'<% $.timeout %>'},
                                                 u'name': u'deploy',
                                                 u'on-error': u'set_deployment_failed',
                                                 u'on-success': u'send_message',
                                                 u'type': u'direct',
                                                 u'version': u'2.0'},
                                     u'get_heat_stack': {u'action': u'heat.stacks_get stack_id=<% $.container %>',
                                                         u'name': u'get_heat_stack',
                                                         u'on-error': u'deploy',
                                                         u'on-success': [{u'set_stack_in_progress': u'<% "_IN_PROGRESS" in task(get_heat_stack).result.stack_status %>'},
                                                                         {u'deploy': u'<% not "_IN_PROGRESS" in task(get_heat_stack).result.stack_status %>'}],
                                                         u'type': u'direct',
                                                         u'version': u'2.0'},
                                     u'run_validations': {u'input': {u'group_names': [u'pre-deployment'],
                                                                     u'plan': u'<% $.container %>',
                                                                     u'queue_name': u'<% $.queue_name %>'},
                                                          u'name': u'run_validations',
                                                          u'on-error': u'set_validations_failed',
                                                          u'on-success': u'create_swift_rings_backup_plan',
                                                          u'type': u'direct',
                                                          u'version': u'2.0',
                                                          u'workflow': u'tripleo.validations.v1.run_groups'},
                                     u'send_message': {u'action': u'zaqar.queue_post',
                                                       u'input': {u'messages': {u'body': {u'payload': {u'execution': u'<% execution() %>',
                                                                                                       u'message': u"<% $.get('message', '') %>",
                                                                                                       u'status': u"<% $.get('status', 'SUCCESS') %>"},
                                                                                          u'type': u'tripleo.deployment.v1.deploy_plan'}},
                                                                  u'queue_name': u'<% $.queue_name %>'},
                                                       u'name': u'send_message',
                                                       u'on-success': [{u'fail': u'<% $.get(\'status\') = "FAILED" %>'}],
                                                       u'retry': u'count=5 delay=1',
                                                       u'type': u'direct',
                                                       u'version': u'2.0'},
                                     u'set_deployment_failed': {u'name': u'set_deployment_failed',
                                                                u'on-success': u'send_message',
                                                                u'publish': {u'message': u'<% task(deploy).result %>',
                                                                             u'status': u'FAILED'},
                                                                u'type': u'direct',
                                                                u'version': u'2.0'},
                                     u'set_stack_in_progress': {u'name': u'set_stack_in_progress',
                                                                u'on-success': u'send_message',
                                                                u'publish': {u'message': u'The Heat stack is busy.',
                                                                             u'status': u'FAILED'},
                                                                u'type': u'direct',
                                                                u'version': u'2.0'},
                                     u'set_validations_failed': {u'name': u'set_validations_failed',
                                                                 u'on-success': u'send_message',
                                                                 u'publish': {u'message': u'<% task(run_validations).result %>',
                                                                              u'status': u'FAILED'},
                                                                 u'type': u'direct',
                                                                 u'version': u'2.0'}},
                          u'version': u'2.0'}},
 u'message': u"Failed to run action [action_ex_id=0bd20d27-42f0-46f2-9d6f-16c9ff6757df, action_cls='<class 'mistral.actions.action_factory.DeployStackAction'>', attributes='{}', params='{u'skip_deploy_identifier': False, u'container': u'overcloud', u'timeout': 240}']\n ERROR: Request limit exceeded: JSON body size (2099504 bytes) exceeds maximum allowed size (2097152 bytes).",
 u'status': u'FAILED'}

Revision history for this message
Alan Bishop (alan-bishop) wrote :

At shardy's suggestion, I'm attaching a tarball copy of all the custom env files. They are pretty innocuous, and don't do any get_file on large external files.

One thing to note is that the overcloud deploy command specifies "--templates ~/pilot/templates/overcloud", but that directory is an exact copy of the stock templates in /usr/share/openstack-tripleo-heat-templates (this is a quirk of the customer's installation tooling).

I suspect the custom templates directory lengthens the file names enough to bloat the JSON body. The problem does not occur if I drop the custom templates directory from the deploy command. In fact, removing just the containerized-cinder.yaml env file is enough to slip below the current JSON body limit.
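
To illustrate why path length could matter, here is a small Python sketch of the arithmetic behind that suspicion. It does not reproduce what Heat or tripleoclient actually sends: the file names, the prefixes, and the `{"files": ...}` body shape are all hypothetical and chosen only to show the effect.

```python
import json


def json_body_size(prefix, names):
    """Bytes in a JSON body whose "files" keys are prefix + "/" + name."""
    files = {prefix + "/" + name: "" for name in names}
    return len(json.dumps({"files": files}).encode("utf-8"))


# Hypothetical file names and prefixes, for illustration only.
names = ["service-%03d.yaml" % i for i in range(300)]
short_prefix = "templates"
long_prefix = "custom/templates"

growth = json_body_size(long_prefix, names) - json_body_size(short_prefix, names)
print(growth)  # extra bytes caused only by the longer prefix
```

Every extra character in a path prefix costs one byte per place the path appears, so a few hundred references with a handful of extra characters each would be enough to cover the 2352 bytes by which the request exceeded the limit. This only shows the suspicion is plausible; it does not confirm it.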

Changed in tripleo:
milestone: none → queens-3
status: New → Incomplete
status: Incomplete → Triaged
Changed in tripleo:
milestone: queens-3 → queens-rc1
Changed in tripleo:
milestone: queens-rc1 → rocky-1
Changed in tripleo:
milestone: rocky-1 → rocky-2
Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3
Changed in tripleo:
milestone: stein-3 → stein-rc1
Changed in tripleo:
milestone: stein-rc1 → train-1
Changed in tripleo:
milestone: train-1 → train-2
Luke Short (ekultails)
Changed in tripleo:
status: Triaged → Fix Released
Luke Short (ekultails) wrote :

This was fixed on 2018-05-03.

https://review.opendev.org/#/c/558354/

The fix is available in Rocky and later. It sets the Heat parameter on the undercloud to `HeatMaxJsonBodySize: 4194304` (4 MiB, up from 2 MiB).

Luke Short (ekultails) wrote :

Backport for Queens sent upstream: https://review.opendev.org/669708

Changed in tripleo:
assignee: nobody → Luke Short (ekultails)
Luke Short (ekultails) wrote :

A fix is already in place for Queens, so all supported OpenStack releases address this issue.
