Error is difficult to parse when failure happens in sub-workflow

Bug #1621418 reported by Julie Pichon
This bug affects 1 person
Affects: Mistral
Status: Triaged
Importance: Medium
Assigned to: Unassigned
Milestone: none listed

Bug Description

We have a workflow that calls another workflow, which in turn calls an Ironic action. If the Ironic action fails for some reason, by the time the message has floated back up to the end user a lot of extra information about the intermediate actions has been added. This makes the error difficult to understand, and difficult to parse because it's all just one big string.

An example is with the 'provide' workflow we use at https://github.com/openstack/tripleo-common/blob/cbe0db/workbooks/baremetal.yaml#L91 . It calls 'set_node_state' which then calls an Ironic action.

We want to stop returning a hard-coded string to the end-user (bug 1620949). At https://github.com/openstack/tripleo-common/blob/cbe0db/workbooks/baremetal.yaml#L114 , I replaced the message with <% task(set_nodes_available).result %> in order to give actual feedback to the user on what went wrong. This is what the 'result' for 'message' looks like by default, with 1 failure:

Failure caused by error in tasks: set_provision_state\n\n set_provision_state [task_ex_id=26e7aed9-09a2-4769-a1d6-2c41a40f5742] -> Failed to run action [action_ex_id=b715740f-0c68-4505-bc74-3bf37afd9007, action_cls=\'<class \'mistral.actions.action_factory.IronicAction\'>\', attributes=\'{u\'client_method_name\': u\'node.set_provision_state\'}\', params=\'{u\'state\': u\'provide\', u\'node_uuid\': u\'45dda8e9-f553-4301-ab0e-5c465a425136\', u\'configdrive\': None, u\'cleansteps\': None}\']\n IronicAction.node.set_provision_state failed: <class \'ironicclient.common.apiclient.exceptions.BadRequest\'>: The requested action "provide" can not be performed on node "45dda8e9-f553-4301-ab0e-5c465a425136" while it is in state "available".\n [action_ex_id=b715740f-0c68-4505-bc74-3bf37afd9007, idx=0]: Failed to run action [action_ex_id=b715740f-0c68-4505-bc74-3bf37afd9007, action_cls=\'<class \'mistral.actions.action_factory.IronicAction\'>\', attributes=\'{u\'client_method_name\': u\'node.set_provision_state\'}\', params=\'{u\'state\': u\'provide\', u\'node_uuid\': u\'45dda8e9-f553-4301-ab0e-5c465a425136\', u\'configdrive\': None, u\'cleansteps\': None}\']\n IronicAction.node.set_provision_state failed: <class \'ironicclient.common.apiclient.exceptions.BadRequest\'>: The requested action "provide" can not be performed on node "45dda8e9-f553-4301-ab0e-5c465a425136" while it is in state "available".\n'

While there's useful information in there for developers debugging actions, the end user only needs to know about the initial action that failed. Really, just this part would be even better: IronicAction.node.set_provision_state failed: <class \'ironicclient.common.apiclient.exceptions.BadRequest\'>: The requested action "provide" can not be performed on node "45dda8e9-f553-4301-ab0e-5c465a425136" while it is in state "available".

I would like to be able to extract information on the original error directly from the workflow, but I'm not sure if it's currently possible. (Perhaps if Mistral returned the errors as a list so it's possible to get to the initial action that caused the problem straight away?)
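As a stopgap on the client side, something like the following Python sketch could pull the first underlying action error out of the result string. It is not a Mistral API: the function name extract_root_error and the regular expression are hypothetical, and the pattern assumes the "<action> failed: <class '...'>: <message>" line format seen in the output above, which may well change between releases.

```python
import re

# Assumed format, taken from the example output in this report:
#   "<action> failed: <class '<exception path>'>: <message>"
# Only spaces/tabs are allowed before the action name so a match
# never spans more than one line.
ROOT_ERROR_RE = re.compile(
    r"^[ \t]*(?P<action>\S+) failed: "
    r"<class '(?P<exception>[^']+)'>: (?P<message>.*)$",
    re.MULTILINE,
)


def extract_root_error(result):
    """Return the first underlying action error in a task result string.

    Hypothetical helper: returns a dict with 'action', 'exception' and
    'message' keys, or None if no line matches the assumed format.
    """
    # The string may arrive with literal "\n" and "\'" escape sequences
    # (as pasted in this report) or already unescaped; handle both.
    text = result.replace("\\n", "\n").replace("\\'", "'")
    match = ROOT_ERROR_RE.search(text)
    if match is None:
        return None
    return {key: value.strip() for key, value in match.groupdict().items()}
```

Calling extract_root_error() on the message above should yield the IronicAction.node.set_provision_state action, the BadRequest exception path, and the human-readable message as separate fields. This only works around the problem by string matching; getting the errors back from Mistral in a structured form would still be preferable.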

It's entirely possible I'm misunderstanding how to work with workflows here. Is there a better way to give useful error feedback to the end-user in this kind of case?

Changed in mistral:
status: New → Triaged
importance: Undecided → Medium