Can't re-commission a commissioning node

Bug #1323291 reported by Raphaël Badin
Affects: MAAS
Status: Fix Released
Importance: High
Assigned to: Raphaël Badin

Bug Description

The problem might become less severe when the upcoming work on robustness is complete (because commissioning will eventually time out), but right now, if a node fails to commission or takes a really long time to commission, it's not possible to restart the commissioning process.

Julian Edwards (julian-edwards) wrote :

Fixing this won't be as simple as you might think. Bear in mind we might need to power-cycle, but that means knowing the power status, for which we have no knowledge at this time.

tags: added: node-lifecycle
Raphaël Badin (rvb) wrote :

> Fixing this won't be as simple as you might think. Bear in mind we might need to power-cycle, but that means knowing the
> power status, for which we have no knowledge at this time.

What you're talking about is a different problem: it's about monitoring the result of a power action and it's indeed part of the upcoming node-lifecycle work.

This particular bug is only about the nodes workflow and, more precisely, the ability for a user to put a node back into the "DECLARED" status when it's "COMMISSIONING".
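
The workflow change described here can be pictured as adding one entry to a table of allowed status transitions. The following is only a minimal sketch under stated assumptions: `NodeStatus`, `ALLOWED_TRANSITIONS`, `InvalidTransition` and `change_status` are hypothetical names made up for illustration, the READY status and its transitions are illustrative, and none of this is taken from the MAAS code base or from the actual fix.

```python
from enum import Enum


class NodeStatus(Enum):
    # Only DECLARED and COMMISSIONING are named in this bug;
    # READY is an illustrative extra status.
    DECLARED = "declared"
    COMMISSIONING = "commissioning"
    READY = "ready"


# Hypothetical transition table: each status maps to the set of
# statuses a user is allowed to move the node to.
ALLOWED_TRANSITIONS = {
    NodeStatus.DECLARED: {NodeStatus.COMMISSIONING},
    NodeStatus.COMMISSIONING: {
        NodeStatus.READY,
        # The transition this bug asks for: lets a user pull a
        # stuck node back so commissioning can be started again.
        NodeStatus.DECLARED,
    },
    NodeStatus.READY: {NodeStatus.COMMISSIONING},
}


class InvalidTransition(Exception):
    """Raised when a status change is not in the transition table."""


def change_status(current, target):
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current.name} -> {target.name}")
    return target
```

With such a table, `change_status(NodeStatus.COMMISSIONING, NodeStatus.DECLARED)` succeeds, whereas without the extra entry the only way out of a stuck COMMISSIONING node would be to edit the stored status by hand.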

Changed in maas:
milestone: none → 1.6.0
assignee: nobody → Raphaël Badin (rvb)
status: Triaged → Fix Committed
Julian Edwards (julian-edwards) wrote : Re: [Bug 1323291] Re: Can't re-commission a commissioning node

On Tuesday 27 May 2014 13:13:00 you wrote:
> > Fixing this won't be as simple as you might think. Bear in mind we might
> > need to power-cycle, but that means knowing the power status, for which
> > we have no knowledge at this time.
>
> What you're talking about is a different problem: it's about monitoring
> the result of a power action and it's indeed part of the upcoming node-
> lifecycle work.

No, actually, I'm not :)

If a machine is commissioning and it is stuck, MAAS has no knowledge of what
its power state is. It needs to know that so it can take the right corrective
action on the node.

In addition, if there's a problem with the power control itself, the node is
not salvageable and needs to be marked as such.

> This particular bug is only about the nodes workflow and, more
> precisely, the ability for a user to put a node back into the "DECLARED"
> status when it's "COMMISSIONING".

Like I said, you can change the status in MAAS but if you don't know what
state the hardware is in, it's meaningless.
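
The concern raised in this comment can be sketched as a decision that depends on the node's power state. This is a rough illustration only: `PowerState`, `PowerControlError`, `choose_corrective_action` and the action names are hypothetical, and the mapping from state to action is an assumption read out of the comment above, not something MAAS is known to implement.

```python
from enum import Enum


class PowerState(Enum):
    ON = "on"
    OFF = "off"


class PowerControlError(Exception):
    """Hypothetical error: the power controller could not be reached."""


def choose_corrective_action(query_power_state):
    """Pick an action for a stuck node based on its power state.

    `query_power_state` is a caller-supplied callable (an assumption
    of this sketch) that returns a PowerState or raises
    PowerControlError.
    """
    try:
        state = query_power_state()
    except PowerControlError:
        # Power control itself is faulty: the node is not salvageable
        # and should be marked as such.
        return "mark-broken"
    if state is PowerState.ON:
        # The machine is running but stuck: power-cycle it.
        return "power-cycle"
    # The machine is off: powering it on is enough.
    return "power-on"
```

The point of the sketch is that every branch depends on the result of the power query; a status change made without it leaves the hardware in whatever state it happened to be in.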

Raphaël Badin (rvb) wrote :

On 05/28/2014 01:51 AM, Julian Edwards wrote:
> On Tuesday 27 May 2014 13:13:00 you wrote:
>>> Fixing this won't be as simple as you might think. Bear in mind we might
>>> need to power-cycle, but that means knowing the power status, for which
>>> we have no knowledge at this time.
>>
>> What you're talking about is a different problem: it's about monitoring
>> the result of a power action and it's indeed part of the upcoming node-
>> lifecycle work.
>
> No, actually, I'm not :)
>
> If a machine is commissioning and it is stuck, MAAS has no knowledge of what
> its power state is. It needs to know that so it can take the right corrective
> action on the node.
>
> In addition, if there's a problem with the power control itself, the node is
> not salvageable and needs to be marked as such.

Very true. But again, this particular bug is about letting the *user*
mark the node as "declared" again, for the sake of getting out of a
situation where the only option would be to hack the DB manually (!).

>> This particular bug is only about the nodes workflow and, more
>> precisely, the ability for a user to put a node back into the "DECLARED"
>> status when it's "COMMISSIONING".
>
> Like I said, you can change the status in MAAS but if you don't know what
> state the hardware is in, it's meaningless.

It's not entirely meaningless for the reason mentioned above. Of
course, the situation as a whole and the maintainability of a MAAS
cluster in the face of failures will be greatly improved by the upcoming
robustness work.

Changed in maas:
status: Fix Committed → Fix Released