rack controller remained in degraded state

Bug #1781241 reported by Jason Hobbs
This bug affects 1 person
Affects    Status     Importance    Assigned to    Milestone
MAAS       Triaged    Medium        Unassigned

Bug Description

This is with 2.4.0-6981-g011e51b7a-0ubuntu1~18.04.1.

After an HA deployment, one rack controller (qtmb3d) remained in a degraded state (92% connected) for over 20 minutes, at which point the test timed out.

Jason Hobbs (jason-hobbs) wrote :
description: updated
Changed in maas:
assignee: nobody → Blake Rouse (blake-rouse)
Andres Rodriguez (andreserl) wrote :

Hi Jason,

Is this still an issue? Have you seen it again?

Changed in maas:
status: New → Incomplete
Andres Rodriguez (andreserl) wrote :

Also, even if degraded, MAAS should be able to continue to work. Are there any actual issues (e.g. machines failing to deploy or commission)?

Changed in maas:
milestone: none → 2.5.0alpha2
summary: - rack controller remained in degraded state
+ [2.4] rack controller remained in degraded state
Changed in maas:
milestone: 2.5.0alpha2 → 2.5.0beta1
Jason Hobbs (jason-hobbs) wrote : Re: [Bug 1781241] Re: [2.4] rack controller remained in degraded state

We haven't seen it in a while, but reporting 'degraded' is a bug
either way, even if we're not seeing any failures. We have to be able
to rely on MAAS to tell us whether it's healthy or not. If the flag
doesn't matter, then get rid of it.
On Thu, Aug 23, 2018 at 8:56 AM Andres Rodriguez
<email address hidden> wrote:
>
> ** Changed in: maas
> Milestone: 2.5.0alpha2 => 2.5.0beta1

Andres Rodriguez (andreserl) wrote : Re: [2.4] rack controller remained in degraded state

Being in a degraded state doesn't mean there's a *bug*. It means that any of a number of things could be wrong, bug or not. If there are no failures, that's good: despite being in a degraded state, MAAS is working just fine.

For example, MAAS will be in a degraded state if it expects 3 region controllers but one of them is down. This doesn't mean there's a bug; it means there's a potential problem and MAAS is not working at its full potential because of the missing region controller.
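
To make the distinction concrete, here is a minimal Python sketch. It is not MAAS code: the `ControllerLink` class, its fields, and the three state names are hypothetical, and it assumes health is derived only from how many of the expected region controller connections are currently established.

```python
from dataclasses import dataclass


@dataclass
class ControllerLink:
    """Hypothetical view of one controller's connections (illustration only)."""

    expected: int   # connections the controller should hold
    connected: int  # connections currently established

    def percent_connected(self) -> int:
        """Share of expected connections that are up, rounded down."""
        if self.expected == 0:
            return 0
        return (100 * self.connected) // self.expected

    def state(self) -> str:
        """Classify the controller; 'degraded' means partially connected."""
        if self.connected == 0:
            return "down"
        if self.connected < self.expected:
            return "degraded"
        return "healthy"


# Three region controllers expected, one of them down.
link = ControllerLink(expected=3, connected=2)
print(link.state(), link.percent_connected())
```

Under these assumptions, a degraded state signals reduced capacity rather than a failure, and it clears only once every expected connection is re-established.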

Changed in maas:
milestone: 2.5.0beta1 → 2.5.0beta2
Jason Hobbs (jason-hobbs) wrote : Re: [Bug 1781241] Re: [2.4] rack controller remained in degraded state

Sure - it means there is something we should investigate. We should never
be in a degraded state on a fresh deploy in our test lab.

On Wed, Sep 5, 2018 at 4:21 PM Andres Rodriguez <email address hidden>
wrote:

> Being in degraded state doesn't mean there's a *bug*. Being degraded
> means that there are a lot of things that could be wrong, bug or not. If
> there are no failures, that's good, because despite MAAS being in a
> degraded state, MAAS is working just fine.
>
> e.g. a MAAS will be in degraded state if it is expecting 3 region
> controllers, but one of them is down. This doesn't mean there's a bug,
> this means there's a potential problem and MAAS is not working in its
> full potential because of a missing region.
>
> ** Changed in: maas
> Milestone: 2.5.0beta1 => 2.5.0beta2

Changed in maas:
milestone: 2.5.0beta2 → 2.5.0rc1
Changed in maas:
milestone: 2.5.0rc1 → 2.5.x
Adam Collard (adam-collard) wrote : Re: [2.4] rack controller remained in degraded state

This bug has not seen any activity in the last 6 months, so it is being automatically closed.

If you are still experiencing this issue, please feel free to re-open.

MAAS Team

Changed in maas:
status: Incomplete → Expired
Alexander Balderson (asbalderson) wrote :

We saw this again on maas_2.7.0-8232-g.6e1dba4ab-0ubuntu1~18.04.1

Alexander Balderson (asbalderson) wrote :
Changed in maas:
status: Expired → New
Changed in maas:
assignee: Blake Rouse (blake-rouse) → nobody
milestone: 2.5.x → none
summary: - [2.4] rack controller remained in degraded state
+ rack controller remained in degraded state
Michael Skalka (mskalka) wrote :

Saw this again here: https://solutions.qa.canonical.com/#/qa/testRun/d044f76b-a0be-48f3-9bc0-f9502541323b

From the logs, DNS on the first MAAS node went down and the node never recovered. The link to the artifacts from the deployment is at the bottom.

Michael Skalka (mskalka) wrote :
Joshua Genet (genet022) wrote :

Saw this again here, with the same issues as Michael's run above: https://solutions.qa.canonical.com/#/qa/testRun/c2bd543c-afd2-4212-87d2-afd89cac8bd0

Dougal Matthews (d0ugal)
Changed in maas:
status: New → Triaged
importance: Undecided → Medium
Konstantinos Kaskavelis (kaskavel) wrote :

Closing this due to inactivity (low number of occurrences, and no hits for more than one year).

Changed in maas:
status: Triaged → Invalid
tags: added: solutions-qa-expired
Marian Gasparovic (marosg) wrote :

We hit it again today. One of the controllers logs many messages about being unable to reach the PostgreSQL VIP, and its regiond is down. That controller is shown as down and two others as degraded.

Running 3.2/stable snap (3.2.6-12016-g.19812b4da) on jammy

https://oil-jenkins.canonical.com/artifacts/72c89cde-5289-46eb-9600-ec5c338d30ca/generated/generated/maas/logs-2023-01-11-07.49.25.tgz

Changed in maas:
status: Invalid → New
Changed in maas:
status: New → Triaged
milestone: none → 3.4.0
Alberto Donato (ack)
Changed in maas:
milestone: 3.4.0 → 3.4.x