failure to parse valid JSON commissioning data
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Triaged | High | Unassigned |
3.2 | Triaged | High | Unassigned |
3.3 | Triaged | High | Unassigned |
3.4 | Triaged | High | Unassigned |
Bug Description
MAAS 3.2.10, upgraded stepwise from 2.9.3
Customer reports that the rackd machines go through a recommissioning step after the upgrade. During this recommissioning, something is not detected correctly from the JSON gathered by 20-maas-
File "/usr/lib/
if parent_
builtins.
This rackd server's topology had not changed and was working in the prior MAAS version, so the cause of the error is unclear. The workaround was to completely remove the controller and redeploy it with the same fabric/vlan/subnet assignments.
Ideally, MAAS should never fail on commissioning unless there is a true hardware error or environmental condition that prevents the ability to further deploy the machine. In cases where a detected network topology does not match the database, the result should be the creation of extra fabrics/
This could relate to the VLAN-subnet mapping changes in LP 2031482, which this customer is also experiencing.
Further troubleshooting information is available in the support case, which I'm happy to pass along out of band.
Changed in maas:
importance: Undecided → High
milestone: none → 3.2.x
Just adding some more information to this issue. The problem occurs when running this code in /usr/lib/python3/dist-packages/metadataserver/builtin_scripts/network.py:
    def update_vlan_interface(node, name, network, links):
        """Update a VLAN interface.

        :param name: Name of the interface.
        :param network: Network settings from commissioning data.
        """
        vid = network["vlan"]["vid"]
        parent_name = network["vlan"]["lower_device"]
        parent_nic = Interface.objects.get(
            node_config=node.current_config, name=parent_name
        )
        links_vlan = get_interface_vlan_from_links(node, links)
        if links_vlan:
            vlan = links_vlan
            if parent_nic.vlan.fabric_id != vlan.fabric_id:  # <---- fails here
That if condition only determines whether an error message is shown; it should not make commissioning fail. The customer solved the issue by patching this code and removing that if line.
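For illustration only, a None-safe version of that comparison could look like the sketch below. It uses stand-in dataclasses rather than the real MAAS models, and the helper name fabric_mismatch is hypothetical. This is not the upstream fix, just one way the check could avoid raising when the parent interface has no VLAN.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vlan:
    # Stand-in for the MAAS VLAN model; only the field used here.
    fabric_id: int


@dataclass
class Nic:
    # Stand-in for the MAAS Interface model; vlan may be unset (None).
    name: str
    vlan: Optional[Vlan] = None


def fabric_mismatch(parent_nic: Nic, vlan: Vlan) -> bool:
    """Return True only when the parent has a VLAN on a different fabric.

    Hypothetical helper: a parent without a VLAN is treated as "nothing to
    compare" instead of raising AttributeError on None.
    """
    if parent_nic.vlan is None:
        return False
    return parent_nic.vlan.fabric_id != vlan.fabric_id


# A parent with no VLAN no longer raises; a real mismatch is still detected.
print(fabric_mismatch(Nic("bond0"), Vlan(fabric_id=1)))
print(fabric_mismatch(Nic("eno1", Vlan(fabric_id=2)), Vlan(fabric_id=1)))
```

Treating a missing parent VLAN as "no mismatch" keeps the check purely informational, which matches the observation that it only controls whether a message is shown.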
Looking at the JSON file, there are four interfaces with VLANs: two with lower_device eno1 and two with lower_device bond0. Both of those parent devices have a null vlan. That is my hypothesis as to why parent_nic.vlan can be None, which makes the parent_nic.vlan.fabric_id lookup fail.
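To check a commissioning result for this pattern, something like the following sketch could list the VLAN interfaces whose parent device has a null vlan. The layout is an assumption: a top-level "networks" mapping keyed by interface name, inferred from the network["vlan"]["lower_device"] access in the code above. The sample data is made up and the function name is hypothetical.

```python
import json


def vlans_with_unset_parent(data):
    """List (vlan_interface, lower_device) pairs whose parent has no vlan.

    Assumes data["networks"] maps interface names to dicts with a "vlan"
    key that is either null or {"vid": ..., "lower_device": ...}.
    """
    networks = data.get("networks", {})
    hits = []
    for name, net in networks.items():
        vlan = net.get("vlan")
        if not vlan:
            continue  # not a VLAN interface
        parent = networks.get(vlan["lower_device"])
        if parent is None or parent.get("vlan") is None:
            hits.append((name, vlan["lower_device"]))
    return sorted(hits)


# Made-up sample mirroring the shape described in the comment above.
SAMPLE = json.loads("""
{
  "networks": {
    "eno1": {"vlan": null},
    "bond0": {"vlan": null},
    "eno1.100": {"vlan": {"vid": 100, "lower_device": "eno1"}},
    "bond0.200": {"vlan": {"vid": 200, "lower_device": "bond0"}}
  }
}
""")

for vlan_if, parent in vlans_with_unset_parent(SAMPLE):
    print(f"{vlan_if}: parent {parent} has no vlan")
```

Note that the failing attribute is read from the database object returned by Interface.objects.get, not from the JSON, so a scan like this only surfaces candidates for closer inspection.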