power_driver parameter is not preserved

Bug #1958451 reported by Marian Gasparovic
This bug affects 1 person
Affects   Status         Importance   Assigned to           Milestone
MAAS      Fix Released   High         Alexsander de Souza
3.2       Fix Released   High         Alexsander de Souza
3.3       Triaged        Medium       Unassigned

Bug Description

We received new servers which don't work correctly with the IPMI LAN_2 power driver: MAAS complains that no rack can access the server BMC.

When I enlist the node with "power_parameters_power_driver=LAN" (IPMI 1.5), the machine is commissioned but ends up with a red power icon and a power error. Subsequently it fails deployment with an error about a wrong cipher suite.

I realized that after the machine is enlisted, its power driver is changed to LAN_2. When I tried again, I saw it was correct at the early stage of commissioning and incorrect during the testing phase.

When I manually change the power driver to LAN (IPMI 1.5), I can deploy the machine. After the machine is released, the parameter is still correct.

MAAS is snap 3.1/stable

Alberto Donato (ack) wrote :

Could you please paste the output of the 30-maas-01-bmc-config commissioning script for the machine?

Alberto Donato (ack)
Changed in maas:
status: New → Incomplete
Marian Gasparovic (marosg) wrote :

modprobe: ERROR: could not insert 'ipmi_si': No such device
ERROR: Failed to commit `User3:Password'
INFO: Loading IPMI kernel modules...
INFO: Checking for HP Moonshot...
INFO: Checking for IPMI...
INFO: IPMI detected!
INFO: Reading current IPMI BMC values...
INFO: Configuring IPMI Lan_Channel...
INFO: Configuring IPMI Lan_Channel_Auth...
INFO: Lan_Channel_Auth settings unavailable!
INFO: Configuring IPMI cipher suite ids...
INFO: Gathering supported cipher suites and current configuration...
INFO: BMC supports the following ciphers - [1, 2, 3, 6, 7, 8, 11, 12, 15, 16, 17]
INFO: Current cipher suite configuration - XXXXXXXXXXXXXXX
INFO: New cipher suite configuration - XXXXXXXXXXXXXXX
INFO: MAAS will use IPMI cipher suite id "17" for BMC communication
WARNING: No K_g BMC key found or configured, communication with BMC will not use a session key!
INFO: Configuring IPMI Serial_Channel...
INFO: Serial_Channel settings unavailable!
INFO: Configuring IPMI SOL_Conf...
INFO: Found existing IPMI user "maas"!
INFO: Configuring IPMI BMC user "maas"...
INFO: IPMI user number - User3
INFO: IPMI user privilege level - Administrator
INFO: IPMI Version - LAN_2_0
INFO: IPMI boot type - efi

Changed in maas:
status: Incomplete → New
Alberto Donato (ack) wrote :

During commissioning, MAAS detects that IPMI 2.0 is available and thus uses that.

From the previous output, no cipher suite is enabled, so MAAS falls back to 17 but can't really communicate with the BMC.

Also note that you can use the `skip_bmc_config` flag when adding the machine from the API to prevent MAAS from configuring the BMC (thus overriding the provided config).

Changed in maas:
status: New → Invalid
Changed in maas:
status: Invalid → Triaged
importance: Undecided → High
Adam Collard (adam-collard) wrote :

Apparently `skip_bmc_config` has no effect - let's ensure that MAAS users who specify IPMI details get them persisted after commissioning

Changed in maas:
assignee: nobody → Alexsander de Souza (alexsander-souza)
status: Triaged → In Progress
Alexsander de Souza (alexsander-souza) wrote :

We found two issues while debugging this:

1) the user forces an IPMI version

maas root machines create hostname=test-13 \
    zone=zone1 architecture=arm64/generic \
    mac_addresses=XX:XX:XX:XX:XX:XX \
    power_type=ipmi \
    power_parameters_power_address=XXX.XXX.XXX.XXX \
    power_parameters_power_user=admin \
    power_parameters_power_pass=admin \
    power_parameters_power_driver=LAN \
    power_parameters_power_boot_type=efi

The observed behaviour is that the machine powers up, commissioning succeeds, and then MAAS fails to power it off. What actually happens: MAAS powers up the machine with the user-supplied power parameters, then runs 30-maas-01-bmc-config with the default parameter values (maas_auto_ipmi_user and maas_auto_ipmi_user_password). The script overwrites the user's values with the detected ones, changing the power driver to LAN_2. The power-off then fails because the new values are invalid.

The fix is to change the script to use existing power parameters if present.
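
To illustrate the idea of that fix, here is a minimal Python sketch of merging the two sets of values so that anything the user supplied takes precedence over what the script detected. This is not the actual MAAS change; the function name `merge_power_parameters` and the dictionary layout are assumptions made for illustration.

```python
def merge_power_parameters(existing, detected):
    """Combine power parameters, preferring values the user already supplied.

    Hypothetical helper: `existing` holds the user-supplied parameters,
    `detected` holds what the BMC configuration script found.
    """
    merged = dict(detected)
    # Only non-empty user values override; missing ones fall back to detection.
    merged.update(
        {key: value for key, value in existing.items() if value not in (None, "")}
    )
    return merged


user_supplied = {"power_driver": "LAN", "power_user": "admin"}
detected = {"power_driver": "LAN_2_0", "power_user": "maas", "power_boot_type": "efi"}
print(merge_power_parameters(user_supplied, detected))
```

With these inputs the driver stays at LAN and the user stays at admin, while the detected boot type is still filled in.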

2) the user skips BMC configuration by adding skip_bmc_config=1 to the command line

In some scenarios MAAS runs the script even when it was not requested. I'm still looking for what triggers this bug. The observed behaviour is that MAAS crashes when the script reports back its results, which is unexpected because the script status is skipped:

2022-02-04 16:13:35 maasserver: [error] ################################ Exception: ################################
2022-02-04 16:13:35 maasserver: [error] Traceback (most recent call last):
  File "/snap/maas/18203/usr/lib/python3/dist-packages/django/core/handlers/base.py", line 113, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/snap/maas/18203/lib/python3.8/site-packages/maasserver/utils/views.py", line 284, in view_atomic_with_post_commit_savepoint
    return view_atomic(*args, **kwargs)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/snap/maas/18203/lib/python3.8/site-packages/maasserver/api/support.py", line 56, in __call__
    response = super().__call__(request, *args, **kwargs)
  File "/snap/maas/18203/usr/lib/python3/dist-packages/django/views/decorators/vary.py", line 20, in inner_func
    response = func(*args, **kwargs)
  File "/snap/maas/18203/usr/lib/python3.8/dist-packages/piston3/resource.py", line 197, in __call__
    result = self.error_handler(e, request, meth, em_format)
  File "/snap/maas/18203/usr/lib/python3.8/dist-packages/piston3/resource.py", line 195, in __call__
    result = meth(request, *args, **kwargs)
  File "/snap/maas/18203/lib/python3.8/site-packages/maasserver/api/support.py", line 308, in dispatch
    return function(self, request, *args, **kwargs)
  File "/snap/maas/18203/lib/python3.8/site-packages/metadataserver/api.py", line 817, in signal
    target_status = process(node, request, status)
  File "/snap/maas/18203/lib/python3.8/site-packages/metadataserver/api.py", line 641, in _process_commissioning
    self._store_results(
  File "/snap/maas/18203/lib/python3.8/site-packages/metadataserver/api.py", line 529, in _store_results
    script_result.store_result(
  File "/snap...


Changed in maas:
milestone: none → next
Changed in maas:
milestone: next → 3.2.0
Alexsander de Souza (alexsander-souza) wrote :

In the upcoming release (3.2) we intend to help the user detect the original problem, which was all ciphers being disabled, either by the default BMC configuration or by a previous MAAS release.
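
Until then, the condition can be spotted by hand in the commissioning script output pasted earlier in this thread. The following Python sketch checks the "Current cipher suite configuration" line for that case. It assumes that an "X" in the configuration string marks a disabled cipher suite, which matches how the output was read above but is not confirmed here; the function name is hypothetical and this is not how MAAS itself performs the check.

```python
import re

# Assumption: each character describes one cipher suite and "X" means disabled.
_CONFIG_LINE = re.compile(r"Current cipher suite configuration - (\S+)")


def all_ciphers_disabled(log_text):
    """Return True if every cipher suite looks disabled, None if no line is found."""
    match = _CONFIG_LINE.search(log_text)
    if match is None:
        return None
    return set(match.group(1)) == {"X"}


sample = "INFO: Current cipher suite configuration - XXXXXXXXXXXXXXX"
print(all_ciphers_disabled(sample))
```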

While debugging this we came across two MAAS limitations that we aim to address in future releases:

1) Power parameters detected by enlistment/commissioning scripts override values supplied by the user. Whether this is a bug or a feature is up for debate, and there's a workaround: use a custom script to patch the file BMC_CONFIG_PATH with the desired values before it is uploaded.

2) The user cannot skip BMC configuration during enlistment. The target host uses an anonymous endpoint to download the scripts that run before enlistment, so MAAS is unaware of the host id and cannot instruct it to skip any script. A possible fix is to split the BMC handling into two scripts, one to detect the current power parameters and another to configure them. The former always runs during enlistment and the latter runs during commissioning, when the host id is already known.
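
The proposed split can be sketched roughly as follows in Python. The script names `bmc-detect` and `bmc-configure`, the phase strings, and the function itself are hypothetical and only illustrate the idea; they do not correspond to any existing MAAS scripts or API.

```python
DETECT_SCRIPT = "bmc-detect"        # hypothetical: only reads current power parameters
CONFIGURE_SCRIPT = "bmc-configure"  # hypothetical: changes the BMC configuration


def bmc_scripts_for(phase, skip_bmc_config=False):
    """Pick which BMC script would run in a given phase under the proposed split."""
    if phase == "enlistment":
        # The host is still anonymous here, so only the read-only detection runs.
        return [DETECT_SCRIPT]
    if phase == "commissioning":
        # The host id is known by now, so a per-machine skip flag can be honoured.
        return [] if skip_bmc_config else [CONFIGURE_SCRIPT]
    raise ValueError(f"unknown phase: {phase!r}")


print(bmc_scripts_for("enlistment"))
print(bmc_scripts_for("commissioning", skip_bmc_config=True))
```

The point of the design is that the step which modifies the BMC only runs once MAAS can identify the machine and therefore knows whether the user asked to skip it.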

Changed in maas:
milestone: 3.2.0 → 3.2.0-beta5
status: Fix Committed → Fix Released
no longer affects: maas/trunk
Changed in maas:
status: Fix Released → Fix Committed
Alberto Donato (ack)
Changed in maas:
milestone: 3.4.0 → 3.4.0-beta1
Alberto Donato (ack)
Changed in maas:
status: Fix Committed → Fix Released