MAAS CLI command fails with nonce already used error

Bug #2033532 reported by Moises Emilio Benzan Mora
This bug affects 1 person
Affects  Status   Importance  Assigned to  Milestone
MAAS     Invalid  High        Unassigned
3.6      Invalid  High        Unassigned

Bug Description

Similar to bug 1704489: during a test run using MAAS 3.3.4-13189-g.f88272d1e, we got the following error while executing a MAAS CLI command:

2023-08-29-20:05:53 root DEBUG [localhost]: maas root machines read
2023-08-29-20:06:01 root ERROR [localhost] Command failed: maas root machines read
2023-08-29-20:06:01 root ERROR 1[localhost] STDOUT follows:
Authorization Error: 'Nonce already used: 124824781700065151201693339557'
2023-08-29-20:06:01 root ERROR 2[localhost] STDERR follows:
b''

Test run: https://solutions.qa.canonical.com/testruns/ba136e8c-6ad1-4f53-9e8c-098e15a47da6/
Artifacts: https://oil-jenkins.canonical.com/artifacts/ba136e8c-6ad1-4f53-9e8c-098e15a47da6/index.html

description: updated
Moises Emilio Benzan Mora (moisesbenzan) wrote :

Future occurrences can be found at: https://solutions.qa.canonical.com/bugs/2033532

Changed in maas:
status: New → Triaged
importance: Undecided → Medium
milestone: none → 3.3.x
Changed in maas:
milestone: 3.3.x → 3.5.x
summary: - [3.3.X] MAAS CLI command fails with nonce already used error
+ MAAS CLI command fails with nonce already used error
tags: added: bug-council
Thorsten Merten (thorsten-merten) wrote :

Bumping to high as we have an environment to debug it now.
[Jacopo] We can check the DB for when this nonce has been used before.

Jerzy Husakowski (jhusakowski) wrote :

Current status & what we see happening:
MAAS client issues a request to MAAS API with a unique nonce. This request appears to be stuck for 52 seconds, after which it drops with HTTP 419 (page expired). MAAS client appears to retry the connection using cached parameters, which leads to the "nonce already used" error.
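The failure mode described above can be illustrated with a minimal, self-contained sketch. It is not MAAS code: `NonceRegistry`, `make_oauth_params`, `send`, and `replay_cached` are hypothetical names, and the in-memory set stands in for whatever store the server actually uses. It assumes the first request reached the server (so its nonce was recorded) even though the client never saw a successful response, and that the retry resent the same OAuth parameters unchanged.

```python
import secrets
import time


class NonceRegistry:
    """Toy server-side replay guard: each (consumer, token, nonce) is accepted once."""

    def __init__(self):
        self._seen = set()

    def check(self, consumer_key, token_key, nonce):
        key = (consumer_key, token_key, nonce)
        if key in self._seen:
            raise PermissionError(f"Nonce already used: {nonce}")
        self._seen.add(key)


def make_oauth_params(consumer_key, token_key):
    """Build per-request parameters with a fresh random nonce and a timestamp."""
    return {
        "oauth_consumer_key": consumer_key,
        "oauth_token": token_key,
        "oauth_nonce": secrets.token_hex(16),
        "oauth_timestamp": str(int(time.time())),
    }


def send(registry, params):
    """Hypothetical server entry point: validate the nonce, then answer."""
    registry.check(
        params["oauth_consumer_key"],
        params["oauth_token"],
        params["oauth_nonce"],
    )
    return "200 OK"


def replay_cached(registry, params):
    """Resend the exact same parameters, as a retry with cached values would."""
    try:
        return send(registry, params)
    except PermissionError as exc:
        return f"Authorization Error: {exc}"


if __name__ == "__main__":
    registry = NonceRegistry()
    params = make_oauth_params("consumer", "token")
    print(send(registry, params))           # first delivery is accepted
    print(replay_cached(registry, params))  # identical retry is rejected
    print(send(registry, make_oauth_params("consumer", "token")))  # fresh nonce is accepted
```

In this sketch a retry only succeeds when the request is re-signed with a new nonce, which is why a replay of cached parameters is rejected even though the credentials are valid.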

There is nothing with a 52 second timeout in MAAS that we know of, so we suspect there is an intermediate party (caching proxy?) that is involved. We requested a tcpdump from the client side capturing the traffic when this issue is reproducible.

It appears that a restart of MAAS in TOR3 reduced the probability of occurrence of this issue.

Jerzy Husakowski (jhusakowski) wrote :

The root cause has been traced to the unintentional use of an internal proxy, which introduced a delay and a retry with an already-used nonce. The occurrence of this issue in the test lab has dropped off significantly (to zero, as far as we know) since the clients were pointed to the appropriate MAAS endpoints instead of the internal proxy. Closing this for now based on this investigation. If the issue recurs for whatever reason, we will reopen the investigation.

no longer affects: maas/3.3
no longer affects: maas/3.4
no longer affects: maas/3.5
tags: removed: bug-council