MAAS with RBAC fails random commands with: Expecting value: line 1 column 1 (char 0)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Triaged | Medium | Unassigned |
3.3 | Triaged | Medium | Unassigned |
3.4 | Triaged | Medium | Unassigned |

Bug Description
I have a MAAS deployment set up with RBAC configauth. This configuration is unstable: arbitrary commands from both MAAS and the Juju MAAS controller end with `500 Internal Server Error (Expecting value: line 1 column 1 (char 0))`, e.g.:
```
2022-06-16-12:41:08 root ERROR [localhost] Command failed: juju bootstrap --bootstrap-
2022-06-16-12:41:08 root ERROR [localhost] STDOUT follows:
2022-06-16-12:41:08 root ERROR [localhost] STDERR follows:
Creating Juju controller "foundations-maas" on maas_cloud
Looking for packaged Juju agent version 2.9.31 for amd64
Located Juju agent version 2.9.31-ubuntu-amd64 at https:/
Launching controller instance(s) on maas_cloud...
ERROR failed to bootstrap model: cannot start bootstrap instance: unexpected: ServerError: 500 Internal Server Error (Expecting value: line 1 column 1 (char 0))
```
In this particular command I am trying to bootstrap the Juju MAAS controller. It fails most of the time, at different points in the bootstrapping process, with the same error. I did manage to bootstrap successfully once, but Juju then failed again with this error when deploying a bundle.
Another example (on the MAAS host):
```
ubuntu@infra1$ maas root vm-hosts read type=virsh --debug
500 Internal Server Error
Content-Length: 41
Content-Type: text/plain; charset=utf-8
Date: Thu, 16 Jun 2022 13:17:23 GMT
Server: TwistedWeb/18.9.0
Status: 500
Vary: Cookie
X-Frame-Options: SAMEORIGIN
Expecting value: line 1 column 1 (char 0)
```
And the same on a remote machine:
```
ubuntu@
500 Internal Server Error
Content-Length: 41
Content-Type: text/plain; charset=utf-8
Date: Thu, 16 Jun 2022 13:14:40 GMT
Server: TwistedWeb/18.9.0
Status: 500
Vary: Cookie
X-Frame-Options: SAMEORIGIN
Expecting value: line 1 column 1 (char 0)
```
These errors happen in only about 1 of every 10 commands.
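To put a number on the intermittent failures, a small loop like the following could be used. This is only a sketch: `failure_rate` is a hypothetical helper, and it assumes a failed call can be recognised either by a non-zero exit code or by the text `500 Internal Server Error` in the command output.

```python
import subprocess

# Marker taken from the CLI output quoted in this report.
ERROR_MARKER = "500 Internal Server Error"


def failure_rate(command, runs=50):
    """Run `command` `runs` times and return the fraction of failed runs.

    A run counts as failed if the process exits non-zero (an assumption
    about how the CLI reports errors) or if its output contains the marker.
    """
    failures = 0
    for _ in range(runs):
        result = subprocess.run(command, capture_output=True, text=True)
        output = result.stdout + result.stderr
        if result.returncode != 0 or ERROR_MARKER in output:
            failures += 1
    return failures / runs
```

For example, `failure_rate(["maas", "root", "vm-hosts", "read", "type=virsh"], runs=100)` would repeat the command shown above and report how often it fails.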
The error message suggests that MAAS is getting an empty response to a request. I assume the response comes from the RBAC server, because we do not have these problems when local authentication is used.
In the MAAS log:
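For reference, this is the message Python's standard `json` module produces when asked to decode an empty string. The sketch below (`describe_decode_failure` is a hypothetical helper, not MAAS code) also shows that a non-JSON body such as an HTML error page gives the identical message, so an empty body is one possible cause but not the only one.

```python
import json


def describe_decode_failure(body: str) -> str:
    """Return the JSONDecodeError message for `body`, or "ok" if it parses."""
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        return str(exc)
    return "ok"


# An empty body and an HTML body fail at the very first character,
# so both yield the same "line 1 column 1 (char 0)" message.
print(describe_decode_failure(""))
print(describe_decode_failure("<html>Bad Gateway</html>"))
print(describe_decode_failure('{"resources": []}'))
```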
```
regiond.
regiond.
regiond.
regiond.
regiond.
regiond.
regiond.
regiond.
regiond.
regiond.log- File "/snap/
regiond.log- response = wrapped_
regiond.log- File "/snap/
regiond.log- return view_atomic(*args, **kwargs)
regiond.log- File "/usr/lib/
regiond.log- return func(*args, **kwds)
regiond.log- File "/snap/
regiond.log- response = super()
regiond.log- File "/snap/
regiond.log- response = func(*args, **kwargs)
regiond.log- File "/snap/
regiond.log: result = self.error_
regiond.log- File "/snap/
regiond.log- result = meth(request, *args, **kwargs)
regiond.log- File "/snap/
regiond.log- return function(self, request, *args, **kwargs)
regiond.log- File "/snap/
regiond.log- return Pod.objects.
regiond.log- File "/snap/
regiond.log- fetched = rbac.get_
regiond.log- File "/snap/
regiond.log- results = self._get_
regiond.log- File "/snap/
regiond.log- fetched = self.client.
regiond.log- File "/snap/
regiond.log- result = self._request(
regiond.log- File "/snap/
regiond.log- content = resp.json()
regiond.log- File "/snap/
regiond.log- return complexjson.
regiond.log- File "/snap/
regiond.log- return _default_
regiond.log- File "/snap/
regiond.log- obj, end = self.raw_decode(s)
regiond.log- File "/snap/
regiond.log- return self.scan_once(s, idx=_w(s, idx).end())
regiond.
```
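The traceback ends in a bare JSON decode of the upstream response (`content = resp.json()`), which is why the 500 carries no hint about what the upstream server actually sent. As an illustration only, a client could wrap the decode so that the HTTP status and the start of the body end up in the error. The names `UpstreamResponseError` and `decode_json_body` are hypothetical and are not part of MAAS or `requests`.

```python
import json


class UpstreamResponseError(Exception):
    """Hypothetical error type describing a non-JSON upstream reply."""


def decode_json_body(status_code: int, body: str, url: str):
    """Decode `body` as JSON, or raise an error that says what was received."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        # Keep only the start of the body so the log line stays short.
        snippet = body[:200] if body else "<empty body>"
        raise UpstreamResponseError(
            f"non-JSON reply from {url} (HTTP {status_code}): {snippet!r}"
        ) from exc
```

With something like this in place, the log would show whether the upstream reply was empty, an HTML error page, or something else, which would help confirm or rule out the RBAC server as the source.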
On the rbac server, I am seeing some database errors:
```
postgresql/
postgresql/
```
But the occurrences of these database errors do not seem to correlate with the ServerErrors.
The MAAS snap logs are in the attachments.
Changed in maas:
status: New → Triaged
importance: Undecided → Medium
Changed in maas:
milestone: none → 3.3.0
Changed in maas:
milestone: 3.3.0 → 3.4.0
Changed in maas:
milestone: 3.4.0 → 3.4.x
Changed in maas:
milestone: 3.4.x → 3.5.0
Could you add RBAC logs to the issue?