Maas with RBAC fails random commands with: Expecting value: line 1 column 1 (char 0)

Bug #1978957 reported by Bas de Bruijne
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MAAS
Triaged
Medium
Unassigned
3.3
Triaged
Medium
Unassigned
3.4
Triaged
Medium
Unassigned

Bug Description

I have a Maas deployment setup with RBAC configauth. I'm having some problems with the stability of this configuration, arbitrary commands from both Maas and the juju maas controller end with `500 Internal Server Error (Expecting value: line 1 column 1 (char 0))`, e.g.:
```
2022-06-16-12:41:08 root ERROR [localhost] Command failed: juju bootstrap --bootstrap-constraints 'arch=amd64 tags=juju ' --config bootstrap-timeout=1800 --model-default /home/ubuntu/project/generated/model_defaults.yaml maas_cloud foundations-maas
2022-06-16-12:41:08 root ERROR [localhost] STDOUT follows:

2022-06-16-12:41:08 root ERROR [localhost] STDERR follows:
Creating Juju controller "foundations-maas" on maas_cloud
Looking for packaged Juju agent version 2.9.31 for amd64
Located Juju agent version 2.9.31-ubuntu-amd64 at https://streams.canonical.com/juju/tools/agent/2.9.31/juju-2.9.31-linux-amd64.tgz
Launching controller instance(s) on maas_cloud...
ERROR failed to bootstrap model: cannot start bootstrap instance: unexpected: ServerError: 500 Internal Server Error (Expecting value: line 1 column 1 (char 0))
```

In this particular command I am trying to bootstrap the juju maas controller. It fails most of the time on different places in the bootstrapping process with the same error. I did manage to successfully bootstrap once, but then juju failed again on this error when deploying a bundle.

Another example (on the maas host):
```
ubuntu@infra1$ maas root vm-hosts read type=virsh --debug

500 Internal Server Error

   Content-Length: 41
     Content-Type: text/plain; charset=utf-8
             Date: Thu, 16 Jun 2022 13:17:23 GMT
           Server: TwistedWeb/18.9.0
           Status: 500
             Vary: Cookie
  X-Frame-Options: SAMEORIGIN

Expecting value: line 1 column 1 (char 0)
```
And the same on a remote machine:
```
ubuntu@lab3-rufflet$ maas root vm-hosts read type=virsh --debug
500 Internal Server Error

   Content-Length: 41
     Content-Type: text/plain; charset=utf-8
             Date: Thu, 16 Jun 2022 13:14:40 GMT
           Server: TwistedWeb/18.9.0
           Status: 500
             Vary: Cookie
  X-Frame-Options: SAMEORIGIN

Expecting value: line 1 column 1 (char 0)
```
These errors happen only 1 in every ~10 commands.

The error message sounds like Maas is getting an empty response on a requests, I assume the response comes from the RBAC server because we do not have these problems when local authentication is used.

In the maas log:
```
regiond.log-2022-06-16 13:15:40 regiond: [info] 10.244.40.1 GET /MAAS/api/2.0/vm-hosts/?type=virsh HTTP/1.1 --> 500 INTERNAL_SERVER_ERROR (referrer: -; agent: Python-httplib2/0.14.0 (gzip))
regiond.log-2022-06-16 13:15:43 regiond: [info] 10.244.40.1 GET /MAAS/api/2.0/vm-hosts/?type=virsh HTTP/1.1 --> 200 OK (referrer: -; agent: Python-httplib2/0.14.0 (gzip))
regiond.log-2022-06-16 13:16:07 regiond: [info] 10.244.40.30 GET /MAAS/rpc/ HTTP/1.1 --> 200 OK (referrer: -; agent: provisioningserver.rpc.clusterservice.ClusterClientService)
regiond.log-2022-06-16 13:17:07 regiond: [info] 10.244.40.30 GET /MAAS/rpc/ HTTP/1.1 --> 200 OK (referrer: -; agent: provisioningserver.rpc.clusterservice.ClusterClientService)
regiond.log-2022-06-16 13:17:14 regiond: [info] 10.244.40.1 GET /MAAS/api/2.0/vm-hosts/?type=virsh HTTP/1.1 --> 200 OK (referrer: -; agent: Python-httplib2/0.14.0 (gzip))
regiond.log-2022-06-16 13:17:18 regiond: [info] 10.244.40.1 GET /MAAS/api/2.0/vm-hosts/?type=virsh HTTP/1.1 --> 200 OK (referrer: -; agent: Python-httplib2/0.14.0 (gzip))
regiond.log-2022-06-16 13:17:20 regiond: [info] 10.244.40.1 GET /MAAS/api/2.0/vm-hosts/?type=virsh HTTP/1.1 --> 200 OK (referrer: -; agent: Python-httplib2/0.14.0 (gzip))
regiond.log:2022-06-16 13:18:23 maasserver: [error] ################################ Exception: Expecting value: line 1 column 1 (char 0) ################################
regiond.log:2022-06-16 13:18:23 maasserver: [error] Traceback (most recent call last):
regiond.log- File "/snap/maas/19835/usr/lib/python3/dist-packages/django/core/handlers/base.py", line 113, in _get_response
regiond.log- response = wrapped_callback(request, *callback_args, **callback_kwargs)
regiond.log- File "/snap/maas/19835/lib/python3.8/site-packages/maasserver/utils/views.py", line 284, in view_atomic_with_post_commit_savepoint
regiond.log- return view_atomic(*args, **kwargs)
regiond.log- File "/usr/lib/python3.8/contextlib.py", line 75, in inner
regiond.log- return func(*args, **kwds)
regiond.log- File "/snap/maas/19835/lib/python3.8/site-packages/maasserver/api/support.py", line 56, in __call__
regiond.log- response = super().__call__(request, *args, **kwargs)
regiond.log- File "/snap/maas/19835/usr/lib/python3/dist-packages/django/views/decorators/vary.py", line 20, in inner_func
regiond.log- response = func(*args, **kwargs)
regiond.log- File "/snap/maas/19835/usr/lib/python3.8/dist-packages/piston3/resource.py", line 197, in __call__
regiond.log: result = self.error_handler(e, request, meth, em_format)
regiond.log- File "/snap/maas/19835/usr/lib/python3.8/dist-packages/piston3/resource.py", line 195, in __call__
regiond.log- result = meth(request, *args, **kwargs)
regiond.log- File "/snap/maas/19835/lib/python3.8/site-packages/maasserver/api/support.py", line 308, in dispatch
regiond.log- return function(self, request, *args, **kwargs)
regiond.log- File "/snap/maas/19835/lib/python3.8/site-packages/maasserver/api/pods.py", line 464, in read
regiond.log- return Pod.objects.get_pods(request.user, PodPermission.view).order_by(
regiond.log- File "/snap/maas/19835/lib/python3.8/site-packages/maasserver/models/bmc.py", line 601, in get_pods
regiond.log- fetched = rbac.get_resource_pool_ids(
regiond.log- File "/snap/maas/19835/lib/python3.8/site-packages/maasserver/rbac.py", line 321, in get_resource_pool_ids
regiond.log- results = self._get_resource_pool_identifiers(user, *permissions)
regiond.log- File "/snap/maas/19835/lib/python3.8/site-packages/maasserver/rbac.py", line 375, in _get_resource_pool_identifiers
regiond.log- fetched = self.client.allowed_for_user(
regiond.log- File "/snap/maas/19835/lib/python3.8/site-packages/maasserver/rbac.py", line 139, in allowed_for_user
regiond.log- result = self._request("GET", url)
regiond.log- File "/snap/maas/19835/lib/python3.8/site-packages/maasserver/macaroon_auth.py", line 291, in _request
regiond.log- content = resp.json()
regiond.log- File "/snap/maas/19835/usr/lib/python3/dist-packages/requests/models.py", line 897, in json
regiond.log- return complexjson.loads(self.text, **kwargs)
regiond.log- File "/snap/maas/19835/usr/lib/python3/dist-packages/simplejson/__init__.py", line 518, in loads
regiond.log- return _default_decoder.decode(s)
regiond.log- File "/snap/maas/19835/usr/lib/python3/dist-packages/simplejson/decoder.py", line 370, in decode
regiond.log- obj, end = self.raw_decode(s)
regiond.log- File "/snap/maas/19835/usr/lib/python3/dist-packages/simplejson/decoder.py", line 400, in raw_decode
regiond.log- return self.scan_once(s, idx=_w(s, idx).end())
regiond.log:simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```

On the rbac server, I am seeing some database errors:
```
postgresql/postgresql-10-main.log:2022-06-16 12:51:07.225 UTC [14754] rbac@rbac ERROR: could not serialize access due to concurrent update
postgresql/postgresql-10-main.log-2022-06-16 12:51:07.225 UTC [14754] rbac@rbac STATEMENT: UPDATE service SET last_contact='2022-06-16T12:51:07.190373'::timestamp WHERE service.id = 1
```

But the occurrences of these messages data base errors does not seem to correlate to the ServerErrors.

The maas snap logs are in the attachments.

Tags: cdo-qa
Revision history for this message
Bas de Bruijne (basdbruijne) wrote :
Bill Wear (billwear)
Changed in maas:
status: New → Triaged
importance: Undecided → Medium
Changed in maas:
milestone: none → 3.3.0
Revision history for this message
Jerzy Husakowski (jhusakowski) wrote :

Could you add RBAC logs to the issue?

Revision history for this message
Bas de Bruijne (basdbruijne) wrote :

I pasted the RBAC logs here: https://pastebin.canonical.com/p/CgtdTrrZ44/
The timestamps probably don't line up with the maas logs, but it is the same problem.

Let me know if this is enough, if necessary I can enable RBAC debug logging and recreate the bug.

Changed in maas:
milestone: 3.3.0 → 3.4.0
Alberto Donato (ack)
Changed in maas:
milestone: 3.4.0 → 3.4.x
Changed in maas:
milestone: 3.4.x → 3.5.0
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.