Unable to connect to any rack controller ; no connections available

Bug #1910783 reported by Marian Gasparovic
This bug affects 1 person
Affects  Status   Importance  Assigned to  Milestone
MAAS     Triaged  Medium      Unassigned

Bug Description

MAAS maas_2.8.2-8577-g.a3e674063

A vm-host compose failed with "Unable to connect to any rack controller ; no connections available."

/var/snap/maas/common/log/regiond.log contains:

2021-01-08 06:14:11 root: [error] Unable to connect to any rack controller ; no connections available.
Traceback (most recent call last):
  File "/snap/maas/8980/usr/lib/python3/dist-packages/django/core/handlers/base.py", line 185, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/snap/maas/8980/lib/python3.6/site-packages/maasserver/utils/views.py", line 291, in view_atomic_with_post_commit_savepoint
    return view_atomic(*args, **kwargs)
  File "/snap/maas/8980/usr/lib/python3.6/contextlib.py", line 52, in inner
    return func(*args, **kwds)
  File "/snap/maas/8980/lib/python3.6/site-packages/maasserver/api/support.py", line 57, in __call__
    response = upcall(request, *args, **kwargs)
  File "/snap/maas/8980/usr/lib/python3/dist-packages/django/views/decorators/vary.py", line 21, in inner_func
    response = func(*args, **kwargs)
  File "/snap/maas/8980/usr/lib/python3/dist-packages/piston3/resource.py", line 190, in __call__
    result = self.error_handler(e, request, meth, em_format)
  File "/snap/maas/8980/usr/lib/python3/dist-packages/piston3/resource.py", line 188, in __call__
    result = meth(request, *args, **kwargs)
  File "/snap/maas/8980/lib/python3.6/site-packages/maasserver/api/support.py", line 313, in dispatch
    return function(self, request, *args, **kwargs)
  File "/snap/maas/8980/lib/python3.6/site-packages/maasserver/api/support.py", line 163, in wrapper
    return func(self, request, *args, **kwargs)
  File "/snap/maas/8980/lib/python3.6/site-packages/maasserver/api/pods.py", line 334, in compose
    machine = form.compose()
  File "/snap/maas/8980/lib/python3.6/site-packages/maasserver/forms/pods.py", line 839, in compose
    ).wait(timeout)
  File "/snap/maas/8980/usr/lib/python3/dist-packages/crochet/_eventloop.py", line 231, in wait
    result.raiseException()
  File "/snap/maas/8980/usr/lib/python3/dist-packages/twisted/python/failure.py", line 385, in raiseException
    raise self.value.with_traceback(self.tb)
  File "/snap/maas/8980/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/snap/maas/8980/lib/python3.6/site-packages/maasserver/rpc/regionservice.py", line 1164, in cancelled
    "available." % ",".join(identifiers)
provisioningserver.rpc.exceptions.NoConnectionsAvailable: Unable to connect to any rack controller ; no connections available.

MAAS logs: https://oil-jenkins.canonical.com/artifacts/f58f3c00-de5a-4a3a-b9ae-3cf410a18418/generated/generated/maas/logs-2021-01-08-06.14.46.tar

All artifacts: https://oil-jenkins.canonical.com/artifacts/f58f3c00-de5a-4a3a-b9ae-3cf410a18418/index.html

Tags: cdo-qa
Marian Gasparovic (marosg) wrote :

Sorry about the copy/paste error in the bug title when creating it.

summary: - Unable to connect to any rack controller ; no connections available. Traceback (most recent call last): [...]
+ Unable to connect to any rack controller ; no connections available
Alberto Donato (ack) wrote :

Has the issue been seen multiple times or just once? Does it happen with the latest MAAS too?

Changed in maas:
status: New → Incomplete
Marian Gasparovic (marosg) wrote :

It was just that one occurrence; we have not seen it since.

Changed in maas:
status: Incomplete → New
Björn Tillenius (bjornt) wrote :

OK, please report any new failures, especially if it happens with 3.0.

Changed in maas:
status: New → Triaged
importance: Undecided → Medium
Changed in maas:
status: Triaged → Incomplete
Rod Smith (rodsmith) wrote :

This has been happening to us recently with MAAS 3.3.4. It's a sporadic failure. I'm seeing the following in regiond.log:

2023-09-08 19:12:21 twisted.internet.protocol.Factory: [info] RegionServer connection established (HOST:IPv6Address(type='TCP', host='::ffff:10.1.16.3', port=5252, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:10.1.16.3', port=58048, flowInfo=0, scopeID=0))
2023-09-08 19:12:21 maasserver.rpc.regionservice: [info] Rack controller authenticated from '::ffff:10.1.16.3:58044'.
2023-09-08 19:12:21 maasserver.rpc.regionservice: [info] Rack controller authenticated from '::ffff:10.1.16.3:58048'.
2023-09-08 19:12:23 maasserver.ipc: [info] Worker pid:29451 registered RPC connection to ('rsfc68', '10.1.16.3', 5252).
2023-09-08 19:12:25 maasserver.ipc: [info] Worker pid:29451 registered RPC connection to ('rsfc68', '10.1.16.3', 5252).
2023-09-08 19:12:25 maasserver.dhcp: [info] Successfully configured DHCPv4 on rack controller 'weavile (rsfc68)'.
2023-09-08 19:12:26 maasserver.dhcp: [info] Successfully configured DHCPv6 on rack controller 'weavile (rsfc68)'.
2023-09-08 19:12:39 regiond: [info] 127.0.0.1 GET /MAAS/rpc/ HTTP/1.1 --> 200 OK (referrer: -; agent: provisioningserver.rpc.clusterservice.ClusterClientService)
2023-09-08 19:13:02 maasserver.models.node: [info] hoggus: Turning on netboot for node
2023-09-08 19:13:02 maasserver.models.node: [info] hoggus: Turning ephemeral deploy off for node
2023-09-08 19:15:21 maasserver: [error] Error while calling ScanNetworks: Unable to get RPC connection for rack controller 'weavile' (rsfc68).
2023-09-08 19:15:21 maasserver.regiondservices.active_discovery: [info] Active network discovery: Unable to initiate network scanning on any rack controller. Verify that the rack controllers are started and have connected to the region.
2023-09-08 19:15:42 maasserver.models.signals.power: [critical] Failed to update power state of machine after state transition.
        Traceback (most recent call last):
          File "/snap/maas/28521/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
            current.result = callback( # type: ignore[misc]
          File "/snap/maas/28521/lib/python3.10/site-packages/maasserver/models/node.py", line 6052, in cb_power_control
            d = getClientFromIdentifiers(client_idents)
          File "/snap/maas/28521/lib/python3.10/site-packages/provisioningserver/utils/twisted.py", line 128, in wrapper
            return func(*args, **kwargs)
          File "/snap/maas/28521/lib/python3.10/site-packages/provisioningserver/utils/twisted.py", line 60, in wrapper
            return maybeDeferred(func, *args, **kwargs)
        --- <exception caught here> ---
          File "/snap/maas/28521/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
            current.result = callback( # type: ignore[misc]
          File "/snap/maas/28521/lib/python3.10/site-packages/maasserver/models/signals/power.py", line 46, in eb_error
            failure.trap(Node.DoesNotExist, UnknownPowerType, PowerProblem)
          File "/snap/maas/28521/usr/lib/python3/dist-packages/twisted/py...
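
For anyone triaging a similar regiond.log, a rough way to see whether RPC connections were registered and later reported missing is to count the relevant messages. The sketch below is only an illustration: the function name `summarize` and the category labels are made up here, and the patterns are simply fragments of the messages quoted in this report, so other MAAS versions may word them differently.

```python
import re
from collections import Counter

# Message fragments copied from the log excerpts in this report.
# The dictionary keys are labels invented for this sketch.
PATTERNS = {
    "registered": re.compile(r"registered RPC connection to"),
    "unavailable": re.compile(r"Unable to get RPC connection for rack controller"),
    "no_connections": re.compile(r"no connections available"),
}


def summarize(lines):
    """Count how many log lines match each RPC-related pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Passing an open file object works, since `summarize` only iterates over lines; on a snap install the file would be /var/snap/maas/common/log/regiond.log, as noted in the bug description.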


Revision history for this message
Rod Smith (rodsmith) wrote :
Jeff Lane (bladernr)
Changed in maas:
status: Incomplete → Confirmed
Alberto Donato (ack)
Changed in maas:
status: Confirmed → New
Alberto Donato (ack)
Changed in maas:
importance: Medium → Undecided
Alberto Donato (ack) wrote :

Does restarting MAAS fix the issue when this happens, or does the disconnect/reconnect keep happening?

Changed in maas:
status: New → Incomplete
Marian Gasparovic (marosg) wrote :

We have seen it six times since the original report; the last occurrence was in Aug 2023 while testing MAAS deb 3.3.5-13201-g.2715332b6-0ubuntu1~22.04.1.
We cannot say whether restarting MAAS helps, as we reinstall MAAS each time for the test.

Changed in maas:
status: Incomplete → New
Jerzy Husakowski (jhusakowski) wrote :

We'd like to take a look at an environment where this issue occurs, as we can't reproduce this locally. The current hypothesis is that failure.trap() re-raises an unhandled exception that drops one of the connections between region and rack. There's an earlier issue where such dropped connections would not be restored, which is fixed after 3.3.4.
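
To illustrate the trap() part of that hypothesis, here is a minimal stdlib-only sketch. `FailureModel` is a hypothetical stand-in that only mimics how Twisted's `Failure.trap()` behaves on Python 3 (return the matching type, otherwise re-raise the wrapped exception); it is not Twisted's or MAAS's code. The exception classes are local placeholders, and the choice of `NoConnectionsAvailable` as the untrapped error is an assumption made for the example, not something this report confirms.

```python
class FailureModel:
    """Hypothetical stand-in for the trap() behaviour of a Twisted Failure."""

    def __init__(self, value):
        self.value = value  # the wrapped exception instance

    def trap(self, *error_types):
        # Return the first listed type that matches the wrapped exception.
        for error_type in error_types:
            if isinstance(self.value, error_type):
                return error_type
        # Nothing matched: the wrapped exception is re-raised to the caller.
        raise self.value


# Placeholder exception classes, defined locally for the sketch.
class PowerProblem(Exception):
    pass


class NoConnectionsAvailable(Exception):
    pass


def eb_error(failure):
    """Errback that only expects PowerProblem (simplified from the traceback)."""
    failure.trap(PowerProblem)
    return "handled"


if __name__ == "__main__":
    print(eb_error(FailureModel(PowerProblem("power query failed"))))
    try:
        eb_error(FailureModel(NoConnectionsAvailable("no connections")))
    except NoConnectionsAvailable as exc:
        print("escaped the errback:", exc)
```

In this model, any error type missing from the trap() list leaves the errback as a fresh exception, which matches the "re-raises an unhandled exception" step of the hypothesis. Whether that is what drops the region-rack connection is exactly what still needs to be confirmed in an affected environment.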

Changed in maas:
importance: Undecided → Medium
milestone: none → 3.5.0
status: New → Triaged