MAAS rack is scaling up the number of connections without limit due to a race condition
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Triaged | High | Unassigned |
3.3 | Triaged | High | Unassigned |
3.4 | Triaged | High | Unassigned |
3.5 | Triaged | High | Unassigned |
Bug Description
Since MAAS 3.3, rackd can open a very large number of RPC connections because of a race condition in the scale-up logic.
Any rackd service can request a client to talk to the region with `getClientNow`:
@deferred
def getClientNow(self):
    """Returns a `Defer` that resolves to a :class:
    connected to a region.
    If a connection already exists to the region then this method
    will just return that current connection. If no connections exists
    this method will try its best to make a connection before returning
    the client.
    :raises: :py:class:
    there no connections can be made to a region controller.
    """
    try:
        return self.getClient()
    except exceptions.
        return self._tryUpdate
    except exceptions.
        return self.connection
        )
and
@PROMETHEUS_
@inlineCall
def scale_up_
    for ev, ev_conns in self.connection
        # pick first group with room for additional conns
        if len(ev_conns) < self._max_
            # spawn an extra connection
            )
    raise exceptions.
However, the cloned RPC connection is added to self.connection
see https:/
In particular, this bug is responsible for the `Too many open files` exception that can appear in the rackd logs.
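The kind of race described above can be reproduced in isolation. The following is a minimal sketch, not MAAS code: it uses asyncio rather than Twisted, and `Pool`, `MAX_CONNS`, `scale_up_racy`, `scale_up_guarded` and `count_connections` are hypothetical names invented for illustration. It assumes the race is a check-then-act problem, where the size check runs before a new connection has been registered in the pool, so many concurrent callers all pass the check. The guarded variant shows one possible mitigation (counting in-flight connections) and is not necessarily the fix adopted upstream.

```python
import asyncio

MAX_CONNS = 4  # hypothetical per-endpoint cap, not the MAAS default


class Pool:
    def __init__(self):
        self.conns = []   # established connections
        self.pending = 0  # connections that are still being opened

    async def _connect(self):
        # Stands in for the network handshake; other callers run meanwhile.
        await asyncio.sleep(0.01)
        return object()

    async def scale_up_racy(self):
        # The check only sees established connections, so every caller that
        # arrives before the first handshake completes passes it.
        if len(self.conns) < MAX_CONNS:
            conn = await self._connect()
            self.conns.append(conn)

    async def scale_up_guarded(self):
        # Count in-flight connections too, and reserve a slot before awaiting.
        if len(self.conns) + self.pending >= MAX_CONNS:
            return
        self.pending += 1
        try:
            conn = await self._connect()
            self.conns.append(conn)
        finally:
            self.pending -= 1


async def _run(method_name, callers):
    pool = Pool()
    await asyncio.gather(*(getattr(pool, method_name)() for _ in range(callers)))
    return len(pool.conns)


def count_connections(method_name, callers=20):
    """Return how many connections exist after `callers` concurrent requests."""
    return asyncio.run(_run(method_name, callers))


if __name__ == "__main__":
    print("racy:", count_connections("scale_up_racy"))
    print("guarded:", count_connections("scale_up_guarded"))
```

With 20 concurrent callers, the racy variant ends up with one connection per caller, while the guarded variant stays at the cap. Each extra connection holds a file descriptor, which is consistent with the `Too many open files` symptom.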
Changed in maas:
milestone: 3.6.0 → 3.6.x
For the time being, the best workaround is to increase the maximum number of open files allowed for the rackd process.
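As a rough illustration of this workaround, the sketch below raises the soft open-files limit (`RLIMIT_NOFILE`) with Python's standard `resource` module. The helper name `raise_nofile_limit` and the example values are assumptions made for illustration. `resource.prlimit` is Linux-only, changing another process's limit normally requires root, and the change does not survive a restart of the process, so a persistent setting would have to be made in whatever service manager starts rackd.

```python
import resource


def raise_nofile_limit(wanted, pid=None):
    """Raise the soft RLIMIT_NOFILE to `wanted`, capped at the hard limit.

    pid=None targets the current process; any other pid uses the
    Linux-only resource.prlimit(). Returns the resulting (soft, hard) pair.
    """
    if pid is None:
        soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    else:
        soft, hard = resource.prlimit(pid, resource.RLIMIT_NOFILE)

    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to raise

    # Never exceed the hard limit, and never lower the current soft limit.
    cap = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    new_soft = max(soft, cap)

    if pid is None:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    else:
        resource.prlimit(pid, resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft, hard


if __name__ == "__main__":
    # 65536 is an arbitrary example value, not a MAAS recommendation.
    print(raise_nofile_limit(65536))
```

To target a running rackd, pass its process ID as `pid` (for example one found with `pgrep`); the ID is deliberately left as a parameter here because it differs per system.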