MAAS 3.5 fails to boot machines because the rack is timing out retrieving the images
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
MAAS | Fix Committed | Critical | Anton Troyanov | |
3.5 | Fix Released | Critical | Anton Troyanov | |
Bug Description
In MAAS 3.5 with the following setup:
1) r00ta-ThinkPad-
172.0.2.0/24
10.14.10.0/24
fd42:
10.194.168.0/21
2) novel-mantis: rack with access to the following subnets:
172.0.2.0/24
10.35.173.0/24
fd42:
20.0.1.0/24
I'm trying to boot machines on the 20.0.1.0/24 subnet. Most of the time they fail to boot because they never receive the bootloader (see screenshot).
Looking at the rack logs on novel-mantis, I see:
Apr 23 15:37:38 novel-mantis maas-rackd[1761]: provisioningser
Apr 23 15:37:38 novel-mantis maas-rackd[1761]: provisioningser
and in the maas-agent logs on novel-mantis I see:
Apr 23 15:37:38 novel-mantis maas-agent[13651]: 2024/04/23 15:37:38 http: proxy error: dial tcp [fd42:7bdd:
But novel-mantis DOES NOT have access to fd42:7bdd:
Related branches
- Anton Troyanov: Approve
- Diff: 126 lines (+76/-9), 2 files modified:
  src/maasagent/internal/httpproxy/service.go (+46/-9)
  src/maasagent/internal/httpproxy/service_test.go (+30/-0)
- MAAS Lander: Approve
- Jacopo Rota: Approve
- Christian Grabowski: Approve
- Diff: 126 lines (+76/-9), 2 files modified:
  src/maasagent/internal/httpproxy/service.go (+46/-9)
  src/maasagent/internal/httpproxy/service_test.go (+30/-0)
Changed in maas:
milestone: 3.5.0 → 3.6.0
Changed in maas:
assignee: nobody → Anton Troyanov (troyanov)
Changed in maas:
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed
I enabled debug logs and extracted the following.
From the rack:
Apr 23 16:00:02 novel-mantis maas-rackd[36406]: tftp.protocol: [debug] Datagram received from ('20.0.1.191', 1845): <RRQDatagram(filename=b'bootx64.efi', mode=b'octet', options=OrderedDict([(b'tsize', b'0'), (b'blksize', b'1468'), (b'windowsize', b'4')]))>
Apr 23 16:00:02 novel-mantis maas-rackd[36406]: provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 20.0.1.191
Apr 23 16:00:02 novel-mantis maas-rackd[36406]: provisioningserver.rackdservices.http: [info] /images/bootx64.efi requested by ::1
Apr 23 16:00:02 novel-mantis maas-rackd[36406]: tftp.bootstrap: [debug] Got error: <tftp.datagram.ERRORDatagram object at 0x7f2a2837b2e0>
Apr 23 16:00:12 novel-mantis maas-rackd[36406]: tftp.bootstrap: [debug] Timed out during option negotiation proces
From the agent:
Apr 23 16:00:02 novel-mantis maas-agent[37170]: 2024/04/23 16:00:02 http: proxy error: dial tcp [fd42:7bdd:adc0:71f4::1]:5240: connect: network is unreachable
Apr 23 16:00:06 novel-mantis maas-agent[37170]: 2024/04/23 16:00:06 http: proxy error: dial tcp 10.194.168.1:5240: i/o timeout
Apr 23 16:01:15 novel-mantis maas-agent[37170]: 2024/04/23 16:01:15 http: proxy error: context canceled