Cluster gets disconnected after error: provisioningserver.service_monitor.UnknownServiceError: 'maas-dhcpd' is unknown to upstart.

Bug #1457708 reported by Ashley Lai
This bug affects 3 people
Affects: MAAS
Status: Fix Released
Importance: Critical
Assigned to: Raphaël Badin

Bug Description

For the past few hours, all deployments in prodstack have been failing with a 400 error. Please see the attached logs.

ERROR failed to bootstrap environment: cannot start bootstrap instance: gomaasapi: got error back from server: 400 BAD REQUEST ({"power_type": ["The cluster controller for this node is not responding; power type validation is not available.Unable to get RPC connection for cluster 'OIL Cluster' (037c960b-5b9f-4701-8366-eeda2c09d14e)"], "distro_series": ["'trusty' is not a valid distro_series. It should be one of: ''."]})
2015-05-22 01:01:22,838 [ERROR] oil_ci.juju.client: Calling "juju bootstrap" failed!
2015-05-22 01:01:22,838 [ERROR] oil_ci.cli: Deployment failed:
+ rc=1
+ echo 'Deployment returned: 1'

Tags: oil

Ashley Lai (alai) wrote :
Larry Michel (lmic)
summary: - Prostack: all pipelines hit with 400 BAD REQUEST error
+ 1.8b7: all pipelines hit with 400 BAD REQUEST error
Raphaël Badin (rvb) wrote : Re: 1.8b7: all pipelines hit with 400 BAD REQUEST error

At the time the problem happened, the regiond logs show this:
2015-05-22 01:01:21 [maasserver] ERROR: Unable to get RPC connection for cluster 'OIL Cluster' (037c960b-5b9f-4701-8366-eeda2c09d14e)
2015-05-22 01:01:21 [maasserver] ERROR: Unable to get RPC connection for cluster 'OIL Cluster' (037c960b-5b9f-4701-8366-eeda2c09d14e)
2015-05-22 01:01:21 [maasserver] ERROR: Unable to get RPC connection for cluster 'OIL Cluster' (037c960b-5b9f-4701-8366-eeda2c09d14e)
2015-05-22 01:01:21 [-] 127.0.0.1 - - [22/May/2015:01:01:20 +0000] "POST /MAAS/api/1.0/nodes/node-95167f62-12b6-11e4-9a15-00163eca07b6/?op=start HTTP/1.1" 400 230 "-" "Go 1.1 package http"
2015-05-22 01:01:22 [maasserver] ERROR: Unable to get RPC connection for cluster 'OIL Cluster' (037c960b-5b9f-4701-8366-eeda2c09d14e)
2015-05-22 01:01:22 [maasserver] ERROR: Unable to get RPC connection for cluster 'OIL Cluster' (037c960b-5b9f-4701-8366-eeda2c09d14e)
2015-05-22 01:01:22 [maasserver] ERROR: Unable to get RPC connection for cluster 'OIL Cluster' (037c960b-5b9f-4701-8366-eeda2c09d14e)
2015-05-22 01:01:22 [maasserver] ERROR: Unable to get RPC connection for cluster 'OIL Cluster' (037c960b-5b9f-4701-8366-eeda2c09d14e)

Raphaël Badin (rvb) wrote :

Actually, clusterd.log contains the stacktrace that explains why the cluster got disconnected: http://paste.ubuntu.com/11279881/

summary: - 1.8b7: all pipelines hit with 400 BAD REQUEST error
+ Cluster gets disconnected after error:
+ provisioningserver.service_monitor.UnknownServiceError: 'maas-dhcpd' is
+ unknown to upstart.
Raphaël Badin (rvb) wrote :

This error causes the cluster to disconnect.

Changed in maas:
importance: Undecided → Critical
status: New → Triaged
assignee: nobody → Raphaël Badin (rvb)
Raphaël Badin (rvb)
Changed in maas:
status: Triaged → In Progress
Raphaël Badin (rvb)
Changed in maas:
milestone: none → 1.8.0
Raphaël Badin (rvb)
Changed in maas:
status: In Progress → Fix Committed
Blake Rouse (blake-rouse) wrote :

I think the code should also handle this error and not crash. The packaging fix is needed but handling the exception needs to be better as well.

Raphaël Badin (rvb) wrote :

> I think the code should also handle this error and not crash. The packaging fix is needed but handling the exception needs to be better as well.

Agreed, and that's why I filed bug 1457799.
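
For illustration only, here is a minimal sketch of what "handle this error and not crash" could look like. It is not the MAAS code and not the fix that landed: `UnknownServiceError` is redefined locally as a stand-in for the exception named in the traceback, and `query_service_state`, `check_services`, and the `known_services` mapping are hypothetical names made up for this example.

```python
import logging

logger = logging.getLogger("service_monitor_sketch")


class UnknownServiceError(Exception):
    """Local stand-in for the exception named in the traceback."""


def query_service_state(name, known_services):
    """Hypothetical init-system lookup; raises if the service is not registered."""
    if name not in known_services:
        raise UnknownServiceError("'%s' is unknown to upstart." % name)
    return known_services[name]


def check_services(names, known_services):
    """Check every service; log and mark unknown ones instead of propagating."""
    states = {}
    for name in names:
        try:
            states[name] = query_service_state(name, known_services)
        except UnknownServiceError as error:
            # Assumption: a missing job is reported and skipped so that one
            # unregistered service cannot take down the whole monitoring pass.
            logger.warning("Skipping service check: %s", error)
            states[name] = "unknown"
    return states
```

With this shape, `check_services(["maas-dhcpd", "tgt"], {"tgt": "running"})` returns `{"maas-dhcpd": "unknown", "tgt": "running"}` and logs a warning, rather than letting the exception escape to the caller.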

Changed in maas:
status: Fix Committed → Fix Released