Comment 17 for bug 1375664

Graham Binns (gmb) wrote :

Quick update: Gavin, Jason and I have been working to debug this. Through crude bisecting, we've isolated the problem (at least with release()) to the call to power_off_nodes() in NodeManager.stop_nodes().

More precisely, if call_power_command() in power_nodes() (src/maasserver/clusterrpc/power.py) returns early rather than calling the Power(Off|On) RPC command, all the nodes will release just hunky dory. If the RPC call goes through, roughly half of the nodes will remain Allocated. We've also seen PowerActionAlreadyInProgress errors in the log for some nodes, though we don't yet know whether those are spurious.
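
For anyone wanting to repeat the experiment, the short-circuit looks roughly like the sketch below. This is not the actual MAAS code: only the name call_power_command() comes from the source, while its signature here, the skip_rpc switch, and the send_power_rpc() helper are hypothetical stand-ins chosen for illustration.

```python
# Rough sketch of the bisecting trick; NOT the real MAAS implementation.
# The signature, skip_rpc and send_power_rpc() are hypothetical.

def send_power_rpc(client, system_id, action):
    """Stand-in for issuing the Power(On|Off) RPC through a client callable."""
    return client(system_id, action)


def call_power_command(client, system_id, action, skip_rpc=False):
    """Issue the power RPC, or return early when skip_rpc is set."""
    if skip_rpc:
        # Early return: the RPC is never sent, which lets us check whether
        # release() misbehaves only when the RPC actually goes through.
        return None
    return send_power_rpc(client, system_id, action)
```

Flipping skip_rpc between runs is what separates the "all nodes release" case from the "roughly half stay Allocated" case.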

Gavin and I are leaving to grab dinner now; we'll pick this up later.