Comment 27 for bug 1562249

Ryan Harper (raharper) wrote : Re: [Bug 1562249] Re: Failed to deploy machine with HP Smart Array Raid 6i

On Thu, Jun 30, 2016 at 4:00 PM, Robin <email address hidden> wrote:

> Thanks for including the patch for trusty. Here are the new
> unsuccessful results:

> robin@IbmRS1:~$ sudo apt-cache policy python3-curtin
> [sudo] password for robin:
> python3-curtin:
> Installed: 0.1.0~bzr403-0ubuntu1
> Candidate: 0.1.0~bzr403-0ubuntu1
> Version table:
> *** 0.1.0~bzr403-0ubuntu1 0
> 500 http://ppa.launchpad.net/wesley-wiedenmeier/test2/ubuntu/
> trusty/main amd64 Packages
> 100 /var/lib/dpkg/status
> 0.1.0~bzr385-0ubuntu1 0
> 500 http://ppa.launchpad.net/maas/stable/ubuntu/ trusty/main
> amd64 Packages
> 0.1.0~bzr227-0ubuntu1~14.04.1 0
> 500 http://ca.archive.ubuntu.com/ubuntu/ trusty-updates/universe
> amd64 Packages
> 0.1.0~bzr126-0ubuntu1 0
> 500 http://ca.archive.ubuntu.com/ubuntu/ trusty/universe amd64
> Packages
>
>
> MAAS VERSION:
> Installed: 1.9.3+bzr4577-0ubuntu1~trusty1
>
>
> DL380-G4 Failed Deployment
> Machine output:
>
> Error: Partition(s) 5 on /dev/cciss/c0d0 have been written, but we have
> been unable to inform the kernel of the change, probably because it/they
> are in use. As a result, the old partition(s) will remain in use. You
> should reboot now before making further changes.
> File descriptor 3 (socket:[13947]) leaked on lvremove invocation. Parent
> PID 10046: python
> File descriptor 4 (/tmp/install.log) leaked on lvremove invocation. Parent
> PID 10046: python
> File descriptor 5 (/tmp/install.log) leaked on lvremove invocation. Parent
> PID 10046: python
> Volume group "MaaS" not found
> Skipping volume group MaaS
> Volume group name has invalid characters
> File descriptor 3 (socket:[13947]) leaked on vgremove invocation. Parent
> PID 10046: python
> File descriptor 4 (/tmp/install.log) leaked on vgremove invocation. Parent
> PID 10046: python
> File descriptor 5 (/tmp/install.log) leaked on vgremove invocation. Parent
> PID 10046: python
> Volume group "MaaS" not found
> File descriptor 3 (socket:[13947]) leaked on lvremove invocation. Parent
> PID 10046: python
> File descriptor 4 (/tmp/install.log) leaked on lvremove invocation. Parent
> PID 10046: python
> File descriptor 5 (/tmp/install.log) leaked on lvremove invocation. Parent
> PID 10046: python
> Volume group "MaaS" not found
> Skipping volume group MaaS
> Volume group name has invalid characters
> File descriptor 3 (socket:[13947]) leaked on vgremove invocation. Parent
> PID 10046: python
> File descriptor 4 (/tmp/install.log) leaked on vgremove invocation. Parent
> PID 10046: python
> File descriptor 5 (/tmp/install.log) leaked on vgremove invocation. Parent
> PID 10046: python
> Volume group "MaaS" not found
> Error: Partition(s) 5 on /dev/cciss/c0d0 have been written, but we have
> been unable to inform the kernel of the change, probably because it/they
> are in use. As a result, the old partition(s) will remain in use. You
> should reboot now before making further changes.
> An error occured handling 'cciss!c0d0': ProcessExecutionError - Unexpected
> error while running command.
> Command: ['parted', '/dev/cciss/c0d0', '--script', 'mklabel', 'msdos']
> Exit code: 1
> Reason: -
> Stdout: ''
> Stderr: ''
> Unexpected error while running command.
> Command: ['parted', '/dev/cciss/c0d0', '--script', 'mklabel', 'msdos']
> Exit code: 1
> Reason: -
> Stdout: ''
> Stderr: ''
> Installation failed with exception: Unexpected error while running command.
> Command: ['curtin', 'block-meta', 'custom']
> Exit code: 3
> Reason: -
> Stdout: 'Error: Partition(s) 5 on /dev/cciss/c0d0 have been written, but
> we have been unable to inform the kernel of the change, probably because
> it/they are in use. As a result, the old partition(s) will remain in use.
> You should reboot now before making further changes.\nFile descriptor 3
> (socket:[13947]) leaked on lvremove invocation. Parent PID 10046:
> python\nFile descriptor 4 (/tmp/install.log) leaked on lvremove invocation.
> Parent PID 10046: python\nFile descriptor 5 (/tmp/install.log) leaked on
> lvremove invocation. Parent PID 10046: python\n Volume group "MaaS" not
> found\n Skipping volume group MaaS\n Volume group name has invalid
> characters\nFile descriptor 3 (socket:[13947]) leaked on vgremove
> invocation. Parent PID 10046: python\nFile descriptor 4 (/tmp/install.log)
> leaked on vgremove invocation. Parent PID 10046: python\nFile descriptor 5
> (/tmp/install.log) leaked on vgremove invocation. Parent PID 10046:
> python\n Volume group "MaaS" not found\nFile descriptor 3 (socket:[13947])
> leaked on lvremove invocation. Parent PID 10046: python\nFile descriptor 4
> (/tmp/install.log) leaked on lvremove invocation. Parent PID 10046:
> python\nFile descriptor 5 (/tmp/install.log) leaked on lvremove invocation.
> Parent PID 10046: python\n Volume group "MaaS" not found\n Skipping
> volume group MaaS\n Volume group name has invalid characters\nFile
> descriptor 3 (socket:[13947]) leaked on vgremove invocation. Parent PID
> 10046: python\nFile descriptor 4 (/tmp/install.log) leaked on vgremove
> invocation. Parent PID 10046: python\nFile descriptor 5 (/tmp/install.log)
> leaked on vgremove invocation. Parent PID 10046: python\n Volume group
> "MaaS" not found\nError: Partition(s) 5 on /dev/cciss/c0d0 have been
> written, but we have been unable to inform the kernel of the change,
> probably because it/they are in use. As a result, the old partition(s)
> will remain in use. You should reboot now before making further
> changes.\nAn error occured handling \'cciss!c0d0\': ProcessExecutionError -
> Unexpected error while running command.\nCommand: [\'parted\',
> \'/dev/cciss/c0d0\', \'--script\', \'mklabel\', \'msdos\']\nExit code:
> 1\nReason: -\nStdout: \'\'\nStderr: \'\'\nUnexpected error while running
> command.\nCommand: [\'parted\', \'/dev/cciss/c0d0\', \'--script\',
> \'mklabel\', \'msdos\']\nExit code: 1\nReason: -\nStdout: \'\'\nStderr:
> \'\'\n'
> Stderr: ''
>
>
> DL380-G5-28 Failed Deployment
> Machine output:
>
> Error: /dev/cciss/c0d1: unrecognised disk label
> Error: /dev/cciss/c0d1: unrecognised disk label
> An error occured handling 'cciss!c0d0-part1': OSError - [Errno 2] No such
> file or directory: '/dev/cciss/c0d01'
> [Errno 2] No such file or directory: '/dev/cciss/c0d01'
> Installation failed with exception: Unexpected error while running command.
> Command: ['curtin', 'block-meta', 'custom']
> Exit code: 3
> Reason: -
> Stdout: "Error: /dev/cciss/c0d1: unrecognised disk label\nError:
> /dev/cciss/c0d1: unrecognised disk label\nAn error occured handling
> 'cciss!c0d0-part1': OSError - [Errno 2] No such file or directory:
> '/dev/cciss/c0d01'\n[Errno 2] No such file or directory:
> '/dev/cciss/c0d01'\n"
> Stderr: ''
>
>
> The second HpDL380-G5 also fails deployment with a similar machine output.
>

Thanks, that's quite helpful.

>
> The IbmX3650 deploys successfully.
>
> I have yet to understand how to get the curtin logs, and will report
> back with that.
>

You should be able to follow the maascli guide:

https://maas.ubuntu.com/docs/maascli.html

Once you've logged in to a session, this will dump the curtin config as
YAML to stdout:

maas <session> node get-curtin-config <system-id>

Prior to running a deployment, you can also enable verbose curtin output:

maas <session> maas set-config name=curtin_verbose value=true

Then redeploy the failing case; the output should include more curtin
debugging information.
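
If it helps to script that, here is a minimal Python sketch that only
assembles the two maas commands above and, by default, prints them rather
than running them. The function names and the "my-profile" / "node-1234"
placeholders are made up for illustration; substitute your own session
name and system id.

```python
import shlex
import subprocess


def curtin_debug_commands(session, system_id):
    """Return the two maas CLI invocations from this mail as argv lists."""
    return [
        # Turn on verbose curtin output before the next deployment.
        ["maas", session, "maas", "set-config",
         "name=curtin_verbose", "value=true"],
        # Dump the curtin config for one node as YAML on stdout.
        ["maas", session, "node", "get-curtin-config", system_id],
    ]


def run_all(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(shlex.quote(arg) for arg in argv))
        else:
            subprocess.check_call(argv)


if __name__ == "__main__":
    # Placeholder values (assumptions), not taken from this bug.
    run_all(curtin_debug_commands("my-profile", "node-1234"))
```

Passing dry_run=False would actually invoke the maas client, so it is
worth checking the printed commands first.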

That said, the general issue seems to be how the cciss driver behaves at
different kernel levels: on some releases (say Vivid, Wily, Xenial) the
path /dev/cciss/<disk> exists, but on older releases it does not.

Can you try deploying the Wily or Xenial Ubuntu release instead of Trusty
to your target node?
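
One side note on the G5 output: the path '/dev/cciss/c0d01' looks like a
partition number appended straight onto the disk name. As far as I know,
the kernel puts a 'p' between the disk name and the partition number
whenever the disk name ends in a digit, so partition 1 of c0d0 would be
/dev/cciss/c0d0p1, and sysfs spells the '/' as '!' (hence 'cciss!c0d0').
A rough sketch of that rule, for illustration only; this is not curtin's
code and the function names are hypothetical:

```python
def partition_kname(disk_kname, partnum):
    """Build a partition's kernel name from its disk's kernel name.

    Disks whose name ends in a digit (cciss!c0d0, nvme0n1, mmcblk0) get
    a 'p' separator; others (sda, vdb) get the number appended directly.
    """
    sep = "p" if disk_kname[-1].isdigit() else ""
    return "%s%s%d" % (disk_kname, sep, partnum)


def kname_to_devpath(kname):
    """Map a sysfs-style kernel name to its /dev path ('!' means '/')."""
    return "/dev/" + kname.replace("!", "/")


if __name__ == "__main__":
    part = partition_kname("cciss!c0d0", 1)
    print(part, kname_to_devpath(part))
```

With that rule, 'cciss!c0d0' and partition 1 give 'cciss!c0d0p1', which
maps to /dev/cciss/c0d0p1 rather than /dev/cciss/c0d01.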
