hostnqn file not created automatically

Bug #2035606 reported by Yusuf Güngör
This bug affects 2 people
Affects: os-brick
Status: In Progress
Importance: Undecided
Assigned to: Unassigned

Bug Description

os-brick does not automatically create the hostnqn file, even after restarting nova_compute.

The code tries to run the "nvme show-hostnqn" command; if the hostnqn file does not exist, the command prints 'hostnqn is not available -- use nvme gen-hostnqn\n' and exits with code 254.

Exit code 254 raises putils.ProcessExecutionError, and this exception is handled in the create_hostnqn method in nvmeof.py. The exception handler contains an if-elif-else block whose final else branch re-raises the exception.

Because the exception is re-raised, create_hostnqn never proceeds to generate a new hostnqn file.

Can the exit code 254 case be handled so that a new hostnqn file gets created?

nova-compute log:

2023-09-14 11:35:47.153 534 WARNING os_brick.privileged.nvmeof [-] Could not generate host nqn: Unexpected error while running command.
Command: nvme show-hostnqn
Exit code: 254
Stdout: ''
Stderr: 'hostnqn is not available -- use nvme gen-hostnqn\n'
2023-09-14 11:35:47.159 534 WARNING os_brick.privileged.nvmeof [-] Could not generate host nqn: Unexpected error while running command.
Command: nvme show-hostnqn
Exit code: 254
Stdout: ''
Stderr: 'hostnqn is not available -- use nvme gen-hostnqn\n'

Code: https://github.com/openstack/os-brick/blob/8282b14889872d48ccc49312cb0901773b914c38/os_brick/privileged/nvmeof.py#L59

@os_brick.privileged.default.entrypoint
def create_hostnqn() -> str:
    """Create the hostnqn file to speed up finding out the nqn.

    By having the /etc/nvme/hostnqn not only do we make sure that that value is
    always used on this system, but we are also able to just open the file to
    get the nqn on each get_connector_properties call instead of having to make
    a call to nvme show-hostnqn command.
    """
    host_nqn = ''
    try:
        os.makedirs('/etc/nvme', mode=0o755, exist_ok=True)

        # Try to get existing nqn generated from dmi or systemd
        try:
            host_nqn, err = rootwrap.custom_execute('nvme', 'show-hostnqn')
            host_nqn = host_nqn.strip()

        # This is different from OSError's ENOENT, which is missing nvme
        # command. This ENOENT is when nvme says there isn't an nqn.
        except putils.ProcessExecutionError as e:
            err_msg = e.stdout[:e.stdout.find('\n')]
            show_hostnqn_subcmd_missing = (
                "ERROR: Invalid sub-command".casefold() in err_msg.casefold())
            if show_hostnqn_subcmd_missing:
                LOG.debug('Version too old cannot check current hostnqn.')
            elif e.exit_code == errno.ENOENT:
                LOG.debug('No nqn could be formed from dmi or systemd.')
            else:
                LOG.debug('Unknown error from nvme show-hostnqn: %s', err_msg)
                raise  # <---- Here

        if not host_nqn:
            LOG.debug('Generating nqn')
            host_nqn, err = rootwrap.custom_execute('nvme', 'gen-hostnqn')  # <---- never reached
            host_nqn = host_nqn.strip()

        with open('/etc/nvme/hostnqn', 'w') as f:
            LOG.debug('Writing hostnqn file')
            f.write(host_nqn)
        os.chmod('/etc/nvme/hostnqn', 0o644)
    except Exception as e:
        LOG.warning("Could not generate host nqn: %s", e)

    return host_nqn
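A minimal sketch of the handling the report asks for (hedged: `ProcessExecutionError` below is a local stub, not oslo's class, and `execute`/`get_existing_hostnqn` are illustrative names standing in for rootwrap.custom_execute and the show-hostnqn step — not os-brick's actual API). The idea is to treat the exit-code-254 "hostnqn is not available" case as "no nqn yet" instead of re-raising, so the gen-hostnqn fallback can run:

```python
class ProcessExecutionError(Exception):
    """Local stub mimicking oslo putils.ProcessExecutionError."""
    def __init__(self, exit_code, stdout='', stderr=''):
        super().__init__(stderr or stdout)
        self.exit_code = exit_code
        self.stdout = stdout
        self.stderr = stderr

def get_existing_hostnqn(execute):
    """Return the existing host nqn, or '' when nvme reports none."""
    try:
        out, _err = execute('nvme', 'show-hostnqn')
        return out.strip()
    except ProcessExecutionError as e:
        # nvme 1.16 prints the message on stderr (stdout is empty, as the
        # log above shows), so check both streams, not just e.stdout.
        msg = (e.stdout or e.stderr).split('\n', 1)[0]
        if 'hostnqn is not available' in msg:
            return ''  # fall through to `nvme gen-hostnqn` in the caller
        raise

# Simulate the failing container environment from the bug report.
def fake_execute(*cmd):
    raise ProcessExecutionError(
        254, stderr='hostnqn is not available -- use nvme gen-hostnqn\n')

assert get_existing_hostnqn(fake_execute) == ''
```

With this shape, the existing `if not host_nqn:` branch would run gen-hostnqn and the file would still be written.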

Versions:
  os-brick==6.1.1
  host: Ubuntu 20.04.6 LTS (kernel 5.4.0-153-generic)

Full logs attached

Revision history for this message
Yusuf Güngör (yusuf2) wrote :
description: updated
Revision history for this message
Yusuf Güngör (yusuf2) wrote (last edit ):

We are using kolla-ansible, and the kolla nova_compute container does not mount the host's "/etc/nvme/" directory into the container. We also do not want services to depend on host files when using containers.

The "nvme-cli" package is installed into the container image by kolla, and that package creates the /etc/nvme paths in the image.

The problem is that the hostnqn and hostid files are embedded in the nova_compute image, so all containers spawned from this image use the same hostnqn file, which is therefore not unique.

The kolla-side solution would be to delete the "/etc/nvme/hostnqn" and "/etc/kolla/hostid" files from the image after the docker build of the container image, **but** os-brick should handle the absence of these files.
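The kolla-side cleanup mentioned above could be a single image-build step. A hedged sketch (demonstrated here on a temp directory rather than a real image's /etc/nvme, so it is safe to run anywhere; the real step would target the paths named above):

```shell
# Stand in for the image's /etc/nvme so this sketch is safe to run.
ETC_NVME="$(mktemp -d)"
touch "$ETC_NVME/hostnqn" "$ETC_NVME/hostid"   # files the package bakes in
rm -f "$ETC_NVME/hostnqn" "$ETC_NVME/hostid"   # the post-build cleanup step
ls -A "$ETC_NVME"                              # prints nothing: dir is empty
```

Each container spawned from the cleaned image would then generate its own identity at runtime instead of sharing one.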

I am sharing the mounts of nova_compute container:

        "Mounts": [
            {
                "Type": "bind",
                "Source": "/etc/timezone",
                "Destination": "/etc/timezone",
                "Mode": "ro",
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/etc/kolla/nova-compute",
                "Destination": "/var/lib/kolla/config_files",
                "Mode": "ro",
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Type": "volume",
                "Name": "libvirtd",
                "Source": "/var/lib/docker/volumes/libvirtd/_data",
                "Destination": "/var/lib/libvirt",
                "Driver": "local",
                "Mode": "rw",
                "RW": true,
                "Propagation": ""
            },
            {
                "Type": "volume",
                "Name": "nova_compute",
                "Source": "/var/lib/docker/volumes/nova_compute/_data",
                "Destination": "/var/lib/nova",
                "Driver": "local",
                "Mode": "rw",
                "RW": true,
                "Propagation": ""
            },
            {
                "Type": "volume",
                "Name": "kolla_logs",
                "Source": "/var/lib/docker/volumes/kolla_logs/_data",
                "Destination": "/var/log/kolla",
                "Driver": "local",
                "Mode": "rw",
                "RW": true,
                "Propagation": ""
            },
            {
                "Type": "bind",
                "Source": "/dev",
                "Destination": "/dev",
                "Mode": "rw",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/etc/localtime",
                "Destination": "/etc/localtime",
                "Mode": "ro",
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/lib/modules",
                "Destination": "/lib/modules",
                "Mode": "ro",
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
              ...


Revision history for this message
Yusuf Güngör (yusuf2) wrote :
Revision history for this message
Gorka Eguileor (gorka) wrote :

This sounds like a deployment issue or an nvme-cli or libnvme issue to me.

There are many things to unpack here so I'll try to be brief.

You do want to have /etc/nvme/hostnqn preserved across nova-compute container restart, otherwise you are going to run into a lot of problems. The easiest way to do it is to preserve the file on the host, though it is not necessary if we are able to make the generation always return the same value.

Which brings us to the next point: in modern nvme-cli versions, calling `nvme show-hostnqn` is equivalent to calling `nvme gen-hostnqn` when /etc/nvme/hostnqn doesn't exist.

This is an excerpt of the show-hostnqn command code [1]:

    hostnqn = nvmf_hostnqn_from_file();
    if (!hostnqn)
        hostnqn = nvmf_hostnqn_generate();

And this is an excerpt of the gen-hostnqn command code [2]:

    hostnqn = nvmf_hostnqn_generate();

So changing the code to call `nvme gen-hostnqn` should fail as well, assuming relatively modern versions are being used.

So we end up with the question "why is nvmf_hostnqn_generate failing?", and I haven't looked at its code [3] to figure out what could be happening in your scenario.

Could you please go into the compute container and manually check the output of the following nvme commands?

  nvme show-hostnqn
  nvme gen-hostnqn
  nvme --version

[1]: https://github.com/linux-nvme/nvme-cli/blob/70c99f98d44fdf8d2d5b00a1a447353e2961f001/nvme.c#L8675
[2]: https://github.com/linux-nvme/nvme-cli/blob/70c99f98d44fdf8d2d5b00a1a447353e2961f001/nvme.c#L8660
[3]: https://github.com/linux-nvme/libnvme/blob/18b3316c502f229a5339d2c283510ec8008ec3b9/src/nvme/fabrics.c#L1409

Revision history for this message
Yusuf Güngör (yusuf2) wrote (last edit ):

Hi @gorka, thanks for the reply and your effort.

# When the /etc/nvme/hostnqn file exists and the user is nova:
(nova-compute)[nova@dev-compute-04 /]$ whoami
nova
(nova-compute)[nova@dev-compute-04 /]$ nvme show-hostnqn
nqn.2014-08.org.nvmexpress:uuid:fd43ebf6-4548-4089-8f19-d01704c95447(nova-compute)[nova@dev-compute-04 /]$
(nova-compute)[nova@dev-compute-04 /]$ nvme gen-hostnqn
nqn.2014-08.org.nvmexpress:uuid:1f0b3d9c-9c52-4782-9498-03e81bfef29e
(nova-compute)[nova@dev-compute-04 /]$ nvme --version
nvme version 1.16

# When the /etc/nvme/hostnqn file exists and the user is root:
(nova-compute)[root@dev-compute-04 /]# whoami
root
(nova-compute)[root@dev-compute-04 /]# nvme show-hostnqn
nqn.2014-08.org.nvmexpress:uuid:fd43ebf6-4548-4089-8f19-d01704c95447(nova-compute)[root@dev-compute-04 /]#
(nova-compute)[root@dev-compute-04 /]# nvme gen-hostnqn
nqn.2014-08.org.nvmexpress:uuid:48f41494-5384-4ee9-aad4-2b3c53a249a8
(nova-compute)[root@dev-compute-04 /]# nvme --version
nvme version 1.16

# When the /etc/nvme/hostnqn file does not exist and the user is nova:
(nova-compute)[nova@dev-compute-04 /]$ whoami
nova
(nova-compute)[nova@dev-compute-04 /]$ nvme show-hostnqn
hostnqn is not available -- use nvme gen-hostnqn
(nova-compute)[nova@dev-compute-04 /]$ nvme gen-hostnqn
nqn.2014-08.org.nvmexpress:uuid:660977d6-5c06-4871-bbaf-b4a6a91f5044
(nova-compute)[nova@dev-compute-04 /]$ nvme --version
nvme version 1.16

# When the /etc/nvme/hostnqn file does not exist and the user is root:
(nova-compute)[root@dev-compute-04 /]# whoami
root
(nova-compute)[root@dev-compute-04 /]# nvme show-hostnqn
hostnqn is not available -- use nvme gen-hostnqn
(nova-compute)[root@dev-compute-04 /]# nvme gen-hostnqn
nqn.2014-08.org.nvmexpress:uuid:c2e73770-8012-45cc-b0d7-b8d3e9a5e76c
(nova-compute)[root@dev-compute-04 /]# nvme --version
nvme version 1.16
(nova-compute)[root@dev-compute-04 /]#

Revision history for this message
Gorka Eguileor (gorka) wrote :

Looks like in nvme v1.16 the `show-hostnqn` command only reads the file [1] and does not look for other system info that would be persistent across reboots.

Also, the libnvme version you have installed in your system seems to have a bug, because the `sudo nvme gen-hostnqn` command should always return the same value based on the contents of `/sys/class/dmi/id/product_uuid`.

[1]: https://github.com/linux-nvme/nvme-cli/blob/deee9cae1ac94760deebd71f8e5449061338666c/nvme.c#L6553
[2]: https://github.com/linux-nvme/libnvme/blob/18b3316c502f229a5339d2c283510ec8008ec3b9/src/nvme/fabrics.c#L1409
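The stable behaviour described above can be illustrated with a short sketch (hedged: this is not libnvme's actual code, and `stable_hostnqn` is a hypothetical name; it only mirrors the idea that an nqn derived from the firmware-provided DMI product UUID is the same on every call):

```python
import tempfile
from pathlib import Path

def stable_hostnqn(uuid_path='/sys/class/dmi/id/product_uuid'):
    """Derive a host nqn from the machine's DMI product UUID.

    Because the UUID comes from firmware, the same machine should
    always yield the same nqn, across calls and across reboots.
    """
    uuid = Path(uuid_path).read_text().strip().lower()
    return f'nqn.2014-08.org.nvmexpress:uuid:{uuid}'

# Demonstrate determinism with a stand-in UUID file.
with tempfile.NamedTemporaryFile('w', delete=False, suffix='.uuid') as f:
    f.write('FD43EBF6-4548-4089-8F19-D01704C95447\n')

print(stable_hostnqn(f.name))
print(stable_hostnqn(f.name))  # same value both times
```

The `gen-hostnqn` outputs in the logs above change on every invocation, which is why this looks like a libnvme/nvme-cli bug rather than an os-brick one.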

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-brick (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/os-brick/+/895202

Changed in os-brick:
status: New → In Progress
Revision history for this message
Yusuf Güngör (yusuf2) wrote :

Hi Gorka, thank you for the fix.

It seems that nvme-cli 1.9 has the same bug too. It may be OK with 2.3 or 2.5:

root@dev-compute-04:~# nvme gen-hostnqn
nqn.2014-08.org.nvmexpress:uuid:cae16cbd-ef30-4f5b-8e1d-79ecd663dc73
root@dev-compute-04:~# nvme gen-hostnqn
nqn.2014-08.org.nvmexpress:uuid:89ddfe79-e569-4387-8376-3d7ba66fe09e
root@dev-compute-04:~# nvme gen-hostnqn
nqn.2014-08.org.nvmexpress:uuid:eaaf6b81-c647-4ea6-8526-d8da6585fa1f
root@dev-compute-04:~# nvme --version
nvme version 1.9
root@dev-compute-04:~#

https://packages.ubuntu.com/search?keywords=nvme-cli
