Activity log for bug #1943863

Date Who What changed Old value New value Message
2021-09-16 18:41:32 Vladimir Grevtsev bug added bug
2021-09-16 18:41:46 Vladimir Grevtsev bug added subscriber Canonical Field Critical
2021-09-16 18:54:20 Vladimir Grevtsev description

Old value:

== Env
focal/ussuri + ovn, latest stable charms
juju status: https://paste.ubuntu.com/p/2725tV47ym/

== Problem description
DPDK instance can't be launched after the fresh deployment (focal/ussuri + OVN, latest stable charms), raising a below error:

$ os server show dpdk-test-instance -f yaml
OS-DCF:diskConfig: MANUAL
OS-EXT-AZ:availability_zone: ''
OS-EXT-SRV-ATTR:host: null
OS-EXT-SRV-ATTR:hypervisor_hostname: null
OS-EXT-SRV-ATTR:instance_name: instance-00000218
OS-EXT-STS:power_state: NOSTATE
OS-EXT-STS:task_state: null
OS-EXT-STS:vm_state: error
OS-SRV-USG:launched_at: null
OS-SRV-USG:terminated_at: null
accessIPv4: ''
accessIPv6: ''
addresses: ''
config_drive: 'True'
created: '2021-09-15T18:51:00Z'
fault:
  code: 500
  created: '2021-09-15T18:52:01Z'
  details: "Traceback (most recent call last):\n File \"/usr/lib/python3/dist-packages/nova/conductor/manager.py\"\
    , line 651, in build_instances\n scheduler_utils.populate_retry(\n File \"\
    /usr/lib/python3/dist-packages/nova/scheduler/utils.py\", line 919, in populate_retry\n\
    \ raise exception.MaxRetriesExceeded(reason=msg)\nnova.exception.MaxRetriesExceeded:\
    \ Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance\
    \ 1bb2d1b7-e2e9-4d76-a346-a9b06ff22c73. Last exception: internal error: process\
    \ exited while connecting to monitor: 2021-09-15T18:51:53.485265Z qemu-system-x86_64:\
    \ -chardev socket,id=charnet0,path=/run/libvirt-vhost-user/vhu3ba44fdc-7c,server:\
    \ Failed to bind socket to /run/libvirt-vhost-user/vhu3ba44fdc-7c: No such file\
    \ or directory\n"
  message: 'Exceeded maximum number of retries. Exceeded max scheduling attempts 3
    for instance 1bb2d1b7-e2e9-4d76-a346-a9b06ff22c73. Last exception: internal error:
    process exited while connecting to monitor: 2021-09-15T18:51:53.485265Z qemu-system-x86_64:
    -chardev '
flavor: m1.medium.project.dpdk (4f452aa3-2b2c-4f2e-8465-5e3c2d8ec3f1)
hostId: ''
id: 1bb2d1b7-e2e9-4d76-a346-a9b06ff22c73
image: auto-sync/ubuntu-bionic-18.04-amd64-server-20210907-disk1.img (3851450e-e73d-489b-a356-33650690ed7a)
key_name: ubuntu-keypair
name: dpdk-test-instance
project_id: cdade870811447a89e2f0199373a0d95
properties: ''
status: ERROR
updated: '2021-09-15T18:52:01Z'
user_id: 13a0e7862c6641eeaaebbde1ae096f9e
volumes_attached: ''

For the record, a "generic" instances (e.g non-DPDK/non-SRIOV) are scheduling/starting without any issues.

== Steps to reproduce
openstack network create --external --provider-network-type vlan --provider-segment xxx --provider-physical-network dpdkfabric ext_net_dpdk
openstack subnet create --allocation-pool start=<redacted>,end=<redacted> --network ext_net_dpdk --subnet-range <redacted>/23 --gateway <redacted> --no-dhcp ext_net_dpdk_subnet
openstack aggregate create --zone nova dpdk
openstack aggregate set --property dpdk=true dpdk
openstack aggregate add host dpdk <fqdn>
openstack aggregate show dpdk --max-width=80
openstack flavor set --property aggregate_instance_extra_specs:dpdk=true --property hw:mem_page_size=large m1.medium.dpdk
openstack server create --config-drive true --network ext_net_dpdk --key-name ubuntu-keypair --image focal --flavor m1.medium.dpdk dpdk-test-instance

== Analysis
[before redeployment] nova-compute log : https://pastebin.canonical.com/p/FgPYNb3bPj/
[fresh deployment] juju crashdump: https://drive.google.com/file/d/1W_w3CAUq4ggp4alDnpCk08mSaCL6Uaxk/view?usp=sharing

<on hypervisor>
# ovs-vsctl get open_vswitch . other_config
{dpdk-extra="--pci-whitelist 0000:3e:00.0 --pci-whitelist 0000:40:00.0", dpdk-init="true", dpdk-lcore-mask="0x1000001", dpdk-socket-mem="4096,4096"}

# cat /etc/tmpfiles.d/nova-ovs-vhost-user.conf
# Create libvirt writeable directory for vhost-user sockets
d /run/libvirt-vhost-user 0770 libvirt-qemu kvm - -

In fact, none of the compute hosts have that file: https://paste.ubuntu.com/p/XJRFypbMQf/ (however, the error from this issue doesn't appear on non-DPDK hosts).

After doing the below command, that missing /run/... file has appeared and VM could have been scheduled and started. However, although it have been started, it wasn't reachable over the network.

# systemd-tmpfiles --create
# stat /run/libvirt-vhost-user
  File: /run/libvirt-vhost-user
  Size: 40 Blocks: 0 IO Block: 4096 directory

New value:

== Env
focal/ussuri + ovn, latest stable charms
juju status: https://paste.ubuntu.com/p/2725tV47ym/
Hardware: Huawei CH121 V5 with MZ532,4*25GE Mezzanine Card,PCIE 3.0 X16 NICs + manually installed PMD for DPDK enablement (librte-pmd-hinic20.0 package)

== Problem description
DPDK instance can't be launched after the fresh deployment (focal/ussuri + OVN, latest stable charms), raising a below error:

$ os server show dpdk-test-instance -f yaml
OS-DCF:diskConfig: MANUAL
OS-EXT-AZ:availability_zone: ''
OS-EXT-SRV-ATTR:host: null
OS-EXT-SRV-ATTR:hypervisor_hostname: null
OS-EXT-SRV-ATTR:instance_name: instance-00000218
OS-EXT-STS:power_state: NOSTATE
OS-EXT-STS:task_state: null
OS-EXT-STS:vm_state: error
OS-SRV-USG:launched_at: null
OS-SRV-USG:terminated_at: null
accessIPv4: ''
accessIPv6: ''
addresses: ''
config_drive: 'True'
created: '2021-09-15T18:51:00Z'
fault:
  code: 500
  created: '2021-09-15T18:52:01Z'
  details: "Traceback (most recent call last):\n File \"/usr/lib/python3/dist-packages/nova/conductor/manager.py\"\
    , line 651, in build_instances\n scheduler_utils.populate_retry(\n File \"\
    /usr/lib/python3/dist-packages/nova/scheduler/utils.py\", line 919, in populate_retry\n\
    \ raise exception.MaxRetriesExceeded(reason=msg)\nnova.exception.MaxRetriesExceeded:\
    \ Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance\
    \ 1bb2d1b7-e2e9-4d76-a346-a9b06ff22c73. Last exception: internal error: process\
    \ exited while connecting to monitor: 2021-09-15T18:51:53.485265Z qemu-system-x86_64:\
    \ -chardev socket,id=charnet0,path=/run/libvirt-vhost-user/vhu3ba44fdc-7c,server:\
    \ Failed to bind socket to /run/libvirt-vhost-user/vhu3ba44fdc-7c: No such file\
    \ or directory\n"
  message: 'Exceeded maximum number of retries. Exceeded max scheduling attempts 3
    for instance 1bb2d1b7-e2e9-4d76-a346-a9b06ff22c73. Last exception: internal error:
    process exited while connecting to monitor: 2021-09-15T18:51:53.485265Z qemu-system-x86_64:
    -chardev '
flavor: m1.medium.project.dpdk (4f452aa3-2b2c-4f2e-8465-5e3c2d8ec3f1)
hostId: ''
id: 1bb2d1b7-e2e9-4d76-a346-a9b06ff22c73
image: auto-sync/ubuntu-bionic-18.04-amd64-server-20210907-disk1.img (3851450e-e73d-489b-a356-33650690ed7a)
key_name: ubuntu-keypair
name: dpdk-test-instance
project_id: cdade870811447a89e2f0199373a0d95
properties: ''
status: ERROR
updated: '2021-09-15T18:52:01Z'
user_id: 13a0e7862c6641eeaaebbde1ae096f9e
volumes_attached: ''

For the record, a "generic" instances (e.g non-DPDK/non-SRIOV) are scheduling/starting without any issues.

== Steps to reproduce
openstack network create --external --provider-network-type vlan --provider-segment xxx --provider-physical-network dpdkfabric ext_net_dpdk
openstack subnet create --allocation-pool start=<redacted>,end=<redacted> --network ext_net_dpdk --subnet-range <redacted>/23 --gateway <redacted> --no-dhcp ext_net_dpdk_subnet
openstack aggregate create --zone nova dpdk
openstack aggregate set --property dpdk=true dpdk
openstack aggregate add host dpdk <fqdn>
openstack aggregate show dpdk --max-width=80
openstack flavor set --property aggregate_instance_extra_specs:dpdk=true --property hw:mem_page_size=large m1.medium.dpdk
openstack server create --config-drive true --network ext_net_dpdk --key-name ubuntu-keypair --image focal --flavor m1.medium.dpdk dpdk-test-instance

== Analysis
[before redeployment] nova-compute log : https://pastebin.canonical.com/p/FgPYNb3bPj/
[fresh deployment] juju crashdump: https://drive.google.com/file/d/1W_w3CAUq4ggp4alDnpCk08mSaCL6Uaxk/view?usp=sharing

<on hypervisor>
# ovs-vsctl get open_vswitch . other_config
{dpdk-extra="--pci-whitelist 0000:3e:00.0 --pci-whitelist 0000:40:00.0", dpdk-init="true", dpdk-lcore-mask="0x1000001", dpdk-socket-mem="4096,4096"}

# cat /etc/tmpfiles.d/nova-ovs-vhost-user.conf
# Create libvirt writeable directory for vhost-user sockets
d /run/libvirt-vhost-user 0770 libvirt-qemu kvm - -

In fact, none of the compute hosts have that file: https://paste.ubuntu.com/p/XJRFypbMQf/ (however, the error from this issue doesn't appear on non-DPDK hosts).

After doing the below command, that missing /run/... file has appeared and VM could have been scheduled and started. However, although it have been started, it wasn't reachable over the network.

# systemd-tmpfiles --create
# stat /run/libvirt-vhost-user
  File: /run/libvirt-vhost-user
  Size: 40 Blocks: 0 IO Block: 4096 directory

2021-09-17 16:41:49 Nobuto Murata bug added subscriber Nobuto Murata
2021-09-21 14:24:46 Vladimir Grevtsev removed subscriber Canonical Field Critical
2021-09-21 14:24:51 Vladimir Grevtsev bug added subscriber Canonical Field High
2021-09-21 16:31:59 Vladimir Grevtsev charm-nova-compute: status New Invalid
2021-09-22 11:09:28 Liam Young bug task added neutron (Ubuntu)
2021-09-22 11:09:48 Liam Young neutron (Ubuntu): status New Invalid
2021-09-22 13:22:28 Liam Young bug task added neutron
2021-09-22 13:22:48 Liam Young bug task deleted neutron
2021-09-22 13:22:58 Liam Young bug task deleted neutron (Ubuntu)
2021-09-22 13:23:34 Liam Young bug task added charm-layer-ovn
2021-09-22 13:24:02 Liam Young charm-layer-ovn: status New Confirmed
2021-09-22 13:24:04 Liam Young charm-layer-ovn: importance Undecided High
2021-09-22 13:24:09 Liam Young charm-layer-ovn: assignee Liam Young (gnuoy)
2021-09-25 03:34:16 Nobuto Murata bug task added charm-ovn-chassis
2021-09-25 03:34:51 Nobuto Murata charm-layer-ovn: status Confirmed Fix Committed
2021-10-11 15:18:15 Alex Kavanagh charm-layer-ovn: milestone 21.10
2021-10-18 22:54:10 Billy Olsen charm-ovn-chassis: status New Fix Committed
2021-10-18 22:54:13 Billy Olsen charm-ovn-chassis: importance Undecided High
2021-10-18 22:54:18 Billy Olsen charm-ovn-chassis: milestone 21.10
2021-10-22 13:24:42 Alex Kavanagh charm-layer-ovn: status Fix Committed Fix Released
2021-10-22 13:24:44 Alex Kavanagh charm-ovn-chassis: status Fix Committed Fix Released