centos 6.5 C610 chipset cloud-init-nonet waiting for network device

Bug #1460721 reported by Baboune
This bug affects 1 person
Affects             Status     Importance  Assigned to        Milestone
Fuel for OpenStack  Won't Fix  Medium      Alexei Sheplyakov
6.0.x               Won't Fix  Medium      MOS Maintenance

Bug Description

Fuel 6.1, Juno, CENTOS, neutron VLAN, HA.

The cluster has 32 nodes. Of these, 16 servers are fully operational: they can instantiate VMs, network devices, etc.

The other 16 nodes answer calls from OpenStack (neutron, blocks, launch VM, etc.), but when a VM is launched on any of them, an error occurs:
  "cloud-init-nonet waiting 120 seconds for a network device.
   cloud-init-nonet gave up waiting for a network."

It seems to be running Cloud-init v. 0.7.5.

The 16 failing machines have different hardware: the chipset is C610, the NIC is an Intel X520, the CPU is a Xeon E5-2609 (Haswell), and storage uses onboard SATA.

Are there any known issues with the CentOS kernel? Any updates in the last 2-3 weeks in the Fuel CentOS packaged kernels or images?

Also, any suggestions on where to look for potential kernel or virtualization problems?

OSTF tests are green.

fuel --fuel-version
DEPRECATION WARNING: /etc/fuel/client/config.yaml exists and will be used as the source for settings. This behavior is deprecated. Please specify the path to your custom settings file in the FUELCLIENT_CUSTOM_SETTINGS environment variable.
api: '1.0'
astute_sha: 795f8a045400fe82ccc30ae018e85324b3fa1de5
auth_required: true
build_id: 2015-05-21_05-40-59
build_number: '247'
feature_groups:
- experimental
fuel-library_sha: a03efb582b06bfe8d9776dce244d4a2f2e2ba886
fuel-ostf_sha: 3dd25a018f2a5c47ec6c885436b3ba69690ef1b9
fuelmain_sha: 5c8ebddf64ea93000af2de3ccdb4aa8bb766ce93
nailgun_sha: 403c6b7ea3c62bb4fda27eb9cedee37f7144558c
openstack_version: 2014.2.2-6.1
production: docker
python-fuelclient_sha: e19f1b65792f84c4a18b5a9473f85ef3ba172fce
release: '6.1'
release_versions:
  2014.2.2-6.1:
    VERSION:
      api: '1.0'
      astute_sha: 795f8a045400fe82ccc30ae018e85324b3fa1de5
      build_id: 2015-05-21_05-40-59
      build_number: '247'
      feature_groups:
      - experimental
      fuel-library_sha: a03efb582b06bfe8d9776dce244d4a2f2e2ba886
      fuel-ostf_sha: 3dd25a018f2a5c47ec6c885436b3ba69690ef1b9
      fuelmain_sha: 5c8ebddf64ea93000af2de3ccdb4aa8bb766ce93
      nailgun_sha: 403c6b7ea3c62bb4fda27eb9cedee37f7144558c
      openstack_version: 2014.2.2-6.1
      production: docker
      python-fuelclient_sha: e19f1b65792f84c4a18b5a9473f85ef3ba172fce
      release: '6.1'

Tags: centos kernel
Baboune (seyvet) wrote :

Adding some info: the server is up and running CentOS as a compute node, and all OpenStack services are operational.

When a VM is instantiated on this server, the VM fails to provision a "virtual" network device.

The error is the same whichever image is used for the VM (TestVM or the latest Ubuntu Trusty cloud image).

Changed in fuel:
milestone: none → 6.1
assignee: nobody → MOS Linux (mos-linux)
Albert Syriy (asyriy) wrote : Re: [Bug 1460721] [NEW] centos 6.5 C610 chipset cloud-init-nonet waiting for network device

Hello,

Two drivers have been updated in CentOS, megaraid_sas and hpsa,
but both are related to RAID controllers.
Nothing was updated on the network side.

The CLI command
lspci -vv -nn -t
shows what hardware was detected on the servers, and
lspci -vv -nn -k
shows the driver in use.

Please run these commands on the "stuck" servers.
Regards,
Albert

Albert Syriy,

Software Engineer,
Mirantis
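
To narrow the output of the driver listing down to the NICs, something like the following sketch could be used. It assumes the usual `lspci -k` layout (a device line at column 0 followed by indented "Kernel driver in use:" lines); the helper name nic_drivers is only illustrative and not part of any tool.

```shell
#!/bin/sh
# Sketch: print each Ethernet controller and the kernel driver bound to it.
# Assumes the usual `lspci -k` layout; nic_drivers is a hypothetical helper.
nic_drivers() {
    awk '
        /^[0-9a-f]/ {                      # a new device line starts at column 0
            is_nic = ($0 ~ /Ethernet controller/)
            if (is_nic) dev = $0
        }
        is_nic && /Kernel driver in use:/ {
            sub(/^.*Kernel driver in use:[ \t]*/, "")
            print dev " -> " $0
        }
    '
}

# Usage on an affected node (reads lspci output from stdin):
#   lspci -nn -k | nic_drivers
```

Comparing this output between a working node and a failing one would show quickly whether the two hardware types bind different NIC drivers.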

Alexei Sheplyakov (asheplyakov) wrote :

> cpu is Xeon e5-2609 haswell

This might be the futex_wait bug (https://groups.google.com/forum/#!topic/mechanical-sympathy/QbmpZxp6C64).
It was fixed in the vanilla kernel in October 2014 (https://github.com/torvalds/linux/commit/76835b0ebf8a7fe85beb03c75121419a7dec52f0) and in the CentOS/RHEL 6.6 kernel 2.6.32-504.16.2 (which is unfortunately not included in MOS 6.1).
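
A quick way to check whether a node's el6 kernel is at or above the release mentioned above is sketched below. The threshold 2.6.32-504.16.2 is taken from this comment and not independently verified; the function name is illustrative, GNU `sort -V` is assumed to be available, and the comparison is only meaningful for 2.6.32 el6 kernels.

```shell
#!/bin/sh
# Threshold taken from the comment above; treat it as an assumption.
FIXED_RELEASE="2.6.32-504.16.2"

# Return 0 if the given el6 kernel release is at or above FIXED_RELEASE.
# kernel_at_least_fixed is a hypothetical helper name.
kernel_at_least_fixed() {
    # Drop the distro suffix, e.g. ".el6.x86_64" or ".el6.mos61.x86_64".
    release=${1%%.el6*}
    # With version sort, the lower of the two releases comes first.
    lowest=$(printf '%s\n%s\n' "$FIXED_RELEASE" "$release" | sort -V | head -n 1)
    [ "$lowest" = "$FIXED_RELEASE" ]
}

# Usage on a compute node:
#   kernel_at_least_fixed "$(uname -r)" && echo "at or above $FIXED_RELEASE"
```

This only compares version strings; it says nothing about whether a vendor has backported the patch into an older release.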

Baboune (seyvet) wrote :

Great lead. What would you recommend as the best way to upgrade the kernel?

Baboune (seyvet) wrote :

Additional info from one of the affected computes:

[root@node-64 ~]# lspci -vv -nn -t
-+-[0000:ff]-+-08.0 Intel Corporation Haswell-E QPI Link 0 [8086:2f80]
 | +-08.2 Intel Corporation Haswell-E QPI Link 0 [8086:2f32]
 | +-08.3 Intel Corporation Haswell-E QPI Link 0 [8086:2f83]
 | +-09.0 Intel Corporation Haswell-E QPI Link 1 [8086:2f90]
 | +-09.2 Intel Corporation Haswell-E QPI Link 1 [8086:2f33]
 | +-09.3 Intel Corporation Haswell-E QPI Link 1 [8086:2f93]
 | +-0b.0 Intel Corporation Haswell-E R3 QPI Link 0 & 1 Monitoring [8086:2f81]
 | +-0b.1 Intel Corporation Haswell-E R3 QPI Link 0 & 1 Monitoring [8086:2f36]
 | +-0b.2 Intel Corporation Haswell-E R3 QPI Link 0 & 1 Monitoring [8086:2f37]
 | +-0c.0 Intel Corporation Haswell-E Unicast Registers [8086:2fe0]
 | +-0c.1 Intel Corporation Haswell-E Unicast Registers [8086:2fe1]
 | +-0c.2 Intel Corporation Haswell-E Unicast Registers [8086:2fe2]
 | +-0c.3 Intel Corporation Haswell-E Unicast Registers [8086:2fe3]
 | +-0c.4 Intel Corporation Haswell-E Unicast Registers [8086:2fe4]
 | +-0c.5 Intel Corporation Haswell-E Unicast Registers [8086:2fe5]
 | +-0f.0 Intel Corporation Haswell-E Buffered Ring Agent [8086:2ff8]
 | +-0f.1 Intel Corporation Haswell-E Buffered Ring Agent [8086:2ff9]
 | +-0f.2 Intel Corporation Haswell-E Buffered Ring Agent [8086:2ffa]
 | +-0f.3 Intel Corporation Haswell-E Buffered Ring Agent [8086:2ffb]
 | +-0f.4 Intel Corporation Haswell-E System Address Decoder & Broadcast Registers [8086:2ffc]
 | +-0f.5 Intel Corporation Haswell-E System Address Decoder & Broadcast Registers [8086:2ffd]
 | +-0f.6 Intel Corporation Haswell-E System Address Decoder & Broadcast Registers [8086:2ffe]
 | +-10.0 Intel Corporation Haswell-E PCIe Ring Interface [8086:2f1d]
 | +-10.1 Intel Corporation Haswell-E PCIe Ring Interface [8086:2f34]
 | +-10.5 Intel Corporation Haswell-E Scratchpad & Semaphore Registers [8086:2f1e]
 | +-10.6 Intel Corporation Haswell-E Scratchpad & Semaphore Registers [8086:2f7d]
 | +-10.7 Intel Corporation Haswell-E Scratchpad & Semaphore Registers [8086:2f1f]
 | +-12.0 Intel Corporation Haswell-E Home Agent 0 [8086:2fa0]
 | +-12.1 Intel Corporation Haswell-E Home Agent 0 [8086:2f30]
 | +-12.2 Intel Corporation Haswell-E Home Agent 0 Debug [8086:2f70]
 | +-12.4 Intel Corporation Haswell-E Home Agent 1 [8086:2f60]
 | +-12.5 Intel Corporation Haswell-E Home Agent 1 [8086:2f38]
 | +-12.6 Intel Corporation Haswell-E Home Agent 1 Debug [8086:2f78]
 | +-13.0 Intel Corporation Haswell-E Integrated Memory Controller 0 Target Address, Thermal & RAS Registers [8086:2fa8]
 | +-13.1 Intel Corporation Haswell-E Integrated Memory Controller 0 Target Address, Thermal & RAS Registers [8086:2f71]
 | +-13.2 Intel Corporation Haswell-E Integrated Memory Controller 0 Channel Target Address Decoder [8086:2faa]
 | ...

Baboune (seyvet) wrote :

Hi,
some additional logs obtained when launching a new VM on that compute:
- VM logs: http://pastebin.com/bHHwPmsM

- syslog:
<8>Jun 1 21:36:33 node-64 ceph-osd: 2015-06-01 21:36:33.679976 7fe228971700 0 -- 192.168.41.20:6800/2851 >> 192.168.41.17:6800/3050 pipe(0x42d4100 sd=64 :6800 s=2 pgs=1800 cs=17 l=0 c=0x3935ac0).fault with nothing to send, going to standby
<3>Jun 2 07:33:58 node-64 kernel: kvm: 61570: cpu0 unhandled rdmsr: 0x345

- libvirt.log:
2015-06-02 00:00:07.982+0000: 3222: error : virNetSocketReadWire:1453 : End of file while reading data: Input/output error
2015-06-02 04:00:07.790+0000: 3222: error : virNetSocketReadWire:1453 : End of file while reading data: Input/output error
2015-06-02 07:34:02.074+0000: 3223: warning : qemuOpenVhostNet:508 : Unable to open vhost-net. Opened so far 0, requested 1
2015-06-02 07:34:02.075+0000: 3223: warning : qemuDomainObjTaint:1658 : Domain id=5 name='instance-00000089' uuid=1cbfc83c-65fc-4fbd-9c62-7cfdf7d7c042 is tainted: high-privileges

- qemu/instance-00000089.log
2015-06-02 07:34:02.074+0000: starting up
LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -name instance-00000089 -S -machine pc-i440fx-2.0,accel=kvm,usb=off -cpu SandyBridge,+invpcid,+erms,+bmi2,+smep,+avx2,+bmi1,+fsgsbase,+abm,+pdpe1gb,+rdrand,+f16c,+osxsave,+movbe,+dca,+pcid,+pdcm,+xtpr,+fma,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme -m 2048 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 1cbfc83c-65fc-4fbd-9c62-7cfdf7d7c042 -smbios type=1,manufacturer=Red Hat Inc.,product=OpenStack Nova,version=2014.2.2-fuel6.1.mira23,serial=78009c06-71ad-4650-9994-ac9ced57817a,uuid=1cbfc83c-65fc-4fbd-9c62-7cfdf7d7c042 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-00000089.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/1cbfc83c-65fc-4fbd-9c62-7cfdf7d7c042/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=25,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:48:c0:ca,bus=pci.0,addr=0x3 -chardev file,id=charserial0,path=/var/lib/nova/instances/1cbfc83c-65fc-4fbd-9c62-7cfdf7d7c042/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 -vnc 0.0.0.0:0 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
Domain id=5 is tainted: high-privileges
char device redirected to /dev/pts/1 (label charserial1)

tags: removed: l23network
Pavel Boldin (pboldin) wrote :

Could you please update the kernel from the CentOS repository and check if that indeed fixes the bug?

Baboune (seyvet) wrote :

$ yum update kernel
Loaded plugins: fastestmirror, priorities
Loading mirror speeds from cached hostfile
Setting up Update Process
No Packages marked for Update

It requires moving to 6.6.z.

What is the best way to update the kernel?

Baboune (seyvet) wrote :

Updating the kernel allowed the VM to get further into its boot process.

$ uname -a
Linux node-64.domain.tld 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

See logs http://pastebin.com/CZv5TUAf

Relevant section:
 Starting configure network device [ OK ]
 * Starting Mount network filesystems [ OK ]
 * Stopping Mount network filesystems [ OK ]
 * Starting configure network device [ OK ]
cloud-init-nonet[8.99]: static networking is now up
Cloud-init v. 0.7.5 running 'init' at Tue, 02 Jun 2015 11:46:10 +0000. Up 10.58 seconds.
ci-info: ++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++
ci-info: +--------+------+-------------+---------------+-------------------+
ci-info: | Device |  Up  |   Address   |     Mask      |    Hw-Address     |
ci-info: +--------+------+-------------+---------------+-------------------+
ci-info: |   lo   | True |  127.0.0.1  |   255.0.0.0   |         .         |
ci-info: |  eth0  | True | 20.20.20.63 | 255.255.255.0 | fa:16:3e:fd:4d:39 |
ci-info: +--------+------+-------------+---------------+-------------------+
ci-info: +++++++++++++++++++++++++++++++Route info+++++++++++++++++++++++++++++++
ci-info: +-------+-------------+------------+---------------+-----------+-------+
ci-info: | Route | Destination |  Gateway   |    Genmask    | Interface | Flags |
ci-info: +-------+-------------+------------+---------------+-----------+-------+
ci-info: |   0   |   0.0.0.0   | 20.20.20.1 |    0.0.0.0    |   eth0    |  UG   |
ci-info: |   1   | 20.20.20.0  |  0.0.0.0   | 255.255.255.0 |   eth0    |   U   |
ci-info: +-------+-------------+------------+---------------+-----------+-------+
 * Stopping cold plug devices [ OK ]
 * Stopping log initial device creation [ OK ]
 * Starting enable remaining boot-time encrypted block devices [ OK ]
2015-06-02 11:49:28,211 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [2/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by <class 'socket.error'>: [Errno 113] No route to host)]

So the VM now gets past the cloud-init-nonet stage but then fails with another network error.

Baboune (seyvet) wrote :

The previous log was from an ubuntu-trusty image.

With the cirros TestVM, boot fails in the "Discover..." phase:
  - See http://pastebin.com/RzG1HXcX

udhcpc (v1.20.1) started
udhcpc (v1.20.1) started
Sending discover...
Sending discover...
Sending discover...
No lease, failing
WARN: /etc/rc3.d/S40-network failed
cirros-ds 'net' up at 181.58
checking http://169.254.169.254/2009-04-04/instance-id
failed 1/20: up 181.60. request failed
failed 2/20: up 183.78. request failed
failed 3/20: up 185.79. request failed
failed 4/20: up 187.80. request failed
failed 5/20: up 189.80. request failed
failed 6/20: up 191.81. request failed

Fuel Devops McRobotson (fuel-devops-robot) wrote : Related fix proposed to packages/centos6/kernel (6.1)

Related fix proposed to branch: 6.1
Change author: Alexei Sheplyakov <email address hidden>
Review: https://review.fuel-infra.org/7303

Alexei Sheplyakov (asheplyakov) wrote :

Upgrading just the kernel is not going to work (the openvswitch kernel module has to be rebuilt for the corresponding kernel).
I'll build a custom ISO with the patched kernel and a proper openvswitch module.

Baboune (seyvet) wrote : Re: [Bug 1460721] Re: centos 6.5 C610 chipset cloud-init-nonet waiting for network device

Great! Thanks a lot.

Fuel Devops McRobotson (fuel-devops-robot) wrote : Related fix proposed to packages/centos6/kmod-openvswitch (6.1)

Related fix proposed to branch: 6.1
Change author: Alexei Sheplyakov <email address hidden>
Review: https://review.fuel-infra.org/7318

Alexei Sheplyakov (asheplyakov) wrote :

Apparently building a custom ISO on fuel-infra CI is not so easy. I'll try to build one locally on my workstation.
Meanwhile could you please try installing the kernel and kmod-openvswitch from this repo: http://osci-obs.vm.mirantis.net:82/centos-fuel-6.1-stable-LP1460721/centos

Baboune (seyvet) wrote :

Hi,

We tried with the RPMs. No problem installing those.

The behavior observed is confusing, though.

1) We used a CentOS minimal image for the first VM. Everything worked. The VM booted with no errors, and it could reach out and be reached, even via floating IP. This was done on one slave, node-64.
2) We did the same with a CentOS minimal image on another slave, node-65. Everything worked there too: the VM booted with no errors, and it could reach out and be reached, even via floating IP.
3) Launched a trusty cloud image on node-64. The VM seems to boot without errors but stays a long time on "Starting enable remaining boot-time encrypted block devices [ OK ]", then it appears to start. It is however unreachable, and it actually blocks the VM launched in 1) from accessing the network.
4) Launched a trusty cloud image on node-65. The VM seems to boot without errors but stays a long time on "Starting enable remaining boot-time encrypted block devices [ OK ]", then it appears to start. It is however unreachable, and it actually blocks the VM launched in 2) from accessing the network.
5) Launched a TestVM on node-64. The boot shows the usual DHCP discover errors, then problems accessing the network.

So it seems that things initially work as long as we do not use a cloud-init-enabled image. Once we do, networking stops working for all VMs on the host.

Any ideas?

Any logs that could help?

Baboune (seyvet) wrote :

How could I try the Fedora LT kernel?
Would that help?

yum install kernel-lt kernel-lt-devel kernel-lt-headers
Loaded plugins: fastestmirror, priorities
Loading mirror speeds from cached hostfile
Setting up Install Process
Package kernel-lt-3.10.55-1.mira6.x86_64 already installed and latest version
Package kernel-lt-devel-3.10.55-1.mira6.x86_64 already installed and latest version
Resolving Dependencies
--> Running transaction check
---> Package kernel-lt-headers.x86_64 0:3.10.55-1.mira6 will be installed
--> Processing Conflict: kernel-lt-headers-3.10.55-1.mira6.x86_64 conflicts kernel-headers < 3.10.55-1.mira6
--> Restarting Dependency Resolution with new changes.
--> Running transaction check
---> Package kernel-headers.x86_64 0:2.6.32-504.1.3.el6 will be updated
---> Package kernel-headers.x86_64 0:2.6.32-504.1.3.el6.mos61 will be an update
--> Processing Conflict: kernel-lt-headers-3.10.55-1.mira6.x86_64 conflicts kernel-headers < 3.10.55-1.mira6
--> Finished Dependency Resolution
Error: kernel-lt-headers conflicts with kernel-headers-2.6.32-504.1.3.el6.mos61.x86_64
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

Baboune (seyvet) wrote :

OK. I changed grub.conf on one of the servers running CentOS 6.5 to boot the Fedora LT kernel, as follows:

[root@node-62 ~]# cat /etc/grub.conf

default=0
timeout=5
title CentOS (3.10.55-1.mira6.x86_64)
    kernel /vmlinuz-3.10.55-1.mira6.x86_64 console=ttyS0,9600 console=tty0 biosdevname=0 crashkernel=none rootdelay=90 nomodeset root=UUID=d0d25fb3-eb0e-4eb7-8533-f67babfae5d0 LANG=en_US.UTF-8 KEYTABLE=us
    initrd /initramfs-3.10.55-1.mira6.x86_64.img

title Default (vmlinuz-2.6.32-504.1.3.el6.x86_64)
    kernel /vmlinuz-2.6.32-504.1.3.el6.x86_64 console=ttyS0,9600 console=tty0 biosdevname=0 crashkernel=none rootdelay=90 nomodeset root=UUID=d0d25fb3-eb0e-4eb7-8533-f67babfae5d0
    initrd /initramfs-2.6.32-504.1.3.el6.x86_64.img

And I could instantiate multiple VMs on the server and connect to them. All networking is OK. Boot time is considerably shorter. All VM boot logs are normal.

Will do more tests but it looks promising.

Baboune (seyvet) wrote :

Tried on another server and it worked.

What I did:
1) Make the 3.10 kernel available in /etc/grub.conf:

$ yum reinstall kernel-lt

Otherwise only
  title Default (vmlinuz-2.6.32-504.1.3.el6.x86_64)
      kernel /vmlinuz-2.6.32-504.1.3.el6.x86_64 console=ttyS0,9600 console=tty0 biosdevname=0 crashkernel=none rootdelay=90 nomodeset root=UUID=d0d25fb3-eb0e-4eb7-8533-f67babfae5d0
      initrd /initramfs-2.6.32-504.1.3.el6.x86_64.img

is listed.

2) Resulting grub.conf:

[root@node-63 ~]# cat /etc/grub.conf

default=1
timeout=5
title CentOS (3.10.55-1.mira6.x86_64)
    kernel /vmlinuz-3.10.55-1.mira6.x86_64 console=ttyS0,9600 console=tty0 biosdevname=0 crashkernel=none rootdelay=90 nomodeset root=UUID=d0d25fb3-eb0e-4eb7-8533-f67babfae5d0 LANG=en_US.UTF-8 KEYTABLE=us
    initrd /initramfs-3.10.55-1.mira6.x86_64.img

title Default (vmlinuz-2.6.32-504.1.3.el6.x86_64)
    kernel /vmlinuz-2.6.32-504.1.3.el6.x86_64 console=ttyS0,9600 console=tty0 biosdevname=0 crashkernel=none rootdelay=90 nomodeset root=UUID=d0d25fb3-eb0e-4eb7-8533-f67babfae5d0
    initrd /initramfs-2.6.32-504.1.3.el6.x86_64.img

3) Then
$ vi /etc/grub.conf

and change "default=1" to "default=0" so that it points to 3.10.

4) $ reboot
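
Step 3 can also be scripted instead of editing the file by hand. The sketch below relies on legacy GRUB numbering its entries from 0 in the order their "title" lines appear; the function names are illustrative only. It writes through the existing file rather than replacing it, since /etc/grub.conf is normally a symlink on CentOS 6.

```shell
#!/bin/sh
# Sketch for legacy GRUB (grub.conf): entries are numbered from 0 in the order
# their "title" lines appear. Function names here are illustrative only.

# Print the index of the first title line containing the given text.
grub_entry_index() {
    awk -v pat="$2" '
        BEGIN { n = 0; found = 0 }
        /^[ \t]*title[ \t]/ {
            if (index($0, pat)) { print n; found = 1; exit }
            n++
        }
        END { if (!found) exit 1 }
    ' "$1"
}

# Rewrite the default= line to the given index, keeping a .bak copy.
set_grub_default() {
    cp "$1" "$1.bak" &&
    sed "s/^default=.*/default=$2/" "$1.bak" > "$1"
}

# Usage (as root), before rebooting:
#   idx=$(grub_entry_index /etc/grub.conf "3.10.55-1.mira6") &&
#       set_grub_default /etc/grub.conf "$idx"
```

Looking the index up by kernel version avoids hard-coding "0", which would be wrong if another kernel package later adds an entry above it.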

Changed in fuel:
status: New → Triaged
status: Triaged → New
Changed in fuel:
importance: Undecided → Medium
status: New → Confirmed
assignee: MOS Linux (mos-linux) → Alexei Sheplyakov (asheplyakov)
Baboune (seyvet) wrote :

One of the nodes is showing these logs:

hd: ovs|00023|odp_util(revalidator8)|ERR|Dropped 1 log messages in last 58 seconds (most recently, 58 seconds ago) due to excessive rate
<27>Jun 16 11:57:37 node-82 ovs-vswitchd: ovs|00024|odp_util(revalidator8)|ERR|mask expected for non-Ethernet II frame
<27>Jun 16 11:58:25 node-82 ovs-vswitchd: ovs|00008|odp_util(revalidator9)|ERR|mask expected for non-Ethernet II frame
<27>Jun 16 11:59:52 node-82 ovs-vswitchd: ovs|00025|odp_util(revalidator8)|ERR|Dropped 3 log messages in last 81 seconds (most recently, 62 seconds ago) due to excessive rate
<27>Jun 16 11:59:52 node-82 ovs-vswitchd: ovs|00026|odp_util(revalidator8)|ERR|mask expected for non-Ethernet II frame
<27>Jun 16 12:00:19 node-82 ovs-vswitchd: ovs|00027|odp_util(revalidator8)|ERR|mask expected for non-Ethernet II frame
<27>Jun 16 12:02:03 node-82 ovs-vswitchd: ovs|00028|odp_util(revalidator8)|ERR|Dropped 3 log messages in last 94 seconds (most recently, 80 seconds ago) due to excessive rate
<27>Jun 16 12:02:03 node-82 ovs-vswitchd: ovs|00029|odp_util(revalidator8)|ERR|mask expected for non-Ethernet II frame
<2>Jun 16 12:02:35 node-82 kernel: CPU0: Package power limit notification (total events = 150885)
<27>Jun 16 12:02:36 node-82 ovs-vswitchd: ovs|00030|odp_util(revalidator8)|ERR|mask expected for non-Ethernet II frame
<27>Jun 16 12:03:01 node-82 ovs-vswitchd: ovs|00031|odp_util(revalidator8)|ERR|Dropped 1 log messages in last 25 seconds (most recently, 25 seconds ago) due to excessive rate
<27>Jun 16 12:03:01 node-82 ovs-vswitchd: ovs|00032|odp_util(revalidator8)|ERR|mask expected for non-Ethernet II frame
<27>Jun 16 12:04:02 node-82 ovs-vswitchd: ovs|00033|odp_util(revalidator8)|ERR|Dropped 3 log messages in last 59 seconds (most recently, 12 seconds ago) due to excessive rate
<27>Jun 16 12:04:02 node-82 ovs-vswitchd: ovs|00034|odp_util(revalidator8)|ERR|mask expected for non-Ethernet II frame

This seems to point to OVS.

After switching the kernel to vmlinuz-3.10.55-1.mira6.x86_64 as described above, should the following package be installed as well?
yum info kmod-openvswitch-lt
Loaded plugins: fastestmirror, priorities
Loading mirror speeds from cached hostfile
Available Packages
Name : kmod-openvswitch-lt
Arch : x86_64
Version : 2.3.1
Release : 1.mira2
Size : 1.2 M
Repo : mos
Summary : Open vSwitch kernel module
URL : http://openvswitch.org/
License : GPLv2
Description : This package provide Open vSwitch kernel module for kernel-lt 3.10.55-1.mira6

Are there other packages that need to be added?

Baboune (seyvet) wrote :

Additional info: here are the openvswitch packages that are installed on all nodes:
rpm -qa | grep vswit
openvswitch-2.3.1-1.mira1.x86_64
openstack-neutron-openvswitch-2014.2.2-fuel6.1.mira29.noarch
kmod-openvswitch-2.3.1-1.mira1.x86_64

This is in parallel with the /vmlinuz-3.10.55-1.mira6.x86_64 kernel.

Alexei Sheplyakov (asheplyakov) wrote :

> rpm -qa | grep vswit
> openvswitch-2.3.1-1.mira1.x86_64
> openstack-neutron-openvswitch-2014.2.2-fuel6.1.mira29.noarch
> kmod-openvswitch-2.3.1-1.mira1.x86_64
> This is in parallel with the /vmlinuz-3.10.55-1.mira6.x86_64 kernel.

This combination should not work (kmod-openvswitch-lt is required for kernel-lt)
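
The pairing described here can be checked on a node with a small helper like the sketch below. The mapping from kernel release to package name is an assumption based only on the packages named in this thread (MOS 6.1 repositories), and required_ovs_kmod is a hypothetical name.

```shell
#!/bin/sh
# Sketch: map a kernel release to the Open vSwitch kmod package named in this
# thread. The mapping is an assumption for the MOS 6.1 repositories only.
required_ovs_kmod() {
    case "$1" in
        3.10.*)   echo "kmod-openvswitch-lt" ;;   # kernel-lt (3.10.55-1.mira6)
        2.6.32-*) echo "kmod-openvswitch" ;;      # stock el6 kernel
        *)        return 1 ;;                     # unknown kernel flavour
    esac
}

# Usage on a compute node:
#   pkg=$(required_ovs_kmod "$(uname -r)") && rpm -q "$pkg"
#   modinfo -k "$(uname -r)" openvswitch   # is a module built for this kernel?
```

If `rpm -q` reports the package as not installed for the running kernel, that matches the mismatch pointed out above.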

Baboune (seyvet) wrote : Re: [Bug 1460721] Re: centos 6.5 C610 chipset cloud-init-nonet waiting for network device

Are there other packages/ kernel modules that must be present?

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on packages/centos6/kmod-openvswitch (6.1)

Change abandoned by Alexei Sheplyakov <email address hidden> on branch: 6.1
Review: https://review.fuel-infra.org/7318
Reason: Let's remove the junk

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on packages/centos6/kernel (6.1)

Change abandoned by Alexei Sheplyakov <email address hidden> on branch: 6.1
Review: https://review.fuel-infra.org/7303
Reason: Nobody cares

Vitaly Sedelnik (vsedelnik) wrote :

Won't Fix for 6.0-updates and 6.1-updates, as the fix would require upgrading the CentOS kernel. Our current recommendation is to use Ubuntu, which has a newer kernel, or to wait for MOS to support a newer CentOS kernel (8.0).

Changed in fuel:
status: Confirmed → Won't Fix
milestone: 6.1 → 6.1-updates