BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
linux (Ubuntu) | Confirmed | Critical | Unassigned | |
network-manager (Ubuntu) | Confirmed | Critical | Unassigned | |
Bug Description
The BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device appears to be dropping data.
The output of the benchmark tool below shows the dropped samples and Rx sequence errors:
tcdforge@
[INFO] [UHD] linux; GNU C++ version 5.4.0 20160609; Boost_105800; UHD_3.14.
[WARNING] [UHD] Unable to set the thread priority. Performance may be negatively affected.
Please see the general application notes in the manual for instructions.
EnvironmentError: OSError: error in pthread_
[00:00:00.000007] Creating the usrp device with: ...
[INFO] [X300] X300 initialization sequence...
[INFO] [X300] Maximum frame size: 1472 bytes.
[INFO] [X300] Radio 1x clock: 200 MHz
[INFO] [GPS] Found an internal GPSDO: LC_XO, Firmware Rev 0.929a
[INFO] [0/DmaFIFO_0] Initializing block control (NOC ID: 0xF1F0D00000000000)
[INFO] [0/DmaFIFO_0] BIST passed (Throughput: 1308 MB/s)
[INFO] [0/DmaFIFO_0] BIST passed (Throughput: 1316 MB/s)
[INFO] [0/Radio_0] Initializing block control (NOC ID: 0x12AD100000000001)
[INFO] [0/Radio_1] Initializing block control (NOC ID: 0x12AD100000000001)
[INFO] [0/DDC_0] Initializing block control (NOC ID: 0xDDC0000000000000)
[INFO] [0/DDC_1] Initializing block control (NOC ID: 0xDDC0000000000000)
[INFO] [0/DUC_0] Initializing block control (NOC ID: 0xD0C0000000000000)
[INFO] [0/DUC_1] Initializing block control (NOC ID: 0xD0C0000000000000)
Using Device: Single USRP:
Device: X-Series Device
Mboard 0: X310
RX Channel: 0
RX DSP: 0
RX Dboard: A
RX Subdev: SBX-120 RX
RX Channel: 1
RX DSP: 0
RX Dboard: B
RX Subdev: SBX-120 RX
TX Channel: 0
TX DSP: 0
TX Dboard: A
TX Subdev: SBX-120 TX
TX Channel: 1
TX DSP: 0
TX Dboard: B
TX Subdev: SBX-120 TX
[00:00:04.305374] Setting device timestamp to 0...
[WARNING] [UHD] Unable to set the thread priority. Performance may be negatively affected.
Please see the general application notes in the manual for instructions.
EnvironmentError: OSError: error in pthread_
[00:00:04.310990] Testing receive rate 10.000000 Msps on 1 channels
[WARNING] [UHD] Unable to set the thread priority. Performance may be negatively affected.
Please see the general application notes in the manual for instructions.
EnvironmentError: OSError: error in pthread_
[00:00:04.318356] Testing transmit rate 10.000000 Msps on 1 channels
[00:00:06.693119] Detected Rx sequence error.
D[00:00:09.402843] Detected Rx sequence error.
DD[00:00:40.927978] Detected Rx sequence error.
D[00:01:44.982243] Detected Rx sequence error.
D[00:02:11.400692] Detected Rx sequence error.
D[00:02:14.805292] Detected Rx sequence error.
D[00:02:41.875596] Detected Rx sequence error.
D[00:03:06.927743] Detected Rx sequence error.
D[00:03:47.967891] Detected Rx sequence error.
D[00:03:58.233659] Detected Rx sequence error.
D[00:03:58.876588] Detected Rx sequence error.
D[00:04:03.139770] Detected Rx sequence error.
D[00:04:45.287465] Detected Rx sequence error.
D[00:04:56.425845] Detected Rx sequence error.
D[00:04:57.929209] Detected Rx sequence error.
[00:05:04.529548] Benchmark complete.
Benchmark rate summary:
Num received samples: 2995435936
Num dropped samples: 4622800
Num overruns detected: 0
Num transmitted samples: 3008276544
Num sequence errors (Tx): 0
Num sequence errors (Rx): 15
Num underruns detected: 0
Num late commands: 0
Num timeouts (Tx): 0
Num timeouts (Rx): 0
Done!
tcdforge@
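To put the summary figures in proportion, here is a minimal sketch that turns them into a loss fraction. The numbers are copied from the benchmark summary above. The function names are made up for illustration, and the calculation assumes that "Num dropped samples" counts samples missing in addition to "Num received samples", which the log itself does not state.

```python
# Figures copied from the "Benchmark rate summary" in the log above.
RECEIVED_SAMPLES = 2_995_435_936
DROPPED_SAMPLES = 4_622_800
RX_SEQUENCE_ERRORS = 15


def drop_fraction(received: int, dropped: int) -> float:
    """Return the fraction of expected samples that were lost.

    Assumption: expected = received + dropped (not confirmed by the log).
    """
    expected = received + dropped
    return dropped / expected if expected else 0.0


def samples_per_error(dropped: int, sequence_errors: int) -> float:
    """Return the average number of samples lost per Rx sequence error."""
    return dropped / sequence_errors if sequence_errors else 0.0


if __name__ == "__main__":
    frac = drop_fraction(RECEIVED_SAMPLES, DROPPED_SAMPLES)
    per_error = samples_per_error(DROPPED_SAMPLES, RX_SEQUENCE_ERRORS)
    print(f"drop fraction: {frac:.4%}")
    print(f"average samples lost per Rx sequence error: {per_error:,.0f}")
```

Under that assumption the loss is roughly 0.15% of the expected samples, or about 308,000 samples per sequence error, which at the tested 10 Msps is on the order of 30 ms of data per event.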
In this particular case, the nodes are USRP X310s. However, we see the same issue with N210 nodes, which also drop samples when connected to the BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device.
There is no problem with the USRPs themselves: we have tested them with ordinary 1G network cards and saw no dropped samples.
Personally, I think it has something to do with the 10G network card, possibly the Ubuntu driver.
Note: Dell have said there is no hardware problem with the 10G interfaces.
I followed the troubleshooting information at this link to try to determine the problem: https:/
- There is no firewall on that port (disabled).
- I tried setting the CPU frequency governor, but got "no or unknown cpufreq driver is active on this CPU".
- I also changed the cable connecting the USRPs to the 10G SR-IOV port to Cat6a, and I get the same issue.
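One further check that might help narrow this down is comparing the kernel's per-interface drop and error counters before and after a benchmark run. The sketch below parses `/proc/net/dev`, whose column layout is standard on Linux. The function names and the interface name `eth1` in the usage note are hypothetical, since the report does not name the interface. These counters may also miss drops that happen inside the NIC or on the host side of the SR-IOV virtual function.

```python
from pathlib import Path


def parse_net_dev(text: str) -> dict:
    """Parse /proc/net/dev text into {interface: {counter_name: value}}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:  # 8 receive columns + 8 transmit columns
            continue
        counters[name.strip()] = {
            "rx_packets": int(fields[1]),
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "rx_fifo": int(fields[4]),
            "tx_packets": int(fields[9]),
            "tx_errs": int(fields[10]),
            "tx_drop": int(fields[11]),
        }
    return counters


def counter_delta(before: dict, after: dict, iface: str) -> dict:
    """Return how much each counter grew for one interface between snapshots."""
    return {key: after[iface][key] - before[iface][key] for key in before[iface]}


if __name__ == "__main__":
    proc_file = Path("/proc/net/dev")
    if proc_file.exists():
        for iface, values in parse_net_dev(proc_file.read_text()).items():
            print(iface, values)
```

For example, take one snapshot before starting the benchmark and one after it finishes, then call `counter_delta(before, after, "eth1")`. If `rx_drop` or `rx_fifo` grows during the run, that would point at the receive path on this machine rather than at the USRP.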
This is from the VM with the USRP X310 connected:
tcdforge@x310a:~$ lspci -nn | grep -i ethernet
00:03.0 Ethernet controller [0200]: Red Hat, Inc. Virtio network device [1af4:1000]
00:05.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme-E Ethernet Virtual Function [14e4:16dc]
tcdforge@x310a:~$
5e:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller [14e4:16d8] (rev 01)
Subsystem: Dell BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller [1028:1fea]
Flags: bus master, fast devsel, latency 0, IRQ 50, NUMA node 0
Memory at b9a10000 (64-bit, prefetchable) [size=64K]
Memory at b9100000 (64-bit, prefetchable) [size=1M]
Memory at b9aa2000 (64-bit, prefetchable) [size=8K]
Expansion ROM at b9c00000 [disabled] [size=512K]
Kernel driver in use: bnxt_en
Kernel modules: bnxt_en
We get this info from the server:
scamallra@rack9:~$ cpupower frequency-info
analyzing CPU 0:
no or unknown cpufreq driver is active on this CPU
CPUs which run at the same hardware frequency: Not Available
CPUs which need to have their frequency coordinated by software: Not Available
maximum transition latency: Cannot determine or is not supported.
Not Available
available cpufreq governors: Not Available
Unable to determine current policy
current CPU frequency: Unable to call hardware
current CPU frequency: Unable to call to kernel
boost state support:
Supported: yes
Active: yes
lsb_release -rd
Description: Ubuntu 18.04.3 LTS
Release: 18.04
ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: network-manager 1.10.6-2ubuntu1.1
ProcVersionSign
Uname: Linux 4.15.0-70-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.9
Architecture: amd64
Date: Fri Nov 22 17:39:21 2019
NetworkManager.
[main]
NetworkingEnab
WirelessEnable
WWANEnabled=true
ProcEnviron:
TERM=xterm-
PATH=(custom, no user)
XDG_RUNTIME_
LANG=en_US.UTF-8
SHELL=/bin/bash
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
SourcePackage: network-manager
UpgradeStatus: No upgrade log present (probably fresh install)
nmcli-con: NAME UUID TYPE TIMESTAMP TIMESTAMP-REAL AUTOCONNECT AUTOCONNECT-
nmcli-nm:
RUNNING VERSION STATE STARTUP CONNECTIVITY NETWORKING WIFI-HW WIFI WWAN-HW WWAN
running 1.10.6 connected started unknown enabled enabled enabled enabled enabled
I have reports of the same device appearing to drop packets and incurring a greater number of retransmissions under certain circumstances, which we're still trying to nail down.
I'm using this bug for now until proven to be a different problem.
This is causing issues in a production environment.