Updates to ib_peer_memory requested by Nvidia
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
linux (Ubuntu) | Fix Released | Undecided | dann frazier | |
Focal | Fix Released | Medium | dann frazier | |
Hirsute | Fix Released | Medium | dann frazier | |
Impish | Fix Released | Medium | dann frazier | |
Bug Description
[Impact]
Nvidia notified me via private email that they'd discovered some issues with the ib_peer_memory patch we are carrying in hirsute/impish and sent me a patch intended to resolve them. My knowledge of these changes is limited to what is mentioned in the commit message:
- Allow clients to opt out of unmap during invalidation
- Fix some bugs in the sequencing of mlx5 MRs
- Enable ATS for peer memory
[Test Case]
ib_write_bw from the perftest package, rebuilt with CUDA support, can be used as a smoke test of this feature. I'll attach a sample test script here. I've verified this test passes with the kernels in the archive, and continues to pass with the provided patch applied.
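For illustration only (this is not the attached script), a minimal sketch of how the two sides of such a smoke test might be invoked. The device name `mlx5_0`, the GPU index `0`, the address `192.0.2.10`, and the helper `ib_write_bw_cmd` are all hypothetical, and the `--use_cuda=<index>` spelling is an assumption about a CUDA-enabled perftest build; older builds may take the flag without a value, so check `ib_write_bw --help` on your system.

```python
import shlex


def ib_write_bw_cmd(ib_dev, gpu_index, server=None):
    """Build an ib_write_bw argv list (hypothetical helper).

    With server=None this is the listening (server) side; otherwise it is
    the client side, which connects to the given server address.
    """
    # -d selects the InfiniBand device. --use_cuda is assumed to be
    # available only in a perftest build with CUDA support.
    argv = ["ib_write_bw", "-d", ib_dev, f"--use_cuda={gpu_index}"]
    if server is not None:
        argv.append(server)
    return argv


if __name__ == "__main__":
    # Print the commands rather than running them: run the first on the
    # server host, then the second on the client host.
    print(shlex.join(ib_write_bw_cmd("mlx5_0", 0)))
    print(shlex.join(ib_write_bw_cmd("mlx5_0", 0, server="192.0.2.10")))
```

The sketch only prints the command lines, so they can be inspected before anything is run against real hardware.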
[Fix]
Nvidia has emailed me fixes for both trees. They are not currently available in a public tree elsewhere, though I'm told at some point they should end up in a branch here:
https:/
[What could go wrong]
The only known use case for ib_peer_memory is Nvidia's GPU PeerDirect feature, in which GPUs share memory with one another over an InfiniBand network. Bugs here could cause problems (hangs, crashes, corruption) for such workloads.
Changed in linux (Ubuntu Impish):
importance: Undecided → Medium
Changed in linux (Ubuntu Hirsute):
importance: Undecided → Medium
status: In Progress → Fix Committed
Changed in linux (Ubuntu Impish):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Focal):
importance: Undecided → Medium
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-done
This bug is awaiting verification that the linux/5.13.0-23.23 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-impish' to 'verification-done-impish'. If the problem still exists, change the tag 'verification-needed-impish' to 'verification-failed-impish'.
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!