BlueField DPU Setup Notes¶
This webpage is directly generated from the README of j3soon/bluefield-dpu-setup-notes. Please refer to the repository for the mentioned examples.
Unofficial notes for setting up and configuring NVIDIA BlueField DPUs on custom server systems (non-DGX platforms), including support for Proxmox VE.
Terminology¶
Linux Systems (Host/DPU/BMC):
- Host: The system running on the server.
  - In the optional Proxmox VE section, it will be further divided into `PVE` and `VM`.
  - In the remaining sections, the host refers to the server's operating system, regardless of whether it's running directly on hardware or within the VM of Proxmox VE.
- DPU: The system running on the DPU.
- BMC: The system on the board management controller of the DPU. This is an independent system that provides out-of-band management capabilities, separate from the DPU's main operating system.
Hardware¶
- Server: G493-ZB3-AAP1-rev-1x [ref]
- BlueField-3 DPU: B3210E 900-9D3B6-00SC-EA0 [ref]
- V100 GPU: Tesla V100-PCIE-16GB
Hardware Setup¶
Requires supplementary 8-pin ATX power supply connectivity, available through the external power supply connector.
Do not link the CPU power cable to the BlueField-3 DPU PCIe ATX power connector, as their pin configurations differ. Using the CPU power cable in this manner is strictly prohibited and can potentially damage the BlueField-3 DPU. Please refer to External PCIe Power Supply Connector Pins for the external PCIe power supply pins.
-- Hardware Installation and PCIe Bifurcation
- DPU BMC 1GbE interface connected to the management network via ToR
- Remote Management Controller (RMC) connected to DPU BMC 1GbE via ToR
Info
RMC is the platform for data center infrastructure managers to manage DPUs.
- DHCP server existing in the management network
- An NVQual certified server
References:
- NVIDIA BlueField-3 Networking Platform User Guide
- BlueField-3 Administrator Quick Start Guide
- Hardware Installation and PCIe Bifurcation
Software¶
- (Optional) Proxmox VE 8.2.2
- Host OS
- Operating System: Ubuntu 24.04.2 LTS
- Kernel: Linux 6.8.0-54-generic
- Architecture: x86-64
- DOCA-Host: 2.9.2 LTS [ref]
- BMC
- Operating System: NVIDIA Moonraker/RoyB BMC (OpenBMC Project Reference Distro) BF-24.01-5
- Kernel: Linux 5.15.50-e62bf17
- Architecture: arm
- DPU
(Optional) Proxmox VE Passthrough¶
Please skip to the next section if Proxmox VE is not used.
- IOMMU Setup
  - Ensure that IOMMU (VT-d or AMD-Vi) is enabled in the BIOS/UEFI.
  - AMD enables it by default; check it using the following command:

    ```bash
    for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done
    ```

    - It works when you see multiple groups, and you can check which devices are properly isolated (no other devices in the same group except for PCI bridges) for PCI passthrough.
  - If it cannot be enabled, modify the GRUB configuration. Locate `GRUB_CMDLINE_LINUX_DEFAULT`, and for AMD, set it to:

    ```bash
    GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt"
    # for Intel
    GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
    ```

  - Verify whether IOMMU is enabled (though it's uncertain if this method works) by using:
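    For example, a sketch assuming standard Debian/Proxmox GRUB tooling (the exact commands in the repository may differ):

    ```bash
    # If GRUB was modified above, apply the change and reboot first
    sudo update-grub
    sudo reboot
    # After reboot, look for IOMMU/DMAR/AMD-Vi messages in the kernel log
    sudo dmesg | grep -i -e DMAR -e IOMMU -e AMD-Vi
    ```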
  - Check NIC info:
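    A minimal sketch, assuming the BlueField card is the only Mellanox networking device in the system (the repository's exact commands may differ):

    ```bash
    # List Mellanox PCI devices with numeric IDs
    lspci -nn | grep -i mellanox
    # Show network interfaces and their link state
    ip -br link
    ```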
- Proxmox VE Setup
  - Find the PCI ID.
  - Check the vfio module.
  - Enable it if there is no vfio module:
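    A sketch of these three steps (placeholder commands; the repository's exact commands may differ):

    ```bash
    # Find the PCI address and [vendor:device] ID of the BlueField card
    lspci -nn | grep -i mellanox
    # Check whether the vfio modules are loaded
    lsmod | grep vfio
    # If not, load them now and make them persistent across reboots
    sudo modprobe -a vfio vfio_iommu_type1 vfio_pci
    printf 'vfio\nvfio_iommu_type1\nvfio_pci\n' | sudo tee -a /etc/modules
    ```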
  - Configure VFIO: The BlueField card must be managed by `vfio-pci` to prevent the default driver from automatically loading.
  - Reboot the system and verify that the PCI device is bound to `vfio-pci`:
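    A sketch of one common approach (not necessarily the repository's exact steps); `<vendor>:<device>` is a placeholder for the ID pair shown by `lspci -nn`, and `0000:03:00.0` is a placeholder PCI address:

    ```bash
    # Bind the card to vfio-pci by vendor:device ID (replace the placeholder)
    echo "options vfio-pci ids=<vendor>:<device>" | sudo tee /etc/modprobe.d/vfio.conf
    sudo update-initramfs -u -k all
    sudo reboot
    # After reboot, confirm the kernel driver in use is vfio-pci
    lspci -nnk -s 0000:03:00.0
    ```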
- VM Setup
  - Create or stop the target VM, then add the passthrough line (see the sketch below) in the Proxmox Web UI or directly edit the VM configuration file (e.g. `/etc/pve/qemu-server/<VMID>.conf`), replacing `0000:03:00.0` with the PCI address of your BlueField card.
    - If the card has multiple functions (multi-function device), you can add `hostpci1`, `hostpci2`, etc., or add `multifunction=on` (adjust as needed).
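    A minimal sketch of such a line, assuming the whole device at `0000:03:00.0` is passed through as `hostpci0` (placeholders; adjust to your card and VM):

    ```bash
    # Equivalent CLI alternative to editing the config file by hand
    # (pcie=1 typically requires the q35 machine type)
    qm set <VMID> --hostpci0 0000:03:00.0,pcie=1
    # The resulting line in /etc/pve/qemu-server/<VMID>.conf looks like:
    #   hostpci0: 0000:03:00.0,pcie=1
    ```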
  - Check the VM: `lspci -nn | grep -i nvidia`
- Appendix
  - V100 Passthrough in Proxmox VE GUI: Datacenter > Resource Mappings > Add
  - DPU Passthrough in Proxmox VE GUI: VM > Hardware > Add > PCI Device
References:
Software Setup¶
Host¶
Execute the following commands on the host.
Check PCI devices:
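A minimal sketch (the repository's exact command may differ), assuming both the DPU and the V100 are installed:

```bash
# The BlueField-3 DPU (Mellanox) and the V100 GPU (NVIDIA) should both be visible
lspci -nn | grep -i -e mellanox -e nvidia
```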
Install common packages:
sudo apt-get update
# Install pv for viewing progress of the commands below
sudo apt-get install -y pv
(Optional) Uninstall old DOCA-Host: [ref]
for f in $( dpkg --list | grep -E 'doca|flexio|dpa-gdbserver|dpa-stats|dpaeumgmt' | awk '{print $2}' ); do echo $f ; sudo apt remove --purge $f -y ; done
sudo /usr/sbin/ofed_uninstall.sh --force
sudo apt-get autoremove
Install DOCA-Host (DPU Driver) 2.9.2 LTS [download]:
# DPU Driver (DOCA-Host)
wget https://www.mellanox.com/downloads/DOCA/DOCA_v2.9.2/host/doca-host_2.9.2-012000-24.10-ubuntu2404_amd64.deb
sudo dpkg -i doca-host_2.9.2-012000-24.10-ubuntu2404_amd64.deb
sudo apt-get update
sudo apt-get -y install doca-all
# Check DOCA-Host
dpkg -l | grep doca
# GPU Driver & CUDA
wget https://developer.download.nvidia.com/compute/cuda/12.8.0/local_installers/cuda_12.8.0_570.86.10_linux.run
sudo sh cuda_12.8.0_570.86.10_linux.run
# Check Driver
nvidia-smi
Fix macsec driver issue:
Strangely, Ubuntu 24.04's kernel binary package doesn't seem to include the `macsec` driver, causing `mlx5_ib` to fail to load. This may be observed by running `sudo mst status -v`, `sudo dmesg | grep mlx5`, and `ibstatus`.

2025/06/29 Update: An easier solution seems to be available (see the repository for the command); then we don't need to build the `macsec` driver ourselves.

To fix this issue, we build the `macsec` driver ourselves:
# Download macsec from kernel source
wget https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/drivers/net/macsec.c?h=v6.8 -O macsec.c
# Create Makefile
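# Note: the recipe line under `all:` below must start with a literal tab character (required by make)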
cat << 'EOF' > Makefile
obj-m += macsec.o
all:
make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
EOF
make
sudo cp macsec.ko /lib/modules/$(uname -r)/kernel/drivers/net
sudo depmod -a
# macsec module should be available
modinfo macsec
sudo modprobe macsec
lsmod | grep macsec
# Reload mlx5_core module
sudo rmmod mlx5_core
sudo modprobe mlx5_core
Make sure to re-compile the macsec module if you encounter the following error when running
sudo modprobe macsec:
Connect to the DPU via RShim: [ref]
sudo systemctl enable --now rshim
sudo ip addr add 192.168.100.1/24 dev tmfifo_net0
ping 192.168.100.2
# connect to the DPU
ssh ubuntu@192.168.100.2
Change DPU to IB mode: [ref]
# Note that this can also be done on DPU
sudo mlxconfig -d /dev/mst/mt41692_pciconf0 set LINK_TYPE_P1=1
sudo mlxconfig -d /dev/mst/mt41692_pciconf0 set LINK_TYPE_P2=1
# Cold reboot the machine
Deploying DPU OS Using BFB from Host: [download] [ref]
# update DOCA-BlueField to 2.9.2
wget https://content.mellanox.com/BlueField/BFBs/Ubuntu22.04/bf-bundle-2.9.2-31_25.02_ubuntu-22.04_prod.bfb
sudo bfb-install --bfb bf-bundle-2.9.2-31_25.02_ubuntu-22.04_prod.bfb --rshim /dev/rshim0
(Optional, Unconfirmed) Update DPU Firmware: [download]
Other DOCA tools and commands for debugging:
cd /opt/mellanox/doca/tools
doca_caps --list-devs
doca_bench --device 01:00.0 --query device-capabilities
DPU¶
Execute the following commands on the DPU.
Update DPU firmware: [ref] [firmware-tools] [flint] [mlxfwmanager]
# check firmware
sudo mlxfwmanager --query
sudo flint -d /dev/mst/mt41692_pciconf0 q
# update firmware
sudo /opt/mellanox/mlnx-fw-updater/mlnx_fw_updater.pl
# force update
# sudo /opt/mellanox/mlnx-fw-updater/mlnx_fw_updater.pl --force-fw-update
# Need to cold reboot the machine
Launch OpenSM on the DPU to use InfiniBand on the host side. Before this step, running `ibstat` on the host will show `State: Down` and `Physical state: LinkUp`; after this step, it will show `State: Up`.
# Get the `Node GUID` from the corresponding CA
ibstat
# Run OpenSM with the Node GUID to recognize virtual ports on the host.
sudo opensm -g <DPU_IB_NODE_GUID> -p 10
# If there's another OpenSM running on other hosts, make sure to set the priority higher than those.
# In our case, we have another OpenSM with priority 0 in the subnet, so we set our priority to 10.
InfiniBand in DPU Mode
In DPU Mode, when operating with an InfiniBand network, OpenSM must be executed from the BlueField Arm side rather than the host side. Similarly, InfiniBand management tools such as
`sminfo`, `ibdev2netdev`, and `ibnetdiscover` can only be used from the BlueField Arm side and are not accessible from the host side.
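For example, a sketch of running these tools on the DPU's Arm OS (assuming the standard InfiniBand diagnostic tools shipped with the BFB image):

```bash
# Query the active subnet manager
sudo sminfo
# Map InfiniBand devices to their network interface names
ibdev2netdev
# Discover the InfiniBand fabric topology
sudo ibnetdiscover
```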
Resetting DPU: [ref]
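A sketch using `mlxfwreset`, which matches the query output shown below (the exact commands in the repository may differ):

```bash
# Query the supported reset levels and types
sudo mlxfwreset -d /dev/mst/mt41692_pciconf0 query
# Perform the reset (level 3: driver restart and PCI reset)
sudo mlxfwreset -d /dev/mst/mt41692_pciconf0 --level 3 -y reset
```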
Output of the query command:
```
Reset-levels:
 0: Driver, PCI link, network link will remain up ("live-Patch") -Supported (default)
 1: Only ARM side will not remain up ("Immediate reset"). -Not Supported
 3: Driver restart and PCI reset -Supported
 4: Warm Reboot -Supported
Reset-types (relevant only for reset-levels 1,3,4):
 0: Full chip reset -Supported (default)
 1: Phy-less reset (keep network port active during reset) -Not Supported
 2: NIC only reset (for SoC devices) -Not Supported
 3: ARM only reset -Not Supported
 4: ARM OS shut down -Not Supported
Reset-sync (relevant only for reset-level 3):
 0: Tool is the owner -Not supported
 1: Driver is the owner -Supported (default)
```
Debugging:
References:
BMC¶
IPMI:
# Check sensors
ipmitool sdr
# Power control
ipmitool chassis power
# chassis power Commands: status, on, off, cycle, reset, diag, soft
# Check power status
ipmitool chassis status
# Control the BMC itself
ipmitool mc
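# e.g. ipmitool mc info (show controller info), ipmitool mc reset cold (reboot the BMC)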
Redfish:
# Check BMC version
curl -k -u 'root:<password>' -H 'Content-Type: application/json' -X GET https://<bmc_ip>/redfish/v1/UpdateService/FirmwareInventory/BMC_Firmware
References:
- Connecting to BMC Interfaces
- Reset Control
- Table of Common Redfish Commands
- NVIDIA BlueField Reset and Reboot Procedures
Host2¶
Given another host connected with InfiniBand, you can ping it from the DPU:
On the other host host2:
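A sketch of what to run on `host2` (the repository's exact commands may differ):

```bash
ibstat          # note the port's `Base lid`; it is the <LID> used below
sudo ibping -S  # start the ibping server
```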
On the DPU:
sudo ibnetdiscover # You should see the same lid
ibstat # check `CA` and `Port`
sudo ibping -C <CA> -P <PORT> -L <LID>
# For example:
# sudo ibping -C mlx5_0 -P 1 -L 13
You can also switch the server and client roles by running ibping -S on the DPU and ibping -C <CA> -P <PORT> -L <LID> on the other host.
Examples¶
Please refer to the examples for more details.
Contributors & Acknowledgements¶
Contributors: @tsw303005, @Aiden128, @YiPrograms, and @j3soon.
This note has been made possible through the support of LSA Lab and NVIDIA AI Technology Center (NVAITC).