
BlueField DPU Setup Notes

This webpage is directly generated from the README of j3soon/bluefield-dpu-setup-notes. Please refer to the repository for the mentioned examples.

Unofficial notes for setting up and configuring NVIDIA BlueField DPUs on custom server systems (non-DGX platforms), including support for Proxmox VE.

Terminology

Linux Systems (Host/DPU/BMC):

  • Host: The system running on the server.
    • In the optional Proxmox VE section, it will be further divided into PVE and VM.
    • In the remaining sections, the host refers to the server's operating system, regardless of whether it's running directly on hardware or within the VM of Proxmox VE.
  • DPU: The system running on the DPU.
  • BMC: The system on the board management controller of the DPU. This is an independent system that provides out-of-band management capabilities, separate from the DPU's main operating system.

Hardware

  • Server: G493-ZB3-AAP1-rev-1x [ref]
  • BlueField-3 DPU: B3210E 900-9D3B6-00SC-EA0 [ref]
    • BMC Management Interface (RJ45): Ethernet Cable [ref]
    • DPU Eth/IB Port 0 (QSFP112): InfiniBand Cable [ref]
    • DPU Eth/IB Port 1 (QSFP112): Empty
  • V100 GPU: Tesla V100-PCIE-16GB

Hardware Setup

Requires a supplementary 8-pin ATX power supply, with connectivity available through the external power supply connector.

Do not link the CPU power cable to the BlueField-3 DPU PCIe ATX power connector, as their pin configurations differ. Using the CPU power cable in this manner is strictly prohibited and can potentially damage the BlueField-3 DPU. Please refer to External PCIe Power Supply Connector Pins for the external PCIe power supply pins.

-- Hardware Installation and PCIe Bifurcation

  • DPU BMC 1GbE interface connected to the management network via ToR
  • Remote Management Controller (RMC) connected to DPU BMC 1GbE via ToR

    Info
    RMC is the platform for data center infrastructure managers to manage DPUs.

  • DHCP server existing in the management network
  • An NVQual certified server

-- Prerequisites for Initial BlueField-3 Deployment

References:

Software

  • (Optional) Proxmox VE 8.2.2
  • Host OS
    • Operating System: Ubuntu 24.04.2 LTS
    • Kernel: Linux 6.8.0-54-generic
    • Architecture: x86-64
    • DOCA-Host: 2.9.2 LTS [ref]
  • BMC
    • Operating System: NVIDIA Moonraker/RoyB BMC (OpenBMC Project Reference Distro) BF-24.01-5
    • Kernel: Linux 5.15.50-e62bf17
    • Architecture: arm
  • DPU
    • Operating System: Ubuntu 22.04.5 LTS
    • Kernel: Linux 5.15.0-1035-bluefield
    • Architecture: arm64
    • DOCA-BlueField: 2.9.2 [ref]
    • Mode: DPU Mode [ref]
    • DPU image and firmware: [ref]
      $ sudo bfvcheck
      Beginning version check...
      
      -RECOMMENDED VERSIONS-
      ATF: v2.2(release):4.9.2-14-geeb9a6f94
      UEFI: 4.9.2-25-ge0f86cebd6
      FW: 32.43.2566
      
      -INSTALLED VERSIONS-
      ATF: v2.2(release):4.9.2-14-geeb9a6f94
      UEFI: 4.9.2-25-ge0f86cebd6
      FW: 32.43.2566
      
      Version check complete.
      No issues found.
      
    • BlueField OS image version: [ref]
      $ cat /etc/mlnx-release
      bf-bundle-2.9.2-31_25.02_ubuntu-22.04_prod
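
The DPU mode listed above (DPU Mode) can be double-checked from the host once the MST tools are available. A minimal sketch, assuming the MST device name used later in these notes; in DPU Mode, INTERNAL_CPU_MODEL reports EMBEDDED_CPU(1):

sudo mlxconfig -d /dev/mst/mt41692_pciconf0 query | grep INTERNAL_CPU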
      

(Optional) Proxmox VE Passthrough

Please skip to the next section if Proxmox VE is not used.

  • IOMMU Setup

    • Ensure that IOMMU (VT-d or AMD-Vi) is enabled in the BIOS/UEFI.

      lscpu | grep Virtualization
      
    • AMD enables it by default; check the IOMMU groups using the following command:

      for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done
      
      • IOMMU is working when multiple groups are listed. Also check that the devices intended for PCI passthrough are properly isolated (no other devices in the same group except PCI bridges).
    • If it is not enabled, modify the GRUB configuration in /etc/default/grub. Locate GRUB_CMDLINE_LINUX_DEFAULT and set it as follows:

      # for AMD
      GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt"
      
      # for Intel
      GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
      
      # Apply the change and reboot
      update-grub
      
      • Verify whether IOMMU is enabled (though it's uncertain if this method works) by using:

        dmesg | grep -e DMAR -e IOMMU
        
        # Output when IOMMU is enabled
        DMAR: IOMMU enabled
        
    • Check NIC info

      lspci -nn | grep -i bluefield
      lspci -nn | grep -i nvidia
      
  • Proxmox VE Setup

    • Find PCI ID

      lspci -nn | grep -i mellanox
      lspci -nn | grep -i nvidia
      
      # Take note of the device's PCI address (e.g., 0000:03:00.0) and Vendor:Device ID (e.g., 15b3:xxxx).
      
    • Check vfio module

      lsmod | grep vfio
      
      • Enable them if the vfio modules are not loaded:

        echo "vfio" >> /etc/modules
        echo "vfio_iommu_type1" >> /etc/modules
        echo "vfio_pci" >> /etc/modules
        update-initramfs -u
        reboot
        dmesg | grep -i vfio
        
    • Configure VFIO: The BlueField card must be bound to vfio-pci so that the default driver does not load automatically.

      # Add the vendor:device ID to /etc/modprobe.d/vfio.conf
      nano /etc/modprobe.d/vfio.conf
      #   options vfio-pci ids=15b3:a2d2
      
      # Not sure if this is needed; add it in the same file:
      #   softdep mlx5_core pre: vfio-pci
      
      # Blacklist the default driver by adding the following line to /etc/modprobe.d/pve-blacklist.conf:
      #   blacklist mlx5_core
      
      # Apply the changes
      update-initramfs -u
      
    • Reboot the system and verify that the PCI device is bound to vfio-pci:

      lspci -nnk -d 15b3:xxxx
      # Expect to see: Kernel driver in use: vfio-pci
      
  • VM Setup

    • Create the target VM (or stop it if it already exists), then add the following line via the Proxmox Web UI or by directly editing the VM configuration file (e.g., /etc/pve/qemu-server/<VMID>.conf). Replace 0000:03:00.0 with the PCI address of your BlueField card. (An equivalent qm command is sketched after this list.)

      hostpci0: 0000:03:00.0,pcie=1
      
      • If the card has multiple functions (multi-function device), you can add hostpci1, hostpci2, etc. or add multifunction=on (adjust as needed).
      • Check inside the VM:

        lspci -nn | grep -i nvidia
    • Appendix
      • V100 Passthrough in Proxmox VE GUI: Datacenter > Resource Mappings > Add
      • DPU Passthrough in Proxmox VE GUI: VM > Hardware > Add > PCI Device
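
As an alternative to editing the VM configuration file by hand, the same passthrough entry can be added from the PVE shell with qm. A minimal sketch, assuming the example PCI address above; replace <VMID> with your VM ID:

qm set <VMID> --hostpci0 0000:03:00.0,pcie=1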

References:

Software Setup

Host

Execute the following commands on the host.

Check PCI devices:

lspci -nn | grep -i mellanox
lspci -nn | grep -i nvidia

Install common packages:

sudo apt-get update
# Install pv for viewing progress of the commands below
sudo apt-get install -y pv

(Optional) Uninstall old DOCA-Host: [ref]

for f in $( dpkg --list | grep -E 'doca|flexio|dpa-gdbserver|dpa-stats|dpaeumgmt' | awk '{print $2}' ); do echo $f ; sudo apt remove --purge $f -y ; done
sudo /usr/sbin/ofed_uninstall.sh --force
sudo apt-get autoremove

Install DOCA-Host (DPU Driver) 2.9.2 LTS [download]:

# DPU Driver (DOCA-Host)
wget https://www.mellanox.com/downloads/DOCA/DOCA_v2.9.2/host/doca-host_2.9.2-012000-24.10-ubuntu2404_amd64.deb
sudo dpkg -i doca-host_2.9.2-012000-24.10-ubuntu2404_amd64.deb
sudo apt-get update
sudo apt-get -y install doca-all
# Check DOCA-Host
dpkg -l | grep doca

# GPU Driver & CUDA
wget https://developer.download.nvidia.com/compute/cuda/12.8.0/local_installers/cuda_12.8.0_570.86.10_linux.run
sudo sh cuda_12.8.0_570.86.10_linux.run
# Check Driver
nvidia-smi
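
If the CUDA toolkit was installed via the runfile above, nvcc is not on the PATH by default. A minimal sketch, assuming the default install prefix /usr/local/cuda-12.8:

echo 'export PATH=/usr/local/cuda-12.8/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
nvcc --version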

Fix macsec driver issue:

Strangely, Ubuntu 24.04's kernel binary package doesn't seem to include the macsec driver, which prevents mlx5_ib from loading. This can be observed by running sudo mst status -v, sudo dmesg | grep mlx5, and ibstatus.

2025/06/29 Update: An easier solution seems to be:

sudo apt-get install linux-modules-extra-$(uname -r)

With this package installed, there is no need to build the macsec driver manually as described below.

To fix this issue, we build the macsec driver ourselves:

# Download macsec from kernel source
wget https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/drivers/net/macsec.c?h=v6.8 -O macsec.c

# Create Makefile (the recipe line under `all:` must be indented with a tab character, not spaces)
cat << 'EOF' > Makefile
obj-m += macsec.o
all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
EOF

make

sudo cp macsec.ko /lib/modules/$(uname -r)/kernel/drivers/net
sudo depmod -a

# macsec module should be available
modinfo macsec
sudo modprobe macsec
lsmod | grep macsec

# Reload mlx5_core module
sudo rmmod mlx5_core
sudo modprobe mlx5_core

Make sure to re-compile the macsec module if you encounter the following error when running sudo modprobe macsec:

modprobe: ERROR: could not insert 'macsec': Exec format error

Connect to the DPU via RShim: [ref]

sudo systemctl enable --now rshim
sudo ip addr add 192.168.100.1/24 dev tmfifo_net0
ping 192.168.100.2
# connect to the DPU
ssh ubuntu@192.168.100.2
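
Note that the ip addr add command above does not persist across host reboots. A minimal sketch to make the tmfifo address persistent, assuming the host uses netplan (the file name 99-tmfifo.yaml is arbitrary):

sudo tee /etc/netplan/99-tmfifo.yaml << 'EOF'
network:
  version: 2
  ethernets:
    tmfifo_net0:
      addresses:
        - 192.168.100.1/24
EOF
sudo netplan apply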

Change DPU to IB mode: [ref]

# Note that this can also be done on the DPU
sudo mlxconfig -d /dev/mst/mt41692_pciconf0 set LINK_TYPE_P1=1
sudo mlxconfig -d /dev/mst/mt41692_pciconf0 set LINK_TYPE_P2=1
# Cold reboot the machine
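
After the cold reboot, the setting can be verified from the host; a quick check (IB(1) means InfiniBand, ETH(2) means Ethernet):

sudo mlxconfig -d /dev/mst/mt41692_pciconf0 query | grep LINK_TYPE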

Deploying DPU OS Using BFB from Host: [download] [ref]

# update DOCA-BlueField to 2.9.2
wget https://content.mellanox.com/BlueField/BFBs/Ubuntu22.04/bf-bundle-2.9.2-31_25.02_ubuntu-22.04_prod.bfb
sudo bfb-install --bfb bf-bundle-2.9.2-31_25.02_ubuntu-22.04_prod.bfb --rshim /dev/rshim0
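
Once the installation finishes and the DPU comes back up, the new image can be confirmed from the DPU side using the same checks shown in the Software section above:

ssh ubuntu@192.168.100.2
cat /etc/mlnx-release
sudo bfvcheck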

(Optional, Unconfirmed) Update DPU Firmware: [download]

# update firmware
wget https://content.mellanox.com/BlueField/FW-Bundle/bf-fwbundle-2.9.2-31_25.02-prod.bfb
sudo bfb-install --bfb bf-fwbundle-2.9.2-31_25.02-prod.bfb --rshim rshim0

Other DOCA tools and commands for debugging:

cd /opt/mellanox/doca/tools
doca_caps --list-devs
doca_bench --device 01:00.0 --query device-capabilities
sudo ibdev2netdev -v
sudo mlxlink -d /dev/mst/mt41692_pciconf0

DPU

Execute the following commands on the DPU.

# Check BlueField OS image version
cat /etc/mlnx-release
# Check DOCA-BlueField
dpkg -l | grep doca

Update DPU firmware: [ref] [firmware-tools] [flint] [mlxfwmanager]

# check firmware
sudo mlxfwmanager --query
sudo flint -d /dev/mst/mt41692_pciconf0 q

# update firmware
sudo /opt/mellanox/mlnx-fw-updater/mlnx_fw_updater.pl

# force update
# sudo /opt/mellanox/mlnx-fw-updater/mlnx_fw_updater.pl --force-fw-update

# Need to cold reboot the machine

Launch OpenSM on the DPU to use InfiniBand on the host side. Before this step, running ibstat on the host shows State: Down and Physical state: LinkUp; after this step, ibstat on the host shows the port state as up (Active).

# Get the `Node GUID` from the corresponding CA
ibstat

# Run OpenSM with the Node GUID to recognize virtual ports on the host.
sudo opensm -g <DPU_IB_NODE_GUID> -p 10
# If there's another OpenSM running on other hosts, make sure to set the priority higher than those.
# In our case, we have another OpenSM with priority 0 in the subnet, so we set our priority to 10.
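
To confirm that this OpenSM instance is the master subnet manager (and to see its priority), run sminfo from the DPU; a quick check using one of the InfiniBand management tools mentioned in the note below:

sudo sminfo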

InfiniBand in DPU Mode

In DPU Mode, when operating with an InfiniBand network, OpenSM must be executed from the BlueField Arm side rather than the host side. Similarly, InfiniBand management tools such as sminfo, ibdev2netdev, and ibnetdiscover can only be used from the BlueField Arm side and are not accessible from the host side.

-- BlueField Modes of Operation

Resetting DPU: [ref]

# Query for reset level required to load new firmware
sudo mlxfwreset -d /dev/mst/mt*pciconf0 q

Output of the query command:

Reset-levels:
0: Driver, PCI link, network link will remain up ("live-Patch")  -Supported     (default)
1: Only ARM side will not remain up ("Immediate reset").         -Not Supported
3: Driver restart and PCI reset                                  -Supported
4: Warm Reboot                                                   -Supported

Reset-types (relevant only for reset-levels 1,3,4):
0: Full chip reset                                               -Supported     (default)
1: Phy-less reset (keep network port active during reset)        -Not Supported
2: NIC only reset (for SoC devices)                              -Not Supported
3: ARM only reset                                                -Not Supported
4: ARM OS shut down                                              -Not Supported

Reset-sync (relevant only for reset-level 3):
0: Tool is the owner                                             -Not supported
1: Driver is the owner                                           -Supported     (default)
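
Given the supported levels above, the reset itself can be issued as follows; a minimal sketch, where -l selects the reset level and -y skips the confirmation prompt:

sudo mlxfwreset -d /dev/mst/mt*pciconf0 -l 3 -y reset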

Debugging:

# Collect all debug messages on the host
sudo /usr/sbin/sysinfo-snapshot.py

References:

BMC

IPMI:

# Check sensors
ipmitool sdr

# Power control
ipmitool chassis power
# chassis power Commands: status, on, off, cycle, reset, diag, soft

# Check power status
ipmitool chassis status

# Control the BMC itself
ipmitool mc
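
The commands above can also be issued remotely against the DPU BMC over its LAN interface; a sketch, substituting your BMC address and credentials:

ipmitool -I lanplus -H <bmc_ip> -U root -P <password> sdr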

Redfish:

# Check BMC version
curl -k -u 'root:<password>' -H 'Content-Type: application/json' -X GET https://<bmc_ip>/redfish/v1/UpdateService/FirmwareInventory/BMC_Firmware
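
Other standard Redfish resources can be queried in the same way, e.g. the service root, which lists the available collections:

curl -k -u 'root:<password>' -H 'Content-Type: application/json' -X GET https://<bmc_ip>/redfish/v1/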

References:

Host2

Given another host connected via InfiniBand, you can ping it from the DPU.

On the other host (host2):

ibstat # check `Base lid`
sudo ibping -S

On the DPU:

sudo ibnetdiscover # You should see the same lid
ibstat # check `CA` and `Port`
sudo ibping -C <CA> -P <PORT> -L <LID>
# For example:
# sudo ibping -C mlx5_0 -P 1 -L 13

You can also switch the server and client roles by running ibping -S on the DPU and ibping -C <CA> -P <PORT> -L <LID> on the other host.

Examples

Please refer to the examples for more details.

Contributors & Acknowledgements

Contributors: @tsw303005, @Aiden128, @YiPrograms, and @j3soon.

This note has been made possible through the support of LSA Lab, and NVIDIA AI Technology Center (NVAITC).