Setting up environments for user applications on an HPC cluster is often tedious and diverts attention from the application itself. Containerization is a great way to simplify the process. HPC clusters often use the Slurm workload manager along with containerization tools such as Singularity/Apptainer, Rootless Docker (as an environment module), or Enroot+Pyxis for easier environment management.
Based on my experience working with Slurm and all these containerization options, I personally prefer Slurm with Enroot+Pyxis as it offers the simplest workflow for users familiar with Docker, while also ensuring minimal performance overhead.
The setup instructions are already documented in the official Pyxis repository, and the Enroot documentation contains a detailed usage guide for single-node tasks. However, there is no documentation for running multi-node tasks directly with Enroot, without Pyxis. Using Enroot directly without Pyxis may be needed when you have direct (bare-metal) access to multiple Ubuntu nodes and do not want to set up a scheduler or workload manager like Slurm. In such cases, Enroot alone can serve as a lightweight and effective containerization solution for HPC environments.
This (unofficial) document describes the minimal setup required for running multi-node tasks directly with Enroot, without Pyxis or Slurm. Please note that running multi-node tasks with Enroot alone is more of a hack than a fool-proof solution; the recommended method for multi-node tasks remains Enroot+Pyxis.
Create a user account (with the same username/UID/GID on all nodes) with sudo privileges, with the home directory set to /mnt/home/<username>. If the user already exists, skip this step.
You'll want to use tools like LDAP to manage the user account. Alternatively, you can manually create the user account on all nodes:
# Create user account
USERNAME=<username>
# Create the primary group first so that -g 10001 resolves
sudo groupadd -g 10001 ${USERNAME}
sudo useradd -m -d /mnt/home/${USERNAME} -s /bin/bash -u 10001 -g 10001 -G sudo ${USERNAME}
# Enable password-less sudo
echo '%sudo ALL=(ALL) NOPASSWD:ALL' | sudo tee -a /etc/sudoers
Set up an NFS server on the head node and mount the shared home directory on all other nodes. If NFS is already configured, ensure the necessary paths are exported and mounted correctly.
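If NFS is not set up yet, a minimal configuration could look like the following sketch. The 10.0.0.0/24 subnet, the export options, and <head-node-ip> are placeholders for your own network; /opt/enroot is exported here as well because the Enroot data and workspace paths configured later are expected to be visible on all nodes.
# On the head node: install the NFS server and export the shared directories
sudo apt install -y nfs-kernel-server
sudo mkdir -p /mnt/home /opt/enroot
echo '/mnt/home   10.0.0.0/24(rw,sync,no_subtree_check,no_root_squash)' | sudo tee -a /etc/exports
echo '/opt/enroot 10.0.0.0/24(rw,sync,no_subtree_check,no_root_squash)' | sudo tee -a /etc/exports
sudo exportfs -ra
# On all other nodes: mount the exported directories
sudo apt install -y nfs-common
sudo mkdir -p /mnt/home /opt/enroot
sudo mount -t nfs <head-node-ip>:/mnt/home /mnt/home
sudo mount -t nfs <head-node-ip>:/opt/enroot /opt/enroot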
Skip this step if password-less SSH is already configured.
On head node:
# Generate SSH key
ssh-keygen -t ed25519
# press Enter multiple times to accept the default values
# Copy to shared home directory (will automatically work on all nodes due to shared home directory)
cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys
On all nodes, edit the Enroot config for a shared container file system (assuming a Bash shell):
# Run once for each node (including the head node)
IP=<IP>
# You may edit /etc/enroot/enroot.conf directly, but the following idempotent commands are recommended for consistency
# Set ENROOT_DATA_PATH to /opt/enroot/data
ssh $IP "sudo grep -q '^ENROOT_DATA_PATH[[:space:]]\+/opt/enroot/data\$' /etc/enroot/enroot.conf || sudo sed -i '/^#ENROOT_DATA_PATH[[:space:]]\+\\\${XDG_DATA_HOME}\/enroot\$/a ENROOT_DATA_PATH /opt/enroot/data' /etc/enroot/enroot.conf"
# Set ENROOT_MOUNT_HOME to yes
ssh $IP "sudo grep -q '^ENROOT_MOUNT_HOME[[:space:]]\+yes\$' /etc/enroot/enroot.conf || sudo sed -i '/^#ENROOT_MOUNT_HOME[[:space:]]\+no\$/a ENROOT_MOUNT_HOME yes' /etc/enroot/enroot.conf"
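Optionally, verify that both settings are present on each node:
ssh $IP "grep -E '^ENROOT_(DATA_PATH|MOUNT_HOME)' /etc/enroot/enroot.conf"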
On the head node, create the data/workspace directories and add an Enroot hook for OpenMPI:
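The original hook contents are not reproduced here, so the following is only a sketch based on the Multi-node Setup notes at the end of this document. ENROOT_ROOTFS and ENROOT_ENVIRON are variables Enroot exposes to its hooks; the directory permissions and the enroot start prefix baked into the hook (which must match the prefix used on the command line later) are assumptions you may need to adapt.
# Create the shared data and workspace directories on the NFS-shared /opt/enroot
sudo mkdir -p /opt/enroot/data /opt/enroot/workspace/${USER}
sudo chown ${USER} /opt/enroot/data /opt/enroot/workspace/${USER}
# Install the OpenMPI hook (sketch): ${USER} is baked in at creation time; escaped variables expand at hook runtime
sudo mkdir -p /etc/enroot/hooks.d
sudo tee /etc/enroot/hooks.d/ompi.sh > /dev/null <<EOF
#!/bin/bash
# Make mpirun launch orted inside the same container on remote nodes.
# The "enroot start" prefix below must match the one used on the command line.
container="\$(basename "\${ENROOT_ROOTFS}")"
echo "OMPI_MCA_orte_launch_agent=enroot start --rw --mount /opt/enroot/workspace/${USER}:/app \${container} orted" >> "\${ENROOT_ENVIRON}"
EOF
sudo chmod +x /etc/enroot/hooks.d/ompi.sh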
On the head node, create a container with the current username as a prefix (the created container will be visible on all nodes due to the ENROOT_DATA_PATH setting we set earlier):
cd /mnt/home/${USER}/enroot/sqsh
enroot create --name ${USER}-hpc-benchmarks-25-04 nvidia+hpc-benchmarks+25.04.sqsh
ls /opt/enroot/data/${USER}-hpc-benchmarks-25-04
enroot list
# Single-node MPI quick test
enroot start ${USER}-hpc-benchmarks-25-04 mpirun hostname
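If the nvidia+hpc-benchmarks+25.04.sqsh image does not exist yet, it can be created beforehand with enroot import (the registry path follows NGC naming; depending on your setup, NGC credentials may need to be configured for Enroot):
cd /mnt/home/${USER}/enroot/sqsh
enroot import docker://nvcr.io#nvidia/hpc-benchmarks:25.04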
Create a workspace directory, and store the hostfile there (for multi-node tasks, assuming all nodes have 8 GPUs):
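For example (a sketch; <node1-ip> and <node2-ip> are placeholders for your nodes' hostnames or IP addresses, and slots=8 matches the 8 GPUs per node assumed above):
mkdir -p /opt/enroot/workspace/${USER}
cat > /opt/enroot/workspace/${USER}/hosts.txt <<EOF
<node1-ip> slots=8
<node2-ip> slots=8
EOF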
Run multi-node quick test (assuming 2 nodes with 8 GPUs each):
enroot start --rw --mount /opt/enroot/workspace/${USER}:/app ${USER}-hpc-benchmarks-25-04 mpirun -np 16 --hostfile /app/hosts.txt hostname
# should see 8 hostnames for each node
Note: The command prefix before the container name (i.e., enroot start --rw --mount /opt/enroot/workspace/${USER}:/app) must match exactly what is set in the /etc/enroot/hooks.d/ompi.sh hook. Do not modify this part of the command, or the multi-node launch will not work correctly. You can, however, change everything after the container name (e.g., mpirun ...). In addition, it is highly recommended to use absolute paths in the command.
Copy the sample HPL dat files into the shared workspace:
enroot start --rw --mount /opt/enroot/workspace/${USER}:/app ${USER}-hpc-benchmarks-25-04
# in the container
cp hpl-linux-x86_64/sample-dat/HPL-H200-8GPUs.dat /app/
cp hpl-linux-x86_64/sample-dat/HPL-H200-16GPUs.dat /app/
# Ctrl+D to exit the container
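A multi-node HPL run can then be launched with the same enroot start prefix. The following is only a sketch: it assumes 2 nodes with 8 GPUs each and that the hpl.sh wrapper from the NVIDIA HPC-Benchmarks container lives in the default working directory /workspace; check the container's documentation and tune the flags for your hardware.
enroot start --rw --mount /opt/enroot/workspace/${USER}:/app ${USER}-hpc-benchmarks-25-04 mpirun -np 16 --hostfile /app/hosts.txt /workspace/hpl.sh --dat /app/HPL-H200-16GPUs.dat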
The result may not be optimal. You may tune the dat file, mpirun flags, and environment variables according to your machine for better HPL performance.
For HPL-MxP, you can refer to the sample launch scripts bundled in the container for the reference parameters:
enroot start --rw --mount /opt/enroot/workspace/${USER}:/app ${USER}-hpc-benchmarks-25-04
# in the container
cat hpl-mxp-linux-x86_64/sample-slurm/hpl-mxp-enroot-1N.sub
cat hpl-mxp-linux-x86_64/sample-slurm/hpl-mxp-enroot-2N.sub
# Ctrl+D to exit the container
In other words, all software other than what is listed in the Sample Environment is included in the container.
As a sanity check:
enroot start --rw --mount /opt/enroot/workspace/${USER}:/app ${USER}-hpc-benchmarks-25-04
# in the container
ucx_info -v
ompi_info | grep "MPI extensions"
# ...
# Ctrl+D to exit the container
You can see that both UCX and OpenMPI are built with CUDA support, even though you may not have installed UCX, OpenMPI, or even CUDA on the host OS.
To the best of my knowledge, this Enroot multi-node setup (or hack) was first introduced by @3XX0 in this issue.
Aside from the normal single-node Enroot setup, there are four major points in the multi-node setup:
Setting ENROOT_DATA_PATH to an NFS-shared directory in /etc/enroot/enroot.conf.
This path is used to store the container file system (unpacked by enroot create). Setting it to an NFS-shared directory ensures that the container file system is visible (via enroot list) on all nodes once created. Without this option, users need to manually run enroot create on each node, which is tedious and error-prone. Executing enroot remove will delete the container file system from this path. (Reference)
Setting ENROOT_MOUNT_HOME to yes in /etc/enroot/enroot.conf.
Mounting the home directory allows the container to access the ~/.ssh folder. This is necessary for MPI (mpirun) to automatically use password-less SSH authentication to launch orted processes on all nodes. (Reference)
Setting OMPI_MCA_orte_launch_agent to enroot start ... orted.
Setting the OMPI_MCA_orte_launch_agent environment variable is a common trick to make mpirun launch the orted process within an (Enroot/Singularity) container. Essentially, it tells mpirun to run enroot start ... orted instead of running orted directly.
Adding an (executable) hook for OpenMPI in /etc/enroot/hooks.d/ompi.sh.
This hook removes the need to manually set the OMPI_MCA_orte_launch_agent environment variable every time you run a task via enroot start. In our case, without this hook, you would have to run something like the following every time:
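(A sketch only; the exact value must mirror the enroot start prefix and the hook described above, and the mpirun part is just an example.)
enroot start --rw --mount /opt/enroot/workspace/${USER}:/app \
  --env OMPI_MCA_orte_launch_agent="enroot start --rw --mount /opt/enroot/workspace/${USER}:/app ${USER}-hpc-benchmarks-25-04 orted" \
  ${USER}-hpc-benchmarks-25-04 \
  mpirun -np 16 --hostfile /app/hosts.txt hostname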
Running mpirun ... enroot start ... (i.e., launching mpirun on the host and wrapping each rank in enroot start) may prevent intra-node optimizations, resulting in worse performance. In addition, using the OpenMPI installation inside the Enroot container makes life easier, as we don't even need to install OpenMPI on any node.