This webpage is directly generated from the README of j3soon/go-nvml-mig-create-instance. Please refer to the repository for additional information such as example Go code.
An unofficial example of creating Multi-Instance GPU (MIG) instances with the NVIDIA Management Library (NVML) Go bindings.
docker run --rm -it --gpus all \
  -v $(pwd):/workspace \
  --cap-add=SYS_ADMIN \
  -e NVIDIA_MIG_CONFIG_DEVICES=all \
  golang
# in the container
cd /workspace
Note: --runtime=nvidia, -e NVIDIA_VISIBLE_DEVICES=all, and -e NVIDIA_DRIVER_CAPABILITIES=all may be required depending on your environment and use cases.
Alternatively, you can install Go on your host machine and skip this step.
Run the example and observe results:
go run main.go
# List the available CIs and GIs
nvidia-smi mig -lgi; nvidia-smi mig -lci;
# Destroy all the CIs and GIs
nvidia-smi mig -dci; nvidia-smi mig -dgi;
This should also work on A100/H100/H200 by substituting the MIG profile with one supported on that device.
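Before any of the NVML calls discussed below can be made, the library must be initialized and a device handle obtained. A minimal sketch using the go-nvml bindings (assuming the target GPU is at device index 0; this requires NVIDIA hardware and drivers to run):

```go
package main

import (
	"fmt"

	"github.com/NVIDIA/go-nvml/pkg/nvml"
)

func main() {
	// Initialize NVML; all other NVML calls require this.
	if ret := nvml.Init(); ret != nvml.SUCCESS {
		panic(nvml.ErrorString(ret))
	}
	defer nvml.Shutdown()

	// Assumes the target GPU is at index 0.
	device, ret := nvml.DeviceGetHandleByIndex(0)
	if ret != nvml.SUCCESS {
		panic(nvml.ErrorString(ret))
	}

	name, ret := device.GetName()
	if ret != nvml.SUCCESS {
		panic(nvml.ErrorString(ret))
	}
	fmt.Println("Device 0:", name)
}
```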
We can see that it takes a reference to GpuInstanceProfileInfo as its argument. Take a look at its source (ref: cpp, go, src):
/**
* GPU instance profile information.
*/
typedef struct nvmlGpuInstanceProfileInfo_st
{
unsigned int id; //!< Unique profile ID within the device
unsigned int isP2pSupported; //!< Peer-to-Peer support
unsigned int sliceCount; //!< GPU Slice count
unsigned int instanceCount; //!< GPU instance count
unsigned int multiprocessorCount; //!< Streaming Multiprocessor count
unsigned int copyEngineCount; //!< Copy Engine count
unsigned int decoderCount; //!< Decoder Engine count
unsigned int encoderCount; //!< Encoder Engine count
unsigned int jpegCount; //!< JPEG Engine count
unsigned int ofaCount; //!< OFA Engine count
unsigned long long memorySizeMB; //!< Memory size in MBytes
} nvmlGpuInstanceProfileInfo_t;
We suspect that this information isn't meant to be filled in by hand. We should check the source of the GetGpuInstanceProfileInfo API, which retrieves this information (ref: cpp, go, src):
/**
* Get GPU instance profile information
*
* Information provided by this API is immutable throughout the lifetime of a MIG mode.
*
* For Ampere &tm; or newer fully supported devices.
* Supported on Linux only.
*
* @param device The identifier of the target device
* @param profile One of the NVML_GPU_INSTANCE_PROFILE_*
* @param info Returns detailed profile information
*
* @return
* - \ref NVML_SUCCESS Upon success
* - \ref NVML_ERROR_UNINITIALIZED If library has not been successfully initialized
* - \ref NVML_ERROR_INVALID_ARGUMENT If \a device, \a profile or \a info are invalid
* - \ref NVML_ERROR_NOT_SUPPORTED If \a device doesn't support MIG or \a profile isn't supported
* - \ref NVML_ERROR_NO_PERMISSION If user doesn't have permission to perform the operation
*/
nvmlReturn_t DECLDIR nvmlDeviceGetGpuInstanceProfileInfo(nvmlDevice_t device, unsigned int profile,
nvmlGpuInstanceProfileInfo_t *info);
It seems we need to pass one of the NVML_GPU_INSTANCE_PROFILE_* macros as the profile argument. Let's view the source (ref: cpp, go, src):
/**
* GPU instance profiles.
*
* These macros should be passed to \ref nvmlDeviceGetGpuInstanceProfileInfo to retrieve the
* detailed information about a GPU instance such as profile ID, engine counts.
*/
#define NVML_GPU_INSTANCE_PROFILE_1_SLICE 0x0
#define NVML_GPU_INSTANCE_PROFILE_2_SLICE 0x1
#define NVML_GPU_INSTANCE_PROFILE_3_SLICE 0x2
#define NVML_GPU_INSTANCE_PROFILE_4_SLICE 0x3
#define NVML_GPU_INSTANCE_PROFILE_7_SLICE 0x4
#define NVML_GPU_INSTANCE_PROFILE_8_SLICE 0x5
#define NVML_GPU_INSTANCE_PROFILE_6_SLICE 0x6
#define NVML_GPU_INSTANCE_PROFILE_1_SLICE_REV1 0x7
#define NVML_GPU_INSTANCE_PROFILE_2_SLICE_REV1 0x8
#define NVML_GPU_INSTANCE_PROFILE_1_SLICE_REV2 0x9
#define NVML_GPU_INSTANCE_PROFILE_COUNT 0xA
Please note that NVML_GPU_INSTANCE_PROFILE_COUNT is only a trick to obtain the number of profiles; it is not meant to be used as a profile itself.
We can see that our hypothesis is correct based on the comments. We use NVML_GPU_INSTANCE_PROFILE_4_SLICE in our example.
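Putting these two calls together in Go, a sketch of creating a GI might look like the following (error handling simplified; the device is assumed to already have MIG mode enabled, and the profile must be one the device supports):

```go
package main

import "github.com/NVIDIA/go-nvml/pkg/nvml"

// createGpuInstance sketches creating a 4-slice GI on a MIG-enabled device.
func createGpuInstance(device nvml.Device) (nvml.GpuInstance, nvml.Return) {
	// Retrieve the immutable profile info for the 4-slice GI profile.
	giProfileInfo, ret := device.GetGpuInstanceProfileInfo(nvml.GPU_INSTANCE_PROFILE_4_SLICE)
	if ret != nvml.SUCCESS {
		var none nvml.GpuInstance
		return none, ret
	}
	// Create the GPU instance from the retrieved profile info,
	// rather than filling in the struct by hand.
	return device.CreateGpuInstance(&giProfileInfo)
}
```

Note that the profile info struct is passed by reference to CreateGpuInstance, exactly as retrieved, which matches our hypothesis that it is not meant to be constructed manually.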
Similar to the case in creating GIs, we'll need a ComputeInstanceProfileInfo. Let's look at its source (ref: cpp, go, src):
/**
* Compute instance profile information.
*/
typedef struct nvmlComputeInstanceProfileInfo_st
{
unsigned int id; //!< Unique profile ID within the GPU instance
unsigned int sliceCount; //!< GPU Slice count
unsigned int instanceCount; //!< Compute instance count
unsigned int multiprocessorCount; //!< Streaming Multiprocessor count
unsigned int sharedCopyEngineCount; //!< Shared Copy Engine count
unsigned int sharedDecoderCount; //!< Shared Decoder Engine count
unsigned int sharedEncoderCount; //!< Shared Encoder Engine count
unsigned int sharedJpegCount; //!< Shared JPEG Engine count
unsigned int sharedOfaCount; //!< Shared OFA Engine count
} nvmlComputeInstanceProfileInfo_t;
Similarly, let's check the source for GetComputeInstanceProfileInfo API (ref: cpp, go, src):
/**
* Get compute instance profile information.
*
* Information provided by this API is immutable throughout the lifetime of a MIG mode.
*
* For Ampere &tm; or newer fully supported devices.
* Supported on Linux only.
*
* @param gpuInstance The identifier of the target GPU instance
* @param profile One of the NVML_COMPUTE_INSTANCE_PROFILE_*
* @param engProfile One of the NVML_COMPUTE_INSTANCE_ENGINE_PROFILE_*
* @param info Returns detailed profile information
*
* @return
* - \ref NVML_SUCCESS Upon success
* - \ref NVML_ERROR_UNINITIALIZED If library has not been successfully initialized
* - \ref NVML_ERROR_INVALID_ARGUMENT If \a gpuInstance, \a profile, \a engProfile or \a info are invalid
* - \ref NVML_ERROR_NOT_SUPPORTED If \a profile isn't supported
* - \ref NVML_ERROR_NO_PERMISSION If user doesn't have permission to perform the operation
*/
nvmlReturn_t DECLDIR nvmlGpuInstanceGetComputeInstanceProfileInfo(nvmlGpuInstance_t gpuInstance, unsigned int profile,
unsigned int engProfile,
nvmlComputeInstanceProfileInfo_t *info);
We should pass one of the NVML_COMPUTE_INSTANCE_PROFILE_* macros as the first (profile) argument. Let's view the source (ref: cpp, go, src):
/**
* Compute instance profiles.
*
* These macros should be passed to \ref nvmlGpuInstanceGetComputeInstanceProfileInfo to retrieve the
* detailed information about a compute instance such as profile ID, engine counts
*/
#define NVML_COMPUTE_INSTANCE_PROFILE_1_SLICE 0x0
#define NVML_COMPUTE_INSTANCE_PROFILE_2_SLICE 0x1
#define NVML_COMPUTE_INSTANCE_PROFILE_3_SLICE 0x2
#define NVML_COMPUTE_INSTANCE_PROFILE_4_SLICE 0x3
#define NVML_COMPUTE_INSTANCE_PROFILE_7_SLICE 0x4
#define NVML_COMPUTE_INSTANCE_PROFILE_8_SLICE 0x5
#define NVML_COMPUTE_INSTANCE_PROFILE_6_SLICE 0x6
#define NVML_COMPUTE_INSTANCE_PROFILE_1_SLICE_REV1 0x7
#define NVML_COMPUTE_INSTANCE_PROFILE_COUNT 0x8
We use NVML_COMPUTE_INSTANCE_PROFILE_2_SLICE as the first argument in our example. As for the second argument (engProfile), let's also look at the source (ref: cpp, go, src):
#define NVML_COMPUTE_INSTANCE_ENGINE_PROFILE_SHARED 0x0 //!< All the engines except multiprocessors would be shared
#define NVML_COMPUTE_INSTANCE_ENGINE_PROFILE_COUNT 0x1
We can only use NVML_COMPUTE_INSTANCE_ENGINE_PROFILE_SHARED for the second argument in our example, since it is currently the only engine profile defined.
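Analogous to the GI case, a Go sketch of creating a CI on an existing GI (e.g., one returned by CreateGpuInstance) might look like this, with error handling simplified:

```go
package main

import "github.com/NVIDIA/go-nvml/pkg/nvml"

// createComputeInstance sketches creating a 2-slice CI on an existing GI.
func createComputeInstance(gi nvml.GpuInstance) (nvml.ComputeInstance, nvml.Return) {
	// Query the immutable profile info for the 2-slice CI profile, with all
	// non-SM engines shared (currently the only engine profile defined).
	ciProfileInfo, ret := gi.GetComputeInstanceProfileInfo(
		nvml.COMPUTE_INSTANCE_PROFILE_2_SLICE,
		nvml.COMPUTE_INSTANCE_ENGINE_PROFILE_SHARED)
	if ret != nvml.SUCCESS {
		var none nvml.ComputeInstance
		return none, ret
	}
	// Create the compute instance from the retrieved profile info.
	return gi.CreateComputeInstance(&ciProfileInfo)
}
```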
Although CIs within the same GI can currently only share GPU engines (Copy Engine (CE), NVENC, NVDEC, NVJPEG, Optical Flow Accelerator (OFA), etc.), this struct may be extended in the future to support isolating these engines for each CI within the same GI.