# NVIDIA Resources
For topics you wish to explore further, start with the official documentation, such as the CUDA C++ Programming Guide if you're aiming to become a "CUDA Ninja". Official documentation can be hard to digest, though; in that case, a Google search for the topic together with "NVIDIA Blog" or "GTC talk" often turns up excellent presentations and discussions. Prioritize blogs and talks from recent years, as they are more likely to cover the latest technologies.
The following are reference materials for further study.
## NVIDIA's Latest Hardware
- GB200 NVL72
- Grace Hopper & NVLink-C2C & Memory Coherent Architecture
- GPUDirect Storage & RDMA
- DGX SuperPOD
## CUDA & Optimization
- CUDA C++ Programming Guide
- [GitHub] NVIDIA/cuda-samples
- Coalesced Memory Access
- Shared Memory Bank Conflict
- CUDA Streams
- [Blog] How to Optimize Data Transfers in CUDA C/C++
- [Blog] How to Overlap Data Transfers in CUDA C/C++
- [Blog] GPU Pro Tip: CUDA 7 Streams Simplify Concurrency
- [Blog] Using the NVIDIA CUDA Stream-Ordered Memory Allocator, Part 1
- [Blog] Using the NVIDIA CUDA Stream-Ordered Memory Allocator, Part 2
- 9.2.1.4. Streams and Events | CUDA C++ Programming Guide
- 11. Stream Ordered Memory Allocator | CUDA C++ Programming Guide
- Atomics
- Vectorized Memory Access
- Warp-Level Primitives
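Several of the links above (CUDA streams, pinned memory, overlapping transfers) describe the same core pattern. As a minimal sketch of that pattern, not a tuned implementation: split the data into chunks, give each chunk its own stream, and let copies and kernels from different streams overlap. Buffer sizes and the `scale` kernel here are illustrative; error checking is omitted.

```cuda
// Sketch: overlapping H2D copies, kernel work, and D2H copies across
// CUDA streams. Pinned (page-locked) host memory is required for
// cudaMemcpyAsync to overlap with computation.
#include <cuda_runtime.h>

__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int nStreams = 4, chunk = 1 << 20, n = nStreams * chunk;
    float *h, *d;
    cudaMallocHost(&h, n * sizeof(float));  // pinned host buffer
    cudaMalloc(&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaStream_t streams[nStreams];
    for (int s = 0; s < nStreams; ++s) cudaStreamCreate(&streams[s]);

    // Each stream handles one chunk: copy in, compute, copy out.
    // Work queued in different streams may overlap on the hardware.
    for (int s = 0; s < nStreams; ++s) {
        int off = s * chunk;
        cudaMemcpyAsync(d + off, h + off, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        scale<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(d + off, chunk, 2.0f);
        cudaMemcpyAsync(h + off, d + off, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    for (int s = 0; s < nStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFreeHost(h);
    cudaFree(d);
    return 0;
}
```

The blog posts "How to Optimize/Overlap Data Transfers in CUDA C/C++" linked above walk through exactly why the pinned allocation and per-stream chunking matter, and show the resulting timelines in the profiler.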
## Advanced CUDA
- Fat Binary & PTX (Parallel Thread Execution) & SASS (Streaming Assembly)
- Cooperative Groups (CG)
- LDGSTS
Requires the Ampere architecture or later.
- [Blog] NVIDIA Ampere Architecture In-Depth
- [Blog] Controlling Data Movement to Boost Performance on the NVIDIA Ampere Architecture
- [Blog] Boosting Application Performance with GPU Memory Prefetching
- 7.26. Asynchronous Barrier | CUDA C++ Programming Guide
- 7.27. Asynchronous Data Copies | CUDA C++ Programming Guide
- 7.28. Asynchronous Data Copies using cuda::pipeline | CUDA C++ Programming Guide
- Hopper Architecture & Thread Block Cluster & Distributed Shared Memory (DSMEM) & Tensor Memory Accelerator (TMA)
Requires the Hopper architecture or later.
- CUDA Graphs
- [Blog] Getting Started with CUDA Graphs
- [Blog] Employing CUDA Graphs in a Dynamic Environment
- [Blog] Constructing CUDA Graphs with Dynamic Parameters
- [Blog] Enabling Dynamic Control Flow in CUDA Graphs with Device Graph Launch
- [Blog] Dynamic Control Flow in CUDA Graphs with Conditional Nodes
- [Blog] Optimizing llama.cpp AI Inference with CUDA Graphs
- 3.2.8.7. CUDA Graphs | CUDA C++ Programming Guide
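The simplest way into CUDA Graphs, covered in "Getting Started with CUDA Graphs" above, is stream capture: record a repeated launch sequence once, then replay it with a single `cudaGraphLaunch` per iteration to amortize launch overhead. A minimal sketch, assuming CUDA 12+ for the three-argument `cudaGraphInstantiate` signature; the `addOne` kernel and iteration counts are illustrative:

```cuda
#include <cuda_runtime.h>

__global__ void addOne(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Capture a short sequence of kernel launches into a graph once...
    cudaGraph_t graph;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    for (int i = 0; i < 10; ++i)
        addOne<<<(n + 255) / 256, 256, 0, stream>>>(d, n);
    cudaStreamEndCapture(stream, &graph);

    cudaGraphExec_t exec;
    cudaGraphInstantiate(&exec, graph, 0);  // CUDA 12+ signature

    // ...then replay it many times with one launch call per iteration,
    // amortizing per-kernel launch overhead.
    for (int iter = 0; iter < 100; ++iter)
        cudaGraphLaunch(exec, stream);
    cudaStreamSynchronize(stream);

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    cudaFree(d);
    return 0;
}
```

The "Dynamic Parameters" and "Conditional Nodes" posts linked above pick up where this leaves off: updating an instantiated graph in place instead of re-capturing it.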
## NVIDIA HPC SDK
- ISO C++/Fortran, OpenACC/OpenMP, CUDA
- Python
- [Blog] RAPIDS cuDF Accelerates pandas Nearly 150x with Zero Code Changes
- [Blog] Faster HDBSCAN Soft Clustering with RAPIDS cuML
- [Blog] Unifying the CUDA Python Ecosystem
- [Blog] Effortlessly Scale NumPy from Laptops to Supercomputers with NVIDIA cuPyNumeric
- [Blog] Creating Differentiable Graphics and Physics Simulation in Python with NVIDIA Warp
- NVIDIA CUDA-X Libraries
- CUDA C++ Core Libraries and stdexec
- NVSHMEM
- NVIDIA Collective Communications Library (NCCL)
- NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)
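For a feel of what NCCL usage looks like, here is a hedged sketch of a single-process, multi-GPU sum all-reduce using `ncclCommInitAll` and `ncclAllReduce`; buffer contents and sizes are illustrative and error checking is omitted:

```cuda
#include <nccl.h>
#include <cuda_runtime.h>
#include <vector>

int main() {
    int nDev = 0;
    cudaGetDeviceCount(&nDev);

    // One communicator per visible GPU, all in this process.
    std::vector<ncclComm_t> comms(nDev);
    ncclCommInitAll(comms.data(), nDev, nullptr);  // devices 0..nDev-1

    const int n = 1 << 20;
    std::vector<float*> buf(nDev);
    std::vector<cudaStream_t> streams(nDev);
    for (int d = 0; d < nDev; ++d) {
        cudaSetDevice(d);
        cudaMalloc(&buf[d], n * sizeof(float));
        cudaStreamCreate(&streams[d]);
    }

    // Sum-reduce each rank's buffer in place across all GPUs.
    // The group calls batch the per-device launches together.
    ncclGroupStart();
    for (int d = 0; d < nDev; ++d)
        ncclAllReduce(buf[d], buf[d], n, ncclFloat, ncclSum,
                      comms[d], streams[d]);
    ncclGroupEnd();

    for (int d = 0; d < nDev; ++d) {
        cudaSetDevice(d);
        cudaStreamSynchronize(streams[d]);
        cudaStreamDestroy(streams[d]);
        cudaFree(buf[d]);
        ncclCommDestroy(comms[d]);
    }
    return 0;
}
```

Multi-node setups replace `ncclCommInitAll` with `ncclCommInitRank` plus an out-of-band exchange of the NCCL unique ID; the GTC talk "Multi GPU Programming Models for HPC and AI" listed below compares this with NVSHMEM and MPI-based approaches.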
## Dev Tools
- Compute Sanitizer
- Nsight Systems/Compute/Graphics & Nsight Visual Studio Code Edition
- Containerization
## Higher-Level SDKs
- NVIDIA NeMo
- NVIDIA Omniverse & Isaac
- NVIDIA PhysicsNeMo (formerly NVIDIA Modulus)
## Other GTC Talks
- GTC 24 - Advanced Performance Optimization in CUDA - S62192
- GTC 24 - CUDA, New Features and Beyond - S62400
- GTC 24 - How To Write A CUDA Program, The Ninja Edition - S62401
- GTC 24 - Introduction to CUDA Programming and Performance Optimization - S62191
- GTC 24 - Multi GPU Programming Models for HPC and AI - S61339
- GTC 24 - Warp, Advancing Simulation AI with Differentiable GPU Computing in Python - S63345
- GTC Spring 23 - Connect with the Experts, C++ Standard Parallelism and C++ Core Compute Libraries - CWES52064
- GTC Spring 23 - C++ Standard Parallelism - S51755
- GTC Spring 23 - CUDA, New Features and Beyond - S51225
- GTC Spring 23 - Robust and Efficient CUDA C++ Concurrency with Stream-Ordered Allocation - S51897
- GTC Spring 22 - C++ Standard Parallelism - S41960
- GTC Spring 22 - How CUDA Programming Works - S41487
- GTC Spring 22 - How to Understand and Optimize Shared Memory Accesses using Nsight Compute - S41723
- GTC Fall 21 - Accelerate Computing with CUDA Python - A31138
- GTC Fall 21 - GPU Acceleration in Python using CuPy and Numba - A31149
- GTC Fall 21 - Legate, Scaling the Python Ecosystem - A31168
- GTC Spring 21 - How GPU Computing Works - S31151
## Epilogue
Last updated on 2024-12-08.