This script collects information about NVLink and PCI bus traffic on NVIDIA GPUs and publishes the results as Prometheus metrics via a websocket.
Updated Jul 29, 2019 · Python
GPU-native agent-swarm orchestration for the NVIDIA AI stack (NeMo, NIM, Triton, DCGM, NGC, NIXL, OpenShell). Spawn GPU-pinned agent teams across DGX/HGX nodes with NVLink-aware scheduling, task DAGs, adaptive scheduling, and full observability.
Open hardware desktop AI node: 4× Tesla V100, 128GB HBM2, PCIe/NVLink topology and V-Core liquid/air cooling.
A hybrid testbed for evaluating leading open-source LLMs (such as gpt-oss-20b and Llama 3.3) on local and cloud GPUs and on AWS Inferentia2/Trainium instances. It focuses on vLLM optimization, capacity management, kernel bypass, and hardware-software co-design, along with supporting infrastructure such as NCCL, RDMA, and NVMe-oF.
NCCL collective benchmarks on an 8×H100 NVSwitch host: busbw vs. link budget, NVLS/Ring/Tree algorithms, small-message latency floors (eager vs. CUDA Graph vs. symmetric memory), and the tensor-parallel (TP) decode communication ceiling they imply. Includes a quiet-box rerun methodology for attribution.
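The busbw figure in such benchmarks usually follows the convention documented for nccl-tests: algorithm bandwidth (algbw) is message size divided by elapsed time, and bus bandwidth scales it by a per-collective factor (2(n-1)/n for all-reduce over n ranks) so it can be compared directly against per-GPU link bandwidth. The sketch below assumes that convention; the function names and the inputs in the usage note are hypothetical, not results from this repository.

```python
# Per-collective busbw correction factors, as described in the nccl-tests
# performance notes (n = number of ranks).
BUSBW_FACTOR = {
    "all_reduce": lambda n: 2 * (n - 1) / n,
    "reduce_scatter": lambda n: (n - 1) / n,
    "all_gather": lambda n: (n - 1) / n,
    "broadcast": lambda n: 1.0,
}


def algbw_gbps(size_bytes: int, time_s: float) -> float:
    """Algorithm bandwidth in GB/s (1 GB = 1e9 bytes)."""
    if time_s <= 0:
        raise ValueError("time_s must be positive")
    return size_bytes / time_s / 1e9


def busbw_gbps(collective: str, size_bytes: int, time_s: float, nranks: int) -> float:
    """Bus bandwidth: algbw scaled by the collective's correction factor."""
    if nranks < 1:
        raise ValueError("nranks must be at least 1")
    return algbw_gbps(size_bytes, time_s) * BUSBW_FACTOR[collective](nranks)


if __name__ == "__main__":
    # Hypothetical measurement: 1e9-byte all-reduce in 4 ms across 8 GPUs.
    print(busbw_gbps("all_reduce", 10**9, 0.004, 8))
```

With those hypothetical inputs, algbw is about 250 GB/s and the all-reduce factor for 8 ranks is 1.75, giving a busbw of about 437.5 GB/s. The factor accounts for each rank sending and receiving roughly 2(n-1)/n times the buffer size in a ring all-reduce, which is why busbw rather than algbw is the number to hold up against the link budget.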