Lecture 87: Low Latency Communication Kernels with NVSHMEM
Author: GPU MODE
Uploaded: 2025-12-07
Views: 554
Speaker: Prajwal Singhania
High-performance inference at scale is increasingly bottlenecked by communication, especially in decode-heavy LLM workloads where tensor parallelism dominates.
In this talk, we introduce NVRAR, an NVSHMEM-based all-reduce tailored for inter-node settings.
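For context on why tensor parallelism puts an all-reduce on the decode path, here is a minimal sketch assuming a row-parallel linear layer (the weight's input dimension split across GPUs). It simulates the "ranks" in a single Python process. It is not NVRAR or NVSHMEM code, and the helper names (`matvec`, `shard_rows`, `all_reduce_sum`, `tp_row_parallel`) are hypothetical. Each rank multiplies its slice of the input by its slice of the weight rows, producing a partial output vector. Those partial vectors must be summed across all ranks before the next layer can run, and that sum is exactly what an all-reduce computes.

```python
# Hypothetical illustration: row-parallel matvec on P simulated ranks,
# followed by an all-reduce (sum). Not NVRAR/NVSHMEM code.

def matvec(w, x, n):
    """y = x @ w, where x has length K and w is K rows of length n."""
    y = [0] * n
    for xk, row in zip(x, w):
        for j in range(n):
            y[j] += xk * row[j]
    return y

def shard_rows(w, x, p):
    """Give each of p ranks a contiguous slice of the K (input) dimension."""
    k = len(x)
    bounds = [k * r // p for r in range(p + 1)]
    return [(w[bounds[r]:bounds[r + 1]], x[bounds[r]:bounds[r + 1]])
            for r in range(p)]

def all_reduce_sum(partials):
    """Element-wise sum of every rank's partial vector; every rank gets a copy."""
    total = [sum(vals) for vals in zip(*partials)]
    return [list(total) for _ in partials]

def tp_row_parallel(w, x, p):
    """Each rank computes a partial product, then all ranks all-reduce."""
    n = len(w[0])
    partials = [matvec(ws, xs, n) for ws, xs in shard_rows(w, x, p)]
    return all_reduce_sum(partials)

W = [[1, 0], [0, 1], [1, 1], [2, -1]]  # K=4 inputs, n=2 outputs
X = [1, 2, 3, 4]
print(tp_row_parallel(W, X, 2))  # [[12, 1], [12, 1]]
```

Every rank ends up with the same vector as the unsharded `matvec(W, X, 2)`. No rank can proceed to the next layer until the reduction finishes, so in a real multi-GPU deployment this step recurs for every such layer at every decoding step.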