GPU Pipeline Optimization Explained | Async UDFs, CUDA Streams & Pinned Memory
Author: Daft Engine
Uploaded: 2025-09-10
Views: 625
🖥️ Whiteboard Deep Dive into GPU Pipeline Optimization
In this deep dive, Srinu Lade (srinivas-lade), a software engineer working on Daft’s execution engine, breaks down how to optimize GPU pipelines for ML and multimodal data processing. Using architectural diagrams, he explains why sequential CPU→GPU execution creates bottlenecks and how techniques like async UDFs, CUDA streams, and pinned memory unlock parallelism.
What you’ll learn:
How GPU workloads flow: host↔device transfers, VRAM, kernel execution
Why Python UDFs are a bottleneck, and how async execution improves throughput (see the first sketch below)
Using CUDA streams to overlap transfers and compute for better utilization (second sketch below)
How GPU internals (H2D/D2H engines + compute units) enable pipeline parallelism
Reducing OS overhead by reusing pinned memory in PyTorch workflows (third sketch below)
How Daft abstracts these optimizations into a high-level API for data/ML engineers (final sketch below)
Our aim is to abstract away these low-level complexities and provide a high-level API in Daft that delivers optimized GPU execution out-of-the-box for ML workloads.
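To make the async-UDF point concrete, here is a minimal sketch (not code from the talk; names like fetch_and_decode are illustrative) of why overlapping I/O-bound UDF calls with asyncio raises throughput compared with running them one row at a time:

import asyncio
import time

async def fetch_and_decode(url: str) -> bytes:
    # Stand-in for an I/O-bound UDF body (e.g. downloading an image).
    await asyncio.sleep(0.1)  # simulate network latency
    return url.encode()

async def sequential(urls):
    # One call at a time: total latency ~= 0.1s * len(urls).
    return [await fetch_and_decode(u) for u in urls]

async def concurrent(urls):
    # All calls overlap: total latency ~= 0.1s regardless of len(urls).
    return await asyncio.gather(*(fetch_and_decode(u) for u in urls))

if __name__ == "__main__":
    urls = [f"https://example.com/img/{i}.jpg" for i in range(20)]
    t0 = time.perf_counter()
    asyncio.run(sequential(urls))
    print(f"sequential: {time.perf_counter() - t0:.2f}s")  # ~2.0s
    t0 = time.perf_counter()
    asyncio.run(concurrent(urls))
    print(f"concurrent: {time.perf_counter() - t0:.2f}s")  # ~0.1s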
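Next, a minimal PyTorch sketch of the CUDA-streams idea: one stream feeds the H2D copy engine while another feeds the compute units, so the next batch’s transfer overlaps with the current batch’s matmul. This is a generic single-GPU illustration, not Daft’s internal implementation:

import torch

assert torch.cuda.is_available(), "requires a CUDA-capable GPU"
device = torch.device("cuda")

copy_stream = torch.cuda.Stream()     # feeds the H2D copy engine
compute_stream = torch.cuda.Stream()  # feeds the compute units

# Pinned (page-locked) host buffers are required for truly async H2D copies.
batches = [torch.randn(1024, 1024, pin_memory=True) for _ in range(8)]
weights = torch.randn(1024, 1024, device=device)

results = []
for batch in batches:
    with torch.cuda.stream(copy_stream):
        gpu_batch = batch.to(device, non_blocking=True)
    # Compute waits only for this batch's copy, so the next batch's
    # transfer can start while this matmul is still running.
    compute_stream.wait_stream(copy_stream)
    with torch.cuda.stream(compute_stream):
        gpu_batch.record_stream(compute_stream)  # inform the caching allocator of cross-stream use
        results.append(gpu_batch @ weights)

torch.cuda.synchronize()  # drain all queued work before reading results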
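And a sketch of pinned-memory reuse: because page-locking host memory is an expensive OS call, allocate pinned staging buffers once and recycle them across batches. It is double-buffered here so refilling a buffer never races with its in-flight DMA transfer; again an illustrative pattern, not Daft’s code:

import torch

assert torch.cuda.is_available(), "requires a CUDA-capable GPU"
device = torch.device("cuda")
shape = (1024, 1024)

# Allocate pinned staging buffers once up front and reuse them for every
# batch. Two buffers alternate so the CPU can refill one while the
# other's asynchronous DMA transfer is still in flight.
staging = [torch.empty(shape, pin_memory=True) for _ in range(2)]
copied = [torch.cuda.Event() for _ in range(2)]
weights = torch.randn(*shape, device=device)

for i in range(8):
    buf, ev = staging[i % 2], copied[i % 2]
    if i >= 2:
        ev.synchronize()               # this buffer's previous DMA must finish first
    buf.copy_(torch.randn(shape))      # cheap host-to-host copy into pinned memory
    gpu_batch = buf.to(device, non_blocking=True)
    ev.record()                        # fires once the async H2D copy completes
    result = gpu_batch @ weights       # queued behind the copy on the same stream
torch.cuda.synchronize()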
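Finally, the payoff of that abstraction: with Daft’s @daft.udf decorator you write a plain batch function and the engine handles batching and scheduling. A minimal CPU-only example of the documented UDF API (the GPU pipelining described above happens inside the engine, not in user code):

import daft

@daft.udf(return_dtype=daft.DataType.int64())
def text_length(texts: daft.Series) -> list[int]:
    # A batch UDF: receives a whole column of values, returns one result
    # per row. The engine decides how batches are scheduled and executed.
    return [len(t) for t in texts.to_pylist()]

df = daft.from_pydict({"text": ["gpu", "pipeline", "optimization"]})
df = df.with_column("length", text_length(df["text"]))
print(df.collect())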
—
Daft. Simple and reliable data processing for any modality and scale.
Explore → https://daft.ai/
Build → https://docs.daft.ai/
Connect → https://www.daft.ai/slack
Contribute → https://github.com/Eventual-Inc/Daft
Learn → https://daft.ai/blog
pip install daft