AcceleratedKernels.jl: Cross-Architecture Parallel Algorithms | Nicusan | JuliaCon Global 2025

Author: The Julia Programming Language

Uploaded: 2025-12-17

Views: 207

Description:

AcceleratedKernels.jl: Cross-Architecture Parallel Algorithms by Andrei-Leonard Nicusan

PreTalx: https://pretalx.com/juliacon-2025/tal...

In this talk I present AcceleratedKernels.jl, a library that provides a unified interface for writing parallel algorithms in Julia. The library is built on KernelAbstractions.jl, which allows high-level Julia code to be compiled into efficient kernels for a range of hardware. AcceleratedKernels.jl supports both multithreaded CPUs and GPUs from several vendors through their respective backends (Nvidia CUDA, AMD ROCm, Intel oneAPI, Apple Metal) using a single codebase. This design removes the need to write separate code for each target, making it easier for developers to write and maintain high-performance applications.

Key points in the talk include:

Unified Codebase: I will describe how the same Julia user code can be used to produce high-performance kernels for different hardware.
Performance Benchmarks: I will present benchmark results that compare AcceleratedKernels.jl with traditional implementations. Benchmarks for operations like sorting, mapreduce, and arithmetic computations show that the performance of kernels generated by AcceleratedKernels.jl is comparable to that of code written in C with OpenMP (on CPUs) and vendor libraries like Nvidia Thrust (on GPUs). These tests have been run on different architectures, from desktop CPUs to data-center GPUs, and the results demonstrate competitive speed and scalability.
Developer Experience: I will show how to write custom kernels in Julia with minimal changes to existing code, with the aim of writing a user application or library that works transparently across architectures, without special-cased GPU kernels or explicit multithreading. This also allows composable CPU-GPU co-processing across Julia libraries.
Real-World Applications: I will discuss several use cases from scientific computing and industry where the ability to run the same code on different hardware is valuable. Examples include multi-node data sorting and numerical simulations (in particular Lagrangian simulations such as the Discrete Element Method, Molecular Dynamics, or N-body simulations), where parallel execution is critical.
Future Work: I will outline planned improvements for AcceleratedKernels.jl, such as adding automated tuning for algorithm parameters, extending the range of available algorithms, and supporting emerging hardware platforms. I will also discuss how contributions from the community can help shape the future of the library.

AcceleratedKernels.jl was created to simplify parallel programming by reducing the need for hardware-specific code. Instead of writing separate kernels for each target, developers write a single function that runs across all supported devices.

The talk will also include a live demonstration. I will write a simple kernel in Julia and show how it runs on both a CPU and a GPU without any modifications. I will discuss some challenges encountered during development, such as algorithm and interface design choices.
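A minimal sketch of the kind of backend-agnostic kernel the demonstration describes, assuming the `AK.foreachindex` function shown in the AcceleratedKernels.jl README (check the package documentation for its current signature). The function name `scale_add!` and the array sizes are hypothetical, and the commented GPU lines assume CUDA.jl and an Nvidia GPU are available. This is not the code from the talk.

```julia
# Assumption: AcceleratedKernels.jl is installed and exports foreachindex
# as described in its README.
import AcceleratedKernels as AK

# Hypothetical example function: computes y .= a .* x .+ y element by element.
# There is no CPU- or GPU-specific branch; the backend is chosen from the
# array type (multithreaded tasks for an Array, a GPU kernel for a GPU array).
function scale_add!(y, x, a)
    AK.foreachindex(y) do i
        @inbounds y[i] = a * x[i] + y[i]
    end
    return y
end

# CPU run with ordinary Julia arrays
x = ones(Float32, 1_000)
y = fill(2.0f0, 1_000)
scale_add!(y, x, 3.0f0)
println(y[1])   # prints 5.0

# The same call on a GPU (assumes CUDA.jl and an Nvidia GPU):
# using CUDA
# yd = scale_add!(CuArray(fill(2.0f0, 1_000)), CuArray(ones(Float32, 1_000)), 3.0f0)
```

Because the loop body is written once against a generic array, the same `scale_add!` definition can be called with CPU or GPU arrays; only the array construction changes.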

Finally, I will place AcceleratedKernels.jl within the broader Julia ecosystem and show its composability across separate libraries.

In summary, this session provides a detailed overview of AcceleratedKernels.jl, covering its design, performance, and practical applications. Attendees will learn how to write portable parallel code in Julia using a single, unified API and understand the trade-offs involved in cross-architecture programming. This talk is aimed at developers, researchers, and anyone interested in high-performance computing with Julia, and it offers practical insights into writing code that runs efficiently on modern hardware.
