niagara: Simplifying synchronization

Author: Arseny Kapoulkine

Uploaded: 2025-08-23

Views: 1046

Description:

In this stream we will work on simplifying our synchronization code following recent Vulkan advances, such as VK_KHR_unified_image_layouts.

Post-stream updates: This may have been a bad idea!

1. On closer inspection, while the performance of the code didn't change significantly after our updates, we now see more L2 invalidations (on both amdvlk and radv) where none happened before. I'm not sure whether this is inherent to the technique or something that can be improved in the drivers; I'm certainly hoping for the latter, as invalidating the L2 cache between dispatches is not always free. This seems to be something that can be improved on RDNA4+ at least, but RDNA2-3 has rare cases where individual images may have L2 coherence problems, and a global barrier is not specific enough to guard against that.

2. After profiling the resulting code on NVIDIA (5070 Ti), it looks like this simplification costs us ~40us per frame :( The cost appears to be flat (resolution-independent) and, surprisingly, mostly concentrated in the buffer barriers that we've upgraded. No single barrier appears to be responsible; rather, every set of buffer barriers we replaced with a global memory barrier costs us ~5us, which adds up over the course of the frame. While 40us is not the end of the world, it's also not nothing, and since the cost scales with frame complexity, more complex renderers could pay more, which may invalidate the entire idea unless this can be fixed in future drivers.

3. (not performance related) The syncval issue we were looking into at the very end was in fact a bug in the code: the code specified barriers for the TLAS buffer, but the build/update process requires two buffers, the TLAS buffer and the scratch buffer, and the validation layers were correctly flagging that the barrier on the scratch buffer was absent. This is good news, because this is exactly the type of scenario our new strategy is designed to make a non-issue, hence the new code just worked.
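
The scratch-buffer bug in item 3 can be illustrated with a toy model. The sketch below is plain C++ with no Vulkan calls. The `HazardTracker` type, its methods, and the resource names "tlas" and "scratch" are all hypothetical and invented for illustration. It is not how syncval works internally. It only shows why a barrier that lists individual buffers can miss one, while a global memory barrier cannot.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy hazard tracker (hypothetical, for illustration only): remembers which
// resources have writes that no barrier has covered yet.
struct HazardTracker {
    std::set<std::string> pendingWrites;

    // Record that a command wrote to the given resources.
    void write(const std::vector<std::string>& resources) {
        pendingWrites.insert(resources.begin(), resources.end());
    }

    // Per-buffer barrier: only the listed resources are synchronized.
    void bufferBarrier(const std::vector<std::string>& resources) {
        for (const std::string& r : resources)
            pendingWrites.erase(r);
    }

    // Global memory barrier: every pending write is synchronized.
    void globalBarrier() {
        pendingWrites.clear();
    }

    // Returns the accessed resources that still have uncovered writes.
    std::vector<std::string> hazards(const std::vector<std::string>& accessed) const {
        std::vector<std::string> result;
        for (const std::string& r : accessed)
            if (pendingWrites.count(r))
                result.push_back(r);
        return result;
    }
};

// Old strategy: the barrier names the TLAS buffer but forgets the scratch buffer.
std::vector<std::string> hazardsWithTlasOnlyBarrier() {
    HazardTracker t;
    t.write({"tlas", "scratch"});    // build touches both buffers
    t.bufferBarrier({"tlas"});       // scratch buffer is not listed
    return t.hazards({"tlas", "scratch"});
}

// New strategy: one global barrier covers whatever the build touched.
std::vector<std::string> hazardsWithGlobalBarrier() {
    HazardTracker t;
    t.write({"tlas", "scratch"});
    t.globalBarrier();
    return t.hazards({"tlas", "scratch"});
}
```

In this model, `hazardsWithTlasOnlyBarrier()` reports the scratch buffer as unsynchronized, while `hazardsWithGlobalBarrier()` reports nothing. The trade-off, per items 1 and 2, is that the global barrier may cost more on some drivers.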

Timestamps by @cacheman

00:00:00 Intro
00:10:00 Synchronization overview
00:21:30 VK_KHR_unified_image_layouts
00:34:20 Writing code 1
00:52:10 Chat 1
00:57:30 Writing code 2 (converting image barriers)
01:15:54 Benchmark 1
01:31:11 Discussion
01:40:40 Commit 1 (stage barriers)
01:43:20 Discussion: Image Layout
01:51:30 Writing code 3 (image layout)
01:57:00 Benchmark 2
02:00:00 Writing code 4 (depth buffer layout)
02:06:30 Commit 2 (layout conversion)
02:07:40 Writing code 5 (barrier pass)
02:19:18 Benchmark 3
02:21:14 Writing code 6 (barrier pass cont'd)
02:28:00 Benchmark 4
02:30:12 Commit 3 (barriers)
02:32:46 Chat 2
02:37:28 Investigation: Acceleration structure barrier
03:03:30 Commit 4
03:04:48 Recap
