niagara: Simplifying synchronization
Author: Arseny Kapoulkine
Uploaded: 2025-08-23
Views: 1046
In this stream we will work on simplifying our synchronization code following recent Vulkan advances, such as VK_KHR_unified_image_layouts.
Post-stream updates: This may have been a bad idea!
1. On closer inspection, while the performance of the code didn't change significantly after our updates, we do have more L2 invalidations now (on both amdvlk & radv) where they didn't happen before. I'm not sure if this is inherent to the technique or something that can be improved in the drivers; I'm certainly hoping for the latter, as invalidating the L2 cache between dispatches is not always free. This seems to be something that can be improved on RDNA4+ at least, but RDNA2-3 have rare cases where individual images may have L2 coherence problems, and a global barrier is not specific enough to guard against that.
2. After profiling the resulting code on NVIDIA (5070 Ti), it looks like this simplification costs us ~40us per frame :( The cost appears to be flat (resolution-independent) and, surprisingly, mostly concentrated in the buffer barriers that we've upgraded. It doesn't look like a single barrier is responsible for this; rather, every set of buffer barriers we replaced with a global memory barrier costs us ~5us, which adds up over the course of the frame. While 40us is not the end of the world, it's also not nothing, and since the cost scales with frame complexity, this could cost more for more complex renderers, which may invalidate the entire idea unless this can be fixed in future drivers.
3. (not performance related) The syncval issue we were looking into at the very end was in fact a bug in the old code: it specified barriers for the TLAS buffer, but the build/update process requires two buffers, the TLAS buffer and the scratch buffer, and validation layers were correctly flagging that the barrier on the scratch buffer was absent. This is good, because this is exactly the type of scenario our new strategy is designed to make a non-issue, hence the new code just worked.
Timestamps by @cacheman
00:00:00 Intro
00:10:00 Synchronization overview
00:21:30 VK_KHR_unified_image_layouts
00:34:20 Writing code 1
00:52:10 Chat 1
00:57:30 Writing code 2 (converting image barriers)
01:15:54 Benchmark 1
01:31:11 Discussion
01:40:40 Commit 1 (stage barriers)
01:43:20 Discussion: Image Layout
01:51:30 Writing code 3 (image layout)
01:57:00 Benchmark 2
02:00:00 Writing code 4 (depth buffer layout)
02:06:30 Commit 2 (layout conversion)
02:07:40 Writing code 5 (barrier pass)
02:19:18 Benchmark 3
02:21:14 Writing code 6 (barrier pass cont'd)
02:28:00 Benchmark 4
02:30:12 Commit 3 (barriers)
02:32:46 Chat 2
02:37:28 Investigation: Acceleration structure barrier
03:03:30 Commit 4
03:04:48 Recap