vulkan: add pipeline barriers for memcpy read/write operations#23770
vulkan: add pipeline barriers for memcpy read/write operations#237700cc4m wants to merge 2 commits into
Conversation
| std::lock_guard<std::recursive_mutex> guard(src->device->mutex); | ||
| vk_context subctx = ggml_vk_create_temporary_context(src->device->compute_queue.cmd_pool); | ||
| ggml_vk_ctx_begin(src->device, subctx); | ||
| subctx->s->buffer->buf.pipelineBarrier( |
There was a problem hiding this comment.
Can we do this one conditionally like you have with host_write_dirty?
There was a problem hiding this comment.
How would that work? All write operations and all shaders write to buffers, so unless we track it on a buffer level in some way, it would always trigger, would it not?
There was a problem hiding this comment.
Just want to avoid back to back reads causing extra barriers and submits. So we could mark it dirty in graph_compute and then reset it here after we do the barrier.
There was a problem hiding this comment.
I guess I'm not totally sure that would be reliable e.g. in a multithreaded situation. So maybe this is fine. But I think we also need the barrier in the non-UMA path?
There was a problem hiding this comment.
I think a simple dirty flag would be difficult to get right with edge cases. Something like an atomic submission/write counter might work, but add some complexity. To unblock #22930, I think it's best to merge this solution and improve it as a follow-up PR.
There was a problem hiding this comment.
Do we need the same barrier in the else path?
There was a problem hiding this comment.
Sorry, I forgot to respond to that. I thought we only need pipeline barriers for "mixed access" gpu buffers, in this case due to the memcpy reading from the GPU buffer that shaders wrote to. If we use GPU commands to copy from there into a staging buffer, that's still GPU access. I guess that leaves whether host buffers need pipeline barriers, but I thought not.
There was a problem hiding this comment.
We may get lucky and transfer engines don't need the barrier in practice, but I think in terms of spec behavior we're supposed to do it. But if we've never seen any corruption in that path, I guess I'm OK with omitting it for now.
f0e0f66 to
0a1ebc5
Compare
Overview
Insert Vulkan pipeline barriers before/after memcpy operations for host-visible memory. This resolves the Intel read issue in #22930, and I also added the opposite direction barriers for writes. I am not sure if this is the best way. @jeffbolznv @rillomas
Requirements