vulkan: fix UMA performance by preferring cached host memory and handling non…#23762
vulkan: fix UMA performance by preferring cached host memory and handling non…#23762winstonma wants to merge 1 commit into
Conversation
…-coherent flushing
|
It should supersede #22930. The result from the benchmark script shows that the Coherent flag would create write-combining memory which hurt the read speed. |
|
This needs to be justified by real model performance, not a synthetic test. We shouldn't be reading from write combined memory. |
|
That doesn't change anything. As far as I know, cached memory is supposed to be used for outputs, where the GPU calculates something that gets transferred to CPU afterwards. It is possible using it everywhere reduces GPU access performance in general. Additionally, you just tested a single iGPU. Even on one that doesn't even have any non-coherent cached memory type. |
|
Thanks, that makes sense. I agree the current change could be too broad for the evidence I have so far. The Right now the changes the default buffer preference for UMA devices in a way that could help my small-write case while still risking regressions for general GPU access or for other devices and workloads that I couldn't tested. I totally agree that the change needs validation on more hardware. In addition I’m not trying to claim the response above is the generally correct policy for all Vulkan UMA devices. I’m offering this as a counterpoint to the current direction, based on one observed workload/device, to show there may be cases where the opposite tradeoff is preferable. If the |

Overview
The original code was potentially allocating write-combining (WC) memory for device buffers, which is fast to write but painfully slow to read back.
In addition, in ggml_vk_create_buffer, memory_property_flags was being set to the requested flags rather than what was actually allocated. This PR sets memory_property_flags from what the driver actually gave.
Additional information
I use this benchmark script to test
ggml_backend_tensor_setandggml_backend_tensor_geton my AMD UMA device.Before the patch
Patch is applied:
Requirements