WebGPU: add memory aliasing for intermediate tensor buffers #19305
digantdesai wants to merge 6 commits into main from …
Conversation
The export pipeline already runs a greedy memory planning pass that assigns mem_obj_id to tensors with non-overlapping lifetimes, but the WebGPU runtime was ignoring it and allocating a dedicated WGPUBuffer per tensor. Read mem_obj_id from the flatbuffer during graph build. Tensors sharing the same mem_obj_id now share a single WGPUBuffer sized to the largest user. Constants and tensors without a mem_obj_id still get dedicated buffers. Adds a chained-add native test (z=x+y; z=z+x; z=z+y) that verifies both correctness and that memory aliasing produces savings (~20% for this model). Co-authored with Claude.
Replace the silent `nbytes > 0 ? nbytes : 4` fallback pattern with ET_CHECK_MSG assertions. If a zero-byte tensor reaches buffer creation, we want to know immediately rather than silently creating a dummy 4-byte buffer that masks the issue. Co-authored with Claude.
Invert the condition to eliminate the empty if-body with a comment. Co-authored with Claude.
Export and run the chained-add memory aliasing test in test_build_webgpu.sh so it runs automatically instead of requiring a manual WEBGPU_TEST_CHAINED_MODEL env var. Co-authored with Claude.
Longer chain produces more intermediates, giving the memory planner more opportunity to alias buffers. Expected output: 3x + 3y. Co-authored with Claude.
Fix: if a constant tensor has mem_obj_id >= 0, force it to -1 so the dedicated buffer path and the destructor stay consistent. Previously the buffer would leak and get overwritten by the shared buffer pass. Also make the chained-add test actually fail when aliasing is absent instead of just printing informational messages. Co-authored with Claude.
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19305
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures, 2 Cancelled Jobs as of commit 55ceea4 with merge base cf01617.
NEW FAILURES - The following jobs have failed: …
CANCELLED JOBS - The following jobs were cancelled. Please retry: …
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@claude review this PR
This PR needs a …
Claude finished @digantdesai's task in 1m 37s — View job

Review: WebGPU memory aliasing for intermediate tensor buffers

Overall: Clean, well-structured implementation. The approach of using ETVK's `mem_obj_id` …

Issues

1. Global mutable state for memory stats (…)

```cpp
static WebGPUMemoryStats s_last_memory_stats;
```

This is a process-wide global that gets overwritten on every …

2. Destructor correctness — shared buffer double-release guard (…)

```cpp
for (size_t i = 0; i < tensors_.size(); i++) {
  if (tensors_[i].buffer &&
      (i >= tensor_mem_obj_ids_.size() || tensor_mem_obj_ids_[i] < 0)) {
    wgpuBufferRelease(tensors_[i].buffer);
  }
}
```

The guard …

3. `total_bytes()`

```cpp
size_t total_bytes() const {
  return shared_buffer_bytes + unshared_tensor_buffer_bytes +
      staging_buffer_bytes + uniform_buffer_bytes;
}
```

This is correct (it equals …).

Minor Suggestions

4. Test assertion could verify specific aliasing count (…). The test asserts …

5. The export functions are defined but only callable programmatically or via the test script's …

What looks good

Verdict: LGTM with minor suggestions. The core aliasing logic is correct and well-tested. The global stats accessor is the only thing I'd recommend addressing before merge (even if just renaming to clarify it's test-only).
Use ETVK's mem_obj_id for the WebGPU runtime to implement memory aliasing