gpu: transpose patches faster by a10y · Pull Request #6829 · vortex-data/vortex

a10y · 2026-03-06T16:50:04Z

Eliminate all of the wasteful allocating and deallocating, and replace it with a 3-pass algorithm that reuses the buffers.

Drops the CPU transpose time by a factor of 5-10 in my benchmarks on the other branch.

Tested using the existing test suite.
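
For readers who have not seen the diff, the 3-pass idea described above can be sketched roughly as a counting-sort-style scatter: count the patches per bucket, prefix-sum the counts into offsets, then write each index/value straight into its final slot. This is only an illustration under assumptions, not the code in this PR: the function name `transpose_patches_sketch`, the `(chunk, lane)` bucketing, and the `chunk_size`/`lanes` parameters are all hypothetical, and the real patch layout in vortex may differ.

```rust
/// Hypothetical sketch, NOT the vortex implementation.
/// Groups sparse (index, value) patches into (chunk, lane) buckets in three
/// passes, with one allocation per output buffer and no per-bucket Vecs.
/// Returns (bucket offsets, within-chunk indices, values).
pub fn transpose_patches_sketch(
    indices: &[u32],
    values: &[i64],
    len: usize,        // logical length of the patched array (assumed known)
    chunk_size: usize, // assumed chunk width
    lanes: usize,      // assumed number of lanes per chunk
) -> (Vec<usize>, Vec<u32>, Vec<i64>) {
    assert_eq!(indices.len(), values.len());
    assert!(chunk_size > 0 && lanes > 0);

    let num_chunks = (len + chunk_size - 1) / chunk_size;
    let num_buckets = num_chunks * lanes;
    // Assumed bucketing rule: one bucket per (chunk, lane) pair.
    let bucket_of = |i: usize| (i / chunk_size) * lanes + (i % lanes);

    // Pass 1: count how many patches land in each bucket.
    let mut offsets = vec![0usize; num_buckets + 1];
    for &idx in indices {
        let i = idx as usize;
        assert!(i < len, "patch index out of bounds");
        offsets[bucket_of(i) + 1] += 1;
    }

    // Pass 2: prefix sum, so offsets[b] is where bucket b starts.
    for b in 0..num_buckets {
        offsets[b + 1] += offsets[b];
    }

    // Pass 3: scatter every patch directly into its final position.
    let mut cursor = offsets[..num_buckets].to_vec();
    let mut out_indices = vec![0u32; indices.len()];
    let mut out_values = vec![0i64; values.len()];
    for (&idx, &val) in indices.iter().zip(values) {
        let i = idx as usize;
        let b = bucket_of(i);
        out_indices[cursor[b]] = (i % chunk_size) as u32;
        out_values[cursor[b]] = val;
        cursor[b] += 1;
    }

    (offsets, out_indices, out_values)
}
```

With five patches at indices `[0, 1, 4, 5, 9]`, `len = 16`, `chunk_size = 8` and `lanes = 2`, the sketch yields offsets `[0, 2, 4, 4, 5]`, so the third bucket is empty and the patch at index 9 sits alone in the last one. A caller that wanted to reuse buffers across calls, as the description mentions, could pass the output vectors in and clear them instead of allocating.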

codspeed-hq · 2026-03-06T16:58:43Z

Merging this PR will improve performance by 23.62%

⚡ 3 improved benchmarks
✅ 391 untouched benchmarks
⏩ 2052 skipped benchmarks¹

Performance Changes

| | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | `take_map[(0.1, 1.0)]` | 4.2 ms | 3.5 ms | +20.8% |
| ⚡ | Simulation | `take_map[(0.1, 0.5)]` | 2.6 ms | 2.1 ms | +23.62% |
| ⚡ | Simulation | `take_map[(0.1, 0.1)]` | 1,007.5 µs | 908.7 µs | +10.88% |

Comparing `transpose-patches-fix` (c9e1e79) with `develop` (5d6a3c8)²

¹ 2052 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived to remove them from the performance reports.

² No successful run was found on `develop` (761c404) during the generation of this report, so 5d6a3c8 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

a10y added the changelog/performance label (A performance improvement) Mar 6, 2026

gpu: transpose patches 4x faster

c9e1e79

Eliminate all of the wasteful allocation, replace it with a 3-pass algorithm that inserts indices/values directly into their final positions.

Signed-off-by: Andrew Duffy <andrew@a10y.dev>

a10y force-pushed the transpose-patches-fix branch from 4d3533b to c9e1e79 Compare March 6, 2026 16:52

a10y marked this pull request as ready for review March 6, 2026 17:18

a10y requested a review from 0ax1 March 6, 2026 17:23

a10y wants to merge 1 commit into develop from transpose-patches-fix