Use memcpy in uninitialized_meow #6161

Open
AlexGuteniev wants to merge 3 commits into microsoft:main from AlexGuteniev:copyum
@AlexGuteniev
Contributor

Resolves #6029

memcpy behaves the same as memmove when it is emitted as an ordinary library call.

However, the compiler can apply intrinsic optimizations to memcpy with a known size, but not to memmove.
These optimizations happen when the size is known at compile time (a compile-time constant or a propagated constant) and is small enough (the boundary is on the order of a kilobyte).
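
To make the distinction concrete, here is a minimal sketch of the two calls being compared. The names `small_size`, `copy_small_memcpy`, and `copy_small_memmove` are hypothetical and do not come from this PR. Whether a given compiler actually expands the memcpy call inline depends on the compiler and its optimization settings, so treat the comments as a description of what is permitted, not of guaranteed codegen.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical illustration only; not code from the PR.
constexpr std::size_t small_size = 32; // compile-time constant, far below ~1 KB

// The size is a constant and memcpy requires non-overlapping ranges, so a
// compiler may replace this call with a few wide loads and stores.
inline void copy_small_memcpy(std::uint8_t* dest, const std::uint8_t* src) {
    std::memcpy(dest, src, small_size);
}

// Same constant size, but memmove must also be correct for overlapping
// ranges, which is the case the PR description says is not optimized as an intrinsic.
inline void copy_small_memmove(std::uint8_t* dest, const std::uint8_t* src) {
    std::memmove(dest, src, small_size);
}
```

Both functions produce the same result for non-overlapping buffers; the difference the PR relies on is only in what the optimizer is allowed to assume.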

Uninitialized algorithms require non-overlapping ranges: see [uninitialized.copy]/1 and the corresponding paragraphs for the others. It would be strange for initialized memory and uninitialized memory to overlap.
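
As an illustration of why overlap does not arise in practice, the sketch below copies from an initialized array into freshly allocated raw storage with `std::uninitialized_copy`. The function name `sum_after_uninitialized_copy` is hypothetical; the point is only that the source holds live objects and the destination does not, so the two ranges are distinct.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical helper for illustration; not part of the PR.
inline unsigned sum_after_uninitialized_copy() {
    const std::uint8_t source[4] = {1, 2, 3, 4}; // initialized objects

    std::allocator<std::uint8_t> alloc;
    std::uint8_t* dest = alloc.allocate(4);      // raw, uninitialized storage

    // Source and destination cannot overlap: one range holds live objects,
    // the other is storage in which no objects exist yet.
    std::uninitialized_copy(source, source + 4, dest);

    unsigned sum = 0;
    for (int i = 0; i < 4; ++i) {
        sum += dest[i];
    }

    // std::uint8_t is trivially destructible, so no destructor calls are needed.
    alloc.deallocate(dest, 4);
    return sum;
}
```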

There were no _Copy_memcpy / _Copy_memcpy_n helpers, so I've added them. They might be useful elsewhere, since some other algorithms also have a non-overlap guarantee, but for now let's focus on the uninitialized algorithms.
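
For readers who have not seen the diff, the following is a rough sketch of what helpers of this shape could look like. It is an assumption made for illustration, not the PR's actual implementation: the names `copy_memcpy_sketch` and `copy_memcpy_n_sketch`, their signatures, and the restriction to raw pointers are all hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical sketch; the real _Copy_memcpy / _Copy_memcpy_n may differ.
// Precondition: [first, last) and the destination range do not overlap.
template <class T>
T* copy_memcpy_sketch(const T* first, const T* last, T* dest) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "a bytewise copy is only valid for trivially copyable types");
    const std::size_t count = static_cast<std::size_t>(last - first);
    if (count != 0) { // avoid passing possibly-null pointers to memcpy
        std::memcpy(dest, first, count * sizeof(T));
    }
    return dest + count; // one past the last element written
}

// Count-based variant with the same precondition.
template <class T>
T* copy_memcpy_n_sketch(const T* first, std::size_t count, T* dest) {
    return copy_memcpy_sketch(first, first + count, dest);
}
```

Returning the end of the destination range mirrors what the standard copy algorithms return, which is what a caller such as an uninitialized algorithm would typically need.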

Benchmark results show an overall improvement, although at 800 bytes the result flips and the change is slower. The compiler's internal threshold for always using the library memcpy should probably be set to a lower value. For very large sizes, where the compiler does emit a call to memcpy, the results before and after the change match.

| Benchmark | Before | After | Speedup |
| --- | --- | --- | --- |
| `bm_uninitialized_copy<1, uint8_t, highly_aligned>` | 1.18 ns | 1.16 ns | 1.02 |
| `bm_uninitialized_copy<5, uint8_t, highly_aligned>` | 1.41 ns | 1.17 ns | 1.21 |
| `bm_uninitialized_copy<15, uint8_t, highly_aligned>` | 1.63 ns | 1.42 ns | 1.15 |
| `bm_uninitialized_copy<26, uint8_t, highly_aligned>` | 1.41 ns | 1.19 ns | 1.18 |
| `bm_uninitialized_copy<32, uint8_t, highly_aligned>` | 1.41 ns | 1.18 ns | 1.19 |
| `bm_uninitialized_copy<38, uint8_t, highly_aligned>` | 2.32 ns | 1.40 ns | 1.66 |
| `bm_uninitialized_copy<60, uint8_t, highly_aligned>` | 2.39 ns | 1.43 ns | 1.67 |
| `bm_uninitialized_copy<64, uint8_t, highly_aligned>` | 2.33 ns | 1.41 ns | 1.65 |
| `bm_uninitialized_copy<125, uint8_t, highly_aligned>` | 2.33 ns | 1.91 ns | 1.22 |
| `bm_uninitialized_copy<800, uint8_t, highly_aligned>` | 5.68 ns | 7.00 ns | 0.81 |
| `bm_uninitialized_copy<3000, uint8_t, highly_aligned>` | 15.0 ns | 15.1 ns | 0.99 |
| `bm_uninitialized_copy<9000, uint8_t, highly_aligned>` | 41.7 ns | 41.9 ns | 1.00 |
| `bm_uninitialized_copy<1, uint8_t, not_highly_aligned>` | 1.30 ns | 1.18 ns | 1.10 |
| `bm_uninitialized_copy<5, uint8_t, not_highly_aligned>` | 1.41 ns | 1.18 ns | 1.19 |
| `bm_uninitialized_copy<15, uint8_t, not_highly_aligned>` | 1.66 ns | 1.40 ns | 1.19 |
| `bm_uninitialized_copy<26, uint8_t, not_highly_aligned>` | 1.40 ns | 1.16 ns | 1.21 |
| `bm_uninitialized_copy<32, uint8_t, not_highly_aligned>` | 1.39 ns | 1.17 ns | 1.19 |
| `bm_uninitialized_copy<38, uint8_t, not_highly_aligned>` | 2.31 ns | 1.42 ns | 1.63 |
| `bm_uninitialized_copy<60, uint8_t, not_highly_aligned>` | 2.35 ns | 1.88 ns | 1.25 |
| `bm_uninitialized_copy<64, uint8_t, not_highly_aligned>` | 2.32 ns | 1.87 ns | 1.24 |
| `bm_uninitialized_copy<125, uint8_t, not_highly_aligned>` | 2.82 ns | 2.85 ns | 0.99 |
| `bm_uninitialized_copy<800, uint8_t, not_highly_aligned>` | 5.57 ns | 12.4 ns | 0.45 |
| `bm_uninitialized_copy<3000, uint8_t, not_highly_aligned>` | 15.0 ns | 15.3 ns | 0.98 |
| `bm_uninitialized_copy<9000, uint8_t, not_highly_aligned>` | 42.4 ns | 42.6 ns | 1.00 |

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner March 15, 2026 06:59
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews Mar 15, 2026
@StephanTLavavej StephanTLavavej added the performance Must go faster label Mar 15, 2026
@StephanTLavavej StephanTLavavej changed the title from "Use memcpy in uninitialized_copy, uninitialized_copy_n, uninitialized_move, , uninitialized_move_n" to "Use memcpy in uninitialized_meow" Mar 16, 2026

Labels

performance Must go faster

Projects

Status: Initial Review

Development

Successfully merging this pull request may close these issues.

<algorithm>: std::uninitialized_copy used memmove instead of memcpy

2 participants