Enable vectorized minmax_element using Neon on ARM64 #5949

hazzlim · 2025-12-08T17:36:10Z

Implement the namespace _Sorting algorithms using Neon, and enable _VECTORIZED_MINMAX_ELEMENT on ARM64 targets.
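The following is a minimal sketch of the general technique behind a vectorized min_element. It assumes a 16-lane `uint8_t` register. Each lane tracks its own running minimum and the index where that minimum was first seen. A final horizontal pass then picks the smallest value, breaking ties by the smallest index, so the result matches `std::min_element`.

The lanes are emulated with `std::array` so the sketch builds on any platform. `min_element_lanes` and the lane count are hypothetical. This is not the code in `vector_algorithms.cpp`, which operates on real Neon registers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Width of one emulated 128-bit register holding uint8_t lanes (like uint8x16_t).
constexpr std::size_t lanes = 16;

// Hypothetical helper: index of the first smallest element, matching
// std::min_element; returns 0 (== size) for an empty range.
std::size_t min_element_lanes(const std::uint8_t* data, std::size_t size) {
    const std::size_t full = size - size % lanes; // part covered by whole blocks
    std::size_t best = 0;

    if (full != 0) {
        // Per-lane running minimum and the index where it was first seen.
        std::array<std::uint8_t, lanes> cur_min{};
        std::array<std::size_t, lanes> cur_idx{};
        for (std::size_t l = 0; l < lanes; ++l) {
            cur_min[l] = data[l];
            cur_idx[l] = l;
        }

        // Vertical step: one block at a time, every lane independently.
        for (std::size_t base = lanes; base < full; base += lanes) {
            for (std::size_t l = 0; l < lanes; ++l) {
                if (data[base + l] < cur_min[l]) { // strict < keeps the earliest index
                    cur_min[l] = data[base + l];
                    cur_idx[l] = base + l;
                }
            }
        }

        // Horizontal step: smallest value across lanes, ties broken by smallest index.
        best = cur_idx[0];
        for (std::size_t l = 1; l < lanes; ++l) {
            if (cur_min[l] < data[best] || (cur_min[l] == data[best] && cur_idx[l] < best)) {
                best = cur_idx[l];
            }
        }
    }

    // Scalar tail for the elements that do not fill a whole block.
    for (std::size_t i = full; i < size; ++i) {
        if (data[i] < data[best]) {
            best = i;
        }
    }
    return best;
}
```

Keeping a per-lane index is what lets the block loop preserve `std::min_element`'s first-occurrence guarantee. Both the strict `<` in the vertical step and the index tie-break in the horizontal step serve that purpose.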


hazzlim · 2025-12-08T17:39:13Z

For now, I have enabled only _VECTORIZED_MINMAX_ELEMENT, since it seemed sensible to enable the other _Sorting algorithms in separate PRs.

This PR does not vectorize (u)int64_t on ARM64, because the Neon version was not faster than the scalar code.
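The PR reports only the measurement, not a cause. One plausible contributing factor is the instruction set itself. AArch64 Advanced SIMD has integer min/max instructions (`vminq_s32` and friends) for 8-, 16- and 32-bit lanes, but none for 64-bit lanes. A 64-bit lane minimum therefore has to be composed from a compare (`vcgtq_s64`) and a bitwise select (`vbslq_s64`).

The sketch below emulates that two-step sequence on a 2-lane array. The type aliases and function names are hypothetical stand-ins for the Neon types and intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Stand-ins for Neon's int64x2_t and uint64x2_t, emulated with plain arrays.
using i64x2 = std::array<std::int64_t, 2>;
using u64x2 = std::array<std::uint64_t, 2>;

// Step 1: lane-wise signed "greater than" yielding all-ones/all-zeros lanes
// (the role vcgtq_s64 plays on AArch64).
u64x2 cmp_gt(const i64x2& a, const i64x2& b) {
    u64x2 mask{};
    for (int lane = 0; lane < 2; ++lane) {
        mask[lane] = a[lane] > b[lane] ? ~std::uint64_t{0} : std::uint64_t{0};
    }
    return mask;
}

// Step 2: bitwise select, taking bits from if_set where the mask bit is 1 and
// from if_clear elsewhere (the role vbslq_s64 plays).
i64x2 bit_select(const u64x2& mask, const i64x2& if_set, const i64x2& if_clear) {
    i64x2 result{};
    for (int lane = 0; lane < 2; ++lane) {
        const auto s = static_cast<std::uint64_t>(if_set[lane]);
        const auto c = static_cast<std::uint64_t>(if_clear[lane]);
        result[lane] = static_cast<std::int64_t>((s & mask[lane]) | (c & ~mask[lane]));
    }
    return result;
}

// Lane-wise minimum built from the two steps: where a > b take b, otherwise a.
i64x2 lane_min(const i64x2& a, const i64x2& b) {
    return bit_select(cmp_gt(a, b), b, a);
}
```

Each 64-bit step costs two instructions and handles only two elements per 128-bit register. There is also no `vminvq_s64` for the final horizontal reduction. This is consistent with the measured result, though it does not prove the cause.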

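The benchmark results are below. Each column is the speedup over the scalar code, built with MSVC and with Clang respectively; values below 1 are slowdowns.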

| Name | MSVC Speedup | Clang Speedup |
|---|---|---|
| `bm<uint8_t, Op::Min>/8021` | 24.735 | 9.268 |
| `bm<uint8_t, Op::Min>/63` | 5.182 | 2.995 |
| `bm<uint8_t, Op::Max>/8021` | 24.695 | 9.561 |
| `bm<uint8_t, Op::Max>/63` | 4.896 | 2.976 |
| `bm<uint8_t, Op::Both>/8021` | 19.184 | 7.811 |
| `bm<uint8_t, Op::Both>/63` | 1.977 | 1.841 |
| `bm<uint16_t, Op::Min>/8021` | 12.053 | 4.524 |
| `bm<uint16_t, Op::Min>/31` | 3.052 | 2.089 |
| `bm<uint16_t, Op::Max>/8021` | 11.808 | 4.756 |
| `bm<uint16_t, Op::Max>/31` | 2.933 | 2.047 |
| `bm<uint16_t, Op::Both>/8021` | 5.426 | 4.052 |
| `bm<uint16_t, Op::Both>/31` | 1.413 | 1.521 |
| `bm<uint32_t, Op::Min>/8021` | 6.133 | 1.908 |
| `bm<uint32_t, Op::Min>/15` | 1.544 | 1.094 |
| `bm<uint32_t, Op::Max>/8021` | 6.074 | 1.92 |
| `bm<uint32_t, Op::Max>/15` | 1.53 | 1.132 |
| `bm<uint32_t, Op::Both>/8021` | 3.146 | 2.877 |
| `bm<uint32_t, Op::Both>/15` | 0.869 | 1.195 |
| `bm<int8_t, Op::Min>/8021` | 24.735 | 9.211 |
| `bm<int8_t, Op::Min>/63` | 5.222 | 2.778 |
| `bm<int8_t, Op::Max>/8021` | 25.244 | 9.286 |
| `bm<int8_t, Op::Max>/63` | 5.417 | 2.889 |
| `bm<int8_t, Op::Both>/8021` | 11.538 | 11.25 |
| `bm<int8_t, Op::Both>/63` | 1.989 | 1.76 |
| `bm<int16_t, Op::Min>/8021` | 11.953 | 4.667 |
| `bm<int16_t, Op::Min>/31` | 3.029 | 1.872 |
| `bm<int16_t, Op::Max>/8021` | 11.808 | 4.571 |
| `bm<int16_t, Op::Max>/31` | 3.123 | 1.882 |
| `bm<int16_t, Op::Both>/8021` | 6.582 | 5.729 |
| `bm<int16_t, Op::Both>/31` | 1.414 | 1.541 |
| `bm<int32_t, Op::Min>/8021` | 6.25 | 1.88 |
| `bm<int32_t, Op::Min>/15` | 1.6 | 1.135 |
| `bm<int32_t, Op::Max>/8021` | 6.133 | 1.867 |
| `bm<int32_t, Op::Max>/15` | 1.674 | 1.094 |
| `bm<int32_t, Op::Both>/8021` | 3.222 | 1.784 |
| `bm<int32_t, Op::Both>/15` | 0.877 | 0.903 |
| `bm<float, Op::Min>/8021` | 8.928 | 4.364 |
| `bm<float, Op::Min>/15` | 1.87 | 1.358 |
| `bm<float, Op::Max>/8021` | 9.111 | 4.267 |
| `bm<float, Op::Max>/15` | 2.062 | 1.371 |
| `bm<float, Op::Both>/8021` | 5.227 | 1.626 |
| `bm<float, Op::Both>/15` | 0.913 | 0.7 |
| `bm<double, Op::Min>/8021` | 4.426 | 2.029 |
| `bm<double, Op::Min>/7` | 0.929 | 0.731 |
| `bm<double, Op::Max>/8021` | 4.563 | 2.133 |
| `bm<double, Op::Max>/7` | 0.977 | 0.725 |
| `bm<double, Op::Both>/8021` | 2.583 | 0.786 |
| `bm<double, Op::Both>/7` | 0.445 | 0.402 |

Files changed: stl/inc/algorithm, stl/inc/xutility, stl/src/vector_algorithms.cpp

Enable vectorized minmax_element using Neon on ARM64

e9ad403

Implement the namespace _Sorting algorithms using Neon, and enable _VECTORIZED_MINMAX on ARM64 targets.

hazzlim requested a review from a team as a code owner December 8, 2025 17:36

github-project-automation bot added this to STL Code Reviews Dec 8, 2025

github-project-automation bot moved this to Initial Review in STL Code Reviews Dec 8, 2025


StephanTLavavej added the performance (Must go faster) and ARM64 (Related to the ARM64 architecture) labels Dec 8, 2025

StephanTLavavej requested changes Dec 8, 2025


Review thread on stl/inc/xutility (outdated, resolved)

github-project-automation bot moved this from Initial Review to Work In Progress in STL Code Reviews Dec 8, 2025

hazzlim added 4 commits December 9, 2025 12:17

Roll into _Is_min_max_optimization_safe

2329cc4

Don't const-qualify unnamed parameter

58f953b

Don't define minmax,is_sorted_until on ARM64 for now

9b849a3

Remove _Traits_8_neon and don't define *_8 functions

b3cd60c


Use non-floating point types where necessary in _Traits_d_neon

05f1ee9

AlexGuteniev reviewed Dec 9, 2025


Review thread on stl/src/vector_algorithms.cpp (outdated, resolved)


AlexGuteniev reviewed Dec 9, 2025


Two review threads on stl/src/vector_algorithms.cpp (outdated, resolved)

StephanTLavavej moved this from Work In Progress to Initial Review in STL Code Reviews Dec 9, 2025

hazzlim added 3 commits December 9, 2025 23:02

Unify _Get_v_pos interface

1f1ee32

Use _tzcnt_u32/_lzcnt_u32 on avx

3a0a371

Don't declare _8 functions

0b93055
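The commits above switch the AVX path to `_tzcnt_u32`/`_lzcnt_u32`, presumably to turn a compare bit mask into a lane position. Neon has no equivalent of `_mm_movemask_epi8`.

One widely used workaround is to narrow the 128-bit compare result with `vshrn_n_u16(vreinterpretq_u16_u8(cmp), 4)`. That yields a 64-bit value holding 4 bits per byte lane. Counting trailing or leading zeros and dividing by 4 then gives the first or last matching lane. This technique is not necessarily the one this PR uses.

The sketch below emulates the idea on plain arrays. `nibble_mask`, `first_true_lane` and `last_true_lane` are hypothetical stand-ins for helpers like `_Get_first_h_pos`/`_Get_last_h_pos`. A "first" position suits min_element (first smallest). A "last" position suits the max half of minmax_element, which returns the last largest element.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Portable zero counts (C++20 code would use std::countr_zero / std::countl_zero).
int ctz64(std::uint64_t x) {
    int n = 0;
    while (n < 64 && ((x >> n) & 1) == 0) ++n;
    return n;
}
int clz64(std::uint64_t x) {
    int n = 0;
    while (n < 64 && ((x >> (63 - n)) & 1) == 0) ++n;
    return n;
}

// Emulates the 64-bit "nibble mask" that vshrn_n_u16(..., 4) produces from a
// 16-byte compare result: nibble i is 0xF when byte lane i compared true.
std::uint64_t nibble_mask(const std::array<std::uint8_t, 16>& cmp) {
    std::uint64_t mask = 0;
    for (int lane = 0; lane < 16; ++lane) {
        if (cmp[lane] != 0) mask |= std::uint64_t{0xF} << (4 * lane);
    }
    return mask;
}

// Index of the first / last true lane; mask must be non-zero.
int first_true_lane(std::uint64_t mask) { return ctz64(mask) / 4; }
int last_true_lane(std::uint64_t mask) { return (63 - clz64(mask)) / 4; }
```

Spending 4 mask bits per lane rather than 1 wastes some of the 64 bits. The trade-off is that the narrowing is a single instruction, while a true 1-bit-per-lane movemask would take several on Neon.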

StephanTLavavej self-assigned this Dec 10, 2025


AlexGuteniev reviewed Dec 10, 2025


Review thread on stl/src/vector_algorithms.cpp (outdated, resolved)

hazzlim added 2 commits December 10, 2025 11:55

Add missing const qualifiers

017caa1

Don't add dispatch for minmax,is_sorted_until for now

f995cbc


AlexGuteniev approved these changes Dec 10, 2025


hazzlim mentioned this pull request Dec 15, 2025

Add Neon implementation of minmax #5963

Draft

StephanTLavavej added 9 commits January 6, 2026 04:41

Merge branch 'main' into minmax-element-pr

d139232

Cleanup handling of 64-bit integers on ARM64.

cce3375

_Get_first_h_pos(), _Get_last_h_pos(): const params, noexcept.

61327c0

Drop const for template param bool _Sign.

460655f

Uglify: m => _Mx, M => _Mx, r => _Rx

a889815

Fix preprocessor comments.

bd21ec9

Chain together ARM64 and ARM64EC preprocessor conditionals.

981bb91

(Pre-existing) Chain together ARM64 and ARM64EC preprocessor conditionals.

8051595

Don't modify _Minmax_impl to inspect _Traits::_Has_unsigned_cmp yet.

3893d1b

StephanTLavavej reviewed Jan 6, 2026


StephanTLavavej approved these changes Jan 6, 2026


StephanTLavavej removed their assignment Jan 6, 2026

StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Jan 6, 2026
