Remove the explicit branch hint from const_panic #150386

saethlin · 2025-12-26T02:06:06Z

This was asked for in PR review for equivalence with assert!, but we don't need likely/unlikely intrinsics here (and indeed they are a bit of an antipattern), because codegen knows about the pattern that assert! produces where only one target of a SwitchInt leads to a block terminated by a #[cold] call. All our panic entrypoints are #[cold].

assert! does not use the likely/unlikely intrinsics, maybe it did in the past.

So the intrinsic call in const_assert! should be completely unnecessary. But currently it isn't, because const_panic! stamps out a runtime wrapper for its entrypoint which is not given #[inline], citing compile times. It's not obvious to me why not using #[inline] would even be good in this case.

I think the runtime wrapper should have #[inline] or #[cold], so I'm going to do 3 perf experiments here.
Just changing const_assert!: https://perf.rust-lang.org/compare.html?start=fabece9e9491d0a3c3655dba488881968b7c5f7a&end=f6b082cfcf880ad80b33523d5047c11654d907a8&stat=instructions:u
Also adding #[inline] to const_panic!'s runtime wrapper: https://perf.rust-lang.org/compare.html?start=fabece9e9491d0a3c3655dba488881968b7c5f7a&end=f467cc7e49b74b47da0e2cde49fc1a3140e00ff6&stat=instructions:u
Also adding #[cold] to const_panic!'s runtime wrapper: https://perf.rust-lang.org/compare.html?start=2e854a9344154564259de51385e9ec9506c0f3b7&end=fd8e098558ff08ac002b05ff78e9f91ffa6f3738&stat=instructions:u

The previous comment indicated perf would be worse with #[inline] on the runtime branch, but the results in this PR show that it is more mixed, and net positive. And in fact, the only regression in serde seems to be because we created a very tiny additional amount of codegen, but pushed the size of the second CGU just high enough that we no longer merge all CGUs together. So now serde (in debug with incremental disabled) has 2 CGUs instead of 1. If we just so happened to be still below the threshold where the CGUs don't merge, adding #[inline] would have been nearly all improvements.

saethlin · 2025-12-26T02:06:25Z

@bors try @rust-timer queue

Remove the explicit branch hint from const_panic

rust-bors · 2025-12-26T04:26:50Z

☀️ Try build successful (CI)
Build commit: f6b082c (f6b082cfcf880ad80b33523d5047c11654d907a8, parent: fabece9e9491d0a3c3655dba488881968b7c5f7a)

saethlin · 2025-12-26T04:27:49Z

@bors try @rust-timer queue

Remove the explicit branch hint from const_panic

saethlin · 2025-12-26T04:30:15Z

@bors try @rust-timer queue

Remove the explicit branch hint from const_panic

saethlin · 2025-12-26T04:31:21Z

@bors try

Remove the explicit branch hint from const_panic

rust-timer · 2025-12-26T05:08:04Z

Finished benchmarking commit (f6b082c): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.2%	[-0.2%, -0.2%]	1
Improvements ✅ (secondary)	-0.0%	[-0.1%, -0.0%]	3
All ❌✅ (primary)	-0.2%	[-0.2%, -0.2%]	1

Max RSS (memory usage)

Results (primary -1.9%, secondary -1.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	2.3%	[2.3%, 2.3%]	1
Regressions ❌ (secondary)	2.6%	[1.8%, 3.6%]	3
Improvements ✅ (primary)	-6.1%	[-6.1%, -6.1%]	1
Improvements ✅ (secondary)	-4.1%	[-4.9%, -3.7%]	6
All ❌✅ (primary)	-1.9%	[-6.1%, 2.3%]	2

Cycles

Results (primary -2.4%, secondary 1.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	4.2%	[1.8%, 7.0%]	4
Improvements ✅ (primary)	-2.4%	[-2.4%, -2.4%]	1
Improvements ✅ (secondary)	-2.6%	[-2.9%, -2.4%]	3
All ❌✅ (primary)	-2.4%	[-2.4%, -2.4%]	1

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 482.281s -> 482.922s (0.13%)
Artifact size: 392.44 MiB -> 390.42 MiB (-0.51%)

rust-bors · 2025-12-26T06:48:00Z

☀️ Try build successful (CI)
Build commit: f467cc7 (f467cc7e49b74b47da0e2cde49fc1a3140e00ff6, parent: fabece9e9491d0a3c3655dba488881968b7c5f7a)

saethlin · 2025-12-26T13:17:42Z

@rust-timer build f467cc7

saethlin · 2025-12-26T13:53:03Z

@bors try @rust-timer queue

rustbot · 2025-12-26T19:24:27Z

tgross35 is currently at their maximum review capacity.
They may take a while to respond.

tgross35

What is the context/goal here? I'm guessing it may be related to rust-lang/compiler-builtins#1046, but I don't think this is a direct fix.

One suggestion then r=me for the change itself. Cc @RalfJung in case you have anything.

View changes since this review

tgross35 · 2025-12-28T06:51:56Z

tests/codegen-llvm/intrinsics/likely_const_assert.rs

+pub fn test_assert(x: bool) {
+    core::panic::const_assert!(x, "", "",);
+}


I think it would be good to put this in the existing likely_assert.rs to make it obvious that they should be acting the same. Also save a per-test time overhead.

Just FYI, putting codegen tests for multiple similar functions in the same file has a hidden hazard that I discovered in #148849. You need the right CHECK-DAGs in place or the test for a function which is textually earlier can skip past the end of the function and match IR in another function.

Oh, that is gnarly... specifically it matches IR in a function that is otherwise ignored in the test, right? It can't match one of the functions we check with CHECK-LABEL since then it would later fail to find that label.

This seems worth at least tracking in a separate issue, and eventually we should systematically do that for all codegen tests.

RalfJung · 2025-12-28T12:07:46Z

library/core/src/panic.rs

                @capture { $($arg: $ty = $arg),* } -> !:
-                #[noinline]
-                if const #[track_caller] #[inline] { // Inline this, to prevent codegen
+                if const #[track_caller] { // #[inline] on the const branch lets it skip codegen


The comment seems to explain why an attribute is present that's not actually present...?

I removed #[noinline], so I'm basically documenting why it isn't present. Maybe there just no need to document using the default behavior?

I had to refresh my memory of the macro... the docs seem a bit outdated; it says "The runtime function is always marked as #[inline]" but actually both functions are marked as inline by default and noinline can be used to suppress that.

This here is the only user of #[noinline] so if we don't want that any more, maybe we should just remove it?

Anyway for the comment, the one this comment is attached to indeed does not seem useful. The one a few lines down is confusing. It's unclear what "it" refers to, and also there's no panic_fmt visible here, so... what is t talking about?^^

It's unclear what "it" refers to

I was intending the two comments to read as one sentence but of course they aren't actually one so it doesn't work.

When assert! fails, it calls a panic entrypoint,, which like all the panic entrypoint functions is #[cold]. Since the failure branch leads directly to a block which is terminated by a call to a #[cold] function, codegen produces a branch hint. This "SwitchInt arm leads to #[cold]" logic is actually how intrinsics::likely is implemented.

But in const_panic! we have this wrapper function around the runtime failure arm to make const_eval_select work. Adding #[inline] to the wrapper function means that the wrapper gets inlined away in MIR and the cold-ness of what it wraps is now visible to codegen. Which brings back the branch hint information that we lose by just removing intrinsics::likely.

saethlin · 2025-12-28T14:53:32Z

I'm guessing it may be related to rust-lang/compiler-builtins#1046, but I don't think this is a direct fix.

Correct. I ran into the const_panic do_panic::runtime wrapper for the nth time while looking into that issue. Every time I see it not optimized out in immediate-abort builds, I grumble to myself then check the code and see the comment saying it is justified. In the specific context that it popped up recently I realized that the inlining should result in less code so I figured I should redo the perf experiment and see exactly what happens.

I'm working on an idea that could be a direct fix for that compiler-builtins issue, but this PR should stand on its own merits.

rustbot · 2025-12-29T04:29:45Z

Some changes occurred to the intrinsics. Make sure the CTFE / Miri interpreter
gets adapted for the changes, if necessary.

cc @rust-lang/miri, @RalfJung, @oli-obk, @lcnr

rustbot · 2025-12-29T04:29:47Z

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

library/core/src/intrinsics/mod.rs

saethlin · 2025-12-29T17:18:34Z

@bors r=tgross35

bors · 2025-12-29T17:18:37Z

📌 Commit 315646a has been approved by tgross35

It is now in the queue for this repository.

bors · 2025-12-30T04:45:40Z

⌛ Testing commit 315646a with merge d874dce...

bors · 2025-12-30T08:01:49Z

☀️ Test successful - checks-actions
Approved by: tgross35
Pushing d874dce to main...

github-actions · 2025-12-30T08:04:32Z

What is this?

This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing b082bd5 (parent) -> d874dce (this PR)

Test differences

Show 26 test diffs

26 doctest diffs were found. These are ignored, as they are noisy.

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard d874dce1252fe409ce13e64d773046853bf5e6ca --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

dist-aarch64-msvc: 7267.6s -> 5657.4s (-22.2%)
aarch64-msvc-2: 7404.7s -> 5861.9s (-20.8%)
x86_64-rust-for-linux: 3134.0s -> 2639.5s (-15.8%)
x86_64-gnu-tools: 3711.0s -> 3202.5s (-13.7%)
pr-check-1: 2023.2s -> 1750.0s (-13.5%)
i686-gnu-1: 8529.5s -> 7423.7s (-13.0%)
test-various: 7531.3s -> 6634.2s (-11.9%)
x86_64-gnu-gcc: 3442.3s -> 3049.0s (-11.4%)
x86_64-msvc-ext2: 5615.8s -> 6225.6s (+10.9%)
x86_64-gnu-llvm-20: 4900.3s -> 4397.2s (-10.3%)

How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

rust-timer · 2025-12-30T08:43:36Z

Finished benchmarking commit (d874dce): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Our benchmarks found a performance regression caused by this PR.
This might be an actual regression, but it can also be just noise.

Next Steps:

If the regression was expected or you think it can be justified,
please write a comment with sufficient written justification, and add
@rustbot label: +perf-regression-triaged to it, to mark the regression as triaged.
If you think that you know of a way to resolve the regression, try to create
a new PR with a fix for the regression.
If you do not understand the regression or you think that it is just noise,
you can ask the @rust-lang/wg-compiler-performance working group for help (members of this group
were already notified of this PR).

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	0.6%	[0.5%, 0.7%]	2
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.4%	[-0.5%, -0.4%]	2
Improvements ✅ (secondary)	-0.1%	[-0.1%, -0.0%]	3
All ❌✅ (primary)	0.1%	[-0.5%, 0.7%]	4

Max RSS (memory usage)

Results (primary -2.4%, secondary -0.6%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.1%	[2.1%, 2.1%]	1
Improvements ✅ (primary)	-2.4%	[-3.6%, -0.4%]	3
Improvements ✅ (secondary)	-3.3%	[-3.3%, -3.3%]	1
All ❌✅ (primary)	-2.4%	[-3.6%, -0.4%]	3

Cycles

Results (primary -6.8%, secondary 0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.2%	[2.0%, 2.4%]	2
Improvements ✅ (primary)	-6.8%	[-6.8%, -6.8%]	1
Improvements ✅ (secondary)	-4.4%	[-4.4%, -4.4%]	1
All ❌✅ (primary)	-6.8%	[-6.8%, -6.8%]	1

Binary size

Results (primary 0.5%, secondary 0.5%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	0.6%	[0.5%, 0.7%]	3
Regressions ❌ (secondary)	0.7%	[0.7%, 0.7%]	2
Improvements ✅ (primary)	-0.0%	[-0.0%, -0.0%]	1
Improvements ✅ (secondary)	-0.1%	[-0.1%, -0.1%]	1
All ❌✅ (primary)	0.5%	[-0.0%, 0.7%]	4

Bootstrap: 482.402s -> 482.826s (0.09%)
Artifact size: 390.86 MiB -> 390.83 MiB (-0.01%)