Conversation
There was a problem hiding this comment.
Pull request overview
This PR aims to optimize cpuwide_time_ns() on ARM64 by using the architectural counter frequency (CNTFRQ_EL0) to populate detail::invariant_cpu_freq, shifting initialization away from per-call logic.
Changes:
- On
__aarch64__, routecpuwide_time_ns()to use the cycle-counter path instead of theclock_gettimepath. - Move initialization of
detail::invariant_cpu_freqinto a pre-main initializer, usingCNTFRQ_EL0on ARM64.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
src/butil/time.h |
Changes cpuwide_time_ns() preprocessor gating to include ARM64 in the invariant-counter fast path and removes lazy init recursion. |
src/butil/time.cpp |
Adds a pre-main initializer to set detail::invariant_cpu_freq, reading CNTFRQ_EL0 on ARM64. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -301,11 +301,6 @@ inline int64_t cpuwide_time_ns() { | |||
| } else if (!cpu_freq) { | |||
| // Lack of necessary features, return system-wide monotonic time instead. | |||
| return monotonic_time_ns(); | |||
There was a problem hiding this comment.
cpuwide_time_ns() can fall through without returning a value when detail::invariant_cpu_freq is negative (e.g., still -1 before init_invariant_cpu_freq() runs, or if initialization fails and leaves it < 0). This is undefined behavior. Please ensure the function always returns a value for the < 0 case (e.g., treat it like the 0 case and return monotonic_time_ns(), or perform a safe one-time initialization here).
| return monotonic_time_ns(); | |
| return monotonic_time_ns(); | |
| } else { | |
| // Negative frequency (e.g. uninitialized or failed initialization), | |
| // fall back to system-wide monotonic time to avoid undefined behavior. | |
| return monotonic_time_ns(); |
| int64_t invariant_cpu_freq = -1; | ||
|
|
||
| static void __attribute__((constructor)) init_invariant_cpu_freq() { | ||
| int64_t baseFreq = -1; | ||
| #if defined(__aarch64__) | ||
| __asm__ __volatile__("mrs %0, CNTFRQ_EL0" : "=r"(baseFreq)); | ||
| #else | ||
| baseFreq = detail::read_invariant_cpu_frequency(); | ||
| #endif | ||
|
|
||
| detail::invariant_cpu_freq = baseFreq; | ||
| } | ||
|
|
There was a problem hiding this comment.
Using __attribute__((constructor)) in a C++ TU is compiler-specific and can be problematic for portability/toolchains. This repo already has BAIDU_GLOBAL_INIT() (which avoids the attribute in C++) and many places use pthread_once/function-local statics for one-time init; consider switching to an established mechanism so initialization is reliable across supported compilers.
| int64_t invariant_cpu_freq = -1; | |
| static void __attribute__((constructor)) init_invariant_cpu_freq() { | |
| int64_t baseFreq = -1; | |
| #if defined(__aarch64__) | |
| __asm__ __volatile__("mrs %0, CNTFRQ_EL0" : "=r"(baseFreq)); | |
| #else | |
| baseFreq = detail::read_invariant_cpu_frequency(); | |
| #endif | |
| detail::invariant_cpu_freq = baseFreq; | |
| } | |
| static int64_t init_invariant_cpu_freq() { | |
| int64_t baseFreq = -1; | |
| #if defined(__aarch64__) | |
| __asm__ __volatile__("mrs %0, CNTFRQ_EL0" : "=r"(baseFreq)); | |
| #else | |
| baseFreq = detail::read_invariant_cpu_frequency(); | |
| #endif | |
| return baseFreq; | |
| } | |
| int64_t invariant_cpu_freq = init_invariant_cpu_freq(); |
| #else | ||
| baseFreq = detail::read_invariant_cpu_frequency(); | ||
| #endif | ||
|
|
||
| detail::invariant_cpu_freq = baseFreq; | ||
| } |
There was a problem hiding this comment.
init_invariant_cpu_freq() runs unconditionally at load time and, on non-aarch64, reads /proc/cpuinfo even in builds where cpuwide_time_ns() uses clock_gettime and never consults detail::invariant_cpu_freq (i.e., !BAIDU_INTERNAL && !__aarch64__). This can add avoidable startup I/O/latency. Consider gating initialization behind the same preprocessor conditions as the fast-path usage, or lazily initializing on first use.
| int64_t baseFreq = -1; | ||
| #if defined(__aarch64__) | ||
| __asm__ __volatile__("mrs %0, CNTFRQ_EL0" : "=r"(baseFreq)); | ||
| #else | ||
| baseFreq = detail::read_invariant_cpu_frequency(); | ||
| #endif | ||
|
|
||
| detail::invariant_cpu_freq = baseFreq; |
There was a problem hiding this comment.
Local variable name baseFreq doesn’t match the surrounding naming style in this file (which predominantly uses snake_case like invariant_tsc, seen_decpoint, endp). Renaming to something like base_freq (and ideally indicating units, e.g. _hz) would improve consistency and readability.
| int64_t baseFreq = -1; | |
| #if defined(__aarch64__) | |
| __asm__ __volatile__("mrs %0, CNTFRQ_EL0" : "=r"(baseFreq)); | |
| #else | |
| baseFreq = detail::read_invariant_cpu_frequency(); | |
| #endif | |
| detail::invariant_cpu_freq = baseFreq; | |
| int64_t base_freq_hz = -1; | |
| #if defined(__aarch64__) | |
| __asm__ __volatile__("mrs %0, CNTFRQ_EL0" : "=r"(base_freq_hz)); | |
| #else | |
| base_freq_hz = detail::read_invariant_cpu_frequency(); | |
| #endif | |
| detail::invariant_cpu_freq = base_freq_hz; |
What problem does this PR solve?
Issue Number: n/a
Problem Summary: cpugettime optimize for ARM64
What is changed and the side effects?
Changed:
detail::invariant_cpu_freq value get by CNTFRQ_EL0
Side effects:
Performance effects:
Breaking backward compatibility: 无
Check List: