Skip to content

Commit 94a5e2c

Browse files
authored
Cranelift: robustify timing infrastructure against mis-use and/or system clock bugs. (#12709)
In #12692, it was observed that the computation of time spent in nested timing spans, minus child time, was underflowing. Correct operation of the handling of nested spans depends on the invariant that the accumulated time for child spans is less than or equal to a parent span once timing is completed. This property should hold as long as the system clock is monotonic, and as long as timing tokens are dropped in-order, so that the elapsed time of a parent truly is computed after the elapsed time of a child ends. The timing state may also temporarily violate this invariant whenever a token is pending (still on stack and timing): the child time of any completed child spans will be counted, but the parent has not yet been. Hence, taking a snapshot of the state and computing "parent minus children" on that snapshot may observe cases that yield underflow. This PR makes the infrastructure more robust along a few different dimensions: - It hardens the clock source we use to have a locally-ensured guarantee of monotonicity, since we rely on this for logical correctness. In particular, for each thread (since timing spans never move between threads), we track the last `Instant` that was used by the timing infrastructure, and use that value (zero time passed) if the system clock moves backward. - It hardens the assert about pass-timing token drop order from a `debug_assert` to an `assert`. If this invariant is being violated, we want to know about it noisily, rather than observing a subtraction underflow or other inconsistency later. - It adds an assert in `take_current()` to ensure that a snapshot is never taken when any pass timing is in progress. This should address any theoretically possible sources of #12692, as far as I can tell.
1 parent 5afb603 commit 94a5e2c

1 file changed

Lines changed: 34 additions & 3 deletions

File tree

cranelift/codegen/src/timing.rs

Lines changed: 34 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -220,21 +220,51 @@ mod enabled {
220220
/// Take the current accumulated pass timings and reset the timings for the current thread.
221221
///
222222
/// Only applies when [`DefaultProfiler`] is used.
223+
///
224+
/// # Panics
225+
///
226+
/// Panics when any pass timing token is currently active.
223227
pub fn take_current() -> PassTimes {
228+
assert_eq!(
229+
CURRENT_PASS.with(|p| p.get()),
230+
Pass::None,
231+
"Cannot take_current() on profiler while a pass is active"
232+
);
224233
PASS_TIME.with(|rc| mem::take(&mut *rc.borrow_mut()))
225234
}
226235

227236
// Information about passes in a single thread.
228237
thread_local! {
229238
static CURRENT_PASS: Cell<Pass> = const { Cell::new(Pass::None) };
239+
static LAST_INSTANT: Cell<Option<Instant>> = Cell::new(None);
240+
}
241+
242+
/// Provides a guaranteed-never-decreasing `Instant`. This is
243+
/// supposed to be guaranteed by the operating system APIs that
244+
/// are used to implement `Instant::now()`, but apparently
245+
/// sometimes kernel bugs may result in this moving backward, and
246+
/// if it does, our invariants here are violated, possibly
247+
/// resulting in panics when we process nested time spans. So we
248+
/// backstop against any bugs by tracking the last-returned
249+
/// `Instant` and never regressing before it.
250+
fn monotonic_instant() -> Instant {
251+
let now = match LAST_INSTANT.get() {
252+
None => Instant::now(),
253+
Some(prev) => {
254+
let system_now = Instant::now();
255+
core::cmp::max(prev, system_now)
256+
}
257+
};
258+
LAST_INSTANT.set(Some(now));
259+
now
230260
}
231261

232262
impl Profiler for DefaultProfiler {
233263
fn start_pass(&self, pass: Pass) -> Box<dyn Any> {
234264
let prev = CURRENT_PASS.with(|p| p.replace(pass));
235265
log::debug!("timing: Starting {pass}, (during {prev})");
236266
Box::new(DefaultTimingToken {
237-
start: Instant::now(),
267+
start: monotonic_instant(),
238268
pass,
239269
prev,
240270
})
@@ -260,10 +290,11 @@ mod enabled {
260290
/// Dropping a timing token indicated the end of the pass.
261291
impl Drop for DefaultTimingToken {
262292
fn drop(&mut self) {
263-
let duration = self.start.elapsed();
293+
let now = monotonic_instant();
294+
let duration = self.start.duration_since(now);
264295
log::debug!("timing: Ending {}: {}ms", self.pass, duration.as_millis());
265296
let old_cur = CURRENT_PASS.with(|p| p.replace(self.prev));
266-
debug_assert_eq!(self.pass, old_cur, "Timing tokens dropped out of order");
297+
assert_eq!(self.pass, old_cur, "Timing tokens dropped out of order");
267298
PASS_TIME.with(|rc| {
268299
let mut table = rc.borrow_mut();
269300
table.pass[self.pass.idx()].total += duration;

0 commit comments

Comments
 (0)