Skip to content

feat(core): implement consensus/timer#433

Open
iamquang95 wants to merge 4 commits into
mainfrom
iamquang95/core-consensus-timer
Open

feat(core): implement consensus/timer#433
iamquang95 wants to merge 4 commits into
mainfrom
iamquang95/core-consensus-timer

Conversation

@iamquang95
Copy link
Copy Markdown
Collaborator

Fix: #158

This PR:

  • Port the consensus/timer to pluto.
  • Use cancellable Tokio sleep futures as the Rust timer primitive.
  • Cover timer behavior with parity-focused unit tests, including proposal timeout and eager timer deadlines.

@iamquang95 iamquang95 changed the title Iamquang95/core consensus timer feat(core): implement consensus/timer May 22, 2026
@iamquang95
Copy link
Copy Markdown
Collaborator Author

@claude review this PR

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

Claude finished @iamquang95's task in 4m 43s —— View job


PR Review: feat(core): implement consensus/timer

Overall this is a clean, well-documented port of charon/core/consensus/timer/roundtimer.go. The strategy selection logic, deadline math, eager-double-linear state, and feature gating all match the Go reference. The Rust-side improvements are tasteful: cancellation by drop, a proper Error/Result surface, and Mutex poison handled instead of unwrapping.

What looks good

  • Parity with the three Go timer strategies (increasing, eager_dlinear, linear) and GetRoundTimerFunc precedence. The Rust selector in get_round_timer_func (timer.rs:292-310) mirrors the Go branching exactly.
  • Faithful reproduction of the upstream Go bug in linearRoundTimer where 200*(round-1)+200 is in nanoseconds despite the comment claiming "starting at 400 milliseconds". The Rust comment at timer.rs:357-358 calls this out explicitly and the test expectations encode nanos (Duration::from_nanos(400) etc.) — good behavior to make the parity intentional and visible.
  • Eager double-linear state machine: deadline math (first_deadline + timeout) matches Go's first.Add(timeout). The test walkthrough at timer.rs:441-491 covers the round-2/round-3 doubling case nicely.
  • FeatureSetGuard correctly serializes tests via a static Mutex and restores GLOBAL_STATE on drop. Drop order leaves the lock as the last to be released — exactly what you want.
  • Cancellation via Drop of the RoundTimerFuture replaces Go's explicit stop() callback. Cleaner ergonomics.
  • Defensive overflow handling: checked_mul/checked_add/checked_sub everywhere arithmetic touches i64/Duration — strictly better than Go which would silently produce negative durations.

Suggestions

  1. Misleading error variant in duration_from_nanos (timer.rs:372-376) — a failed u64::try_from(timeout_nanos) is mapped to Error::InvalidRound, but at this point the round has already been validated by ensure_non_negative_round. A negative timeout_nanos here would represent an arithmetic overflow during timeout computation, not an invalid round. Suggest Error::DurationOverflow { round } to match intent. In practice this path is currently unreachable for non-negative rounds (since 200*(round-1)+200 >= 0 for round >= 0), but the variant choice is what a future reader will use to diagnose. Fix this →

  2. Slightly misleading doc comment (timer.rs:231) — // Deadline is either double the first timeout. reads ambiguously. The code is first_deadline + timeout, which is "double the timeout from the original start time". Consider: // Deadline = first_deadline + timeout (which is the original start + 2 * timeout).

  3. TimerType::eager divergence from Go (timer.rs:89-91) — Go's implementation uses strings.Contains(string(t), "eager"), which would automatically pick up any future "eager_*" timer types. Rust uses an explicit matches!(self, Self::EagerDoubleLinear). This is arguably better (type-safe, no string sniffing), but worth flagging as a divergence — if Charon adds another eager-mode timer upstream, the Rust side needs an extra arm.

  4. PROPOSAL_TIMEOUT is private (timer.rs:45) while INC_ROUND_START, INC_ROUND_INCREASE, LINEAR_ROUND_INC are all pub. Go inlines 1500 * time.Millisecond so there's no upstream constant either — but for consistency in Pluto, consider exposing it (or downgrading the others to private if they're not externally referenced yet). Minor.

  5. expect on global feature lock (timer.rs:314-316) — feature_enabled panics on a poisoned global RwLock. The timer methods themselves propagate TimerStatePoisoned for their local mutex but go through feature_enabled which doesn't. For consistency you could lift this into a Result, though existing pluto-featureset callers seem to follow the same panic-on-poison pattern, so this may be fine as-is.

  6. Round 0 is silently accepted by ensure_non_negative_round (timer.rs:378-384) — produces a 750ms timer for Increasing, 0 for Linear, 200ns for the linear-timer non-1 branch. QBFT rounds are 1-indexed, so this may not matter, but worth confirming whether the future consumer ever passes round 0. The Go code has the same behavior.

  7. Optional nit on the negative-round test (timer.rs:647-664) — currently only -4 is exercised. Adding i64::MIN would give confidence that the error formatting path doesn't itself overflow.

  8. feature_enabled called multiple times in get_round_timer_func (timer.rs:292-310) — re-locks the global RwLock each time. Not hot, mirrors Go's pattern, fine. Could be hoisted into locals if you ever profile and care.

Verdict

LGTM modulo the small Error::InvalidRoundError::DurationOverflow correction (item 1) and the doc-comment polish (item 2). Everything else is style/nit. The parity discipline (preserving the upstream nanos bug intentionally, plus a comment explaining it) is exactly what AGENTS.md asks for.

Note: I was unable to run cargo fmt --check / cargo clippy / cargo test locally (cargo commands require approval in this environment). Please verify the workspace gates pass in CI.
· iamquang95/core-consensus-timer

@iamquang95
Copy link
Copy Markdown
Collaborator Author

All good, no need to fix any findings

Copy link
Copy Markdown
Contributor

@mskrzypkows mskrzypkows left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Collaborator

@varex83 varex83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Collaborator

@varex83agent varex83agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall

Clean, well-documented port of Charon's consensus/timer package. The cancellable Tokio sleep model is a nice Rust-native fit, parity with the Go reference (including the deliberate preservation of Charon's nanoseconds-vs-milliseconds quirk in linear_timer_timeout) is faithful, and tests are well-structured around tokio's paused clock plus a serialised featureset guard. No bug or major severity issues. Findings below are mostly cleanup/consistency suggestions:

  • Error modelTimerStatePoisoned is reachable only on programmer error and is inconsistent with how GLOBAL_STATE poisoning is handled (.expect); duration_from_nanos reports InvalidRound for what is really a DurationOverflow; DurationOverflow/DeadlineOverflow carry identical payloads and could merge.
  • Naminglinear_round_timeout (used by EagerDoubleLinear) vs linear_timer_timeout (used by LinearRoundTimer) differ only by one word but compute very different things; with_duty shadows the workspace's builder-mutator convention; the trait method RoundTimer::timer collides with locals named timer.
  • API surface#[derive(Default)] would replace three hand-written Default impls that just call new(); Clone on IncreasingRoundTimer/LinearRoundTimer is never used through the trait object.
  • Tests — missing coverage for round = 0, round = i64::MAX, the doubling-with-ProposalTimeout combination, and FEATURESET_TEST_LOCK cascades on test-panic poisoning.

Verdict: COMMENT — leave to author discretion. None of the findings block merge; the major ones are worth addressing in this PR, the rest are polish.

.ok_or(Error::DurationOverflow { round })
}

fn linear_timer_timeout(round: i64) -> Result<Duration> {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confusingly similar helper names: linear_round_timeout (line 345, used by EagerDoubleLinearRoundTimer::timer) and linear_timer_timeout (line 354, used by LinearRoundTimer::timer) differ by one word but compute very different things (LINEAR_ROUND_INC * round vs the 200*(round-1) + 200 nanoseconds formula). A future refactor could easily swap them at the two call sites (lines 223 and 284) with no compile error. Suggest renaming the second to something like linear_subsequent_round_timeout or moving it onto LinearRoundTimer as an inherent method. Charon avoided this by inlining the expression directly inside linearRoundTimer.Timer.

let mut first_deadlines = self
.first_deadlines
.lock()
.map_err(|_| Error::TimerStatePoisoned)?;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent poison handling: this line maps the per-instance Mutex poison to Error::TimerStatePoisoned, but feature_enabled (line 315) — also called inside timer() via proposal_timeout_duration.expects on GLOBAL_STATE poisoning. So one poisoned-lock path returns a recoverable error while the other panics, inside the same function. Per workspace convention (see crates/dkg/src/sync/client.rs:247,254), poisoned-lock = .expect. Suggest dropping TimerStatePoisoned (line 68-70) and using .expect("timer state lock poisoned") here for consistency; the critical section is too small to plausibly poison the lock anyway.

}

fn duration_from_nanos(timeout_nanos: i64, round: i64) -> Result<Duration> {
let timeout_nanos = u64::try_from(timeout_nanos).map_err(|_| Error::InvalidRound { round })?;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong error variant. By the time duration_from_nanos is reached, ensure_non_negative_round(round)? (line 355) has already proven round >= 0. The u64::try_from(timeout_nanos) conversion here can only fail if the computed timeout_nanos is negative — which is an arithmetic-overflow condition, not an invalid-round one. Should be Error::DurationOverflow { round }. (Note: today this branch is actually unreachable for round >= 0 since (round-1)*200 + 200 >= 0, but using the wrong variant if it ever becomes reachable would be misleading.)

},
/// Timer duration arithmetic overflowed.
#[error("timer duration overflow for round: {round}")]
DurationOverflow {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DurationOverflow (line 58) and DeadlineOverflow (line 64) both carry only round: i64 and originate from interchangeable arithmetic paths — DurationOverflow from round arithmetic (Duration::checked_mul/checked_add/u32::try_from), DeadlineOverflow from Instant::checked_add (line 402). From a caller's perspective the failure is the same: "timer arithmetic overflowed for round N". Consider collapsing into a single Overflow { round } (or TimerArithmeticOverflow { round }).

}
}

impl Default for IncreasingRoundTimer {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All three timer structs have a manual Default impl (lines 146, 209, 269) that just calls Self::new(), where new() itself produces duty: None + (for EagerDoubleLinear) an empty HashMap. The whole boilerplate can be replaced with #[derive(Default)] on each struct — Option, Mutex<HashMap<_,_>>, and HashMap all implement Default. Either drop new() in favour of T::default() / T::with_duty(...), or keep new() and derive Default to remove the three manual impls.

#[derive(Debug)]
pub struct EagerDoubleLinearRoundTimer {
duty: Option<Duty>,
first_deadlines: Mutex<HashMap<i64, Instant>>,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::sync::Mutex is the right choice here — the critical section (lines 226-240) is short, synchronous, and the guard is dropped before the returned future is constructed. A future contributor reviewing async code might be tempted to "fix" this to tokio::sync::Mutex, which would be both unnecessary and would force timer() to become async. A one-line comment on the field — e.g. // std::sync::Mutex: critical section is bounded and no guard escapes this function — would prevent that.

Also worth a brief note that first_deadlines grows monotonically with rounds (matching Charon's firstDeadlines map[int64]time.Time — same parity-preserved property) so a future reader doesn't add eviction logic prematurely.

}

/// Creates an increasing round timer for a duty.
pub fn with_duty(duty: Duty) -> Self {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with_duty shadows the workspace's with_* convention, which is overwhelmingly a consuming builder mutator (e.g. with_timeout, with_relays, with_external_ip in crates/p2p, crates/peerinfo, crates/parsigex). The three with_duty constructors here are alternative-new associated functions, not mutators on an existing instance. Rust API Guidelines (C-CTOR) suggest naming after the input or distinguishing role. Consider IncreasingRoundTimer::for_duty(duty) (and same for the other two) to avoid clashing with workspace conventions.

}
}

fn wake_duration(duration: Duration) -> Duration {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: wake_duration clamps any duration to a 1ms minimum. The reason — that tokio's paused clock cannot reliably wake a sub-microsecond sleep_until — isn't obvious from the function name or body. A one-line doc comment explaining the paused-time edge case (and why the helper exists for the linear_round_timer test with its 400/600/800-nanosecond timeouts) would help future readers.

/// Round produced a negative duration.
#[error("invalid consensus round: {round}")]
InvalidRound {
/// Invalid round.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the round: i64 field-level doc comments on each Error variant (lines 53, 59, 65) restate the variant title and add no information — the variant-level #[error("...round: {round}")] already conveys what round means. Either tighten to a single-token description ("The offending round number.") or accept the duplication.

///
/// The caller owns cancellation by dropping the returned future before it
/// fires.
fn timer(&self, round: i64) -> Result<RoundTimerFuture>;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the trait method RoundTimer::timer shares its name with both the type and common local variable names — call sites read let timer = ...; timer.timer(round). Charon also uses Timer for the method, so this preserves a 1:1 mapping with upstream (which has value for porters), but a Rust-idiomatic alternative like start(round) or deadline(round) would read more clearly. Up to author discretion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement core/consensus/timer

4 participants