Skip to content

refactor: improve the deadline module#423

Merged
therustmonk merged 24 commits into
mainfrom
kd-may18-errors
May 22, 2026
Merged

refactor: improve the deadline module#423
therustmonk merged 24 commits into
mainfrom
kd-may18-errors

Conversation

@therustmonk
Copy link
Copy Markdown
Collaborator

No description provided.

@therustmonk therustmonk changed the title WIP: improve the deadline module refactor: improve the deadline module May 19, 2026
@therustmonk therustmonk marked this pull request as ready for review May 19, 2026 16:09

/// Wall-clock start of the given slot: `genesis_time + slot *
/// slot_duration`.
fn slot_start(&self, slot: SlotNumber) -> Result<DateTime<Utc>> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to have a unit test for it

Comment thread crates/core/src/deadline/calculator.rs Outdated

/// Network-delay margin added to every deadline: `slot_duration /
/// MARGIN_FACTOR`.
fn margin(&self) -> Result<Millis> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

label = %self.label,
duty = %curr_duty,
"Deadliner output channel full"
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe return some small sleep time here to retry in a while?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We left it like this on purpose to match the reference behavior, but I agree that can be the better choice here.

@therustmonk
Copy link
Copy Markdown
Collaborator Author

@claude review the changes

@claude
Copy link
Copy Markdown

claude Bot commented May 20, 2026

Claude finished @therustmonk's task in 3m 32s —— View job


Code Review: PR #423 — refactor: improve the deadline module

  • Read changed files
  • Compare against Charon Go source for parity
  • Analyze correctness, design, and style
  • Post review feedback

Summary

Nice refactor overall. Splitting the 743-line deadline.rs into mod.rs / calculator.rs / millis.rs reads cleanly, the Millis newtype removes a lot of u64/i64 try_from juggling, and the trait split (DeadlineCalculator / Deadliner) makes the deadliner generic over the deadline function, which lets NoopDeadliner and the production calculator coexist without Arc<dyn Fn>. The previously-flagged unit tests for slot_start and margin have been added (calculator.rs:123-190). The deadliner FSM is functionally faithful to charon/core/deadline.go — same input/output channel pattern, same "drop on full" semantics, same recompute-on-insert logic.

A few correctness and design points below.


Correctness

1. Sentinel duty can leak to the output channel after ~10 years of idle  — crates/core/src/deadline/mod.rs:323-342

When self.duties is empty, curr_deadline = MAX_UTC and remaining_duration() returns FAR_FUTURE_DURATION (~10 years). If the timer ever fires while no duties are tracked, handle_expired will try_send(self.curr_duty.clone()) — and curr_duty is the sentinel Duty::new(SlotNumber::new(0), DutyType::Unknown). That's a real (Unknown, slot 0) duty being shipped to the consumer. Astronomically unlikely in practice, but trivial to guard:

fn handle_expired(&mut self) -> Option<Duration> {
    if self.duties.is_empty() {
        // Sentinel timer fired with no real duty; just reschedule.
        return Some(self.remaining_duration());
    }
    // ...rest unchanged
}

Fix this →

2. OUTPUT_BUFFER = 256 deviates from Charon's outputBuffer = 10  — mod.rs:179

charon/core/deadline.go:118 uses 10 — "big enough to support all duty types, which can expire at the same time while external consumer is synchronously adding duties." Project rule is functional equivalence with Go; bumping to 256 hides backpressure that would surface earlier under Go. Either drop to 10 to match, or add a comment justifying the deviation. Same question for INPUT_BUFFER = 256 vs Go's unbuffered input channel (the oneshot reply still serializes writers, so it's behaviorally equivalent — but worth a sentence saying so).

3. mskrzypkows's comment on mod.rs:333 (channel-full path)

The reviewer suggests returning a small sleep time to retry instead of dropping the duty. The current behavior — log + drop + advance — matches charon/core/deadline.go:184-188, which also drops on full via default:. So this is a parity decision, not a bug. Worth replying on the thread that this is intentional Charon parity; if retry is desired it should be a separate decision applied symmetrically with Charon.

4. handle_input calls recompute_curr() after inserting the strictly-earlier duty  — mod.rs:294-297

Not a bug, just wasteful: we already know the new duty has the smallest deadline, so an O(n) scan re-confirms what we already know. Could be:

self.duties.insert(duty.clone());
self.curr_duty = duty;
self.curr_deadline = deadline;
Some(self.remaining_duration())

Skip if you want to keep the code path uniform — the HashSet is small in practice.


Design / API

5. DeadlinerLink vs DeadlinerImpl naming is inverted  — mod.rs:121, 158

The thing that impl Deadliner for X is DeadlinerLink (private). The thing called DeadlinerImpl is the task state, doesn't implement the trait, and is only exposed for its start() constructor. Reader-hostile. Two options:

  • Rename: DeadlinerImplDeadlinerTask (or keep private and expose pub fn start_deadliner(...)); DeadlinerLinkDeadlinerHandle.
  • Make DeadlinerImpl private and add a free function pub fn start<C: DeadlineCalculator>(...) -> Arc<dyn Deadliner>.

The doc comment on DeadlinerImpl already calls it "Owned state of the background task" — the type name should reflect that.

6. Deadliner trait docstring conflates layers  — mod.rs:102-104

add doc says it returns false "if the duty never expires (e.g., Exit, BuilderRegistration)" — but that's a property of DutyDeadlineCalculator, not of Deadliner. Any calculator returning Ok(None) causes add to return false. Reword to "if the calculator reports the duty has no deadline" to keep the abstraction clean.

7. Millis doc duplication  — millis.rs:1-3 and millis.rs:12-14

Module-level doc and struct-level doc say the same thing nearly verbatim. Drop one.


Tests

8. deadliner() test uses real wall-clock and a 500ms slot  — mod.rs:436-520

Real-time tests with sub-second deadlines are notorious on loaded CI. The 5s timeout for ~1s expirations is generous, but consider tokio::time::pause() + advance() for deterministic timing. Not blocking — flagging in case it goes flaky later.

9. tests::deadliner doesn't assert ordering across receiving channel  — mod.rs:503-510

The test does actual_duties.sort_by_key(...) before comparing — fair, since two duties with the same deadline have nondeterministic emission order. But the test doesn't directly verify that duties arrive in deadline order, which is the deadliner's actual contract. Worth a follow-up test with two duties of clearly different deadlines that asserts the earlier one arrives first.

10. Voluntary-exits test path is misleading  — mod.rs:417-421 and mod.rs:490-499

The TestCalculator returns Some(start + 1h) for DutyType::Exit instead of None. So the test labeled "voluntary_exits" actually exercises the "non-expired with future deadline" path, not the "never expires" path. The "never expires" branch is covered by never_expire_duties, but rename the local var (voluntary_exitsfuture_duties or similar) to avoid the false-friend reading.


Other

  • crates/core/src/types.rs:73-77DutyType::never_expires() is well-placed and matches the Go switch.
  • Millis::checked_div doc says "overflow or division by zero" but i64::checked_div only fails on division by zero (and MIN / -1); the doc is slightly misleading. Minor.
  • The module-level doc example (mod.rs:7-33) uses DeadlinerImpl::start — works because pub struct DeadlinerImpl<C> is reachable through the public deadline module, but if you move to a free function (see point 5), update the doc.

Verdict

Solid refactor, well-tested, faithful to Charon. The points above are mostly polish; the only real correctness item is #1 (sentinel-duty leak after 10y idle) and the #2 buffer-size deviation from Charon, which the golden rule on functional equivalence asks us to justify or revert.
Branch

Comment thread crates/core/src/deadline/calculator.rs Outdated
use super::{Result, millis::Millis, to_chrono_duration};

/// Fraction of slot duration to use as a margin for network delays.
const MARGIN_FACTOR: i32 = 12;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be i64 from the begin, no casting later

Comment thread crates/core/src/deadline/calculator.rs Outdated

/// Duty-type-specific offset from slot start.
fn duty_duration(&self, duty_type: &DutyType) -> Result<Millis> {
let secs = Millis::from(self.slot_duration);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

msecs is better name i think, it's Millis

@emlautarom1
Copy link
Copy Markdown
Collaborator

@therustmonk Could you check #430? It makes use of the deadliner and shows some challenges with the existing API that could be taken care of in this PR.

Copy link
Copy Markdown
Collaborator

@emlautarom1 emlautarom1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This refactor looks promising but it does not improve usages in a meaningful way.

I've left a comment on how I would address it based on the needs of modules like aggsigdb (#430), but I suggest to also explore how we can make existing modules better to use (ex. parsigdb).

Lastly, regarding the internals, some time ago I explored a different approach that uses tokio's scheduling for most of the work: https://github.com/NethermindEth/pluto/tree/emlautarom1/deadline_alt . Maybe something is worth salvaging from that experiment, but the main goal should now be to make it easier for callers to use this module.

Comment on lines +95 to +113
pub trait Deadliner: Send + Sync {
/// Adds a duty for deadline scheduling.
///
/// Returns `true` if the duty was added for future deadline scheduling.
/// This method is idempotent and returns `true` if the duty was previously
/// added and still awaits deadline scheduling.
///
/// Returns `false` if:
/// - The duty has already expired and cannot be scheduled
/// - The calculator reports the duty has no deadline (`Ok(None)`)
/// - The calculator failed to compute the deadline (`Err(_)`)
async fn add(&self, duty: Duty) -> bool;

/// Returns the channel for receiving deadlined duties.
///
/// This method may only be called once and returns `None` on subsequent
/// calls. The returned channel should only be used by a single task.
fn c(&self) -> Option<mpsc::Receiver<Duty>>;
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Deadliner interface is hard to use as is:

  • It's not intended to be cloned, so Arc<dyn Deadliner> is unnecessary, that is, a Deadliner instance is never shared by multiple "services" (ex. aggsigdb), instead it's owned.
  • In a service, there might be multiple places where Duty are added
  • The channel c is used only once by a service and it's always expected to be Some.

If you squint a bit you can see how a Deadliner is a mpsc channel, where there are multiple places where a Duty can be registered, while there is a single place where the deadlined duties are read. Also, consider that a Deadliner behaves like an Actor: https://rust-lang-nursery.github.io/rust-cookbook/concurrency/actor.html

I roughly suggest:

  • Make the construction of a deadliner return (handle, c), where c is the current Deadliner.c(), a background task is listening for messages, and handle allows submitting Duty to the background task (through mpsc channel, thus we support multiple handles by cloning)
  • The parameters to the constructor remain the same (ct, label and calculator), they're good.
  • Submitting through a handle waits for a response as we're doing now (through oneshot), but the response is more explicit, separating the "successfully scheduled, "already expired", "no deadline", "failed to compute deadline".
  • The background task emits values on c when the deadline is reached.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emlautarom1 Yes, I thought about what we can do here, and overall I came to similar conclusions, fully agree with you.

The only thing is the (handle, c) pair, I think it's a good design. However, in that case we would also need to pass c around, OR switch to a stricter c() that returns Result, but an explicit c looks better to me.

Here's what I was thinking. Maybe we should merge the refactoring first, since it's the most compatible with the current changes, but already gives us flexibility for future improvements.

Then I can make a follow-up PR for the (handle, c) pair, since it looks like everyone will need to rebase for that change anyway, and it will be easier if the PR is small.

So basically:

  1. Smoothly migrate to this PR, since it's well compatible with the current implementation.
  2. Add a series of PRs (or one PR) that introduce more radical changes.

What do you think? Or do you still think it's better to include the API changes here as well?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing the explicit c around is better than the c(), too many issues on callers otherwise.

No issues on having it as a stacked PRs (as long as we actually look into it and does not get forgotten 🫤)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I’ll open a separate (follow-up) PR based on this one, but with the updates, so you can use them and the team can rebase their branches on top of the changes from this PR.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The next PR with API changes (in progress): #434

Comment on lines +83 to +87
pub trait DeadlineCalculator: Send + Sync + 'static {
/// Computes the deadline for the given duty. See trait docs for return
/// semantics.
fn deadline(&self, duty: &Duty) -> Result<Option<DateTime<Utc>>>;
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this interface is the important one for testing. I would explore if we can make Deadliner a concrete type and control the logic through the calculator instead. If we can do that we can simplify quite the usages (no need to Box<dyn Deadliner>, etc.)

Copy link
Copy Markdown
Collaborator

@emlautarom1 emlautarom1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internal changes LGTM, waiting for a follow-up PR to address this module's API.

@therustmonk therustmonk merged commit 1bcec93 into main May 22, 2026
15 checks passed
@therustmonk therustmonk deleted the kd-may18-errors branch May 22, 2026 14:51
@therustmonk
Copy link
Copy Markdown
Collaborator Author

@emlautarom1 The next PR with API changes: #434

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants