
Deterministic runtime crate #5016

Open
Shubham8287 wants to merge 45 commits into master from shub/sim

Conversation

@Shubham8287
Contributor

@Shubham8287 Shubham8287 commented May 13, 2026

Description of Changes.

Introduces a deterministic runtime crate and integrates it with RelationalDB.

I think the best order for review is:

  • Read the README of runtime crate.
  • Look at the integration with existing crates - durability, core, snapshot, etc.
  • Read runtime crate's code.

Draft branch to test the code: #5019

API and ABI breaking changes

NA

Expected complexity level and risk

This is not intended to change any production functionality, but it is a large change.

Testing

  • The new crate contains unit and integration tests.
  • Existing tests should continue to pass for production code.

@Shubham8287 Shubham8287 changed the title Deterministic Runtime crate Deterministic runtime crate May 13, 2026
@Shubham8287 Shubham8287 self-assigned this May 14, 2026
Comment thread crates/runtime/src/lib.rs
Tokio(tokio::task::JoinHandle<T>),
#[cfg(feature = "simulation")]
Simulation(sim::JoinHandle<T>),
Detached(PhantomData<T>),
Contributor

This one is interesting and could use some commentary!

Comment thread crates/runtime/src/sim_std.rs
Comment on lines +37 to +38
| `tokio::sync` primitives | Constrained | Core crates above runtime | These can be replay-compatible only when all participating tasks remain simulator-owned and progress stays on simulator-controlled async paths | Wake ordering or blocking semantics diverge once code depends on a real runtime or host-driven progress | Audit per primitive and push deep-core paths toward runtime-owned or single-threaded structures |
| `parking_lot::{}` and `std::sync::{}` | Constrained | Core crates, especially datastore | Safe only where access stays single-threaded or non-contended under DST | Host synchronization leaks nondeterministic acquisition order | Keep out of deep-core execution paths; prefer runtime-owned or single-threaded structures |
Contributor

This seems to be quite a limitation. Are we planning to provide custom synchronization primitives?

Contributor Author

@Shubham8287 Shubham8287 May 19, 2026

I think we may eventually need some custom synchronization primitives, mainly for places that rely on blocking behavior and can't be refactored to use async primitives (moving them to an async path should be the first choice).

For async paths, tokio::sync should take us pretty far. Tokio documents it as runtime agnostic, so the non-blocking APIs should work with our runtime as long as execution stays inside the normal wake/poll path. Note that blocking_send and its siblings are the problem here, since they hand control back to the OS by parking the thread or doing something similar.

datastore::Locking uses parking_lot, which is deterministic only as long as the lock is uncontended. As the module already runs single-threaded, contention doesn't come from there. I could be wrong, but I think contention could only arise from subscriptions, so we are fine as long as we test without subscriptions; after that we either have to switch this lock to an async one or implement a custom one.
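To illustrate what "staying inside the normal wake/poll path" means, here is a minimal single-threaded sketch using only the standard library. It is not the executor from this PR; `TaskWaker`, `YieldOnce`, `run_all`, and `demo` are hypothetical names made up for the example. Tasks are polled strictly in the order their wakers enqueue them, so the poll order is the same on every run.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker: waking a task appends its id to a FIFO run queue.
/// The mutex is only ever taken from one thread, so it is never contended.
struct TaskWaker {
    id: usize,
    run_queue: Arc<Mutex<VecDeque<usize>>>,
}

impl Wake for TaskWaker {
    fn wake(self: Arc<Self>) {
        self.run_queue.lock().unwrap().push_back(self.id);
    }
}

/// A future that returns `Pending` once (after waking itself), then completes.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

type Task = Pin<Box<dyn Future<Output = ()>>>;

/// Polls every task to completion on the calling thread and returns the
/// sequence of task ids in the order they were polled.
pub fn run_all(tasks: Vec<Task>) -> Vec<usize> {
    let mut tasks: Vec<Option<Task>> = tasks.into_iter().map(Some).collect();
    let run_queue = Arc::new(Mutex::new((0..tasks.len()).collect::<VecDeque<_>>()));
    let mut poll_order = Vec::new();
    loop {
        let next = run_queue.lock().unwrap().pop_front();
        let Some(id) = next else { break };
        // Skip wake-ups for tasks that have already finished.
        let Some(task) = tasks[id].as_mut() else { continue };
        let waker = Waker::from(Arc::new(TaskWaker { id, run_queue: run_queue.clone() }));
        poll_order.push(id);
        if task.as_mut().poll(&mut Context::from_waker(&waker)).is_ready() {
            tasks[id] = None;
        }
    }
    poll_order
}

/// Task 0 yields once; task 1 completes on its first poll.
pub fn demo() -> Vec<usize> {
    let first: Task = Box::pin(async { YieldOnce(false).await });
    let second: Task = Box::pin(async {});
    run_all(vec![first, second])
}
```

A call that parks the OS thread inside one of these tasks would stall the whole loop, because no other task can run to wake it; that is the kind of blocking path the comment above singles out.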

Comment thread crates/runtime/src/sim/time/mod.rs Outdated
/// Shared virtual clock and timer registry for one simulation runtime.
///
/// Virtual clock that only advances when explicitly driven — no wall-clock
/// progression, like Tokio's time-pause mode.
Contributor

Can you explain in more detail (in the comment) what the differences from Tokio's paused-time mode are? Or is this comment saying that it is equivalent to Tokio's?

Contributor Author

Yeah, it is conceptually similar to Tokio's paused time, but written to work with our custom executor and exposing only a handful of APIs. I have updated the comment.
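For readers unfamiliar with the idea, a virtual clock that only moves when explicitly driven can be sketched roughly as follows. This is an illustration under assumptions, not the code in `sim/time/mod.rs`: `VirtualClock`, `sleep`, and `advance` are hypothetical names, and timers are plain ids rather than wakers.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::time::Duration;

/// Hypothetical virtual clock: `now` changes only when `advance` is called,
/// never because wall-clock time passed.
#[derive(Default)]
pub struct VirtualClock {
    now: Duration,
    /// Min-heap of (deadline, timer id). Ids increase with registration
    /// order, so timers with equal deadlines fire in registration order.
    timers: BinaryHeap<Reverse<(Duration, u64)>>,
    next_id: u64,
}

impl VirtualClock {
    /// Current virtual time, measured from the start of the simulation.
    pub fn now(&self) -> Duration {
        self.now
    }

    /// Registers a timer that expires `after` from the current virtual time
    /// and returns its id.
    pub fn sleep(&mut self, after: Duration) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.timers.push(Reverse((self.now + after, id)));
        id
    }

    /// Moves virtual time forward by `by` and returns the ids of all timers
    /// whose deadline has been reached, earliest deadline first.
    pub fn advance(&mut self, by: Duration) -> Vec<u64> {
        self.now += by;
        let mut fired = Vec::new();
        while let Some(Reverse((deadline, id))) = self.timers.peek().copied() {
            if deadline > self.now {
                break;
            }
            self.timers.pop();
            fired.push(id);
        }
        fired
    }
}
```

In a real runtime the fired entries would wake the tasks awaiting them; returning ids keeps the sketch small and makes the firing order easy to check.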

async fn create_snapshot(repo: Arc<SnapshotRepository>) -> anyhow::Result<TxOffset> {
let start = Instant::now();
let rt = tokio::runtime::Handle::current();
let rt = spacetimedb_runtime::Handle::tokio_current();
Contributor

Does this mean that we should eventually thread through runtime handles from main?

Contributor Author

Yes, as of now, but we can also implement something like Tokio's context if needed in the future: https://docs.rs/tokio/latest/tokio/runtime/struct.Runtime.html#method.enter. I deferred doing it until we have to.
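As a rough idea of what a context-style alternative to threading handles through from main could look like, here is a standard-library-only sketch of an "enter" guard backed by a thread-local. Everything in it is hypothetical: `Handle` is a stand-in struct rather than `spacetimedb_runtime::Handle`, and `enter`, `try_current`, and `EnterGuard` are names chosen for the example, loosely modeled on the Tokio method linked above.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for a runtime handle.
#[derive(Clone, Debug, PartialEq)]
pub struct Handle {
    pub name: &'static str,
}

thread_local! {
    /// The handle that is "current" on this thread, if any.
    static CURRENT: RefCell<Option<Handle>> = RefCell::new(None);
}

/// Restores the previously current handle when dropped.
pub struct EnterGuard {
    previous: Option<Handle>,
}

impl Handle {
    /// Makes `self` the current handle for this thread until the returned
    /// guard is dropped.
    pub fn enter(&self) -> EnterGuard {
        let previous = CURRENT.with(|c| c.borrow_mut().replace(self.clone()));
        EnterGuard { previous }
    }

    /// Returns the handle entered on this thread, or `None` outside a context.
    pub fn try_current() -> Option<Handle> {
        CURRENT.with(|c| c.borrow().clone())
    }
}

impl Drop for EnterGuard {
    fn drop(&mut self) {
        let previous = self.previous.take();
        CURRENT.with(|c| *c.borrow_mut() = previous);
    }
}
```

The trade-off is the usual one: an ambient context avoids plumbing a handle through every signature, while explicit handles make it obvious at each call site which runtime a piece of code is tied to.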

let original = PTHREAD_ATTR_INIT.get_or_init(|| unsafe {
// `RTLD_NEXT` skips this interposed function and finds the libc
// implementation that would have been called without the simulator.
let ptr = libc::dlsym(libc::RTLD_NEXT, c"pthread_attr_init".as_ptr().cast());
Contributor

Wild.

3 participants