Deterministic runtime crate #5016
Co-authored-by: Shubham Mishra <shivam828787@gmail.com> Signed-off-by: Shubham Mishra <shivam828787@gmail.com>
Signed-off-by: Shubham Mishra <shivam828787@gmail.com>
| Tokio(tokio::task::JoinHandle<T>), | ||
| #[cfg(feature = "simulation")] | ||
| Simulation(sim::JoinHandle<T>), | ||
| Detached(PhantomData<T>), |
This one is interesting and could use some commentary!
| | `tokio::sync` primitives | Constrained | Core crates above runtime | These can be replay-compatible only when all participating tasks remain simulator-owned and progress stays on simulator-controlled async paths | Wake ordering or blocking semantics diverge once code depends on a real runtime or host-driven progress | Audit per primitive and push deep-core paths toward runtime-owned or single-threaded structures | | ||
| | `parking_lot::{}` and `std::sync::{}` | Constrained | Core crates, especially datastore | Safe only where access stays single-threaded or non-contended under DST | Host synchronization leaks nondeterministic acquisition order | Keep out of deep-core execution paths; prefer runtime-owned or single-threaded structures | |
This seems to be quite a limitation. Are we planning to provide custom synchronization primitives?
I think we may eventually need some custom synchronization primitives, mainly for places that rely on blocking behavior and can't be refactored to use async primitives. Moving them to an async path should be the first choice.
For async paths, tokio::sync should take us pretty far. Tokio documents it as runtime-agnostic, so the non-blocking APIs should work with our runtime as long as execution stays inside the normal wake/poll path. Note that blocking_send and its siblings are the problem here, since they hand control back to the OS by parking the thread or something similar.
datastore::Locking uses parking_lot, which is deterministic only while uncontended. Since the module already runs single-threaded, contention doesn't come from there. I could be wrong, but I think it can only happen due to subscriptions, so we are fine as long as we test without subscriptions. After that, we either have to switch this lock to an async one or implement a custom one.
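To illustrate the "stays inside the normal wake/poll path" point, here is a minimal sketch using only the standard library. It is not the executor from this PR: `NoopWake`, `YieldNow`, `run_all`, and `demo` are hypothetical names, and the toy loop simply re-polls pending tasks in FIFO order. The idea it shows is that when tasks only make progress by being polled on one thread, the interleaving is fixed by the executor rather than by the OS scheduler.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical task type for this sketch.
type Task = Pin<Box<dyn Future<Output = ()>>>;

/// Waker that does nothing: this toy executor re-polls every pending task
/// in FIFO order, so it never needs a wake-up signal.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Future that returns `Pending` once, handing control back to the executor.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Run all tasks to completion on the current thread, in a fixed order.
fn run_all(tasks: Vec<Task>) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut queue: VecDeque<Task> = tasks.into();
    while let Some(mut task) = queue.pop_front() {
        // A pending task goes to the back of the queue; no thread is parked.
        if task.as_mut().poll(&mut cx).is_pending() {
            queue.push_back(task);
        }
    }
}

/// Two tasks interleave through explicit yields and record their steps.
pub fn demo() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let (a, b) = (log.clone(), log.clone());
    let first: Task = Box::pin(async move {
        a.borrow_mut().push("a1");
        YieldNow(false).await;
        a.borrow_mut().push("a2");
    });
    let second: Task = Box::pin(async move {
        b.borrow_mut().push("b1");
        YieldNow(false).await;
        b.borrow_mut().push("b2");
    });
    run_all(vec![first, second]);
    let out = log.borrow().clone();
    out
}
```

A blocking call inside one of these tasks would stall the whole loop or hand scheduling to the host, which is why the blocking variants are the ones that need attention.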
| /// Shared virtual clock and timer registry for one simulation runtime. | ||
| /// | ||
| /// Virtual clock that only advances when explicitly driven — no wall-clock | ||
| /// progression, like Tokio's time-pause mode. |
Can you explain in more detail (in the comment) what the differences from Tokio's paused time are? Or is this comment saying that it is equivalent to Tokio's?
Yeah, it is conceptually similar to Tokio's paused time, but written to work with our custom executor and exposing only a handful of APIs. I have updated the comment.
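For readers unfamiliar with the idea, a virtual clock with a timer registry can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: `VirtualClock`, `sleep`, and `advance` are hypothetical names, and timers are identified by plain `u64` ids instead of wakers.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::time::Duration;

/// Hypothetical virtual clock: time moves only when `advance` is called.
#[derive(Default)]
pub struct VirtualClock {
    now: Duration,
    // Min-heap of (deadline, registration sequence, timer id). The sequence
    // number breaks ties, so timers with equal deadlines fire in
    // registration order on every run.
    timers: BinaryHeap<Reverse<(Duration, u64, u64)>>,
    next_seq: u64,
}

impl VirtualClock {
    /// Current virtual time, measured from the start of the simulation.
    pub fn now(&self) -> Duration {
        self.now
    }

    /// Register a timer that fires `delay` after the current virtual time.
    pub fn sleep(&mut self, id: u64, delay: Duration) {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.timers.push(Reverse((self.now + delay, seq, id)));
    }

    /// Advance virtual time by `by` and return the ids of the timers that
    /// became due, in firing order.
    pub fn advance(&mut self, by: Duration) -> Vec<u64> {
        let target = self.now + by;
        let mut fired = Vec::new();
        while let Some(Reverse((deadline, _, id))) = self.timers.peek().copied() {
            if deadline > target {
                break;
            }
            self.timers.pop();
            self.now = deadline;
            fired.push(id);
        }
        self.now = target;
        fired
    }
}
```

Nothing here reads the wall clock, so two runs that issue the same `sleep` and `advance` calls observe the same firing order.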
| async fn create_snapshot(repo: Arc<SnapshotRepository>) -> anyhow::Result<TxOffset> { | ||
| let start = Instant::now(); | ||
| let rt = tokio::runtime::Handle::current(); | ||
| let rt = spacetimedb_runtime::Handle::tokio_current(); |
Does this mean that we should eventually thread through runtime handles from main?
Umm, yes, as of now. But we can also implement something like Tokio's context if needed in the future: https://docs.rs/tokio/latest/tokio/runtime/struct.Runtime.html#method.enter. I deferred doing it until we have to.
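As a rough picture of what such a context could look like, here is a sketch of a thread-local "current handle" with a scoped guard, loosely modeled on the enter-guard pattern in Tokio. Everything in it is an assumption for illustration: `Handle`, `EnterGuard`, `enter`, and `try_current` here are hypothetical stand-ins, not the types in `spacetimedb_runtime`.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for a runtime handle; not the crate's real type.
#[derive(Clone, Debug, PartialEq)]
pub struct Handle {
    pub name: &'static str,
}

thread_local! {
    // Handle that is ambient on this thread, if any.
    static CURRENT: RefCell<Option<Handle>> = RefCell::new(None);
}

/// Restores the previously entered handle when dropped.
pub struct EnterGuard {
    previous: Option<Handle>,
}

impl Handle {
    /// Make `self` the ambient handle for this thread until the guard drops.
    pub fn enter(&self) -> EnterGuard {
        let previous = CURRENT.with(|c| c.replace(Some(self.clone())));
        EnterGuard { previous }
    }

    /// Ambient handle, if one has been entered on this thread.
    pub fn try_current() -> Option<Handle> {
        CURRENT.with(|c| c.borrow().clone())
    }
}

impl Drop for EnterGuard {
    fn drop(&mut self) {
        // Put back whatever was ambient before `enter` was called.
        let previous = self.previous.take();
        CURRENT.with(|c| *c.borrow_mut() = previous);
    }
}
```

The trade-off is the usual one: an ambient handle avoids threading a parameter through every call from main, at the cost of a hidden dependency that callers must remember to set up.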
| let original = PTHREAD_ATTR_INIT.get_or_init(|| unsafe { | ||
| // `RTLD_NEXT` skips this interposed function and finds the libc | ||
| // implementation that would have been called without the simulator. | ||
| let ptr = libc::dlsym(libc::RTLD_NEXT, c"pthread_attr_init".as_ptr().cast()); |
Description of Changes.
Introduces a deterministic runtime crate.
Integrates it with RelationalDB.
I think the best steps to review:
durability, core, snapshot, etc.
Draft branch to test the code: #5019
API and ABI breaking changes
NA
Expected complexity level and risk
Does not intend to change any production functionality, but it is a large change.
Testing