Goal
Make zig-js support multithreaded JavaScript faithfully, in three layers:
Layer A (standards track): isolated agents on real OS threads, SharedArrayBuffer, blocking Atomics.wait/notify, Atomics.waitAsync, structured clone, ArrayBuffer transfer, and the real test262 $262.agent semantics. This is what moves the test262 score and is the prerequisite for everything else.
Layer B (Bun-inspired, GIL'd): the shared-memory Thread / Lock / Condition / ThreadLocal API and Atomics-on-plain-properties from oven-sh/WebKit#249 ("Shared-memory threads for JavaScriptCore (experimental, not working yet)"), implemented under a single VM lock (concurrency, not parallelism). The PR's own phase 1 proved this mode is independently useful and testable.
Layer C (chartered, not scheduled): removing the GIL — a truly parallel shared heap. Explicitly blocked on a real GC (we are arena-allocated today) and recorded here only so its prerequisites get designed in, not designed out.
TC39 context: https://github.com/tc39/proposal-structs (Stage 2 — shared structs, Atomics.Mutex/Condition) is the standards future of Layer B/C. Evaluate it only after the Layer A stack is correct.
Where the engine is today (grounded)
Current overall: 41,898/47,928 test262 (87.4%). Relevant buckets:
| Bucket | Files | Passing now | Blocked on |
|---|---|---|---|
| built-ins/Atomics | 390 | ~308 | real blocking wait/notify, waitAsync (101 files), real agents |
| built-ins/SharedArrayBuffer | 104 | 103 | edge case TBD |
| Tests using $262.agent.start | 109 | cooperative subset | concurrent agents |
| CanBlockIsTrue / CanBlockIsFalse flagged | 7 / 2 | skipped (CanBlockIsFalse) | [[CanBlock]] modelling |
Architecture facts every task below relies on:
Context is single-thread-affine by construction (src/context.zig:16-34): it owns the arena (arena_state), global object, root Shape, environment, exception slot, and the microtask queue (microtasks, drained in evaluate at src/context.zig:141). Nothing is synchronized. ~2.9k LOC.
No GC. Everything (values, objects, strings, AST) lives in the context's std.heap.ArenaAllocator until Context.destroy(). ArenaAllocator is not thread-safe.
Zero threading code today. No std.Thread, mutexes, or Zig atomics anywhere in src/.
The agent model is cooperative, not concurrent (src/interpreter.zig:18504-18691): g_agent: AgentState (line 18531, a process-global) queues agent source strings; $262.agent.broadcast runs them sequentially on the calling thread via agentRunSync (line 18560), each in a fresh arena/realm. threadlocal var t_is_agent (line 18536) guards re-entry. This passes every test where agents don't need to make progress concurrently, and fails every test where they do.
Atomics.wait/notify are fake (src/interpreter.zig:~18856-18947): wait never blocks (returns "ok"/"not-equal"/"timed-out" immediately), notify wakes nothing, waitAsync returns a rejected promise, monotonicNow returns 0, agent.sleep is a no-op.
SAB backing storage is unsound for real threads: ArrayBufferData (src/value.zig:88-101) holds data: []u8 in some context's arena; broadcast shares the raw pointer via makeSharedArrayBufferOver (src/interpreter.zig:18641). Works only because everything is one thread and the parent outlives the agents. With real threads this is a use-after-free and an unsynchronized grow.
Per-interpreter, already thread-friendly: Interpreter holds its eval state in instance fields (no "current interpreter" global); re_legacy Annex-B regex state is per-interpreter (src/interpreter.zig:391-394); the parser is reentrant; each Context already has its own microtask queue and root_shape. Isolated agents therefore do not need shape locks, string-interning locks, or heap locks; the only shared mutable state in Layer A is SAB bytes plus the agent-coordination queues.
Execution is a tree-walker (src/interpreter.zig, ~26.5k LOC) with a bytecode VM fast path (src/vm.zig, src/compiler.zig, fallback at src/context.zig:131-136). Both matter: any per-thread state or interrupt-check has to land in both.
C API (src/c_api.zig): JSC-shaped drop-in (JSGlobalContextCreate, JSEvaluateScript, …). No thread rules documented.
test262 runner (conformance/test262.zig): multi-process workers (crash isolation) — orthogonal to in-engine threads. It currently skips CanBlockIsFalse (and async/module categories) — see the flag handling around lines 192-232.
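The wait/notify gap noted in the list above is observable from plain JavaScript. The following is a minimal sketch, assuming a host whose main agent has [[CanBlock]] = true (Node.js is one; a browser main thread would throw instead) and a global performance.now(). Variable names are illustrative only. On a conforming engine the timed wait has to consume real time, which an immediate-return stub would not.

```javascript
// Atomics.wait only accepts Int32Array / BigInt64Array views over shared memory.
const waitCell = new Int32Array(new SharedArrayBuffer(4));

// Cell holds 0 but we expect 1: the call returns at once without parking.
const notEqual = Atomics.wait(waitCell, 0, 1, 1000);

// Cell matches the expected value and nobody notifies: the agent must
// really block until the 50 ms timeout elapses.
const started = performance.now();
const timedOut = Atomics.wait(waitCell, 0, 0, 50);
const elapsedMs = performance.now() - started;

// No agent is parked on this location, so notify wakes nobody.
const woken = Atomics.notify(waitCell, 0, 1);

console.log({ notEqual, timedOut, woken, blocked: elapsedMs >= 20 });
```

The 20 ms threshold is deliberately loose; the point is only that elapsed time is clearly non-zero.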
What we lift from oven-sh/WebKit#249 — and what we don't
The PR is ~60k lines of C++ against JSC's butterflies/JIT/GC. The machinery does not transplant into a Zig tree-walker. What transplants is everything above the machinery, and it's a lot:
Copy nearly verbatim:
The JS test corpus — JSTests/threads/ (~100 plain-JS files: api/ thread/lock/condition/threadlocal/restrict lifecycle + error cases, atomics/ property-atomics incl. CAS storms and wait/notify/termination, arrays/ shared element read/write, push-resize races, COW, holes, typed-arrays-over-SAB, bench/ with its own harness, cve/ ~37 mechanism-class race tests). Vendor them under test/threads/ with a small shim mapping their helpers to ours. The cve/ and bench/ sets target JIT/GC machinery we don't have — port selectively, keep the rest as a checklist for Layer C.
The API surface, exactly: new Thread(fn, ...args), t.join() (blocks, returns value or rethrows the actual exception), t.asyncJoin(), Thread.current, t.id (main = 0), Lock (non-recursive, hold(fn), asyncHold, tryLock fast path, finally-equivalent release), Condition (wait(lock) = atomic release+park+reacquire, spurious wakeups allowed, asyncWait, notify/notifyAll), ThreadLocal (.value per-thread), Thread.restrict(obj) → ConcurrentAccessError on foreign access, Atomics.* extended to ordinary own data properties (load/store/add/sub/and/or/xor/exchange/compareExchange with SameValueZero so NaN CAS loops work, wait/waitAsync/notify), each one SeqCst atomic step.
The pinned semantic decisions (these were hard-won design calls; do not re-litigate, copy): promise reactions run on the settling thread; asyncJoin/asyncHold/waitAsync tickets settle on the requesting thread's loop; concurrent then() vs resolve() must not tear (per-promise lock in Layer B-parallel; free under GIL); each thread drains its own microtask queue, queues never interleave; join() settles when fn returned and the thread's queues are drained and nothing keeps it alive; termination drops undrained microtasks but published settlements stay visible; join() on a terminated thread rethrows an ordinary Error. Blocking join/wait is allowed except where the embedder forbids it (→ [[CanBlock]] plumbing is shared with Layer A).
The bring-up strategy: everything behind a flag; GIL-first phase as a semantic oracle, ungil later (for us: GIL is Layer B's shipping state, ungil is Layer C); frozen design specs in docs/threads/ with -history files, adversarially reviewed before code; "binding audit": enumerate every VM-singular field reachable from a second thread and rule each one per-thread / locked / refused before bring-up instead of discovering them by crashing; invariants as CI gates, not goals.
Their bug list as our checklist. Stack-overflow limits per-thread (our max_call_depth check in the interpreter + VM must be per-thread state, not Context state); exception state per-thread (our Context.exception slot is exactly the trap they hit — it must move to per-thread); parked threads must poll stop/termination words; per-VM scratch (regexp match state, date caches, string scratch) enumerated per-thread.
Do not attempt to port: TID-tagged/segmented butterflies, per-object cell locks, TTL/WTL watchpoints, the four JIT tiers' changes, the shared heap server, conservative stack scanning. Under a GIL none of it is needed; for Layer C, Pizlo's "Concurrent JavaScript: It Can Work!" (https://webkit.org/blog/7846/) is the reference design — but it presupposes a GC, which is why Layer C waits.
To pull the corpus: gh pr checkout 249 --repo oven-sh/WebKit in a WebKit clone (or gh api the files) and vendor JSTests/threads/ + docs/threads/ + THREAD.md as reference material.
Invariants (CI gates, not goals)
test262 total never regresses; every commit states the new total/percentage.
zig build conformance stays 33/33.
Single-threaded path stays allocation- and branch-identical where feasible: Layer A adds nothing to the non-agent hot path; Layer B is behind a Context creation option (e.g. enable_threads) and costs ~nothing off.
No new process-global mutable state without a written ruling (per-thread / locked / refused) in docs/threads/bindings.md.
New concurrency code carries stress tests and runs under TSan (sanitize_thread = true on the module in build.zig) in CI at least on demand.
Phase 0 — Thread-affinity guardrails and audit
Independent of everything; can start immediately; small.
Add owner_thread: std.Thread.Id to Context, set in create(); debug-mode assert (std.debug.assert behind a build option) in evaluate, evaluateModule, and the C API entry points that the caller is the owner. Release builds: zero cost.
Document C API thread rules in src/c_api.zig header comment: every JS*Ref is affine to its context's thread; cross-thread use is UB (JSC-style); one context per thread is the supported pattern.
Write docs/threads/bindings.md: enumerate every process-global and threadlocal in src/ (today: g_agent, g_agent_alloc, t_is_agent, plus anything a grep -n "^var \|^threadlocal" src/*.zig and a scan of comptime singletons turns up) with a per-thread/locked/refused ruling. This is the checklist Phase 2 executes against.
Vendor the PR-249 reference material (JSTests/threads/, docs/threads/ specs, THREAD.md) under reference/webkit-249/ (or a git submodule note), with a short README mapping their concepts to ours.
Acceptance: assertions fire under a deliberate cross-thread unit test; bindings.md exists and covers src/*.zig; no behavior change (zig build test262 unchanged).
Phase 1 — Shared buffer storage that can actually be shared
Prerequisite for Phases 2-6. The single most load-bearing change.
New SharedBufferStorage struct (suggest src/shared_buffer.zig): data: []u8, byte_len: usize (atomic for growable), max_byte_len: ?usize, refcount: std.atomic.Value(usize), allocated from a stable global allocator (page_allocator or a dedicated GPA) — never from a context arena.
Growable SABs reserve max_byte_length up front and grow in place by bumping the atomic length (grow-only, no realloc, no pointer movement; this is what makes lock-free views sound). SharedArrayBuffer.prototype.grow goes through a CAS loop per spec.
Rewire ArrayBufferData (src/value.zig:88-101): is_shared buffers point at SharedBufferStorage (retain/release); non-shared buffers keep the arena path unchanged. Each agent's Context holds a retain; Context.destroy() releases.
Replace makeSharedArrayBufferOver's raw-pointer wrap with storage sharing (retain + new view object in the receiving realm).
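Typed-array/DataView read/write paths (taRead/taWrite, src/value.zig:104-198) on shared storage: loads/stores of element size ≤ 8 on naturally-aligned offsets compile to plain loads/stores (aligned accesses no wider than a word do not tear on our targets; note this is a property of the hardware rather than a Zig language guarantee, so TSan will still report racing plain accesses unless they go through @atomicLoad/@atomicStore); Atomics.* ops use @atomicRmw/@cmpxchgStrong directly on the storage bytes.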
Audit every place that touches ArrayBufferData.data for resize/detach assumptions (LC_ALL=C grep -a -n "ArrayBufferData\|\.detached\|max_byte_length" src/*.zig).
Acceptance: all currently-passing SAB/Atomics/agent tests still pass (cooperative mode keeps working on the new storage); a new Zig unit test shares one storage across two std.Threads doing atomic increments and joins clean under TSan.
Phase 2 — Real concurrent agents + blocking Atomics
The core. Unlocks the bulk of the remaining ~80 Atomics tests and the 109 $262.agent tests' concurrent subset.
2a. Agent infrastructure
Replace process-global g_agent (src/interpreter.zig:18531) with an AgentGroup struct (suggest src/agent.zig): mutex-protected report queue, list of agent handles, broadcast rendezvous state. Owned by the main Context (or a registry keyed off it); t_is_agent becomes a pointer/threadlocal handle to the current agent record.
Each agent record: std.Thread, its own Context (own arena, realm, shapes, microtask queue — Phase 0's affinity model holds), [[CanBlock]] = true, a state word (running/parked/done) the group can read.
$262.agent.start(src): spawn the OS thread immediately; the agent script runs, typically calls receiveBroadcast(cb) which parks the agent until a broadcast arrives. (Today's model — defer source until broadcast — is exactly what blocking tests can't tolerate.)
$262.agent.broadcast(sab): per test262 INTERPRETING.md, blocks the caller until every started agent has received the SAB (rendezvous: count parked receivers, hand each a retained storage ref, wake them, wait for ack).
report(msg): mutex push onto group queue (copy the string out of the agent arena — it must outlive the agent). getReport(): mutex pop or null.
sleep(ms) → std.Thread.sleep; monotonicNow() → std.time.Timer-based real monotonic ms (also fixes Atomics timeout tests that measure elapsed time).
leaving() marks the record done; group teardown joins all agent threads with a hard timeout so a wedged agent can't hang the runner (the runner's process-level crash isolation is the backstop).
Agent teardown: agent Context.destroy() releases SAB retains; reports already copied out. Define the rule (and assert it): no pointer into an agent arena may survive the agent — only SAB storage and copied report strings cross.
2b. Blocking wait/notify
Waiter-list table (suggest in src/shared_buffer.zig): global mutex-guarded hash map keyed by (storage_ptr, byte_offset) → FIFO waiter list, each waiter a std.Thread.Condition-based ticket (or std.Thread.Futex where the address/width allows — condvar version first; it's portable and spec-shaped: FIFO order, exact wake counts).
Atomics.wait(ta, i, expected, timeout) (replacing src/interpreter.zig:~18856): full spec — validate i32/i64 shared view, throw TypeError if the agent's [[CanBlock]] is false, enter the list's critical section, re-check value (return "not-equal"), park with timeout ("timed-out"), return "ok" when notified.
Atomics.notify(ta, i, count): wake up to count FIFO waiters, return the number actually woken; on non-shared buffers return 0 per spec.
[[CanBlock]]: field on the agent record; main agent's value is host-set. Runner: stop skipping CanBlockIsFalse (2 tests) by running those with main-agent can_block = false; CanBlockIsTrue (7) run with the default.
Termination/teardown: a parked waiter must be wakeable by group teardown (poll a stop word on wake; bounded re-park) — this is PR 249's "stop the world waited 30s for a world that couldn't hear it" lesson.
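The bookkeeping behind the waiter table can be modelled in a few lines. The sketch below is a single-threaded JavaScript model of the data structure only: the real version would live in Zig behind a mutex and park OS threads. WaiterTable, park, notify, and expire are hypothetical names chosen for illustration, and the numeric storage id stands in for a storage pointer.

```javascript
// Model of a waiter table keyed by (storage, byteOffset) with FIFO wake order.
class WaiterTable {
  constructor() {
    this.lists = new Map(); // "storageId:byteOffset" -> FIFO array of waiters
  }

  static key(storageId, byteOffset) {
    return `${storageId}:${byteOffset}`;
  }

  // Append a waiter to the list for this location.
  park(storageId, byteOffset, waiter) {
    const key = WaiterTable.key(storageId, byteOffset);
    if (!this.lists.has(key)) this.lists.set(key, []);
    waiter.result = null;
    this.lists.get(key).push(waiter);
  }

  // Wake at most `count` waiters in FIFO order; return how many were woken.
  notify(storageId, byteOffset, count = Infinity) {
    const key = WaiterTable.key(storageId, byteOffset);
    const list = this.lists.get(key);
    if (!list) return 0;
    let woken = 0;
    while (woken < count && list.length > 0) {
      list.shift().result = "ok";
      woken += 1;
    }
    if (list.length === 0) this.lists.delete(key);
    return woken;
  }

  // Timeout path: remove a still-parked waiter; false if it was already woken.
  expire(storageId, byteOffset, waiter) {
    const key = WaiterTable.key(storageId, byteOffset);
    const list = this.lists.get(key);
    const index = list ? list.indexOf(waiter) : -1;
    if (index === -1) return false;
    list.splice(index, 1);
    if (list.length === 0) this.lists.delete(key);
    waiter.result = "timed-out";
    return true;
  }
}

const table = new WaiterTable();
const [a, b, c] = [{ name: "a" }, { name: "b" }, { name: "c" }];
for (const w of [a, b, c]) table.park(1, 0, w);

const wokenFirst = table.notify(1, 0, 2); // wakes a then b, leaves c parked
const wokenElsewhere = table.notify(1, 4, 1); // different offset: nobody there
const expired = table.expire(1, 0, c); // c times out
const wokenAfter = table.notify(1, 0); // list is now empty
console.log({ wokenFirst, wokenElsewhere, expired, wokenAfter });
```

In the threaded version, the value re-check for "not-equal" and the park call would both happen while holding the table's mutex, so a notify cannot slip between them.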
2c. The bindings audit, executed
Work through docs/threads/bindings.md: per-thread call-depth counter (max_call_depth checks in both interpreter and VM), per-context exception slot already exists — verify nothing else (date caches, string scratch, regex scratch in ../zig-regex) is process-global. re_legacy is per-interpreter — confirm and mark ruled.
Acceptance: zig build test262 -Doptimize=ReleaseFast shows the Atomics bucket jumping (target: >370/390 with waitAsync still excluded); the 109 agent tests' blocking subset passes; a 1000-iteration loop of the heaviest wait/notify tests is flake-free; TSan-clean on the new agent + waiter paths.
Phase 3 — Atomics completion
Depends on 2b. Mostly mechanical after the waiter table exists.
Atomics.waitAsync (101 test files, the single biggest bucket left): enqueue an async waiter (promise capability + owning context) on the same waiter list; notify marks it and signals the owning agent. Event-loop integration: Context.evaluate's drain loop (src/context.zig:141) must not return while async waiters are outstanding; it blocks on a context-level inbox with the nearest timeout deadline, then enqueues the resolution microtask and keeps draining. Immediate cases ("not-equal", timeout 0) create no promise: per spec they return { async: false, value } synchronously, and only a genuinely parked waiter returns { async: true, value: promise }.
Atomics.isLockFree(n): honest answer from target info (1/2/4/8 → true on our targets) — and make it consistent, the spec requires a stable answer per agent cluster.
Atomics.pause(): real spin-hint (std.atomic.spinLoopHint), validate the iteration-count argument per spec.
Re-check Atomics.add/sub/and/or/xor/exchange/compareExchange on the Phase-1 storage use real @atomicRmw/@cmpxchgStrong (not read-modify-write in interpreter code) — under cooperative agents this was unobservable; under Phase 2 it's a correctness bug.
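The observable contract for waitAsync and isLockFree is small enough to pin down in a script. This is a sketch against the standard API, assuming a host that ships Atomics.waitAsync (recent Node.js does); variable names are illustrative.

```javascript
const asyncCell = new Int32Array(new SharedArrayBuffer(4));

// Value mismatch: answered synchronously, no promise involved.
const mismatch = Atomics.waitAsync(asyncCell, 0, 1);

// Value matches but the timeout is 0: also answered synchronously.
const zeroTimeout = Atomics.waitAsync(asyncCell, 0, 0, 0);

// Value matches, no timeout: an async waiter is enqueued on the location.
const pending = Atomics.waitAsync(asyncCell, 0, 0);

// notify counts async waiters exactly like blocked ones.
const wokenAsync = Atomics.notify(asyncCell, 0, 1);

// The promise settles with "ok" from a later job, never synchronously.
pending.value.then((outcome) => console.log("async waiter:", outcome));

// The spec fixes isLockFree(4) as true; sizes other than 1/2/4/8 are false.
const lockFree4 = Atomics.isLockFree(4);
const lockFree3 = Atomics.isLockFree(3);

console.log({ mismatch, zeroTimeout, pendingAsync: pending.async, wokenAsync, lockFree4, lockFree3 });
```

The last promise callback is the part that needs the drain loop to stay alive: the script has finished, yet a job is still owed to this context.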
Phase 4 — Structured clone and ArrayBuffer transfer
Independent of Phases 2-3 (only touches the value graph + ArrayBufferData); parallelizable with them once Phase 1 lands.
structuredClone(value, { transfer }) + an internal serialize/deserialize pair usable across contexts (suggest src/structured_clone.zig). HTML algorithm subset for engine types: primitives + BigInt, plain objects (own enumerable data props), Arrays incl. holes (holes map on Object), Date, RegExp (source+flags), Map, Set, Boolean/Number/String wrappers, Error types (name/message/cause/stack-as-string), ArrayBuffer (byte copy; resizable → preserve max), SAB (share storage, retain), TypedArray/DataView (clone buffer or share if SAB), cycles + identity via memo map. DataCloneError for functions, symbols, proxies, promises, WeakMap/WeakSet.
Two-phase design: serialize to a context-independent byte/IR form, deserialize into the target context's arena — this is the worker postMessage wire format, so don't shortcut with a direct graph-copy that holds pointers into the source arena.
ArrayBuffer.prototype.transfer / transferToFixedLength: move the byte payload to a fresh buffer, set detached = true on the source (src/value.zig flag exists); ensure every TA/DataView path checks detachment (the test262 detachArrayBuffer harness already exercises most of this).
Transfer list in structured clone: transferred ArrayBuffers detach the source and move storage; SABs are not transferable (throw).
Acceptance: new unit + JS tests for round-tripping every supported type incl. cycles and identity (a.x === a.y preserved); transfer detaches; test262 arraybuffer-transfer features pass.
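The acceptance criteria above translate almost directly into a script. This is a sketch of such a check using the standard structuredClone global, assuming a host that provides it (recent Node.js and browsers do); the object shapes and variable names are made up for illustration.

```javascript
// A graph with a shared child, a cycle, and a few supported built-ins.
const shared = { tag: "shared" };
const source = {
  x: shared,
  y: shared,
  when: new Date(0),
  set: new Set([1, 2]),
  map: new Map([["k", 1n]]),
};
source.self = source;

const copy = structuredClone(source);

// Functions are not cloneable: the call throws a DataCloneError.
let cloneError = null;
try {
  structuredClone(() => {});
} catch (err) {
  cloneError = err;
}

// Transfer moves the bytes and leaves the source detached (length 0).
const payload = new ArrayBuffer(16);
new Uint8Array(payload)[0] = 42;
const moved = structuredClone(payload, { transfer: [payload] });

console.log({
  identityPreserved: copy.x === copy.y,
  cyclePreserved: copy.self === copy,
  deepCopy: copy.x !== shared,
  errorName: cloneError && cloneError.name,
  sourceLength: payload.byteLength,
  movedFirstByte: new Uint8Array(moved)[0],
});
```

Identity and cycles surviving the round trip is what the memo map buys; a serializer without one would either duplicate `shared` or recurse forever on `self`.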
Phase 5 — Embedder Worker API + C API rules
Depends on Phases 2 + 4. This is the public face of Layer A.
Zig API (suggest src/worker.zig): Worker.spawn(allocator, source_or_module, options) → own thread + own Context; postMessage(value, transfer) both directions via Phase-4 serialization over a mutex+condvar channel; message delivery enqueues onto the receiving context's job inbox (same mechanism as Phase 3's waitAsync inbox); terminate() (sets stop word — interpreter and VM loops poll it at call/loop back-edges, the same interrupt check Layer B needs) and join().
Module workers: Worker accepts the ModuleHost hook (src/context.zig:160-163) so workers can load module graphs.
C API: minimal JSWorker* surface or explicitly defer; either way, land the documented thread rules from Phase 0 and a JSContextGroupRef-shaped story for the agent cluster.
Host scheduling hooks: a vtable for "agent has pending jobs, please wake/drain" so embedders with their own loops can integrate.
Acceptance: an example (examples/worker.zig or a conformance smoke test) spawns 4 workers, round-trips structured messages, shares a SAB counter via Atomics, terminates one mid-loop, and exits clean under TSan.
Phase 6 — Thread/Lock/Condition/ThreadLocal under a GIL (Layer B)
Depends on Phases 1-3. This is where PR 249 is copied most directly — API, semantics, and tests — minus the parallel-heap machinery a GIL makes unnecessary.
Context creation option enable_threads; off = today's engine, byte-identical behavior.
The GIL: one std.Thread.Mutex per Context guarding all heap/interpreter access (arena allocation is then safe — exactly one thread runs VM code at a time). Spawned Threads share the Context: same realm, same globalThis, same heap, same module graph — the PR's headline property comes free under the lock.
Per-thread state (the PR's bindings audit, our edition): Interpreter instance, call-depth counter, microtask queue (lift microtasks from Context to a per-thread record; main thread keeps the existing one), re_legacy, current exception. Context.exception stays as the C-API boundary slot for the owner thread only.
Scheduling: the VM loop (execLoop in src/vm.zig) and tree-walker check a yield/stop word every N back-edges/calls and release+reacquire the GIL; without this, one thread starves all others. All blocking points (join, Lock contention, Condition.wait, Atomics.wait, parked receiveBroadcast) release the GIL while parked.
The API, per the PR spec: Thread (ctor errors per their thread-ctor-errors.js; join returns value/rethrows the actual exception object; asyncJoin; ids from a recycled space, main = 0; join settles only after the thread's microtask queue drains), Lock.hold/asyncHold, Condition.wait/asyncWait/notify/notifyAll (atomic release+park+reacquire; spurious wakeups allowed), ThreadLocal, Thread.restrict (owner-thread tag on Object — nullable field, checked only when set — throwing ConcurrentAccessError).
Atomics on plain properties: under the GIL every property op is already one atomic step — implement the semantics (SameValueZero compareExchange, missing-property behavior per their property-store-missing-define-race.js, wait/notify on (object, key) via a second waiter table keyed by object pointer + property key) and the GIL makes them trivially correct. The PR's atomics/property-*.js tests are the spec.
Promise semantics across threads, as pinned by the PR: reactions run on the settling thread's queue; asyncJoin tickets settle on the requester's queue; termination drops undrained microtasks, keeps published settlements; terminated join rethrows plain Error.
Port the PR test corpus: api/ and atomics/ and arrays/ largely as-is behind a thin harness shim (new zig build threads-test target or a conformance suite extension); bench/ for serial-overhead tracking; cve/ selectively (most target JIT/GC races we structurally can't have under a GIL — record each skipped one with a reason; they become Layer C's checklist).
Acceptance: ported api/ + atomics/ + arrays/ corpus green; flag-off test262 run identical to pre-phase totals; a measurable-concurrency smoke test (two threads alternating under the back-edge yield) passes; documented serial overhead with enable_threads off is ~0.
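Since the GIL makes every property operation a single atomic step, the property-atomics semantics can be prototyped as ordinary functions before any threading exists. The sketch below models only compareExchange on a plain property and why SameValueZero matters; sameValueZero, propertyCompareExchange, and casUpdate are hypothetical helper names, and missing-property behaviour (which the PR's tests define) is deliberately left out.

```javascript
// SameValueZero: like ===, except NaN equals NaN (+0 and -0 stay equal).
function sameValueZero(a, b) {
  return a === b || (a !== a && b !== b);
}

// Model of compareExchange on a property: swap if the current value matches
// `expected`, and always return the value that was there before.
function propertyCompareExchange(obj, key, expected, replacement) {
  const old = obj[key];
  if (sameValueZero(old, expected)) obj[key] = replacement;
  return old;
}

// Classic CAS loop: re-read, compute, try to publish, retry on interference.
function casUpdate(obj, key, update) {
  for (let attempts = 1; ; attempts += 1) {
    const seen = obj[key];
    const old = propertyCompareExchange(obj, key, seen, update(seen));
    if (sameValueZero(old, seen)) return attempts;
  }
}

// With === in place of SameValueZero this loop would never terminate,
// because NaN === NaN is false and the swap could never succeed.
const slot = { value: NaN };
const attempts = casUpdate(slot, "value", (v) => (Number.isNaN(v) ? 1 : v + 1));

// -0 and +0 compare equal, so this swap succeeds as well.
const zeroSlot = { value: -0 };
const oldZero = propertyCompareExchange(zeroSlot, "value", 0, 5);

console.log({ attempts, value: slot.value, zeroValue: zeroSlot.value, oldZeroWasNegative: Object.is(oldZero, -0) });
```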
Phase 7 — GIL removal (Layer C) — chartered, not scheduled
Do not start until a real GC replaces the arena (planned tier-5 work). Record the prerequisites now so earlier phases don't paint us into corners:
Tracing GC with safepoints (the arena model cannot express cross-thread object lifetimes).
Shape-transition synchronization (src/shape.zig transition map is the first thing two mutators would tear) — per-thread shape spaces or transition locks; Pizlo's TTL inference is the reference for keeping unshared objects fast.
String strategy: today strings are uninterned arena slices (no shared intern table to race on — keep it that way until Layer C chooses a sharded table).
Value representation: if NaN-boxing lands first, pointer-width atomicity of Value must be a design input.
The PR's phase-2 bring-up ladder (corpus under real parallelism, TSan campaign to zero unsuppressed, serial-perf gate, stress amplifiers) is the template; their bring-up log enumerates the bug classes to expect.
Phase 8 — Track TC39 structs
Re-evaluate proposal-structs (fixed-layout structs, shared structs, Atomics.Mutex/Condition) once Phases 1-3 are green; shared structs map naturally onto SharedBufferStorage + fixed shapes, and the proposal's sync primitives overlap Layer B's Lock/Condition — decide then whether Layer B's API stays engine-specific (like Bun's) or aligns with the proposal's names.
Verification
Compiler: zig 0.17-dev (system 0.16 will not build this repo; on the primary dev machine: ~/.local/share/zig-0.17-dev/zig).
zig build conformance — 33 smoke tests, must stay 100%.
zig build test262 -Doptimize=ReleaseFast — the real metric (~5 min). Every commit message states the new valid total + percentage.
Targeted: zig build diag -Doptimize=ReleaseFast -- run built-ins/Atomics (clusters failures by thrown error).
TSan: build the threaded unit/stress tests with sanitize_thread = true; new concurrency code must be clean or carry a justified suppression (PR-249 rule).
Done = the test262 worker/shared-memory coverage runs under the upstream harness with real concurrent agents (no cooperative shims), built-ins/Atomics + SharedArrayBuffer ≥ 98%, the ported PR-249 api/atomics/arrays corpus is green under Layer B, and the stress suite (transfer/detach races, shared-TA atomics storms, worker teardown, reentrancy) runs flake-free in CI.
Notes for agents executing this plan
Dependency graph: P0 ∥ P4-prep anytime → P1 → P2 → {P3 ∥ P5} ; P4 after P1 ; P6 after P1-P3 ; P7 blocked on GC ; P8 after P3. Independent streams that can run in parallel worktrees today: (a) P0 audit/guardrails, (b) P1 storage, (c) P4 structured clone scaffolding (serializer IR + non-shared types), (d) vendoring the PR-249 corpus + harness shim.
src/interpreter.zig contains high bytes — plain grep treats it as binary; always use LC_ALL=C grep -a.
Commit style: conventional commits, small and single-purpose, each stating the new test262 total (e.g. feat: blocking Atomics.wait via waiter table (42,1xx/47,928, 87.x%)).
Keep conformance/runner.zig smoke tests green at every commit; run the full test262 suite before any push that touches src/.
When in doubt about semantics, the order of authority is: ECMA-262 / test262 INTERPRETING.md for Layer A; the PR-249 spec docs (docs/threads/, THREAD.md) for Layer B; Pizlo 2017 for Layer C.
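For scheduling parallel worktrees, the dependency graph in the notes above can be turned into waves mechanically. The sketch below encodes one reading of that graph plus the per-phase "Depends on" lines (P1 treated as dependency-free, P7 waiting on an external "GC" prerequisite that is never marked done); phaseDeps and planWaves are illustrative names, not part of the plan.

```javascript
// Phase -> prerequisites, as read from the dependency notes (an assumption).
const phaseDeps = {
  P0: [],
  P1: [],
  P2: ["P1"],
  P3: ["P2"],
  P4: ["P1"],
  P5: ["P2", "P4"],
  P6: ["P1", "P2", "P3"],
  P7: ["GC"], // external prerequisite, not a phase in this plan
  P8: ["P3"],
};

// Repeatedly peel off every phase whose prerequisites are all complete.
function planWaves(deps) {
  const done = new Set();
  const waves = [];
  let remaining = Object.keys(deps);
  while (remaining.length > 0) {
    const ready = remaining.filter((p) => deps[p].every((d) => done.has(d)));
    if (ready.length === 0) break; // whatever is left is blocked
    waves.push(ready);
    for (const p of ready) done.add(p);
    remaining = remaining.filter((p) => !done.has(p));
  }
  return { waves, blocked: remaining };
}

const plan = planWaves(phaseDeps);
console.log(JSON.stringify(plan));
```

Under this reading the waves come out as P0 and P1, then P2 and P4, then P3 and P5, then P6 and P8, with P7 reported as blocked.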
Goal
Make zig-js support multithreaded JavaScript faithfully, in three layers:
SharedArrayBuffer, blockingAtomics.wait/notify,Atomics.waitAsync, structured clone,ArrayBuffertransfer, and the real test262$262.agentsemantics. This is what moves the test262 score and is the prerequisite for everything else.Thread/Lock/Condition/ThreadLocalAPI and Atomics-on-plain-properties from Shared-memory threads for JavaScriptCore (experimental, not working yet) oven-sh/WebKit#249, implemented under a single VM lock (concurrency, not parallelism). The PR's own phase 1 proved this mode is independently useful and testable.TC39 context: https://github.com/tc39/proposal-structs (Stage 2 — shared structs,
Atomics.Mutex/Condition) is the standards future of Layer B/C. Evaluate it only after the Layer A stack is correct.Where the engine is today (grounded)
Current overall: 41,898/47,928 test262 (87.4%). Relevant buckets:
built-ins/AtomicswaitAsync(101 files), real agentsbuilt-ins/SharedArrayBuffer$262.agent.startCanBlockIsTrue/CanBlockIsFalseflaggedCanBlockIsFalse)[[CanBlock]]modellingArchitecture facts every task below relies on:
Contextis single-thread-affine by construction (src/context.zig:16-34): it owns the arena (arena_state), global object, rootShape, environment, exception slot, and the microtask queue (microtasks, drained inevaluateatsrc/context.zig:141). Nothing is synchronized. ~2.9k LOC.std.heap.ArenaAllocatoruntilContext.destroy().ArenaAllocatoris not thread-safe.std.Thread, mutexes, or Zig atomics anywhere insrc/.src/interpreter.zig:18504-18691):g_agent: AgentState(line 18531, a process-global) queues agent source strings;$262.agent.broadcastruns them sequentially on the calling thread viaagentRunSync(line 18560), each in a fresh arena/realm.threadlocal var t_is_agent(line 18536) guards re-entry. This passes every test where agents don't need to make progress concurrently, and fails every test where they do.Atomics.wait/notifyare fake (src/interpreter.zig:~18856-18947): wait never blocks (returns"ok"/"not-equal"/"timed-out"immediately), notify wakes nothing,waitAsyncreturns a rejected promise,monotonicNowreturns 0,agent.sleepis a no-op.ArrayBufferData(src/value.zig:88-101) holdsdata: []u8in some context's arena; broadcast shares the raw pointer viamakeSharedArrayBufferOver(src/interpreter.zig:18641). Works only because everything is one thread and the parent outlives the agents. With real threads this is a use-after-free and an unsynchronized grow.Interpreterholds its eval state in instance fields (no "current interpreter" global);re_legacyAnnex-B regex state is per-interpreter (src/interpreter.zig:391-394); the parser is reentrant; eachContextalready has its own microtask queue androot_shape. Isolated agents therefore do not need shape locks, string-interning locks, or heap locks — the only shared mutable state in Layer A is SAB bytes plus the agent-coordination queues.src/interpreter.zig, ~26.5k LOC) with a bytecode VM fast path (src/vm.zig,src/compiler.zig, fallback atsrc/context.zig:131-136). 
Both matter: any per-thread state or interrupt-check has to land in both.src/c_api.zig): JSC-shaped drop-in (JSGlobalContextCreate,JSEvaluateScript, …). No thread rules documented.conformance/test262.zig): multi-process workers (crash isolation) — orthogonal to in-engine threads. It currently skipsCanBlockIsFalse(and async/module categories) — see the flag handling around lines 192-232.What we lift from oven-sh/WebKit#249 — and what we don't
The PR is ~60k lines of C++ against JSC's butterflies/JIT/GC. The machinery does not transplant into a Zig tree-walker. What transplants is everything above the machinery, and it's a lot:
Copy nearly verbatim:
JSTests/threads/(~100 plain-JS files:api/thread/lock/condition/threadlocal/restrict lifecycle + error cases,atomics/property-atomics incl. CAS storms and wait/notify/termination,arrays/shared element read/write, push-resize races, COW, holes, typed-arrays-over-SAB,bench/with its own harness,cve/~37 mechanism-class race tests). Vendor them undertest/threads/with a small shim mapping their helpers to ours. Thecve/andbench/sets target JIT/GC machinery we don't have — port selectively, keep the rest as a checklist for Layer C.new Thread(fn, ...args),t.join()(blocks, returns value or rethrows the actual exception),t.asyncJoin(),Thread.current,t.id(main = 0),Lock(non-recursive,hold(fn),asyncHold, tryLock fast path, finally-equivalent release),Condition(wait(lock)= atomic release+park+reacquire, spurious wakeups allowed,asyncWait,notify/notifyAll),ThreadLocal(.valueper-thread),Thread.restrict(obj)→ConcurrentAccessErroron foreign access,Atomics.*extended to ordinary own data properties (load/store/add/sub/and/or/xor/exchange/compareExchangewith SameValueZero so NaN CAS loops work,wait/waitAsync/notify), each one SeqCst atomic step.asyncJoin/asyncHold/waitAsynctickets settle on the requesting thread's loop; concurrentthen()vsresolve()must not tear (per-promise lock in Layer B-parallel; free under GIL); each thread drains its own microtask queue, queues never interleave;join()settles when fn returned and the thread's queues are drained and nothing keeps it alive; termination drops undrained microtasks but published settlements stay visible;join()on a terminated thread rethrows an ordinaryError. 
Blockingjoin/waitis allowed except where the embedder forbids it (→[[CanBlock]]plumbing is shared with Layer A).docs/threads/with-historyfiles, adversarially reviewed before code; "binding audit" — enumerate every VM-singular field reachable from a second thread and rule each one per-thread / locked / refused before bring-up instead of discovering them by crashing; invariants as CI gates, not goals.max_call_depthcheck in the interpreter + VM must be per-thread state, not Context state); exception state per-thread (ourContext.exceptionslot is exactly the trap they hit — it must move to per-thread); parked threads must poll stop/termination words; per-VM scratch (regexp match state, date caches, string scratch) enumerated per-thread.Do not attempt to port: TID-tagged/segmented butterflies, per-object cell locks, TTL/WTL watchpoints, the four JIT tiers' changes, the shared heap server, conservative stack scanning. Under a GIL none of it is needed; for Layer C, Pizlo's "Concurrent JavaScript: It Can Work!" (https://webkit.org/blog/7846/) is the reference design — but it presupposes a GC, which is why Layer C waits.
To pull the corpus:
gh pr checkout 249 --repo oven-sh/WebKit in a WebKit clone (or gh api the files) and vendor JSTests/threads/ + docs/threads/ + THREAD.md as reference material.
Invariants (CI gates, not goals)
zig build conformance stays 33/33.
Layer B sits behind a Context creation option (e.g. enable_threads) and costs ~nothing off.
docs/threads/bindings.md.
TSan (sanitize_thread = true on the module in build.zig) in CI at least on demand.
Phase 0: Thread-affinity guardrails and audit
Independent of everything; can start immediately; small.
Add owner_thread: std.Thread.Id to Context, set in create(); debug-mode assert (std.debug.assert behind a build option) in evaluate, evaluateModule, and the C API entry points that the caller is the owner. Release builds: zero cost.
src/c_api.zig header comment: every JS*Ref is affine to its context's thread; cross-thread use is UB (JSC-style); one context per thread is the supported pattern.
docs/threads/bindings.md: enumerate every process-global and threadlocal in src/ (today: g_agent, g_agent_alloc, t_is_agent, plus anything a grep -n "^var \|^threadlocal" src/*.zig and a scan of comptime singletons turns up) with a per-thread/locked/refused ruling. This is the checklist Phase 2 executes against.
Vendor the PR-249 reference material (JSTests/threads/, docs/threads/ specs, THREAD.md) under reference/webkit-249/ (or a git submodule note), with a short README mapping their concepts to ours.
Acceptance: assertions fire under a deliberate cross-thread unit test; bindings.md exists and covers src/*.zig; no behavior change (zig build test262 unchanged).
Phase 1: Shared buffer storage that can actually be shared
Prerequisite for Phases 2-6. The single most load-bearing change.
SharedBufferStorage struct (suggest src/shared_buffer.zig): data: []u8, byte_len: usize (atomic for growable), max_byte_len: ?usize, refcount: std.atomic.Value(usize), allocated from a stable global allocator (page_allocator or a dedicated GPA), never from a context arena.
Growable shared buffers reserve max_byte_length up front and grow in place by bumping the atomic length (grow-only, no realloc, no pointer movement; this is what makes lock-free views sound). SharedArrayBuffer.prototype.grow goes through a CAS loop per spec.
ArrayBufferData (src/value.zig:88-101): is_shared buffers point at SharedBufferStorage (retain/release); non-shared buffers keep the arena path unchanged. Each agent's Context holds a retain; Context.destroy() releases.
Replace makeSharedArrayBufferOver's raw-pointer wrap with storage sharing (retain + new view object in the receiving realm).
Typed-array access (taRead/taWrite, src/value.zig:104-198) on shared storage: loads/stores of element size ≤ 8 on naturally-aligned offsets compile to plain loads/stores (Zig guarantees no tearing for aligned ≤word accesses on our targets); Atomics.* ops use @atomicRmw/@cmpxchgStrong directly on the storage bytes.
Audit users of ArrayBufferData.data for resize/detach assumptions (LC_ALL=C grep -a -n "ArrayBufferData\|\.detached\|max_byte_length" src/*.zig).
Acceptance: all currently-passing SAB/Atomics/agent tests still pass (cooperative mode keeps working on the new storage); a new Zig unit test shares one storage across two std.Threads doing atomic increments and joins clean under TSan.
Phase 2: Real concurrent agents + blocking Atomics
The core. Unlocks the bulk of the remaining ~80 Atomics tests and the 109
$262.agent tests' concurrent subset.
2a. Agent infrastructure
Replace g_agent (src/interpreter.zig:18531) with an AgentGroup struct (suggest src/agent.zig): mutex-protected report queue, list of agent handles, broadcast rendezvous state. Owned by the main Context (or a registry keyed off it); t_is_agent becomes a pointer/threadlocal handle to the current agent record.
Agent record: std.Thread, its own Context (own arena, realm, shapes, microtask queue; Phase 0's affinity model holds), [[CanBlock]] = true, a state word (running/parked/done) the group can read.
$262.agent.start(src): spawn the OS thread immediately; the agent script runs, typically calls receiveBroadcast(cb) which parks the agent until a broadcast arrives. (Today's model, deferring the source until broadcast, is exactly what blocking tests can't tolerate.)
$262.agent.broadcast(sab): per test262 INTERPRETING.md, blocks the caller until every started agent has received the SAB (rendezvous: count parked receivers, hand each a retained storage ref, wake them, wait for ack).
report(msg): mutex push onto group queue (copy the string out of the agent arena, since it must outlive the agent). getReport(): mutex pop or null.
sleep(ms) → std.Thread.sleep; monotonicNow() → std.time.Timer-based real monotonic ms (also fixes Atomics timeout tests that measure elapsed time).
leaving() marks the record done; group teardown joins all agent threads with a hard timeout so a wedged agent can't hang the runner (the runner's process-level crash isolation is the backstop).
Context.destroy() releases SAB retains; reports already copied out. Define the rule (and assert it): no pointer into an agent arena may survive the agent; only SAB storage and copied report strings cross.
2b. Blocking wait/notify
Waiter table (suggest src/shared_buffer.zig): global mutex-guarded hash map keyed by (storage_ptr, byte_offset) → FIFO waiter list, each waiter a std.Thread.Condition-based ticket (or std.Thread.Futex where the address/width allows; condvar version first, since it's portable and spec-shaped: FIFO order, exact wake counts).
Atomics.wait(ta, i, expected, timeout) (replacing src/interpreter.zig:~18856): full spec. Validate i32/i64 shared view, throw TypeError if the agent's [[CanBlock]] is false, enter the list's critical section, re-check value (return "not-equal"), park with timeout ("timed-out"), return "ok" when notified.
Atomics.notify(ta, i, count): wake up to count FIFO waiters, return the number actually woken; on non-shared buffers return 0 per spec.
[[CanBlock]]: field on the agent record; main agent's value is host-set. Runner: stop skipping CanBlockIsFalse (2 tests) by running those with main-agent can_block = false; CanBlockIsTrue (7) run with the default.
2c. The bindings audit, executed
docs/threads/bindings.md: per-thread call-depth counter (max_call_depth checks in both interpreter and VM), per-context exception slot already exists; verify nothing else (date caches, string scratch, regex scratch in ../zig-regex) is process-global. re_legacy is per-interpreter: confirm and mark ruled.
Acceptance: zig build test262 -Doptimize=ReleaseFast shows the Atomics bucket jumping (target: >370/390 with waitAsync still excluded); the 109 agent tests' blocking subset passes; a 1000-iteration loop of the heaviest wait/notify tests is flake-free; TSan-clean on the new agent + waiter paths.
Phase 3: Atomics completion
Depends on 2b. Mostly mechanical after the waiter table exists.
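The result shapes that the waiter table has to produce for wait, notify and waitAsync can be observed without any second agent. The sketch below assumes a host whose main thread is allowed to block (Node.js is one); the variable names are illustrative.

```javascript
const ia = new Int32Array(new SharedArrayBuffer(8)); // ia[0] starts at 0

// Blocking wait: the value is re-checked before parking.
const notEqual = Atomics.wait(ia, 0, 1, 10); // "not-equal": ia[0] is 0, not 1
const timedOut = Atomics.wait(ia, 0, 0, 10); // "timed-out": nobody notifies

// waitAsync: the immediate cases return a plain result, not a promise.
const syncMismatch = Atomics.waitAsync(ia, 0, 1);   // { async: false, value: "not-equal" }
const syncTimeout = Atomics.waitAsync(ia, 0, 0, 0); // { async: false, value: "timed-out" }

// Otherwise an async waiter is queued on the same list as blocking waiters.
const pending = Atomics.waitAsync(ia, 0, 0);        // { async: true, value: Promise }

// notify returns the number of waiters actually woken, async ones included.
const woken = Atomics.notify(ia, 0, 1);  // 1: the async waiter above
const nobody = Atomics.notify(ia, 0, 1); // 0: the list is empty again

// On a non-shared buffer notify is a no-op that returns 0.
const nonShared = Atomics.notify(new Int32Array(4), 0, 1);

// The promise settles on the waiting agent's job queue.
pending.value.then((v) => console.log("async waiter resolved with", v));

const lockFree4 = Atomics.isLockFree(4); // required to be true by the spec
```

The first two waitAsync calls never touch the job queue, which is why the drain-loop changes only matter for the pending case.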
Atomics.waitAsync (101 test files, the single biggest bucket left): enqueue an async waiter (promise capability + owning context) on the same waiter list; notify marks it and signals the owning agent. Event-loop integration: Context.evaluate's drain loop (src/context.zig:141) must not return while async waiters are outstanding; instead, block on a context-level inbox with the nearest timeout deadline, then enqueue the resolution microtask and keep draining. Immediate cases ("not-equal", or a zero timeout) return synchronously as { async: false, value } per spec, with no promise.
Atomics.isLockFree(n): honest answer from target info (1/2/4/8 → true on our targets), and make it consistent: the spec requires a stable answer per agent cluster.
Atomics.pause(): real spin-hint (std.atomic.spinLoopHint), validate the iteration-count argument per spec.
Atomics.add/sub/and/or/xor/exchange/compareExchange on the Phase-1 storage use real @atomicRmw/@cmpxchgStrong (not read-modify-write in interpreter code). Under cooperative agents this was unobservable; under Phase 2 it's a correctness bug.
Acceptance: built-ins/Atomics ≥ 380/390; built-ins/SharedArrayBuffer 104/104.
Phase 4: Structured clone + ArrayBuffer transfer
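As a reference for this phase's acceptance tests, the behaviors below are what a host that already ships structuredClone exhibits (Node.js 17 or later is assumed here). It is a sketch of the observable contract, covering identity, cycles, DataCloneError and transfer, not of the serializer itself.

```javascript
// Identity and cycles: two paths to one object stay one object in the
// copy, and a cycle comes back as a cycle.
const shared = { tag: "leaf" };
const graph = { x: shared, y: shared, when: new Date(0), set: new Set([1, 2]) };
graph.self = graph;

const copy = structuredClone(graph);
const identityKept = copy.x === copy.y;  // true
const cycleKept = copy.self === copy;    // true
const isDeepCopy = copy.x !== shared;    // true: nothing points at the source

// Unsupported values fail with DataCloneError.
let cloneError = null;
try {
  structuredClone({ f() {} });
} catch (e) {
  cloneError = e; // e.name is "DataCloneError"
}

// Transfer moves the bytes and detaches the source buffer.
const buf = new ArrayBuffer(16);
new Uint8Array(buf)[0] = 7;
const moved = structuredClone(buf, { transfer: [buf] });
const sourceLength = buf.byteLength;          // 0 after detach
const movedFirst = new Uint8Array(moved)[0];  // 7
```

The isDeepCopy check is the script-level counterpart of the rule that no pointer into the source arena may survive serialization.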
Independent of Phases 2-3 (only touches the value graph + ArrayBufferData); parallelizable with them once Phase 1 lands.
structuredClone(value, { transfer }) + an internal serialize/deserialize pair usable across contexts (suggest src/structured_clone.zig). HTML algorithm subset for engine types: primitives + BigInt, plain objects (own enumerable data props), Arrays incl. holes (holes map on Object), Date, RegExp (source+flags), Map, Set, Boolean/Number/String wrappers, Error types (name/message/cause/stack-as-string), ArrayBuffer (byte copy; resizable → preserve max), SAB (share storage, retain), TypedArray/DataView (clone buffer or share if SAB), cycles + identity via memo map. DataCloneError for functions, symbols, proxies, promises, WeakMap/WeakSet.
The serialized form doubles as the Phase 5 postMessage wire format, so don't shortcut with a direct graph-copy that holds pointers into the source arena.
ArrayBuffer.prototype.transfer/transferToFixedLength: move the byte payload to a fresh buffer, set detached = true on the source (src/value.zig flag exists); ensure every TA/DataView path checks detachment (the test262 detachArrayBuffer harness already exercises most of this).
Acceptance: new unit + JS tests for round-tripping every supported type incl. cycles and identity (a.x === a.y preserved); transfer detaches; test262 arraybuffer-transfer features pass.
Phase 5: Embedder Worker API + C API rules
Depends on Phases 2 + 4. This is the public face of Layer A.
Worker API (suggest src/worker.zig): Worker.spawn(allocator, source_or_module, options) → own thread + own Context; postMessage(value, transfer) both directions via Phase-4 serialization over a mutex+condvar channel; message delivery enqueues onto the receiving context's job inbox (same mechanism as Phase 3's waitAsync inbox); terminate() (sets stop word; interpreter and VM loops poll it at call/loop back-edges, the same interrupt check Layer B needs) and join().
Worker accepts the ModuleHost hook (src/context.zig:160-163) so workers can load module graphs.
C API: either add a JSWorker* surface or explicitly defer; either way, land the documented thread rules from Phase 0 and a JSContextGroupRef-shaped story for the agent cluster.
A vtable for "agent has pending jobs, please wake/drain" so embedders with their own loops can integrate.
Acceptance: an example (examples/worker.zig or a conformance smoke test) spawns 4 workers, round-trips structured messages, shares a SAB counter via Atomics, terminates one mid-loop, and exits clean under TSan.
Phase 6: Thread/Lock/Condition/ThreadLocal under a GIL (Layer B)
Depends on Phases 1-3. This is where PR 249 is copied most directly (API, semantics, and tests), minus the parallel-heap machinery a GIL makes unnecessary.
Context creation option enable_threads; off = today's engine, byte-identical behavior.
The GIL: one std.Thread.Mutex per Context guarding all heap/interpreter access (arena allocation is then safe, since exactly one thread runs VM code at a time). Spawned Threads share the Context: same realm, same globalThis, same heap, same module graph. The PR's headline property comes free under the lock.
Per-thread state: Interpreter instance, call-depth counter, microtask queue (lift microtasks from Context to a per-thread record; main thread keeps the existing one), re_legacy, current exception. Context.exception stays as the C-API boundary slot for the owner thread only.
Yield points: the VM loop (src/vm.zig execLoop) and tree-walker check a yield/stop word every N back-edges/calls and release+reacquire the GIL; without this, one thread starves all others. All blocking points (join, Lock contention, Condition.wait, Atomics.wait, parked receiveBroadcast) release the GIL while parked.
Thread (ctor errors per their thread-ctor-errors.js; join returns value/rethrows the actual exception object; asyncJoin; ids from a recycled space, main = 0; join settles only after the thread's microtask queue drains), Lock.hold/asyncHold, Condition.wait/asyncWait/notify/notifyAll (atomic release+park+reacquire; spurious wakeups allowed), ThreadLocal, Thread.restrict (owner-thread tag on Object, a nullable field checked only when set, throwing ConcurrentAccessError).
Atomics on ordinary properties (SameValueZero compareExchange, missing-property behavior per their property-store-missing-define-race.js, wait/notify on (object, key) via a second waiter table keyed by object pointer + property key); the GIL makes them trivially correct. The PR's atomics/property-*.js tests are the spec.
asyncJoin tickets settle on the requester's queue; termination drops undrained microtasks, keeps published settlements; terminated join rethrows plain Error.
Port api/ and atomics/ and arrays/ largely as-is behind a thin harness shim (new zig build threads-test target or a conformance suite extension); bench/ for serial-overhead tracking; cve/ selectively (most target JIT/GC races we structurally can't have under a GIL; record each skipped one with a reason, and they become Layer C's checklist).
Acceptance: ported api/ + atomics/ + arrays/ corpus green; flag-off test262 run identical to pre-phase totals; a measurable-concurrency smoke test (two threads alternating under the back-edge yield) passes; documented serial overhead with enable_threads off is ~0.
Phase 7: GIL removal (Layer C), chartered, not scheduled
Do not start until a real GC replaces the arena (planned tier-5 work). Record the prerequisites now so earlier phases don't paint us into corners:
The src/shape.zig transition map is the first thing two mutators would tear: per-thread shape spaces or transition locks; Pizlo's TTL inference is the reference for keeping unshared objects fast.
Value must be a design input.
Phase 8: Track TC39 structs
Evaluate proposal-structs (fixed-layout structs, shared structs, Atomics.Mutex/Condition) once Phases 1-3 are green; shared structs map naturally onto SharedBufferStorage + fixed shapes, and the proposal's sync primitives overlap Layer B's Lock/Condition. Decide then whether Layer B's API stays engine-specific (like Bun's) or aligns with the proposal's names.
Verification
Zig toolchain: ~/.local/share/zig-0.17-dev/zig.
zig build conformance: 33 smoke tests, must stay 100%.
zig build test262 -Doptimize=ReleaseFast: the real metric (~5 min). Every commit message states the new valid total + percentage.
zig build diag -Doptimize=ReleaseFast -- run built-ins/Atomics clusters failures by thrown error.
TSan build with sanitize_thread = true; new concurrency code must be clean or carry a justified suppression (PR-249 rule).
Done when built-ins/Atomics + SharedArrayBuffer ≥ 98%, the ported PR-249 api/atomics/arrays corpus is green under Layer B, and the stress suite (transfer/detach races, shared-TA atomics storms, worker teardown, reentrancy) runs flake-free in CI.
Notes for agents executing this plan
src/interpreter.zig contains high bytes, so plain grep treats it as binary; always use LC_ALL=C grep -a.
Commit messages state the new total, e.g. feat: blocking Atomics.wait via waiter table (42,1xx/47,928, 87.x%).
Keep the conformance/runner.zig smoke tests green at every commit; run the full test262 suite before any push that touches src/.
References: test262's INTERPRETING.md for Layer A; the PR-249 spec docs (docs/threads/, THREAD.md) for Layer B; Pizlo 2017 for Layer C.