-
Notifications
You must be signed in to change notification settings - Fork 0
Two phase compiler
What / why / where. Ephapax's compiler is split across two
implementations: a totality-checked Idris2 frontend (parser,
typechecker, IR codec) and a Rust backend (IR transforms,
WebAssembly codegen). Source .eph files flow through the
Idris2 side, get serialised as an S-expression IR, and are picked
up by the Rust side for backend work. This page traces that
pipeline file-by-file. Live source lives in
idris2/src/Ephapax/ (frontend) and
src/ephapax-ir/ + src/ephapax-wasm/
(backend). The Zig FFI seam — used by the frontend's lexer for
speed — lives at
idris2/ffi/zig/tokbuf.zig.
If you only care about what's enforced, read Linear and affine and Region calculus first; this page is about where in the pipeline each enforcement happens.
.eph source
│
│ Ephapax.Parse.Lexer (Idris2)
│ ↳ uses Zig tokbuf via %foreign (idris2/ffi/zig/tokbuf.zig)
▼
token stream
│
│ Ephapax.Parse.Parser (Idris2, LL(k) recursive descent)
▼
surface AST (Ephapax.IR.AST)
│
│ Ephapax.Affine.Typecheck (Idris2, %default total)
│ ↳ enforces let / let! discipline and region rules
▼
typed AST
│
│ Ephapax.IR.Decode + Ephapax.Affine.Emit (Idris2)
▼
S-expression IR (Ephapax.IR.SExpr)
│
│ ── boundary ── serialised, written to disk, re-read in Rust
│
▼
Rust IR (ephapax-ir)
│
│ ephapax-wasm (Rust, wasm-encoder backend)
▼
WebAssembly module (.wasm)
The two phases meet at the S-expression IR. Either side can be swapped wholesale without changing the other — the IR is the contract.
The frontend lives in
idris2/src/Ephapax/. Nine modules; all under
%default total (eight files) or %default covering (one file —
the recursive-descent parser, where the SCT can't trace the
mutual recursion through the Stream record). Zero believe_me,
zero assert_total, zero assert_smaller. Outcome of the
9-PR campaign #89–#100 merged 2026-05-20. See
Proof status for the per-file table.
The modules, in dependency order:
| Module | Default | Purpose |
|---|---|---|
IR/SExpr.idr |
%default total |
Generic S-expression parser; fueled. |
Parse/Stream.idr |
%default total |
Token-stream record; fueled by s.len - s.index. |
Parse/Util.idr |
%default total |
Combinators (many, sepBy); fueled. |
Parse/Lexer.idr |
%default total |
Char-level lexer; 57 recursive sites all fueled. |
Parse/Parser.idr |
%default covering |
LL(k) descent through ~30 mutually-recursive Stream-indexed parsers. |
IR/AST.idr |
%default total (6 covering markers) |
Surface AST + Show/Eq instances. |
IR/Decode.idr |
%default total (7 covering markers) |
Encode/decode for the AST. |
Affine/Typecheck.idr |
%default total |
The dyadic + region typechecker. |
Affine/Emit.idr |
%default total (1 covering marker) |
One-line wrapper that calls Decode.encode. |
Affine/Typecheck.idr is the structurally interesting one: it
runs the same let/let! + region rules as the Rust
typechecker, but in a totality-checked metalanguage. The two
checkers are kept in lockstep by the
conformance/ suite — every valid/*.eph must
pass both, every invalid/*.eph must be rejected by both.
The Idris2 lexer outsources tokenisation to a small Zig library
for speed. The seam is in
Ephapax.Parse.ZigBuffer:
%foreign "C:eph_tokbuf_new,libephapax_tokbuf"
prim__tokbuf_new : Int -> AnyPtr
%foreign "C:eph_tokbuf_free,libephapax_tokbuf"
prim__tokbuf_free : AnyPtr -> PrimIO ()The actual Zig implementation is in
idris2/ffi/zig/tokbuf.zig; the C ABI header is in
tokbuf.h. The Zig side does no parsing — only
token buffer management — so it doesn't escape the totality
guarantees that the Idris2 side carries. (The
guarantees are about the Idris2 code; calls across %foreign
are by-construction outside the Idris2 type system.)
The backend lives in src/ as ~19 crates; see
TOPOLOGY.md for the full dependency graph. The
two that matter for this page are:
| Crate | Purpose |
|---|---|
ephapax-typing |
Rust-side typechecker (runs against .eph directly, parallel to the Idris2 one). |
ephapax-ir |
IR data structures; reads the S-expression format the Idris2 side emits. |
ephapax-wasm |
WebAssembly codegen via wasm-encoder. |
ephapax-typing is the Rust crate that owns
BindingForm and
check_region — the structures referred to
throughout Linear and affine and
Region calculus. It exists alongside the
Idris2 typechecker as a hedge: in practice the Rust crate is what
the CLI invokes, while the Idris2 typechecker is what carries the
totality story. Both must agree on every conformance test.
ephapax-ir reads the S-expression IR. The IR is intentionally
small: a few node kinds, every node a typed S-expression, no
metadata that the typechecker hasn't already established.
ephapax-wasm does the lowering. Region exits become bulk-free
calls on a per-region arena; let! bindings get explicit consume
points; borrows lower to non-owning pointer passes. There is no
allocator runtime to ship — only the per-region arenas and a
single region_exit function.
The cost of the split is real (two parsers, two typecheckers, serialisation overhead). The benefits:
-
Proof concerns separate from codegen concerns. The parts that need to be totality-checked (parser, typechecker, region rules) live in Idris2, where the metalanguage gives you totality with
%default total. The parts that need to produce fast WASM live in Rust, where the ecosystem is mature (wasm-encoder,wasmtime). Neither half is asked to be good at the other's job. -
A swappable IR. The S-expression boundary means we could replace the Idris2 frontend with another (e.g. an OCaml one, an Agda one) without touching the backend, or vice versa. The IR is documented in
Ephapax.IR.SExprandEphapax.IR.Decode. -
A clean place to land the totality work. The 9-PR campaign that brought the frontend to
%default total(#89–#100, merged 2026-05-20) only had to touch one half of the compiler. The Rust side stayed the same. -
Two independent oracles for conformance. Every
valid/*.ephruns through both checkers and they must agree. When they disagree, it's a real bug somewhere in one of the four data structures (Mode/BindingForm,Entry/CtxEntry) and the disagreement points right at it.
The cost is paid once at the IR boundary and is bounded; the benefits accrue with every PR.
- Linear and affine — what the typechecker enforces
- Region calculus — how regions flow through the pipeline
- What can go wrong — the failure modes, by error code
- Proof status — per-file totality table for Phase 1; Coq theorem table for the semantics
-
TOPOLOGY.md— full module dependency graph - Glossary — "IR", "S-expression", "WASM", "covering", "total", "partial"