Skip to content

Two phase compiler

Jonathan D.A. Jewell edited this page May 20, 2026 · 1 revision

Two-phase compiler

What / why / where. Ephapax's compiler is split across two implementations: a totality-checked Idris2 frontend (parser, typechecker, IR codec) and a Rust backend (IR transforms, WebAssembly codegen). Source .eph files flow through the Idris2 side, get serialised as an S-expression IR, and are picked up by the Rust side for backend work. This page traces that pipeline file-by-file. Live source lives in idris2/src/Ephapax/ (frontend) and src/ephapax-ir/ + src/ephapax-wasm/ (backend). The Zig FFI seam — used by the frontend's lexer for speed — lives at idris2/ffi/zig/tokbuf.zig.

If you only care about what's enforced, read Linear and affine and Region calculus first; this page is about where in the pipeline each enforcement happens.


Pipeline at a glance

.eph source
   │
   │  Ephapax.Parse.Lexer  (Idris2)
   │      ↳ uses Zig tokbuf via %foreign (idris2/ffi/zig/tokbuf.zig)
   ▼
token stream
   │
   │  Ephapax.Parse.Parser  (Idris2, LL(k) recursive descent)
   ▼
surface AST  (Ephapax.IR.AST)
   │
   │  Ephapax.Affine.Typecheck  (Idris2, %default total)
   │      ↳ enforces let / let! discipline and region rules
   ▼
typed AST
   │
   │  Ephapax.IR.Decode  +  Ephapax.Affine.Emit  (Idris2)
   ▼
S-expression IR  (Ephapax.IR.SExpr)
   │
   │  ── boundary ──  serialised, written to disk, re-read in Rust
   │
   ▼
Rust IR  (ephapax-ir)
   │
   │  ephapax-wasm (Rust, wasm-encoder backend)
   ▼
WebAssembly module  (.wasm)

The two phases meet at the S-expression IR. Either side can be swapped wholesale without changing the other — the IR is the contract.


Phase 1: Idris2 frontend

The frontend lives in idris2/src/Ephapax/. Nine modules; all under %default total (eight files) or %default covering (one file — the recursive-descent parser, where the SCT can't trace the mutual recursion through the Stream record). Zero believe_me, zero assert_total, zero assert_smaller. Outcome of the 9-PR campaign #89–#100 merged 2026-05-20. See Proof status for the per-file table.

The modules, in dependency order:

Module Default Purpose
IR/SExpr.idr %default total Generic S-expression parser; fueled.
Parse/Stream.idr %default total Token-stream record; fueled by s.len - s.index.
Parse/Util.idr %default total Combinators (many, sepBy); fueled.
Parse/Lexer.idr %default total Char-level lexer; 57 recursive sites all fueled.
Parse/Parser.idr %default covering LL(k) descent through ~30 mutually-recursive Stream-indexed parsers.
IR/AST.idr %default total (6 covering markers) Surface AST + Show/Eq instances.
IR/Decode.idr %default total (7 covering markers) Encode/decode for the AST.
Affine/Typecheck.idr %default total The dyadic + region typechecker.
Affine/Emit.idr %default total (1 covering marker) One-line wrapper that calls Decode.encode.

Affine/Typecheck.idr is the structurally interesting one: it runs the same let/let! + region rules as the Rust typechecker, but in a totality-checked metalanguage. The two checkers are kept in lockstep by the conformance/ suite — every valid/*.eph must pass both, every invalid/*.eph must be rejected by both.

The Zig FFI seam

The Idris2 lexer outsources tokenisation to a small Zig library for speed. The seam is in Ephapax.Parse.ZigBuffer:

%foreign "C:eph_tokbuf_new,libephapax_tokbuf"
prim__tokbuf_new : Int -> AnyPtr

%foreign "C:eph_tokbuf_free,libephapax_tokbuf"
prim__tokbuf_free : AnyPtr -> PrimIO ()

The actual Zig implementation is in idris2/ffi/zig/tokbuf.zig; the C ABI header is in tokbuf.h. The Zig side does no parsing — only token buffer management — so it doesn't escape the totality guarantees that the Idris2 side carries. (The guarantees are about the Idris2 code; calls across %foreign are by-construction outside the Idris2 type system.)


Phase 2: Rust backend

The backend lives in src/ as ~19 crates; see TOPOLOGY.md for the full dependency graph. The two that matter for this page are:

Crate Purpose
ephapax-typing Rust-side typechecker (runs against .eph directly, parallel to the Idris2 one).
ephapax-ir IR data structures; reads the S-expression format the Idris2 side emits.
ephapax-wasm WebAssembly codegen via wasm-encoder.

ephapax-typing is the Rust crate that owns BindingForm and check_region — the structures referred to throughout Linear and affine and Region calculus. It exists alongside the Idris2 typechecker as a hedge: in practice the Rust crate is what the CLI invokes, while the Idris2 typechecker is what carries the totality story. Both must agree on every conformance test.

ephapax-ir reads the S-expression IR. The IR is intentionally small: a few node kinds, every node a typed S-expression, no metadata that the typechecker hasn't already established.

ephapax-wasm does the lowering. Region exits become bulk-free calls on a per-region arena; let! bindings get explicit consume points; borrows lower to non-owning pointer passes. There is no allocator runtime to ship — only the per-region arenas and a single region_exit function.


Why split it this way?

The cost of the split is real (two parsers, two typecheckers, serialisation overhead). The benefits:

  1. Proof concerns separate from codegen concerns. The parts that need to be totality-checked (parser, typechecker, region rules) live in Idris2, where the metalanguage gives you totality with %default total. The parts that need to produce fast WASM live in Rust, where the ecosystem is mature (wasm-encoder, wasmtime). Neither half is asked to be good at the other's job.

  2. A swappable IR. The S-expression boundary means we could replace the Idris2 frontend with another (e.g. an OCaml one, an Agda one) without touching the backend, or vice versa. The IR is documented in Ephapax.IR.SExpr and Ephapax.IR.Decode.

  3. A clean place to land the totality work. The 9-PR campaign that brought the frontend to %default total (#89–#100, merged 2026-05-20) only had to touch one half of the compiler. The Rust side stayed the same.

  4. Two independent oracles for conformance. Every valid/*.eph runs through both checkers and they must agree. When they disagree, it's a real bug somewhere in one of the four data structures (Mode/BindingForm, Entry/CtxEntry) and the disagreement points right at it.

The cost is paid once at the IR boundary and is bounded; the benefits accrue with every PR.


Cross-references

Clone this wiki locally