10 changes: 9 additions & 1 deletion compiler/README.md
@@ -22,7 +22,7 @@ What this leaves is a small, fast, deterministic core: 47-bit inline integers +
* **Lexer**: Hand-written, LUT-driven scanner (`modules/lexer/{mod,scan,tables}.rs`) over the language's token kinds. Tokens are `(start, end, kind)` offsets into the source buffer; no string copies during lexing. Indentation tracked as INDENT/DEDENT pairs against an explicit stack; UTF-8 BOM stripped.
* **Parser**: Single-pass, Pratt precedence climbing (`modules/parser/`). Emits SSA-versioned bytecode directly (`x` -> `x_1`, `x_2`, ...) with explicit `Phi` opcodes at control-flow joins. No intermediate AST.
* **Optimizer**: One peephole pass (`modules/vm/optimizer.rs`): constant folding over adjacent literal arithmetic / comparison / unary operands, Phi-noop elimination, and dead-instruction compaction with jump-operand remapping. Deliberately leaves `LoadName` alone to preserve the inline-cache slot.
- * **VM**: Stack-based interpreter over `Vec<Instruction>`, where each `Instruction` is `(opcode: OpCode, operand: u16)`. The hot loop lives in `modules/vm/dispatch.rs` as a flat `match` on the opcode (Rust lowers it to a jump table); the VM struct and constructor live in `modules/vm/mod.rs`, with `init.rs` / `helpers.rs` / `gc.rs` covering module init, stack/iter primitives, and the collector. The hot path is split across handler modules (`handlers/{arith,data,format,function,methods,methods_helpers,mod}.rs`). `LoadAttr + Call(0)` is fused into a `CallMethod` / `CallMethodArgs` super-instruction at first execution and cached per call site.
+ * **VM**: Stack-based interpreter over `Vec<Instruction>`, where each `Instruction` is `(opcode: OpCode, operand: u16)`. The hot loop lives in `modules/vm/dispatch.rs` as a flat `match` on the opcode (Rust lowers it to a jump table); the VM struct and constructor live in `modules/vm/mod.rs`, with `init.rs` / `helpers.rs` / `gc.rs` covering module init, stack/iter primitives, and the collector. The hot path is split across handler modules (`handlers/{arith,data,format,function,methods,methods_helpers,mod}.rs`) and a per-type method package (`handlers/builtin_methods/{mod,prelude,string,bytes,list,dict,set}.rs`) where each builtin method is a plain `pub fn` indexed by a static descriptor table. `LoadAttr + Call(0)` is fused into a `CallMethod` / `CallMethodArgs` super-instruction at first execution and cached per call site.
* **Inline Caching**: Two orthogonal per-instruction caches (`modules/vm/cache.rs`). The **scalar IC** records operand type tags for arithmetic and comparison sites; after 4 stable hits it promotes the slot to a typed `FastOp` (`AddInt`, `AddFloat`, `LtFloat`, `EqStr`, ...) with a type-tag guard so a miss falls back to the generic handler. The **instance-dunder IC** caches `(class_idx, method)` for monomorphic instance binop, comparison, and `__getitem__` sites and bypasses `resolve_attr_silent` once promoted; a class-identity miss invalidates without disturbing the scalar slot.
* **Template Memoization**: Pure functions called with the same arguments return a cached result after 2 hits, bypassing full execution. Functions are tagged impure on first observed side effect (`StoreItem`, `StoreAttr`, `print`, `input`, `raise`, `yield`).
* **Memory**: NaN-boxed 64-bit `Val` (47-bit signed inline int, IEEE-754 float, bool, None, 28-bit heap index). Heap is an arena of `HeapObj` slots managed by a mark-and-sweep GC. Strings and bytes ≤ 128 bytes are interned. **Integers are 47-bit inline with automatic i128 (`LongInt`) promotion on overflow**, hard-capped at ±2^127.
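The "plain `pub fn` indexed by a static descriptor table" layout mentioned in the VM bullet can be sketched roughly as follows. This is a minimal illustration, not the project's code: `Val`, `Vm`, `MethodFn`, the fields of `MethodDesc`, `INT_METHODS`, and `call_method` are all hypothetical stand-ins chosen so the example is self-contained.

```rust
// Hypothetical sketch of a static method-descriptor table.
// None of these types mirror the real VM; they only illustrate the shape.

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Val {
    None,
    Int(i64),
}

#[derive(Default)]
pub struct Vm {
    pub stack: Vec<Val>,
}

// Every builtin method shares one signature: (vm, receiver, positional args).
pub type MethodFn = fn(&mut Vm, Val, &[Val]) -> Result<(), String>;

pub struct MethodDesc {
    pub name: &'static str,
    pub min_args: usize,
    pub max_args: usize,
    pub mutating: bool, // assumed flag: the method mutates its receiver
    pub func: MethodFn,
}

fn int_abs(vm: &mut Vm, recv: Val, _pos: &[Val]) -> Result<(), String> {
    match recv {
        Val::Int(i) => {
            vm.stack.push(Val::Int(i.wrapping_abs()));
            Ok(())
        }
        _ => Err("abs: receiver is not an int".to_string()),
    }
}

fn int_add(vm: &mut Vm, recv: Val, pos: &[Val]) -> Result<(), String> {
    // Indexing pos[0] is safe here because the dispatcher checked arity first.
    match (recv, pos[0]) {
        (Val::Int(a), Val::Int(b)) => {
            vm.stack.push(Val::Int(a.wrapping_add(b)));
            Ok(())
        }
        _ => Err("add: operands must be ints".to_string()),
    }
}

pub static INT_METHODS: &[MethodDesc] = &[
    MethodDesc { name: "abs", min_args: 0, max_args: 0, mutating: false, func: int_abs },
    MethodDesc { name: "add", min_args: 1, max_args: 1, mutating: false, func: int_add },
];

// The dispatcher looks up the descriptor, checks arity once, then calls the fn.
pub fn call_method(
    vm: &mut Vm,
    table: &[MethodDesc],
    name: &str,
    recv: Val,
    pos: &[Val],
) -> Result<(), String> {
    let desc = table
        .iter()
        .find(|d| d.name == name)
        .ok_or_else(|| format!("no method named '{}'", name))?;
    if pos.len() < desc.min_args || pos.len() > desc.max_args {
        return Err(format!("{}(): wrong number of arguments", name));
    }
    (desc.func)(vm, recv, pos)
}
```

Centralizing the arity check in the dispatcher is what lets the individual handlers index `pos[0]` without their own bounds checks, which matches the "Arity is checked by the dispatcher" note in the new handler files.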
@@ -124,6 +124,14 @@ Mark-and-sweep with roots: operand stack, with-stack, pending yields, event queu
│ │ ├── gc.rs
│ │ ├── handlers
│ │ │ ├── arith.rs
│ │ │ ├── builtin_methods
│ │ │ │ ├── bytes.rs
│ │ │ │ ├── dict.rs
│ │ │ │ ├── list.rs
│ │ │ │ ├── mod.rs
│ │ │ │ ├── prelude.rs
│ │ │ │ ├── set.rs
│ │ │ │ └── string.rs
│ │ │ ├── data.rs
│ │ │ ├── dunder.rs
│ │ │ ├── format.rs
4 changes: 2 additions & 2 deletions compiler/src/modules/vm/dispatch.rs
@@ -369,7 +369,7 @@ impl<'a> VM<'a> {
self.push(v);
}

- // V3: extracted to exec_arith_or_compare so VM::dispatch doesn't fuse the IC/deopt cycle into its own symbol.
+ // Extracted to exec_arith_or_compare so VM::dispatch doesn't fuse the IC/deopt cycle into its own symbol.
OpCode::Add | OpCode::Sub | OpCode::Mul
| OpCode::Mod | OpCode::FloorDiv
| OpCode::Eq | OpCode::Lt | OpCode::NotEq
@@ -594,7 +594,7 @@ impl<'a> VM<'a> {
Ok(())
}

- /* V3: heavy arms extracted out of `dispatch` so wasm-opt can dedup prologues and the dispatcher itself stays compact. */
+ /* Heavy arms extracted out of `dispatch` so wasm-opt can dedup prologues and the dispatcher itself stays compact. */

#[inline(never)]
fn exec_arith_or_compare(&mut self, opcode: OpCode, rip: usize, cache: &mut OpcodeCache, chunk: &SSAChunk, slots: &mut [Val]) -> Result<(), VmErr> {
119 changes: 119 additions & 0 deletions compiler/src/modules/vm/handlers/builtin_methods/bytes.rs
@@ -0,0 +1,119 @@
/*
Built-in methods for `bytes` receivers. Arity is checked by the dispatcher.
*/

use super::prelude::*;

// `bytes.decode([encoding])` — invalid UTF-8 errors as ValueError.
pub fn decode(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> {
let buf = recv_bytes(vm, recv)?;
if let Some(arg) = pos.first() {
let enc = val_to_str(vm, *arg)?;
if !matches!(enc.as_str(), "utf-8" | "utf8" | "ascii") {
return Err(cold_value("unsupported encoding (expected 'utf-8' or 'ascii')"));
}
}
let text = alloc::string::String::from_utf8(buf)
.map_err(|_| cold_value("invalid UTF-8 in bytes.decode()"))?;
let v = vm.heap.alloc(HeapObj::Str(text))?;
vm.push(v); Ok(())
}

// `bytes.hex()` — lowercase hex of every byte. No separator.
pub fn hex(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> {
let buf = recv_bytes(vm, recv)?;
let mut out = alloc::string::String::with_capacity(buf.len() * 2);
const HEX: &[u8; 16] = b"0123456789abcdef";
for &b in &buf {
out.push(HEX[(b >> 4) as usize] as char);
out.push(HEX[(b & 0x0F) as usize] as char);
}
let v = vm.heap.alloc(HeapObj::Str(out))?;
vm.push(v); Ok(())
}

// bytes-only; strings go through `string::startswith`.
pub fn startswith(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> {
let buf = recv_bytes(vm, recv)?;
let prefix = recv_bytes(vm, pos[0])?;
vm.push(Val::bool(buf.starts_with(&prefix)));
Ok(())
}

pub fn endswith(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> {
let buf = recv_bytes(vm, recv)?;
let suffix = recv_bytes(vm, pos[0])?;
vm.push(Val::bool(buf.ends_with(&suffix)));
Ok(())
}

pub fn find(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> {
let buf = recv_bytes(vm, recv)?;
let sub = recv_bytes(vm, pos[0])?;
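// `slice::windows` panics on a zero window size, so an empty needle is handled first (it matches at offset 0).
let idx = if sub.is_empty() { 0 } else { buf.windows(sub.len()).position(|w| w == sub.as_slice()).map(|i| i as i64).unwrap_or(-1) };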
vm.push(Val::int(idx));
Ok(())
}

pub fn index(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> {
let buf = recv_bytes(vm, recv)?;
let sub = recv_bytes(vm, pos[0])?;
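// `slice::windows` panics on a zero window size, so an empty needle is handled first (it matches at offset 0).
let idx = if sub.is_empty() { 0 } else { buf.windows(sub.len()).position(|w| w == sub.as_slice()).ok_or(cold_value("subsection not found"))? };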
vm.push(Val::int(idx as i64));
Ok(())
}

pub fn count(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> {
let buf = recv_bytes(vm, recv)?;
let sub = recv_bytes(vm, pos[0])?;
if sub.is_empty() {
vm.push(Val::int(buf.len() as i64 + 1));
return Ok(());
}
let mut n = 0i64;
let mut i = 0usize;
while i + sub.len() <= buf.len() {
if buf[i..i + sub.len()] == sub[..] { n += 1; i += sub.len(); }
else { i += 1; }
}
vm.push(Val::int(n));
Ok(())
}

pub fn replace(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> {
let buf = recv_bytes(vm, recv)?;
let old = recv_bytes(vm, pos[0])?;
let new = recv_bytes(vm, pos[1])?;
if old.is_empty() {
let v = vm.heap.alloc(HeapObj::Bytes(buf))?;
vm.push(v); return Ok(());
}
let mut out: Vec<u8> = Vec::with_capacity(buf.len());
let mut i = 0usize;
while i < buf.len() {
if i + old.len() <= buf.len() && buf[i..i + old.len()] == old[..] {
out.extend_from_slice(&new); i += old.len();
} else {
out.push(buf[i]); i += 1;
}
}
let v = vm.heap.alloc(HeapObj::Bytes(out))?;
vm.push(v); Ok(())
}

pub fn split(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> {
let buf = recv_bytes(vm, recv)?;
let sep = recv_bytes(vm, pos[0])?;
if sep.is_empty() { return Err(cold_value("empty separator")); }
let mut parts: Vec<Val> = Vec::new();
let mut start = 0usize;
let mut i = 0usize;
while i + sep.len() <= buf.len() {
if buf[i..i + sep.len()] == sep[..] {
parts.push(vm.heap.alloc(HeapObj::Bytes(buf[start..i].to_vec()))?);
i += sep.len(); start = i;
} else { i += 1; }
}
parts.push(vm.heap.alloc(HeapObj::Bytes(buf[start..].to_vec()))?);
vm.alloc_and_push_list(parts)
}
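The search and counting logic in `find` and `count` above can be exercised in isolation on plain byte slices. The sketch below is a standalone approximation under stated assumptions: `find_bytes` and `count_bytes` are hypothetical helper names, and the empty-needle results follow Python's `bytes.find(b"")` and `bytes.count(b"")` behaviour. The explicit empty-needle branch in `find_bytes` matters because `slice::windows` panics when given a window size of 0.

```rust
/// Index of the first occurrence of `needle` in `haystack`, or -1 if absent.
/// Hypothetical helper; mirrors the shape of the `find` handler above.
pub fn find_bytes(haystack: &[u8], needle: &[u8]) -> i64 {
    // `windows(0)` would panic, so the empty needle is answered up front.
    if needle.is_empty() {
        return 0;
    }
    haystack
        .windows(needle.len())
        .position(|w| w == needle)
        .map(|i| i as i64)
        .unwrap_or(-1)
}

/// Number of non-overlapping occurrences of `needle` in `haystack`.
/// An empty needle counts once between every byte and at both ends.
pub fn count_bytes(haystack: &[u8], needle: &[u8]) -> i64 {
    if needle.is_empty() {
        return haystack.len() as i64 + 1;
    }
    let mut n = 0i64;
    let mut i = 0usize;
    while i + needle.len() <= haystack.len() {
        if haystack[i..i + needle.len()] == needle[..] {
            n += 1;
            i += needle.len(); // skip past the match: occurrences do not overlap
        } else {
            i += 1;
        }
    }
    n
}
```

Skipping `needle.len()` bytes after a hit is what makes the count non-overlapping, so `b"aaaa"` contains `b"aa"` twice rather than three times.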
98 changes: 98 additions & 0 deletions compiler/src/modules/vm/handlers/builtin_methods/dict.rs
@@ -0,0 +1,98 @@
/*
Built-in methods for `dict` receivers. Arity is checked by the dispatcher; `mutating` is marked by the dispatcher when `MethodDesc::mutating` is true.
*/

use super::prelude::*;

pub fn keys(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> {
let entries = dict_entries(vm, recv)?;
let keys: Vec<Val> = entries.into_iter().map(|(k, _)| k).collect();
vm.alloc_and_push_list(keys)
}

pub fn values(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> {
let entries = dict_entries(vm, recv)?;
let vals: Vec<Val> = entries.into_iter().map(|(_, v)| v).collect();
vm.alloc_and_push_list(vals)
}

pub fn items(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> {
let entries = dict_entries(vm, recv)?;
let mut items: Vec<Val> = Vec::with_capacity(entries.len());
for (k, vv) in entries {
let t = vm.heap.alloc(HeapObj::Tuple(vec![k, vv]))?;
items.push(t);
}
vm.alloc_and_push_list(items)
}

// `dict.copy()` — shallow copy; mutations don't affect the original.
pub fn copy(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> {
let entries = dict_entries(vm, recv)?;
let mut dm = DictMap::with_capacity(entries.len());
for (k, v) in entries { dm.insert(k, v); }
vm.alloc_and_push_dict(dm)
}

// `dict.popitem()` — pop the last (k, v); KeyError on empty dict.
pub fn popitem(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> {
let pair = dict_mut(vm, recv, "popitem: receiver is not a dict", |dict| {
let (k, v) = dict.entries.last().copied().ok_or(cold_value("popitem(): dictionary is empty"))?;
dict.remove(&k);
Ok((k, v))
})?;
vm.alloc_and_push_tuple(vec![pair.0, pair.1])
}

pub fn get(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> {
let default = if pos.len() == 2 { pos[1] } else { Val::none() };
let result = match vm.heap.get(recv) {
HeapObj::Dict(rc) => rc.borrow().get(&pos[0]).copied().unwrap_or(default),
_ => return Err(cold_type("get: receiver is not a dict")),
};
vm.push(result); Ok(())
}

pub fn update(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> {
// Accept a dict or an iterable of 2-element pairs.
let pairs: Vec<(Val, Val)> = if let HeapObj::Dict(rc) = vm.heap.get(pos[0]) {
rc.borrow().entries.clone()
} else {
let items = vm.extract_iter(pos[0], true)?;
let mut out = Vec::with_capacity(items.len());
for it in items {
let pair = match vm.heap.get(it) {
HeapObj::Tuple(v) if v.len() == 2 => (v[0], v[1]),
HeapObj::List(v) if v.borrow().len() == 2 => { let v = v.borrow(); (v[0], v[1]) }
_ => return Err(cold_value("dictionary update sequence element must have length 2")),
};
out.push(pair);
}
out
};
dict_mut(vm, recv, "update: receiver is not a dict", |dict| {
for (k, v) in pairs { dict.insert(k, v); }
Ok(())
})?;
vm.push(Val::none()); Ok(())
}

pub fn pop(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> {
let default = if pos.len() == 2 { Some(pos[1]) } else { None };
let result = dict_mut(vm, recv, "pop: receiver is not a dict", |dict| {
match dict.remove(&pos[0]) {
Some(val) => Ok(val),
None => default.ok_or(cold_value("key not found")),
}
})?;
vm.push(result); Ok(())
}

pub fn setdefault(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> {
let default = if pos.len() > 1 { pos[1] } else { Val::none() };
let result = dict_mut(vm, recv, "setdefault: receiver is not a dict", |dict| {
if let Some(v) = dict.get(&pos[0]).copied() { Ok(v) }
else { dict.insert(pos[0], default); Ok(default) }
})?;
vm.push(result); Ok(())
}
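The `popitem` and `setdefault` handlers above rely on the dict keeping its entries in insertion order. A rough, self-contained model of that behaviour is sketched below. It assumes a plain `Vec` of pairs with linear lookup and `i64` keys and values; `SmallDict` is a hypothetical type and says nothing about how the real `DictMap` is implemented.

```rust
/// Hypothetical insertion-ordered map with linear lookup, for illustration only.
#[derive(Default)]
pub struct SmallDict {
    pub entries: Vec<(i64, i64)>,
}

impl SmallDict {
    pub fn get(&self, key: i64) -> Option<i64> {
        self.entries.iter().find(|(k, _)| *k == key).map(|(_, v)| *v)
    }

    /// Overwrites in place if the key exists, otherwise appends at the end.
    pub fn insert(&mut self, key: i64, value: i64) {
        if let Some(slot) = self.entries.iter_mut().find(|(k, _)| *k == key) {
            slot.1 = value;
        } else {
            self.entries.push((key, value));
        }
    }

    /// Removes and returns the most recently inserted pair.
    pub fn popitem(&mut self) -> Result<(i64, i64), &'static str> {
        self.entries.pop().ok_or("popitem(): dictionary is empty")
    }

    /// Returns the existing value, or inserts `default` and returns it.
    pub fn setdefault(&mut self, key: i64, default: i64) -> i64 {
        match self.get(key) {
            Some(v) => v,
            None => {
                self.insert(key, default);
                default
            }
        }
    }
}
```

Because overwriting an existing key leaves its position untouched, `popitem` always returns the pair whose key was first inserted last, which is the ordering the handler's `entries.last()` call depends on.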
104 changes: 104 additions & 0 deletions compiler/src/modules/vm/handlers/builtin_methods/list.rs
@@ -0,0 +1,104 @@
/*
Built-in methods for `list` receivers. Arity is checked by the dispatcher; `mutating` is marked by the dispatcher when `MethodDesc::mutating` is true.
*/

use super::prelude::*;

pub fn index(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> {
let items = list_clone(vm, recv)?;
let idx = items.iter()
.position(|&v| eq_vals_with_heap(v, pos[0], &vm.heap))
.map(|i| i as i64)
.ok_or(cold_value("value not found in list"))?;
vm.push(Val::int(idx));
Ok(())
}

pub fn count(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> {
let items = list_clone(vm, recv)?;
let n = items.iter().filter(|&&v| eq_vals_with_heap(v, pos[0], &vm.heap)).count() as i64;
vm.push(Val::int(n));
Ok(())
}

pub fn copy(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> {
let items = list_clone(vm, recv)?;
vm.alloc_and_push_list(items)
}

pub fn append(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> {
list_mut(vm, recv, "append: receiver is not a list", |list| {
list.push(pos[0]); Ok(())
})?;
vm.push(Val::none()); Ok(())
}

pub fn clear(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> {
list_mut(vm, recv, "clear: receiver is not a list", |list| {
list.clear(); Ok(())
})?;
vm.push(Val::none()); Ok(())
}

pub fn reverse(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> {
list_mut(vm, recv, "reverse: receiver is not a list", |list| {
list.reverse(); Ok(())
})?;
vm.push(Val::none()); Ok(())
}

pub fn extend(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> {
let items = vm.extract_iter(pos[0], true)?;
list_mut(vm, recv, "extend: receiver is not a list", |list| {
list.extend_from_slice(&items); Ok(())
})?;
vm.push(Val::none()); Ok(())
}

pub fn insert(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> {
if !pos[0].is_int() { return Err(cold_type("list indices must be integers")); }
list_mut(vm, recv, "insert: receiver is not a list", |list| {
let i = pos[0].as_int();
let ui = if i < 0 {
(list.len() as i64 + i).max(0) as usize
} else {
(i as usize).min(list.len())
};
list.insert(ui, pos[1]);
Ok(())
})?;
vm.push(Val::none()); Ok(())
}

pub fn remove(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> {
let items = list_clone(vm, recv)?;
let idx = items.iter()
.position(|&v| eq_vals_with_heap(v, pos[0], &vm.heap))
.ok_or(cold_value("list.remove: value not found"))?;
list_mut(vm, recv, "remove: receiver is not a list", |list| {
list.remove(idx); Ok(())
})?;
vm.push(Val::none()); Ok(())
}

pub fn pop(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> {
let popped = list_mut(vm, recv, "pop: receiver is not a list", |list| {
if list.is_empty() { return Err(cold_value("pop from empty list")); }
if pos.is_empty() { return Ok(list.pop().unwrap()); }
if !pos[0].is_int() { return Err(cold_type("list indices must be integers")); }
let i = pos[0].as_int();
let ui = if i < 0 { (list.len() as i64 + i) as usize } else { i as usize };
if ui >= list.len() { return Err(cold_value("pop index out of range")); }
Ok(list.remove(ui))
})?;
vm.push(popped); Ok(())
}

pub fn sort(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> {
let mut sorted = list_clone(vm, recv)?;
vm.sort_by_lt(&mut sorted)?;
list_mut(vm, recv, "sort: receiver is not a list", |list| {
*list = sorted; Ok(())
})?;
vm.push(Val::none()); Ok(())
}
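`insert` and `pop` above treat out-of-range indices differently: `insert` clamps to the ends of the list, while `pop` rejects anything outside the valid range. The two rules can be sketched as pure functions over a length and a signed index. `insert_index` and `pop_index` are hypothetical names introduced for this illustration; the arithmetic follows the handlers above.

```rust
/// Clamping rule used by `insert`: negative indices count from the end,
/// and anything out of range is pinned to 0 or `len`.
pub fn insert_index(len: usize, i: i64) -> usize {
    if i < 0 {
        (len as i64 + i).max(0) as usize
    } else {
        (i as usize).min(len)
    }
}

/// Strict rule used by `pop`: negative indices count from the end,
/// and anything out of range yields `None` (an error in the handler).
pub fn pop_index(len: usize, i: i64) -> Option<usize> {
    let idx = if i < 0 { len as i64 + i } else { i };
    if idx < 0 || idx >= len as i64 {
        None
    } else {
        Some(idx as usize)
    }
}
```

For a three-element list, `insert_index(3, -10)` pins to the front and `insert_index(3, 10)` to the back, whereas `pop_index(3, 3)` and `pop_index(3, -4)` are both rejected.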