From c615b943e49810b03bbb7f6aefe4a4ee3d1942f1 Mon Sep 17 00:00:00 2001 From: dylan-sutton-chavez Date: Tue, 12 May 2026 21:35:09 -0600 Subject: [PATCH] refactor(vm): split builtin methods into per-type files with descriptor table --- compiler/README.md | 10 +- compiler/src/modules/vm/dispatch.rs | 4 +- .../vm/handlers/builtin_methods/bytes.rs | 119 +++ .../vm/handlers/builtin_methods/dict.rs | 98 +++ .../vm/handlers/builtin_methods/list.rs | 104 +++ .../vm/handlers/builtin_methods/mod.rs | 164 ++++ .../vm/handlers/builtin_methods/prelude.rs | 12 + .../vm/handlers/builtin_methods/set.rs | 109 +++ .../vm/handlers/builtin_methods/string.rs | 268 ++++++ compiler/src/modules/vm/handlers/methods.rs | 774 +----------------- .../modules/vm/handlers/methods_helpers.rs | 8 - compiler/src/modules/vm/handlers/mod.rs | 1 + documentation/implementation/design.md | 13 +- documentation/reference/wasm-abi.md | 2 +- 14 files changed, 902 insertions(+), 784 deletions(-) create mode 100644 compiler/src/modules/vm/handlers/builtin_methods/bytes.rs create mode 100644 compiler/src/modules/vm/handlers/builtin_methods/dict.rs create mode 100644 compiler/src/modules/vm/handlers/builtin_methods/list.rs create mode 100644 compiler/src/modules/vm/handlers/builtin_methods/mod.rs create mode 100644 compiler/src/modules/vm/handlers/builtin_methods/prelude.rs create mode 100644 compiler/src/modules/vm/handlers/builtin_methods/set.rs create mode 100644 compiler/src/modules/vm/handlers/builtin_methods/string.rs diff --git a/compiler/README.md b/compiler/README.md index 3ca0e43..9c098c0 100644 --- a/compiler/README.md +++ b/compiler/README.md @@ -22,7 +22,7 @@ What this leaves is a small, fast, deterministic core: 47-bit inline integers + * **Lexer**: Hand-written, LUT-driven scanner (`modules/lexer/{mod,scan,tables}.rs`) over the language's token kinds. Tokens are `(start, end, kind)` offsets into the source buffer; no string copies during lexing. 
Indentation tracked as INDENT/DEDENT pairs against an explicit stack; UTF-8 BOM stripped. * **Parser**: Single-pass, Pratt precedence climbing (`modules/parser/`). Emits SSA-versioned bytecode directly (`x` -> `x_1`, `x_2`, ...) with explicit `Phi` opcodes at control-flow joins. No intermediate AST. * **Optimizer**: One peephole pass (`modules/vm/optimizer.rs`): constant folding over adjacent literal arithmetic / comparison / unary operands, Phi-noop elimination, and dead-instruction compaction with jump-operand remapping. Deliberately leaves `LoadName` alone to preserve the inline-cache slot. -* **VM**: Stack-based interpreter over `Vec<Instruction>`, where each `Instruction` is `(opcode: OpCode, operand: u16)`. The hot loop lives in `modules/vm/dispatch.rs` as a flat `match` on the opcode (Rust lowers it to a jump table); the VM struct and constructor live in `modules/vm/mod.rs`, with `init.rs` / `helpers.rs` / `gc.rs` covering module init, stack/iter primitives, and the collector. The hot path is split across handler modules (`handlers/{arith,data,format,function,methods,methods_helpers,mod}.rs`). `LoadAttr + Call(0)` is fused into a `CallMethod` / `CallMethodArgs` super-instruction at first execution and cached per call site. +* **VM**: Stack-based interpreter over `Vec<Instruction>`, where each `Instruction` is `(opcode: OpCode, operand: u16)`. The hot loop lives in `modules/vm/dispatch.rs` as a flat `match` on the opcode (Rust lowers it to a jump table); the VM struct and constructor live in `modules/vm/mod.rs`, with `init.rs` / `helpers.rs` / `gc.rs` covering module init, stack/iter primitives, and the collector. The hot path is split across handler modules (`handlers/{arith,data,format,function,methods,methods_helpers,mod}.rs`) and a per-type method package (`handlers/builtin_methods/{mod,prelude,string,bytes,list,dict,set}.rs`) where each builtin method is a plain `pub fn` indexed by a static descriptor table. 
`LoadAttr + Call(0)` is fused into a `CallMethod` / `CallMethodArgs` super-instruction at first execution and cached per call site. * **Inline Caching**: Two orthogonal per-instruction caches (`modules/vm/cache.rs`). The **scalar IC** records operand type tags for arithmetic and comparison sites; after 4 stable hits it promotes the slot to a typed `FastOp` (`AddInt`, `AddFloat`, `LtFloat`, `EqStr`, ...) with a type-tag guard so a miss falls back to the generic handler. The **instance-dunder IC** caches `(class_idx, method)` for monomorphic instance binop, comparison, and `__getitem__` sites and bypasses `resolve_attr_silent` once promoted; a class-identity miss invalidates without disturbing the scalar slot. * **Template Memoization**: Pure functions called with the same arguments return a cached result after 2 hits, bypassing full execution. Functions are tagged impure on first observed side effect (`StoreItem`, `StoreAttr`, `print`, `input`, `raise`, `yield`). * **Memory**: NaN-boxed 64-bit `Val` (47-bit signed inline int, IEEE-754 float, bool, None, 28-bit heap index). Heap is an arena of `HeapObj` slots managed by a mark-and-sweep GC. Strings and bytes ≤ 128 bytes are interned. **Integers are 47-bit inline with automatic i128 (`LongInt`) promotion on overflow**, hard-capped at ±2^127. 
@@ -124,6 +124,14 @@ Mark-and-sweep with roots: operand stack, with-stack, pending yields, event queu │ │ ├── gc.rs │ │ ├── handlers │ │ │ ├── arith.rs +│ │ │ ├── builtin_methods +│ │ │ │ ├── bytes.rs +│ │ │ │ ├── dict.rs +│ │ │ │ ├── list.rs +│ │ │ │ ├── mod.rs +│ │ │ │ ├── prelude.rs +│ │ │ │ ├── set.rs +│ │ │ │ └── string.rs │ │ │ ├── data.rs │ │ │ ├── dunder.rs │ │ │ ├── format.rs diff --git a/compiler/src/modules/vm/dispatch.rs b/compiler/src/modules/vm/dispatch.rs index 5280a9e..7d0983c 100644 --- a/compiler/src/modules/vm/dispatch.rs +++ b/compiler/src/modules/vm/dispatch.rs @@ -369,7 +369,7 @@ impl<'a> VM<'a> { self.push(v); } - // V3: extracted to exec_arith_or_compare so VM::dispatch doesn't fuse the IC/deopt cycle into its own symbol. + // Extracted to exec_arith_or_compare so VM::dispatch doesn't fuse the IC/deopt cycle into its own symbol. OpCode::Add | OpCode::Sub | OpCode::Mul | OpCode::Mod | OpCode::FloorDiv | OpCode::Eq | OpCode::Lt | OpCode::NotEq @@ -594,7 +594,7 @@ impl<'a> VM<'a> { Ok(()) } - /* V3: heavy arms extracted out of `dispatch` so wasm-opt can dedup prologues and the dispatcher itself stays compact. */ + /* Heavy arms extracted out of `dispatch` so wasm-opt can dedup prologues and the dispatcher itself stays compact. */ #[inline(never)] fn exec_arith_or_compare(&mut self, opcode: OpCode, rip: usize, cache: &mut OpcodeCache, chunk: &SSAChunk, slots: &mut [Val]) -> Result<(), VmErr> { diff --git a/compiler/src/modules/vm/handlers/builtin_methods/bytes.rs b/compiler/src/modules/vm/handlers/builtin_methods/bytes.rs new file mode 100644 index 0000000..ee14f87 --- /dev/null +++ b/compiler/src/modules/vm/handlers/builtin_methods/bytes.rs @@ -0,0 +1,119 @@ +/* +Built-in methods for `bytes` receivers. Arity is checked by the dispatcher. +*/ + +use super::prelude::*; + +// `bytes.decode([encoding])` — invalid UTF-8 errors as ValueError. 
+pub fn decode(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let buf = recv_bytes(vm, recv)?; + if let Some(arg) = pos.first() { + let enc = val_to_str(vm, *arg)?; + if !matches!(enc.as_str(), "utf-8" | "utf8" | "ascii") { + return Err(cold_value("unsupported encoding (expected 'utf-8' or 'ascii')")); + } + } + let text = alloc::string::String::from_utf8(buf) + .map_err(|_| cold_value("invalid UTF-8 in bytes.decode()"))?; + let v = vm.heap.alloc(HeapObj::Str(text))?; + vm.push(v); Ok(()) +} + +// `bytes.hex()` — lowercase hex of every byte. No separator. +pub fn hex(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> { + let buf = recv_bytes(vm, recv)?; + let mut out = alloc::string::String::with_capacity(buf.len() * 2); + const HEX: &[u8; 16] = b"0123456789abcdef"; + for &b in &buf { + out.push(HEX[(b >> 4) as usize] as char); + out.push(HEX[(b & 0x0F) as usize] as char); + } + let v = vm.heap.alloc(HeapObj::Str(out))?; + vm.push(v); Ok(()) +} + +// bytes-only; strings go through `string::startswith`. 
+pub fn startswith(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let buf = recv_bytes(vm, recv)?; + let prefix = recv_bytes(vm, pos[0])?; + vm.push(Val::bool(buf.starts_with(&prefix))); + Ok(()) +} + +pub fn endswith(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let buf = recv_bytes(vm, recv)?; + let suffix = recv_bytes(vm, pos[0])?; + vm.push(Val::bool(buf.ends_with(&suffix))); + Ok(()) +} + +pub fn find(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let buf = recv_bytes(vm, recv)?; + let sub = recv_bytes(vm, pos[0])?; + let idx = buf.windows(sub.len()).position(|w| w == sub.as_slice()).map(|i| i as i64).unwrap_or(-1); + vm.push(Val::int(idx)); + Ok(()) +} + +pub fn index(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let buf = recv_bytes(vm, recv)?; + let sub = recv_bytes(vm, pos[0])?; + let idx = buf.windows(sub.len()).position(|w| w == sub.as_slice()).ok_or(cold_value("subsection not found"))?; + vm.push(Val::int(idx as i64)); + Ok(()) +} + +pub fn count(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let buf = recv_bytes(vm, recv)?; + let sub = recv_bytes(vm, pos[0])?; + if sub.is_empty() { + vm.push(Val::int(buf.len() as i64 + 1)); + return Ok(()); + } + let mut n = 0i64; + let mut i = 0usize; + while i + sub.len() <= buf.len() { + if buf[i..i + sub.len()] == sub[..] { n += 1; i += sub.len(); } + else { i += 1; } + } + vm.push(Val::int(n)); + Ok(()) +} + +pub fn replace(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let buf = recv_bytes(vm, recv)?; + let old = recv_bytes(vm, pos[0])?; + let new = recv_bytes(vm, pos[1])?; + if old.is_empty() { + let v = vm.heap.alloc(HeapObj::Bytes(buf))?; + vm.push(v); return Ok(()); + } + let mut out: Vec<u8> = Vec::with_capacity(buf.len()); + let mut i = 0usize; + while i < buf.len() { + if i + old.len() <= buf.len() && buf[i..i + old.len()] == old[..] 
{ + out.extend_from_slice(&new); i += old.len(); + } else { + out.push(buf[i]); i += 1; + } + } + let v = vm.heap.alloc(HeapObj::Bytes(out))?; + vm.push(v); Ok(()) +} + +pub fn split(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let buf = recv_bytes(vm, recv)?; + let sep = recv_bytes(vm, pos[0])?; + if sep.is_empty() { return Err(cold_value("empty separator")); } + let mut parts: Vec<Val> = Vec::new(); + let mut start = 0usize; + let mut i = 0usize; + while i + sep.len() <= buf.len() { + if buf[i..i + sep.len()] == sep[..] { + parts.push(vm.heap.alloc(HeapObj::Bytes(buf[start..i].to_vec()))?); + i += sep.len(); start = i; + } else { i += 1; } + } + parts.push(vm.heap.alloc(HeapObj::Bytes(buf[start..].to_vec()))?); + vm.alloc_and_push_list(parts) +} diff --git a/compiler/src/modules/vm/handlers/builtin_methods/dict.rs b/compiler/src/modules/vm/handlers/builtin_methods/dict.rs new file mode 100644 index 0000000..2dd265e --- /dev/null +++ b/compiler/src/modules/vm/handlers/builtin_methods/dict.rs @@ -0,0 +1,98 @@ +/* +Built-in methods for `dict` receivers. Arity is checked by the dispatcher; `mutating` is marked by the dispatcher when `MethodDesc::mutating` is true. 
+*/ + +use super::prelude::*; + +pub fn keys(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> { + let entries = dict_entries(vm, recv)?; + let keys: Vec<Val> = entries.into_iter().map(|(k, _)| k).collect(); + vm.alloc_and_push_list(keys) +} + +pub fn values(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> { + let entries = dict_entries(vm, recv)?; + let vals: Vec<Val> = entries.into_iter().map(|(_, v)| v).collect(); + vm.alloc_and_push_list(vals) +} + +pub fn items(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> { + let entries = dict_entries(vm, recv)?; + let mut items: Vec<Val> = Vec::with_capacity(entries.len()); + for (k, vv) in entries { + let t = vm.heap.alloc(HeapObj::Tuple(vec![k, vv]))?; + items.push(t); + } + vm.alloc_and_push_list(items) +} + +// `dict.copy()` — shallow copy; mutations don't affect the original. +pub fn copy(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> { + let entries = dict_entries(vm, recv)?; + let mut dm = DictMap::with_capacity(entries.len()); + for (k, v) in entries { dm.insert(k, v); } + vm.alloc_and_push_dict(dm) +} + +// `dict.popitem()` — pop the last (k, v); KeyError on empty dict. +pub fn popitem(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> { + let pair = dict_mut(vm, recv, "popitem: receiver is not a dict", |dict| { + let (k, v) = dict.entries.last().copied().ok_or(cold_value("popitem(): dictionary is empty"))?; + dict.remove(&k); + Ok((k, v)) + })?; + vm.alloc_and_push_tuple(vec![pair.0, pair.1]) +} + +pub fn get(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let default = if pos.len() == 2 { pos[1] } else { Val::none() }; + let result = match vm.heap.get(recv) { + HeapObj::Dict(rc) => rc.borrow().get(&pos[0]).copied().unwrap_or(default), + _ => return Err(cold_type("get: receiver is not a dict")), + }; + vm.push(result); Ok(()) +} + +pub fn update(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + // Accept a dict or an iterable of 2-element pairs. 
+ let pairs: Vec<(Val, Val)> = if let HeapObj::Dict(rc) = vm.heap.get(pos[0]) { + rc.borrow().entries.clone() + } else { + let items = vm.extract_iter(pos[0], true)?; + let mut out = Vec::with_capacity(items.len()); + for it in items { + let pair = match vm.heap.get(it) { + HeapObj::Tuple(v) if v.len() == 2 => (v[0], v[1]), + HeapObj::List(v) if v.borrow().len() == 2 => { let v = v.borrow(); (v[0], v[1]) } + _ => return Err(cold_value("dictionary update sequence element must have length 2")), + }; + out.push(pair); + } + out + }; + dict_mut(vm, recv, "update: receiver is not a dict", |dict| { + for (k, v) in pairs { dict.insert(k, v); } + Ok(()) + })?; + vm.push(Val::none()); Ok(()) +} + +pub fn pop(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let default = if pos.len() == 2 { Some(pos[1]) } else { None }; + let result = dict_mut(vm, recv, "pop: receiver is not a dict", |dict| { + match dict.remove(&pos[0]) { + Some(val) => Ok(val), + None => default.ok_or(cold_value("key not found")), + } + })?; + vm.push(result); Ok(()) +} + +pub fn setdefault(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let default = if pos.len() > 1 { pos[1] } else { Val::none() }; + let result = dict_mut(vm, recv, "setdefault: receiver is not a dict", |dict| { + if let Some(v) = dict.get(&pos[0]).copied() { Ok(v) } + else { dict.insert(pos[0], default); Ok(default) } + })?; + vm.push(result); Ok(()) +} diff --git a/compiler/src/modules/vm/handlers/builtin_methods/list.rs b/compiler/src/modules/vm/handlers/builtin_methods/list.rs new file mode 100644 index 0000000..8095fc3 --- /dev/null +++ b/compiler/src/modules/vm/handlers/builtin_methods/list.rs @@ -0,0 +1,104 @@ +/* +Built-in methods for `list` receivers. Arity is checked by the dispatcher; `mutating` is marked by the dispatcher when `MethodDesc::mutating` is true. 
+*/ + +use super::prelude::*; + +pub fn index(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let items = list_clone(vm, recv)?; + let idx = items.iter() + .position(|&v| eq_vals_with_heap(v, pos[0], &vm.heap)) + .map(|i| i as i64) + .ok_or(cold_value("value not found in list"))?; + vm.push(Val::int(idx)); + Ok(()) +} + +pub fn count(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let items = list_clone(vm, recv)?; + let n = items.iter().filter(|&&v| eq_vals_with_heap(v, pos[0], &vm.heap)).count() as i64; + vm.push(Val::int(n)); + Ok(()) +} + +pub fn copy(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> { + let items = list_clone(vm, recv)?; + vm.alloc_and_push_list(items) +} + +pub fn append(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + list_mut(vm, recv, "append: receiver is not a list", |list| { + list.push(pos[0]); Ok(()) + })?; + vm.push(Val::none()); Ok(()) +} + +pub fn clear(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> { + list_mut(vm, recv, "clear: receiver is not a list", |list| { + list.clear(); Ok(()) + })?; + vm.push(Val::none()); Ok(()) +} + +pub fn reverse(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> { + list_mut(vm, recv, "reverse: receiver is not a list", |list| { + list.reverse(); Ok(()) + })?; + vm.push(Val::none()); Ok(()) +} + +pub fn extend(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let items = vm.extract_iter(pos[0], true)?; + list_mut(vm, recv, "extend: receiver is not a list", |list| { + list.extend_from_slice(&items); Ok(()) + })?; + vm.push(Val::none()); Ok(()) +} + +pub fn insert(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + if !pos[0].is_int() { return Err(cold_type("list indices must be integers")); } + list_mut(vm, recv, "insert: receiver is not a list", |list| { + let i = pos[0].as_int(); + let ui = if i < 0 { + (list.len() as i64 + i).max(0) as usize + } else { + (i as usize).min(list.len()) + }; + list.insert(ui, 
pos[1]); + Ok(()) + })?; + vm.push(Val::none()); Ok(()) +} + +pub fn remove(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let items = list_clone(vm, recv)?; + let idx = items.iter() + .position(|&v| eq_vals_with_heap(v, pos[0], &vm.heap)) + .ok_or(cold_value("list.remove: value not found"))?; + list_mut(vm, recv, "remove: receiver is not a list", |list| { + list.remove(idx); Ok(()) + })?; + vm.push(Val::none()); Ok(()) +} + +pub fn pop(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let popped = list_mut(vm, recv, "pop: receiver is not a list", |list| { + if list.is_empty() { return Err(cold_value("pop from empty list")); } + if pos.is_empty() { return Ok(list.pop().unwrap()); } + if !pos[0].is_int() { return Err(cold_type("list indices must be integers")); } + let i = pos[0].as_int(); + let ui = if i < 0 { (list.len() as i64 + i) as usize } else { i as usize }; + if ui >= list.len() { return Err(cold_value("pop index out of range")); } + Ok(list.remove(ui)) + })?; + vm.push(popped); Ok(()) +} + +pub fn sort(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> { + let mut sorted = list_clone(vm, recv)?; + vm.sort_by_lt(&mut sorted)?; + list_mut(vm, recv, "sort: receiver is not a list", |list| { + *list = sorted; Ok(()) + })?; + vm.push(Val::none()); Ok(()) +} diff --git a/compiler/src/modules/vm/handlers/builtin_methods/mod.rs b/compiler/src/modules/vm/handlers/builtin_methods/mod.rs new file mode 100644 index 0000000..87e7f23 --- /dev/null +++ b/compiler/src/modules/vm/handlers/builtin_methods/mod.rs @@ -0,0 +1,164 @@ +/* +Builtin-method descriptor table + dispatcher. Each method body lives in the per-type file (string.rs / bytes.rs / list.rs / dict.rs / set.rs) as a plain `pub fn`. The descriptor carries name + function pointer + mutating flag + arity range, so the dispatcher checks arity uniformly and the method bodies stay focused on logic. 
+*/ + +mod prelude; +pub mod bytes; +pub mod dict; +pub mod list; +pub mod set; +pub mod string; + +use prelude::{VM, Val, VmErr, cold_type}; +use crate::s; +use alloc::string::String; + +pub type MethodFn = fn(&mut VM, Val, &[Val]) -> Result<(), VmErr>; + +pub struct MethodDesc { + pub name: &'static str, + pub func: MethodFn, + pub mutating: bool, + pub min_args: u8, + pub max_args: u8, // 255 = unbounded (unused today; reserved). +} + +#[derive(Clone, Copy, Debug, PartialEq, Eq)] +pub struct BuiltinMethodId(u8); + +impl BuiltinMethodId { + #[inline] pub fn name(self) -> &'static str { ALL_METHODS[self.0 as usize].name } +} + +// Contiguous-by-type so per-type lookup is a range scan over `ALL_METHODS`. +const STR_RANGE: core::ops::Range<usize> = 0..25; +const BYTES_RANGE: core::ops::Range<usize> = 25..34; +const LIST_RANGE: core::ops::Range<usize> = 34..45; +const DICT_RANGE: core::ops::Range<usize> = 45..54; +const SET_RANGE: core::ops::Range<usize> = 54..68; + +pub static ALL_METHODS: &[MethodDesc] = &[ + // str (0..25) + MethodDesc { name: "encode", func: string::encode, mutating: false, min_args: 0, max_args: 1 }, + MethodDesc { name: "upper", func: string::upper, mutating: false, min_args: 0, max_args: 0 }, + MethodDesc { name: "lower", func: string::lower, mutating: false, min_args: 0, max_args: 0 }, + MethodDesc { name: "strip", func: string::strip, mutating: false, min_args: 0, max_args: 1 }, + MethodDesc { name: "capitalize", func: string::capitalize, mutating: false, min_args: 0, max_args: 0 }, + MethodDesc { name: "title", func: string::title, mutating: false, min_args: 0, max_args: 0 }, + MethodDesc { name: "lstrip", func: string::lstrip, mutating: false, min_args: 0, max_args: 1 }, + MethodDesc { name: "rstrip", func: string::rstrip, mutating: false, min_args: 0, max_args: 1 }, + MethodDesc { name: "isdigit", func: string::isdigit, mutating: false, min_args: 0, max_args: 0 }, + MethodDesc { name: "isalpha", func: string::isalpha, mutating: false, min_args: 0, max_args: 0 }, + 
MethodDesc { name: "isalnum", func: string::isalnum, mutating: false, min_args: 0, max_args: 0 }, + MethodDesc { name: "startswith", func: string::startswith, mutating: false, min_args: 1, max_args: 1 }, + MethodDesc { name: "endswith", func: string::endswith, mutating: false, min_args: 1, max_args: 1 }, + MethodDesc { name: "find", func: string::find, mutating: false, min_args: 1, max_args: 1 }, + MethodDesc { name: "count", func: string::count, mutating: false, min_args: 1, max_args: 1 }, + MethodDesc { name: "split", func: string::split, mutating: false, min_args: 0, max_args: 1 }, + MethodDesc { name: "join", func: string::join, mutating: false, min_args: 1, max_args: 1 }, + MethodDesc { name: "replace", func: string::replace, mutating: false, min_args: 2, max_args: 2 }, + MethodDesc { name: "removeprefix", func: string::removeprefix, mutating: false, min_args: 1, max_args: 1 }, + MethodDesc { name: "removesuffix", func: string::removesuffix, mutating: false, min_args: 1, max_args: 1 }, + MethodDesc { name: "splitlines", func: string::splitlines, mutating: false, min_args: 0, max_args: 0 }, + MethodDesc { name: "partition", func: string::partition, mutating: false, min_args: 1, max_args: 1 }, + MethodDesc { name: "rpartition", func: string::rpartition, mutating: false, min_args: 1, max_args: 1 }, + MethodDesc { name: "center", func: string::center, mutating: false, min_args: 1, max_args: 2 }, + MethodDesc { name: "zfill", func: string::zfill, mutating: false, min_args: 1, max_args: 1 }, + + // bytes (25..34) + MethodDesc { name: "decode", func: bytes::decode, mutating: false, min_args: 0, max_args: 1 }, + MethodDesc { name: "hex", func: bytes::hex, mutating: false, min_args: 0, max_args: 0 }, + MethodDesc { name: "startswith", func: bytes::startswith, mutating: false, min_args: 1, max_args: 1 }, + MethodDesc { name: "endswith", func: bytes::endswith, mutating: false, min_args: 1, max_args: 1 }, + MethodDesc { name: "find", func: bytes::find, mutating: false, 
min_args: 1, max_args: 1 }, + MethodDesc { name: "index", func: bytes::index, mutating: false, min_args: 1, max_args: 1 }, + MethodDesc { name: "count", func: bytes::count, mutating: false, min_args: 1, max_args: 1 }, + MethodDesc { name: "replace", func: bytes::replace, mutating: false, min_args: 2, max_args: 2 }, + MethodDesc { name: "split", func: bytes::split, mutating: false, min_args: 1, max_args: 1 }, + + // list (34..45) + MethodDesc { name: "index", func: list::index, mutating: false, min_args: 1, max_args: 1 }, + MethodDesc { name: "count", func: list::count, mutating: false, min_args: 1, max_args: 1 }, + MethodDesc { name: "copy", func: list::copy, mutating: false, min_args: 0, max_args: 0 }, + MethodDesc { name: "append", func: list::append, mutating: true, min_args: 1, max_args: 1 }, + MethodDesc { name: "clear", func: list::clear, mutating: true, min_args: 0, max_args: 0 }, + MethodDesc { name: "reverse", func: list::reverse, mutating: true, min_args: 0, max_args: 0 }, + MethodDesc { name: "extend", func: list::extend, mutating: true, min_args: 1, max_args: 1 }, + MethodDesc { name: "insert", func: list::insert, mutating: true, min_args: 2, max_args: 2 }, + MethodDesc { name: "remove", func: list::remove, mutating: true, min_args: 1, max_args: 1 }, + MethodDesc { name: "pop", func: list::pop, mutating: true, min_args: 0, max_args: 1 }, + MethodDesc { name: "sort", func: list::sort, mutating: true, min_args: 0, max_args: 0 }, + + // dict (45..54) + MethodDesc { name: "keys", func: dict::keys, mutating: false, min_args: 0, max_args: 0 }, + MethodDesc { name: "values", func: dict::values, mutating: false, min_args: 0, max_args: 0 }, + MethodDesc { name: "items", func: dict::items, mutating: false, min_args: 0, max_args: 0 }, + MethodDesc { name: "copy", func: dict::copy, mutating: false, min_args: 0, max_args: 0 }, + MethodDesc { name: "popitem", func: dict::popitem, mutating: true, min_args: 0, max_args: 0 }, + MethodDesc { name: "get", func: dict::get, 
mutating: false, min_args: 1, max_args: 2 }, + MethodDesc { name: "update", func: dict::update, mutating: true, min_args: 1, max_args: 1 }, + MethodDesc { name: "pop", func: dict::pop, mutating: true, min_args: 1, max_args: 2 }, + MethodDesc { name: "setdefault", func: dict::setdefault, mutating: true, min_args: 1, max_args: 2 }, + + // set (54..68) + MethodDesc { name: "add", func: set::add, mutating: true, min_args: 1, max_args: 1 }, + MethodDesc { name: "remove", func: set::remove, mutating: true, min_args: 1, max_args: 1 }, + MethodDesc { name: "discard", func: set::discard, mutating: true, min_args: 1, max_args: 1 }, + MethodDesc { name: "pop", func: set::pop, mutating: true, min_args: 0, max_args: 0 }, + MethodDesc { name: "clear", func: set::clear, mutating: true, min_args: 0, max_args: 0 }, + MethodDesc { name: "update", func: set::update, mutating: true, min_args: 1, max_args: 1 }, + MethodDesc { name: "copy", func: set::copy, mutating: false, min_args: 0, max_args: 0 }, + MethodDesc { name: "union", func: set::union, mutating: false, min_args: 1, max_args: 1 }, + MethodDesc { name: "intersection", func: set::intersection, mutating: false, min_args: 1, max_args: 1 }, + MethodDesc { name: "difference", func: set::difference, mutating: false, min_args: 1, max_args: 1 }, + MethodDesc { name: "symmetric_difference", func: set::symmetric_difference, mutating: false, min_args: 1, max_args: 1 }, + MethodDesc { name: "issubset", func: set::issubset, mutating: false, min_args: 1, max_args: 1 }, + MethodDesc { name: "issuperset", func: set::issuperset, mutating: false, min_args: 1, max_args: 1 }, + MethodDesc { name: "isdisjoint", func: set::isdisjoint, mutating: false, min_args: 1, max_args: 1 }, +]; + +#[inline] +pub(crate) fn dispatch_method( + vm: &mut VM, id: BuiltinMethodId, recv: Val, pos: &[Val], kw: &[Val], +) -> Result<(), VmErr> { + if !kw.is_empty() { + return Err(cold_type("builtin method takes no keyword arguments")); + } + let m = &ALL_METHODS[id.0 as 
usize]; + let n = pos.len(); + if n < m.min_args as usize || (m.max_args != 255 && n > m.max_args as usize) { + return Err(arity_error(m.name, m.min_args, m.max_args, n)); + } + let result = (m.func)(vm, recv, pos); + if m.mutating && result.is_ok() { + vm.mark_impure(); + } + result +} + +pub fn lookup_method(ty: &str, attr: &str) -> Option<BuiltinMethodId> { + let range = match ty { + "str" => STR_RANGE, + "bytes" => BYTES_RANGE, + "list" => LIST_RANGE, + "dict" => DICT_RANGE, + "set" => SET_RANGE, + _ => return None, + }; + ALL_METHODS[range.clone()] + .iter() + .position(|m| m.name == attr) + .map(|i| BuiltinMethodId((range.start + i) as u8)) +} + +#[cold] +fn arity_error(name: &str, min: u8, max: u8, got: usize) -> VmErr { + let msg: String = if min == max { + s!(str name, "() takes ", int min as i64, " arg(s), got ", int got as i64) + } else if max == 255 { + s!(str name, "() takes at least ", int min as i64, ", got ", int got as i64) + } else { + s!(str name, "() takes ", int min as i64, "..", int max as i64, " args, got ", int got as i64) + }; + VmErr::TypeMsg(msg) +} diff --git a/compiler/src/modules/vm/handlers/builtin_methods/prelude.rs b/compiler/src/modules/vm/handlers/builtin_methods/prelude.rs new file mode 100644 index 0000000..db9d2d7 --- /dev/null +++ b/compiler/src/modules/vm/handlers/builtin_methods/prelude.rs @@ -0,0 +1,12 @@ +/* +Internal prelude for `builtin_methods`. Each per-type file does `use super::prelude::*;` and gets the full surface — VM, Val, HeapObj, type helpers, and the receiver-unwrap primitives. 
+*/ + +pub(super) use super::super::{VM, Val, VmErr, HeapObj, DictMap}; +pub(super) use super::super::methods_helpers::{ + recv_str, recv_bytes, val_to_str, + list_clone, list_mut, dict_entries, dict_mut, set_clone, set_mut, + iter_to_vec, capitalize_first, title_case, +}; +pub(super) use crate::modules::vm::types::{cold_type, cold_value, eq_vals_with_heap}; +pub(super) use alloc::{string::{String, ToString}, vec, vec::Vec}; diff --git a/compiler/src/modules/vm/handlers/builtin_methods/set.rs b/compiler/src/modules/vm/handlers/builtin_methods/set.rs new file mode 100644 index 0000000..5afb623 --- /dev/null +++ b/compiler/src/modules/vm/handlers/builtin_methods/set.rs @@ -0,0 +1,109 @@ +/* +Built-in methods for `set` receivers. Arity is checked by the dispatcher; `mutating` is marked by the dispatcher when `MethodDesc::mutating` is true. +*/ + +use super::prelude::*; + +pub fn add(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + set_mut(vm, recv, "add: receiver is not a set", |set| { + set.insert(pos[0]); Ok(()) + })?; + vm.push(Val::none()); Ok(()) +} + +pub fn remove(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + set_mut(vm, recv, "remove: receiver is not a set", |set| { + // KeyError, not ValueError. + if !set.remove(&pos[0]) { return Err(VmErr::Raised("KeyError".into())); } + Ok(()) + })?; + vm.push(Val::none()); Ok(()) +} + +pub fn discard(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + set_mut(vm, recv, "discard: receiver is not a set", |set| { + set.remove(&pos[0]); Ok(()) + })?; + vm.push(Val::none()); Ok(()) +} + +pub fn pop(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> { + let popped = set_mut(vm, recv, "pop: receiver is not a set", |set| { + // HashSet has no `pop()` — grab via `iter()` and remove. Empty set raises. 
+ let pick = set.iter().next().copied().ok_or(cold_value("pop from an empty set"))?; + set.remove(&pick); + Ok(pick) + })?; + vm.push(popped); Ok(()) +} + +pub fn clear(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> { + set_mut(vm, recv, "clear: receiver is not a set", |set| { + set.clear(); Ok(()) + })?; + vm.push(Val::none()); Ok(()) +} + +pub fn update(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let items = iter_to_vec(vm, pos[0])?; + set_mut(vm, recv, "update: receiver is not a set", |set| { + for v in items { set.insert(v); } + Ok(()) + })?; + vm.push(Val::none()); Ok(()) +} + +pub fn copy(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> { + let items = set_clone(vm, recv)?; + vm.alloc_and_push_set(items) +} + +pub fn union(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let mut out = set_clone(vm, recv)?; + out.extend(iter_to_vec(vm, pos[0])?); + vm.alloc_and_push_set(out) +} + +pub fn intersection(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let lhs = set_clone(vm, recv)?; + let rhs_items = iter_to_vec(vm, pos[0])?; + let rhs: crate::util::fx::FxHashSet<Val> = rhs_items.into_iter().collect(); + let out: Vec<Val> = lhs.into_iter().filter(|v| rhs.contains(v)).collect(); + vm.alloc_and_push_set(out) +} + +pub fn difference(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let lhs = set_clone(vm, recv)?; + let rhs_items = iter_to_vec(vm, pos[0])?; + let rhs: crate::util::fx::FxHashSet<Val> = rhs_items.into_iter().collect(); + let out: Vec<Val> = lhs.into_iter().filter(|v| !rhs.contains(v)).collect(); + vm.alloc_and_push_set(out) +} + +pub fn symmetric_difference(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let lhs: crate::util::fx::FxHashSet<Val> = set_clone(vm, recv)?.into_iter().collect(); + let rhs: crate::util::fx::FxHashSet<Val> = iter_to_vec(vm, pos[0])?.into_iter().collect(); + let out: Vec<Val> = lhs.symmetric_difference(&rhs).copied().collect(); + vm.alloc_and_push_set(out) +} + +pub fn 
issubset(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let lhs = set_clone(vm, recv)?; + let rhs: crate::util::fx::FxHashSet = iter_to_vec(vm, pos[0])?.into_iter().collect(); + vm.push(Val::bool(lhs.iter().all(|v| rhs.contains(v)))); + Ok(()) +} + +pub fn issuperset(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let lhs: crate::util::fx::FxHashSet = set_clone(vm, recv)?.into_iter().collect(); + let rhs = iter_to_vec(vm, pos[0])?; + vm.push(Val::bool(rhs.iter().all(|v| lhs.contains(v)))); + Ok(()) +} + +pub fn isdisjoint(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let lhs: crate::util::fx::FxHashSet = set_clone(vm, recv)?.into_iter().collect(); + let rhs = iter_to_vec(vm, pos[0])?; + vm.push(Val::bool(!rhs.iter().any(|v| lhs.contains(v)))); + Ok(()) +} diff --git a/compiler/src/modules/vm/handlers/builtin_methods/string.rs b/compiler/src/modules/vm/handlers/builtin_methods/string.rs new file mode 100644 index 0000000..2ef9dcd --- /dev/null +++ b/compiler/src/modules/vm/handlers/builtin_methods/string.rs @@ -0,0 +1,268 @@ +/* +Built-in methods for `str` receivers. Arity is checked by the dispatcher. +*/ + +use super::prelude::*; + +// `str.encode([encoding])` — UTF-8/ASCII only; other names error to block silent mismatches. +pub fn encode(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + if let Some(arg) = pos.first() { + let enc = val_to_str(vm, *arg)?; + match enc.as_str() { + "utf-8" | "utf8" => {} + "ascii" if !s.is_ascii() => { + return Err(cold_value("'ascii' codec can't encode non-ASCII characters")); + } + "ascii" => {} + _ => return Err(cold_value("unsupported encoding (expected 'utf-8' or 'ascii')")), + } + } + let v = vm.heap.alloc(HeapObj::Bytes(s.into_bytes()))?; + vm.push(v); Ok(()) +} + +// str: zero-arg transforms. 
+pub fn upper(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + let v = vm.heap.alloc(HeapObj::Str(s.to_uppercase()))?; + vm.push(v); Ok(()) +} + +pub fn lower(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + let v = vm.heap.alloc(HeapObj::Str(s.to_lowercase()))?; + vm.push(v); Ok(()) +} + +pub fn strip(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + let out = if pos.is_empty() { + s.trim().to_string() + } else { + let p = val_to_str(vm, pos[0])?; + s.trim_matches(|c| p.contains(c)).to_string() + }; + let v = vm.heap.alloc(HeapObj::Str(out))?; + vm.push(v); Ok(()) +} + +pub fn capitalize(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + let v = vm.heap.alloc(HeapObj::Str(capitalize_first(&s)))?; + vm.push(v); Ok(()) +} + +pub fn title(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + let v = vm.heap.alloc(HeapObj::Str(title_case(&s)))?; + vm.push(v); Ok(()) +} + +pub fn lstrip(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + let out = if pos.is_empty() { + s.trim_start().to_string() + } else { + let p = val_to_str(vm, pos[0])?; + s.trim_start_matches(|c| p.contains(c)).to_string() + }; + let v = vm.heap.alloc(HeapObj::Str(out))?; + vm.push(v); Ok(()) +} + +pub fn rstrip(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + let out = if pos.is_empty() { + s.trim_end().to_string() + } else { + let p = val_to_str(vm, pos[0])?; + s.trim_end_matches(|c| p.contains(c)).to_string() + }; + let v = vm.heap.alloc(HeapObj::Str(out))?; + vm.push(v); Ok(()) +} + +pub fn isdigit(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + vm.push(Val::bool(!s.is_empty() && s.chars().all(|c| c.is_ascii_digit()))); + Ok(()) +} + +pub fn isalpha(vm: 
&mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + vm.push(Val::bool(!s.is_empty() && s.chars().all(|c| c.is_alphabetic()))); + Ok(()) +} + +pub fn isalnum(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + vm.push(Val::bool(!s.is_empty() && s.chars().all(|c| c.is_alphanumeric()))); + Ok(()) +} + +pub fn startswith(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + let p = val_to_str(vm, pos[0])?; + vm.push(Val::bool(s.starts_with(p.as_str()))); + Ok(()) +} + +pub fn endswith(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + let p = val_to_str(vm, pos[0])?; + vm.push(Val::bool(s.ends_with(p.as_str()))); + Ok(()) +} + +pub fn find(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + let sub = val_to_str(vm, pos[0])?; + let idx = s.find(sub.as_str()) + .map(|i| s[..i].chars().count() as i64) + .unwrap_or(-1); + vm.push(Val::int(idx)); + Ok(()) +} + +pub fn count(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + let sub = val_to_str(vm, pos[0])?; + vm.push(Val::int(s.matches(sub.as_str()).count() as i64)); + Ok(()) +} + +pub fn split(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + let parts: Vec<Val> = if pos.is_empty() { + s.split_whitespace() + .map(|p| vm.heap.alloc(HeapObj::Str(p.to_string()))) + .collect::<Result<_, _>>()? + } else { + let sep = val_to_str(vm, pos[0])?; + s.split(sep.as_str()) + .map(|p| vm.heap.alloc(HeapObj::Str(p.to_string()))) + .collect::<Result<_, _>>()? 
+ }; + vm.alloc_and_push_list(parts) +} + +pub fn join(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let sep = recv_str(vm, recv)?; + let items = match vm.heap.get(pos[0]) { + HeapObj::List(rc) => rc.borrow().clone(), + HeapObj::Tuple(v) => v.clone(), + _ => return Err(cold_type("join() argument must be iterable")), + }; + let mut parts: Vec<String> = Vec::with_capacity(items.len()); + for v in items { parts.push(val_to_str(vm, v)?); } + let v = vm.heap.alloc(HeapObj::Str(parts.join(sep.as_str())))?; + vm.push(v); Ok(()) +} + +pub fn replace(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + let old = val_to_str(vm, pos[0])?; + let new = val_to_str(vm, pos[1])?; + let v = vm.heap.alloc(HeapObj::Str(s.replace(old.as_str(), new.as_str())))?; + vm.push(v); Ok(()) +} + +// `str.removeprefix` / `removesuffix` — strip if present, else return unchanged. +pub fn removeprefix(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + let p = val_to_str(vm, pos[0])?; + let out = s.strip_prefix(p.as_str()).map(|t| t.to_string()).unwrap_or(s); + let v = vm.heap.alloc(HeapObj::Str(out))?; + vm.push(v); Ok(()) +} + +pub fn removesuffix(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + let suf = val_to_str(vm, pos[0])?; + let out = s.strip_suffix(suf.as_str()).map(|t| t.to_string()).unwrap_or(s); + let v = vm.heap.alloc(HeapObj::Str(out))?; + vm.push(v); Ok(()) +} + +// `str.splitlines()` — split on \n / \r / \r\n, dropping the separator (keepends=False). 
+pub fn splitlines(vm: &mut VM, recv: Val, _pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + let mut parts: Vec<Val> = Vec::new(); + for line in s.split_inclusive(['\n', '\r']) { + let trimmed = line.trim_end_matches(['\n', '\r']).to_string(); + parts.push(vm.heap.alloc(HeapObj::Str(trimmed))?); + } + // Drop the trailing empty segment that split_inclusive leaves when the input ends in a separator. + if let Some(last) = parts.last() + && let HeapObj::Str(t) = vm.heap.get(*last) + && t.is_empty() && s.ends_with(['\n', '\r']) { + parts.pop(); + } + vm.alloc_and_push_list(parts) +} + +// `str.partition` / `rpartition` — (head, sep, tail); on miss returns (s,"","") / ("","",s). +pub fn partition(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + let sep = val_to_str(vm, pos[0])?; + if sep.is_empty() { return Err(cold_value("empty separator")); } + let (a, b, c): (String, String, String) = match s.find(sep.as_str()) { + Some(i) => (s[..i].to_string(), sep.clone(), s[i + sep.len()..].to_string()), + None => (s, String::new(), String::new()), + }; + let av = vm.heap.alloc(HeapObj::Str(a))?; + let bv = vm.heap.alloc(HeapObj::Str(b))?; + let cv = vm.heap.alloc(HeapObj::Str(c))?; + vm.alloc_and_push_tuple(vec![av, bv, cv]) +} + +pub fn rpartition(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + let sep = val_to_str(vm, pos[0])?; + if sep.is_empty() { return Err(cold_value("empty separator")); } + let (a, b, c): (String, String, String) = match s.rfind(sep.as_str()) { + Some(i) => (s[..i].to_string(), sep.clone(), s[i + sep.len()..].to_string()), + None => (String::new(), String::new(), s), + }; + let av = vm.heap.alloc(HeapObj::Str(a))?; + let bv = vm.heap.alloc(HeapObj::Str(b))?; + let cv = vm.heap.alloc(HeapObj::Str(c))?; + vm.alloc_and_push_tuple(vec![av, bv, cv]) +} + +// str: padding. 
+pub fn center(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + let s = recv_str(vm, recv)?; + if !pos[0].is_int() { return Err(cold_type("center() width must be an integer")); } + let width = pos[0].as_int() as usize; + let fill = if pos.len() > 1 { + val_to_str(vm, pos[1])?.chars().next().unwrap_or(' ') + } else { ' ' }; + // Padding measured in code points, not UTF-8 bytes (Unicode parity). + let pad = width.saturating_sub(s.chars().count()); + let left = pad / 2; + let right = pad - left; + let out = fill.to_string().repeat(left) + &s + &fill.to_string().repeat(right); + let v = vm.heap.alloc(HeapObj::Str(out))?; + vm.push(v); Ok(()) +} + +pub fn zfill(vm: &mut VM, recv: Val, pos: &[Val]) -> Result<(), VmErr> { + if !pos[0].is_int() { return Err(cold_type("zfill() requires an integer argument")); } + let s = recv_str(vm, recv)?; + let width = pos[0].as_int() as usize; + let nchars = s.chars().count(); + let out = if nchars >= width { + s + } else { + let pad = "0".repeat(width - nchars); + if s.starts_with('+') || s.starts_with('-') { + s[..1].to_string() + &pad + &s[1..] + } else { + pad + &s + } + }; + let v = vm.heap.alloc(HeapObj::Str(out))?; + vm.push(v); Ok(()) +} diff --git a/compiler/src/modules/vm/handlers/methods.rs b/compiler/src/modules/vm/handlers/methods.rs index a0adf81..ecb80e8 100644 --- a/compiler/src/modules/vm/handlers/methods.rs +++ b/compiler/src/modules/vm/handlers/methods.rs @@ -1,12 +1,14 @@ /* -Built-in methods (str/list/dict). The `define_methods!` macro generates the enum, lookup, and dispatcher from one table — adding a method is one row. +Attribute resolution for `LoadAttr` / `CallMethod`. Built-in method bodies live in `builtin_methods/`; this file owns `AttrLookup`, the resolver, and the `__getattr__` fallback. 
*/ use super::*; -use super::methods_helpers::*; use crate::alloc::string::ToString; use crate::s; +pub use super::builtin_methods::BuiltinMethodId; +pub(crate) use super::builtin_methods::{dispatch_method, lookup_method}; + // `resolve_attr` result — every shape LoadAttr / CallMethod dispatches on. pub(crate) enum AttrLookup { ModuleAttr(Val), @@ -185,771 +187,3 @@ impl<'a> VM<'a> { } } } - -// Row: (Variant, "name", category, body). `mutating` auto-emits mark_impure; variant prefix picks the receiver. -// V1 refactor: per-method free function + static fn-pointer table. Lets LLVM/wasm-opt dedup prologues and stops fusing 71 bodies into one symbol. -type MethodFn = fn(&mut VM, Val, &[Val]) -> Result<(), VmErr>; - -macro_rules! define_methods { - ( $( ($variant:ident, $name:literal, $cat:ident, |$vm:ident, $recv:ident, $pos:ident| $body:block) ),* $(,)? ) => { - - $( - #[inline(never)] - #[allow(non_snake_case)] - fn $variant($vm: &mut VM, $recv: Val, $pos: &[Val]) -> Result<(), VmErr> { - let result: Result<(), VmErr> = (|| $body)(); - define_methods!(@maybe_impure $cat, $vm, result) - } - )* - - #[derive(Clone, Copy, Debug, PartialEq, Eq)] - #[repr(u8)] - pub enum BuiltinMethodId { - $( $variant ),* - } - - impl BuiltinMethodId { - #[inline] - pub fn name(self) -> &'static str { - match self { $( Self::$variant => $name ),* } - } - } - - static METHOD_TABLE: &[MethodFn] = &[ $( $variant ),* ]; - - pub(crate) fn dispatch_method(vm: &mut VM, id: BuiltinMethodId, recv: Val, pos: &[Val], kw: &[Val]) -> Result<(), VmErr> { - if !kw.is_empty() { - return Err(cold_type("builtin method takes no keyword arguments")); - } - METHOD_TABLE[id as usize](vm, recv, pos) - } - - // Off the hot path — CallMethod fusion bypasses LoadAttr+Call entirely. 
- pub fn lookup_method(ty: &str, attr: &str) -> Option<BuiltinMethodId> { - let prefix = match ty { - "str" => "Str", - "bytes" => "Bytes", - "list" => "List", - "dict" => "Dict", - "set" => "Set", - _ => return None, - }; - $( - if attr == $name && stringify!($variant).starts_with(prefix) { - return Some(BuiltinMethodId::$variant); - } - )* - None - } - }; - - (@maybe_impure mutating, $vm:ident, $r:ident) => {{ - if $r.is_ok() { $vm.mark_impure(); } - $r - }}; - (@maybe_impure pure, $vm:ident, $r:ident) => { $r }; -} - -define_methods! { - // `str.encode([encoding])` — UTF-8/ASCII only; other names error to block silent mismatches. - (StrEncode, "encode", pure, |vm, recv, pos| { - check_arity(pos, 0, 1, "encode takes 0 or 1 arguments")?; - let s = recv_str(vm, recv)?; - if let Some(arg) = pos.first() { - let enc = val_to_str(vm, *arg)?; - match enc.as_str() { - "utf-8" | "utf8" => {} - "ascii" if !s.is_ascii() => { - return Err(cold_value("'ascii' codec can't encode non-ASCII characters")); - } - "ascii" => {} - _ => return Err(cold_value("unsupported encoding (expected 'utf-8' or 'ascii')")), - } - } - let v = vm.heap.alloc(HeapObj::Bytes(s.into_bytes()))?; - vm.push(v); Ok(()) - }), - - // `bytes.decode([encoding])` — invalid UTF-8 errors as ValueError. - (BytesDecode, "decode", pure, |vm, recv, pos| { - check_arity(pos, 0, 1, "decode takes 0 or 1 arguments")?; - let buf = recv_bytes(vm, recv)?; - if let Some(arg) = pos.first() { - let enc = val_to_str(vm, *arg)?; - if !matches!(enc.as_str(), "utf-8" | "utf8" | "ascii") { - return Err(cold_value("unsupported encoding (expected 'utf-8' or 'ascii')")); - } - } - let text = alloc::string::String::from_utf8(buf) - .map_err(|_| cold_value("invalid UTF-8 in bytes.decode()"))?; - let v = vm.heap.alloc(HeapObj::Str(text))?; - vm.push(v); Ok(()) - }), - - // `bytes.hex()` — lowercase hex of every byte. No separator. 
- (BytesHex, "hex", pure, |vm, recv, pos| { - check_arity(pos, 0, 0, "hex takes no arguments")?; - let buf = recv_bytes(vm, recv)?; - let mut out = alloc::string::String::with_capacity(buf.len() * 2); - const HEX: &[u8; 16] = b"0123456789abcdef"; - for &b in &buf { - out.push(HEX[(b >> 4) as usize] as char); - out.push(HEX[(b & 0x0F) as usize] as char); - } - let v = vm.heap.alloc(HeapObj::Str(out))?; - vm.push(v); Ok(()) - }), - - // `bytes.startswith` / `bytes.endswith` — bytes-only; strings go through `str.startswith`. - (BytesStartswith, "startswith", pure, |vm, recv, pos| { - check_arity(pos, 1, 1, "startswith takes 1 argument")?; - let buf = recv_bytes(vm, recv)?; - let prefix = recv_bytes(vm, pos[0])?; - vm.push(Val::bool(buf.starts_with(&prefix))); - Ok(()) - }), - (BytesEndswith, "endswith", pure, |vm, recv, pos| { - check_arity(pos, 1, 1, "endswith takes 1 argument")?; - let buf = recv_bytes(vm, recv)?; - let suffix = recv_bytes(vm, pos[0])?; - vm.push(Val::bool(buf.ends_with(&suffix))); - Ok(()) - }), - - // str: zero-arg transforms. 
- (StrUpper, "upper", pure, |vm, recv, pos| { - check_arity(pos, 0, 0, "upper takes no arguments")?; - let s = recv_str(vm, recv)?; - let v = vm.heap.alloc(HeapObj::Str(s.to_uppercase()))?; - vm.push(v); Ok(()) - }), - (StrLower, "lower", pure, |vm, recv, pos| { - check_arity(pos, 0, 0, "lower takes no arguments")?; - let s = recv_str(vm, recv)?; - let v = vm.heap.alloc(HeapObj::Str(s.to_lowercase()))?; - vm.push(v); Ok(()) - }), - (StrStrip, "strip", pure, |vm, recv, pos| { - check_arity(pos, 0, 1, "strip takes 0 or 1 arguments")?; - let s = recv_str(vm, recv)?; - let out = if pos.is_empty() { - s.trim().to_string() - } else { - let p = val_to_str(vm, pos[0])?; - s.trim_matches(|c| p.contains(c)).to_string() - }; - let v = vm.heap.alloc(HeapObj::Str(out))?; - vm.push(v); Ok(()) - }), - (StrCapitalize, "capitalize", pure, |vm, recv, pos| { - check_arity(pos, 0, 0, "capitalize takes no arguments")?; - let s = recv_str(vm, recv)?; - let v = vm.heap.alloc(HeapObj::Str(capitalize_first(&s)))?; - vm.push(v); Ok(()) - }), - (StrTitle, "title", pure, |vm, recv, pos| { - check_arity(pos, 0, 0, "title takes no arguments")?; - let s = recv_str(vm, recv)?; - let v = vm.heap.alloc(HeapObj::Str(title_case(&s)))?; - vm.push(v); Ok(()) - }), - - // str: optional separator. 
- (StrLstrip, "lstrip", pure, |vm, recv, pos| { - check_arity(pos, 0, 1, "lstrip takes 0 or 1 arguments")?; - let s = recv_str(vm, recv)?; - let out = if pos.is_empty() { - s.trim_start().to_string() - } else { - let p = val_to_str(vm, pos[0])?; - s.trim_start_matches(|c| p.contains(c)).to_string() - }; - let v = vm.heap.alloc(HeapObj::Str(out))?; - vm.push(v); Ok(()) - }), - (StrRstrip, "rstrip", pure, |vm, recv, pos| { - check_arity(pos, 0, 1, "rstrip takes 0 or 1 arguments")?; - let s = recv_str(vm, recv)?; - let out = if pos.is_empty() { - s.trim_end().to_string() - } else { - let p = val_to_str(vm, pos[0])?; - s.trim_end_matches(|c| p.contains(c)).to_string() - }; - let v = vm.heap.alloc(HeapObj::Str(out))?; - vm.push(v); Ok(()) - }), - - // str: predicates. - (StrIsDigit, "isdigit", pure, |vm, recv, pos| { - check_arity(pos, 0, 0, "isdigit takes no arguments")?; - let s = recv_str(vm, recv)?; - vm.push(Val::bool(!s.is_empty() && s.chars().all(|c| c.is_ascii_digit()))); - Ok(()) - }), - (StrIsAlpha, "isalpha", pure, |vm, recv, pos| { - check_arity(pos, 0, 0, "isalpha takes no arguments")?; - let s = recv_str(vm, recv)?; - vm.push(Val::bool(!s.is_empty() && s.chars().all(|c| c.is_alphabetic()))); - Ok(()) - }), - (StrIsAlnum, "isalnum", pure, |vm, recv, pos| { - check_arity(pos, 0, 0, "isalnum takes no arguments")?; - let s = recv_str(vm, recv)?; - vm.push(Val::bool(!s.is_empty() && s.chars().all(|c| c.is_alphanumeric()))); - Ok(()) - }), - - // str: queries with one string arg. 
- (StrStartswith, "startswith", pure, |vm, recv, pos| { - check_arity(pos, 1, 1, "startswith takes 1 argument")?; - let s = recv_str(vm, recv)?; - let p = val_to_str(vm, pos[0])?; - vm.push(Val::bool(s.starts_with(p.as_str()))); - Ok(()) - }), - (StrEndswith, "endswith", pure, |vm, recv, pos| { - check_arity(pos, 1, 1, "endswith takes 1 argument")?; - let s = recv_str(vm, recv)?; - let p = val_to_str(vm, pos[0])?; - vm.push(Val::bool(s.ends_with(p.as_str()))); - Ok(()) - }), - (StrFind, "find", pure, |vm, recv, pos| { - check_arity(pos, 1, 1, "find takes 1 argument")?; - let s = recv_str(vm, recv)?; - let sub = val_to_str(vm, pos[0])?; - let idx = s.find(sub.as_str()) - .map(|i| s[..i].chars().count() as i64) - .unwrap_or(-1); - vm.push(Val::int(idx)); - Ok(()) - }), - (StrCount, "count", pure, |vm, recv, pos| { - check_arity(pos, 1, 1, "count takes 1 argument")?; - let s = recv_str(vm, recv)?; - let sub = val_to_str(vm, pos[0])?; - vm.push(Val::int(s.matches(sub.as_str()).count() as i64)); - Ok(()) - }), - - // str: split / join / replace. - (StrSplit, "split", pure, |vm, recv, pos| { - check_arity(pos, 0, 1, "split takes 0 or 1 arguments")?; - let s = recv_str(vm, recv)?; - let parts: Vec<Val> = if pos.is_empty() { - s.split_whitespace() - .map(|p| vm.heap.alloc(HeapObj::Str(p.to_string()))) - .collect::<Result<_, _>>()? - } else { - let sep = val_to_str(vm, pos[0])?; - s.split(sep.as_str()) - .map(|p| vm.heap.alloc(HeapObj::Str(p.to_string()))) - .collect::<Result<_, _>>()? 
- }; - vm.alloc_and_push_list(parts) - }), - (StrJoin, "join", pure, |vm, recv, pos| { - check_arity(pos, 1, 1, "join takes 1 argument")?; - let sep = recv_str(vm, recv)?; - let items = match vm.heap.get(pos[0]) { - HeapObj::List(rc) => rc.borrow().clone(), - HeapObj::Tuple(v) => v.clone(), - _ => return Err(cold_type("join() argument must be iterable")), - }; - let mut parts: Vec<String> = Vec::with_capacity(items.len()); - for v in items { parts.push(val_to_str(vm, v)?); } - let v = vm.heap.alloc(HeapObj::Str(parts.join(sep.as_str())))?; - vm.push(v); Ok(()) - }), - (StrReplace, "replace", pure, |vm, recv, pos| { - check_arity(pos, 2, 2, "replace takes 2 arguments")?; - let s = recv_str(vm, recv)?; - let old = val_to_str(vm, pos[0])?; - let new = val_to_str(vm, pos[1])?; - let v = vm.heap.alloc(HeapObj::Str(s.replace(old.as_str(), new.as_str())))?; - vm.push(v); Ok(()) - }), - - // `str.removeprefix` / `removesuffix` — strip if present, else return unchanged. - (StrRemovePrefix, "removeprefix", pure, |vm, recv, pos| { - check_arity(pos, 1, 1, "removeprefix takes 1 argument")?; - let s = recv_str(vm, recv)?; - let p = val_to_str(vm, pos[0])?; - let out = s.strip_prefix(p.as_str()).map(|t| t.to_string()).unwrap_or(s); - let v = vm.heap.alloc(HeapObj::Str(out))?; - vm.push(v); Ok(()) - }), - (StrRemoveSuffix, "removesuffix", pure, |vm, recv, pos| { - check_arity(pos, 1, 1, "removesuffix takes 1 argument")?; - let s = recv_str(vm, recv)?; - let suf = val_to_str(vm, pos[0])?; - let out = s.strip_suffix(suf.as_str()).map(|t| t.to_string()).unwrap_or(s); - let v = vm.heap.alloc(HeapObj::Str(out))?; - vm.push(v); Ok(()) - }), - - // `str.splitlines()` — split on \n / \r / \r\n, dropping the separator (keepends=False). 
- (StrSplitlines, "splitlines", pure, |vm, recv, pos| { - check_arity(pos, 0, 0, "splitlines takes no arguments")?; - let s = recv_str(vm, recv)?; - let mut parts: Vec<Val> = Vec::new(); - for line in s.split_inclusive(['\n', '\r']) { - let trimmed = line.trim_end_matches(['\n', '\r']).to_string(); - parts.push(vm.heap.alloc(HeapObj::Str(trimmed))?); - } - // Drop the trailing empty segment that split_inclusive leaves when the input ends in a separator. - if let Some(last) = parts.last() - && let HeapObj::Str(t) = vm.heap.get(*last) - && t.is_empty() && s.ends_with(['\n', '\r']) { - parts.pop(); - } - vm.alloc_and_push_list(parts) - }), - - // `str.partition` / `rpartition` — (head, sep, tail); on miss returns (s,"","") / ("","",s). - (StrPartition, "partition", pure, |vm, recv, pos| { - check_arity(pos, 1, 1, "partition takes 1 argument")?; - let s = recv_str(vm, recv)?; - let sep = val_to_str(vm, pos[0])?; - if sep.is_empty() { return Err(cold_value("empty separator")); } - let (a, b, c): (String, String, String) = match s.find(sep.as_str()) { - Some(i) => (s[..i].to_string(), sep.clone(), s[i + sep.len()..].to_string()), - None => (s, String::new(), String::new()), - }; - let av = vm.heap.alloc(HeapObj::Str(a))?; - let bv = vm.heap.alloc(HeapObj::Str(b))?; - let cv = vm.heap.alloc(HeapObj::Str(c))?; - vm.alloc_and_push_tuple(vec![av, bv, cv]) - }), - (StrRPartition, "rpartition", pure, |vm, recv, pos| { - check_arity(pos, 1, 1, "rpartition takes 1 argument")?; - let s = recv_str(vm, recv)?; - let sep = val_to_str(vm, pos[0])?; - if sep.is_empty() { return Err(cold_value("empty separator")); } - let (a, b, c): (String, String, String) = match s.rfind(sep.as_str()) { - Some(i) => (s[..i].to_string(), sep.clone(), s[i + sep.len()..].to_string()), - None => (String::new(), String::new(), s), - }; - let av = vm.heap.alloc(HeapObj::Str(a))?; - let bv = vm.heap.alloc(HeapObj::Str(b))?; - let cv = vm.heap.alloc(HeapObj::Str(c))?; - vm.alloc_and_push_tuple(vec![av, bv, cv]) - 
}), - - // `bytes.find` / index / count / replace — byte-oriented analogs of str methods. - (BytesFind, "find", pure, |vm, recv, pos| { - check_arity(pos, 1, 1, "find takes 1 argument")?; - let buf = recv_bytes(vm, recv)?; - let sub = recv_bytes(vm, pos[0])?; - let idx = buf.windows(sub.len()).position(|w| w == sub.as_slice()).map(|i| i as i64).unwrap_or(-1); - vm.push(Val::int(idx)); - Ok(()) - }), - (BytesIndex, "index", pure, |vm, recv, pos| { - check_arity(pos, 1, 1, "index takes 1 argument")?; - let buf = recv_bytes(vm, recv)?; - let sub = recv_bytes(vm, pos[0])?; - let idx = buf.windows(sub.len()).position(|w| w == sub.as_slice()).ok_or(cold_value("subsection not found"))?; - vm.push(Val::int(idx as i64)); - Ok(()) - }), - (BytesCount, "count", pure, |vm, recv, pos| { - check_arity(pos, 1, 1, "count takes 1 argument")?; - let buf = recv_bytes(vm, recv)?; - let sub = recv_bytes(vm, pos[0])?; - if sub.is_empty() { - vm.push(Val::int(buf.len() as i64 + 1)); - return Ok(()); - } - let mut n = 0i64; - let mut i = 0usize; - while i + sub.len() <= buf.len() { - if buf[i..i + sub.len()] == sub[..] { n += 1; i += sub.len(); } - else { i += 1; } - } - vm.push(Val::int(n)); - Ok(()) - }), - (BytesReplace, "replace", pure, |vm, recv, pos| { - check_arity(pos, 2, 2, "replace takes 2 arguments")?; - let buf = recv_bytes(vm, recv)?; - let old = recv_bytes(vm, pos[0])?; - let new = recv_bytes(vm, pos[1])?; - if old.is_empty() { - let v = vm.heap.alloc(HeapObj::Bytes(buf))?; - vm.push(v); return Ok(()); - } - let mut out: Vec<u8> = Vec::with_capacity(buf.len()); - let mut i = 0usize; - while i < buf.len() { - if i + old.len() <= buf.len() && buf[i..i + old.len()] == old[..] 
{ - out.extend_from_slice(&new); i += old.len(); - } else { - out.push(buf[i]); i += 1; - } - } - let v = vm.heap.alloc(HeapObj::Bytes(out))?; - vm.push(v); Ok(()) - }), - (BytesSplit, "split", pure, |vm, recv, pos| { - check_arity(pos, 1, 1, "split takes 1 argument")?; - let buf = recv_bytes(vm, recv)?; - let sep = recv_bytes(vm, pos[0])?; - if sep.is_empty() { return Err(cold_value("empty separator")); } - let mut parts: Vec<Val> = Vec::new(); - let mut start = 0usize; - let mut i = 0usize; - while i + sep.len() <= buf.len() { - if buf[i..i + sep.len()] == sep[..] { - parts.push(vm.heap.alloc(HeapObj::Bytes(buf[start..i].to_vec()))?); - i += sep.len(); start = i; - } else { i += 1; } - } - parts.push(vm.heap.alloc(HeapObj::Bytes(buf[start..].to_vec()))?); - vm.alloc_and_push_list(parts) - }), - - // str: padding. - (StrCenter, "center", pure, |vm, recv, pos| { - check_arity(pos, 1, 2, "center takes 1 or 2 arguments")?; - let s = recv_str(vm, recv)?; - if !pos[0].is_int() { return Err(cold_type("center() width must be an integer")); } - let width = pos[0].as_int() as usize; - let fill = if pos.len() > 1 { - val_to_str(vm, pos[1])?.chars().next().unwrap_or(' ') - } else { ' ' }; - // Padding measured in code points, not UTF-8 bytes (Unicode parity). - let pad = width.saturating_sub(s.chars().count()); - let left = pad / 2; - let right = pad - left; - let out = fill.to_string().repeat(left) + &s + &fill.to_string().repeat(right); - let v = vm.heap.alloc(HeapObj::Str(out))?; - vm.push(v); Ok(()) - }), - (StrZfill, "zfill", pure, |vm, recv, pos| { - check_arity(pos, 1, 1, "zfill takes 1 argument")?; - if !pos[0].is_int() { return Err(cold_type("zfill() requires an integer argument")); } - let s = recv_str(vm, recv)?; - let width = pos[0].as_int() as usize; - let nchars = s.chars().count(); - let out = if nchars >= width { - s - } else { - let pad = "0".repeat(width - nchars); - if s.starts_with('+') || s.starts_with('-') { - s[..1].to_string() + &pad + &s[1..] 
- } else { - pad + &s - } - }; - let v = vm.heap.alloc(HeapObj::Str(out))?; - vm.push(v); Ok(()) - }), - - // list: pure. - (ListIndex, "index", pure, |vm, recv, pos| { - check_arity(pos, 1, 1, "index takes 1 argument")?; - let items = list_clone(vm, recv)?; - let idx = items.iter() - .position(|&v| eq_vals_with_heap(v, pos[0], &vm.heap)) - .map(|i| i as i64) - .ok_or(cold_value("value not found in list"))?; - vm.push(Val::int(idx)); - Ok(()) - }), - (ListCount, "count", pure, |vm, recv, pos| { - check_arity(pos, 1, 1, "count takes 1 argument")?; - let items = list_clone(vm, recv)?; - let n = items.iter().filter(|&&v| eq_vals_with_heap(v, pos[0], &vm.heap)).count() as i64; - vm.push(Val::int(n)); - Ok(()) - }), - (ListCopy, "copy", pure, |vm, recv, pos| { - check_arity(pos, 0, 0, "copy takes no arguments")?; - let items = list_clone(vm, recv)?; - vm.alloc_and_push_list(items) - }), - - // list: mutating. - (ListAppend, "append", mutating, |vm, recv, pos| { - check_arity(pos, 1, 1, "append takes 1 argument")?; - list_mut(vm, recv, "append: receiver is not a list", |list| { - list.push(pos[0]); Ok(()) - })?; - vm.push(Val::none()); Ok(()) - }), - (ListClear, "clear", mutating, |vm, recv, pos| { - check_arity(pos, 0, 0, "clear takes no arguments")?; - list_mut(vm, recv, "clear: receiver is not a list", |list| { - list.clear(); Ok(()) - })?; - vm.push(Val::none()); Ok(()) - }), - (ListReverse, "reverse", mutating, |vm, recv, pos| { - check_arity(pos, 0, 0, "reverse takes no arguments")?; - list_mut(vm, recv, "reverse: receiver is not a list", |list| { - list.reverse(); Ok(()) - })?; - vm.push(Val::none()); Ok(()) - }), - (ListExtend, "extend", mutating, |vm, recv, pos| { - check_arity(pos, 1, 1, "extend takes 1 argument")?; - let items = vm.extract_iter(pos[0], true)?; - list_mut(vm, recv, "extend: receiver is not a list", |list| { - list.extend_from_slice(&items); Ok(()) - })?; - vm.push(Val::none()); Ok(()) - }), - (ListInsert, "insert", mutating, |vm, recv, pos| { - 
check_arity(pos, 2, 2, "insert takes 2 arguments")?; - if !pos[0].is_int() { return Err(cold_type("list indices must be integers")); } - list_mut(vm, recv, "insert: receiver is not a list", |list| { - let i = pos[0].as_int(); - let ui = if i < 0 { - (list.len() as i64 + i).max(0) as usize - } else { - (i as usize).min(list.len()) - }; - list.insert(ui, pos[1]); - Ok(()) - })?; - vm.push(Val::none()); Ok(()) - }), - (ListRemove, "remove", mutating, |vm, recv, pos| { - check_arity(pos, 1, 1, "remove takes 1 argument")?; - let items = list_clone(vm, recv)?; - let idx = items.iter() - .position(|&v| eq_vals_with_heap(v, pos[0], &vm.heap)) - .ok_or(cold_value("list.remove: value not found"))?; - list_mut(vm, recv, "remove: receiver is not a list", |list| { - list.remove(idx); Ok(()) - })?; - vm.push(Val::none()); Ok(()) - }), - (ListPop, "pop", mutating, |vm, recv, pos| { - check_arity(pos, 0, 1, "pop takes 0 or 1 arguments")?; - let popped = list_mut(vm, recv, "pop: receiver is not a list", |list| { - if list.is_empty() { return Err(cold_value("pop from empty list")); } - if pos.is_empty() { return Ok(list.pop().unwrap()); } - if !pos[0].is_int() { return Err(cold_type("list indices must be integers")); } - let i = pos[0].as_int(); - let ui = if i < 0 { (list.len() as i64 + i) as usize } else { i as usize }; - if ui >= list.len() { return Err(cold_value("pop index out of range")); } - Ok(list.remove(ui)) - })?; - vm.push(popped); Ok(()) - }), - (ListSort, "sort", mutating, |vm, recv, pos| { - check_arity(pos, 0, 0, "sort takes no arguments")?; - let mut sorted = list_clone(vm, recv)?; - vm.sort_by_lt(&mut sorted)?; - list_mut(vm, recv, "sort: receiver is not a list", |list| { - *list = sorted; Ok(()) - })?; - vm.push(Val::none()); Ok(()) - }), - - // dict. 
- (DictKeys, "keys", pure, |vm, recv, pos| { - check_arity(pos, 0, 0, "keys takes no arguments")?; - let entries = dict_entries(vm, recv)?; - let keys: Vec<Val> = entries.into_iter().map(|(k, _)| k).collect(); - vm.alloc_and_push_list(keys) - }), - (DictValues, "values", pure, |vm, recv, pos| { - check_arity(pos, 0, 0, "values takes no arguments")?; - let entries = dict_entries(vm, recv)?; - let vals: Vec<Val> = entries.into_iter().map(|(_, v)| v).collect(); - vm.alloc_and_push_list(vals) - }), - (DictItems, "items", pure, |vm, recv, pos| { - check_arity(pos, 0, 0, "items takes no arguments")?; - let entries = dict_entries(vm, recv)?; - let mut items: Vec<Val> = Vec::with_capacity(entries.len()); - for (k, vv) in entries { - let t = vm.heap.alloc(HeapObj::Tuple(vec![k, vv]))?; - items.push(t); - } - vm.alloc_and_push_list(items) - }), - // `dict.copy()` — shallow copy; mutations don't affect the original. - (DictCopy, "copy", pure, |vm, recv, pos| { - check_arity(pos, 0, 0, "copy takes no arguments")?; - let entries = dict_entries(vm, recv)?; - let mut dm = DictMap::with_capacity(entries.len()); - for (k, v) in entries { dm.insert(k, v); } - vm.alloc_and_push_dict(dm) - }), - // `dict.popitem()` — pop the last (k, v); KeyError on empty dict. 
-    (DictPopItem, "popitem", mutating, |vm, recv, pos| {
-        check_arity(pos, 0, 0, "popitem takes no arguments")?;
-        let pair = dict_mut(vm, recv, "popitem: receiver is not a dict", |dict| {
-            let (k, v) = dict.entries.last().copied().ok_or(cold_value("popitem(): dictionary is empty"))?;
-            dict.remove(&k);
-            Ok((k, v))
-        })?;
-        vm.alloc_and_push_tuple(vec![pair.0, pair.1])
-    }),
-    (DictGet, "get", pure, |vm, recv, pos| {
-        check_arity(pos, 1, 2, "get takes 1 or 2 arguments")?;
-        let default = if pos.len() == 2 { pos[1] } else { Val::none() };
-        let result = match vm.heap.get(recv) {
-            HeapObj::Dict(rc) => rc.borrow().get(&pos[0]).copied().unwrap_or(default),
-            _ => return Err(cold_type("get: receiver is not a dict")),
-        };
-        vm.push(result); Ok(())
-    }),
-    (DictUpdate, "update", mutating, |vm, recv, pos| {
-        check_arity(pos, 1, 1, "update takes 1 argument")?;
-        // Accept a dict or an iterable of 2-element pairs.
-        let pairs: Vec<(Val, Val)> = if let HeapObj::Dict(rc) = vm.heap.get(pos[0]) {
-            rc.borrow().entries.clone()
-        } else {
-            let items = vm.extract_iter(pos[0], true)?;
-            let mut out = Vec::with_capacity(items.len());
-            for it in items {
-                let pair = match vm.heap.get(it) {
-                    HeapObj::Tuple(v) if v.len() == 2 => (v[0], v[1]),
-                    HeapObj::List(v) if v.borrow().len() == 2 => { let v = v.borrow(); (v[0], v[1]) }
-                    _ => return Err(cold_value("dictionary update sequence element must have length 2")),
-                };
-                out.push(pair);
-            }
-            out
-        };
-        dict_mut(vm, recv, "update: receiver is not a dict", |dict| {
-            for (k, v) in pairs { dict.insert(k, v); }
-            Ok(())
-        })?;
-        vm.push(Val::none()); Ok(())
-    }),
-    (DictPop, "pop", mutating, |vm, recv, pos| {
-        check_arity(pos, 1, 2, "pop takes 1 or 2 arguments")?;
-        let default = if pos.len() == 2 { Some(pos[1]) } else { None };
-        let result = dict_mut(vm, recv, "pop: receiver is not a dict", |dict| {
-            match dict.remove(&pos[0]) {
-                Some(val) => Ok(val),
-                None => default.ok_or(cold_value("key not found")),
-            }
-        })?;
-        vm.push(result); Ok(())
-    }),
-    (DictSetDefault, "setdefault", mutating, |vm, recv, pos| {
-        check_arity(pos, 1, 2, "setdefault takes 1 or 2 arguments")?;
-        let default = if pos.len() > 1 { pos[1] } else { Val::none() };
-        let result = dict_mut(vm, recv, "setdefault: receiver is not a dict", |dict| {
-            if let Some(v) = dict.get(&pos[0]).copied() { Ok(v) }
-            else { dict.insert(pos[0], default); Ok(default) }
-        })?;
-        vm.push(result); Ok(())
-    }),
-
-    // set: mutating.
-    (SetAdd, "add", mutating, |vm, recv, pos| {
-        check_arity(pos, 1, 1, "add takes 1 argument")?;
-        set_mut(vm, recv, "add: receiver is not a set", |set| {
-            set.insert(pos[0]); Ok(())
-        })?;
-        vm.push(Val::none()); Ok(())
-    }),
-    (SetRemove, "remove", mutating, |vm, recv, pos| {
-        check_arity(pos, 1, 1, "remove takes 1 argument")?;
-        set_mut(vm, recv, "remove: receiver is not a set", |set| {
-            // KeyError, not ValueError.
-            if !set.remove(&pos[0]) { return Err(VmErr::Raised("KeyError".into())); }
-            Ok(())
-        })?;
-        vm.push(Val::none()); Ok(())
-    }),
-    (SetDiscard, "discard", mutating, |vm, recv, pos| {
-        check_arity(pos, 1, 1, "discard takes 1 argument")?;
-        set_mut(vm, recv, "discard: receiver is not a set", |set| {
-            set.remove(&pos[0]); Ok(())
-        })?;
-        vm.push(Val::none()); Ok(())
-    }),
-    (SetPop, "pop", mutating, |vm, recv, pos| {
-        check_arity(pos, 0, 0, "pop takes no arguments")?;
-        let popped = set_mut(vm, recv, "pop: receiver is not a set", |set| {
-            // HashSet has no `pop()` — grab via `iter()` and remove. Empty set raises.
-            let pick = set.iter().next().copied().ok_or(cold_value("pop from an empty set"))?;
-            set.remove(&pick);
-            Ok(pick)
-        })?;
-        vm.push(popped); Ok(())
-    }),
-    (SetClear, "clear", mutating, |vm, recv, pos| {
-        check_arity(pos, 0, 0, "clear takes no arguments")?;
-        set_mut(vm, recv, "clear: receiver is not a set", |set| {
-            set.clear(); Ok(())
-        })?;
-        vm.push(Val::none()); Ok(())
-    }),
-    (SetUpdate, "update", mutating, |vm, recv, pos| {
-        check_arity(pos, 1, 1, "update takes 1 argument")?;
-        let items = iter_to_vec(vm, pos[0])?;
-        set_mut(vm, recv, "update: receiver is not a set", |set| {
-            for v in items { set.insert(v); }
-            Ok(())
-        })?;
-        vm.push(Val::none()); Ok(())
-    }),
-
-    // set: pure (return a fresh set or a bool).
-    (SetCopy, "copy", pure, |vm, recv, pos| {
-        check_arity(pos, 0, 0, "copy takes no arguments")?;
-        let items = set_clone(vm, recv)?;
-        vm.alloc_and_push_set(items)
-    }),
-    (SetUnion, "union", pure, |vm, recv, pos| {
-        check_arity(pos, 1, 1, "union takes 1 argument")?;
-        let mut out = set_clone(vm, recv)?;
-        out.extend(iter_to_vec(vm, pos[0])?);
-        vm.alloc_and_push_set(out)
-    }),
-    (SetIntersection, "intersection", pure, |vm, recv, pos| {
-        check_arity(pos, 1, 1, "intersection takes 1 argument")?;
-        let lhs = set_clone(vm, recv)?;
-        let rhs_items = iter_to_vec(vm, pos[0])?;
-        let rhs: crate::util::fx::FxHashSet<Val> = rhs_items.into_iter().collect();
-        let out: Vec<Val> = lhs.into_iter().filter(|v| rhs.contains(v)).collect();
-        vm.alloc_and_push_set(out)
-    }),
-    (SetDifference, "difference", pure, |vm, recv, pos| {
-        check_arity(pos, 1, 1, "difference takes 1 argument")?;
-        let lhs = set_clone(vm, recv)?;
-        let rhs_items = iter_to_vec(vm, pos[0])?;
-        let rhs: crate::util::fx::FxHashSet<Val> = rhs_items.into_iter().collect();
-        let out: Vec<Val> = lhs.into_iter().filter(|v| !rhs.contains(v)).collect();
-        vm.alloc_and_push_set(out)
-    }),
-    (SetSymmetricDifference, "symmetric_difference", pure, |vm, recv, pos| {
-        check_arity(pos, 1, 1, "symmetric_difference takes 1 argument")?;
-        let lhs: crate::util::fx::FxHashSet<Val> = set_clone(vm, recv)?.into_iter().collect();
-        let rhs: crate::util::fx::FxHashSet<Val> = iter_to_vec(vm, pos[0])?.into_iter().collect();
-        let out: Vec<Val> = lhs.symmetric_difference(&rhs).copied().collect();
-        vm.alloc_and_push_set(out)
-    }),
-    (SetIsSubset, "issubset", pure, |vm, recv, pos| {
-        check_arity(pos, 1, 1, "issubset takes 1 argument")?;
-        let lhs = set_clone(vm, recv)?;
-        let rhs: crate::util::fx::FxHashSet<Val> = iter_to_vec(vm, pos[0])?.into_iter().collect();
-        vm.push(Val::bool(lhs.iter().all(|v| rhs.contains(v))));
-        Ok(())
-    }),
-    (SetIsSuperset, "issuperset", pure, |vm, recv, pos| {
-        check_arity(pos, 1, 1, "issuperset takes 1 argument")?;
-        let lhs: crate::util::fx::FxHashSet<Val> = set_clone(vm, recv)?.into_iter().collect();
-        let rhs = iter_to_vec(vm, pos[0])?;
-        vm.push(Val::bool(rhs.iter().all(|v| lhs.contains(v))));
-        Ok(())
-    }),
-    (SetIsDisjoint, "isdisjoint", pure, |vm, recv, pos| {
-        check_arity(pos, 1, 1, "isdisjoint takes 1 argument")?;
-        let lhs: crate::util::fx::FxHashSet<Val> = set_clone(vm, recv)?.into_iter().collect();
-        let rhs = iter_to_vec(vm, pos[0])?;
-        vm.push(Val::bool(!rhs.iter().any(|v| lhs.contains(v))));
-        Ok(())
-    }),
-}
diff --git a/compiler/src/modules/vm/handlers/methods_helpers.rs b/compiler/src/modules/vm/handlers/methods_helpers.rs
index 2a135e7..477b79e 100644
--- a/compiler/src/modules/vm/handlers/methods_helpers.rs
+++ b/compiler/src/modules/vm/handlers/methods_helpers.rs
@@ -30,14 +30,6 @@ pub(super) fn val_to_str(vm: &VM, v: Val) -> Result {
     }
 }
 
-#[inline]
-pub(super) fn check_arity(pos: &[Val], min: usize, max: usize, msg: &'static str) -> Result<(), VmErr> {
-    if pos.len() < min || pos.len() > max {
-        return Err(cold_type(msg));
-    }
-    Ok(())
-}
-
 #[inline]
 pub(super) fn list_clone(vm: &VM, recv: Val) -> Result<Vec<Val>, VmErr> {
     match vm.heap.get(recv) {
diff --git a/compiler/src/modules/vm/handlers/mod.rs b/compiler/src/modules/vm/handlers/mod.rs
index 154c5ac..def8de8 100644
--- a/compiler/src/modules/vm/handlers/mod.rs
+++ b/compiler/src/modules/vm/handlers/mod.rs
@@ -1,4 +1,5 @@
 pub(crate) mod arith;
+pub(crate) mod builtin_methods;
 pub(crate) mod data;
 pub(crate) mod dunder;
 pub(crate) mod format;
diff --git a/documentation/implementation/design.md b/documentation/implementation/design.md
index 48e1477..1fe05c5 100644
--- a/documentation/implementation/design.md
+++ b/documentation/implementation/design.md
@@ -138,10 +138,19 @@ compiler/src/
         ├── mod.rs
         ├── arith.rs
         ├── data.rs
+        ├── dunder.rs
         ├── format.rs
         ├── function.rs
-        ├── methods.rs
-        └── methods_helpers.rs
+        ├── methods.rs               # AttrLookup + resolve_attr (no method bodies)
+        ├── methods_helpers.rs       # recv_* / list_mut / dict_mut / iter_to_vec
+        └── builtin_methods/         # 68 builtin methods as plain pub fn,
+            ├── mod.rs               # indexed by a static MethodDesc table
+            ├── prelude.rs           # (name, fn, mutating, min_args, max_args).
+            ├── string.rs            # Arity check + mark_impure live in the
+            ├── bytes.rs             # dispatcher, not in each body.
+            ├── list.rs
+            ├── dict.rs
+            └── set.rs
 ```
 
 ## Capabilities
diff --git a/documentation/reference/wasm-abi.md b/documentation/reference/wasm-abi.md
index 501896b..6dfd182 100644
--- a/documentation/reference/wasm-abi.md
+++ b/documentation/reference/wasm-abi.md
@@ -339,7 +339,7 @@ The reference browser shim is `demo/worker.js`. WASI hosts and Rust embedders mi
 - **Refcounted handles.** The guest must release every handle it creates via `edge_encode` or `edge_op` except the one it returns through `*out`. Argv handles are released by the host.
 - **`edge_decode` only handles primitives.** For `list`, `dict`, `set`, instances, etc., use `edge_op` (e.g. `Call recv "items"`, `GetItem recv idx`).
-- **Reentrance is supported.** A guest's `edge_op` runs while the Edge Python VM is paused on the script's `CallExtern`. Method dispatch routes through the same `vm/handlers/methods.rs` table the language uses internally — adding a method there makes it visible to existing modules without recompiling them.
+- **Reentrance is supported.** A guest's `edge_op` runs while the Edge Python VM is paused on the script's `CallExtern`. Method dispatch routes through the same `vm/handlers/builtin_methods/` descriptor table the language uses internally — adding a method there makes it visible to existing modules without recompiling them.
 - **Error-as-status, not panic.** Returning `1` from a guest function does NOT abort the host. The host pulls the error and raises it as a typed Python exception in the script.
 - **Memory ownership.** The host doesn't read the guest's linear memory except to copy in/out at well-defined points. Anything the guest allocates internally (its own pools, caches, embedded blobs) is private; the host never touches it.
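
Reviewer note: the `design.md` hunk describes a static `MethodDesc` table of `(name, fn, mutating, min_args, max_args)` with the arity check and `mark_impure` moved into the dispatcher. The new `builtin_methods/mod.rs` is not part of this section, so the following is only a minimal sketch of that shape under stated assumptions. `Vm`, `Val`, `VmErr`, `LIST_METHODS`, `call_method`, and the `impure` flag are hypothetical stand-ins, not the crate's real types or signatures.

```rust
// Hypothetical, simplified stand-ins for the real VM types.
#[derive(Debug, Clone, PartialEq)]
enum Val { None, Int(i64) }

#[derive(Debug, PartialEq)]
enum VmErr { Type(&'static str), Attr }

#[derive(Default)]
struct Vm { stack: Vec<Val>, list: Vec<Val>, impure: bool }

// Each builtin is a plain function; the body assumes arity was already checked.
type MethodFn = fn(&mut Vm, &[Val]) -> Result<(), VmErr>;

// One descriptor per method, mirroring the tuple named in design.md.
struct MethodDesc {
    name: &'static str,
    func: MethodFn,
    mutating: bool,
    min_args: usize,
    max_args: usize,
}

fn list_append(vm: &mut Vm, pos: &[Val]) -> Result<(), VmErr> {
    vm.list.push(pos[0].clone());
    vm.stack.push(Val::None);
    Ok(())
}

fn list_count(vm: &mut Vm, pos: &[Val]) -> Result<(), VmErr> {
    let n = vm.list.iter().filter(|v| **v == pos[0]).count();
    vm.stack.push(Val::Int(n as i64));
    Ok(())
}

// Static table; the real one is assumed to cover all builtin types.
static LIST_METHODS: &[MethodDesc] = &[
    MethodDesc { name: "append", func: list_append, mutating: true, min_args: 1, max_args: 1 },
    MethodDesc { name: "count", func: list_count, mutating: false, min_args: 1, max_args: 1 },
];

// The dispatcher owns the arity check and the impurity mark,
// so method bodies no longer repeat them.
fn call_method(vm: &mut Vm, name: &str, pos: &[Val]) -> Result<(), VmErr> {
    let desc = LIST_METHODS.iter().find(|d| d.name == name).ok_or(VmErr::Attr)?;
    if pos.len() < desc.min_args || pos.len() > desc.max_args {
        return Err(VmErr::Type("wrong number of arguments"));
    }
    if desc.mutating {
        vm.impure = true; // stand-in for the real mark_impure
    }
    (desc.func)(vm, pos)
}

fn main() {
    let mut vm = Vm::default();
    call_method(&mut vm, "append", &[Val::Int(7)]).unwrap();
    call_method(&mut vm, "count", &[Val::Int(7)]).unwrap();
    println!("stack = {:?}, impure = {}", vm.stack, vm.impure);
}
```

If the real table follows this shape, the per-method `check_arity(...)` calls removed above become the `min_args` / `max_args` fields, at the cost of a single generic arity message unless the descriptor also carries one.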