Commit 07e9c93

feat(fuzz): Add cargo-afl coverage-guided fuzzer for the lex/parse/VM pipeline.
1 parent 1ad20db commit 07e9c93

7 files changed

Lines changed: 184 additions & 2 deletions

.gitignore

Lines changed: 7 additions & 1 deletion
@@ -37,4 +37,10 @@ pkg/
 
 # Secrets
 .env
-.env.*
+.env.*
+
+# Fuzz
+**/in/
+**/out/
+**/target/
+**/edge.dict

compiler/README.md

Lines changed: 12 additions & 0 deletions
@@ -64,6 +64,18 @@ fn main() {
 
 The download URL is derived from `CARGO_PKG_VERSION`, so a tag bump is the only retarget. Use `branch = "main"` for unreleased work. Requires `curl` on PATH; gated by the default-on `prebuilt` feature.
 
+## Fuzzing
+
+Coverage-guided fuzzing of the lex -> parse -> VM pipeline lives in [`fuzz-afl/`](fuzz-afl/), built on [cargo-afl](https://github.com/rust-fuzz/afl.rs) (AFL++) and running on stable Rust.
+
+```bash
+cd fuzz-afl
+./seeds.sh # generate corpus + dictionary from vm.json (once)
+cargo afl build && cargo afl fuzz -i in -o out -x edge.dict target/debug/afl-pipeline
+```
+
+Seeds and the dictionary are generated from `tests/cases/vm.json`, so they are gitignored. Under WSL, prefix the fuzz command with `AFL_SKIP_CPUFREQ=1 AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=1`. See [Fuzzing](https://edgepython.com/implementation/fuzzing) for details.
+
 ## References
 
 1. **Aho, Sethi & Ullman**, *Compilers: Principles, Techniques and Tools* (1986). LUT-based lexer.

compiler/fuzz-afl/Cargo.toml

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+[package]
+name = "edge-python-afl"
+version = "0.0.0"
+publish = false
+edition = "2024"
+
+# Standalone workspace so the parent workspace does not adopt this crate.
+[workspace]
+
+[dependencies]
+afl = "0.18.2"
+
+# `default-features = false` drops `prebuilt` so the build skips the wasm download.
+[dependencies.edge-python]
+path = ".."
+default-features = false
+
+[[bin]]
+name = "afl-pipeline"
+path = "src/main.rs"

compiler/fuzz-afl/seeds.sh

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
+#!/usr/bin/env bash
+# Regenerates the fuzz inputs from the single source of truth (tests/cases/vm.json). `in/` (seed corpus) and `edge.dict` are gitignored artifacts; run this once before fuzzing. Pure bash, no extra runtime.
+set -euo pipefail
+cd "$(dirname "$0")"
+
+# Seed corpus: one file per unique `src` in the VM test fixtures. grep -oP pulls each JSON string body (handling \" and \\); sed unescapes the common escapes.
+rm -rf in && mkdir -p in
+while IFS= read -r raw; do
+  src=$(printf '%s' "$raw" | sed -e 's/\\\\/\x01/g' \
+    -e 's/\\n/\n/g' -e 's/\\t/\t/g' -e 's/\\r/\r/g' -e 's/\\"/"/g' \
+    -e 's/\x01/\\/g')
+  [ -z "$src" ] && continue
+  name=$(printf '%s' "$src" | sha1sum | cut -c1-16)
+  printf '%s' "$src" > "in/$name"
+done < <(grep -oP '"src":\s*"\K(?:[^"\\]|\\.)*' ../tests/cases/vm.json)
+echo "seeds: $(ls in | wc -l)"
+
+# Token dictionary: keywords, operators, and builtins for the AFL++ mutator.
+cat > edge.dict <<'DICT'
+# Edge Python token dictionary for AFL++ (-x edge.dict). Generated by seeds.sh.
+
+# keywords
+"if"
+"else"
+"elif"
+"for"
+"while"
+"def"
+"class"
+"return"
+"import"
+"from"
+"try"
+"except"
+"with"
+"yield"
+"async"
+"await"
+"pass"
+"break"
+"continue"
+"True"
+"False"
+"None"
+"and"
+"or"
+"not"
+"in"
+"is"
+"lambda"
+"assert"
+"del"
+"raise"
+
+# operators and punctuation
+"->"
+":="
+"=="
+"!="
+"<="
+">="
+"**"
+"//"
+"<<"
+">>"
+"+="
+"-="
+"*="
+"/="
+"..."
+
+# builtins and common identifiers
+"print("
+"len("
+"range("
+"int("
+"str("
+"list("
+"dict("
+"set("
+"input("
+"self"
+"__init__"
+"f\"{"
+DICT
+echo "dict: $(grep -c '^"' edge.dict) entries"
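
The ordering of the `sed` passes in `seeds.sh` matters: escaped backslashes are parked on a placeholder byte before the other escapes are rewritten, then restored at the end. The same idea can be sketched in Rust as follows. This is an illustration only and not part of the commit; `unescape` and `PLACEHOLDER` are hypothetical names, and like the script it handles only the common escapes (no `\uXXXX`).

```rust
/// Undo the common JSON string escapes, in the same order as the sed passes.
/// Hypothetical helper for illustration; it is not part of the commit.
pub fn unescape(raw: &str) -> String {
    // Assumption: U+0001 never appears in the fixtures, so it is safe as a placeholder.
    const PLACEHOLDER: &str = "\u{1}";

    raw.replace("\\\\", PLACEHOLDER) // park escaped backslashes first
        .replace("\\n", "\n") // backslash + n  -> newline
        .replace("\\t", "\t") // backslash + t  -> tab
        .replace("\\r", "\r") // backslash + r  -> carriage return
        .replace("\\\"", "\"") // backslash + quote -> quote
        .replace(PLACEHOLDER, "\\") // restore parked backslashes as single ones
}
```

Without the placeholder step, a fixture containing an escaped backslash followed by `n` would be misread as a newline escape; with it, the pair of backslashes collapses to one and the `n` is left alone.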

compiler/fuzz-afl/src/main.rs

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+use afl::fuzz;
+
+use compiler::modules::lexer::lex;
+use compiler::modules::parser::Parser;
+use compiler::modules::vm::{Limits, VM};
+
+fn main() {
+    fuzz!(|data: &[u8]| {
+        // Source is text; reject non-UTF-8 rather than counting it as coverage.
+        let Ok(src) = core::str::from_utf8(data) else { return };
+
+        let (tokens, _lex_errs) = lex(src);
+        let (chunk, parse_errs) = Parser::new(src, tokens.into_iter()).parse();
+
+        // Only valid programs reach the VM; the chunk is unreliable after a parse error.
+        if !parse_errs.is_empty() {
+            return;
+        }
+
+        // Bounded budget turns runaway loops and allocations into VmErr, not hangs.
+        let _ = VM::with_limits(&chunk, Limits::sandbox()).run();
+    });
+}
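
The gating in `main.rs` can be illustrated without the real crate. The following is a minimal sketch built on stand-in stages: `Outcome`, `STEP_BUDGET`, `toy_lex`, `toy_parse`, `toy_run`, and `run_pipeline` are all hypothetical names invented here, and none of them belong to `compiler::modules`. It shows only the control flow of the harness: reject non-UTF-8 input, stop after a parse error, and run under a bounded budget so that runaway work becomes an error value.

```rust
/// Hypothetical outcome type, used only to make the control flow observable.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    RejectedNonUtf8,
    ParseError,
    BudgetExceeded,
    Finished(u64),
}

/// Hypothetical step budget standing in for `Limits::sandbox()`.
const STEP_BUDGET: u64 = 1_000;

/// Stand-in lexer: splits the source on whitespace.
fn toy_lex(src: &str) -> Vec<&str> {
    src.split_whitespace().collect()
}

/// Stand-in parser: every token must be an unsigned integer.
/// Returns the program plus any errors, like the real parser's tuple.
fn toy_parse(tokens: &[&str]) -> (Vec<u64>, Vec<String>) {
    let mut program = Vec::new();
    let mut errors = Vec::new();
    for tok in tokens {
        match tok.parse::<u64>() {
            Ok(n) => program.push(n),
            Err(_) => errors.push(format!("unexpected token: {tok}")),
        }
    }
    (program, errors)
}

/// Stand-in VM: each integer costs that many steps.
/// Exceeding the budget returns an error instead of looping on.
fn toy_run(program: &[u64], budget: u64) -> Result<u64, ()> {
    let mut used: u64 = 0;
    for &n in program {
        used = used.saturating_add(n);
        if used > budget {
            return Err(());
        }
    }
    Ok(used)
}

/// Same shape as the fuzz closure: UTF-8 gate, parse gate, bounded run.
pub fn run_pipeline(data: &[u8]) -> Outcome {
    let Ok(src) = core::str::from_utf8(data) else {
        return Outcome::RejectedNonUtf8;
    };
    let tokens = toy_lex(src);
    let (program, errors) = toy_parse(&tokens);
    if !errors.is_empty() {
        return Outcome::ParseError;
    }
    match toy_run(&program, STEP_BUDGET) {
        Ok(steps) => Outcome::Finished(steps),
        Err(()) => Outcome::BudgetExceeded,
    }
}
```

The real harness discards the result of each stage, since the fuzzer only cares whether the process crashes. The sketch returns an `Outcome` so that each early exit can be checked in a unit test.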

docs/pages/implementation/_meta.js

Lines changed: 2 additions & 1 deletion
@@ -2,5 +2,6 @@ export default {
   '--- implementation': { type: 'separator', title: 'Implementation' },
   design: 'Design',
   lexical: 'Lexical',
-  syntax: 'Syntax'
+  syntax: 'Syntax',
+  fuzzing: 'Fuzzing'
 }
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+---
+title: "Fuzzing"
+description: "Coverage-guided fuzzing of the lex, parse, and VM pipeline with cargo-afl on stable Rust."
+---
+
+## Overview
+
+The fuzzer drives the full `lex -> parse -> VM` pipeline against mutated input, looking for panics, arithmetic overflow, and memory faults. It lives in [`compiler/fuzz-afl/`](https://github.com/dylan-sutton-chavez/edge-python/tree/main/compiler/fuzz-afl) and is built on [cargo-afl](https://github.com/rust-fuzz/afl.rs) (AFL++), which instruments via AFL++'s LLVM passes and therefore runs on **stable Rust**, no nightly toolchain required.
+
+The target runs the VM under `Limits::sandbox()`, so runaway loops and allocations become a `VmErr` instead of a hang, and any real crash is a genuine bug rather than resource exhaustion. See [Limits and errors](/reference/limits-and-errors).
+
+## Running it
+
+```bash
+cd compiler/fuzz-afl
+./seeds.sh # generate corpus + dictionary from vm.json (once)
+cargo afl build # instrument on stable, no nightly
+cargo afl fuzz -i in -o out -x edge.dict target/debug/afl-pipeline
+```
+
+Under WSL, prefix the fuzz command with `AFL_SKIP_CPUFREQ=1 AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=1` to bypass the core-pattern and CPU-governor checks. Crashes and hangs land in `out/default/`. Reproduce one by piping it back into the target:
+
+```bash
+./target/debug/afl-pipeline < out/default/crashes/<id>
+```
+
+## Inputs are generated, not committed
+
+The seed corpus (`in/`) and the token dictionary (`edge.dict`) are derived from a single source of truth, `tests/cases/vm.json`, so they are gitignored and regenerated by `seeds.sh`:
+
+- **`in/`**: one file per unique program `src` in the VM test fixtures, giving AFL valid starting points that already exercise most of the language.
+- **`edge.dict`**: keywords, operators, and common builtins, so the byte mutator splices real tokens instead of discovering them blindly.
+
+Only three files are tracked: `Cargo.toml`, `src/main.rs`, and `seeds.sh`. The corpus, dictionary, AFL output, and build artifacts are all reproducible.
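
The docs page reproduces a single crash by piping its file into the target. Replaying a whole directory of saved inputs could be sketched as below. `replay_dir` is a hypothetical helper that is not part of the commit, and the closure passed as `target` stands in for whatever the harness body does with the bytes. The sketch assumes every regular file in the directory is an input, so any other file placed there would be replayed as well.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Feed every regular file in `dir` to `target` and return how many were replayed.
/// Hypothetical helper for illustration; it is not part of the commit.
pub fn replay_dir(dir: &Path, mut target: impl FnMut(&[u8])) -> io::Result<usize> {
    let mut replayed = 0;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // Skip subdirectories and anything else that is not a plain file.
        if !path.is_file() {
            continue;
        }
        let data = fs::read(&path)?;
        target(&data);
        replayed += 1;
    }
    Ok(replayed)
}
```

A regression test might call it with a path such as `out/default/crashes` and a closure that runs the pipeline, so that a previously found crash fails the test until it is fixed.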
