fix(soundness): #1296 RISC-V opcode soundness audit#1345
Open
spherel wants to merge 7 commits into
Open
Conversation
Switch every precompile arg0 / arg1 register slot from `OpFixedRS<_, _, true>` to `OpFixedRS<_, _, false>`, dropping the free `prev_value` witness path through `register_write`. Memory addresses now derive from the caller-owned `MemAddr` (`expr_unaligned()`). The emulator's syscall reg-op tracker is moved to `SUBCYCLE_RS1` to match the new circuit subcycle. Touches: keccak, sha_extend, pubio_commit, fptower_fp / fp2_add / fp2_mul, weierstrass_add / double / decompress, uint256 plus `ceno_emul/src/syscalls.rs`.
Tag the carry-chain in `mulh_circuit_v2` with a FIXME pointing at the issue.
…1296 v2) Replace the inverse-scaled `inv(2^16) * (expected - rd)` carry, unsound over BabyBear (a single u16*u16 partial product exceeds p, and inv(65536) = -30720 lets small negative residuals pass the carry range bound), with SP1's byte schoolbook design: operands/result are range-checked u8 limbs and carries are genuine non-negative magnitudes that are directly range-checked. Covers MUL/MULH/MULHU/MULHSU and removes the prior FIXME. Adds a regression test that the byte identity rejects a wrong low product. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ound (#1296 v2) Rewrite `divisor*quotient + remainder == dividend` over u8 limbs with magnitude carries (same BabyBear overflow / inverse-scaled-carry bug as MUL), bound `|remainder| < |divisor|` with a per-byte comparison (a single 32-bit IsLt is invalid over BabyBear), and pin the zero-divisor remainder (closing the REMU absorb) and signed-overflow result explicitly. Covers DIV/DIVU/REM/REMU. Adds a REMU zero-divisor regression test and makes the shared test helper field-agnostic. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
rs1_read / rs2_read used UInt8::new_unchecked, so the byte limbs were never range-checked. The register argument only pins the recombined u16 (byte0 + 256*byte1), not the individual bytes, so a non-canonical byte decomposition satisfies the read while feeding the byte-level shift gadget a different value. Use UInt8::new and emit the matching byte lookups for SLL/SRL/SRA and SLLI/SRLI/SRAI. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Addresses soundness issues from audit #1296 across several RISC-V opcode circuits by strengthening byte-/carry-/outflow-constraints (notably for shift, mul*, div/rem*), and by preventing unintended mutation of ECALL argument registers by treating precompile args as read-only and deriving memory addresses from caller-owned MemAddr.
Findings (sorted by severity)
- minor —
ceno_zkvm/src/instructions/riscv/mulh/mulh_circuit_v2.rs:42-44: Doc comment typo uses<< pwhere an inequality< pis intended (stored as PR comment).
Open Questions / Assumptions
- None.
Changes:
- Add explicit
outflowu32 range constraints to v1 shift circuits and emit matching lookup multiplicities. - Rewrite v2 MUL* and DIV/REM* circuits to byte-limb schoolbook constraints with magnitude carries and byte range checks (plus new soundness regression tests).
- Make ECALL arg registers read-only (
OpFixedRS<..., false>) and tie precompile memory addressing toMemAddrvalues; align syscall register-access subcycle tracking accordingly.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| ceno_zkvm/src/instructions/riscv/shift/shift_circuit.rs | Range-check outflow in v1 shift circuit and add corresponding LK multiplicity. |
| ceno_zkvm/src/instructions/riscv/shift/shift_circuit_v2.rs | Range-check byte limbs for shift inputs/outputs and emit corresponding byte lookups. |
| ceno_zkvm/src/instructions/riscv/shift_imm/shift_imm_circuit.rs | Range-check outflow in v1 shift-imm circuit and add corresponding LK multiplicity. |
| ceno_zkvm/src/instructions/riscv/mulh/mulh_circuit_v2.rs | Replace inverse-scaled u16 carry scheme with byte-limb schoolbook multiply + magnitude carries. |
| ceno_zkvm/src/instructions/riscv/mulh.rs | Add v2 BabyBear regression test rejecting incorrect MUL products. |
| ceno_zkvm/src/instructions/riscv/div/div_circuit_v2.rs | Replace inverse-scaled u16 carry scheme with byte-limb div/rem identity + field-safe remainder bound. |
| ceno_zkvm/src/instructions/riscv/div.rs | Update tests for v2 byte-limb outputs and add zero-divisor remainder-binding regression tests. |
| ceno_zkvm/src/instructions/riscv/ecall/weierstrass_double.rs | Make arg0 read-only and derive memory addresses from MemAddr witness. |
| ceno_zkvm/src/instructions/riscv/ecall/weierstrass_decompress.rs | Make arg0/arg1 read-only and derive memory addresses from MemAddr witness. |
| ceno_zkvm/src/instructions/riscv/ecall/weierstrass_add.rs | Make arg0/arg1 read-only and derive memory addresses from MemAddr witness. |
| ceno_zkvm/src/instructions/riscv/ecall/uint256.rs | Make arg0/arg1 read-only across uint256 ecalls and derive memory addresses from MemAddr witness. |
| ceno_zkvm/src/instructions/riscv/ecall/sha_extend.rs | Make arg0 read-only and derive memory addresses from MemAddr witness. |
| ceno_zkvm/src/instructions/riscv/ecall/pubio_commit.rs | Make arg0 read-only and derive memory addresses from MemAddr witness. |
| ceno_zkvm/src/instructions/riscv/ecall/keccak.rs | Make arg0 read-only and derive memory addresses from MemAddr witness. |
| ceno_zkvm/src/instructions/riscv/ecall/fptower_fp2_mul.rs | Make arg0/arg1 read-only and derive memory addresses from MemAddr witness. |
| ceno_zkvm/src/instructions/riscv/ecall/fptower_fp2_add.rs | Make arg0/arg1 read-only and derive memory addresses from MemAddr witness. |
| ceno_zkvm/src/instructions/riscv/ecall/fptower_fp.rs | Make arg0/arg1 read-only and derive memory addresses from MemAddr witness. |
| ceno_emul/src/syscalls.rs | Track syscall register ops as read-subcycle accesses (aligning with read-only arg modeling). |
…#1296) a43febb moved SyscallEffects::finalize to track every syscall reg-op at SUBCYCLE_RS1, but only updated the dedicated precompile circuits. The generic LargeEcallDummy (used in production by LOG_PC_CYCLE) still wrote its arg registers via WriteRD at SUBCYCLE_RD, desyncing the register-bus timestamps so the verifier rejected with `prod_r != prod_w` for any program emitting LOG_PC_CYCLE (e.g. ceno_rt_alloc). Model the read-only argument registers as register_read at SUBCYCLE_RS1, matching the emulator and the precompiles' OpFixedRS<_, _, false> path. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
daac387 to
4cfa84f
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes the soundness findings from the #1296 audit across the RISC-V opcode
circuits. Each bug let a malicious prover satisfy the circuit with a witness
that does not correspond to correct RISC-V execution (a verifier-side
soundness break, not just a liveness failure). Fixes span both circuit
families:
shift_circuit.rs,shift_imm_circuit.rs)u16limb_circuitpath (*_circuit_v2.rs)The common root cause across the arithmetic circuits is that a value's
fine-grained decomposition (bytes, carries, outflow) was under-constrained:
the register/offline-memory argument only pins the recombined 16-bit limbs, and
over a small field (BabyBear
p ≈ 2^31) the algebraic shortcuts used to boundcarries do not force the canonical integer decomposition.
Bugs fixed
SLL(+SRL/SRA/SLLI/SRLI/SRAI)outflowonly bounded byAssertLtConfig(a modular-difference bound), not its canonical[0, 2^32)range; a negative-aliasoutflowabsorbs a 1-bit error in the resultoutflowto[0, 2^32)in both shift circuitsMUL/MULH/MULHU/MULHSUinv(2^16)·(expected − rd)and only range-checked after scaling; over BabyBearinv(65536) = −30720, so small negative residuals pass the bound (and a singleu16·u16 ≈ 2^32partial product already overflowsp)DIV/DIVU/REM/REMUdivisor·quotient + remainder == dividendwith magnitude carries; field-safe per-byte `SLL/SRL/SRA(+SLLI/SRLI/SRAI)rs1_read/rs2_readbuilt withUInt8::new_unchecked; the register argument only pins the recombined u16 (b0 + 256·b1), so a non-canonical byte decomposition satisfies the read while feeding the byte-level shift gadget a different valueUInt8::new(range-check the bytes) and emit the matching byte lookupsShaExtend(+ allOpFixedRS<…,true>ecalls: keccak, uint256, weierstrass×3, fptower×3, pubio_commit)prev_value, so the syscall could use the correct pointer for memory yet write back a different valueOpFixedRS<…,false>; derive memory addresses from the caller-ownedMemAddrKey design points
u16·u16partial product(~
2^32) exceedsp ≈ 2^31, so any u16-limb schoolbook multiply wraps modpand the inverse-carry trick cannot bound magnitudes. Byte products(
≤ 255² = 65025) and their column sums stay far belowp, so the fieldequation is a faithful integer equation and the byte/carry decomposition is
unique. (Mirrors SP1's
MulOperation/divremdesign.)range-checked directly — so a "negative" residual can't be laundered into a
small positive field element.
IsLtConfigis invalidover BabyBear (
assert BaseField::bits()-1 > max_bits), so|remainder| < |divisor|uses a per-byte first-differing-limb comparison.memory argument matches on the recombined u16 limbs, not the bytes; the
producer's range-checked bytes are recombined to a u16 before entering the
argument and a fresh, unconstrained byte decomposition is allocated on the
read side. Any gadget that reasons over bytes must range-check its input bytes
at the point of use. (This is why
new_uncheckedremains correct for ordinaryu16 register reads but not for the byte circuits.)
Files changed
instructions/riscv/mulh/mulh_circuit_v2.rs,instructions/riscv/mulh.rsinstructions/riscv/div/div_circuit_v2.rs,instructions/riscv/div.rsinstructions/riscv/shift/shift_circuit_v2.rsinstructions/riscv/shift/shift_circuit.rs,instructions/riscv/shift_imm/shift_imm_circuit.rsinstructions/riscv/ecall/{keccak,sha_extend,uint256,pubio_commit,fptower_fp,fptower_fp2_add,fptower_fp2_mul,weierstrass_add,weierstrass_double,weierstrass_decompress}.rs,ceno_emul/src/syscalls.rsTest plan
cargo make clippy(workspace,-D warnings) andcargo fmt --all --checkcargo make tests— full workspace suite on BabyBear / v2 (exit 0)cargo make tests_goldilock— full suite on Goldilocks / v1 (exit 0)(i32::MIN, −1, 0, u32::MAX, random) on both fields
lk_shardram_matchtests for mul/mulh/div/remu/sra/slli (GPU LK closurematches CPU
assign_instance)test_opcode_mul_rejects_wrong_product,test_divrem_remu_zero_divisor_binds_remainderFollow-ups / out of scope
so the
ceno-gpu-mockMulColumnMap/DivColumnMapstructs and thewitgen_mul/witgen_divCUDA kernels need matching updates before--features gpubuilds those two. These are#[cfg(feature="gpu")], sodefault builds / tests / CI are unaffected. The shift fix does not change
layout (only adds lookups, collected via the in-repo closure), so shift GPU is
unaffected.
ConstraintSystem/GKRCircuitproduced byconstruct_circuit, so it picksup the new constraints automatically;
ceno_recursionpassed as part of theworkspace test run.