Skip to content

fix(soundness): #1296 RISC-V opcode soundness audit#1345

Open
spherel wants to merge 7 commits into
masterfrom
fix/issue-1296-soundness-audit
Open

fix(soundness): #1296 RISC-V opcode soundness audit#1345
spherel wants to merge 7 commits into
masterfrom
fix/issue-1296-soundness-audit

Conversation

@spherel
Copy link
Copy Markdown
Member

@spherel spherel commented May 29, 2026

Summary

Closes the soundness findings from the #1296 audit across the RISC-V opcode
circuits. Each bug let a malicious prover satisfy the circuit with a witness
that does not correspond to correct RISC-V execution (a verifier-side
soundness break, not just a liveness failure). Fixes span both circuit
families:

  • v1 = Goldilocks path (shift_circuit.rs, shift_imm_circuit.rs)
  • v2 = default BabyBear / u16limb_circuit path (*_circuit_v2.rs)

The common root cause across the arithmetic circuits is that a value's
fine-grained decomposition (bytes, carries, outflow) was under-constrained
:
the register/offline-memory argument only pins the recombined 16-bit limbs, and
over a small field (BabyBear p ≈ 2^31) the algebraic shortcuts used to bound
carries do not force the canonical integer decomposition.

Bugs fixed

Opcode(s) Field/ver Root cause Fix
SLL (+ SRL/SRA/SLLI/SRLI/SRAI) v1 (Goldilocks) outflow only bounded by AssertLtConfig (a modular-difference bound), not its canonical [0, 2^32) range; a negative-alias outflow absorbs a 1-bit error in the result range-check outflow to [0, 2^32) in both shift circuits
MUL/MULH/MULHU/MULHSU v2 (BabyBear) carry computed as inv(2^16)·(expected − rd) and only range-checked after scaling; over BabyBear inv(65536) = −30720, so small negative residuals pass the bound (and a single u16·u16 ≈ 2^32 partial product already overflows p) byte (u8) schoolbook multiply; carries are genuine non-negative magnitudes that are directly range-checked; operands/result are range-checked u8 limbs
DIV/DIVU/REM/REMU v2 (BabyBear) same inverse-scaled carry family; on the REMU zero-divisor path an incorrect remainder is absorbed by the carry chain byte-limb divisor·quotient + remainder == dividend with magnitude carries; field-safe per-byte `
SLL/SRL/SRA (+ SLLI/SRLI/SRAI) v2 (BabyBear) rs1_read/rs2_read built with UInt8::new_unchecked; the register argument only pins the recombined u16 (b0 + 256·b1), so a non-canonical byte decomposition satisfies the read while feeding the byte-level shift gadget a different value use UInt8::new (range-check the bytes) and emit the matching byte lookups
ShaExtend (+ all OpFixedRS<…,true> ecalls: keccak, uint256, weierstrass×3, fptower×3, pubio_commit) shared arg0/arg1 modeled as a register write whose written value was not tied to prev_value, so the syscall could use the correct pointer for memory yet write back a different value switch every precompile arg/ecall slot to OpFixedRS<…,false>; derive memory addresses from the caller-owned MemAddr

Key design points

  • Byte limbs are required over BabyBear. A single u16·u16 partial product
    (~2^32) exceeds p ≈ 2^31, so any u16-limb schoolbook multiply wraps mod
    p and the inverse-carry trick cannot bound magnitudes. Byte products
    (≤ 255² = 65025) and their column sums stay far below p, so the field
    equation is a faithful integer equation and the byte/carry decomposition is
    unique. (Mirrors SP1's MulOperation / divrem design.)
  • Carries are magnitudes, not inverse-scaled residuals, and are
    range-checked directly — so a "negative" residual can't be laundered into a
    small positive field element.
  • The remainder bound is field-safe. A single 32-bit IsLtConfig is invalid
    over BabyBear (assert BaseField::bits()-1 > max_bits), so |remainder| < |divisor| uses a per-byte first-differing-limb comparison.
  • Register-argument granularity drives the input byte checks. The offline
    memory argument matches on the recombined u16 limbs, not the bytes; the
    producer's range-checked bytes are recombined to a u16 before entering the
    argument and a fresh, unconstrained byte decomposition is allocated on the
    read side. Any gadget that reasons over bytes must range-check its input bytes
    at the point of use. (This is why new_unchecked remains correct for ordinary
    u16 register reads but not for the byte circuits.)

Files changed

  • instructions/riscv/mulh/mulh_circuit_v2.rs, instructions/riscv/mulh.rs
  • instructions/riscv/div/div_circuit_v2.rs, instructions/riscv/div.rs
  • instructions/riscv/shift/shift_circuit_v2.rs
  • instructions/riscv/shift/shift_circuit.rs,
    instructions/riscv/shift_imm/shift_imm_circuit.rs
  • instructions/riscv/ecall/{keccak,sha_extend,uint256,pubio_commit,fptower_fp,fptower_fp2_add,fptower_fp2_mul,weierstrass_add,weierstrass_double,weierstrass_decompress}.rs,
    ceno_emul/src/syscalls.rs

Test plan

  • cargo make clippy (workspace, -D warnings) and cargo fmt --all --check
  • cargo make tests — full workspace suite on BabyBear / v2 (exit 0)
  • cargo make tests_goldilock — full suite on Goldilocks / v1 (exit 0)
  • MUL/MULH/MULHU/MULHSU and DIV/DIVU/REM/REMU unit tests across all edges
    (i32::MIN, −1, 0, u32::MAX, random) on both fields
  • lk_shardram_match tests for mul/mulh/div/remu/sra/slli (GPU LK closure
    matches CPU assign_instance)
  • New regression tests: test_opcode_mul_rejects_wrong_product,
    test_divrem_remu_zero_divisor_binds_remainder
  • SLL/SRL/SRA + SLLI/SRLI/SRAI unit tests after the shift byte-range fix

Follow-ups / out of scope

  • GPU (external). The MUL/DIV rewrites change the witness column layout,
    so the ceno-gpu-mock MulColumnMap/DivColumnMap structs and the
    witgen_mul/witgen_div CUDA kernels need matching updates before
    --features gpu builds those two. These are #[cfg(feature="gpu")], so
    default builds / tests / CI are unaffected. The shift fix does not change
    layout (only adds lookups, collected via the in-repo closure), so shift GPU is
    unaffected.
  • Recursion. No code change: the recursive verifier is generic over the
    ConstraintSystem/GKRCircuit produced by construct_circuit, so it picks
    up the new constraints automatically; ceno_recursion passed as part of the
    workspace test run.

spherel and others added 6 commits May 27, 2026 08:56
 v1)

Constrain `outflow` to `[0, 2^32)` in both `shift_circuit.rs` and
`shift_imm_circuit.rs`, with the matching `lk_multiplicity` entry on
the witness side.
Switch every precompile arg0 / arg1 register slot from
`OpFixedRS<_, _, true>` to `OpFixedRS<_, _, false>`, dropping the
free `prev_value` witness path through `register_write`. Memory
addresses now derive from the caller-owned `MemAddr` (`expr_unaligned()`).
The emulator's syscall reg-op tracker is moved to `SUBCYCLE_RS1` to
match the new circuit subcycle.

Touches: keccak, sha_extend, pubio_commit, fptower_fp / fp2_add /
fp2_mul, weierstrass_add / double / decompress, uint256 plus
`ceno_emul/src/syscalls.rs`.
Tag the carry-chain in `mulh_circuit_v2` with a FIXME pointing at the
issue.
…1296 v2)

Replace the inverse-scaled `inv(2^16) * (expected - rd)` carry, unsound over
BabyBear (a single u16*u16 partial product exceeds p, and inv(65536) = -30720
lets small negative residuals pass the carry range bound), with SP1's byte
schoolbook design: operands/result are range-checked u8 limbs and carries are
genuine non-negative magnitudes that are directly range-checked. Covers
MUL/MULH/MULHU/MULHSU and removes the prior FIXME. Adds a regression test that
the byte identity rejects a wrong low product.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ound (#1296 v2)

Rewrite `divisor*quotient + remainder == dividend` over u8 limbs with
magnitude carries (same BabyBear overflow / inverse-scaled-carry bug as MUL),
bound `|remainder| < |divisor|` with a per-byte comparison (a single 32-bit
IsLt is invalid over BabyBear), and pin the zero-divisor remainder (closing the
REMU absorb) and signed-overflow result explicitly. Covers DIV/DIVU/REM/REMU.
Adds a REMU zero-divisor regression test and makes the shared test helper
field-agnostic.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
rs1_read / rs2_read used UInt8::new_unchecked, so the byte limbs were never
range-checked. The register argument only pins the recombined u16
(byte0 + 256*byte1), not the individual bytes, so a non-canonical byte
decomposition satisfies the read while feeding the byte-level shift gadget a
different value. Use UInt8::new and emit the matching byte lookups for
SLL/SRL/SRA and SLLI/SRLI/SRAI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Addresses soundness issues from audit #1296 across several RISC-V opcode circuits by strengthening byte-/carry-/outflow-constraints (notably for shift, mul*, div/rem*), and by preventing unintended mutation of ECALL argument registers by treating precompile args as read-only and deriving memory addresses from caller-owned MemAddr.

Findings (sorted by severity)

  • minorceno_zkvm/src/instructions/riscv/mulh/mulh_circuit_v2.rs:42-44: Doc comment typo uses << p where an inequality < p is intended (stored as PR comment).

Open Questions / Assumptions

  • None.

Changes:

  • Add explicit outflow u32 range constraints to v1 shift circuits and emit matching lookup multiplicities.
  • Rewrite v2 MUL* and DIV/REM* circuits to byte-limb schoolbook constraints with magnitude carries and byte range checks (plus new soundness regression tests).
  • Make ECALL arg registers read-only (OpFixedRS<..., false>) and tie precompile memory addressing to MemAddr values; align syscall register-access subcycle tracking accordingly.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
ceno_zkvm/src/instructions/riscv/shift/shift_circuit.rs Range-check outflow in v1 shift circuit and add corresponding LK multiplicity.
ceno_zkvm/src/instructions/riscv/shift/shift_circuit_v2.rs Range-check byte limbs for shift inputs/outputs and emit corresponding byte lookups.
ceno_zkvm/src/instructions/riscv/shift_imm/shift_imm_circuit.rs Range-check outflow in v1 shift-imm circuit and add corresponding LK multiplicity.
ceno_zkvm/src/instructions/riscv/mulh/mulh_circuit_v2.rs Replace inverse-scaled u16 carry scheme with byte-limb schoolbook multiply + magnitude carries.
ceno_zkvm/src/instructions/riscv/mulh.rs Add v2 BabyBear regression test rejecting incorrect MUL products.
ceno_zkvm/src/instructions/riscv/div/div_circuit_v2.rs Replace inverse-scaled u16 carry scheme with byte-limb div/rem identity + field-safe remainder bound.
ceno_zkvm/src/instructions/riscv/div.rs Update tests for v2 byte-limb outputs and add zero-divisor remainder-binding regression tests.
ceno_zkvm/src/instructions/riscv/ecall/weierstrass_double.rs Make arg0 read-only and derive memory addresses from MemAddr witness.
ceno_zkvm/src/instructions/riscv/ecall/weierstrass_decompress.rs Make arg0/arg1 read-only and derive memory addresses from MemAddr witness.
ceno_zkvm/src/instructions/riscv/ecall/weierstrass_add.rs Make arg0/arg1 read-only and derive memory addresses from MemAddr witness.
ceno_zkvm/src/instructions/riscv/ecall/uint256.rs Make arg0/arg1 read-only across uint256 ecalls and derive memory addresses from MemAddr witness.
ceno_zkvm/src/instructions/riscv/ecall/sha_extend.rs Make arg0 read-only and derive memory addresses from MemAddr witness.
ceno_zkvm/src/instructions/riscv/ecall/pubio_commit.rs Make arg0 read-only and derive memory addresses from MemAddr witness.
ceno_zkvm/src/instructions/riscv/ecall/keccak.rs Make arg0 read-only and derive memory addresses from MemAddr witness.
ceno_zkvm/src/instructions/riscv/ecall/fptower_fp2_mul.rs Make arg0/arg1 read-only and derive memory addresses from MemAddr witness.
ceno_zkvm/src/instructions/riscv/ecall/fptower_fp2_add.rs Make arg0/arg1 read-only and derive memory addresses from MemAddr witness.
ceno_zkvm/src/instructions/riscv/ecall/fptower_fp.rs Make arg0/arg1 read-only and derive memory addresses from MemAddr witness.
ceno_emul/src/syscalls.rs Track syscall register ops as read-subcycle accesses (aligning with read-only arg modeling).

Comment thread ceno_zkvm/src/instructions/riscv/mulh/mulh_circuit_v2.rs
@spherel spherel marked this pull request as ready for review May 29, 2026 06:48
@spherel spherel requested a review from Copilot May 29, 2026 06:55
@spherel spherel self-assigned this May 29, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.

…#1296)

a43febb moved SyscallEffects::finalize to track every syscall reg-op at
SUBCYCLE_RS1, but only updated the dedicated precompile circuits. The
generic LargeEcallDummy (used in production by LOG_PC_CYCLE) still wrote
its arg registers via WriteRD at SUBCYCLE_RD, desyncing the register-bus
timestamps so the verifier rejected with `prod_r != prod_w` for any
program emitting LOG_PC_CYCLE (e.g. ceno_rt_alloc).

Model the read-only argument registers as register_read at SUBCYCLE_RS1,
matching the emulator and the precompiles' OpFixedRS<_, _, false> path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@spherel spherel force-pushed the fix/issue-1296-soundness-audit branch from daac387 to 4cfa84f Compare May 29, 2026 09:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants