feat: widen ECCVM layout (8 additions/row, 8 wnaf digits/row) #21721

Open
notnotraju wants to merge 24 commits into merge-train/barretenberg from rk/eccvm-wide-short

@notnotraju commented Mar 18, 2026

Summary

Doubles the width of the ECCVM Precomputed and MSM tables:

  • WNAF_DIGITS_PER_ROW: 4 → 8 (precomputed table rows halved per scalar)
  • ADDITIONS_PER_ROW: 4 → 8 (MSM addition rows halved per round)
  • DOUBLINGS_PER_ROW: stays at 4 (decoupled from additions)

Net effect: for m short scalar muls, MSM rows go from 33·⌈m/4⌉ + 31 to 33·⌈m/8⌉ + 31; precompute rows go from 8m to 4m.
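As a quick sanity check of the row counts quoted above, here is a minimal sketch that evaluates both formulas. The helper names `msm_rows` and `precompute_rows` are hypothetical and do not appear in the codebase; the figure of 32 wNAF digits per short scalar is taken from the commit messages further down.

```cpp
#include <cassert>
#include <cstddef>

// Rows in the MSM table for m short scalar muls, using the formula quoted
// above: 33 * ceil(m / additions_per_row) + 31.
constexpr std::size_t msm_rows(std::size_t m, std::size_t additions_per_row)
{
    assert(additions_per_row > 0);
    const std::size_t groups = (m + additions_per_row - 1) / additions_per_row; // ceil(m / width)
    return 33 * groups + 31;
}

// Rows in the precomputed table: 32 wNAF digits per short scalar, packed
// wnaf_digits_per_row to a row (8 rows per scalar at width 4, 4 at width 8).
constexpr std::size_t precompute_rows(std::size_t m, std::size_t wnaf_digits_per_row)
{
    assert(wnaf_digits_per_row > 0);
    return m * (32 / wnaf_digits_per_row);
}

static_assert(msm_rows(16, 4) == 163);       // old layout: 33 * 4 + 31
static_assert(msm_rows(16, 8) == 97);        // new layout: 33 * 2 + 31
static_assert(precompute_rows(16, 4) == 128); // 8m
static_assert(precompute_rows(16, 8) == 64);  // 4m
```

The fixed `+ 31` term is why the MSM table shrinks by less than half for small m.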

Capacity analysis

Each app circuit adds ~1104 ECCVM rows with a base overhead of ~1494 rows. The MaxCapacityPassing test now computes the max app count from CONST_ECCVM_LOG_N instead of hardcoding it.

| CONST_ECCVM_LOG_N | ECCVM rows | Max apps | vs. old 4-wide at LOG_N=15 |
|---|---|---|---|
| 15 | 32768 | 28 | +65% capacity (was 17) |
| 14 | 16384 | 13 | -24% capacity (was 17) |

LOG_N=14 does NOT maintain capacity parity: only 13 apps fit (down from 17). The win here is keeping LOG_N = 15 and getting 28 apps, roughly 1.65x the previous stack depth.
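The capacity figures can be reproduced from the two approximate constants quoted above. This is a sketch only: `max_apps`, `ROWS_PER_APP` and `BASE_OVERHEAD_ROWS` are hypothetical names, and the actual MaxCapacityPassing test may derive its bound differently.

```cpp
#include <cassert>
#include <cstddef>

// Approximate figures quoted in this PR description (assumed, not measured here).
constexpr std::size_t ROWS_PER_APP = 1104;
constexpr std::size_t BASE_OVERHEAD_ROWS = 1494;

// Largest app count whose ECCVM rows fit into 2^log_n rows.
constexpr std::size_t max_apps(std::size_t log_n)
{
    assert(log_n < 64);
    const std::size_t capacity = std::size_t{ 1 } << log_n;
    if (capacity < BASE_OVERHEAD_ROWS) {
        return 0;
    }
    return (capacity - BASE_OVERHEAD_ROWS) / ROWS_PER_APP;
}

static_assert(max_apps(15) == 28); // (32768 - 1494) / 1104, rounded down
static_assert(max_apps(14) == 13); // (16384 - 1494) / 1104, rounded down
```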

Changes made

Constants & types (eccvm_builder_types.hpp):

  • WNAF_DIGITS_PER_ROW 4→8, ADDITIONS_PER_ROW 4→8

Flavor (eccvm_flavor.hpp):

  • 36 new witness columns: msm_add5..8, msm_x5..8, msm_y5..8, msm_collision_x5..8, msm_lambda5..8, msm_slice5..8, precompute_s5hi..s8lo, precompute_tx2, precompute_ty2, lookup_read_counts_2, lookup_read_counts_3

Builders:

  • precomputed_tables_builder.hpp: 8 digits per row, 2 points per row (Tx/Ty + Tx2/Ty2), int64_t for row_chunk to avoid overflow
  • msm_builder.hpp: 8 additions per row with 4 doublings, dummy-point padding for unused slots, 4 read-count columns with compressed-slice-to-table mapping

Relations (all in relations/ecc_vm/):

  • ecc_msm_relation: extended to 8 addition constraints per row
  • ecc_bools_relation: boolean checks for msm_add5..8
  • ecc_wnaf_relation: 8 slice decompositions per row
  • ecc_point_table_relation: 2nd precomputed point constraint (Tx2/Ty2)
  • ecc_set_relation: 8 slice fingerprints, 8 add-gated tuples, eccvm_set_permutation_delta as product of 8 terms; second term uses tx2/ty2
  • ecc_lookup_relation: 8 reads, 4 table terms (point1 pos/neg, point2 pos/neg)

Prover/verifier (eccvm_prover.cpp, eccvm_verifier.cpp, eccvm_trace_checker.cpp):

  • eccvm_set_permutation_delta updated to 8-term product

Test infrastructure:

  • eccvm.test.cpp: updated delta computation, transcript manifest
  • eccvm_transcript.test.cpp: updated expected prover manifest
  • chonk.test.cpp: MaxCapacityPassing now computes max apps from LOG_N (28 at LOG_N=15)
  • Gate count / proof size constants updated

Not included (needed before merge)

  • Noir constant updates (constants.nr ECCVM proof length)
  • VK regeneration
  • Decision on CONST_ECCVM_LOG_N (stay at 15 for capacity, not 14)

Test plan

  • 41/41 eccvm_tests pass
  • 50/50 CI=1 NO_FAIL_FAST=1 ./bootstrap test pass (includes chonk, goblin, dsl, VK checks)
  • LOG_N=14 tested: only 13 apps fit (insufficient for 17-app workload)
  • E2E / IVC integration tests
  • Noir constant sync

notnotraju added 19 commits March 17, 2026 13:41
This is the first step toward halving the Precomputed and MSM table heights
by doubling their width. The key changes:

- WNAF_DIGITS_PER_ROW: 4 -> 8 (process 8 wNAF digits per precompute row)
- ADDITIONS_PER_ROW: 4 -> 8 (process 8 point additions per MSM row)
- DOUBLINGS_PER_ROW: new constant, always NUM_WNAF_DIGIT_BITS (= 4)

The new DOUBLINGS_PER_ROW constant decouples the doubling chain length
(which must remain 4, matching the wNAF digit width w=4) from
ADDITIONS_PER_ROW (which we are doubling to 8). Previously, these were
conflated because ADDITIONS_PER_ROW happened to equal NUM_WNAF_DIGIT_BITS.

Key changes to MSMRow and trace computation:

- AddState array: hardcoded size 4 -> ADDITIONS_PER_ROW (now 8)
- Doubling loops: use DOUBLINGS_PER_ROW (= 4) instead of ADDITIONS_PER_ROW
  for the doubling phase, since we always do w=4 doublings regardless of
  how many additions we pack per row
- Trace sizing: (num_msm_rows - 2) * 4 -> * ADDITIONS_PER_ROW
- trace_index computation: * 4 -> * ADDITIONS_PER_ROW
- After doubling loops, advance trace_index by (ADDITIONS_PER_ROW -
  DOUBLINGS_PER_ROW) to skip unused slots allocated in the point trace
- Final row add_state: use ADDITIONS_PER_ROW-sized array fill
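The trace_index bookkeeping described in this list can be sketched as follows. This is an illustrative model, not the builder code: `RowKind` and `walk_trace` are hypothetical, and the only facts taken from the commit message are that every row reserves ADDITIONS_PER_ROW slots and that doubling rows write DOUBLINGS_PER_ROW of them before skipping the rest.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t ADDITIONS_PER_ROW = 8;
constexpr std::size_t DOUBLINGS_PER_ROW = 4;

enum class RowKind { Add, Double };

// Walks a sequence of MSM rows and returns the final trace_index.
inline std::size_t walk_trace(const std::vector<RowKind>& rows)
{
    std::size_t trace_index = 0;
    for (const RowKind kind : rows) {
        // Every row starts on a multiple of ADDITIONS_PER_ROW.
        assert(trace_index % ADDITIONS_PER_ROW == 0);
        if (kind == RowKind::Add) {
            trace_index += ADDITIONS_PER_ROW;
        } else {
            trace_index += DOUBLINGS_PER_ROW;                     // slots written by the doubling loop
            trace_index += ADDITIONS_PER_ROW - DOUBLINGS_PER_ROW; // unused slots are skipped
        }
    }
    return trace_index;
}
```

Under this model the point trace always has `rows.size() * ADDITIONS_PER_ROW` slots, regardless of how many rows are doubling rows.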
With WNAF_DIGITS_PER_ROW doubled from 4 to 8:
- num_rows_per_scalar drops from 8 to 4 (32 digits / 8 per row)
- Each row now encodes 8 wNAF digits via 16 two-bit slices (s1..s16),
  up from 4 digits / 8 slices (s1..s8)
- Each row stores 2 precomputed points (precompute_accumulator and
  precompute_accumulator2), since we have 8 points to store across
  4 rows. Row i stores table[POINT_TABLE_SIZE-1-2i] and
  table[POINT_TABLE_SIZE-2-2i].
- Horner scalar accumulation shifts by 2^32 (was 2^16) since each
  row now contributes 8*4 = 32 bits of scalar data.
- row_chunk computation extended to sum all 8 wNAF digits.
- Removed static_assert(WNAF_DIGITS_PER_ROW == 4), replaced with
  static_assert(WNAF_DIGITS_PER_ROW == 8).
- Updated POINT_TABLE_SIZE/2 == num_rows_per_scalar*2 assert to
  reflect the new 2-points-per-row layout.
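To illustrate why the Horner shift moves from 2^16 to 2^32, the sketch below accumulates the same 32 digits with 4 and with 8 digits per row and gets the same result. It is a toy model: `TOY_MODULUS`, `to_field` and `accumulate` are hypothetical, the small prime stands in for the native field, and any digit in [-15, 15] is accepted for simplicity.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy prime so all arithmetic fits in uint64_t (illustrative only).
constexpr std::uint64_t TOY_MODULUS = 1'000'000'007ULL;

// Maps a signed digit into the toy field.
constexpr std::uint64_t to_field(int digit)
{
    assert(digit >= -15 && digit <= 15);
    return digit >= 0 ? static_cast<std::uint64_t>(digit) : TOY_MODULUS - static_cast<std::uint64_t>(-digit);
}

// Horner accumulation of 32 signed 4-bit digits (most significant first),
// consuming digits_per_row digits per row and shifting by 16^digits_per_row.
constexpr std::uint64_t accumulate(const std::array<int, 32>& digits, std::size_t digits_per_row)
{
    assert(digits_per_row > 0 && 32 % digits_per_row == 0);
    std::uint64_t shift = 1; // 2^16 for 4 digits per row, 2^32 for 8
    for (std::size_t j = 0; j < digits_per_row; ++j) {
        shift = (shift * 16) % TOY_MODULUS;
    }
    std::uint64_t scalar_sum = 0;
    for (std::size_t row = 0; row < 32 / digits_per_row; ++row) {
        std::uint64_t row_chunk = 0;
        for (std::size_t j = 0; j < digits_per_row; ++j) {
            row_chunk = (row_chunk * 16 + to_field(digits[row * digits_per_row + j])) % TOY_MODULUS;
        }
        scalar_sum = (scalar_sum * shift + row_chunk) % TOY_MODULUS;
    }
    return scalar_sum;
}
```

For any digit array `d`, `accumulate(d, 4)` and `accumulate(d, 8)` agree, since both evaluate the same base-16 polynomial in the digits.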
Updates ECCVMFlavor entity counts and column definitions:

NUM_WIRES: 85 -> 121
NUM_ALL_ENTITIES: 118 -> 156
NUM_WITNESS_ENTITIES: 87 -> 123
NUM_SHIFTED_ENTITIES: 26 -> 28

New WireNonShiftedEntities (+34 columns):
- precompute_s5hi..s8lo: 8 new 2-bit slice columns for digits 5-8
- msm_add5..add8: 4 new addition selector columns
- msm_x5..x8, msm_y5..y8: 8 new point coordinate columns
- msm_collision_x5..x8: 4 new collision inverse columns
- msm_lambda5..lambda8: 4 new slope columns
- msm_slice5..slice8: 4 new wNAF slice columns
- lookup_read_counts_2, _3: 2 new lookup read count columns

New WireToBeShiftedWithoutAccumulatorsEntities (+2 columns):
- precompute_tx2, precompute_ty2: 2nd precomputed point per row,
  needs shifting for inter-row point table constraints

Corresponding ShiftedEntities updated with precompute_tx2_shift,
precompute_ty2_shift. CommitmentLabels updated for all new columns.

Extends the ProverPolynomials constructor to populate the 36 new flavor
columns from the builder row data:

Precompute section:
- Wire precompute_s5hi..s8lo from point_table_rows[i].s9..s16
- Wire precompute_tx2/ty2 from point_table_rows[i].precompute_accumulator2

MSM section (all from add_state[4..7]):
- Wire msm_add5..add8 from add_state[4..7].add
- Wire msm_x5..x8, msm_y5..y8 from add_state[4..7].point
- Wire msm_collision_x5..x8 from add_state[4..7].collision_inverse
- Wire msm_lambda5..lambda8 from add_state[4..7].lambda
- Wire msm_slice5..slice8 from add_state[4..7].slice

lookup_read_counts_2/_3 columns are declared but not yet populated;
they will be wired when the lookup relation is updated to support
4 table terms per precompute row.

The MSM relation now supports 8 point additions per row (was 4).
The doubling chain remains 4-wide (= wNAF digit width w = 4).

Key changes:
- Addition chain: first_add + 7 conditional adds (was first_add + 3)
- Skew chain: 8 conditional skew additions (was 4)
- Collision checks: 8 inverse checks (was 4)
- Slice-zero enforcement: 8 checks (was 4)
- Count update: sum of add1..add8 (was add1..add4)
- Addition continuity: add{i+1} * (-add{i} + 1) for i=1..7 (was 1..3)
- Cross-row continuity: (-add8 + 1) * add1_shift (was -add4 + 1)

Subrelation count: 47 -> 67 (20 new subrelations)
New subrelations: ADD slopes 5-8, SKEW slopes 5-8, collision 5-8,
slice-zero 5-8, continuity add5-8.

MAX_PARTIAL_RELATION_LENGTH for this relation: 8 -> 12 (due to the
longer addition chain increasing the degree of the accumulator output).
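The continuity constraints above force the active add selectors on a row to form a contiguous prefix. A minimal sketch over plain integers, assuming the selectors are already boolean (as enforced by the bools relation); `continuity_holds` and `points_consumed` are hypothetical helper names:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Evaluates add{i+1} * (1 - add{i}) for i = 1..7 and reports whether every
// term vanishes, i.e. no active slot follows an inactive one.
constexpr bool continuity_holds(const std::array<int, 8>& add)
{
    for (std::size_t i = 0; i + 1 < 8; ++i) {
        assert(add[i] == 0 || add[i] == 1);
        if (add[i + 1] * (1 - add[i]) != 0) {
            return false;
        }
    }
    return true;
}

// Count update: the row consumes add1 + ... + add8 points.
constexpr int points_consumed(const std::array<int, 8>& add)
{
    int total = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        total += add[i];
    }
    return total;
}

static_assert(continuity_holds(std::array<int, 8>{ 1, 1, 1, 0, 0, 0, 0, 0 }));
static_assert(!continuity_holds(std::array<int, 8>{ 1, 0, 1, 0, 0, 0, 0, 0 }));
```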
Extend the bools relation with 4 new boolean constraints for the
msm_add5 through msm_add8 columns (indices 23-26).
Subrelation count: 23 -> 27.

Update ecc_wnaf_relation to process 8 wNAF digits per precompute row
(was 4), halving the number of rows per scalar from 8 to 4.

Key changes:
- SUBRELATION_PARTIAL_LENGTHS expanded from 23 to 35 entries
- 16 two-bit range checks (was 8) for slices s1hi..s8lo
- 8 wNAF conversions w0..w7 (was 4 w0..w3)
- Horner accumulation uses 2^32 shift (was 2^16) for 8 digits
- Round max changed from 7 to 3 (NUM_WNAF_DIGITS_PER_SCALAR/8 - 1)
- Added slice-zero checks for w4..w7 (subrelations 31-34)
- Updated header docstring to reflect 4-row layout

…ed tuples

Update ECCVMSetRelation for 8-wide precompute and MSM tables:

Numerator changes:
- 8 slice fingerprints instead of 4, with round encoding 8*round+j
- Scalar reconstruction uses 8 wNAF digits with 2^32 shift (was 4, 2^16)
- Skew tuple uses round offset 8 (was 4)
- eccvm_set_permutation_delta comment updated for 8-term product

Denominator changes:
- 8 add-gated (pc, round, slice) tuples instead of 4
- PC offsets 0..7 (was 0..3) for msm_add1..msm_add8

SUBRELATION_PARTIAL_LENGTHS updated to {29, 3} (was {22, 3}) to
accommodate the higher degree from the 8-wide grand product.

Update ECCVMLookupRelation for 8-wide MSM and 2 precomputed points
per precompute row:

- NUM_LOOKUP_TERMS: 4 -> 8 (msm_add1..msm_add8 gated reads)
- NUM_TABLE_TERMS: 2 -> 4 (positive/negative for each of 2 points)
- LENGTH: 9 -> 15

Table term structure (4 terms covering all 16 slice values):
  - table_index 0: point 1 positive, slice = 15 - 2*round -> {15,13,11,9}
  - table_index 1: point 1 negative, slice = 2*round      -> {0,2,4,6}
  - table_index 2: point 2 positive, slice = 14 - 2*round -> {14,12,10,8}
  - table_index 3: point 2 negative, slice = 2*round + 1  -> {1,3,5,7}

Lookup read counts expanded from 2 to 4 columns
(lookup_read_counts_0..3) to match the 4 table terms.
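The table-term structure above can be checked mechanically: over rounds 0..3, the 4 terms should serve each of the 16 slice values exactly once. A small sketch, assuming C++17 for the constexpr array writes; `table_slice` and `coverage` are hypothetical names:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Slice value served by each table term on a given precompute round (0..3),
// following the table-term structure listed above.
constexpr int table_slice(std::size_t table_index, int round)
{
    assert(table_index < 4 && round >= 0 && round < 4);
    switch (table_index) {
    case 0:
        return 15 - 2 * round; // point 1, positive
    case 1:
        return 2 * round; // point 1, negative
    case 2:
        return 14 - 2 * round; // point 2, positive
    default:
        return 2 * round + 1; // point 2, negative
    }
}

// Counts how many (table term, round) pairs serve each slice value 0..15.
constexpr std::array<int, 16> coverage()
{
    std::array<int, 16> hits{};
    for (std::size_t t = 0; t < 4; ++t) {
        for (int round = 0; round < 4; ++round) {
            ++hits[static_cast<std::size_t>(table_slice(t, round))];
        }
    }
    return hits;
}
```

Every entry of `coverage()` is 1, which matches the claim that the 4 terms cover all 16 slice values.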
…tion

Update ECCVMPointTableRelation for 2 precomputed points per row
(Tx/Ty and Tx2/Ty2):

SUBRELATION_PARTIAL_LENGTHS expanded from 6 to 8 entries:
- Subrelations 0-1: Doubling constraint, now uses Tx2/Ty2 as the base
  point (at transition row, Tx2=P so Dx=2P)
- Subrelations 2-3: Dx/Dy continuity (unchanged)
- Subrelations 4-5: NEW intra-row addition (Tx = Tx2 + Dx), gated by
  precompute_select. Validates first point = second point + 2P.
- Subrelations 6-7: NEW inter-row addition (Tx2 = Tx_shift + Dx), gated
  by not-transition and not-first-row. Validates second point of row i
  equals first point of row i+1 plus 2P.

Row layout example for point P:
  round 0: Tx=15P, Tx2=13P | round 1: Tx=11P, Tx2=9P
  round 2: Tx=7P,  Tx2=5P  | round 3: Tx=3P,  Tx2=P
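The row layout and the two addition constraints can be cross-checked by tracking only the scalar multiple of P held in each column. This is a sketch with integers standing in for curve points; `RowMultiples`, `row_multiples` and `layout_consistent` are hypothetical names:

```cpp
#include <cassert>

// Multiples of P held in (Tx, Ty) and (Tx2, Ty2) on one precompute row.
struct RowMultiples {
    int tx;
    int tx2;
};

// Round i holds (15 - 4i)P and (13 - 4i)P, as in the layout example above.
constexpr RowMultiples row_multiples(int round)
{
    assert(round >= 0 && round < 4);
    return { 15 - 4 * round, 13 - 4 * round };
}

constexpr int DX_MULTIPLE = 2; // Dx/Dy hold 2P

constexpr bool layout_consistent()
{
    for (int round = 0; round < 4; ++round) {
        const RowMultiples row = row_multiples(round);
        // Intra-row addition: Tx = Tx2 + Dx.
        if (row.tx != row.tx2 + DX_MULTIPLE) {
            return false;
        }
        // Inter-row addition (not applied at the transition row): Tx2 = Tx_shift + Dx.
        if (round < 3 && row.tx2 != row_multiples(round + 1).tx + DX_MULTIPLE) {
            return false;
        }
    }
    // The transition row (round 3) holds the base point P in Tx2.
    return row_multiples(3).tx2 == 1;
}

static_assert(layout_consistent());
```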
With 8 wNAF digits per precompute row, the zero-tuple fingerprint
used for padding inactive rows must be the product of 8 terms
(γ + j·β² + t·β⁴) for j = 0..7, rather than 4 terms for j = 0..3.

Updated in all three locations:
- eccvm_prover.cpp
- eccvm_verifier.cpp
- eccvm_trace_checker.cpp
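A minimal sketch of the 8-term product, written over a toy prime field so it stays self-contained. `TOY_MODULUS` and `permutation_delta` are hypothetical, the real code works in the native field, and `t` is passed through as an opaque parameter because its value is not spelled out in this commit message.

```cpp
#include <cassert>
#include <cstdint>

// Toy prime so all arithmetic fits in uint64_t (illustrative only).
constexpr std::uint64_t TOY_MODULUS = 1'000'000'007ULL;

// Product over j = 0..num_terms-1 of (gamma + j * beta^2 + t * beta^4).
constexpr std::uint64_t permutation_delta(std::uint64_t gamma,
                                          std::uint64_t beta,
                                          std::uint64_t t,
                                          std::uint64_t num_terms)
{
    assert(gamma < TOY_MODULUS && beta < TOY_MODULUS && t < TOY_MODULUS);
    const std::uint64_t beta_sqr = (beta * beta) % TOY_MODULUS;
    const std::uint64_t beta_quartic = (beta_sqr * beta_sqr) % TOY_MODULUS;
    const std::uint64_t tag_term = (t * beta_quartic) % TOY_MODULUS;
    std::uint64_t delta = 1;
    for (std::uint64_t j = 0; j < num_terms; ++j) {
        const std::uint64_t term = (gamma + (j * beta_sqr) % TOY_MODULUS + tag_term) % TOY_MODULUS;
        delta = (delta * term) % TOY_MODULUS;
    }
    return delta;
}

// With gamma = beta = 1 and t = 0 the terms are 1, 2, ..., num_terms.
static_assert(permutation_delta(1, 1, 0, 8) == 40320); // 8-wide layout
static_assert(permutation_delta(1, 1, 0, 4) == 24);    // old 4-wide layout
```

The mismatch between the 4-term and 8-term values is the kind of discrepancy that appears if one of the three locations is left on the old product.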
Add benchmarks for ECCVM relation evaluation using Sumcheck
univariates (prover-side work), in addition to the existing
values-based benchmarks (verifier-side work).

- Set all MSM relation SUBRELATION_PARTIAL_LENGTHS to 12 (was mixed
  8/12). Required because the View type is derived from the max
  partial length subrelation (index 0 = 12), so all intermediate
  Univariates are 12-wide and can only be accumulated into 12-wide
  accumulators.
- Fixed element count: was 68, now 67 (matching the array declaration).
- Removed unused Tx2_shift/Ty2_shift variables from point table
  relation (the inter-row constraint uses Tx_shift/Ty_shift, not
  the shifted versions of the second point).

Two bugs fixed:

1. batch_normalize crash on zero z-coordinates: With ADDITIONS_PER_ROW=8
   and DOUBLINGS_PER_ROW=4, doubling rows only use 4 of 8 trace slots.
   The unused slots had default Element{} with z=0 (point at infinity),
   causing batch_normalize to fail when inverting z-coordinates. Fix:
   fill unused slots with valid (non-infinity) dummy points and track
   which slots are used via an is_used vector, so that unused slots are
   skipped during collision_inverse computation.

2. Signed integer overflow in precomputed_tables_builder: With 8 wNAF
   digits per row, row_chunk = slice0 * (1<<28) can reach ~4 billion,
   exceeding INT_MAX. This was undefined behavior causing incorrect
   scalar_sum values. Fix: use int64_t for row_chunk computation.
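The overflow in the second bug can be seen with a small sketch of the row_chunk sum. The weighting of digit j by 2^(4·(7−j)) is an assumption extrapolated from the `slice0 * (1<<28)` term quoted above, and `row_chunk` here is a hypothetical free function rather than the builder's code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Packs 8 wNAF digits (most significant first) into one row chunk, with
// digit j weighted by 2^(4 * (7 - j)). Accumulating in int64_t keeps the
// leading term (up to 15 * 2^28) representable.
constexpr std::int64_t row_chunk(const std::array<int, 8>& digits)
{
    std::int64_t chunk = 0;
    for (std::size_t j = 0; j < 8; ++j) {
        assert(digits[j] >= -15 && digits[j] <= 15);
        chunk += static_cast<std::int64_t>(digits[j]) * (std::int64_t{ 1 } << (4 * (7 - j)));
    }
    return chunk;
}

// The leading term alone already exceeds the 32-bit signed range.
static_assert(row_chunk(std::array<int, 8>{ 15, 0, 0, 0, 0, 0, 0, 0 }) == 4026531840LL);
static_assert(row_chunk(std::array<int, 8>{ 15, 0, 0, 0, 0, 0, 0, 0 }) >
              std::numeric_limits<std::int32_t>::max());
```

With a 32-bit `int` accumulator the same expression would overflow, which is undefined behavior for signed integers in C++.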
…r-row layout

Three fixes:

1. Lookup read counts: Reworked MSM builder to return 4 read count
   columns (was 2). With 2 precomputed points per row and 4 table terms
   in the lookup relation, each compressed slice value maps to one of 4
   tables based on parity and magnitude:
   - Table 0: odd slices >= 8 (point 1 positive)
   - Table 1: even slices < 8 (point 1 negative)
   - Table 2: even slices >= 8 (point 2 positive)
   - Table 3: odd slices < 8 (point 2 negative)
   ProverPolynomials now wires all 4 read count columns.

2. Set relation second term: Changed base point reference from
   precompute_tx/ty to precompute_tx2/ty2. In the 2-point-per-row
   layout, the base point P is stored in tx2/ty2 at the transition
   row (round=3), not in tx/ty (which holds 3P).

3. Removed debug trace code from trace checker.
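The parity-and-magnitude mapping from the first fix can be sketched as below. `table_for_slice` and `tally_reads` are hypothetical helpers; in particular, the tally is simplified to one counter per table, whereas the builder presumably tracks read counts per table row.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Maps a compressed slice value (0..15) to one of the 4 read-count tables by
// parity and magnitude, as listed in the first fix above.
constexpr std::size_t table_for_slice(int slice)
{
    assert(slice >= 0 && slice < 16);
    const bool odd = (slice % 2) != 0;
    const bool high = slice >= 8;
    if (high) {
        return odd ? 0 : 2; // point 1 positive : point 2 positive
    }
    return odd ? 3 : 1; // point 2 negative : point 1 negative
}

// Simplified tally: number of reads that land in each of the 4 tables.
inline std::array<std::size_t, 4> tally_reads(const std::vector<int>& slices)
{
    std::array<std::size_t, 4> counts{};
    for (const int slice : slices) {
        ++counts[table_for_slice(slice)];
    }
    return counts;
}

static_assert(table_for_slice(15) == 0);
static_assert(table_for_slice(0) == 1);
static_assert(table_for_slice(8) == 2);
static_assert(table_for_slice(7) == 3);
```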
- eccvm.test.cpp: Fix eccvm_set_permutation_delta in
  complete_proving_key_for_test() to compute the product of 8
  zero-tuple fingerprints (γ + j·β² + t·β⁴) for j=0..7, matching
  the prover/verifier/trace_checker. Previously only computed 4 terms,
  causing CommittedSumcheck test to fail.

- eccvm_transcript.test.cpp: Update hardcoded prover manifest in
  construct_eccvm_honk_manifest() with all new wire columns added
  for the 8-wide layout:
  - PRECOMPUTE_S5HI through PRECOMPUTE_S8LO (8 columns)
  - MSM_ADD5 through MSM_ADD8 (4 columns)
  - MSM_X5/Y5 through MSM_X8/Y8 (8 columns)
  - MSM_COLLISION_X5 through MSM_COLLISION_X8 (4 columns)
  - MSM_LAMBDA5 through MSM_LAMBDA8 (4 columns)
  - MSM_SLICE5 through MSM_SLICE8 (4 columns)
  - LOOKUP_READ_COUNTS_2, LOOKUP_READ_COUNTS_3 (2 columns)
  - PRECOMPUTE_TX2, PRECOMPUTE_TY2 (2 columns)

All 41 eccvm_tests now pass.

The ECCVM recursive verifier gate count increased from 224,657 to
269,130 due to the wider relation columns and higher-degree
subrelations in the 8-wide ECCVM layout. The recursive flavor
inherits all entity/relation changes automatically from the native
ECCVMFlavor via templates, so no code changes were needed in the
stdlib recursive verifier itself.

Update comments across msm_builder.hpp, eccvm_flavor.hpp, and
ecc_lookup_relation_impl.hpp to reflect the new 8-wide layout:
- "4 point-additions per row" → 8
- "size-4 array" → size-8
- "result of four EC additions" → eight
- Document msm_x/y/add/lambda/slice/collision_x 1..8
- Document precompute_s1..s8 (8 slices per row)
- Document precompute_tx2/ty2 (2 points per row)
- Document all 4 lookup_read_counts columns
@notnotraju added the ci-barretenberg label (Run all barretenberg/cpp checks.) Mar 18, 2026
Remove ECCVM univariate benchmark additions from relations.bench.cpp
to keep this PR focused on the 8-wide layout change.
@notnotraju changed the title from "feat(eccvm): widen ECCVM layout (8 additions/row, 8 wnaf digits/row)" to "feat: widen ECCVM layout (8 additions/row, 8 wnaf digits/row)" Mar 18, 2026
notnotraju added 4 commits March 18, 2026 11:05
- Fix ECCVM proof size in design doc: ~716 → 756 Fr (confirmed by
  static_assert in proof_compression.hpp)
- Correct set relation degree comments: denominator sub-products are
  16 (8 add-gated tuples) + 6 (transcript z1/z2) + 4 (MSM output) = 26,
  not the previously claimed 28. Full GP subrelation degree = 27,
  partial length upper bound = 29.
- Fix duplicate comment blocks in set relation numerator/denominator
  third term docstrings
- Update inline cumulative degree annotations throughout
  compute_grand_product_numerator/denominator

Instead of hardcoding 17 apps, compute the max number of app circuits
that fit in the ECCVM based on CONST_ECCVM_LOG_N. Each app adds ~1104
ECCVM rows with ~1494 base overhead. At LOG_N=15: 28 apps; LOG_N=14: 13.

Plan to pack 2 doubling rounds into 1 MSM row (DOUBLINGS_PER_ROW 4→8),
cutting doubling rows from 31 to 16 per MSM. Reuses lambda5..8 on
doubling rows (free since q_add/q_double are mutually exclusive).
No new columns needed. MSM formula: 33*ceil(m/8)+16 (was +31).

The 31 doubling rows per MSM cannot be halved because each occurs
between consecutive digit-slot ADD phases in the Straus algorithm.
Combining two DBL rounds into one row would require 8-bit digits
(point table size 256), which is impractical. The 8-wide change
achieves ~1.65x capacity (17→28 apps), not 2x.
@notnotraju marked this pull request as ready for review March 18, 2026 14:22
@notnotraju self-assigned this Mar 18, 2026
@notnotraju commented:

Related: AztecProtocol/barretenberg#1654
