feat: widen ECCVM layout (8 additions/row, 8 wnaf digits/row) #21721
Open
notnotraju wants to merge 24 commits into merge-train/barretenberg
added 19 commits
March 17, 2026 13:41
This is the first step toward halving the Precomputed and MSM table heights by doubling their width.
The key changes:
- WNAF_DIGITS_PER_ROW: 4 -> 8 (process 8 wNAF digits per precompute row)
- ADDITIONS_PER_ROW: 4 -> 8 (process 8 point additions per MSM row)
- DOUBLINGS_PER_ROW: new constant, always NUM_WNAF_DIGIT_BITS (= 4)
The new DOUBLINGS_PER_ROW constant decouples the doubling chain length (which must remain 4, matching the wNAF digit width w=4) from ADDITIONS_PER_ROW (which we are doubling to 8). Previously, these were conflated because ADDITIONS_PER_ROW happened to equal NUM_WNAF_DIGIT_BITS.
Key changes to MSMRow and trace computation:
- AddState array: hardcoded size 4 -> ADDITIONS_PER_ROW (now 8)
- Doubling loops: use DOUBLINGS_PER_ROW (= 4) instead of ADDITIONS_PER_ROW for the doubling phase, since we always do w=4 doublings regardless of how many additions we pack per row
- Trace sizing: (num_msm_rows - 2) * 4 -> * ADDITIONS_PER_ROW
- trace_index computation: * 4 -> * ADDITIONS_PER_ROW
- After doubling loops, advance trace_index by (ADDITIONS_PER_ROW - DOUBLINGS_PER_ROW) to skip unused slots allocated in the point trace
- Final row add_state: use ADDITIONS_PER_ROW-sized array fill
With WNAF_DIGITS_PER_ROW doubled from 4 to 8:
- num_rows_per_scalar drops from 8 to 4 (32 digits / 8 per row)
- Each row now encodes 8 wNAF digits via 16 two-bit slices (s1..s16), up from 4 digits / 8 slices (s1..s8)
- Each row stores 2 precomputed points (precompute_accumulator and precompute_accumulator2), since we have 8 points to store across 4 rows. Row i stores table[POINT_TABLE_SIZE-1-2i] and table[POINT_TABLE_SIZE-2-2i].
- Horner scalar accumulation shifts by 2^32 (was 2^16) since each row now contributes 8*4 = 32 bits of scalar data.
- row_chunk computation extended to sum all 8 wNAF digits.
- Removed static_assert(WNAF_DIGITS_PER_ROW == 4), replaced with static_assert(WNAF_DIGITS_PER_ROW == 8).
- Updated POINT_TABLE_SIZE/2 == num_rows_per_scalar*2 assert to reflect the new 2-points-per-row layout.
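The arithmetic in this commit message can be checked with a few compile-time constants. The sketch below is illustrative only: the constant names mirror the ones mentioned above, POINT_TABLE_SIZE = 16 is inferred from the assert POINT_TABLE_SIZE/2 == num_rows_per_scalar*2, and the two index helpers are hypothetical names, not functions from the builder.

```cpp
#include <cstddef>
#include <cstdint>

// Values taken from the commit message; this is not the contents of
// eccvm_builder_types.hpp.
constexpr std::size_t NUM_WNAF_DIGIT_BITS = 4;
constexpr std::size_t NUM_WNAF_DIGITS_PER_SCALAR = 32;
constexpr std::size_t WNAF_DIGITS_PER_ROW = 8;
// Assumption: inferred from POINT_TABLE_SIZE/2 == num_rows_per_scalar*2.
constexpr std::size_t POINT_TABLE_SIZE = std::size_t{ 1 } << NUM_WNAF_DIGIT_BITS;

constexpr std::size_t num_rows_per_scalar = NUM_WNAF_DIGITS_PER_SCALAR / WNAF_DIGITS_PER_ROW;
constexpr std::size_t bits_per_row = WNAF_DIGITS_PER_ROW * NUM_WNAF_DIGIT_BITS;
constexpr std::uint64_t horner_shift = std::uint64_t{ 1 } << bits_per_row;

static_assert(num_rows_per_scalar == 4, "32 digits / 8 per row");
static_assert(bits_per_row == 32, "each row contributes 8*4 bits");
static_assert(POINT_TABLE_SIZE / 2 == num_rows_per_scalar * 2, "2 points per row");

// Hypothetical helpers: table indices of the two points stored in row i.
constexpr std::size_t first_table_index(std::size_t row)
{
    return POINT_TABLE_SIZE - 1 - 2 * row;
}
constexpr std::size_t second_table_index(std::size_t row)
{
    return POINT_TABLE_SIZE - 2 - 2 * row;
}
```

Under these assumptions the four rows cover table indices 15 down to 8, two per row.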
Updates ECCVMFlavor entity counts and column definitions:
- NUM_WIRES: 85 -> 121
- NUM_ALL_ENTITIES: 118 -> 156
- NUM_WITNESS_ENTITIES: 87 -> 123
- NUM_SHIFTED_ENTITIES: 26 -> 28
New WireNonShiftedEntities (+34 columns):
- precompute_s5hi..s8lo: 8 new 2-bit slice columns for digits 5-8
- msm_add5..add8: 4 new addition selector columns
- msm_x5..x8, msm_y5..y8: 8 new point coordinate columns
- msm_collision_x5..x8: 4 new collision inverse columns
- msm_lambda5..lambda8: 4 new slope columns
- msm_slice5..slice8: 4 new wNAF slice columns
- lookup_read_counts_2, _3: 2 new lookup read count columns
New WireToBeShiftedWithoutAccumulatorsEntities (+2 columns):
- precompute_tx2, precompute_ty2: 2nd precomputed point per row, needs shifting for inter-row point table constraints
Corresponding ShiftedEntities updated with precompute_tx2_shift, precompute_ty2_shift. CommitmentLabels updated for all new columns.
Extends the ProverPolynomials constructor to populate the 36 new flavor columns from the builder row data:
Precompute section:
- Wire precompute_s5hi..s8lo from point_table_rows[i].s9..s16
- Wire precompute_tx2/ty2 from point_table_rows[i].precompute_accumulator2
MSM section (all from add_state[4..7]):
- Wire msm_add5..add8 from add_state[4..7].add
- Wire msm_x5..x8, msm_y5..y8 from add_state[4..7].point
- Wire msm_collision_x5..x8 from add_state[4..7].collision_inverse
- Wire msm_lambda5..lambda8 from add_state[4..7].lambda
- Wire msm_slice5..slice8 from add_state[4..7].slice
lookup_read_counts_2/_3 columns are declared but not yet populated; they will be wired when the lookup relation is updated to support 4 table terms per precompute row.
The MSM relation now supports 8 point additions per row (was 4).
The doubling chain remains 4-wide (= wNAF digit width w = 4).
Key changes:
- Addition chain: first_add + 7 conditional adds (was first_add + 3)
- Skew chain: 8 conditional skew additions (was 4)
- Collision checks: 8 inverse checks (was 4)
- Slice-zero enforcement: 8 checks (was 4)
- Count update: sum of add1..add8 (was add1..add4)
- Addition continuity: add{i+1} * (-add{i} + 1) for i=1..7 (was 1..3)
- Cross-row continuity: (-add8 + 1) * add1_shift (was -add4 + 1)
Subrelation count: 47 -> 67 (20 new subrelations)
New subrelations: ADD slopes 5-8, SKEW slopes 5-8, collision 5-8,
slice-zero 5-8, continuity add5-8.
MAX_PARTIAL_RELATION_LENGTH for this relation: 8 -> 12 (due to the
longer addition chain increasing the degree of the accumulator output).
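The continuity and count constraints listed above can be read as plain predicates over the eight add flags. The following is a minimal sketch on 0/1 integers rather than field elements; the type and function names are hypothetical, and any other selectors that gate these constraints in the real relation are ignored.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t ADDITIONS_PER_ROW = 8;
// Hypothetical stand-in for msm_add1..msm_add8; each entry is 0 or 1.
using AddFlags = std::array<int, ADDITIONS_PER_ROW>;

// Addition continuity: add{i+1} * (1 - add{i}) == 0 for i = 1..7,
// i.e. once a slot is inactive, all later slots in the row are inactive.
constexpr bool continuity_holds(const AddFlags& add)
{
    for (std::size_t i = 0; i + 1 < ADDITIONS_PER_ROW; ++i) {
        if (add[i + 1] * (1 - add[i]) != 0) {
            return false;
        }
    }
    return true;
}

// Cross-row continuity: (1 - add8) * add1_shift == 0,
// i.e. the next row may only start adding if this row was full.
constexpr bool cross_row_continuity_holds(const AddFlags& row, const AddFlags& next_row)
{
    return (1 - row[ADDITIONS_PER_ROW - 1]) * next_row[0] == 0;
}

// Count update: the row contributes add1 + ... + add8 additions.
constexpr int additions_in_row(const AddFlags& add)
{
    int total = 0;
    for (std::size_t i = 0; i < ADDITIONS_PER_ROW; ++i) {
        total += add[i];
    }
    return total;
}
```

A row with flags 1,1,1,0,0,0,0,0 passes the continuity check and counts 3 additions, while 1,0,1,0,0,0,0,0 is rejected.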
Extend the bools relation with 4 new boolean constraints for the msm_add5 through msm_add8 columns (indices 23-26). Subrelation count: 23 -> 27.
Update ecc_wnaf_relation to process 8 wNAF digits per precompute row (was 4), halving the number of rows per scalar from 8 to 4.
Key changes:
- SUBRELATION_PARTIAL_LENGTHS expanded from 23 to 35 entries
- 16 two-bit range checks (was 8) for slices s1hi..s8lo
- 8 wNAF conversions w0..w7 (was 4 w0..w3)
- Horner accumulation uses 2^32 shift (was 2^16) for 8 digits
- Round max changed from 7 to 3 (NUM_WNAF_DIGITS_PER_SCALAR/8 - 1)
- Added slice-zero checks for w4..w7 (subrelations 31-34)
- Updated header docstring to reflect 4-row layout
…ed tuples
Update ECCVMSetRelation for 8-wide precompute and MSM tables:
Numerator changes:
- 8 slice fingerprints instead of 4, with round encoding 8*round+j
- Scalar reconstruction uses 8 wNAF digits with 2^32 shift (was 4, 2^16)
- Skew tuple uses round offset 8 (was 4)
- eccvm_set_permutation_delta comment updated for 8-term product
Denominator changes:
- 8 add-gated (pc, round, slice) tuples instead of 4
- PC offsets 0..7 (was 0..3) for msm_add1..msm_add8
SUBRELATION_PARTIAL_LENGTHS updated to {29, 3} (was {22, 3}) to
accommodate the higher degree from the 8-wide grand product.
Update ECCVMLookupRelation for 8-wide MSM and 2 precomputed points
per precompute row:
- NUM_LOOKUP_TERMS: 4 -> 8 (msm_add1..msm_add8 gated reads)
- NUM_TABLE_TERMS: 2 -> 4 (positive/negative for each of 2 points)
- LENGTH: 9 -> 15
Table term structure (4 terms covering all 16 slice values):
- table_index 0: point 1 positive, slice = 15 - 2*round -> {15,13,11,9}
- table_index 1: point 1 negative, slice = 2*round -> {0,2,4,6}
- table_index 2: point 2 positive, slice = 14 - 2*round -> {14,12,10,8}
- table_index 3: point 2 negative, slice = 2*round + 1 -> {1,3,5,7}
Lookup read counts expanded from 2 to 4 columns
(lookup_read_counts_0..3) to match the 4 table terms.
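The table term structure above is a bijection between the 16 compressed slice values and (table_index, round) pairs. A small sketch of both directions follows; the struct and function names are hypothetical and only the slice formulas come from the commit message.

```cpp
#include <cstddef>

// Hypothetical pair: which of the 4 table terms serves a slice, and in which round.
struct TableSlot {
    std::size_t table_index;
    std::size_t round;
};

// Forward direction, copied from the list above.
constexpr std::size_t slice_for_table_slot(std::size_t table_index, std::size_t round)
{
    switch (table_index) {
    case 0:
        return 15 - 2 * round; // point 1 positive: {15,13,11,9}
    case 1:
        return 2 * round; // point 1 negative: {0,2,4,6}
    case 2:
        return 14 - 2 * round; // point 2 positive: {14,12,10,8}
    default:
        return 2 * round + 1; // point 2 negative: {1,3,5,7}
    }
}

// Inverse direction: pick the table from parity and magnitude of the slice (0..15).
constexpr TableSlot table_slot_for_slice(std::size_t slice)
{
    const bool odd = (slice % 2) == 1;
    if (slice >= 8) {
        return odd ? TableSlot{ 0, (15 - slice) / 2 } : TableSlot{ 2, (14 - slice) / 2 };
    }
    return odd ? TableSlot{ 3, (slice - 1) / 2 } : TableSlot{ 1, slice / 2 };
}

// Every (table, round) pair maps to a slice that maps back to the same pair.
constexpr bool mapping_round_trips()
{
    for (std::size_t table = 0; table < 4; ++table) {
        for (std::size_t round = 0; round < 4; ++round) {
            const TableSlot slot = table_slot_for_slice(slice_for_table_slot(table, round));
            if (slot.table_index != table || slot.round != round) {
                return false;
            }
        }
    }
    return true;
}

static_assert(mapping_round_trips(), "4 tables x 4 rounds cover all 16 slices");
```

Because 4 tables times 4 rounds gives 16 distinct slices, each read of a compressed slice increments exactly one of the four read-count columns.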
…tion
Update ECCVMPointTableRelation for 2 precomputed points per row (Tx/Ty and Tx2/Ty2):
SUBRELATION_PARTIAL_LENGTHS expanded from 6 to 8 entries:
- Subrelations 0-1: Doubling constraint, now uses Tx2/Ty2 as the base point (at transition row, Tx2=P so Dx=2P)
- Subrelations 2-3: Dx/Dy continuity (unchanged)
- Subrelations 4-5: NEW intra-row addition (Tx = Tx2 + Dx), gated by precompute_select. Validates first point = second point + 2P.
- Subrelations 6-7: NEW inter-row addition (Tx2 = Tx_shift + Dx), gated by not-transition and not-first-row. Validates second point of row i equals first point of row i+1 plus 2P.
Row layout example for point P:
- round 0: Tx=15P, Tx2=13P
- round 1: Tx=11P, Tx2=9P
- round 2: Tx=7P, Tx2=5P
- round 3: Tx=3P, Tx2=P
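The row layout example can be checked against the two new addition constraints by tracking only the scalar multiple of P held in each column. This sketch replaces curve points with integers, so it illustrates the bookkeeping rather than the relation itself; the function names are hypothetical.

```cpp
constexpr int NUM_ROWS_PER_SCALAR = 4;
constexpr int DOUBLED_MULTIPLE = 2; // Dx/Dy hold 2P on every row

// Multiples of P stored in Tx and Tx2 at a given round, per the layout example.
constexpr int tx_multiple(int round)
{
    return 15 - 4 * round;
}
constexpr int tx2_multiple(int round)
{
    return 13 - 4 * round;
}

constexpr bool layout_is_consistent()
{
    for (int round = 0; round < NUM_ROWS_PER_SCALAR; ++round) {
        // Intra-row addition: Tx = Tx2 + 2P.
        if (tx_multiple(round) != tx2_multiple(round) + DOUBLED_MULTIPLE) {
            return false;
        }
        // Inter-row addition: Tx2 of row i = Tx of row i+1 + 2P (no successor on the last row).
        if (round + 1 < NUM_ROWS_PER_SCALAR &&
            tx2_multiple(round) != tx_multiple(round + 1) + DOUBLED_MULTIPLE) {
            return false;
        }
    }
    // The transition row (round 3) holds the base point P in Tx2.
    return tx2_multiple(NUM_ROWS_PER_SCALAR - 1) == 1;
}

static_assert(layout_is_consistent(), "15P,13P | 11P,9P | 7P,5P | 3P,P");
```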
With 8 wNAF digits per precompute row, the zero-tuple fingerprint used for padding inactive rows must be the product of 8 terms (γ + j·β² + t·β⁴) for j = 0..7, rather than 4 terms for j = 0..3. Updated in all three locations:
- eccvm_prover.cpp
- eccvm_verifier.cpp
- eccvm_trace_checker.cpp
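As an illustration of the 8-term product, here is a sketch over a toy prime field. ToyField is a stand-in invented for this example (the real code uses the curve's field type), zero_tuple_product is a hypothetical name, and t is treated as an opaque input because the commit message does not define it. Only the product itself is computed; any further processing in the prover or verifier is out of scope.

```cpp
#include <cstdint>

// Toy prime field (assumption for illustration); the modulus is small enough
// that products of two reduced values fit in 64 bits.
struct ToyField {
    static constexpr std::uint64_t P = 2147483647; // 2^31 - 1
    std::uint64_t v;
    constexpr explicit ToyField(std::uint64_t x)
        : v(x % P)
    {}
    friend constexpr ToyField operator+(ToyField a, ToyField b) { return ToyField((a.v + b.v) % P); }
    friend constexpr ToyField operator*(ToyField a, ToyField b) { return ToyField((a.v * b.v) % P); }
    friend constexpr bool operator==(ToyField a, ToyField b) { return a.v == b.v; }
};

// Product over j = 0..7 of (gamma + j*beta^2 + t*beta^4).
template <typename FF> constexpr FF zero_tuple_product(FF gamma, FF beta, FF t)
{
    const FF beta_sqr = beta * beta;
    const FF tag_term = t * (beta_sqr * beta_sqr);
    FF product(1);
    FF j_beta_sqr(0); // running value of j * beta^2
    for (int j = 0; j < 8; ++j) {
        product = product * (gamma + j_beta_sqr + tag_term);
        j_beta_sqr = j_beta_sqr + beta_sqr;
    }
    return product;
}
```

With γ = 1, β = 1, t = 0 the terms are 1 through 8, so the product is 8! = 40320, which makes a convenient sanity check.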
Add benchmarks for ECCVM relation evaluation using Sumcheck univariates (prover-side work), in addition to the existing values-based benchmarks (verifier-side work).
- Set all MSM relation SUBRELATION_PARTIAL_LENGTHS to 12 (was mixed 8/12). Required because the View type is derived from the max partial length subrelation (index 0 = 12), so all intermediate Univariates are 12-wide and can only be accumulated into 12-wide accumulators.
- Fixed element count: was 68, now 67 (matching the array declaration).
- Removed unused Tx2_shift/Ty2_shift variables from point table relation (the inter-row constraint uses Tx_shift/Ty_shift, not the shifted versions of the second point).
Two bugs fixed:
1. batch_normalize crash on zero z-coordinates: With ADDITIONS_PER_ROW=8
and DOUBLINGS_PER_ROW=4, doubling rows only use 4 of 8 trace slots.
The unused slots had default Element{} with z=0 (point at infinity),
causing batch_normalize to fail when inverting z-coordinates. Fix:
fill unused slots with valid (non-infinity) dummy points and track
which slots are used via is_used vector to skip them during
collision_inverse computation.
2. Signed integer overflow in precomputed_tables_builder: With 8 wNAF
digits per row, row_chunk = slice0 * (1<<28) can reach ~4 billion,
exceeding INT_MAX. This was undefined behavior causing incorrect
scalar_sum values. Fix: use int64_t for row_chunk computation.
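The overflow in the second fix is easy to reproduce on paper: eight 4-bit-wide wNAF digits weighted by powers of 16 span up to 16^8 - 1, which does not fit in a 32-bit signed integer. The sketch below accumulates a row in int64_t using Horner form, which is arithmetically equivalent to the weighted sum described above; row_chunk here is an illustrative function, not the builder's code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>

constexpr std::size_t WNAF_DIGITS_PER_ROW = 8;
constexpr std::int64_t WNAF_DIGIT_RADIX = 16; // 2^NUM_WNAF_DIGIT_BITS

// Combine the 8 signed wNAF digits of one row, most significant first.
// Multiplication is used instead of shifting so negative digits are handled safely.
constexpr std::int64_t row_chunk(const std::array<int, WNAF_DIGITS_PER_ROW>& digits)
{
    std::int64_t chunk = 0;
    for (int digit : digits) {
        chunk = chunk * WNAF_DIGIT_RADIX + digit;
    }
    return chunk;
}

// The largest row (all digits 15) exceeds what a 32-bit signed int can hold.
static_assert(row_chunk(std::array<int, 8>{ 15, 15, 15, 15, 15, 15, 15, 15 }) >
                  std::numeric_limits<std::int32_t>::max(),
              "row_chunk needs more than 32 signed bits");
```

Even a single leading digit of 15 gives 15 * 2^28, roughly 4.03 billion, which matches the "~4 billion" figure in the commit message.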
…r-row layout
Three fixes:
1. Lookup read counts: Reworked MSM builder to return 4 read count columns (was 2). With 2 precomputed points per row and 4 table terms in the lookup relation, each compressed slice value maps to one of 4 tables based on parity and magnitude:
   - Table 0: odd slices >= 8 (point 1 positive)
   - Table 1: even slices < 8 (point 1 negative)
   - Table 2: even slices >= 8 (point 2 positive)
   - Table 3: odd slices < 8 (point 2 negative)
   ProverPolynomials now wires all 4 read count columns.
2. Set relation second term: Changed base point reference from precompute_tx/ty to precompute_tx2/ty2. In the 2-point-per-row layout, the base point P is stored in tx2/ty2 at the transition row (round=3), not in tx/ty (which holds 3P).
3. Removed debug trace code from trace checker.
- eccvm.test.cpp: Fix eccvm_set_permutation_delta in complete_proving_key_for_test() to compute the product of 8 zero-tuple fingerprints (γ + j·β² + t·β⁴) for j=0..7, matching the prover/verifier/trace_checker. Previously only computed 4 terms, causing CommittedSumcheck test to fail.
- eccvm_transcript.test.cpp: Update hardcoded prover manifest in construct_eccvm_honk_manifest() with all new wire columns added for the 8-wide layout:
  - PRECOMPUTE_S5HI through PRECOMPUTE_S8LO (8 columns)
  - MSM_ADD5 through MSM_ADD8 (4 columns)
  - MSM_X5/Y5 through MSM_X8/Y8 (8 columns)
  - MSM_COLLISION_X5 through MSM_COLLISION_X8 (4 columns)
  - MSM_LAMBDA5 through MSM_LAMBDA8 (4 columns)
  - MSM_SLICE5 through MSM_SLICE8 (4 columns)
  - LOOKUP_READ_COUNTS_2, LOOKUP_READ_COUNTS_3 (2 columns)
  - PRECOMPUTE_TX2, PRECOMPUTE_TY2 (2 columns)
All 41 eccvm_tests now pass.
The ECCVM recursive verifier gate count increased from 224,657 to 269,130 due to the wider relation columns and higher-degree subrelations in the 8-wide ECCVM layout. The recursive flavor inherits all entity/relation changes automatically from the native ECCVMFlavor via templates, so no code changes were needed in the stdlib recursive verifier itself.
Update comments across msm_builder.hpp, eccvm_flavor.hpp, and ecc_lookup_relation_impl.hpp to reflect the new 8-wide layout:
- "4 point-additions per row" → 8
- "size-4 array" → size-8
- "result of four EC additions" → eight
- Document msm_x/y/add/lambda/slice/collision_x 1..8
- Document precompute_s1..s8 (8 slices per row)
- Document precompute_tx2/ty2 (2 points per row)
- Document all 4 lookup_read_counts columns
Remove ECCVM univariate benchmark additions from relations.bench.cpp to keep this PR focused on the 8-wide layout change.
added 4 commits
March 18, 2026 11:05
- Fix ECCVM proof size in design doc: ~716 → 756 Fr (confirmed by static_assert in proof_compression.hpp)
- Correct set relation degree comments: denominator sub-products are 16 (8 add-gated tuples) + 6 (transcript z1/z2) + 4 (MSM output) = 26, not the previously claimed 28. Full GP subrelation degree = 27, partial length upper bound = 29.
- Fix duplicate comment blocks in set relation numerator/denominator third term docstrings
- Update inline cumulative degree annotations throughout compute_grand_product_numerator/denominator
Instead of hardcoding 17 apps, compute the max number of app circuits that fit in the ECCVM based on CONST_ECCVM_LOG_N. Each app adds ~1104 ECCVM rows with ~1494 base overhead. At LOG_N=15: 28 apps; LOG_N=14: 13.
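The quoted capacities follow from simple integer arithmetic. The sketch below assumes the ECCVM has 2^LOG_N rows available and uses the approximate per-app and base-overhead figures from this commit message; max_apps is a hypothetical helper, and the actual test may compute the bound differently.

```cpp
#include <cstddef>

// Approximate figures quoted in the commit message.
constexpr std::size_t ROWS_PER_APP = 1104;
constexpr std::size_t BASE_OVERHEAD_ROWS = 1494;

// Largest number of app circuits whose rows fit in a table of 2^log_n rows.
constexpr std::size_t max_apps(std::size_t log_n)
{
    const std::size_t total_rows = std::size_t{ 1 } << log_n;
    if (total_rows <= BASE_OVERHEAD_ROWS) {
        return 0;
    }
    return (total_rows - BASE_OVERHEAD_ROWS) / ROWS_PER_APP;
}

static_assert(max_apps(15) == 28, "(32768 - 1494) / 1104 = 28");
static_assert(max_apps(14) == 13, "(16384 - 1494) / 1104 = 13");
```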
Plan to pack 2 doubling rounds into 1 MSM row (DOUBLINGS_PER_ROW 4→8), cutting doubling rows from 31 to 16 per MSM. Reuses lambda5..8 on doubling rows (free since q_add/q_double are mutually exclusive). No new columns needed. MSM formula: 33*ceil(m/8)+16 (was +31).
The 31 doubling rows per MSM cannot be halved because each occurs between consecutive digit-slot ADD phases in the Straus algorithm. Combining two DBL rounds into one row would require 8-bit digits (point table size 256), which is impractical. The 8-wide change achieves ~1.65x capacity (17→28 apps), not 2x.
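Since the doubling term stays at 31, only the addition term of the MSM row formula shrinks with the wider layout. A minimal sketch of the formula 33·⌈m/k⌉ + 31 for k additions per row follows; msm_rows and ceil_div are hypothetical helper names.

```cpp
#include <cstddef>

constexpr std::size_t ceil_div(std::size_t numerator, std::size_t denominator)
{
    return (numerator + denominator - 1) / denominator;
}

// MSM rows for m short scalar muls with the given number of additions per row.
// The constant 31 is the doubling-row count, which this change leaves untouched.
constexpr std::size_t msm_rows(std::size_t m, std::size_t additions_per_row)
{
    return 33 * ceil_div(m, additions_per_row) + 31;
}

static_assert(msm_rows(8, 4) == 97, "4-wide: 33*2 + 31");
static_assert(msm_rows(8, 8) == 64, "8-wide: 33*1 + 31");
```

For m = 8 the row count drops from 97 to 64, about 1.5x, which shows why the fixed doubling rows keep the overall gain below 2x.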
Related: AztecProtocol/barretenberg#1654
Summary
Doubles the width of the ECCVM Precomputed and MSM tables:
- WNAF_DIGITS_PER_ROW: 4 → 8 (precomputed table rows halved per scalar)
- ADDITIONS_PER_ROW: 4 → 8 (MSM addition rows halved per round)
- DOUBLINGS_PER_ROW: stays at 4 (decoupled from additions)
Net effect: for m short scalar muls, MSM rows go from 33·⌈m/4⌉ + 31 to 33·⌈m/8⌉ + 31; precompute rows go from 8m to 4m.

Capacity analysis
Each app circuit adds ~1104 ECCVM rows with a base overhead of ~1494 rows. The MaxCapacityPassing test now computes the max app count from CONST_ECCVM_LOG_N instead of hardcoding it.
LOG_N=14 does NOT maintain capacity parity: only 13 apps fit (down from 17). The win here is keeping LOG_N = 15 and getting 28 apps (up from 17, about 1.65x the stack depth).

Changes made
Constants & types (eccvm_builder_types.hpp):
- WNAF_DIGITS_PER_ROW 4→8, ADDITIONS_PER_ROW 4→8
Flavor (eccvm_flavor.hpp):
- msm_add5..8, msm_x5..8, msm_y5..8, msm_collision_x5..8, msm_lambda5..8, msm_slice5..8, precompute_s5hi..s8lo, precompute_tx2, precompute_ty2, lookup_read_counts_2, lookup_read_counts_3
Builders:
- precomputed_tables_builder.hpp: 8 digits per row, 2 points per row (Tx/Ty + Tx2/Ty2), int64_t for row_chunk to avoid overflow
- msm_builder.hpp: 8 additions per row with 4 doublings, dummy-point padding for unused slots, 4 read-count columns with compressed-slice-to-table mapping
Relations (all in relations/ecc_vm/):
- ecc_msm_relation: extended to 8 addition constraints per row
- ecc_bools_relation: boolean checks for msm_add5..8
- ecc_wnaf_relation: 8 slice decompositions per row
- ecc_point_table_relation: 2nd precomputed point constraint (Tx2/Ty2)
- ecc_set_relation: 8 slice fingerprints, 8 add-gated tuples, eccvm_set_permutation_delta as product of 8 terms; second term uses tx2/ty2
- ecc_lookup_relation: 8 reads, 4 table terms (point1 pos/neg, point2 pos/neg)
Prover/verifier (eccvm_prover.cpp, eccvm_verifier.cpp, eccvm_trace_checker.cpp):
- eccvm_set_permutation_delta updated to 8-term product
Test infrastructure:
- eccvm.test.cpp: updated delta computation, transcript manifest
- eccvm_transcript.test.cpp: updated expected prover manifest
- chonk.test.cpp: MaxCapacityPassing now computes max apps from LOG_N (28 at LOG_N=15)

Not included (needed before merge)
- constants.nr: ECCVM proof length
- CONST_ECCVM_LOG_N (stay at 15 for capacity, not 14)

Test plan
- eccvm_tests pass
- CI=1 NO_FAIL_FAST=1 ./bootstrap test pass (includes chonk, goblin, dsl, VK checks)