feat: delta gating — drop pre-existing complexity/size violations in diff mode by cffls · Pull Request #9 · 0xPolygon/diffguard

cffls · 2026-05-05T22:02:31Z

Summary

A PR that touches a legacy 4000-line file or a complexity=91 function should not be blamed for that file's size or that function's complexity. Today, diffguard's diff mode picks any function whose line range overlaps the diff and evaluates the absolute metric against the threshold, so legacy code becomes a permanent blocker.

This PR adds delta gating: in diff mode, complexity and size findings are dropped when the metric at the merge-base was already at or close to the head value. Brand-new files/functions absent at base are kept as-is, so absolute thresholds still gate genuinely new debt. Refactoring mode (--paths) is unchanged.

Motivated by bor's Quality metrics CI, which reports 17 complexity violations and 19 size violations on a PR that touches core/blockchain.go, core/state/statedb.go, etc. — most of which are inherited from the touched files, not caused by the PR.

Defaults

Rule	Tolerance	Flag
Per-function complexity	+3	`--complexity-delta-tolerance`
Per-function size	+5 lines	`--function-size-delta-tolerance`
Per-file size	max(10 lines, 5% of base)	`--file-size-delta-tolerance-pct`, `--file-size-delta-tolerance-floor`

The complexity tolerance accounts for cognitive complexity's nesting penalty: threading a parameter through one extra branch on a deeply-nested function easily nudges +1..+3 with nothing materially worse happening. The file-size percentage scales with file size — +99 lines on a 4400-line file is 2% (forgiven), but +298 on a 2000-line file is 15% (still flagged); the floor stops tiny additions to small files from collapsing below the noise level.

How it works

diff.Result.MergeBase is now populated by Parse (empty for CollectPaths / refactoring mode).
diff.ShowAtRef(repo, ref, path) fetches a file's contents at ref, returning (nil, nil) for paths absent from the tree and bubbling up errors for bad refs / non-repos. We disambiguate via stderr ("does not exist in") because git show exits 128 for several conditions; a misconfigured CI must not silently weaken the gate.
internal/baseline writes the base content to a temp file (extension preserved) so existing language analyzers run unchanged with a full-coverage synthetic FileChange. No language interface changes — works for Go, TypeScript and Rust via the same orchestrator.
complexity.Analyze and sizes.Analyze separate analyzed (full set, used for stats / mean / median / max) from candidates (post-filter subset, used for findings) so the section header still describes the whole diff even when the gate drops every legacy violation.
Findings render (+N vs base) when the function/file existed at base; brand-new code keeps the bare form so the report distinguishes "added new hot code" from "made existing hot code hotter".

Behavior matrix

Complexity (threshold 10, tolerance +3)

Case	Result	Note
Touch line in c=91 legacy function (head=base)	PASS	Comment-level edits inside a complex function don't fail.
Bump function from c=33 to c=34	PASS	Δ=+1, within tolerance.
Bump function from c=72 to c=77	FAIL: `complexity=77 (+5 vs base)`	Δ=+5, exceeds tolerance.
Add brand-new c=20 function	FAIL: `complexity=20`	No baseline — gated by absolute threshold.

Function size (threshold 50, tolerance +5)

Case	Result	Note
80-line legacy function, only comment changes	PASS	head==base.
60-line function, +3 lines (logging)	PASS	Δ=+3, within tolerance.
60-line function, +20 lines (new branch)	FAIL: `function=80 lines (+20 vs base)`	Δ exceeds tolerance.
Brand-new 60-line function	FAIL: `function=60 lines`	No baseline.

File size (threshold 500, tolerance max(10, 5%))

Case	Result	Note
4000-line file, +5 lines (one-line fix)	PASS	Below 10-line floor.
600-line file, +20 lines	PASS	5% of 600 = 30; 20 ≤ 30.
600-line file, +50 lines	FAIL: `file=650 lines (+50 vs base)`	50 > 30.
4400-line file, +99 lines	PASS	99 / 4400 = 2.3%, below 5%.
2000-line file, +298 lines	FAIL: `file=2298 lines (+298 vs base)`	14.7%, exceeds 5%.
Brand-new file	FAIL if over absolute threshold	No baseline.

Mode

Mode	Behavior
Diff mode (`--base <ref>`)	Delta gating active.
Refactoring mode (`--paths`)	Unchanged: absolute thresholds.

Bor result (vs origin/develop)

Metric	Original CI	After delta gating + tolerances
Complexity violations	17	7 (10 dropped: 7 unchanged + 3 within +3)
Size violations (file + function)	19	12 (7 dropped: 4 below 5% / 10-line floor + 3 within +5)

Stats line now reads 285 functions analyzed | Max: 91 | 7 over threshold — the full diff is described even when delta gating drops most legacy violations.

Tests

complexity_delta_test.go — pre-existing dropped, increased kept, new function kept, new file kept, tolerance boundary; table-driven worsened predicate.
sizes_delta_test.go — same matrix for per-function and per-file size; table-driven grewFunc, grewFile, fileGrowthTolerance.
multilang_delta_test.go (in both packages) — TypeScript + Rust end-to-end.
show_at_ref_test.go — happy path, absent path returns (nil, nil), bad ref / non-repo surface errors.
internal/baseline/baseline_test.go — FetchToTemp extension preservation, absent / bad-ref handling.

Test plan

make build clean
make test all green
./diffguard . passes (T1 mutation 94.6%, T2 77.8%, overall 78.9%)
Smoke-tested on bor blockstm_redesign branch end-to-end

🤖 Generated with Claude Code

In diff mode, complexity and size gates now compare each finding to its value at the merge-base and drop findings that did not get worse on this branch. A PR that touches a 4000-line legacy file or a complexity=91 function without making it bigger or more complex no longer fails CI on metrics it inherited. How it works: - diff.Result.MergeBase is populated by Parse (empty for refactoring mode via CollectPaths). - diff.ShowAtRef fetches a file's contents at the merge-base, returning (nil, nil) for paths absent from the tree and bubbling up errors for bad refs / non-repos so a misconfigured CI doesn't silently weaken the gate. - internal/baseline writes the base content to a temp file (extension preserved) so existing language analyzers run unchanged with a full-coverage synthetic FileChange. - complexity.Analyze and sizes.Analyze drop findings where base ≥ head (function: complexity / line count; file: line count). New files/functions absent from base are kept so absolute thresholds still gate genuinely new debt. Refactoring mode (--paths) leaves MergeBase empty and falls through to absolute thresholds — no behavior change there. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…mplexity - Findings now render '(+N vs base)' when the function/file existed at the merge-base, so PR authors can tell "added new hot code" from "made existing hot code hotter". Brand-new code stays with the bare message. - New --complexity-delta-tolerance flag (default 3) — head must exceed base by *more than* tolerance to count as worsened. Forgives the +1/+2/+3 noise that's typical when threading a parameter through a legacy hot function. Size deltas remain strict (tolerance=0); cheap to add later if needed. - The orchestrators thread per-file/per-function delta maps to the formatters; helper extraction (recordDelta, recordFuncDelta) keeps applyDeltaFilter / filterFuncSizes under the cognitive-complexity threshold. - New tests cover formatComplexityMsg / formatFuncSizeMsg / formatFileSizeMsg (lock subtraction direction so the head-base operand order doesn't silently flip), and the previously-untested internal/baseline package. On the bor blockstm_redesign branch (vs origin/develop), tolerance=3 drops three +1..+3 regressions (ProcessBlock, subfetcher.loop, stateTransition.execute) while keeping legitimate +5/+4 worsenings (IntermediateRoot, getStateObject) and the brand-new block-stm v2 functions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cffls · 2026-05-05T22:14:00Z

Pushed two follow-ups (b31e892):

Delta in messages — findings now render (+N vs base) when the function/file existed at the merge-base. Brand-new code keeps the bare form so the report distinguishes "added new hot code" from "made existing hot code hotter".

--complexity-delta-tolerance flag (default 3) — head must exceed base by more than tolerance for a regression to fail. Forgives the +1/+2/+3 noise that's typical when threading a parameter through a legacy hot function. Size deltas stay strict (tolerance=0); easy to add --function-size-delta-tolerance / --file-size-delta-tolerance later if needed.

Tested on bor's blockstm_redesign branch (vs origin/develop): tolerance=3 drops three small regressions (ProcessBlock +1, subfetcher.loop +2, stateTransition.execute +3) while keeping legitimate worsenings (IntermediateRoot +5, getStateObject +4) and brand-new block-stm v2 functions.

Self-check passes (T1 mutation 100%, T2 75%, overall 80%).

…d findings Previous commit reassigned the analyzer's full result list to the post-delta-filter subset, which corrupted the section's mean / median / max / total-functions stats. Worse: when the filter dropped *every* finding (e.g. the only complex function in the diff was untouched), the section reported "No changed functions to analyze" — which is the empty-input case, not the all-clear case. Surfaced via TS/Rust smoke tests where a single-file diff has exactly one analyzable function. Both Analyze functions now thread two lists: `analyzed` (everything the language calculator returned, used for stats) and `candidates` (the post-filter subset eligible to become findings). buildSection stays a single function, taking both. Also adds multi-language regression tests: - internal/complexity/multilang_delta_test.go covers TypeScript and Rust delta gating end-to-end. - internal/sizes/multilang_delta_test.go does the same for sizes. Both confirm that the orchestrator drives whichever ComplexityCalculator / FunctionExtractor each language registers, and that the temp-file-with-preserved-extension trick works for the tree-sitter analyzers as well as Go's go/parser. On the bor blockstm_redesign branch (vs origin/develop), the stats now read "285 functions analyzed | Max: 91 | 7 over threshold" — the full diff is described, with 7 violations vs the original 17. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cffls · 2026-05-05T22:22:05Z

Good catch — the orchestrator was language-agnostic in principle, but I hadn't actually exercised the TS/Rust paths and ran straight into a related bug.

Pushed cbc2481:

Bug fix: previous commits reassigned the analyzer's full result list to the post-filter subset, which corrupted the stats line. When the filter dropped every finding (e.g. a single-file TS diff where the only complex function was untouched), the section reported "No changed functions to analyze" — the empty-input message, not "all clear". Both Analyze functions now thread two lists: analyzed (full, for stats) and candidates (filtered, for findings).

Multi-language regression tests: internal/complexity/multilang_delta_test.go and internal/sizes/multilang_delta_test.go exercise TypeScript and Rust end-to-end. Both confirm the orchestrator drives whichever calculator/extractor each language registers, and that the temp-file-with-preserved-extension trick works for the tree-sitter front-ends as well as Go's go/parser.

On the bor branch the stats line now reads "285 functions analyzed | Max: 91 | 7 over threshold" — full diff described, 7 violations (vs the original 17). Self-check still passes (T1 100%, T2 75%, overall 80%).

Function size: --function-size-delta-tolerance flag, default 5 lines. Threading a parameter through a 60-line function or adding a couple of defensive guards no longer fails on a function that was already over threshold. File size: --file-size-delta-tolerance-pct (default 5) + --file-size-delta-tolerance-floor (default 10). The drop rule is "growth ≤ max(floor, base × pct%)". Percentage scales with file size: +99 lines on a 4382-line file (2.3%) is forgiven, but +298 on a 2031-line file (14.7%) still flags. The floor stops "+8 to a 600-line file" (which is 1.3%) from over-firing on small files where %-of-base collapses below the noise level. Tolerances live behind a new sizes.DeltaTolerances struct so adding language-specific or per-rule overrides later is local. Zero value = strict (preserves previous behavior for any caller that doesn't populate it). On the bor blockstm_redesign branch, this drops 7 further findings: files dropped: blockchain.go +99 (2.3%), instructions.go +16 (1.5%), state_transition.go +11 (1.4%), trie_prefetcher.go +18 (3.4%) funcs dropped: stateTransition.execute +1, NewEVM +4, updateTrie (no growth), preloadAddressAsync +4 kept legitimately: statedb.go +298 (14.7%), pps.go +705 (145%), trie.go +59 (6.8%), reader.go +76 (10.8%), evm.go +57 (7.7%), plus brand-new files / functions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cffls · 2026-05-05T22:47:10Z

Pushed 04617aa: size-delta tolerances.

New defaults:

Rule	Tolerance	Flag
Per-function size	5 lines	`--function-size-delta-tolerance`
Per-file size	`max(10 lines, 5% of base)`	`--file-size-delta-tolerance-pct`, `--file-size-delta-tolerance-floor`

The percentage rule scales with file size — +99 lines on a 4382-line file (2.3%) is forgiven, but +298 on a 2031-line file (14.7%) still fails. The floor stops tiny additions to small files from collapsing below the noise level when % gives a sub-10-line threshold.

On the bor branch this drops 7 further findings on top of the prior delta gate:

Files dropped: blockchain.go +99 (2.3%), instructions.go +16 (1.5%), state_transition.go +11 (1.4%), trie_prefetcher.go +18 (3.4%).
Functions dropped: stateTransition.execute +1, NewEVM +4, updateTrie (no growth), preloadAddressAsync +4.
Kept legitimately: statedb.go +298 (14.7%), pps.go +705 (145%), trie.go +59 (6.8%), reader.go +76 (10.8%), evm.go +57 (7.7%), plus brand-new files / functions.

The tolerances live behind a new sizes.DeltaTolerances struct so per-rule overrides stay local. Zero value preserves strict behavior for any caller that doesn't populate it.

Self-check passes (T1 mutation 94.6%, T2 77.8%, overall 78.9%).

pratikspatil024

Should we also update the README with the new flags?

cffls and others added 2 commits May 5, 2026 15:02

cffls requested review from donnfelker and pratikspatil024 May 5, 2026 22:50

pratikspatil024 approved these changes May 6, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: delta gating — drop pre-existing complexity/size violations in diff mode#9

feat: delta gating — drop pre-existing complexity/size violations in diff mode#9
cffls wants to merge 4 commits intomainfrom
feat/delta-gating

cffls commented May 5, 2026 •

edited

Loading

Uh oh!

cffls commented May 5, 2026

Uh oh!

cffls commented May 5, 2026

Uh oh!

cffls commented May 5, 2026

Uh oh!

pratikspatil024 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

cffls commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Defaults

How it works

Behavior matrix

Complexity (threshold 10, tolerance +3)

Function size (threshold 50, tolerance +5)

File size (threshold 500, tolerance max(10, 5%))

Mode

Bor result (vs origin/develop)

Tests

Test plan

Uh oh!

cffls commented May 5, 2026

Uh oh!

cffls commented May 5, 2026

Uh oh!

cffls commented May 5, 2026

Uh oh!

pratikspatil024 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cffls commented May 5, 2026 •

edited

Loading