Skip to content

feat: delta gating — drop pre-existing complexity/size violations in diff mode#9

Open
cffls wants to merge 4 commits intomainfrom
feat/delta-gating
Open

feat: delta gating — drop pre-existing complexity/size violations in diff mode#9
cffls wants to merge 4 commits intomainfrom
feat/delta-gating

Conversation

@cffls
Copy link
Copy Markdown
Collaborator

@cffls cffls commented May 5, 2026

Summary

A PR that touches a legacy 4000-line file or a complexity=91 function should not be blamed for that file's size or that function's complexity. Today, diffguard's diff mode picks any function whose line range overlaps the diff and evaluates the absolute metric against the threshold, so legacy code becomes a permanent blocker.

This PR adds delta gating: in diff mode, complexity and size findings are dropped when the metric at the merge-base was already at or close to the head value. Brand-new files/functions absent at base are kept as-is, so absolute thresholds still gate genuinely new debt. Refactoring mode (--paths) is unchanged.

Motivated by bor's Quality metrics CI, which reports 17 complexity violations and 19 size violations on a PR that touches core/blockchain.go, core/state/statedb.go, etc. — most of which are inherited from the touched files, not caused by the PR.

Defaults

Rule Tolerance Flag
Per-function complexity +3 --complexity-delta-tolerance
Per-function size +5 lines --function-size-delta-tolerance
Per-file size max(10 lines, 5% of base) --file-size-delta-tolerance-pct, --file-size-delta-tolerance-floor

The complexity tolerance accounts for cognitive complexity's nesting penalty: threading a parameter through one extra branch on a deeply-nested function easily nudges +1..+3 with nothing materially worse happening. The file-size percentage scales with file size — +99 lines on a 4400-line file is 2% (forgiven), but +298 on a 2000-line file is 15% (still flagged); the floor stops tiny additions to small files from collapsing below the noise level.

How it works

  • diff.Result.MergeBase is now populated by Parse (empty for CollectPaths / refactoring mode).
  • diff.ShowAtRef(repo, ref, path) fetches a file's contents at ref, returning (nil, nil) for paths absent from the tree and bubbling up errors for bad refs / non-repos. We disambiguate via stderr ("does not exist in") because git show exits 128 for several conditions; a misconfigured CI must not silently weaken the gate.
  • internal/baseline writes the base content to a temp file (extension preserved) so existing language analyzers run unchanged with a full-coverage synthetic FileChange. No language interface changes — works for Go, TypeScript and Rust via the same orchestrator.
  • complexity.Analyze and sizes.Analyze separate analyzed (full set, used for stats / mean / median / max) from candidates (post-filter subset, used for findings) so the section header still describes the whole diff even when the gate drops every legacy violation.
  • Findings render (+N vs base) when the function/file existed at base; brand-new code keeps the bare form so the report distinguishes "added new hot code" from "made existing hot code hotter".

Behavior matrix

Complexity (threshold 10, tolerance +3)

Case Result Note
Touch line in c=91 legacy function (head=base) PASS Comment-level edits inside a complex function don't fail.
Bump function from c=33 to c=34 PASS Δ=+1, within tolerance.
Bump function from c=72 to c=77 FAIL: complexity=77 (+5 vs base) Δ=+5, exceeds tolerance.
Add brand-new c=20 function FAIL: complexity=20 No baseline — gated by absolute threshold.

Function size (threshold 50, tolerance +5)

Case Result Note
80-line legacy function, only comment changes PASS head==base.
60-line function, +3 lines (logging) PASS Δ=+3, within tolerance.
60-line function, +20 lines (new branch) FAIL: function=80 lines (+20 vs base) Δ exceeds tolerance.
Brand-new 60-line function FAIL: function=60 lines No baseline.

File size (threshold 500, tolerance max(10, 5%))

Case Result Note
4000-line file, +5 lines (one-line fix) PASS Below 10-line floor.
600-line file, +20 lines PASS 5% of 600 = 30; 20 ≤ 30.
600-line file, +50 lines FAIL: file=650 lines (+50 vs base) 50 > 30.
4400-line file, +99 lines PASS 99 / 4400 = 2.3%, below 5%.
2000-line file, +298 lines FAIL: file=2298 lines (+298 vs base) 14.7%, exceeds 5%.
Brand-new file FAIL if over absolute threshold No baseline.

Mode

Mode Behavior
Diff mode (--base <ref>) Delta gating active.
Refactoring mode (--paths) Unchanged: absolute thresholds.

Bor result (vs origin/develop)

Metric Original CI After delta gating + tolerances
Complexity violations 17 7 (10 dropped: 7 unchanged + 3 within +3)
Size violations (file + function) 19 12 (7 dropped: 4 below 5% / 10-line floor + 3 within +5)

Stats line now reads 285 functions analyzed | Max: 91 | 7 over threshold — the full diff is described even when delta gating drops most legacy violations.

Tests

  • complexity_delta_test.go — pre-existing dropped, increased kept, new function kept, new file kept, tolerance boundary; table-driven worsened predicate.
  • sizes_delta_test.go — same matrix for per-function and per-file size; table-driven grewFunc, grewFile, fileGrowthTolerance.
  • multilang_delta_test.go (in both packages) — TypeScript + Rust end-to-end.
  • show_at_ref_test.go — happy path, absent path returns (nil, nil), bad ref / non-repo surface errors.
  • internal/baseline/baseline_test.goFetchToTemp extension preservation, absent / bad-ref handling.

Test plan

  • make build clean
  • make test all green
  • ./diffguard . passes (T1 mutation 94.6%, T2 77.8%, overall 78.9%)
  • Smoke-tested on bor blockstm_redesign branch end-to-end

🤖 Generated with Claude Code

cffls and others added 2 commits May 5, 2026 15:02
In diff mode, complexity and size gates now compare each finding to its
value at the merge-base and drop findings that did not get worse on this
branch. A PR that touches a 4000-line legacy file or a complexity=91
function without making it bigger or more complex no longer fails CI on
metrics it inherited.

How it works:
- diff.Result.MergeBase is populated by Parse (empty for refactoring
  mode via CollectPaths).
- diff.ShowAtRef fetches a file's contents at the merge-base, returning
  (nil, nil) for paths absent from the tree and bubbling up errors for
  bad refs / non-repos so a misconfigured CI doesn't silently weaken
  the gate.
- internal/baseline writes the base content to a temp file (extension
  preserved) so existing language analyzers run unchanged with a
  full-coverage synthetic FileChange.
- complexity.Analyze and sizes.Analyze drop findings where base ≥ head
  (function: complexity / line count; file: line count). New
  files/functions absent from base are kept so absolute thresholds
  still gate genuinely new debt.

Refactoring mode (--paths) leaves MergeBase empty and falls through to
absolute thresholds — no behavior change there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mplexity

- Findings now render '(+N vs base)' when the function/file existed at
  the merge-base, so PR authors can tell "added new hot code" from "made
  existing hot code hotter". Brand-new code stays with the bare message.
- New --complexity-delta-tolerance flag (default 3) — head must exceed
  base by *more than* tolerance to count as worsened. Forgives the
  +1/+2/+3 noise that's typical when threading a parameter through a
  legacy hot function. Size deltas remain strict (tolerance=0); cheap to
  add later if needed.
- The orchestrators thread per-file/per-function delta maps to the
  formatters; helper extraction (recordDelta, recordFuncDelta) keeps
  applyDeltaFilter / filterFuncSizes under the cognitive-complexity
  threshold.
- New tests cover formatComplexityMsg / formatFuncSizeMsg /
  formatFileSizeMsg (lock subtraction direction so the head-base
  operand order doesn't silently flip), and the previously-untested
  internal/baseline package.

On the bor blockstm_redesign branch (vs origin/develop), tolerance=3
drops three +1..+3 regressions (ProcessBlock, subfetcher.loop,
stateTransition.execute) while keeping legitimate +5/+4 worsenings
(IntermediateRoot, getStateObject) and the brand-new block-stm v2
functions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cffls
Copy link
Copy Markdown
Collaborator Author

cffls commented May 5, 2026

Pushed two follow-ups (b31e892):

Delta in messages — findings now render (+N vs base) when the function/file existed at the merge-base. Brand-new code keeps the bare form so the report distinguishes "added new hot code" from "made existing hot code hotter".

--complexity-delta-tolerance flag (default 3) — head must exceed base by more than tolerance for a regression to fail. Forgives the +1/+2/+3 noise that's typical when threading a parameter through a legacy hot function. Size deltas stay strict (tolerance=0); easy to add --function-size-delta-tolerance / --file-size-delta-tolerance later if needed.

Tested on bor's blockstm_redesign branch (vs origin/develop): tolerance=3 drops three small regressions (ProcessBlock +1, subfetcher.loop +2, stateTransition.execute +3) while keeping legitimate worsenings (IntermediateRoot +5, getStateObject +4) and brand-new block-stm v2 functions.

Self-check passes (T1 mutation 100%, T2 75%, overall 80%).

…d findings

Previous commit reassigned the analyzer's full result list to the
post-delta-filter subset, which corrupted the section's mean / median /
max / total-functions stats. Worse: when the filter dropped *every*
finding (e.g. the only complex function in the diff was untouched), the
section reported "No changed functions to analyze" — which is the
empty-input case, not the all-clear case. Surfaced via TS/Rust smoke
tests where a single-file diff has exactly one analyzable function.

Both Analyze functions now thread two lists: `analyzed` (everything the
language calculator returned, used for stats) and `candidates` (the
post-filter subset eligible to become findings). buildSection stays a
single function, taking both.

Also adds multi-language regression tests:
  - internal/complexity/multilang_delta_test.go covers TypeScript and
    Rust delta gating end-to-end.
  - internal/sizes/multilang_delta_test.go does the same for sizes.

Both confirm that the orchestrator drives whichever
ComplexityCalculator / FunctionExtractor each language registers, and
that the temp-file-with-preserved-extension trick works for the
tree-sitter analyzers as well as Go's go/parser.

On the bor blockstm_redesign branch (vs origin/develop), the stats now
read "285 functions analyzed | Max: 91 | 7 over threshold" — the full
diff is described, with 7 violations vs the original 17.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cffls
Copy link
Copy Markdown
Collaborator Author

cffls commented May 5, 2026

Good catch — the orchestrator was language-agnostic in principle, but I hadn't actually exercised the TS/Rust paths and ran straight into a related bug.

Pushed cbc2481:

Bug fix: previous commits reassigned the analyzer's full result list to the post-filter subset, which corrupted the stats line. When the filter dropped every finding (e.g. a single-file TS diff where the only complex function was untouched), the section reported "No changed functions to analyze" — the empty-input message, not "all clear". Both Analyze functions now thread two lists: analyzed (full, for stats) and candidates (filtered, for findings).

Multi-language regression tests: internal/complexity/multilang_delta_test.go and internal/sizes/multilang_delta_test.go exercise TypeScript and Rust end-to-end. Both confirm the orchestrator drives whichever calculator/extractor each language registers, and that the temp-file-with-preserved-extension trick works for the tree-sitter front-ends as well as Go's go/parser.

On the bor branch the stats line now reads "285 functions analyzed | Max: 91 | 7 over threshold" — full diff described, 7 violations (vs the original 17). Self-check still passes (T1 100%, T2 75%, overall 80%).

Function size: --function-size-delta-tolerance flag, default 5 lines.
Threading a parameter through a 60-line function or adding a couple of
defensive guards no longer fails on a function that was already over
threshold.

File size: --file-size-delta-tolerance-pct (default 5) +
--file-size-delta-tolerance-floor (default 10). The drop rule is
"growth ≤ max(floor, base × pct%)". Percentage scales with file size:
+99 lines on a 4382-line file (2.3%) is forgiven, but +298 on a
2031-line file (14.7%) still flags. The floor stops "+8 to a 600-line
file" (which is 1.3%) from over-firing on small files where %-of-base
collapses below the noise level.

Tolerances live behind a new sizes.DeltaTolerances struct so adding
language-specific or per-rule overrides later is local. Zero value =
strict (preserves previous behavior for any caller that doesn't
populate it).

On the bor blockstm_redesign branch, this drops 7 further findings:
  files dropped: blockchain.go +99 (2.3%), instructions.go +16 (1.5%),
                 state_transition.go +11 (1.4%), trie_prefetcher.go +18 (3.4%)
  funcs dropped: stateTransition.execute +1, NewEVM +4, updateTrie (no growth),
                 preloadAddressAsync +4
  kept legitimately: statedb.go +298 (14.7%), pps.go +705 (145%),
                     trie.go +59 (6.8%), reader.go +76 (10.8%), evm.go +57 (7.7%),
                     plus brand-new files / functions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cffls
Copy link
Copy Markdown
Collaborator Author

cffls commented May 5, 2026

Pushed 04617aa: size-delta tolerances.

New defaults:

Rule Tolerance Flag
Per-function size 5 lines --function-size-delta-tolerance
Per-file size max(10 lines, 5% of base) --file-size-delta-tolerance-pct, --file-size-delta-tolerance-floor

The percentage rule scales with file size — +99 lines on a 4382-line file (2.3%) is forgiven, but +298 on a 2031-line file (14.7%) still fails. The floor stops tiny additions to small files from collapsing below the noise level when % gives a sub-10-line threshold.

On the bor branch this drops 7 further findings on top of the prior delta gate:

  • Files dropped: blockchain.go +99 (2.3%), instructions.go +16 (1.5%), state_transition.go +11 (1.4%), trie_prefetcher.go +18 (3.4%).
  • Functions dropped: stateTransition.execute +1, NewEVM +4, updateTrie (no growth), preloadAddressAsync +4.
  • Kept legitimately: statedb.go +298 (14.7%), pps.go +705 (145%), trie.go +59 (6.8%), reader.go +76 (10.8%), evm.go +57 (7.7%), plus brand-new files / functions.

The tolerances live behind a new sizes.DeltaTolerances struct so per-rule overrides stay local. Zero value preserves strict behavior for any caller that doesn't populate it.

Self-check passes (T1 mutation 94.6%, T2 77.8%, overall 78.9%).

Copy link
Copy Markdown
Member

@pratikspatil024 pratikspatil024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also update the README with the new flags?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants