feat: delta gating — drop pre-existing complexity/size violations in diff mode#9
feat: delta gating — drop pre-existing complexity/size violations in diff mode#9
Conversation
In diff mode, complexity and size gates now compare each finding to its value at the merge-base and drop findings that did not get worse on this branch. A PR that touches a 4000-line legacy file or a complexity=91 function without making it bigger or more complex no longer fails CI on metrics it inherited. How it works: - diff.Result.MergeBase is populated by Parse (empty for refactoring mode via CollectPaths). - diff.ShowAtRef fetches a file's contents at the merge-base, returning (nil, nil) for paths absent from the tree and bubbling up errors for bad refs / non-repos so a misconfigured CI doesn't silently weaken the gate. - internal/baseline writes the base content to a temp file (extension preserved) so existing language analyzers run unchanged with a full-coverage synthetic FileChange. - complexity.Analyze and sizes.Analyze drop findings where base ≥ head (function: complexity / line count; file: line count). New files/functions absent from base are kept so absolute thresholds still gate genuinely new debt. Refactoring mode (--paths) leaves MergeBase empty and falls through to absolute thresholds — no behavior change there. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mplexity - Findings now render '(+N vs base)' when the function/file existed at the merge-base, so PR authors can tell "added new hot code" from "made existing hot code hotter". Brand-new code stays with the bare message. - New --complexity-delta-tolerance flag (default 3) — head must exceed base by *more than* tolerance to count as worsened. Forgives the +1/+2/+3 noise that's typical when threading a parameter through a legacy hot function. Size deltas remain strict (tolerance=0); cheap to add later if needed. - The orchestrators thread per-file/per-function delta maps to the formatters; helper extraction (recordDelta, recordFuncDelta) keeps applyDeltaFilter / filterFuncSizes under the cognitive-complexity threshold. - New tests cover formatComplexityMsg / formatFuncSizeMsg / formatFileSizeMsg (lock subtraction direction so the head-base operand order doesn't silently flip), and the previously-untested internal/baseline package. On the bor blockstm_redesign branch (vs origin/develop), tolerance=3 drops three +1..+3 regressions (ProcessBlock, subfetcher.loop, stateTransition.execute) while keeping legitimate +5/+4 worsenings (IntermediateRoot, getStateObject) and the brand-new block-stm v2 functions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Pushed two follow-ups (b31e892): Delta in messages — findings now render
Tested on bor's blockstm_redesign branch (vs Self-check passes (T1 mutation 100%, T2 75%, overall 80%). |
…d findings
Previous commit reassigned the analyzer's full result list to the
post-delta-filter subset, which corrupted the section's mean / median /
max / total-functions stats. Worse: when the filter dropped *every*
finding (e.g. the only complex function in the diff was untouched), the
section reported "No changed functions to analyze" — which is the
empty-input case, not the all-clear case. Surfaced via TS/Rust smoke
tests where a single-file diff has exactly one analyzable function.
Both Analyze functions now thread two lists: `analyzed` (everything the
language calculator returned, used for stats) and `candidates` (the
post-filter subset eligible to become findings). buildSection stays a
single function, taking both.
Also adds multi-language regression tests:
- internal/complexity/multilang_delta_test.go covers TypeScript and
Rust delta gating end-to-end.
- internal/sizes/multilang_delta_test.go does the same for sizes.
Both confirm that the orchestrator drives whichever
ComplexityCalculator / FunctionExtractor each language registers, and
that the temp-file-with-preserved-extension trick works for the
tree-sitter analyzers as well as Go's go/parser.
On the bor blockstm_redesign branch (vs origin/develop), the stats now
read "285 functions analyzed | Max: 91 | 7 over threshold" — the full
diff is described, with 7 violations vs the original 17.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Good catch — the orchestrator was language-agnostic in principle, but I hadn't actually exercised the TS/Rust paths and ran straight into a related bug. Pushed cbc2481: Bug fix: previous commits reassigned the analyzer's full result list to the post-filter subset, which corrupted the stats line. When the filter dropped every finding (e.g. a single-file TS diff where the only complex function was untouched), the section reported "No changed functions to analyze" — the empty-input message, not "all clear". Both Multi-language regression tests: On the bor branch the stats line now reads "285 functions analyzed | Max: 91 | 7 over threshold" — full diff described, 7 violations (vs the original 17). Self-check still passes (T1 100%, T2 75%, overall 80%). |
Function size: --function-size-delta-tolerance flag, default 5 lines.
Threading a parameter through a 60-line function or adding a couple of
defensive guards no longer fails on a function that was already over
threshold.
File size: --file-size-delta-tolerance-pct (default 5) +
--file-size-delta-tolerance-floor (default 10). The drop rule is
"growth ≤ max(floor, base × pct%)". Percentage scales with file size:
+99 lines on a 4382-line file (2.3%) is forgiven, but +298 on a
2031-line file (14.7%) still flags. The floor stops "+8 to a 600-line
file" (which is 1.3%) from over-firing on small files where %-of-base
collapses below the noise level.
Tolerances live behind a new sizes.DeltaTolerances struct so adding
language-specific or per-rule overrides later is local. Zero value =
strict (preserves previous behavior for any caller that doesn't
populate it).
On the bor blockstm_redesign branch, this drops 7 further findings:
files dropped: blockchain.go +99 (2.3%), instructions.go +16 (1.5%),
state_transition.go +11 (1.4%), trie_prefetcher.go +18 (3.4%)
funcs dropped: stateTransition.execute +1, NewEVM +4, updateTrie (no growth),
preloadAddressAsync +4
kept legitimately: statedb.go +298 (14.7%), pps.go +705 (145%),
trie.go +59 (6.8%), reader.go +76 (10.8%), evm.go +57 (7.7%),
plus brand-new files / functions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Pushed 04617aa: size-delta tolerances. New defaults:
The percentage rule scales with file size — +99 lines on a 4382-line file (2.3%) is forgiven, but +298 on a 2031-line file (14.7%) still fails. The floor stops tiny additions to small files from collapsing below the noise level when % gives a sub-10-line threshold. On the bor branch this drops 7 further findings on top of the prior delta gate:
The tolerances live behind a new Self-check passes (T1 mutation 94.6%, T2 77.8%, overall 78.9%). |
pratikspatil024
left a comment
There was a problem hiding this comment.
Should we also update the README with the new flags?
Summary
A PR that touches a legacy 4000-line file or a complexity=91 function should not be blamed for that file's size or that function's complexity. Today, diffguard's diff mode picks any function whose line range overlaps the diff and evaluates the absolute metric against the threshold, so legacy code becomes a permanent blocker.
This PR adds delta gating: in diff mode, complexity and size findings are dropped when the metric at the merge-base was already at or close to the head value. Brand-new files/functions absent at base are kept as-is, so absolute thresholds still gate genuinely new debt. Refactoring mode (
--paths) is unchanged.Motivated by bor's Quality metrics CI, which reports 17 complexity violations and 19 size violations on a PR that touches
core/blockchain.go,core/state/statedb.go, etc. — most of which are inherited from the touched files, not caused by the PR.Defaults
--complexity-delta-tolerance--function-size-delta-tolerance--file-size-delta-tolerance-pct,--file-size-delta-tolerance-floorThe complexity tolerance accounts for cognitive complexity's nesting penalty: threading a parameter through one extra branch on a deeply-nested function easily nudges +1..+3 with nothing materially worse happening. The file-size percentage scales with file size — +99 lines on a 4400-line file is 2% (forgiven), but +298 on a 2000-line file is 15% (still flagged); the floor stops tiny additions to small files from collapsing below the noise level.
How it works
diff.Result.MergeBaseis now populated byParse(empty forCollectPaths/ refactoring mode).diff.ShowAtRef(repo, ref, path)fetches a file's contents atref, returning(nil, nil)for paths absent from the tree and bubbling up errors for bad refs / non-repos. We disambiguate via stderr ("does not exist in") becausegit showexits 128 for several conditions; a misconfigured CI must not silently weaken the gate.internal/baselinewrites the base content to a temp file (extension preserved) so existing language analyzers run unchanged with a full-coverage syntheticFileChange. No language interface changes — works for Go, TypeScript and Rust via the same orchestrator.complexity.Analyzeandsizes.Analyzeseparateanalyzed(full set, used for stats / mean / median / max) fromcandidates(post-filter subset, used for findings) so the section header still describes the whole diff even when the gate drops every legacy violation.(+N vs base)when the function/file existed at base; brand-new code keeps the bare form so the report distinguishes "added new hot code" from "made existing hot code hotter".Behavior matrix
Complexity (threshold 10, tolerance +3)
complexity=77 (+5 vs base)complexity=20Function size (threshold 50, tolerance +5)
function=80 lines (+20 vs base)function=60 linesFile size (threshold 500, tolerance max(10, 5%))
file=650 lines (+50 vs base)file=2298 lines (+298 vs base)Mode
--base <ref>)--paths)Bor result (vs origin/develop)
Stats line now reads
285 functions analyzed | Max: 91 | 7 over threshold— the full diff is described even when delta gating drops most legacy violations.Tests
complexity_delta_test.go— pre-existing dropped, increased kept, new function kept, new file kept, tolerance boundary; table-drivenworsenedpredicate.sizes_delta_test.go— same matrix for per-function and per-file size; table-drivengrewFunc,grewFile,fileGrowthTolerance.multilang_delta_test.go(in both packages) — TypeScript + Rust end-to-end.show_at_ref_test.go— happy path, absent path returns(nil, nil), bad ref / non-repo surface errors.internal/baseline/baseline_test.go—FetchToTempextension preservation, absent / bad-ref handling.Test plan
make buildcleanmake testall green./diffguard .passes (T1 mutation 94.6%, T2 77.8%, overall 78.9%)🤖 Generated with Claude Code