test: add summary-cache pytest-benchmark suite with CI regression gate by clean6378-max-it · Pull Request #120 · cppalliance/cppa-cursor-browser

clean6378-max-it · 2026-06-24T19:20:01Z

Closes #115

Summary

Adds a pytest-benchmark suite for services/summary_cache.py with synthetic fixtures and a CI regression gate. No production code changes.

New tests/benchmarks/test_summary_cache_bench.py measures cache hit, cache miss, fingerprint cost (10/50/200 workspace entries), and round-trip write+read latency
New tests/benchmarks/conftest.py builds synthetic workspace trees and isolates cache files to temp dirs (no real Cursor storage)
benchmarks/baselines.json records initial gated means (1.5× slack from local reference run; refresh from ubuntu-latest CI artifact after first green job)
scripts/check_benchmark_regression.py fails CI when any benchmark mean exceeds baseline by >20%
pyproject.toml: add pytest-benchmark dev dep; default --benchmark-skip so normal pytest stays fast
.github/workflows/tests.yml: new benchmarks job on ubuntu-latest

Sprint item #7 (June Week 4, Wednesday). Sequences before Thursday #8 (unified perf benchmark suite).

Test plan

pytest tests/benchmarks/test_summary_cache_bench.py --benchmark-only -o addopts=
python scripts/check_benchmark_regression.py benchmark-results.json benchmarks/baselines.json
pytest tests/benchmarks/test_summary_cache_bench.py -o addopts= (benchmarks skipped via --benchmark-skip)
python -m unittest discover tests -q
Confirm benchmarks CI job is green on ubuntu-latest

Summary by CodeRabbit

New Features
- Added performance benchmarks for summary-cache hot paths (cache hit/miss, fingerprinting, and cache round-trip).
Bug Fixes
- Prevented benchmark/integration runs from inheriting unrelated pytest options.
Chores
- Added CI benchmarking job that generates results, applies a regression gate against stored baselines, and uploads the results artifact; refreshed benchmarks and expanded ignored artifacts.
- Updated pytest configuration and development tooling to include benchmark support.
Tests
- Added a regression-gate test suite and benchmark fixtures for synthetic workspace data.

coderabbitai · 2026-06-24T19:20:19Z

Warning

Review limit reached

@clean6378-max-it, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 8 minutes and 33 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more credits in the billing tab to continue.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 809b3faf-d20d-49e4-9994-baa39a6702dd

📥 Commits

Reviewing files that changed from the base of the PR and between 83bb248 and b1c2f93.

📒 Files selected for processing (2)

scripts/check_benchmark_regression.py
tests/test_check_benchmark_regression.py

📝 Walkthrough

Walkthrough

Adds a summary-cache pytest-benchmark suite, records baseline timings, adds a regression-checking script, and wires the benchmarks into CI with artifact upload and ignore rules.

Changes

Summary Cache Benchmark Validation

Layer / File(s)	Summary
Benchmark config and fixtures `pyproject.toml`, `tests/benchmarks/conftest.py`	Adds `pytest-benchmark` to the dev dependency set, configures pytest benchmark defaults and marker registration, and adds synthetic workspace fixtures for summary-cache benchmarks.
Benchmark cases `tests/benchmarks/test_summary_cache_bench.py`	Adds benchmark cases for cache hits, cache misses, workspace fingerprinting at 10/50/200 entries, and a set-then-get round trip.
Regression script and baselines `benchmarks/baselines.json`, `scripts/check_benchmark_regression.py`	Refreshes stored benchmark timings and adds a CLI that loads benchmark results, compares means to stored baselines, and reports regressions or malformed inputs.
CI benchmark job `.github/workflows/tests.yml`, `.gitignore`	Adds a workflow job that runs the benchmarks, checks regression output, uploads `benchmark-results.json`, and ignores generated benchmark artifacts.

Sequence Diagram(s)

sequenceDiagram
  participant BenchmarksJob as benchmarks job
  participant Pytest as pytest
  participant RegressionScript as scripts/check_benchmark_regression.py
  participant ArtifactUpload as actions/upload-artifact

  BenchmarksJob->>Pytest: run tests/benchmarks/ with --benchmark-only
  Pytest-->>BenchmarksJob: benchmark-results.json
  BenchmarksJob->>RegressionScript: compare benchmark-results.json to benchmarks/baselines.json
  RegressionScript-->>BenchmarksJob: exit code 0, 1, or 2
  BenchmarksJob->>ArtifactUpload: upload benchmark-results.json with if: always()

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

wpak-ai
bradjin8

Poem

(_/)
I hopped through cache with careful thump,
Measured misses, hits, and every jump.
Baselines twinkled, CI said “hooray,”
And regression gates held firm today.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Out of Scope Changes check	⚠️ Warning	requirements-lock.txt includes an unrelated click pin update that is outside the benchmark-validation scope.	Remove the click version bump unless it is required for the benchmark changes and document that dependency update separately.
Docstring Coverage	⚠️ Warning	Docstring coverage is 18.75% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the new summary-cache pytest-benchmark suite and CI regression gate.
Linked Issues check	✅ Passed	The PR covers the required synthetic benchmarks, baselines, and CI regression gate for summary-cache without changing production code.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch test/summary-cache-benchmarks

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

scripts/check_benchmark_regression.py (1)
95-99: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

A missing benchmark result silently passes the gate.

When a baseline benchmark has no current result, the script only prints a WARN and continues. If a gated benchmark is renamed, removed, or fails to run, the regression gate will report success and the performance coverage silently disappears. Consider treating a missing gated benchmark as a failure (or a separate exit code) so coverage gaps are surfaced.
♻️ Possible approach
     failures: list[str] = []
+    missing: list[str] = []
     for name, base in baseline_means.items():
         cur = flat.get(name)
         if cur is None:
-            print(f"WARN: no current result for baseline {name!r}; skipping")
+            print(f"MISSING: no current result for baseline {name!r}")
+            missing.append(name)
             continue
Then fail (or warn-as-error) when missing is non-empty, per your policy.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/check_benchmark_regression.py` around lines 95 - 99, The benchmark
gating in check_benchmark_regression.py currently skips missing current results
in the baseline loop, which can let renamed, removed, or failing benchmarks pass
unnoticed. Update the logic around the baseline_means/flat comparison so missing
gated benchmarks are collected and treated as a failure (or a distinct non-zero
exit condition) instead of only printing a warning. Make the behavior explicit
in the final exit path of the script so coverage gaps are surfaced alongside the
existing regression checks.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmarks/baselines.json`:
- Around line 1-15: The benchmark baselines in the summary-cache group are still
using Windows-derived values with 1.5x slack, but the regression gate runs on
ubuntu-latest. Refresh the values from a green ubuntu-latest benchmark artifact
and keep the _note consistent with the source platform; if platform-matched
baselines are not yet available, make the gate non-blocking until they are.
Focus on the baselines.json entries under groups.summary-cache so the
check_benchmark_regression.py comparisons are against the same OS.

In `@pyproject.toml`:
- Around line 39-44: The pytest configuration currently applies --benchmark-skip
globally in [tool.pytest.ini_options], which breaks the regular integration
pytest run because pytest-benchmark is not installed there. Update the test
setup so the benchmark plugin is available wherever this addopts setting is
used, or move the --benchmark-skip option to only the benchmark-specific job;
check the pytest configuration and the workflows that invoke python -m pytest,
especially the benchmark-related markers and options.

---

Nitpick comments:
In `@scripts/check_benchmark_regression.py`:
- Around line 95-99: The benchmark gating in check_benchmark_regression.py
currently skips missing current results in the baseline loop, which can let
renamed, removed, or failing benchmarks pass unnoticed. Update the logic around
the baseline_means/flat comparison so missing gated benchmarks are collected and
treated as a failure (or a distinct non-zero exit condition) instead of only
printing a warning. Make the behavior explicit in the final exit path of the
script so coverage gaps are surfaced alongside the existing regression checks.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1cc8514e-6127-42e8-95a6-134382f2ed25

📥 Commits

Reviewing files that changed from the base of the PR and between 793b6f2 and c126227.

📒 Files selected for processing (7)

.github/workflows/tests.yml
.gitignore
benchmarks/baselines.json
pyproject.toml
scripts/check_benchmark_regression.py
tests/benchmarks/conftest.py
tests/benchmarks/test_summary_cache_bench.py

clean6378-max-it · 2026-06-24T19:44:57Z

@CodeRabbit, review this PR in more depth

coderabbitai · 2026-06-24T19:47:28Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== .github/workflows/tests.yml ==="
cat -n .github/workflows/tests.yml

echo ""
echo "=== pyproject.toml ==="
cat -n pyproject.toml

echo ""
echo "=== scripts/check_benchmark_regression.py ==="
cat -n scripts/check_benchmark_regression.py

echo ""
echo "=== tests/benchmarks/conftest.py ==="
cat -n tests/benchmarks/conftest.py

echo ""
echo "=== tests/benchmarks/test_summary_cache_bench.py ==="
cat -n tests/benchmarks/test_summary_cache_bench.py

echo ""
echo "=== benchmarks/baselines.json ==="
cat -n benchmarks/baselines.json

echo ""
echo "=== .gitignore (last 20 lines) ==="
tail -20 .gitignore

echo ""
echo "=== requirements-lock.txt ==="
cat -n requirements-lock.txt

echo ""
echo "=== services/summary_cache.py ==="
cat -n services/summary_cache.py

echo ""
echo "=== Directory tree of tests/benchmarks/ ==="
tree tests/benchmarks/ 2>/dev/null || find tests/benchmarks/ -type f

echo ""
echo "=== Directory tree of scripts/ ==="
tree scripts/ 2>/dev/null || find scripts/ -type f

echo ""
echo "=== benchmarks/ directory ==="
tree benchmarks/ 2>/dev/null || find benchmarks/ -type f

Length of output: 44058

@clean6378-max-it Here is a deep review of PR #120.

Deep Review: PR `#120` — pytest-benchmark suite for `summary_cache.py`

The PR is well-structured overall — no production code is touched, the fixture isolation via monkeypatch is correct, and the regression gate script has solid error-handling. However, there are several issues ranging from critical to minor that should be addressed before merge.

🔴 Critical

1. Baseline name format may never match pytest-benchmark output (`benchmarks/baselines.json` + `scripts/check_benchmark_regression.py`)

The regression gate in check_regression() (Line 96) looks up current results by name from the baselines. The baselines use short names like test_summary_cache_hit and test_fingerprint_workspace_entries[10].

pytest-benchmark emits the name field as the short test node name (e.g. test_summary_cache_hit) in some configurations, but in others it emits the full node ID including the module path (tests/benchmarks/test_summary_cache_bench.py::test_summary_cache_hit). If the format doesn't match, every baseline entry will show as MISSING and the gate will always fail (exit code 1) — or always pass if the lookup silently misses.

You should verify the exact format by inspecting an actual benchmark-results.json produced by the CI job and cross-referencing the name field against the keys in baselines.json. Consider adding a normalization step in load_results() that strips the module prefix:

# In load_results(), after extracting `name`:
# Normalize "tests/benchmarks/test_x.py::test_foo[param]" → "test_foo[param]"
name = str(name).rsplit("::", 1)[-1]

2. Baselines appear manually fabricated — future-dated and unverified (`benchmarks/baselines.json`)

"updated": "2026-06-25T12:00:00Z",

The timestamp is in the future and the _note claims these values come from "ubuntu-latest CI benchmark-results.json", but there is no CI artifact proving that. Values like 6.3e-05 for both cache hit and cache miss (identical to 6 decimal places) suggest they were hand-written, not measured. If these baselines are too tight for a slower runner, the gate will spuriously fail; if too loose, the gate provides no real protection.

Recommendation: Do not commit hand-crafted baselines. Run the benchmarks in CI first, capture the artifact, update the file from the actual numbers, and commit that as a follow-up.

3. `benchmarks` CI job has no `needs:` dependency on `unittest` (`.github/workflows/tests.yml` Line 219)

The benchmark job runs in parallel with unit tests. If unit tests fail due to a production code regression, benchmarks can still be reported as "passing" — giving a misleading green signal.

# Add this to the benchmarks job:
needs: [unittest]

🟡 Medium

4. `addopts = "--benchmark-disable"` is missing from `pyproject.toml` (Line 39–43)

The PR objectives state: "adds default benchmark skipping for normal test runs." But the [tool.pytest.ini_options] section has no addopts at all:

[tool.pytest.ini_options]
testpaths = ["tests"]
markers = [
    "benchmark: performance benchmarks (pytest-benchmark)",
]

Since testpaths includes tests/, tests/benchmarks/ is collected on every plain pytest invocation. Without --benchmark-disable, pytest-benchmark will execute benchmark functions normally (one call each, with timing overhead). This slows down the developer loop and contradicts the stated goal.

[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "--benchmark-disable"
markers = [
    "benchmark: performance benchmarks (pytest-benchmark)",
]

Note: the CI benchmark job already overrides this with -o addopts=, so adding it won't break CI.

5. `sys.path` manipulation in `conftest.py` is fragile (Lines 12–14)

REPO_ROOT = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
if REPO_ROOT not in sys.path:
    sys.path.insert(0, REPO_ROOT)

This is three levels of dirname that breaks silently if the file moves. The idiomatic solution is to install the package in editable mode in CI (pip install -e .) and remove the manual path insertion entirely. Alternatively, the [tool.pytest.ini_options] section can set pythonpath = ["."] (pytest ≥ 7.0):

[tool.pytest.ini_options]
pythonpath = ["."]

6. `pytest-benchmark` not pinned in `requirements-lock.txt` (`.github/workflows/tests.yml` Line 236)

python -m pip install 'pytest>=8,<9' 'pytest-benchmark>=4,<5'

All other runtime deps are resolved from requirements-lock.txt for determinism, but pytest-benchmark is installed ad-hoc. A transitive dep (e.g. py-cpuinfo, attrs) could change versions silently. pytest-benchmark should be pinned in the lock file, or at minimum pinned to a specific patch version in the workflow.

7. `stale_fingerprint` relies on an undocumented assumption (`tests/benchmarks/conftest.py` Line 88–89)

`@pytest.fixture`
def stale_fingerprint(workspace_fingerprint: dict[str, Any]) -> dict[str, Any]:
    return {**workspace_fingerprint, "global_db_mtime_ns": 9_999_999_999}

This works because global_db_path=None causes _file_mtime_ns(None) to return None, making the real fingerprint's global_db_mtime_ns equal to None. Setting it to 9_999_999_999 creates a mismatch. But this is completely implicit — if fingerprint_workspace_storage ever changes its handling of None paths, this fixture silently stops simulating a miss.

A more robust approach is to use a sentinel that always differs:

`@pytest.fixture`
def stale_fingerprint(workspace_fingerprint: dict[str, Any]) -> dict[str, Any]:
    """Return a fingerprint guaranteed to differ from the stored one."""
    return {**workspace_fingerprint, "rules_digest": "deadbeefdeadbeef"}

This works regardless of what global_db_mtime_ns is.

🔵 Minor

8. Missing `tests/benchmarks/init.py`

tests/benchmarks/
├── conftest.py
└── test_summary_cache_bench.py

The directory has no __init__.py. While modern pytest handles this correctly, the rest of tests/ likely uses __init__.py files (consistent with python -m unittest discover). Adding one avoids potential namespace collisions if identically-named modules exist elsewhere.

9. `load_baseline_means` silently skips non-dict group values (`scripts/check_benchmark_regression.py` Line 69)

if not isinstance(value, dict):
    continue  # silently skips

A typo in baselines.json (e.g. a string value for a group) is silently ignored, which could cause a group's benchmarks to never be gated. Consider issuing a warning or raising BenchmarkDataError.

10. Redundant `str()` cast in `load_results` (`scripts/check_benchmark_regression.py` Line 43)

name = str(name)  # name was already set from entry["name"] above

name is set on Line 37 directly from entry["name"], which would have been accepted as str already. The redundant cast can be removed.

11. Missing blank line before first test function (`tests/benchmarks/test_summary_cache_bench.py` Line 16)

PEP 8 requires two blank lines between the import block and the first top-level definition:

from services.summary_cache import (
    fingerprint_workspace_storage,
    get_cached_projects,
    set_cached_projects,
)
                                         # ← two blank lines needed here
`@pytest.mark.benchmark`(group="summary-cache")
def test_summary_cache_hit(...):

12. No unit tests for `check_benchmark_regression.py`

The regression gate script has good internal logic (load_results, load_baseline_means, check_regression) but zero test coverage. Given that this script is the only thing blocking regressions in CI, a small test file (e.g. tests/test_check_benchmark_regression.py) covering the happy path, missing-benchmark, and threshold-exceeded cases would greatly increase confidence.

Summary table

Severity	Issue	File
🔴 Critical	Baseline name format may mismatch pytest-benchmark output	`baselines.json`, `check_benchmark_regression.py`
🔴 Critical	Baselines are future-dated and appear manually fabricated	`baselines.json`
🔴 Critical	`benchmarks` job has no `needs: [unittest]`	`tests.yml`
🟡 Medium	Missing `addopts = "--benchmark-disable"`	`pyproject.toml`
🟡 Medium	Fragile `sys.path` manipulation	`conftest.py`
🟡 Medium	`pytest-benchmark` not pinned in lock file	`tests.yml`, `requirements-lock.txt`
🟡 Medium	`stale_fingerprint` relies on implicit `None` assumption	`conftest.py`
🔵 Minor	Missing `__init__.py` in `tests/benchmarks/`	—
🔵 Minor	Silent skip of non-dict group values	`check_benchmark_regression.py`
🔵 Minor	Redundant `str()` cast	`check_benchmark_regression.py`
🔵 Minor	Missing blank line after imports	`test_summary_cache_bench.py`
🔵 Minor	No unit tests for the regression gate script	—

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/check_benchmark_regression.py`:
- Around line 17-19: The normalize_benchmark_name helper currently uses
rsplit("::", 1), which can incorrectly strip parts of parametrized benchmark
names when parameter values include "::". Update normalize_benchmark_name to
split on the first "::" only so pytest node prefixes are removed without
truncating names like test_x[param::v], and keep the behavior consistent for
both short and full benchmark names.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c745c7f7-791c-4802-9b4a-d7dc19639498

📥 Commits

Reviewing files that changed from the base of the PR and between e4d79a6 and 83bb248.

📒 Files selected for processing (8)

.github/workflows/tests.yml
benchmarks/baselines.json
pyproject.toml
scripts/check_benchmark_regression.py
tests/benchmarks/__init__.py
tests/benchmarks/conftest.py
tests/benchmarks/test_summary_cache_bench.py
tests/test_check_benchmark_regression.py

✅ Files skipped from review due to trivial changes (1)

benchmarks/baselines.json

🚧 Files skipped from review as they are similar to previous changes (2)

tests/benchmarks/test_summary_cache_bench.py
.github/workflows/tests.yml

bradjin8 · 2026-06-24T21:07:19Z

        # `pytest tests/` — to avoid re-collecting unittest.TestCase classes above.
-        run: python -m pytest tests/test_api_search.py tests/test_api_workspaces.py tests/test_api_export.py tests/test_pdf_export.py tests/test_search_helpers.py -v --tb=short
+        # -o addopts= avoids inheriting benchmark-only options from pyproject.toml.
+        run: python -m pytest tests/test_api_search.py tests/test_api_workspaces.py tests/test_api_export.py tests/test_pdf_export.py tests/test_search_helpers.py -v --tb=short -o addopts=


tests/test_check_benchmark_regression.py (15 tests for the regression gate) is not run in CI. unittest discover won't collect pytest functions, and the integration pytest step lists only API/search files. Add this file to the pytest step or run pytest tests/test_check_benchmark_regression.py -o addopts= so gate logic is covered on every matrix run.

bradjin8 · 2026-06-24T21:07:38Z

 # colorama via click) are omitted — pip still installs them on Windows when needed.
 blinker==1.9.0            # via flask
-click==8.4.1              # via flask
+click==8.4.2              # via flask


Unrelated click==8.4.1 → 8.4.2 bump is out of scope for a benchmarks-only PR. Revert unless there's a documented dependency reason; otherwise it adds review nois

bradjin8 · 2026-06-24T21:07:54Z


+[tool.pytest.ini_options]
+pythonpath = ["."]
+addopts = "--benchmark-disable"


PR description says --benchmark-skip but config uses --benchmark-disable. With disable, benchmark tests still run once on a bare pytest (functional smoke, ~0.4s). Either align the docs or switch to --benchmark-skip if the goal is zero benchmark execution during normal pytest.

bradjin8 · 2026-06-24T21:08:08Z

+[tool.pytest.ini_options]
+pythonpath = ["."]
+addopts = "--benchmark-disable"
+testpaths = ["tests"]


New [tool.pytest.ini_options] with addopts = "--benchmark-disable" requires pytest-benchmark installed for any default pytest invocation. CI handles this; document in PR/issue that devs need pip install -e ".[dev]" (or pytest-benchmark) before running pytest without -o addopts=.

bradjin8 · 2026-06-24T21:08:29Z

+@pytest.mark.benchmark(group="summary-cache")
+def test_summary_cache_miss(
+    benchmark,
+    summary_cache_dir: Path,
+    workspace_fingerprint: dict[str, Any],
+    stale_fingerprint: dict[str, Any],
+    sample_projects: list[dict[str, Any]],
+) -> None:
+    set_cached_projects(workspace_fingerprint, sample_projects, [])
+    benchmark(get_cached_projects, stale_fingerprint)


test_summary_cache_hit and test_summary_cache_miss measure nearly identical paths (~63µs on CI); both baselines are 6.3e-05. Consider merging into one "cache lookup" benchmark with hit/miss as subcases, or document that miss means fingerprint-mismatch only (not rebuild cost).

bradjin8 · 2026-06-24T21:08:46Z

+@pytest.fixture
+def summary_cache_dir(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> Path:
+    """Redirect summary-cache files to an isolated temp directory.
+
+    Patches ``CACHE_DIR`` (also used by tab-summary paths via ``_tab_summaries_path``)
+    plus the projects/composer-map file constants used by current benchmarks.
+    """
+    cache_dir = tmp_path / "cache"
+    cache_dir.mkdir()
+    monkeypatch.setattr(summary_cache, "CACHE_DIR", cache_dir)
+    monkeypatch.setattr(summary_cache, "PROJECTS_CACHE_FILE", cache_dir / "projects.json")
+    monkeypatch.setattr(
+        summary_cache,
+        "COMPOSER_MAP_CACHE_FILE",
+        cache_dir / "composer-id-to-ws.json",
+    )


summary_cache_dir patches PROJECTS_CACHE_FILE and COMPOSER_MAP_CACHE_FILE but not tab-summary paths (_tab_summaries_path uses CACHE_DIR prefix only). Fine for current scope; add a comment that tab-summary benchmarks belong in issue #8's unified suite.

bradjin8 · 2026-06-24T21:08:58Z

+{
+  "_note": "Gated means from ubuntu-latest CI benchmark-results.json (PR #120, run 28123677675). Refresh: pytest tests/benchmarks/ --benchmark-only --benchmark-json=benchmark-results.json -o addopts=",


_note refresh command is truncated (-o addopts= with no value). Complete the example: -o addopts= --benchmark-json=benchmark-results.json.

bradjin8 · 2026-06-24T21:09:13Z

+      - name: Run summary-cache benchmarks
+        run: >
+          python -m pytest tests/benchmarks/
+          --benchmark-only
+          --benchmark-json=benchmark-results.json
+          --benchmark-columns=min,max,mean,stddev,rounds
+          -o addopts=


Consider pinning --benchmark-min-rounds (e.g. 5) to reduce CI timing variance near the 20% threshold, especially for sub-millisecond hit/miss benchmarks.

bradjin8 · 2026-06-24T21:09:48Z

+        if base == 0:
+            print(f"WARN: baseline for {name!r} is zero; skipping ratio check")
+            continue
+        ratio = cur / base
+        tag = "FAIL" if ratio > threshold else "ok"
+        print(f"[{tag}] {name}: {cur:.6f}s vs {base:.6f}s ({ratio:.2f}x)")
+        if ratio > threshold:
+            failures.append(name)
+


New benchmarks without baselines only WARN; gated missing benchmarks correctly FAIL (fixed in 83bb248). Consider also failing when all current results are faster than baseline by >50% (stale baseline drift), or document a baseline-refresh workflow in benchmarks/baselines.json.

test: add summary-cache pytest-benchmark suite with CI regression gate

c126227

coderabbitai Bot reviewed Jun 24, 2026

View reviewed changes

Comment thread benchmarks/baselines.json

Comment thread pyproject.toml

fix(ci): unblock unittest matrix and seed ubuntu benchmark baselines

e4d79a6

fix(test): address PR #120 benchmark review (gate, CI, baselines)

83bb248

coderabbitai Bot reviewed Jun 24, 2026

View reviewed changes

Comment thread scripts/check_benchmark_regression.py Outdated

fix(test): normalize benchmark names without truncating param values

b1c2f93

bradjin8 requested changes Jun 24, 2026

View reviewed changes

		{
		"_note": "Gated means from ubuntu-latest CI benchmark-results.json (PR #120, run 28123677675). Refresh: pytest tests/benchmarks/ --benchmark-only --benchmark-json=benchmark-results.json -o addopts=",

Uh oh!

Conversation

clean6378-max-it commented Jun 24, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested reviewers

Poem

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

clean6378-max-it commented Jun 24, 2026

Uh oh!

coderabbitai Bot commented Jun 24, 2026

Deep Review: PR #120 — pytest-benchmark suite for summary_cache.py

🔴 Critical

1. Baseline name format may never match pytest-benchmark output (benchmarks/baselines.json + scripts/check_benchmark_regression.py)

2. Baselines appear manually fabricated — future-dated and unverified (benchmarks/baselines.json)

3. benchmarks CI job has no needs: dependency on unittest (.github/workflows/tests.yml Line 219)

🟡 Medium

4. addopts = "--benchmark-disable" is missing from pyproject.toml (Line 39–43)

5. sys.path manipulation in conftest.py is fragile (Lines 12–14)

6. pytest-benchmark not pinned in requirements-lock.txt (.github/workflows/tests.yml Line 236)

7. stale_fingerprint relies on an undocumented assumption (tests/benchmarks/conftest.py Line 88–89)

🔵 Minor

8. Missing tests/benchmarks/__init__.py

9. load_baseline_means silently skips non-dict group values (scripts/check_benchmark_regression.py Line 69)

10. Redundant str() cast in load_results (scripts/check_benchmark_regression.py Line 43)

11. Missing blank line before first test function (tests/benchmarks/test_summary_cache_bench.py Line 16)

12. No unit tests for check_benchmark_regression.py

Summary table

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

clean6378-max-it commented Jun 24, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 24, 2026 •

edited

Loading

Deep Review: PR `#120` — pytest-benchmark suite for `summary_cache.py`

1. Baseline name format may never match pytest-benchmark output (`benchmarks/baselines.json` + `scripts/check_benchmark_regression.py`)

2. Baselines appear manually fabricated — future-dated and unverified (`benchmarks/baselines.json`)

3. `benchmarks` CI job has no `needs:` dependency on `unittest` (`.github/workflows/tests.yml` Line 219)

4. `addopts = "--benchmark-disable"` is missing from `pyproject.toml` (Line 39–43)

5. `sys.path` manipulation in `conftest.py` is fragile (Lines 12–14)

6. `pytest-benchmark` not pinned in `requirements-lock.txt` (`.github/workflows/tests.yml` Line 236)

7. `stale_fingerprint` relies on an undocumented assumption (`tests/benchmarks/conftest.py` Line 88–89)

8. Missing `tests/benchmarks/init.py`

9. `load_baseline_means` silently skips non-dict group values (`scripts/check_benchmark_regression.py` Line 69)

10. Redundant `str()` cast in `load_results` (`scripts/check_benchmark_regression.py` Line 43)

11. Missing blank line before first test function (`tests/benchmarks/test_summary_cache_bench.py` Line 16)

12. No unit tests for `check_benchmark_regression.py`