Skip to content

test: add summary-cache pytest-benchmark suite with CI regression gate#120

Open
clean6378-max-it wants to merge 4 commits into
masterfrom
test/summary-cache-benchmarks
Open

test: add summary-cache pytest-benchmark suite with CI regression gate#120
clean6378-max-it wants to merge 4 commits into
masterfrom
test/summary-cache-benchmarks

Conversation

@clean6378-max-it

@clean6378-max-it clean6378-max-it commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Closes #115

Summary

Adds a pytest-benchmark suite for services/summary_cache.py with synthetic fixtures and a CI regression gate. No production code changes.

  • New tests/benchmarks/test_summary_cache_bench.py measures cache hit, cache miss, fingerprint cost (10/50/200 workspace entries), and round-trip write+read latency
  • New tests/benchmarks/conftest.py builds synthetic workspace trees and isolates cache files to temp dirs (no real Cursor storage)
  • benchmarks/baselines.json records initial gated means (1.5× slack from local reference run; refresh from ubuntu-latest CI artifact after first green job)
  • scripts/check_benchmark_regression.py fails CI when any benchmark mean exceeds baseline by >20%
  • pyproject.toml: add pytest-benchmark dev dep; default --benchmark-skip so normal pytest stays fast
  • .github/workflows/tests.yml: new benchmarks job on ubuntu-latest

Sprint item #7 (June Week 4, Wednesday). Sequences before Thursday #8 (unified perf benchmark suite).

Test plan

  • pytest tests/benchmarks/test_summary_cache_bench.py --benchmark-only -o addopts=
  • python scripts/check_benchmark_regression.py benchmark-results.json benchmarks/baselines.json
  • pytest tests/benchmarks/test_summary_cache_bench.py -o addopts= (benchmarks skipped via --benchmark-skip)
  • python -m unittest discover tests -q
  • Confirm benchmarks CI job is green on ubuntu-latest

Summary by CodeRabbit

  • New Features
    • Added performance benchmarks for summary-cache hot paths (cache hit/miss, fingerprinting, and cache round-trip).
  • Bug Fixes
    • Prevented benchmark/integration runs from inheriting unrelated pytest options.
  • Chores
    • Added CI benchmarking job that generates results, applies a regression gate against stored baselines, and uploads the results artifact; refreshed benchmarks and expanded ignored artifacts.
    • Updated pytest configuration and development tooling to include benchmark support.
  • Tests
    • Added a regression-gate test suite and benchmark fixtures for synthetic workspace data.

@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@clean6378-max-it, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 8 minutes and 33 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more credits in the billing tab to continue.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 809b3faf-d20d-49e4-9994-baa39a6702dd

📥 Commits

Reviewing files that changed from the base of the PR and between 83bb248 and b1c2f93.

📒 Files selected for processing (2)
  • scripts/check_benchmark_regression.py
  • tests/test_check_benchmark_regression.py
📝 Walkthrough

Walkthrough

Adds a summary-cache pytest-benchmark suite, records baseline timings, adds a regression-checking script, and wires the benchmarks into CI with artifact upload and ignore rules.

Changes

Summary Cache Benchmark Validation

Layer / File(s) Summary
Benchmark config and fixtures
pyproject.toml, tests/benchmarks/conftest.py
Adds pytest-benchmark to the dev dependency set, configures pytest benchmark defaults and marker registration, and adds synthetic workspace fixtures for summary-cache benchmarks.
Benchmark cases
tests/benchmarks/test_summary_cache_bench.py
Adds benchmark cases for cache hits, cache misses, workspace fingerprinting at 10/50/200 entries, and a set-then-get round trip.
Regression script and baselines
benchmarks/baselines.json, scripts/check_benchmark_regression.py
Refreshes stored benchmark timings and adds a CLI that loads benchmark results, compares means to stored baselines, and reports regressions or malformed inputs.
CI benchmark job
.github/workflows/tests.yml, .gitignore
Adds a workflow job that runs the benchmarks, checks regression output, uploads benchmark-results.json, and ignores generated benchmark artifacts.

Sequence Diagram(s)

sequenceDiagram
  participant BenchmarksJob as benchmarks job
  participant Pytest as pytest
  participant RegressionScript as scripts/check_benchmark_regression.py
  participant ArtifactUpload as actions/upload-artifact

  BenchmarksJob->>Pytest: run tests/benchmarks/ with --benchmark-only
  Pytest-->>BenchmarksJob: benchmark-results.json
  BenchmarksJob->>RegressionScript: compare benchmark-results.json to benchmarks/baselines.json
  RegressionScript-->>BenchmarksJob: exit code 0, 1, or 2
  BenchmarksJob->>ArtifactUpload: upload benchmark-results.json with if: always()
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • wpak-ai
  • bradjin8

Poem

(_/)
I hopped through cache with careful thump,
Measured misses, hits, and every jump.
Baselines twinkled, CI said “hooray,”
And regression gates held firm today.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning requirements-lock.txt includes an unrelated click pin update that is outside the benchmark-validation scope. Remove the click version bump unless it is required for the benchmark changes and document that dependency update separately.
Docstring Coverage ⚠️ Warning Docstring coverage is 18.75% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the new summary-cache pytest-benchmark suite and CI regression gate.
Linked Issues check ✅ Passed The PR covers the required synthetic benchmarks, baselines, and CI regression gate for summary-cache without changing production code.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch test/summary-cache-benchmarks

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
scripts/check_benchmark_regression.py (1)

95-99: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

A missing benchmark result silently passes the gate.

When a baseline benchmark has no current result, the script only prints a WARN and continues. If a gated benchmark is renamed, removed, or fails to run, the regression gate will report success and the performance coverage silently disappears. Consider treating a missing gated benchmark as a failure (or a separate exit code) so coverage gaps are surfaced.

♻️ Possible approach
     failures: list[str] = []
+    missing: list[str] = []
     for name, base in baseline_means.items():
         cur = flat.get(name)
         if cur is None:
-            print(f"WARN: no current result for baseline {name!r}; skipping")
+            print(f"MISSING: no current result for baseline {name!r}")
+            missing.append(name)
             continue

Then fail (or warn-as-error) when missing is non-empty, per your policy.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/check_benchmark_regression.py` around lines 95 - 99, The benchmark
gating in check_benchmark_regression.py currently skips missing current results
in the baseline loop, which can let renamed, removed, or failing benchmarks pass
unnoticed. Update the logic around the baseline_means/flat comparison so missing
gated benchmarks are collected and treated as a failure (or a distinct non-zero
exit condition) instead of only printing a warning. Make the behavior explicit
in the final exit path of the script so coverage gaps are surfaced alongside the
existing regression checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmarks/baselines.json`:
- Around line 1-15: The benchmark baselines in the summary-cache group are still
using Windows-derived values with 1.5x slack, but the regression gate runs on
ubuntu-latest. Refresh the values from a green ubuntu-latest benchmark artifact
and keep the _note consistent with the source platform; if platform-matched
baselines are not yet available, make the gate non-blocking until they are.
Focus on the baselines.json entries under groups.summary-cache so the
check_benchmark_regression.py comparisons are against the same OS.

In `@pyproject.toml`:
- Around line 39-44: The pytest configuration currently applies --benchmark-skip
globally in [tool.pytest.ini_options], which breaks the regular integration
pytest run because pytest-benchmark is not installed there. Update the test
setup so the benchmark plugin is available wherever this addopts setting is
used, or move the --benchmark-skip option to only the benchmark-specific job;
check the pytest configuration and the workflows that invoke python -m pytest,
especially the benchmark-related markers and options.

---

Nitpick comments:
In `@scripts/check_benchmark_regression.py`:
- Around line 95-99: The benchmark gating in check_benchmark_regression.py
currently skips missing current results in the baseline loop, which can let
renamed, removed, or failing benchmarks pass unnoticed. Update the logic around
the baseline_means/flat comparison so missing gated benchmarks are collected and
treated as a failure (or a distinct non-zero exit condition) instead of only
printing a warning. Make the behavior explicit in the final exit path of the
script so coverage gaps are surfaced alongside the existing regression checks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1cc8514e-6127-42e8-95a6-134382f2ed25

📥 Commits

Reviewing files that changed from the base of the PR and between 793b6f2 and c126227.

📒 Files selected for processing (7)
  • .github/workflows/tests.yml
  • .gitignore
  • benchmarks/baselines.json
  • pyproject.toml
  • scripts/check_benchmark_regression.py
  • tests/benchmarks/conftest.py
  • tests/benchmarks/test_summary_cache_bench.py

Comment thread benchmarks/baselines.json
Comment thread pyproject.toml
@clean6378-max-it

Copy link
Copy Markdown
Collaborator Author

@CodeRabbit, review this PR in more depth

@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== .github/workflows/tests.yml ==="
cat -n .github/workflows/tests.yml

echo ""
echo "=== pyproject.toml ==="
cat -n pyproject.toml

echo ""
echo "=== scripts/check_benchmark_regression.py ==="
cat -n scripts/check_benchmark_regression.py

echo ""
echo "=== tests/benchmarks/conftest.py ==="
cat -n tests/benchmarks/conftest.py

echo ""
echo "=== tests/benchmarks/test_summary_cache_bench.py ==="
cat -n tests/benchmarks/test_summary_cache_bench.py

echo ""
echo "=== benchmarks/baselines.json ==="
cat -n benchmarks/baselines.json

echo ""
echo "=== .gitignore (last 20 lines) ==="
tail -20 .gitignore

echo ""
echo "=== requirements-lock.txt ==="
cat -n requirements-lock.txt

echo ""
echo "=== services/summary_cache.py ==="
cat -n services/summary_cache.py

echo ""
echo "=== Directory tree of tests/benchmarks/ ==="
tree tests/benchmarks/ 2>/dev/null || find tests/benchmarks/ -type f

echo ""
echo "=== Directory tree of scripts/ ==="
tree scripts/ 2>/dev/null || find scripts/ -type f

echo ""
echo "=== benchmarks/ directory ==="
tree benchmarks/ 2>/dev/null || find benchmarks/ -type f

Length of output: 44058



@clean6378-max-it Here is a deep review of PR #120.


Deep Review: PR #120 — pytest-benchmark suite for summary_cache.py

The PR is well-structured overall — no production code is touched, the fixture isolation via monkeypatch is correct, and the regression gate script has solid error-handling. However, there are several issues ranging from critical to minor that should be addressed before merge.


🔴 Critical

1. Baseline name format may never match pytest-benchmark output (benchmarks/baselines.json + scripts/check_benchmark_regression.py)

The regression gate in check_regression() (Line 96) looks up current results by name from the baselines. The baselines use short names like test_summary_cache_hit and test_fingerprint_workspace_entries[10].

pytest-benchmark emits the name field as the short test node name (e.g. test_summary_cache_hit) in some configurations, but in others it emits the full node ID including the module path (tests/benchmarks/test_summary_cache_bench.py::test_summary_cache_hit). If the format doesn't match, every baseline entry will show as MISSING and the gate will always fail (exit code 1) — or always pass if the lookup silently misses.

You should verify the exact format by inspecting an actual benchmark-results.json produced by the CI job and cross-referencing the name field against the keys in baselines.json. Consider adding a normalization step in load_results() that strips the module prefix:

# In load_results(), after extracting `name`:
# Normalize "tests/benchmarks/test_x.py::test_foo[param]" → "test_foo[param]"
name = str(name).rsplit("::", 1)[-1]

2. Baselines appear manually fabricated — future-dated and unverified (benchmarks/baselines.json)

"updated": "2026-06-25T12:00:00Z",

The timestamp is in the future and the _note claims these values come from "ubuntu-latest CI benchmark-results.json", but there is no CI artifact proving that. Values like 6.3e-05 for both cache hit and cache miss (identical to 6 decimal places) suggest they were hand-written, not measured. If these baselines are too tight for a slower runner, the gate will spuriously fail; if too loose, the gate provides no real protection.

Recommendation: Do not commit hand-crafted baselines. Run the benchmarks in CI first, capture the artifact, update the file from the actual numbers, and commit that as a follow-up.

3. benchmarks CI job has no needs: dependency on unittest (.github/workflows/tests.yml Line 219)

The benchmark job runs in parallel with unit tests. If unit tests fail due to a production code regression, benchmarks can still be reported as "passing" — giving a misleading green signal.

# Add this to the benchmarks job:
needs: [unittest]

🟡 Medium

4. addopts = "--benchmark-disable" is missing from pyproject.toml (Line 39–43)

The PR objectives state: "adds default benchmark skipping for normal test runs." But the [tool.pytest.ini_options] section has no addopts at all:

[tool.pytest.ini_options]
testpaths = ["tests"]
markers = [
    "benchmark: performance benchmarks (pytest-benchmark)",
]

Since testpaths includes tests/, tests/benchmarks/ is collected on every plain pytest invocation. Without --benchmark-disable, pytest-benchmark will execute benchmark functions normally (one call each, with timing overhead). This slows down the developer loop and contradicts the stated goal.

[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "--benchmark-disable"
markers = [
    "benchmark: performance benchmarks (pytest-benchmark)",
]

Note: the CI benchmark job already overrides this with -o addopts=, so adding it won't break CI.

5. sys.path manipulation in conftest.py is fragile (Lines 12–14)

REPO_ROOT = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
if REPO_ROOT not in sys.path:
    sys.path.insert(0, REPO_ROOT)

This is three levels of dirname that breaks silently if the file moves. The idiomatic solution is to install the package in editable mode in CI (pip install -e .) and remove the manual path insertion entirely. Alternatively, the [tool.pytest.ini_options] section can set pythonpath = ["."] (pytest ≥ 7.0):

[tool.pytest.ini_options]
pythonpath = ["."]

6. pytest-benchmark not pinned in requirements-lock.txt (.github/workflows/tests.yml Line 236)

python -m pip install 'pytest>=8,<9' 'pytest-benchmark>=4,<5'

All other runtime deps are resolved from requirements-lock.txt for determinism, but pytest-benchmark is installed ad-hoc. A transitive dep (e.g. py-cpuinfo, attrs) could change versions silently. pytest-benchmark should be pinned in the lock file, or at minimum pinned to a specific patch version in the workflow.

7. stale_fingerprint relies on an undocumented assumption (tests/benchmarks/conftest.py Line 88–89)

`@pytest.fixture`
def stale_fingerprint(workspace_fingerprint: dict[str, Any]) -> dict[str, Any]:
    return {**workspace_fingerprint, "global_db_mtime_ns": 9_999_999_999}

This works because global_db_path=None causes _file_mtime_ns(None) to return None, making the real fingerprint's global_db_mtime_ns equal to None. Setting it to 9_999_999_999 creates a mismatch. But this is completely implicit — if fingerprint_workspace_storage ever changes its handling of None paths, this fixture silently stops simulating a miss.

A more robust approach is to use a sentinel that always differs:

`@pytest.fixture`
def stale_fingerprint(workspace_fingerprint: dict[str, Any]) -> dict[str, Any]:
    """Return a fingerprint guaranteed to differ from the stored one."""
    return {**workspace_fingerprint, "rules_digest": "deadbeefdeadbeef"}

This works regardless of what global_db_mtime_ns is.


🔵 Minor

8. Missing tests/benchmarks/__init__.py

tests/benchmarks/
├── conftest.py
└── test_summary_cache_bench.py

The directory has no __init__.py. While modern pytest handles this correctly, the rest of tests/ likely uses __init__.py files (consistent with python -m unittest discover). Adding one avoids potential namespace collisions if identically-named modules exist elsewhere.

9. load_baseline_means silently skips non-dict group values (scripts/check_benchmark_regression.py Line 69)

if not isinstance(value, dict):
    continue  # silently skips

A typo in baselines.json (e.g. a string value for a group) is silently ignored, which could cause a group's benchmarks to never be gated. Consider issuing a warning or raising BenchmarkDataError.

10. Redundant str() cast in load_results (scripts/check_benchmark_regression.py Line 43)

name = str(name)  # name was already set from entry["name"] above

name is set on Line 37 directly from entry["name"], which would have been accepted as str already. The redundant cast can be removed.

11. Missing blank line before first test function (tests/benchmarks/test_summary_cache_bench.py Line 16)

PEP 8 requires two blank lines between the import block and the first top-level definition:

from services.summary_cache import (
    fingerprint_workspace_storage,
    get_cached_projects,
    set_cached_projects,
)
                                         # ← two blank lines needed here
`@pytest.mark.benchmark`(group="summary-cache")
def test_summary_cache_hit(...):

12. No unit tests for check_benchmark_regression.py

The regression gate script has good internal logic (load_results, load_baseline_means, check_regression) but zero test coverage. Given that this script is the only thing blocking regressions in CI, a small test file (e.g. tests/test_check_benchmark_regression.py) covering the happy path, missing-benchmark, and threshold-exceeded cases would greatly increase confidence.


Summary table

Severity Issue File
🔴 Critical Baseline name format may mismatch pytest-benchmark output baselines.json, check_benchmark_regression.py
🔴 Critical Baselines are future-dated and appear manually fabricated baselines.json
🔴 Critical benchmarks job has no needs: [unittest] tests.yml
🟡 Medium Missing addopts = "--benchmark-disable" pyproject.toml
🟡 Medium Fragile sys.path manipulation conftest.py
🟡 Medium pytest-benchmark not pinned in lock file tests.yml, requirements-lock.txt
🟡 Medium stale_fingerprint relies on implicit None assumption conftest.py
🔵 Minor Missing __init__.py in tests/benchmarks/
🔵 Minor Silent skip of non-dict group values check_benchmark_regression.py
🔵 Minor Redundant str() cast check_benchmark_regression.py
🔵 Minor Missing blank line after imports test_summary_cache_bench.py
🔵 Minor No unit tests for the regression gate script

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/check_benchmark_regression.py`:
- Around line 17-19: The normalize_benchmark_name helper currently uses
rsplit("::", 1), which can incorrectly strip parts of parametrized benchmark
names when parameter values include "::". Update normalize_benchmark_name to
split on the first "::" only so pytest node prefixes are removed without
truncating names like test_x[param::v], and keep the behavior consistent for
both short and full benchmark names.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c745c7f7-791c-4802-9b4a-d7dc19639498

📥 Commits

Reviewing files that changed from the base of the PR and between e4d79a6 and 83bb248.

📒 Files selected for processing (8)
  • .github/workflows/tests.yml
  • benchmarks/baselines.json
  • pyproject.toml
  • scripts/check_benchmark_regression.py
  • tests/benchmarks/__init__.py
  • tests/benchmarks/conftest.py
  • tests/benchmarks/test_summary_cache_bench.py
  • tests/test_check_benchmark_regression.py
✅ Files skipped from review due to trivial changes (1)
  • benchmarks/baselines.json
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/benchmarks/test_summary_cache_bench.py
  • .github/workflows/tests.yml

Comment thread scripts/check_benchmark_regression.py Outdated
# `pytest tests/` — to avoid re-collecting unittest.TestCase classes above.
run: python -m pytest tests/test_api_search.py tests/test_api_workspaces.py tests/test_api_export.py tests/test_pdf_export.py tests/test_search_helpers.py -v --tb=short
# -o addopts= avoids inheriting benchmark-only options from pyproject.toml.
run: python -m pytest tests/test_api_search.py tests/test_api_workspaces.py tests/test_api_export.py tests/test_pdf_export.py tests/test_search_helpers.py -v --tb=short -o addopts=

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests/test_check_benchmark_regression.py (15 tests for the regression gate) is not run in CI. unittest discover won't collect pytest functions, and the integration pytest step lists only API/search files. Add this file to the pytest step or run pytest tests/test_check_benchmark_regression.py -o addopts= so gate logic is covered on every matrix run.

Comment thread requirements-lock.txt
# colorama via click) are omitted — pip still installs them on Windows when needed.
blinker==1.9.0 # via flask
click==8.4.1 # via flask
click==8.4.2 # via flask

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated click==8.4.1 → 8.4.2 bump is out of scope for a benchmarks-only PR. Revert unless there's a documented dependency reason; otherwise it adds review nois

Comment thread pyproject.toml

[tool.pytest.ini_options]
pythonpath = ["."]
addopts = "--benchmark-disable"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description says --benchmark-skip but config uses --benchmark-disable. With disable, benchmark tests still run once on a bare pytest (functional smoke, ~0.4s). Either align the docs or switch to --benchmark-skip if the goal is zero benchmark execution during normal pytest.

Comment thread pyproject.toml
Comment on lines +39 to +42
[tool.pytest.ini_options]
pythonpath = ["."]
addopts = "--benchmark-disable"
testpaths = ["tests"]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New [tool.pytest.ini_options] with addopts = "--benchmark-disable" requires pytest-benchmark installed for any default pytest invocation. CI handles this; document in PR/issue that devs need pip install -e ".[dev]" (or pytest-benchmark) before running pytest without -o addopts=.

Comment on lines +28 to +37
@pytest.mark.benchmark(group="summary-cache")
def test_summary_cache_miss(
benchmark,
summary_cache_dir: Path,
workspace_fingerprint: dict[str, Any],
stale_fingerprint: dict[str, Any],
sample_projects: list[dict[str, Any]],
) -> None:
set_cached_projects(workspace_fingerprint, sample_projects, [])
benchmark(get_cached_projects, stale_fingerprint)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_summary_cache_hit and test_summary_cache_miss measure nearly identical paths (~63µs on CI); both baselines are 6.3e-05. Consider merging into one "cache lookup" benchmark with hit/miss as subcases, or document that miss means fingerprint-mismatch only (not rebuild cost).

Comment on lines +33 to +48
@pytest.fixture
def summary_cache_dir(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> Path:
"""Redirect summary-cache files to an isolated temp directory.

Patches ``CACHE_DIR`` (also used by tab-summary paths via ``_tab_summaries_path``)
plus the projects/composer-map file constants used by current benchmarks.
"""
cache_dir = tmp_path / "cache"
cache_dir.mkdir()
monkeypatch.setattr(summary_cache, "CACHE_DIR", cache_dir)
monkeypatch.setattr(summary_cache, "PROJECTS_CACHE_FILE", cache_dir / "projects.json")
monkeypatch.setattr(
summary_cache,
"COMPOSER_MAP_CACHE_FILE",
cache_dir / "composer-id-to-ws.json",
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

summary_cache_dir patches PROJECTS_CACHE_FILE and COMPOSER_MAP_CACHE_FILE but not tab-summary paths (_tab_summaries_path uses CACHE_DIR prefix only). Fine for current scope; add a comment that tab-summary benchmarks belong in issue #8's unified suite.

Comment thread benchmarks/baselines.json
Comment on lines +1 to +2
{
"_note": "Gated means from ubuntu-latest CI benchmark-results.json (PR #120, run 28123677675). Refresh: pytest tests/benchmarks/ --benchmark-only --benchmark-json=benchmark-results.json -o addopts=",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_note refresh command is truncated (-o addopts= with no value). Complete the example: -o addopts= --benchmark-json=benchmark-results.json.

Comment on lines +239 to +245
- name: Run summary-cache benchmarks
run: >
python -m pytest tests/benchmarks/
--benchmark-only
--benchmark-json=benchmark-results.json
--benchmark-columns=min,max,mean,stddev,rounds
-o addopts=

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider pinning --benchmark-min-rounds (e.g. 5) to reduce CI timing variance near the 20% threshold, especially for sub-millisecond hit/miss benchmarks.

Comment on lines +118 to +126
if base == 0:
print(f"WARN: baseline for {name!r} is zero; skipping ratio check")
continue
ratio = cur / base
tag = "FAIL" if ratio > threshold else "ok"
print(f"[{tag}] {name}: {cur:.6f}s vs {base:.6f}s ({ratio:.2f}x)")
if ratio > threshold:
failures.append(name)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New benchmarks without baselines only WARN; gated missing benchmarks correctly FAIL (fixed in 83bb248). Consider also failing when all current results are faster than baseline by >50% (stale baseline drift), or document a baseline-refresh workflow in benchmarks/baselines.json.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cppa-cursor-browser: Summary cache benchmark validation

2 participants