Make build throughput ignore skipped images#552

Open
simonrosenberg wants to merge 2 commits into main from fix/clarify-build-telemetry

Conversation


@simonrosenberg (Collaborator) commented Mar 22, 2026

Summary

  • make throughput mean built images/hour everywhere
  • keep built/skipped/failed counts explicit in batch and final logs
  • store a single built-only throughput metric in build-summary.json

Why

Skipped images should not inflate throughput. The build logs and summary should show skips clearly, but any throughput number used to reason about build efficiency should be computed from non-skipped images only.

Changes

  • batch logs now print built, skipped, failed, and a single built-only throughput
  • final logs now print built, skipped, failed, retried, and a single built-only throughput
  • build-summary.json now stores throughput_images_per_hour, computed from built images only
  • markdown summaries now show a single throughput value based on built images only
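The built-only metric described in the changes above can be sketched as follows (a minimal sketch; the function name and signature are assumptions, not the PR's exact code):

```python
def built_only_throughput(built: int, wall_clock_seconds: float) -> float:
    """Images built per hour, excluding skipped and failed images.

    Skips finish quickly, so counting them would inflate the metric;
    only genuinely built images contribute to the numerator.
    """
    if wall_clock_seconds <= 0:
        return 0.0
    return (built / wall_clock_seconds) * 3600
```

For example, 2 built images over a 45-second run yields 160 images/hour, regardless of how many images were skipped in the same window.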

Validation

  • python3 -m ruff check benchmarks/utils/build_manifest.py benchmarks/utils/build_utils.py tests/test_build_manifest.py tests/test_image_utils.py
  • python3 -m ruff format benchmarks/utils/build_manifest.py benchmarks/utils/build_utils.py tests/test_build_manifest.py tests/test_image_utils.py --check
  • PYTHONPATH=/Users/simonrosenberg/tmp/benchmarks-telemetry-clarity python3 -m pytest tests/test_build_manifest.py tests/test_image_utils.py -k 'build_manifest or built_images_for_throughput'

Notes

The repo's pre-commit hook could not run in this detached worktree because uv could not resolve the local openhands-sdk workspace member from this checkout shape. The targeted lint/tests above were run directly instead.

all-hands-bot's earlier review comment was marked as outdated.

@simonrosenberg changed the title from "Clarify build throughput and cache telemetry" to "Make build throughput ignore skipped images" on Mar 22, 2026

@all-hands-bot left a comment


🟡 Acceptable - Pragmatic fix for a real observability problem, but has inefficiency that should be cleaned up.

VERDICT: ✅ Worth merging after addressing the double-calculation issue

KEY INSIGHT: The change correctly separates "images processed" from "images built" in throughput metrics, preventing fast skips from masking slow builds. However, the implementation calculates throughput twice and throws away the first result.

Comment on lines 934 to 941
```diff
     [result.model_dump(mode="json") for result in results],
     manifest_files=1 if results else 0,
 )
 summary_file.write_text(summary.model_dump_json(indent=2), encoding="utf-8")
 overall_duration = time.monotonic() - overall_started_monotonic
-throughput = (
-    (len(base_images) / overall_duration) * 3600 if overall_duration else 0.0
-)
+summary.wall_clock_seconds = _round_duration(overall_duration)
+summary.throughput_images_per_hour = (
+    (built / overall_duration) * 3600 if overall_duration else 0.0
+)
```


🟠 Important: Wasteful double calculation - you call summarize_build_records() which parses timestamps and calculates throughput, then immediately overwrite those values with monotonic timer measurements.

Better approach: Pass wall_clock_seconds as an optional parameter to summarize_build_records() so it can skip the timestamp calculation when you already have accurate timing from monotonic timers.

```python
summary = summarize_build_records(
    [result.model_dump(mode="json") for result in results],
    manifest_files=1 if results else 0,
    wall_clock_seconds=overall_duration,  # Pass it in
)
```

Then in summarize_build_records(), only parse timestamps if wall_clock_seconds is None. This is the issue the previous reviewer flagged.

Comment on lines 204 to +206
```python
).total_seconds()
throughput_images_per_hour = None
if wall_clock_seconds and wall_clock_seconds > 0:
```


🟡 Suggestion: Consider whether storing both wall_clock_seconds and throughput_images_per_hour creates unnecessary redundancy. One can always be derived from the other + built count.

However, if this is for convenience (avoiding recalculation in consumers), that's pragmatic - just ensure they stay consistent. The current implementation is fine if you address the double-calculation issue above.
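If both fields are stored for convenience, a cheap consistency check keeps them honest (a sketch; the function name and tolerances are assumptions):

```python
import math


def throughput_fields_consistent(
    built: int,
    wall_clock_seconds: float,
    throughput_images_per_hour: float,
) -> bool:
    """Check the stored throughput re-derives from built count and wall clock."""
    if wall_clock_seconds <= 0:
        return throughput_images_per_hour == 0.0
    derived = (built / wall_clock_seconds) * 3600
    return math.isclose(
        derived, throughput_images_per_hour, rel_tol=1e-9, abs_tol=1e-9
    )
```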

Comment on lines +577 to +616
```python
@contextlib.contextmanager
def fake_prepare_cached_sdist():
    yield None

class FakeFuture:
    def __init__(self, result: BuildOutput):
        self._result = result

    def result(self) -> BuildOutput:
        return self._result

class FakeExecutor:
    def __init__(self, *args, **kwargs):
        pass

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False

    def submit(self, fn, **kwargs):
        base = kwargs["base_image"]
        if base == "base-2":
            result = BuildOutput(
                base_image=base,
                tags=[f"tag:{base}"],
                error=None,
                status="skipped_remote_exists",
                skip_reason="remote_image_exists",
                duration_seconds=1.0,
            )
        else:
            result = BuildOutput(
                base_image=base,
                tags=[f"tag:{base}"],
                error=None,
                status="built",
                duration_seconds=10.0,
            )
```


🟢 Acceptable: Heavy use of mocks (FakeFuture, FakeExecutor) tests implementation details rather than real behavior. Normally I'd say "mocks aren't tests," but given that the real code path runs Docker builds, this is a pragmatic trade-off to keep tests fast and deterministic.

The test does verify the actual calculation logic (2 built / 45s = 160 images/hour), which is what matters here.
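The cited figure checks out; here is a standalone version of the arithmetic (the statuses list and wall-clock value are hypothetical, mirroring the test scenario):

```python
statuses = ["built", "skipped_remote_exists", "built"]  # hypothetical batch
built = sum(1 for status in statuses if status == "built")
wall_clock_seconds = 45.0  # overall wall clock, not a sum of per-image durations
throughput = (built / wall_clock_seconds) * 3600  # 2 built / 45 s -> 160 images/hour
```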

