Make build throughput ignore skipped images #552
Conversation
all-hands-bot
left a comment
🟡 Acceptable - Pragmatic fix for a real observability problem, but has inefficiency that should be cleaned up.
VERDICT: ✅ Worth merging after addressing the double-calculation issue
KEY INSIGHT: The change correctly separates "images processed" from "images built" in throughput metrics, preventing fast skips from masking slow builds. However, the implementation calculates throughput twice and throws away the first result.
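The "processed" vs. "built" distinction can be made concrete with a short sketch (the status strings match those used in the diff below; the helper and the sample records are illustrative, not the PR's actual code):

```python
def count_built(results: list[dict]) -> int:
    # Only records whose status is "built" count toward throughput;
    # skipped images (e.g. "skipped_remote_exists") are excluded.
    return sum(1 for r in results if r["status"] == "built")


results = [
    {"base_image": "base-1", "status": "built"},
    {"base_image": "base-2", "status": "skipped_remote_exists"},
    {"base_image": "base-3", "status": "built"},
]
print(count_built(results))  # → 2, not 3
```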
```diff
     [result.model_dump(mode="json") for result in results],
     manifest_files=1 if results else 0,
 )
 summary_file.write_text(summary.model_dump_json(indent=2), encoding="utf-8")
 overall_duration = time.monotonic() - overall_started_monotonic
-throughput = (
-    (len(base_images) / overall_duration) * 3600 if overall_duration else 0.0
-)
+summary.wall_clock_seconds = _round_duration(overall_duration)
+summary.throughput_images_per_hour = (
+    (built / overall_duration) * 3600 if overall_duration else 0.0
+)
```
🟠 Important: Wasteful double calculation - you call summarize_build_records() which parses timestamps and calculates throughput, then immediately overwrite those values with monotonic timer measurements.
Better approach: Pass wall_clock_seconds as an optional parameter to summarize_build_records() so it can skip the timestamp calculation when you already have accurate timing from monotonic timers.
```python
summary = summarize_build_records(
    [result.model_dump(mode="json") for result in results],
    manifest_files=1 if results else 0,
    wall_clock_seconds=overall_duration,  # Pass it in
)
```

Then in summarize_build_records(), only parse timestamps if wall_clock_seconds is None. This is the issue the previous reviewer flagged.
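A sketch of how summarize_build_records() could accept the optional parameter and skip timestamp parsing when it is supplied. The record field names (started_at / finished_at) and the plain-dict return shape are assumptions for illustration; the real function presumably builds a Pydantic summary model:

```python
from datetime import datetime


def summarize_build_records(records, manifest_files=0, wall_clock_seconds=None):
    # Count only successfully built images toward throughput.
    built = sum(1 for r in records if r.get("status") == "built")
    if wall_clock_seconds is None:
        # Fallback: derive wall-clock time from per-record timestamps
        # (field names "started_at"/"finished_at" are assumed here).
        starts = [datetime.fromisoformat(r["started_at"])
                  for r in records if r.get("started_at")]
        ends = [datetime.fromisoformat(r["finished_at"])
                for r in records if r.get("finished_at")]
        wall_clock_seconds = (
            (max(ends) - min(starts)).total_seconds() if starts and ends else 0.0
        )
    throughput = (
        (built / wall_clock_seconds) * 3600 if wall_clock_seconds > 0 else None
    )
    return {
        "manifest_files": manifest_files,
        "built": built,
        "wall_clock_seconds": wall_clock_seconds,
        "throughput_images_per_hour": throughput,
    }
```

With this shape, the caller that already holds a monotonic-timer measurement passes it in and the timestamp-parsing branch is never taken.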
```python
).total_seconds()
throughput_images_per_hour = None
if wall_clock_seconds and wall_clock_seconds > 0:
```
🟡 Suggestion: Consider whether storing both wall_clock_seconds and throughput_images_per_hour creates unnecessary redundancy. Either one can always be derived from the other plus the built count.
However, if this is for convenience (avoiding recalculation in consumers), that's pragmatic - just ensure they stay consistent. The current implementation is fine if you address the double-calculation issue above.
```python
@contextlib.contextmanager
def fake_prepare_cached_sdist():
    yield None


class FakeFuture:
    def __init__(self, result: BuildOutput):
        self._result = result

    def result(self) -> BuildOutput:
        return self._result


class FakeExecutor:
    def __init__(self, *args, **kwargs):
        pass

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False

    def submit(self, fn, **kwargs):
        base = kwargs["base_image"]
        if base == "base-2":
            result = BuildOutput(
                base_image=base,
                tags=[f"tag:{base}"],
                error=None,
                status="skipped_remote_exists",
                skip_reason="remote_image_exists",
                duration_seconds=1.0,
            )
        else:
            result = BuildOutput(
                base_image=base,
                tags=[f"tag:{base}"],
                error=None,
                status="built",
                duration_seconds=10.0,
            )
```
🟢 Acceptable: Heavy use of mocks (FakeFuture, FakeExecutor) tests implementation details rather than real behavior. Normally I'd say "mocks aren't tests," but given that the real code path runs Docker builds, this is a pragmatic trade-off to keep tests fast and deterministic.
The test does verify the actual calculation logic (2 built / 45s = 160 images/hour), which is what matters here.
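The arithmetic behind that assertion checks out (values taken from the test scenario described above):

```python
built = 2          # base-1 and base-3 finish real builds; base-2 is skipped
wall_clock = 45.0  # seconds, the test's overall duration
throughput = built * 3600 / wall_clock  # convert to images per hour
print(throughput)  # → 160.0
```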
Summary

Why

Skipped images should not inflate throughput. The build logs and summary should show skips clearly, but any throughput number used to reason about build efficiency should be computed from non-skipped images only.

Changes

- `build-summary.json` now stores `throughput_images_per_hour`, computed from built images only

Validation

- `python3 -m ruff check benchmarks/utils/build_manifest.py benchmarks/utils/build_utils.py tests/test_build_manifest.py tests/test_image_utils.py`
- `python3 -m ruff format benchmarks/utils/build_manifest.py benchmarks/utils/build_utils.py tests/test_build_manifest.py tests/test_image_utils.py --check`
- `PYTHONPATH=/Users/simonrosenberg/tmp/benchmarks-telemetry-clarity python3 -m pytest tests/test_build_manifest.py tests/test_image_utils.py -k 'build_manifest or built_images_for_throughput'`

Notes

The repo's pre-commit hook could not run in this detached worktree because `uv` could not resolve the local `openhands-sdk` workspace member from this checkout shape. The targeted lint/tests above were run directly instead.