Feat: Drop report.txt and use results_summary.json by nvzhihanj · Pull Request #353 · mlcommons/endpoints

nvzhihanj · 2026-06-12T05:48:12Z

Narrowed scope: this PR now covers only the result_summary.json / report.txt change.

What & why

result_summary.json and report.txt are two renderings of the same Report. report.txt has no programmatic consumers, and result_summary.json was missing the computed QPS/TPS (callers had to recompute them from duration + counts).

Changes

result_summary.json is now self-complete. Report.to_json injects the derived qps/tps into the serialized JSON. They remain methods (not stored fields) to avoid duplicating data already in duration_ns / counters / OSL, so no Report constructor sites change.
report.txt is no longer written. It was a human-readable dump of the same Report with zero programmatic/test consumers; the same summary is still logged to the console.
Docs: docs/LOCAL_TESTING.md — result_summary.json described as the primary, self-complete report; dropped report.txt and the never-written runtime_settings.json from the artifacts list.

Tests

result_summary.json serializes qps/tps (concrete value) and serializes them as null when there is no duration.
End-to-end perf run: result_summary.json is self-complete and no report.txt is produced.
Full suite + pre-commit green.

🤖 Generated with Claude Code

github-actions · 2026-06-12T05:48:20Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

gemini-code-assist

Code Review

This pull request introduces a new truncate-results CLI command to shrink large benchmark results files by keeping a few full responses and hashing the rest. It also optimizes artifact generation by only running the event logger and saving event logs or sample index maps when accuracy scoring is active or when explicitly requested via configuration flags. Additionally, derived throughput metrics (qps and tps) are now directly serialized into the JSON report. The review feedback suggests several robust improvements: implementing atomic file writes and better error handling in the truncation command, wrapping the tmpfs salvage process in a try-except block to ensure resource cleanup, and using msgspec.structs.asdict for a more efficient struct-to-dict conversion.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-12T05:49:44Z

+    if config.in_place:
+        out_path = config.results
+    elif config.output is not None:
+        out_path = config.output
+    else:
+        out_path = config.results.with_name(config.results.stem + ".truncated.json")
+
+    out_path.write_text(json.dumps(truncated, indent=2))


Writing directly to the target file is risky, especially for large files (which can be gigabytes in size). If the write operation is interrupted or fails due to disk space issues, the original file will be corrupted or lost. Writing to a temporary file first and then atomically replacing the target file prevents corruption.

if config.in_place: out_path = config.results elif config.output is not None: out_path = config.output else: out_path = config.results.with_name(config.results.stem + ".truncated.json") temp_path = out_path.with_suffix(".tmp") try: temp_path.write_text(json.dumps(truncated, indent=2)) temp_path.replace(out_path) except Exception as e: if temp_path.exists(): temp_path.unlink() raise e

gemini-code-assist · 2026-06-12T05:49:44Z

+                _salvage_tmpfs(
+                    ctx.report_dir,
+                    bench.tmpfs_dir,
+                    keep_events=ctx.config.keep_events_artifacts,
+                )


If _salvage_tmpfs raises an exception (e.g., due to a destination disk-full or permission error during shutil.copy2), the subsequent shutil.rmtree call will be skipped. This will leak the multi-gigabyte tmpfs_dir in RAM (/dev/shm), potentially leading to RAM exhaustion on subsequent runs. Wrapping the salvage call in a try-except block ensures cleanup always runs.

Suggested change

_salvage_tmpfs(

ctx.report_dir,

bench.tmpfs_dir,

keep_events=ctx.config.keep_events_artifacts,

)

try:

_salvage_tmpfs(

ctx.report_dir,

bench.tmpfs_dir,

keep_events=ctx.config.keep_events_artifacts,

)

except Exception as e:

logger.warning(f"Failed to salvage tmpfs: {e}")

gemini-code-assist · 2026-06-12T05:49:44Z

+    if not config.results.exists():
+        raise InputValidationError(f"Results file not found: {config.results}")


Using is_file() is safer than exists() because it ensures the path is a regular file and not a directory, preventing potential IsADirectoryError exceptions when calling read_text().

Suggested change

if not config.results.exists():

raise InputValidationError(f"Results file not found: {config.results}")

if not config.results.is_file():

raise InputValidationError(f"Results file not found or is not a file: {config.results}")

gemini-code-assist · 2026-06-12T05:49:44Z

+    if not config.results.exists():
+        raise InputValidationError(f"Results file not found: {config.results}")
+
+    data = json.loads(config.results.read_text())


If the results file is empty or contains malformed JSON, json.loads will raise a json.JSONDecodeError, which propagates as an unhandled exception and prints a full traceback. Handling this gracefully and raising an InputValidationError provides a much better user experience.

try: data = json.loads(config.results.read_text()) except json.JSONDecodeError as e: raise InputValidationError(f"Invalid JSON in results file: {e}") from e

gemini-code-assist · 2026-06-12T05:49:44Z

+        # JSON is self-complete (consumers don't recompute qps/tps from
+        # duration + counts). They are methods, not stored fields, to avoid
+        # duplicating data already present in duration_ns / counters / osl.
+        payload: dict[str, Any] = msgspec.json.decode(msgspec.json.encode(self))


Using msgspec.json.decode(msgspec.json.encode(self)) to convert a msgspec.Struct to a dictionary introduces unnecessary serialization and deserialization overhead. Using msgspec.structs.asdict is the standard, high-performance way to convert a struct to a dictionary.

Suggested change

payload: dict[str, Any] = msgspec.json.decode(msgspec.json.encode(self))

import msgspec

payload = msgspec.structs.asdict(self)

result_summary.json is now the single source of truth for the run summary. Report.to_json injects the derived qps/tps so the JSON is self-complete and consumers no longer recompute them from duration + counts. report.txt (a human-readable dump of the same Report, with no programmatic consumers) is no longer written — the same summary is still logged to the console. Tests: result_summary serializes qps/tps (and null when there is no duration); an end-to-end perf run confirms result_summary.json is self-complete and no report.txt is produced. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Asserting that report.txt is *not* written guards nothing meaningful — the code simply no longer writes it. Keep the assertions that matter (result_summary.json carries qps/tps, i.e. it is self-complete) and rename the test to test_result_summary_self_complete. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the `msgspec.json.decode(msgspec.json.encode(self))` round-trip in Report.to_json with `msgspec.to_builtins(self)` — same wire-shaped dict (honors the encoder config, so the JSON schema is unchanged) without the encode→decode→encode detour. Addresses the reviewer note on this line; note that the suggested `msgspec.structs.asdict` would NOT be schema-safe (it uses Python attribute names rather than the encoder's field names). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Compute qps/tps once in Report.from_snapshot and store them as struct fields (default None) so they serialize naturally — to_json is a plain encode again, with no to_builtins/dict round-trip and no method-vs-JSON-key divergence (msgspec.json.encode(report) now matches report.to_json()). qps()/tps() methods are dropped; display() and execute.py read the attributes. Production only builds Report via from_snapshot, so the fields are always computed there. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

nvzhihanj requested a review from a team as a code owner June 12, 2026 05:48

github-actions Bot requested a review from arekay-nv June 12, 2026 05:48

gemini-code-assist Bot reviewed Jun 12, 2026

View reviewed changes

nvzhihanj changed the title ~~feat: slim benchmark output artifacts~~ Draft: feat: slim benchmark output artifacts Jun 12, 2026

nvzhihanj marked this pull request as draft June 12, 2026 06:30

nvzhihanj force-pushed the feat/slim-benchmark-output-artifacts branch from 7a76518 to 8557e26 Compare June 12, 2026 06:42

nvzhihanj changed the title ~~Draft: feat: slim benchmark output artifacts~~ feat: make result_summary.json self-complete; drop report.txt Jun 12, 2026

nvzhihanj marked this pull request as ready for review June 12, 2026 06:44

nvzhihanj mentioned this pull request Jun 12, 2026

feat: add truncate-results command #354

Open

nvzhihanj changed the title ~~feat: make result_summary.json self-complete; drop report.txt~~ Feat: Drop report.txt and use results_summary.json Jun 12, 2026

nvzhihanj and others added 2 commits June 12, 2026 11:33

nvzhihanj marked this pull request as draft June 12, 2026 20:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat: Drop report.txt and use results_summary.json#353

Feat: Drop report.txt and use results_summary.json#353
nvzhihanj wants to merge 4 commits into
release/v0.5from
feat/slim-benchmark-output-artifacts

nvzhihanj commented Jun 12, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 12, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		if not config.results.exists():
		raise InputValidationError(f"Results file not found: {config.results}")

	payload: dict[str, Any] = msgspec.json.decode(msgspec.json.encode(self))
	import msgspec
	payload = msgspec.structs.asdict(self)

Conversation

nvzhihanj commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What & why

Changes

Tests

Uh oh!

github-actions Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

nvzhihanj commented Jun 12, 2026 •

edited

Loading

github-actions Bot commented Jun 12, 2026 •

edited

Loading