Autoloop PR/issue should report cumulative performance improvement #69

@mrjf

Problem

Autoloop programs that optimize a metric over many iterations don't report the cumulative improvement anywhere visible: not in the PR body, not in the issue, not in the state file summary. A reader of tsessebe PR #297 or issue #189 can see the current best metric but cannot tell where the program started or how much progress it has made.

This was observed on the tsb-perf-evolve program (41 iterations, fitness ratio optimization). The PR body says "Current best metric: 21.048" but never mentions the starting value or total improvement.

Root causes

Three gaps in autoloop.md:

1. No Initial Metric in Machine State

The state file schema tracks Best Metric but not the baseline metric from the first accepted iteration. Without a baseline, there is no reference point from which to compute cumulative improvement.

Fix: Add an Initial Metric field to the Machine State table schema. Set it on the first accepted iteration and never overwrite it.
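
A minimal sketch of the set-once rule, assuming the Machine State can be treated as a dictionary. The key names `initial_metric` and `best_metric`, the function name, and the `higher_is_better` flag are hypothetical and not taken from autoloop.md:

```python
def record_accepted_iteration(state: dict, metric: float,
                              higher_is_better: bool = True) -> dict:
    """Return a copy of the state updated for one accepted iteration."""
    updated = dict(state)

    # Baseline: written on the first accepted iteration, never overwritten.
    if updated.get("initial_metric") is None:
        updated["initial_metric"] = metric

    # Best metric: replaced only when the new value is strictly better.
    best = updated.get("best_metric")
    is_better = best is None or (
        metric > best if higher_is_better else metric < best
    )
    if is_better:
        updated["best_metric"] = metric

    return updated


# Illustrative values only, not measurements from tsb-perf-evolve.
state = record_accepted_iteration({}, 10.0)
state = record_accepted_iteration(state, 12.5)
print(state)
```

Whether the real metric is maximized or minimized is not stated here, so the direction is left as a parameter.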

2. PR body template doesn't include cumulative improvement

Step 5c says to update the PR body with "the latest metric and a summary of the most recent accepted iteration" but never asks for the start-to-finish improvement.

Fix: Update the Step 5c PR body template to include: "Fitness: {best_metric} (started at {initial_metric}, {improvement_pct}% improvement)" or similar.
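
One way the suggested line could be rendered, as a sketch only. The function name and the `higher_is_better` flag are assumptions; the percentage is computed relative to the initial metric and is omitted when that value is zero:

```python
def format_fitness_line(best_metric: float, initial_metric: float,
                        higher_is_better: bool = True) -> str:
    """Render the 'Fitness: ...' line proposed for the Step 5c PR body."""
    if initial_metric == 0:
        # A percentage relative to a zero baseline is undefined.
        return f"Fitness: {best_metric:g} (started at {initial_metric:g})"

    change_pct = (best_metric - initial_metric) / abs(initial_metric) * 100
    # For a minimized metric, a decrease counts as an improvement.
    improvement_pct = change_pct if higher_is_better else -change_pct

    return (
        f"Fitness: {best_metric:g} "
        f"(started at {initial_metric:g}, {improvement_pct:.1f}% improvement)"
    )


# Illustrative values only; the real starting value is not recorded anywhere.
print(format_fitness_line(25.0, 20.0))
# prints: Fitness: 25 (started at 20, 25.0% improvement)
```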

3. pending-ci fitness values never flow back into PR/issue

When the sandbox can't run the evaluation command (e.g., because bun is not available), iterations are pushed as pending-ci. After CI runs the benchmark and produces a fitness number, that result never flows back into the PR body, the issue status comment, or the state file. The PR stays stuck showing the last sandbox-measured metric.

Fix: Either (a) add a post-CI callback step that updates the PR body/state file with the CI-measured fitness, or (b) at minimum, have the next iteration read CI results from the previous run and update the state file retroactively before proposing a new change.
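
Option (b) could look roughly like the sketch below. It assumes the state file keeps a list of iteration records and that CI results have already been fetched into a mapping from iteration id to fitness; the record layout, the `ci-measured` status name, and the function name are all hypothetical, and how the results are actually read from CI is out of scope:

```python
def backfill_pending_ci(state: dict, ci_results: dict,
                        higher_is_better: bool = True) -> dict:
    """Copy CI-measured fitness into pending-ci iterations (returns new state)."""
    updated = dict(state)
    iterations = []

    for record in state.get("iterations", []):
        record = dict(record)  # do not mutate the caller's records
        fitness = ci_results.get(record["id"])

        if record.get("status") == "pending-ci" and fitness is not None:
            record["metric"] = fitness
            record["status"] = "ci-measured"  # hypothetical status name

            # Keep the set-once baseline and the running best in sync.
            if updated.get("initial_metric") is None:
                updated["initial_metric"] = fitness
            best = updated.get("best_metric")
            if best is None or (
                fitness > best if higher_is_better else fitness < best
            ):
                updated["best_metric"] = fitness

        iterations.append(record)

    updated["iterations"] = iterations
    return updated


# Illustrative state; ids and values are made up.
state = {
    "initial_metric": 10.0,
    "best_metric": 12.0,
    "iterations": [
        {"id": 1, "status": "accepted", "metric": 12.0},
        {"id": 2, "status": "pending-ci", "metric": None},
    ],
}
print(backfill_pending_ci(state, {2: 13.5}))
```

Running this at the start of the next iteration, before a new change is proposed, would let the PR body be regenerated from a state file that already reflects the CI-measured value.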

Expected behavior

A reader of any Autoloop PR or program issue should be able to see at a glance:

  • Where the metric started (initial/baseline value)
  • Where it is now (current best)
  • Total improvement (absolute delta and percentage)
  • A brief improvement trajectory (e.g., in the PR body or status comment)
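
The four items above can all be derived from the ordered list of accepted metrics. The following is a sketch under that assumption; the function name, the returned keys, and the use of a running best as the "trajectory" are illustrative choices, not something autoloop.md defines:

```python
from itertools import accumulate


def summarize_progress(accepted_metrics: list[float],
                       higher_is_better: bool = True) -> dict:
    """Summarize start, current best, total improvement, and trajectory."""
    if not accepted_metrics:
        raise ValueError("need at least one accepted iteration")

    pick_best = max if higher_is_better else min
    # Running best after each accepted iteration.
    trajectory = list(accumulate(accepted_metrics, pick_best))

    initial = accepted_metrics[0]
    best = trajectory[-1]
    # Positive delta always means "improved", whatever the direction.
    delta = best - initial if higher_is_better else initial - best
    pct = delta / abs(initial) * 100 if initial != 0 else None

    return {
        "initial": initial,
        "best": best,
        "delta": delta,
        "improvement_pct": pct,
        "trajectory": trajectory,
    }


# Illustrative values only.
summary = summarize_progress([10.0, 9.5, 12.0, 11.0, 15.0])
print(summary["trajectory"])        # [10.0, 10.0, 12.0, 12.0, 15.0]
print(summary["delta"], summary["improvement_pct"])  # 5.0 50.0
```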
