From a12c546ff3e46d4b23e1c9a62cfa7f3a93111952 Mon Sep 17 00:00:00 2001
From: Markus Neusinger <2921697+MarkusNeusinger@users.noreply.github.com>
Date: Tue, 19 May 2026 14:56:20 +0200
Subject: [PATCH] fix(prompts): align quality-evaluator.md with new VQ-01 +
 proportional rules
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

quality-evaluator.md is the standalone/offline reviewer. It is not
workflow-active: impl-review.yml invokes ai-quality-review.md instead.
The two prompts had drifted. quality-evaluator.md still demanded "Font
sizes explicitly set (not defaults)", while quality-criteria.md (just
updated in #7391) now says the source of the values is irrelevant and
only the visual result matters.

Changes:
- Header note clarifies the file's standalone role and points to
  ai-quality-review.md as the authoritative workflow-active reviewer
  when the two contradict.
- VQ-01 row: replaced the "(not defaults)" constraint with wording that
  accepts style-guide defaults or AI-adjusted sizes alike, and added a
  ~400 px mobile-readability check.
- VQ-05 row: added title-width (≤ ~90%), balanced-axis-label, and
  overflow checks.
- New "Proportional sizing notes" block under the Visual Quality table:
  matches the rules in default-style-guide.md, ai-quality-review.md
  Section 5d, and quality-criteria.md VQ-01 — including the mandated-
  long-title exception and the "source-of-values irrelevant" principle.
---
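Reviewer note (not part of the commit): the title-width rule in the new
"Proportional sizing notes" block can be read as a small decision table.
The sketch below is one possible reading, not code from the repository.
The function name `title_width_verdict`, the "review" outcome, and the
exact cut-offs at 0.70 and 0.90 are assumptions taken from the
approximate figures in the prompt text, and the width fraction is
assumed to be measured elsewhere.

```python
def title_width_verdict(width_fraction: float, is_mandated_title: bool = False) -> str:
    """Classify a title by the fraction of plot width it occupies (0.0-1.0).

    Hypothetical helper: thresholds mirror the "~70%" and "~90%" figures
    in the prompt text and are approximate by design.
    """
    # Past ~90% the title overflows or risks clipping: always a deduction.
    if width_fraction > 0.90:
        return "deduct"
    # Between ~70% and ~90% the mandated long title is expected and fine;
    # any other title gets a second look (fontsize may be too generous).
    if width_fraction > 0.70 and not is_mandated_title:
        return "review"
    # Everything else is treated as acceptable in this sketch.
    return "ok"


print(title_width_verdict(0.80, is_mandated_title=True))
```

Under this reading the mandated title at 80% width scores "ok", the same
width for a free-form title is flagged for a closer look, and anything
past 90% is a deduction regardless of title type.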
 prompts/quality-evaluator.md | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/prompts/quality-evaluator.md b/prompts/quality-evaluator.md
index 6bc768ef3a..418adc9ce7 100644
--- a/prompts/quality-evaluator.md
+++ b/prompts/quality-evaluator.md
@@ -1,5 +1,7 @@
 # Quality Evaluator
 
+> **Note:** The workflow-active reviewer is `prompts/workflow-prompts/ai-quality-review.md` (wired via `impl-review.yml`). This file is a standalone JSON-output reviewer kept in sync with the same rubric for offline / ad-hoc evaluations and to document the agent's role in `prompts/README.md`. When the two contradict, `ai-quality-review.md` wins because it's the one that actually runs.
+
 ## Role
 
 You are a strict code reviewer for data visualizations. Most implementations are Python; ggplot2 is R. You evaluate plot implementations against `prompts/quality-criteria.md`.
@@ -176,14 +178,22 @@ If found: `auto_reject: "AR-08"`, score = 0, stop evaluation.
 
 | ID | Criterion | Max | Key Question |
 |----|-----------|-----|--------------|
-| VQ-01 | Text Legibility | 8 | All text readable at full size? Font sizes **explicitly set** (not defaults)? Readable in BOTH themes? |
+| VQ-01 | Text Legibility | 8 | All text readable at full size AND at ~400 px mobile width? Font sizes explicitly set (regardless of whether at style-guide defaults or AI-adjusted)? Readable in BOTH themes? |
 | VQ-02 | No Overlap | 6 | Any overlapping text? Tick labels? Legend on data? |
 | VQ-03 | Element Visibility | 6 | Markers/lines adapted to data density? |
 | VQ-04 | Color Accessibility | 2 | Adequate contrast + CVD-safe (beyond palette choice)? No red-green as sole distinguishing signal? |
-| VQ-05 | Layout & Canvas | 4 | Good proportions? Nothing cut off? |
+| VQ-05 | Layout & Canvas | 4 | Good proportions? Nothing cut off? Title ≤ ~90% width, balanced axis labels, no overflow? |
 | VQ-06 | Axis Labels & Title | 2 | Descriptive with units? |
 | VQ-07 | Palette Compliance | 2 | First categorical series = `#009E73`? Multi-series follows Okabe-Ito order? Continuous data uses `viridis`/`cividis`/`BrBG`? Plot background is `#FAF8F1` (light) / `#1A1A17` (dark) — never pure white/black? Both renders theme-correct? |
 
+**Proportional sizing notes (apply to VQ-01 / VQ-02 / VQ-05 holistically — no separate item):**
+- Title comfortably ~50–70% of plot width. The mandated `{spec-id} · {lang} · {lib} · anyplot.ai` title is ~67 chars and naturally fills 70–85% at the style-guide default fontsize — **expected, not a deduction**. Only deduct if title overflows past ~90% / clips edges, or fontsize is too generous for the title length.
+- Short axis labels ("Date", "Year") at oversized fontsizes that dominate the axis → deduct VQ-05.
+- Long descriptive labels at sensible fontsizes are fine as long as they don't overflow.
+- X/Y axis labels and tick labels should be visually similar in size (rotated long categorical labels excepted).
+- Marker / line size should match data density (sparse → prominent; dense → smaller + alpha).
+- Source-of-values is irrelevant: defaults, AI-tuned, or repair-loop-tuned all score equally — what matters is the visual result.
+
 ### Step 2: Design Excellence (20 pts)
 
 | ID | Criterion | Max | Key Question |