From a12c546ff3e46d4b23e1c9a62cfa7f3a93111952 Mon Sep 17 00:00:00 2001
From: Markus Neusinger <2921697+MarkusNeusinger@users.noreply.github.com>
Date: Tue, 19 May 2026 14:56:20 +0200
Subject: [PATCH] fix(prompts): align quality-evaluator.md with new VQ-01 + proportional rules
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

quality-evaluator.md is the standalone/offline reviewer (NOT
workflow-active — ai-quality-review.md is what impl-review.yml actually
invokes). The two prompts had drifted: quality-evaluator.md still
demanded "Font sizes explicitly set (not defaults)" while
quality-criteria.md (just updated in #7391) now says source-of-values
is irrelevant and only the visual result matters.

Changes:
- Header note clarifies the file's standalone role and points to
  ai-quality-review.md as the authoritative workflow-active reviewer
  when the two contradict.
- VQ-01 row: dropped the "(not defaults)" constraint, added
  mobile-readability check.
- VQ-05 row: added title-width / overflow guidance.
- New "Proportional sizing notes" block under the Visual Quality table:
  matches the rules in default-style-guide.md, ai-quality-review.md
  Section 5d, and quality-criteria.md VQ-01 — including the
  mandated-long-title exception and the "source-of-values irrelevant"
  principle.
---
 prompts/quality-evaluator.md | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/prompts/quality-evaluator.md b/prompts/quality-evaluator.md
index 6bc768ef3a..418adc9ce7 100644
--- a/prompts/quality-evaluator.md
+++ b/prompts/quality-evaluator.md
@@ -1,5 +1,7 @@
 # Quality Evaluator
 
+> **Note:** The workflow-active reviewer is `prompts/workflow-prompts/ai-quality-review.md` (wired via `impl-review.yml`). This file is a standalone JSON-output reviewer kept in sync with the same rubric for offline / ad-hoc evaluations and to document the agent's role in `prompts/README.md`. When the two contradict, `ai-quality-review.md` wins because it's the one that actually runs.
+
 ## Role
 
 You are a strict code reviewer for data visualizations. Most implementations are Python; ggplot2 is R. You evaluate plot implementations against `prompts/quality-criteria.md`.
@@ -176,14 +178,22 @@ If found: `auto_reject: "AR-08"`, score = 0, stop evaluation.
 
 | ID | Criterion | Max | Key Question |
 |----|-----------|-----|--------------|
-| VQ-01 | Text Legibility | 8 | All text readable at full size? Font sizes **explicitly set** (not defaults)? Readable in BOTH themes? |
+| VQ-01 | Text Legibility | 8 | All text readable at full size AND at ~400 px mobile width? Font sizes explicitly set (regardless of whether at style-guide defaults or AI-adjusted)? Readable in BOTH themes? |
 | VQ-02 | No Overlap | 6 | Any overlapping text? Tick labels? Legend on data? |
 | VQ-03 | Element Visibility | 6 | Markers/lines adapted to data density? |
 | VQ-04 | Color Accessibility | 2 | Adequate contrast + CVD-safe (beyond palette choice)? No red-green as sole distinguishing signal? |
-| VQ-05 | Layout & Canvas | 4 | Good proportions? Nothing cut off? |
+| VQ-05 | Layout & Canvas | 4 | Good proportions? Nothing cut off? Title ≤ ~90% width, balanced axis labels, no overflow? |
 | VQ-06 | Axis Labels & Title | 2 | Descriptive with units? |
 | VQ-07 | Palette Compliance | 2 | First categorical series = `#009E73`? Multi-series follows Okabe-Ito order? Continuous data uses `viridis`/`cividis`/`BrBG`? Plot background is `#FAF8F1` (light) / `#1A1A17` (dark) — never pure white/black? Both renders theme-correct? |
 
+**Proportional sizing notes (apply to VQ-01 / VQ-02 / VQ-05 holistically — no separate item):**
+- Title comfortably ~50–70% of plot width. The mandated `{spec-id} · {lang} · {lib} · anyplot.ai` title is ~67 chars and naturally fills 70–85% at the style-guide default fontsize — **expected, not a deduction**. Only deduct if title overflows past ~90% / clips edges, or fontsize is too generous for the title length.
+- Short axis labels ("Date", "Year") at oversized fontsizes that dominate the axis → deduct VQ-05.
+- Long descriptive labels at sensible fontsizes are fine as long as they don't overflow.
+- X/Y axis labels and tick labels should be visually similar in size (rotated long categorical labels excepted).
+- Marker / line size should match data density (sparse → prominent; dense → smaller + alpha).
+- Source-of-values is irrelevant: defaults, AI-tuned, or repair-loop-tuned all score equally — what matters is the visual result.
+
 ### Step 2: Design Excellence (20 pts)
 
 | ID | Criterion | Max | Key Question |
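
Review note on the title-width rule in the new "Proportional sizing notes" block: the thresholds can be read as a small decision procedure. The sketch below is illustrative only and is not part of this patch or of any prompt file. The function name `title_width_verdict`, its parameters, and the verdict strings are hypothetical, and the rubric's approximate bounds (~70%, ~90%) are treated as hard cut-offs for the sake of the example. Treating titles narrower than 50% as acceptable is also an assumption, since the notes do not say they should be penalized.

```python
def title_width_verdict(title_px: float, plot_px: float, mandated_title: bool) -> str:
    """Classify a title's share of the plot width (hypothetical helper).

    Thresholds mirror the approximate values in the sizing notes:
    up to ~70% is comfortable, 70-90% is expected for the mandated
    long title, and anything past ~90% counts as overflow.
    """
    if plot_px <= 0:
        raise ValueError("plot_px must be positive")

    fraction = title_px / plot_px

    # Past ~90% of the plot width the title overflows: deduct.
    if fraction > 0.90:
        return "deduct"

    # Between ~70% and ~90%: expected for the mandated long title,
    # otherwise the reviewer judges whether the fontsize is too generous.
    if fraction > 0.70:
        return "expected" if mandated_title else "judgment"

    # Up to ~70% (assumption: narrower titles are not penalized).
    return "ok"
```

For example, `title_width_verdict(800, 1000, mandated_title=True)` falls in the 70-90% band and is not a deduction, while the same width for a short custom title is left to the reviewer's judgment about fontsize.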