
feat(plotnine): implement timeseries-forecast-uncertainty #7399

Merged
MarkusNeusinger merged 5 commits into main from implementation/timeseries-forecast-uncertainty/plotnine on May 19, 2026

Conversation

@github-actions
Contributor

Implementation: timeseries-forecast-uncertainty - python/plotnine

Implements the python/plotnine version of timeseries-forecast-uncertainty.

File: plots/timeseries-forecast-uncertainty/implementations/python/plotnine.py

Parent Issue: #3188


🤖 impl-generate workflow

github-actions Bot added 2 commits May 19, 2026 13:17
Regen from quality 92. Addressed:
- DE-01: Added INK_MUTED + ELEVATED_BG tokens, plot_subtitle theme element
- DE-03: Added annotate() label + labs(subtitle) for clearer narrative
- LM-02: Used annotate() and labs(subtitle) as distinctive plotnine features; element_blank() for minor grid
- Fixed title to include python language token
@claude
Contributor

claude Bot commented May 19, 2026

AI Review - Attempt 1/3

Image Description

Light render (plot-light.png): The plot renders on a warm off-white (#FAF8F1) background. The title "timeseries-forecast-uncertainty · python · plotnine · anyplot.ai" is displayed at the top in dark ink, spanning approximately 75% of the plot width — expected for the long mandated format. A subtitle explains the confidence bands. A legend at the top shows "series" as the header (the column name, not blank) with a green solid line for "Historical" and an orange dashed line for "Forecast". The y-axis "Electricity Demand (GWh)" runs from ~90 to 180, and the x-axis "Date" spans Jan 2022 to Jan 2026 with 45°-rotated date labels. A green (#009E73) solid line covers the historical period (Jan 2022–Jan 2025), a dashed orange (#D55E00) line covers the forecast period (Jan 2025–Jan 2026), flanked by two nested shaded bands (80%: more opaque orange, 95%: lighter outer band). A dashed vertical line marks the forecast start with a "Forecast period" annotation above. A thin panel border box frames the plot area. All text is clearly readable against the light background.

Dark render (plot-dark.png): The same plot on a warm near-black (#1A1A17) background. The title, subtitle, axis labels, and tick labels are all rendered in light/cream colors, clearly readable against the dark surface. The historical line remains #009E73 (identical to the light render). The forecast confidence bands render as darker, more saturated brown-amber tones on the dark background due to the semi-transparent fill over a dark surface — visually heavier than the light render but still clearly showing two distinct bands. The dashed forecast line and the vertical separator remain visible. No dark-on-dark text failures observed; all chrome elements correctly flip to light-on-dark. Data colors are consistent with the light render.

Score: 86/100

| Category | Score | Max |
| --- | --- | --- |
| Visual Quality | 29 | 30 |
| Design Excellence | 12 | 20 |
| Spec Compliance | 13 | 15 |
| Data Quality | 15 | 15 |
| Code Quality | 10 | 10 |
| Library Mastery | 7 | 10 |
| Total | 86 | 100 |
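
As a quick arithmetic check, the category scores in the table sum to the reported total. A minimal sketch; the `scores` dictionary is only an illustration and is not part of the review tooling:

```python
# Category scores from the first review: (awarded, maximum).
scores = {
    "Visual Quality": (29, 30),
    "Design Excellence": (12, 20),
    "Spec Compliance": (13, 15),
    "Data Quality": (15, 15),
    "Code Quality": (10, 10),
    "Library Mastery": (7, 10),
}

total = sum(awarded for awarded, _ in scores.values())
maximum = sum(cap for _, cap in scores.values())
print(f"{total}/{maximum}")
```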

Visual Quality (29/30)

  • VQ-01: Text Legibility (7/8) — All font sizes explicitly set and readable in both themes. Minor deduction: the axis title and tick label sizes (20pt and 16pt on a 16×9 figure at 300 dpi) are well-proportioned, but the canvas is 4800×2700 px rather than the standard 3200×1800; not a blocking issue but slightly off-spec.
  • VQ-02: No Overlap (6/6) — Rotated x-axis labels, top legend, and annotation avoid all collisions.
  • VQ-03: Element Visibility (6/6) — Historical solid line, dashed forecast line, and nested confidence bands all clearly visible in both renders.
  • VQ-04: Color Accessibility (2/2) — Okabe-Ito palette, CVD-safe. Bands use alpha differentiation.
  • VQ-05: Layout & Canvas (4/4) — Good proportions, data fills the canvas well, no cut-off.
  • VQ-06: Axis Labels & Title (2/2) — Y-axis has units (GWh); x-axis is "Date".
  • VQ-07: Palette Compliance (2/2) — First series #009E73, second #D55E00, correct backgrounds (#FAF8F1 / #1A1A17), theme-adaptive chrome throughout.

Design Excellence (12/20)

  • DE-01: Aesthetic Sophistication (5/8) — Above a generic default: intentional two-color scheme, nested bands, dashed-vs-solid distinction, and vertical separator create visual structure. Not yet publication-ready (no spine removal, panel border creates a box frame, annotation placement is functional but plain).
  • DE-02: Visual Refinement (3/6) — Grid is very subtle (alpha=0.1, major only), which is good. However, the panel border draws a box around all four sides; the style guide recommends removing top/right spines for an L-shaped frame instead. The legend frame is set to ELEVATED_BG, which is correct.
  • DE-03: Data Storytelling (4/6) — The dashed vertical line + annotation clearly signal the forecast transition. Nested bands communicating expanding uncertainty tell a meaningful story. Color contrast between green historical and orange forecast creates a clear focal point. Visual hierarchy is present and effective.

Spec Compliance (13/15)

  • SC-01: Plot Type (5/5) — Correct: time series with historical line, forecast line, confidence bands, and vertical separator.
  • SC-02: Required Features (3/4) — Solid historical line ✓, dashed forecast line ✓, vertical line at forecast start ✓, nested 80%/95% bands ✓, semi-transparent fills ✓. Minor omission: the confidence band levels (80%, 95%) are explained only in the subtitle, not labeled in the legend — the spec explicitly asks for a legend that identifies confidence levels.
  • SC-03: Data Mapping (3/3) — Dates on x-axis, Electricity Demand (GWh) on y-axis; all data visible.
  • SC-04: Title & Legend (2/3) — Title format is exactly correct. The legend title displays "series" (the dataframe column name) rather than being suppressed to empty. The current setting does not override the legend title in plotnine; an explicit legend-title override is required.

Data Quality (15/15)

  • DQ-01: Feature Coverage (6/6) — Shows all required aspects: historical trend with seasonality and noise, expanding uncertainty, two confidence levels, clear historical/forecast split.
  • DQ-02: Realistic Context (5/5) — Monthly electricity demand is a neutral, real-world forecasting scenario.
  • DQ-03: Appropriate Scale (4/4) — Values (100–180 GWh), seasonality amplitude, trend, and widening CI are all plausible.
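
The review does not show how the widening intervals were generated. One common way to obtain nested bands that expand with the horizon is to scale a normal quantile by a standard error that grows with the forecast step. The sketch below assumes that approach; `sigma`, the flat `point_forecast`, and the square-root growth rule are all hypothetical choices for illustration:

```python
from math import sqrt
from statistics import NormalDist

# Two-sided multipliers for the central 80% and 95% intervals.
z80 = NormalDist().inv_cdf(0.90)
z95 = NormalDist().inv_cdf(0.975)

sigma = 3.0                    # hypothetical one-step forecast error (GWh)
point_forecast = [150.0] * 12  # placeholder flat forecast for 12 months

bands = []
for h, yhat in enumerate(point_forecast, start=1):
    se = sigma * sqrt(h)  # assumption: error grows with sqrt(horizon)
    bands.append({
        "h": h,
        "lo95": yhat - z95 * se,
        "lo80": yhat - z80 * se,
        "hi80": yhat + z80 * se,
        "hi95": yhat + z95 * se,
    })

# Print the 80% and 95% band widths at a few horizons.
for b in bands[::4]:
    print(b["h"], round(b["hi80"] - b["lo80"], 1), round(b["hi95"] - b["lo95"], 1))
```

By construction the 80% band always sits inside the 95% band, and both widen monotonically with the horizon.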

Code Quality (10/10)

  • CQ-01: KISS Structure (3/3) — Flat structure: imports → tokens → data → plot → save.
  • CQ-02: Reproducibility (2/2) — Random seed is set.
  • CQ-03: Clean Imports (2/2) — All imported names are used.
  • CQ-04: Code Elegance (2/2) — Clean, Pythonic, appropriate complexity.
  • CQ-05: Output & API (1/1) — Saves PNG output using current API.

Library Mastery (7/10)

  • LM-01: Idiomatic Usage (4/5) — Grammar of graphics layering, scale_color_manual, scale_x_datetime with date_breaks/date_labels, and multi-dataframe geoms are all idiomatic plotnine. Slight deduction for the legend title not being properly suppressed.
  • LM-02: Distinctive Features (3/5) — geom_ribbon() for confidence bands is a key ggplot2/plotnine grammar feature not replicated simply in matplotlib. Multi-dataframe geom layering is idiomatic. Not quite 5/5 as no advanced plotnine-specific features (scales, guides, stats) are pushed.

Score Caps Applied

  • None

Strengths

  • Correct Okabe-Ito palette throughout (Historical = #009E73, Forecast = #D55E00)
  • Nested confidence bands with clear alpha differentiation (0.15/0.30) communicate expanding uncertainty naturally
  • Vertical dashed separator + annotation creates a clean forecast-start marker
  • Full theme adaptation: all chrome elements flip correctly between light and dark
  • Clean, reproducible flat code structure with excellent data quality
  • Realistic electricity demand scenario with plausible seasonality and trend

Weaknesses

  • The panel border draws a box around all four sides; replace it and explicitly remove the top/right spines to achieve the L-shaped frame recommended by the style guide
  • Legend title shows "series" (the column name) instead of being suppressed; fix with an explicit legend-title override in addition to the setting in scale_color_manual
  • Confidence band levels (80% and 95%) are not labeled in the legend; add manual annotation near the bands (e.g., text labels "80%" and "95%" on the right edge of each band) or add legend entries for the ribbons

Issues Found

  1. DE-02 LOW (3/6): Panel border creates a four-sided box frame instead of an L-shaped spine. The style guide recommends removing top and right spines.
    • Fix: Replace the four-sided panel border with a borderless setting. Add axis lines if not already present to keep the bottom/left lines visible.
  2. SC-04 PARTIAL (2/3): Legend title shows the raw column name "series" instead of being hidden.
    • Fix: Add an explicit legend-title override.
  3. SC-02 PARTIAL (3/4): Confidence interval levels not labeled in the legend.
    • Fix: Add text annotations "95% CI" and "80% CI" positioned at the right edge of each ribbon near the final forecast date.

AI Feedback for Next Attempt

Fix three targeted issues: (1) Replace the panel border setting to remove the box frame and let theme_minimal give an L-shaped spine look. (2) Add an explicit legend-title override to suppress the "series" label. (3) Add small text annotations ("95% CI", "80% CI") near the right edge of the confidence bands to explicitly label them. These three fixes address DE-02, SC-04, and SC-02 respectively and should push the score above 90.

Verdict: REJECTED

@github-actions github-actions Bot added quality:86 Quality score 86/100 ai-rejected Quality not OK, triggers update labels May 19, 2026
@github-actions github-actions Bot added ai-attempt-1 First repair attempt and removed ai-rejected Quality not OK, triggers update labels May 19, 2026
@github-actions
Contributor Author

🔧 Repair Attempt 1/4

Applied fixes based on AI review feedback.

Status: Repair completed, re-triggering review...


🤖 impl-repair

@claude
Contributor

claude Bot commented May 19, 2026

AI Review - Attempt 2/3

Image Description

Light render (plot-light.png): The plot displays on a warm off-white (#FAF8F1) background. The title "timeseries-forecast-uncertainty · python · plotnine · anyplot.ai" renders in dark ink across ~80% of the canvas width at fontsize=24. A descriptive subtitle explains the confidence intervals. The legend at top shows "Historical" (green #009E73 icon) and "Forecast" (orange #D55E00 icon). The historical period (Jan 2022 – Jan 2025) shows a solid green line with realistic trend and seasonality. The forecast period (Feb 2025 – Jan 2026) features a dashed orange line flanked by nested confidence bands: a lighter outer 95% CI band and a slightly darker inner 80% CI band, both in the same orange fill (#D55E00) with different alpha levels (0.15 / 0.30). A dashed vertical line marks the forecast boundary, with a "Forecast period" text label above. CI labels ("95% CI", "80% CI") appear within the shaded area. The x-axis tick labels are rotated 45° and span Jan 2022 to Jan 2026 in 6-month steps. The y-axis shows "Electricity Demand (GWh)" with values ~95–175. All text is clearly readable against the light background.

Dark render (plot-dark.png): The same layout renders on a near-black (#1A1A17) background. Title, subtitle, axis labels, tick labels, and annotations all appear in appropriate light-colored ink — no dark-on-dark failures observed. The historical green line (#009E73) is identical to the light render. The forecast orange (#D55E00) and CI bands retain the same hue and alpha; the darker background makes the CI bands appear as warmer brown-orange tones, which is expected for this alpha level over the dark surface. The "Forecast period" and CI labels remain readable. Legend text is light-colored against the elevated dark background (#242420). No theme adaptation failures detected.
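
The shift toward brown-orange on the dark background can be reproduced with plain source-over alpha compositing. The sketch below blends the fill colour at the two reported alpha levels over each background; it assumes the 80% band is drawn on top of the 95% band, and `hex_to_rgb` and `blend` are hypothetical helpers:

```python
def hex_to_rgb(h: str) -> tuple:
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints."""
    h = h.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def blend(fg, bg, alpha):
    """Source-over compositing of a translucent colour onto an opaque one."""
    return tuple(round(alpha * f + (1 - alpha) * b) for f, b in zip(fg, bg))


FILL = hex_to_rgb("#D55E00")

for name, bg_hex in (("light", "#FAF8F1"), ("dark", "#1A1A17")):
    bg = hex_to_rgb(bg_hex)
    outer = blend(FILL, bg, 0.15)     # 95% band alone
    inner = blend(FILL, outer, 0.30)  # 80% band over the 95% band (assumed order)
    print(name, outer, inner)
```

Because most of each resulting pixel comes from the background, the same fill reads as a pale peach on the light surface and as a dark brown on the near-black one.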

Both paragraphs are required. A review that only describes one render is invalid.

Score: 89/100

| Category | Score | Max |
| --- | --- | --- |
| Visual Quality | 29 | 30 |
| Design Excellence | 12 | 20 |
| Spec Compliance | 15 | 15 |
| Data Quality | 15 | 15 |
| Code Quality | 10 | 10 |
| Library Mastery | 8 | 10 |
| Total | 89 | 100 |

Visual Quality (29/30)

  • VQ-01: Text Legibility (7/8) — All font sizes explicitly set; plot is readable in both themes. Minor deduction: axis_title=20 is proportionally large for the short label "Date" (style guide suggests ~12pt for short labels); title and tick labels are otherwise well-proportioned.
  • VQ-02: No Overlap (6/6) — No overlapping elements; rotated tick labels are clean and spaced well.
  • VQ-03: Element Visibility (6/6) — Lines at size=1.5 are well-visible; CI ribbons at alpha=0.15/0.30 create a clear visual layering without obscuring the forecast line.
  • VQ-04: Color Accessibility (2/2) — Okabe-Ito green/orange are CVD-safe and clearly distinguishable.
  • VQ-05: Layout & Canvas (4/4) — Canvas utilisation is excellent; 75%/25% historical/forecast split feels natural for the data.
  • VQ-06: Axis Labels & Title (2/2) — Y-axis "Electricity Demand (GWh)" includes units; X-axis "Date" is appropriate for a datetime axis.
  • VQ-07: Palette Compliance (2/2) — First series = #009E73 ✓; second = #D55E00 ✓; backgrounds #FAF8F1 / #1A1A17 ✓; data colors identical across themes ✓.

Design Excellence (12/20)

  • DE-01: Aesthetic Sophistication (5/8) — Above the "well-configured default" level: nested CI bands with distinct alpha values, dashed forecast line, subtitle, "Forecast period" annotation, and on-band CI labels show intentional design. Not yet at FiveThirtyEight/publication level — typography hierarchy could be sharper and colour coordination more deliberate.
  • DE-02: Visual Refinement (3/6) — Grid is subtle (alpha=0.10, minor grid removed). However, panel_border=element_rect(...) adds back a full four-sided border around the panel, contradicting the style-guide recommendation to keep only the L-shaped frame (remove top and right spines). The result still looks reasonably clean but is a notable missed refinement.
  • DE-03: Data Storytelling (4/6) — The visual transition from solid to dashed line at the forecast boundary, expanding bands, and "Forecast period" annotation create a coherent narrative about growing uncertainty. A clearer focal emphasis (e.g., highlighted near-term vs. long-term uncertainty) would push this further.

Spec Compliance (15/15)

  • SC-01: Plot Type (5/5) — Correct: time-series with forecast line and nested confidence bands.
  • SC-02: Required Features (4/4) — Historical line ✓, dashed forecast line ✓, 80% CI band ✓, 95% CI band ✓, vertical forecast-boundary marker ✓, legend ✓, CI labels on bands ✓.
  • SC-03: Data Mapping (3/3) — X=date, Y=GWh; both historical (36 months) and forecast (12 months) periods fully visible.
  • SC-04: Title & Legend (3/3) — Title format is timeseries-forecast-uncertainty · python · plotnine · anyplot.ai ✓; legend labels "Historical" / "Forecast" match series names ✓.

Data Quality (15/15)

  • DQ-01: Feature Coverage (6/6) — Shows trend + seasonality in historical period, continuity at forecast start, expanding CI bands demonstrating horizon-dependent uncertainty — all key aspects of the plot type.
  • DQ-02: Realistic Context (5/5) — Monthly electricity demand forecasting is a real, neutral, business-relevant scenario.
  • DQ-03: Appropriate Scale (4/4) — GWh values (~95–175) are plausible for a medium-sized utility region; CI widths expand realistically with forecast horizon.

Code Quality (10/10)

  • CQ-01: KISS Structure (3/3) — Imports → data generation → plot construction → save; no functions or classes.
  • CQ-02: Reproducibility (2/2) — np.random.seed(42) set.
  • CQ-03: Clean Imports (2/2) — All plotnine symbols imported are used; no stray imports.
  • CQ-04: Code Elegance (2/2) — Idiomatic layered construction; no over-engineering; no fake interactivity.
  • CQ-05: Output & API (1/1) — Saves as plot-{THEME}.png ✓.

Library Mastery (8/10)

  • LM-01: Idiomatic Usage (5/5) — Full grammar-of-graphics idiom: layered geom composition, aes() mappings, scale_color_manual(), scale_x_datetime() with date_breaks/date_labels, theme customisation via theme(). Expert plotnine usage.
  • LM-02: Distinctive Features (3/5) — geom_ribbon() with layered alpha values is a signature ggplot2/plotnine feature for uncertainty bands. scale_x_datetime(date_breaks=..., date_labels=...) uses plotnine's native datetime scale formatting. Doesn't quite reach 5 (would need something like facet_wrap or a particularly novel plotnine feature).

Score Caps Applied

  • None — all cap conditions clear (VQ-02≠0, VQ-03≠0, SC-01≠0, DQ-02≠0, DE-01=5>2, CQ-04≠0).

Strengths

  • Perfect spec compliance: all required elements (historical line, dashed forecast, vertical boundary, nested 80%/95% CI bands, legend, CI labels, subtitle) are present and working
  • Excellent data quality: realistic electricity demand context, expanding uncertainty bands, trend+seasonality in historical data
  • Clean idiomatic plotnine code with proper theme-adaptive tokens throughout
  • Both renders pass theme-readability check with no dark-on-dark or light-on-light failures
  • geom_ribbon() layered with distinct alpha values is the correct plotnine idiom for this plot type

Weaknesses

  • panel_border=element_rect(color=INK_SOFT, fill=None, size=0.5) retains a full four-sided box around the panel — remove this and let theme_minimal() handle border removal; or use axis_line only to keep the L-shaped frame (left + bottom axes only)
  • Axis title fontsize=20 is disproportionately large for the short label "Date" — reduce axis_title to size=13 and axis_text to size=14 for better balance
  • dpi=300 in save call; library guide recommends dpi=400

Issues Found

  1. DE-02 MODERATE: panel_border overrides theme_minimal()'s spine removal, adding back a full rectangular border
    • Fix: Remove the panel_border element or set panel_border=element_blank() to restore the L-shaped frame
  2. VQ-01 MINOR: Axis title size=20 vs axis text size=16 — short label "Date" dominated by oversized title font
    • Fix: Set axis_title=element_text(size=13) and axis_text=element_text(size=14) for better label/tick balance

AI Feedback for Next Attempt

Remove panel_border (set to element_blank() or delete) to restore the clean L-shaped spine that theme_minimal() produces. Rebalance axis font sizes: axis_title=13 and axis_text=14 work better for short labels at this canvas size. These two changes would push DE-02 to 5/6 and VQ-01 to 8/8, improving the total to about 92.

Verdict: APPROVED

@github-actions github-actions Bot added quality:89 Quality score 89/100 ai-approved Quality OK, ready for merge and removed quality:86 Quality score 86/100 labels May 19, 2026
@MarkusNeusinger MarkusNeusinger merged commit b4665b8 into main May 19, 2026
@MarkusNeusinger MarkusNeusinger deleted the implementation/timeseries-forecast-uncertainty/plotnine branch May 19, 2026 13:39

Labels

ai-approved Quality OK, ready for merge ai-attempt-1 First repair attempt quality:89 Quality score 89/100
