docs: add agentic CI plan for automated PR reviews and daily maintenance by andreatgretel · Pull Request #473 · NVIDIA-NeMo/DataDesigner

andreatgretel · 2026-03-30T13:53:51Z

📋 Summary

Plan for adding an agentic CI layer to the repo: GitHub Actions workflows that run Claude Code or Codex on a self-hosted runner to review PRs and run daily tech debt maintenance. Closes #472.

🔄 Changes

✨ Added

plans/472/agentic-ci-plan.md - full plan covering:
- Recipe format and directory layout (.agents/recipes/)
- PR review recipe composing the existing review-code skill
- Five daily maintenance suites rotating Mon-Fri: docs-and-references, dependencies, structure, code-quality, test-health
- Runner memory for dedup and delta tracking across runs
- Security model: collaborator-only triggers, minimal permissions, prompt injection mitigations
- Phased rollout (4 phases)

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

plans/472/agentic-ci-plan.md - This is a plan-only PR (no implementation). Review for feasibility, security concerns, and whether the suite coverage and phasing make sense for the team.

🤖 Generated with AI

Closes #472

eric-tramel

Excellent initiative, let's do it 🚀

eric-tramel · 2026-03-30T14:38:35Z

plans/472/agentic-ci-plan.md

+
+Constraints:
+- Only runs on non-draft PRs
+- Skips if the PR only touches docs/markdown (configurable per recipe)


We should probably have agent workflow reviews in this case, too (e.g., keeping docs in sync and making sure the edits are faithful to the codebase).

eric-tramel · 2026-03-30T14:40:31Z

plans/472/agentic-ci-plan.md

+Running all suites every day is technically possible (stagger them hourly, e.g.,
+06:00-10:00 UTC) and would surface issues faster. But it creates up to five
+PR/issue streams per day, which risks becoming noise the team tunes out - the
+opposite of the goal. One suite per day keeps the output digestible and gives
+each finding proper attention.


Once we trust the system, especially after crossing the Rubicon on automated merging of simple edits (e.g., fixing a doc), increasing the rate of review will be valuable.

eric-tramel · 2026-03-30T14:43:04Z

plans/472/agentic-ci-plan.md

+
+#### Wednesday / structure
+
+Enforces the multi-package layering that makes DataDesigner work.


Could also include the import / CLI bootup time checks that @nabinchha added previously.

We have a test for it already, so it should be covered...?

eric-tramel · 2026-03-30T14:46:13Z

plans/472/agentic-ci-plan.md

+| Wed       | structure          | import boundaries, circular deps, dead exports        |
+| Thu       | code-quality       | complexity, exception hygiene, type gaps, TODO aging  |
+| Fri       | test-health        | coverage deltas, hollow tests, fixtures, smoke tests  |
+| Sat/Sun   | off                | -                                                     |


We should reserve Sat/Sun for longer performance benchmarking and AI-QA tests. This isn't fully built out yet, but that will be the natural thing to add to the automation once it is (e.g. measuring mocked execution times, memory overhead, hotspot detection).

Additionally, an AI-QA test would be: let the agent try to construct SDG workflows, execute them, and record friction and problems.
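
The rotation in the quoted table can be sketched as a weekday lookup. This is a minimal illustration, not the plan's implementation: the name `suite_for` is hypothetical, the Monday and Tuesday assignments are inferred from the Mon-Fri ordering in the PR description, and the weekend slot is left empty as the current plan has it.

```python
from datetime import date
from typing import Optional

# Weekday index (Monday == 0) -> suite name, following the Mon-Fri order
# listed in the PR description. Saturday and Sunday are absent, i.e. "off".
ROTATION = {
    0: "docs-and-references",
    1: "dependencies",
    2: "structure",
    3: "code-quality",
    4: "test-health",
}


def suite_for(day: date) -> Optional[str]:
    """Return the suite scheduled for `day`, or None on weekends."""
    return ROTATION.get(day.weekday())
```

Reserving the weekend for benchmarking or AI-QA runs, as suggested above, would then amount to adding entries for indices 5 and 6.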

nabinchha · 2026-03-30T18:06:15Z

Nice work on this one, @andreatgretel — this is a thorough and well-structured plan. Here are my thoughts.

Summary

This PR adds a comprehensive plan for introducing agentic CI to DataDesigner: GitHub Actions workflows that run Claude Code or Codex on a self-hosted runner to perform automated PR reviews and rotating daily maintenance audits. The plan covers architecture (recipe format, directory layout), security (prompt injection, minimal permissions), phased rollout, and runner memory for cross-run dedup. The implementation matches the stated intent in the PR description and closes #472.

Findings

Warnings — Worth addressing

plans/472/agentic-ci-plan.md:189 — PR review skipping docs/markdown PRs may miss important changes

What: The PR review workflow constraint says "Skips if the PR only touches docs/markdown (configurable per recipe)." For a project that treats documentation as a first-class artifact (architecture docs, AGENTS.md, STYLEGUIDE.md, skills), skipping agent review on docs-only PRs could miss broken cross-references, stale guidance, or inconsistencies with code.
Why: Eric's inline comment on this line raises a related point — agent reviews should also verify docs stay in sync with the codebase. Skipping docs PRs entirely works against that goal.
Suggestion: Default to reviewing docs PRs as well, but with a lighter recipe variant (skip linting, focus on link validity and consistency with code). The skip behavior could be reserved for trivial changes like typo fixes, gated by a label (e.g., skip-agent-review) rather than by file type.
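
One way to read this suggestion is as a three-way decision instead of a file-type skip. The sketch below is an assumption-laden illustration: `review_mode`, `is_doc`, the suffix set, and the treatment of a top-level `docs/` directory are all hypothetical; only the `skip-agent-review` label and the non-draft constraint come from the discussion.

```python
from pathlib import PurePosixPath

DOC_SUFFIXES = {".md", ".mdx", ".rst"}  # assumed definition of "docs/markdown"
SKIP_LABEL = "skip-agent-review"        # label name taken from the suggestion


def is_doc(path: str) -> bool:
    """Treat markdown-like files and anything under docs/ as documentation."""
    p = PurePosixPath(path)
    return p.suffix.lower() in DOC_SUFFIXES or p.parts[:1] == ("docs",)


def review_mode(changed_files, labels, is_draft=False):
    """Return 'skip', 'light', or 'full' for a pull request."""
    # Drafts are never reviewed; the label is an explicit opt-out.
    if is_draft or SKIP_LABEL in labels:
        return "skip"
    # Docs-only PRs get the lighter recipe variant instead of being skipped.
    if changed_files and all(is_doc(f) for f in changed_files):
        return "light"
    return "full"
```

With this shape, the skip is driven by an explicit human decision (the label) and docs-only changes still receive link and consistency checks.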

plans/472/agentic-ci-plan.md:365-369 — Memory committed to a branch creates merge friction

What: The plan proposes committing runner memory (.agents/memory/) to a long-lived agentic-ci/state branch that rebases on main before each run.
Why: Rebasing a branch with frequent automated commits against an active main branch is a common source of CI failures — merge conflicts in the JSON state file, failed rebases when main moves fast, and noisy commit history. The state branch also needs its own CI exemption (no lint, no tests on memory commits) or it'll trigger the full pipeline on every update.
Suggestion: Could we evaluate GitHub Actions cache or a simple artifact-based approach as the primary storage, with the committed branch as a fallback for auditability? The plan already mentions this as an alternative — it might be worth making it the default and keeping the branch approach as the transparent-audit option. Alternatively, a single runner-state.json file on main updated via a dedicated bot PR (squashed, auto-merged) would avoid the long-lived branch entirely.
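
Whichever storage backend is chosen, the dedup logic itself can stay backend-agnostic if it only reads and writes a single JSON file. A minimal sketch, assuming each finding is a dict with `suite`, `path`, and `rule` keys (a hypothetical schema, as are all function names here):

```python
import hashlib
import json
from pathlib import Path


def fingerprint(finding: dict) -> str:
    """Stable ID for a finding: hash of suite, file, and rule (assumed keys)."""
    key = "|".join(finding[k] for k in ("suite", "path", "rule"))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def load_state(path: Path) -> set:
    """Read previously reported fingerprints; an absent file means no history."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8"))["seen"])


def save_state(path: Path, seen: set) -> None:
    # Sorted output keeps diffs small if the file is ever committed.
    path.write_text(json.dumps({"seen": sorted(seen)}, indent=2), encoding="utf-8")


def new_findings(findings: list, state_path: Path) -> list:
    """Return findings not reported before and record them in the state file."""
    seen = load_state(state_path)
    fresh = [f for f in findings if fingerprint(f) not in seen]
    save_state(state_path, seen | {fingerprint(f) for f in fresh})
    return fresh
```

The state file could then be restored from and saved to a cache, an artifact, or a branch without changing this code.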

Suggestions — Take it or leave it

plans/472/agentic-ci-plan.md:569-575 — Open questions could address cost controls

What: The open questions section covers flaky tests and dry-run mode, but doesn't mention cost/budget controls — a practical concern for daily agent runs against paid model APIs.
Why: Each suite run consumes tokens. Without a budget cap or cost tracking, a runaway recipe (e.g., one that reads the entire codebase on every run) could generate unexpected bills. The audit trail section mentions logging token usage, but there's no discussion of limits or alerts.
Suggestion: Add an open question about cost guardrails: per-run token budget, monthly spend alerts, or automatic recipe disabling if cost exceeds a threshold. This is especially relevant for Phase 2+ when five suites run weekly.
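
The three guardrails named here (per-run budget, monthly alerts, automatic disabling) could be combined into one check that runs after each suite. This is a sketch under stated assumptions: the `Budget` fields, the 80% alert threshold, and the `check_budget` name are illustrative and do not come from the plan.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    per_run_tokens: int           # hard cap for a single suite run
    monthly_tokens: int           # cap across all runs in the month
    alert_fraction: float = 0.8   # warn once this share of the monthly cap is used


def check_budget(budget: Budget, run_tokens: int, month_tokens_before: int) -> str:
    """Classify a finished run as 'ok', 'alert', or 'disable'."""
    total = month_tokens_before + run_tokens
    # Either cap being exceeded disables the recipe until a human re-enables it.
    if run_tokens > budget.per_run_tokens or total > budget.monthly_tokens:
        return "disable"
    if total >= budget.alert_fraction * budget.monthly_tokens:
        return "alert"
    return "ok"
```

The token counts would presumably come from the usage logging that the audit trail section already describes.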

plans/472/agentic-ci-plan.md:1-6 — Frontmatter could include a status field

What: The plan frontmatter has date and authors but no status indicator. Other plans in the repo (e.g., plans/427/) follow the same pattern.
Why: As the plan moves through phases, it would be useful to know at a glance whether it's "proposed", "accepted", "in-progress", or "completed" without reading the full document or checking the PR state.
Suggestion: Consider adding status: proposed to the frontmatter. This is a minor convention that could be adopted across all plans if the team finds it useful.

What Looks Good

Skill composition model is exactly right. The decision to have recipes invoke existing skills rather than duplicating review logic is the strongest design choice in the plan. It means the review-code skill remains the single source of truth, and improvements flow to both interactive and CI usage automatically. The clear separation — "recipes own when/how, skills own what" — is clean and maintainable.
Security section is unusually thorough for a plan document. The prompt injection surface analysis with per-input-type risk/mitigation, the explicit pull_request vs pull_request_target guidance, and the YOLO mode hardening constraints show real thought about the threat model. This will serve as a solid reference when implementing the workflows.
The rotation rationale is well-argued. Rather than just stating "one suite per day," the plan explains why (noise management, attention budget) and then lists four concrete alternatives with trade-offs for each. This makes it easy for the team to revisit the decision later with full context.

Verdict

Needs changes: the memory storage approach and the docs-skip behavior are worth resolving before merge. Neither requires major restructuring; they're refinements to an already solid plan.

johnnygreco · 2026-03-31T16:56:00Z

plans/472/agentic-ci-plan.md

+| Wed       | structure          | import boundaries, circular deps, dead exports        |
+| Thu       | code-quality       | complexity, exception hygiene, type gaps, TODO aging  |
+| Fri       | test-health        | coverage deltas, hollow tests, fixtures, smoke tests  |
+| Sat/Sun   | off                | -                                                     |


Weekends seem like a good time for agents to be busy. Something to think about as we evolve this.

johnnygreco · 2026-03-31T16:57:17Z

plans/472/agentic-ci-plan.md

+Keeps the dependency graph healthy and secure.
+
+- **Version pinning audit**: compare pinned versions in all three `pyproject.toml`
+  files against latest available. Prefer strict pins (`==`) over loose (`>=`) for


Strict pins are tricky, since we also want to balance them against UX/DX, though the recent litellm issue should give us pause.
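
The first half of such an audit, separating strict pins from loose ones, can be sketched without any network access. The regex below handles only simple `name<op>version` strings; extras, environment markers, and wildcard versions are out of scope, and `classify` is a hypothetical name. Comparing against the latest available release is not shown.

```python
import re

# Matches "name<op>version" for the simple cases only.
REQ = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*(==|>=|<=|~=|!=|>|<)?\s*([^\s;,]*)")


def classify(requirement: str) -> tuple:
    """Return (package, kind) where kind is 'strict', 'loose', or 'unpinned'."""
    m = REQ.match(requirement)
    if m is None:
        raise ValueError(f"unrecognized requirement: {requirement!r}")
    name, op, _version = m.groups()
    if op is None:
        return name, "unpinned"
    # Only an exact "==" counts as a strict pin; every other operator is loose.
    return name, "strict" if op == "==" else "loose"
```

A report built on this could list loose and unpinned dependencies per `pyproject.toml` and leave the strict-versus-loose policy decision to the team.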

johnnygreco · 2026-03-31T17:02:44Z

plans/472/agentic-ci-plan.md

+
+---
+
+### Suite Details


The proposed suites cover code and docs, which is great. Perhaps we should also create a suite for repo maintenance like analyzing open issues and PRs and creating a report for us to review each week. Eventually, we'll add issue solving to the list too.

docs: add agentic CI plan for automated PR reviews and daily maintenance

ac4909c

Closes #472

eric-tramel previously approved these changes Mar 30, 2026

docs: add API configuration and auth modes to agentic CI plan

10f609f

andreatgretel dismissed eric-tramel’s stale review via 10f609f March 30, 2026 16:07

docs: add PoC lessons and operational details to agentic CI plan

df859fb

andreatgretel added 4 commits March 30, 2026 19:47

docs: add runner label targeting to agentic CI plan

f4d75d8

docs: add re-review label and workflow_dispatch triggers to PR review

d524686

docs: rename runner label to agentic-ci

83bd765

docs: add check run as gate for PR review, output stays as comment

5690420

johnnygreco reviewed Mar 31, 2026

andreatgretel wants to merge 7 commits into main from andreatgretel/feat/agentic-ci


4 participants

