Skip to content

feat(db): zero-downtime migration safety lint + db-migrate skill#5041

Merged
TheodoreSpeaks merged 7 commits into
stagingfrom
feat/db-migrate-skill
Jun 16, 2026
Merged

feat(db): zero-downtime migration safety lint + db-migrate skill#5041
TheodoreSpeaks merged 7 commits into
stagingfrom
feat/db-migrate-skill

Conversation

@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator

Summary

  • Add check:migrations lint (scripts/check-migrations-safety.ts) — a CI gate that scans newly-added migrations for backward-incompatible changes that break the previously-deployed app during the blue/green window. Tiers: hard errors (rewrite), annotate-to-acknowledge contract ops (-- migration-safe: <reason>), and backfill warnings. Scoped to new migrations via git-diff; existing corpus grandfathered.
  • Add /db-migrate skill — the judgment half: expand/contract phasing, app-code cross-reference, and authoring the annotation the lint requires.
  • Wire check:migrations into test-build.yml, and make /ship run /cleanup + the migration-safety review.

Type of Change

  • New feature (tooling / CI)

Testing

  • 20/20 unit tests (bun test scripts/check-migrations-safety.test.ts)
  • Validated against the 0236 org-scope migration (PR improvement(permissions): permission groups scoped to organization level #5035): lint flags 15 blocking issues incl. the DROP COLUMN workspace_id that caused the SlackTrigger query failure; a context-free agent following the skill independently reached NOT SAFE + the correct expand/contract split
  • bun run check:api-validation:strict passed; biome clean

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

TheodoreSpeaks and others added 3 commits June 13, 2026 18:56
Add scripts/check-migrations-safety.ts (check:migrations), a CI gate that
classifies statements in newly-added migrations into hard errors (rewrite),
annotate-to-acknowledge contract ops (`-- migration-safe: <reason>`), and
backfill warnings. Wire it into test-build.yml. Add the /db-migrate skill as
the judgment half (expand/contract phasing, app-code cross-ref, annotation
authoring).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 14, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
docs Ready Ready Preview, Comment Jun 15, 2026 11:43pm

Request Review

@cursor

cursor Bot commented Jun 14, 2026

Copy link
Copy Markdown

PR Summary

Medium Risk
Changes how every new DB migration is reviewed and merged in CI; mis-annotated contract drops could still ship, but the gate mainly blocks risky shapes and requires explicit acknowledgment for destructive ops.

Overview
Adds a zero-downtime migration gate for new Drizzle SQL under packages/db/migrations/: bun run check:migrations runs scripts/check-migrations-safety.ts, which git-scopes to added/changed migrations and classifies statements as hard errors (e.g. renames, non-concurrent indexes, NOT NULL without default), annotate-required contract ops (DROP COLUMN / DROP TABLE need -- migration-safe: <reason>), or non-blocking backfill warnings.

CI and workflow: test-build.yml runs the check against the PR base; root package.json exposes check:migrations. The new /db-migrate agent skill documents expand/contract, contract-pending markers in schema.ts, and when annotations are allowed; /ship runs /cleanup for UI diffs and /db-migrate + check:migrations when schema or migrations change.

Reviewed by Cursor Bugbot for commit 4c3f3fa. Bugbot is set up for automated code reviews on this repo. Configure here.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 0b278d3. Configure here.

Comment thread scripts/check-migrations-safety.ts
Comment thread scripts/check-migrations-safety.ts
Comment thread scripts/check-migrations-safety.ts
@greptile-apps

greptile-apps Bot commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR introduces a zero-downtime migration safety linter (scripts/check-migrations-safety.ts) with a companion /db-migrate skill, wires the linter into CI via test-build.yml, and updates the /ship skill to run the migration review before pushing. It also ships fixes for several issues found during review: DROP INDEX is now a hard error requiring CONCURRENTLY IF EXISTS, RENAME CONSTRAINT is correctly excluded from the rename check, and the ALTER COLUMN TYPE regex no longer false-positives on string defaults containing the word TYPE.

  • Linter core (check-migrations-safety.ts): classifies SQL statements into hard errors (rewrite required), annotate-tier ops (require -- migration-safe: <reason>), and non-blocking backfill warnings; scoped to git-new migration files so the existing corpus is grandfathered.
  • DROP INDEX hardening: plain DROP INDEX is now a hard error, and DROP INDEX CONCURRENTLY without IF EXISTS is also a hard error (concurrent-drop-index-not-idempotent), symmetric with the CREATE INDEX CONCURRENTLY idempotency rule.
  • 28-test suite covers all rule tiers, annotation parsing, parser robustness (dollar-quoting, semicolons in literals), and the new DROP INDEX regression cases.

Confidence Score: 4/5

Safe to merge with one linter gap to address: ADD CONSTRAINT UNIQUE/PRIMARY KEY on existing tables currently passes CI without any check, despite taking an ACCESS EXCLUSIVE lock identical to the FK case the linter does catch.

The linter logic is solid and the new DROP INDEX CONCURRENTLY hardening is correctly implemented. The one remaining gap — ADD CONSTRAINT UNIQUE and PRIMARY KEY on existing tables not being flagged — means a real category of lock-downtime migrations can land undetected. It is an omission in a new tool, not a regression in production code, but it leaves a hole in the safety net the PR is explicitly building.

scripts/check-migrations-safety.ts — the constraint-not-valid guard needs to cover UNIQUE and PRIMARY KEY with a dedicated hard-error message pointing to the CONCURRENTLY index + USING INDEX path.

Important Files Changed

Filename Overview
scripts/check-migrations-safety.ts New migration safety linter; correctly handles DROP INDEX CONCURRENTLY idempotency, RENAME CONSTRAINT, alter-type string false-positives, and annotate-tier flows, but ADD CONSTRAINT UNIQUE/PRIMARY KEY on existing tables passes without any check — same ACCESS EXCLUSIVE lock risk as the FK case it does catch.
scripts/check-migrations-safety.test.ts 28-test suite covers additive ops, all hard-error rules (including new DROP INDEX CONCURRENTLY + IF EXISTS regression tests), annotate-tier annotation parsing, parser robustness (dollar-quoting, semicolons in strings). No test for ADD CONSTRAINT UNIQUE on an existing table.
.github/workflows/test-build.yml Adds migration safety audit step wired for PR events (using origin/base_ref with a shallow fetch) and push events (HEAD~1); the push-event path silently passes on shallow checkouts (already flagged in a previous thread).
.agents/skills/db-migrate/SKILL.md New skill document covering expand/contract phasing, per-operation playbook, and annotation authoring guidance; aligns accurately with the lint rules and migrate.ts breakpoint conventions.
.agents/skills/ship/SKILL.md Updated /ship skill to run /cleanup then /db-migrate (when migrations touched) before pre-ship checks; step numbering renumbered cleanly.
package.json Adds check:migrations script entry pointing to the new lint script; no other changes.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[New migration .sql files\nvia git diff vs base] --> B[parseStatements\nrespects strings / dollar-quotes]
    B --> C{classify statement}
    C -->|CREATE INDEX\nno CONCURRENTLY| E1[❌ index-not-concurrent]
    C -->|CONCURRENTLY\nno IF NOT EXISTS| E2[❌ concurrent-index-not-idempotent]
    C -->|CONCURRENTLY\nno COMMIT breakpoint| E3[❌ concurrent-index-no-commit]
    C -->|ADD COLUMN NOT NULL\nno DEFAULT| E4[❌ add-not-null-no-default]
    C -->|RENAME COLUMN/TABLE| E5[❌ rename]
    C -->|ADD CONSTRAINT FK/CHECK\nno NOT VALID| E6[❌ constraint-not-valid]
    C -->|DROP INDEX\nno CONCURRENTLY| E7[❌ drop-index-not-concurrent]
    C -->|DROP INDEX CONCURRENTLY\nno IF EXISTS| E8[❌ concurrent-drop-index-not-idempotent]
    C -->|DROP TABLE/COLUMN\nDROP DEFAULT, SET NOT NULL\nalter-type, drop-constraint| ANN[annotate-tier op]
    C -->|UPDATE / DELETE| W[⚠ data-backfill warning]
    ANN --> GA{preceding line has\nmigration-safe annotation?}
    GA -->|yes, with reason| OK[✓ pass]
    GA -->|annotation empty| E9[❌ no reason]
    GA -->|missing| E10[❌ unannotated contract op]
    C -->|ADD CONSTRAINT UNIQUE\nPRIMARY KEY| MISS[⚠️ NOT CHECKED — gap]
    style MISS fill:#f99,stroke:#c00
    style E1 fill:#fdd
    style E2 fill:#fdd
    style E3 fill:#fdd
    style E4 fill:#fdd
    style E5 fill:#fdd
    style E6 fill:#fdd
    style E7 fill:#fdd
    style E8 fill:#fdd
    style E9 fill:#fdd
    style E10 fill:#fdd
Loading

Comments Outside Diff (1)

  1. scripts/check-migrations-safety.ts, line 616-628 (link)

    P1 ADD CONSTRAINT UNIQUE and ADD CONSTRAINT PRIMARY KEY escape the lock check

    The constraint-not-valid guard filters on /\b(FOREIGN KEY|CHECK)\b/i, so ALTER TABLE "user" ADD CONSTRAINT "user_email_unique" UNIQUE("email") passes silently. Postgres builds a unique index and validates it in a single ACCESS EXCLUSIVE lock hold — identical downtime risk to an unguarded FK. Drizzle generates exactly this form for any unique() column modifier added to an existing table.

    NOT VALID isn't valid syntax for UNIQUE or PRIMARY KEY, so the correct zero-downtime path is different and needs its own hard error message: "Create a unique index first with CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS, then bind it with ADD CONSTRAINT … UNIQUE USING INDEX." Without an explicit check these two forms silently pass CI today.

Reviews (4): Last reviewed commit: "fix(db): enforce IF EXISTS on DROP INDEX..." | Re-trigger Greptile

Comment thread scripts/check-migrations-safety.ts Outdated
Comment thread scripts/check-migrations-safety.ts
Comment thread scripts/check-migrations-safety.ts Outdated
Comment thread scripts/check-migrations-safety.ts Outdated
@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator Author

@greptile review

… false-positive, alter-type literal match

- Non-concurrent DROP INDEX is now a hard error (ACCESS EXCLUSIVE lock), symmetric with CREATE INDEX; DROP INDEX CONCURRENTLY after a COMMIT passes clean. Removes the false-confidence annotate path.
- RENAME rule narrowed to RENAME COLUMN / table RENAME TO; RENAME CONSTRAINT and ALTER INDEX ... RENAME (metadata-only) no longer flagged.
- alter-type regex now requires TYPE to follow the column identifier, so it no longer matches TYPE inside a string default.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator Author

Thanks @greptile-apps — all three classifier findings were valid and are fixed in 84093a2:

  1. DROP INDEX lock symmetry — plain DROP INDEX is now a hard error (drop-index-not-concurrent), since it takes the same ACCESS EXCLUSIVE table lock as CREATE INDEX. DROP INDEX CONCURRENTLY after a COMMIT; breakpoint passes clean; without a preceding COMMIT it errors (can't run in a txn). The annotate-past-the-gate path is gone.
  2. RENAME CONSTRAINT false-positive — the rename rule is narrowed to RENAME COLUMN and table RENAME TO. RENAME CONSTRAINT and ALTER INDEX … RENAME (metadata-only, don't break running code) are no longer flagged.
  3. alter-type literal match — the regex now requires TYPE to immediately follow the column identifier (SET DATA TYPE / ALTER COLUMN <col> TYPE), so SET DEFAULT 'has TYPE word' no longer false-matches.

Added regression tests for each (27 tests pass). Skill playbook updated with the DROP INDEX CONCURRENTLY guidance.

@greptile review

Comment thread .github/workflows/test-build.yml
…potency

Symmetric with the CREATE INDEX CONCURRENTLY rule: a post-COMMIT DROP INDEX
CONCURRENTLY replays from the top on failure, so without IF EXISTS it aborts
re-dropping an already-gone index.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator Author

Fixed in be6d0bcDROP INDEX CONCURRENTLY now requires IF EXISTS (concurrent-drop-index-not-idempotent), symmetric with the CREATE INDEX CONCURRENTLY idempotency rule and the post-COMMIT replay rationale in migrate.ts. Added a regression test (28 pass).

@greptile review

…ion base to staging

- /ship runs /cleanup only when the diff touches UI code (.tsx or apps/sim/components|hooks|stores); the six passes are React-only.
- /ship runs check:migrations against origin/staging (the PR base).
- check:migrations default baseRef is now origin/staging instead of origin/main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rops

Establishes a durable, greppable marker (`contract-pending(<precondition>): ...`)
left on the legacy column in schema.ts when an expand defers a drop, so the
contract phase doesn't rot. The outstanding-work list is `grep -rn contract-pending`;
the contract PR's `-- migration-safe:` annotation references the expand and deletes
the marker.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@TheodoreSpeaks TheodoreSpeaks merged commit f277f5f into staging Jun 16, 2026
15 checks passed
@TheodoreSpeaks TheodoreSpeaks deleted the feat/db-migrate-skill branch June 16, 2026 00:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant