feat(db): zero-downtime migration safety lint + db-migrate skill #5041
Merged
+802
−6
7 commits
- 7a3ba6a feat(db): zero-downtime migration safety lint + db-migrate skill (TheodoreSpeaks)
- 47b9894 Merge remote-tracking branch 'origin/staging' into feat/db-migrate-skill (TheodoreSpeaks)
- 0b278d3 feat(skills): run cleanup and db-migrate safety checks in /ship (TheodoreSpeaks)
- 84093a2 fix(db): address review — DROP INDEX lock symmetry, RENAME CONSTRAINT… (TheodoreSpeaks)
- be6d0bc fix(db): enforce IF EXISTS on DROP INDEX CONCURRENTLY for replay idem… (TheodoreSpeaks)
- a986f58 improvement(skills): gate /ship cleanup on UI changes; default migrat… (TheodoreSpeaks)
- 4c3f3fa docs(db-migrate): add contract-pending TODO convention for deferred d… (TheodoreSpeaks)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,89 @@ | ||
| --- | ||
| name: db-migrate | ||
| description: Author or review a Drizzle DB migration for zero-downtime safety — expand/contract phasing, backward-compatibility with the deployed app version, and writing the `-- migration-safe` acknowledgment the check:migrations lint requires. Use when adding/editing files under `packages/db/migrations/` or changing `packages/db/schema.ts`. | ||
| --- | ||
|
|
||
| # DB Migrate Skill | ||
|
|
||
| You make schema changes that survive a deploy without downtime. The `check:migrations` lint (`scripts/check-migrations-safety.ts`) is the deterministic gate; you are the judgment that decides whether a flagged change is actually safe and writes the annotation that satisfies it. | ||
|
|
||
| ## The window (why this matters) | ||
|
|
||
| A deploy runs the migration, then rolls out the new app image via blue/green. The two are **not atomic and cannot be** — during cutover the old task set keeps serving against the **already-migrated** schema. So: | ||
|
|
||
| > Every migration must be backward-compatible with the app version that is *already deployed*. | ||
|
|
||
| If a migration drops a column the old code still reads, renames one, or adds a `NOT NULL` the old inserts don't populate, the old code throws until traffic fully shifts — the downtime we're guarding against. You can't fix this by reordering the pipeline; the only fix is discipline. | ||
|
|
||
| ## Expand / contract | ||
|
|
||
| Split every breaking change across **two deploys**: | ||
|
|
||
| 1. **Expand** (this PR): additive, backward-compatible schema + code that tolerates *both* the old and new shape. | ||
| 2. **Contract** (a later PR, after expand is fully deployed): remove the old thing, now that nothing reads it. | ||
|
|
||
| Never put expand and contract in the same PR. If this PR both removes the code that used a column *and* drops the column, the old code is still live during cutover — split it. | ||
|
|
||
| ### Per-operation playbook | ||
|
|
||
| | You want to | Do (deploy 1 = expand) | Do (deploy 2 = contract) | | ||
| |---|---|---| | ||
| | Add a required column | `ADD COLUMN` nullable or `DEFAULT`; code writes it | backfill, then `SET NOT NULL` | | ||
| | Rename a column/table | add the new name; code dual-writes / reads new-then-old | drop the old name | | ||
| | Drop a column/table | stop all reads/writes in code; ship it | `DROP` (annotate) | | ||
| | Change a column type | add a new column of the new type; dual-write | backfill, swap reads, drop old | | ||
| | Add FK / CHECK | `ADD CONSTRAINT ... NOT VALID` | `VALIDATE CONSTRAINT` separately | | ||
| | Index an existing table | `COMMIT;` breakpoint → `SET lock_timeout = 0` → `CREATE INDEX CONCURRENTLY IF NOT EXISTS` (see `packages/db/scripts/migrate.ts`) | — | | ||
| | Drop an index | `COMMIT;` breakpoint → `DROP INDEX CONCURRENTLY` — plain `DROP INDEX` takes ACCESS EXCLUSIVE on the table | — | | ||
| | Backfill data | batched + idempotent `UPDATE` (keyset/`WHERE`, bounded) | — | | ||
|
|
||
| A `CREATE INDEX`, `ADD COLUMN`, or `ADD CONSTRAINT` against a table **created in the same migration** is always safe (no rows, no live traffic) — the lint already suppresses those. | ||
|
|
||
| ## Tracking the contract (don't let it rot) | ||
|
|
||
| The contract half is deferred to a later deploy — and that is exactly when it gets forgotten, leaving dead columns, orphaned tables, and `NOT NULL`s that never land. Every deferred contract must become a durable, greppable TODO. | ||
|
|
||
| When an expand defers a drop, leave a **`contract-pending`** marker on the legacy column/table in `packages/db/schema.ts` — that is the file you will be editing when you finally do the drop, so the reminder lives where the work happens: | ||
|
|
||
| ```ts | ||
| // contract-pending(after #5035 is fully deployed): drop once permission-check.ts stops reading it | ||
| workspaceId: text('workspace_id'), | ||
| ``` | ||
|
|
||
| Format: `contract-pending(<precondition>): <what to drop> — <why it's safe once the precondition holds>`. The precondition names the PR/release that removes the last reader and **must be fully deployed** before the contract ships. | ||
|
|
||
| - **The TODO list is a grep** — always accurate, never drifts: `grep -rn "contract-pending" packages/db apps/sim`. Run it when starting migration work to see what is owed. | ||
| - For anything with a real owner or schedule, also open a tracking issue and put its number in the marker. | ||
| - **Close the loop in the contract PR:** the contract migration's `-- migration-safe:` annotation references the expand, and you **delete the `contract-pending` marker** in the same PR: | ||
| ```sql | ||
| -- migration-safe: contract of #5035 — workspace_id readers removed there, deployed 2026-06-10 | ||
| ALTER TABLE "permission_group" DROP COLUMN "workspace_id"; | ||
| ``` | ||
| - An expand merged **without** a marker for the drop it defers, or a contract merged **without** removing its marker, is a bug — flag it in review. | ||
|
|
||
| ## The judgment the lint can't do | ||
|
|
||
| The lint flags risky *shapes*; it cannot know whether a given drop is *safe right now*. For each flagged statement, do the work it can't: | ||
|
|
||
| 1. **Is the dependency gone?** Grep the app for the table/column: search `apps/sim` and `packages` for the column name, the Drizzle field (camelCase), and the table object. If any live read/write remains, it is **not** safe — fix the code first. | ||
| 2. **Did the expand already ship?** The removal of that read/write must be in a deploy that is *already out*, not this same PR. If it's in this PR, split: land the code change now, do the destructive migration in a follow-up after it deploys. | ||
| 3. **Backfills:** confirm the `UPDATE`/`DELETE` is batched (bounded `WHERE`/keyset, not a single whole-table statement), idempotent (safe to replay — a failed migration re-runs unjournaled files from the top), and safe under concurrent writes from the still-live old app. | ||
|
|
||
| ## Workflow | ||
|
|
||
| 1. Edit `packages/db/schema.ts`, then `cd packages/db && bunx drizzle-kit generate` to produce the SQL. If this is an expand that defers a drop, leave a `contract-pending` marker on the legacy column (see "Tracking the contract"). If this is the contract, delete the marker it resolves. | ||
| 2. Hand-edit the generated SQL where the playbook requires it: `CONCURRENTLY` + `COMMIT;` breakpoint for indexes on existing tables, `NOT VALID` for constraints, batching for backfills. | ||
| 3. Run `bun run check:migrations` (base defaults to `origin/staging`). | ||
| - **Hard errors** (`add-not-null-no-default`, `rename`, `index-not-concurrent`, `constraint-not-valid`, …): rewrite into expand/contract. Do **not** try to annotate them away — the lint won't accept it. | ||
| - **Annotate tier** (`drop-table`, `drop-column`, `drop-default`, `set-not-null`, `alter-type`, `drop-index`): only after you've confirmed checks 1–3 in "The judgment the lint can't do", add a comment on the line directly above the statement: | ||
| ```sql | ||
| -- migration-safe: `secret` read removed in v0.6.1 (#1234), shipped two deploys ago | ||
| ALTER TABLE "webhook" DROP COLUMN "secret"; | ||
| ``` | ||
| The reason must be specific and name the PR/version that removed the dependency. An empty reason fails the lint. | ||
| - **Warnings** (`data-backfill`): non-blocking, but confirm the batching/idempotency before merging. | ||
| 4. Verify locally: `cd packages/db && bun run db:migrate` against a dev DB. | ||
|
|
||
| ## Hard rule | ||
|
|
||
| Never annotate a destructive statement just to make the lint pass. The annotation is a claim that you verified the old code no longer depends on it. If you can't make that claim truthfully, the change belongs in a later deploy — tell the user to split it. |
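
The `contract-pending` convention described in the skill is meant to be found with a plain grep. As an illustration only, the same lookup can be sketched in TypeScript so the precondition and the note come back as structured fields. The function name `findContractPending`, the `ContractPending` type, and the regular expression are hypothetical and are not part of this PR; the sketch also assumes the precondition contains no closing parenthesis.

```typescript
// Hypothetical helper, not part of the PR: lists `contract-pending` markers
// found in a source file's text, mirroring what the grep would surface.
interface ContractPending {
  line: number // 1-based line number within the text
  precondition: string // text inside the parentheses
  note: string // everything after the colon
}

// Assumed marker shape: contract-pending(<precondition>): <note>
const MARKER = /contract-pending\(([^)]*)\):\s*(.*)$/

function findContractPending(source: string): ContractPending[] {
  const found: ContractPending[] = []
  source.split('\n').forEach((text, index) => {
    const match = MARKER.exec(text)
    if (match) {
      found.push({
        line: index + 1,
        precondition: match[1].trim(),
        note: match[2].trim(),
      })
    }
  })
  return found
}
```

Run against the text of `packages/db/schema.ts`, a helper like this would return one entry per marker, which could back a review checklist. The grep in the skill remains the simpler tool for day-to-day use.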
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,200 @@ | ||
| /** | ||
| * Run with: bun test scripts/check-migrations-safety.test.ts | ||
| * (Root scripts are bun-native and not part of the turbo/vitest workspaces.) | ||
| */ | ||
| import { describe, expect, test } from 'bun:test' | ||
| import { lintSql } from './check-migrations-safety.ts' | ||
|
|
||
| const rules = (sql: string) => lintSql(sql).map((f) => `${f.tier}:${f.rule}`) | ||
|
|
||
| describe('additive / safe', () => { | ||
| test('nullable add column passes', () => { | ||
| expect(lintSql('ALTER TABLE "webhook" ADD COLUMN "provider_config" json;')).toEqual([]) | ||
| }) | ||
|
|
||
| test('NOT NULL with DEFAULT passes', () => { | ||
| expect(lintSql('ALTER TABLE "user" ADD COLUMN "flag" boolean DEFAULT false NOT NULL;')).toEqual( | ||
| [] | ||
| ) | ||
| }) | ||
|
|
||
| test('CREATE TABLE plus index and FK on that new table passes', () => { | ||
| const sql = `CREATE TABLE "kb" ("id" text PRIMARY KEY NOT NULL, "user_id" text NOT NULL); | ||
| --> statement-breakpoint | ||
| CREATE INDEX "kb_user_id_idx" ON "kb" USING btree ("user_id"); | ||
| --> statement-breakpoint | ||
| ALTER TABLE "kb" ADD CONSTRAINT "kb_user_fk" FOREIGN KEY ("user_id") REFERENCES "user"("id");` | ||
| expect(lintSql(sql)).toEqual([]) | ||
| }) | ||
|
|
||
| test('CONCURRENTLY index after a COMMIT breakpoint passes', () => { | ||
| const sql = `COMMIT; | ||
| --> statement-breakpoint | ||
| SET lock_timeout = 0; | ||
| --> statement-breakpoint | ||
| CREATE INDEX CONCURRENTLY IF NOT EXISTS "idx_x" ON "embedding" ("kb_id");` | ||
| expect(lintSql(sql)).toEqual([]) | ||
| }) | ||
| }) | ||
|
|
||
| describe('hard errors', () => { | ||
| test('ADD COLUMN NOT NULL without default', () => { | ||
| expect(rules('ALTER TABLE "user" ADD COLUMN "email" text NOT NULL;')).toEqual([ | ||
| 'error:add-not-null-no-default', | ||
| ]) | ||
| }) | ||
|
|
||
| test('RENAME column', () => { | ||
| expect(rules('ALTER TABLE "marketplace" RENAME COLUMN "executions" TO "views";')).toEqual([ | ||
| 'error:rename', | ||
| ]) | ||
| }) | ||
|
|
||
| test('CREATE INDEX on existing table without CONCURRENTLY', () => { | ||
| expect(rules('CREATE INDEX "idx_y" ON "embedding" ("kb_id");')).toEqual([ | ||
| 'error:index-not-concurrent', | ||
| ]) | ||
| }) | ||
|
|
||
| test('CONCURRENTLY index without IF NOT EXISTS', () => { | ||
| const sql = `COMMIT; | ||
| --> statement-breakpoint | ||
| CREATE INDEX CONCURRENTLY "idx_z" ON "embedding" ("kb_id");` | ||
| expect(rules(sql)).toEqual(['error:concurrent-index-not-idempotent']) | ||
| }) | ||
|
|
||
| test('CONCURRENTLY index without a preceding COMMIT', () => { | ||
| expect( | ||
| rules('CREATE INDEX CONCURRENTLY IF NOT EXISTS "idx_z" ON "embedding" ("kb_id");') | ||
| ).toEqual(['error:concurrent-index-no-commit']) | ||
| }) | ||
|
|
||
| test('ADD FOREIGN KEY on existing table without NOT VALID', () => { | ||
| expect( | ||
| rules( | ||
| 'ALTER TABLE "session" ADD CONSTRAINT "s_fk" FOREIGN KEY ("uid") REFERENCES "user"("id");' | ||
| ) | ||
| ).toEqual(['error:constraint-not-valid']) | ||
| }) | ||
| }) | ||
|
|
||
| describe('annotate tier', () => { | ||
| const drop = 'ALTER TABLE "webhook" DROP COLUMN "secret";' | ||
|
|
||
| test('DROP COLUMN unannotated fails', () => { | ||
| expect(rules(drop)).toEqual(['error:drop-column']) | ||
| }) | ||
|
|
||
| test('DROP COLUMN annotated passes', () => { | ||
| const sql = `-- migration-safe: secret read removed in v0.6.1 (#1234), shipped two deploys ago\n${drop}` | ||
| expect(lintSql(sql)).toEqual([]) | ||
| }) | ||
|
|
||
| test('annotation tolerates an intervening statement-breakpoint line', () => { | ||
| const sql = `ALTER TABLE "webhook" ADD COLUMN "provider_config" json; | ||
| --> statement-breakpoint | ||
| -- migration-safe: secret read removed in v0.6.1 (#1234) | ||
| ${drop}` | ||
| expect(lintSql(sql)).toEqual([]) | ||
| }) | ||
|
|
||
| test('dangling annotation with empty reason fails', () => { | ||
| const sql = `-- migration-safe:\n${drop}` | ||
| const found = lintSql(sql) | ||
| expect(found).toHaveLength(1) | ||
| expect(found[0].tier).toBe('error') | ||
| expect(found[0].message).toContain('no reason') | ||
| }) | ||
|
|
||
| test('annotation on the wrong statement does not bleed', () => { | ||
| const sql = `-- migration-safe: removing secret | ||
| ALTER TABLE "webhook" ADD COLUMN "x" json; | ||
| --> statement-breakpoint | ||
| ${drop}` | ||
| expect(rules(sql)).toEqual(['error:drop-column']) | ||
| }) | ||
|
|
||
| test('type change and DROP TABLE are annotate-tier', () => { | ||
| expect( | ||
| rules( | ||
| 'ALTER TABLE "user_table_rows" ALTER COLUMN "order_key" SET DATA TYPE text COLLATE "C";' | ||
| ) | ||
| ).toEqual(['error:alter-type']) | ||
| expect(rules('DROP TABLE "marketplace_execution" CASCADE;')).toEqual(['error:drop-table']) | ||
| }) | ||
| }) | ||
|
|
||
| describe('warnings (non-blocking)', () => { | ||
| test('UPDATE backfill warns but does not error', () => { | ||
| const found = lintSql('UPDATE "user_table_definitions" SET "schema" = \'{}\' WHERE id = \'1\';') | ||
| expect(found.map((f) => f.tier)).toEqual(['warn']) | ||
| }) | ||
|
|
||
| test('UPDATE without WHERE flags the whole-table note', () => { | ||
| const found = lintSql('UPDATE "user" SET "active" = true;') | ||
| expect(found[0].tier).toBe('warn') | ||
| expect(found[0].message).toContain('no WHERE') | ||
| }) | ||
| }) | ||
|
|
||
| describe('review fixes', () => { | ||
| test('RENAME CONSTRAINT is metadata-only — not flagged', () => { | ||
| expect( | ||
| lintSql('ALTER TABLE "permission_group" RENAME CONSTRAINT "old_fk" TO "new_fk";') | ||
| ).toEqual([]) | ||
| }) | ||
|
|
||
| test('ALTER INDEX ... RENAME is metadata-only — not flagged', () => { | ||
| expect(lintSql('ALTER INDEX "old_idx" RENAME TO "new_idx";')).toEqual([]) | ||
| }) | ||
|
|
||
| test('table RENAME TO is still a hard error', () => { | ||
| expect(rules('ALTER TABLE "marketplace" RENAME TO "listings";')).toEqual(['error:rename']) | ||
| }) | ||
|
|
||
| test('plain DROP INDEX is a hard error (ACCESS EXCLUSIVE lock)', () => { | ||
| expect(rules('DROP INDEX "permission_group_workspace_name_unique";')).toEqual([ | ||
| 'error:drop-index-not-concurrent', | ||
| ]) | ||
| }) | ||
|
|
||
| test('DROP INDEX CONCURRENTLY after a COMMIT passes clean', () => { | ||
| const sql = `COMMIT; | ||
| --> statement-breakpoint | ||
| DROP INDEX CONCURRENTLY IF EXISTS "stale_idx";` | ||
| expect(lintSql(sql)).toEqual([]) | ||
| }) | ||
|
|
||
| test('DROP INDEX CONCURRENTLY without IF EXISTS is not idempotent', () => { | ||
| const sql = `COMMIT; | ||
| --> statement-breakpoint | ||
| DROP INDEX CONCURRENTLY "stale_idx";` | ||
| expect(rules(sql)).toEqual(['error:concurrent-drop-index-not-idempotent']) | ||
| }) | ||
|
|
||
| test('DROP INDEX CONCURRENTLY without a preceding COMMIT errors', () => { | ||
| expect(rules('DROP INDEX CONCURRENTLY IF EXISTS "stale_idx";')).toEqual([ | ||
| 'error:concurrent-drop-index-no-commit', | ||
| ]) | ||
| }) | ||
|
|
||
| test('alter-type does not match TYPE inside a string default', () => { | ||
| expect(lintSql(`ALTER TABLE "x" ALTER COLUMN "y" SET DEFAULT 'change TYPE later';`)).toEqual([]) | ||
| }) | ||
| }) | ||
|
|
||
| describe('parser robustness', () => { | ||
| test('semicolon inside a string literal does not split', () => { | ||
| expect(lintSql(`ALTER TABLE "x" ADD COLUMN "y" text DEFAULT 'a;b' NOT NULL;`)).toEqual([]) | ||
| }) | ||
|
|
||
| test('dollar-quoted DO block is one statement; FK on a new table is suppressed', () => { | ||
| const sql = `CREATE TABLE "jobs" ("id" text PRIMARY KEY NOT NULL, "wid" text NOT NULL); | ||
| --> statement-breakpoint | ||
| DO $$ BEGIN | ||
| ALTER TABLE "jobs" ADD CONSTRAINT "jobs_fk" FOREIGN KEY ("wid") REFERENCES "workspace"("id"); | ||
| EXCEPTION WHEN duplicate_object THEN null; | ||
| END $$;` | ||
| expect(lintSql(sql)).toEqual([]) | ||
| }) | ||
| }) |
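
The "parser robustness" tests above require that a semicolon inside a string literal does not end a statement and that a dollar-quoted `DO` block is treated as one statement. The PR's actual splitter is not shown in this diff, so the following is only a minimal sketch of one way to get that behavior. The name `splitStatements` is hypothetical, and the sketch deliberately ignores `--` comments and double-quoted identifiers.

```typescript
// Hypothetical sketch, not the PR's parser: split SQL on top-level semicolons,
// treating '...' literals and $tag$...$tag$ bodies as opaque text.
function splitStatements(sql: string): string[] {
  const statements: string[] = []
  let current = ''
  let i = 0
  while (i < sql.length) {
    const ch = sql[i]
    if (ch === "'") {
      // Single-quoted literal; a doubled '' is an escaped quote inside it.
      let j = i + 1
      while (j < sql.length) {
        if (sql[j] === "'") {
          if (sql[j + 1] === "'") {
            j += 2
            continue
          }
          break
        }
        j++
      }
      current += sql.slice(i, j + 1)
      i = j + 1
      continue
    }
    if (ch === '$') {
      // Dollar-quote opener such as $$ or $body$; copy through the matching closer.
      const tag = /^\$(?:[A-Za-z_][A-Za-z0-9_]*)?\$/.exec(sql.slice(i))
      if (tag) {
        const end = sql.indexOf(tag[0], i + tag[0].length)
        const stop = end === -1 ? sql.length : end + tag[0].length
        current += sql.slice(i, stop)
        i = stop
        continue
      }
    }
    if (ch === ';') {
      // Top-level semicolon: close the current statement.
      if (current.trim()) statements.push(current.trim())
      current = ''
      i++
      continue
    }
    current += ch
    i++
  }
  if (current.trim()) statements.push(current.trim())
  return statements
}
```

Statements are returned without their trailing semicolon. Because comments are not handled, a `--> statement-breakpoint` line stays attached to the statement that follows it; a real lint would need to strip or track those lines separately.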