Skip to content

Commit f4cb375

Browse files
committed
docs(clickhouse): require max+1 numbering and idempotent DDL for migrations
Codify two rules that came out of the 029/030 ordering incident (TRI-9367 test cloud deploy): 1. ClickHouse migration files must be numbered max(existing)+1. Goose runs in strict mode in the deploy pipeline and refuses to apply a missing version below the current version, which blocks the deploy. Branches that fell behind main need to renumber before merging. 2. DDL must be idempotent (ADD COLUMN IF NOT EXISTS, DROP COLUMN IF EXISTS, CREATE TABLE IF NOT EXISTS, etc.) so a retry or out-of-order apply (e.g. goose up --allow-missing for local recovery) is a no-op rather than an error. Rules go in internal-packages/clickhouse/CLAUDE.md; reviewer-facing summary added to .claude/REVIEW.md as a new 🔴 finding.
1 parent eedde27 commit f4cb375

2 files changed

Lines changed: 32 additions & 3 deletions

File tree

.claude/REVIEW.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,7 @@ Reserve 🔴 for things that would page someone or block a rollback. In this cod
1111
- A Redis data shape used by both versions changes in place. New shapes need a new key namespace.
1212
- A migration is not backward-compatible with the prior image.
1313
- **Schema / migration safety.** Prisma migrations must be backward-compatible with the prior deploy. Adding NOT NULL without a default, dropping a column an old image still reads, renaming a column — all 🔴.
14+
- **ClickHouse migration ordering + idempotency.** Goose runs in strict mode in the deploy pipeline and refuses to apply a missing version below the current version — slotting a new file in below the latest already-applied version blocks the deploy. New ClickHouse migration files MUST use the next available number (`max(files in internal-packages/clickhouse/schema/) + 1`); if main has added migrations while you've been on a branch, renumber yours. DDL must also be idempotent (`ADD COLUMN IF NOT EXISTS`, `DROP COLUMN IF EXISTS`, `CREATE TABLE IF NOT EXISTS`, `ADD INDEX IF NOT EXISTS`) so a partial / `--allow-missing` apply elsewhere doesn't fail on retry. Either fault is 🔴 — both break test/prod deploys. Rules live in `internal-packages/clickhouse/CLAUDE.md`.
1415
- **Queue / concurrency correctness.** RunQueue, MarQS (V1, legacy), redis-worker — any change to enqueue / dequeue / locking semantics. Re-derive the invariant on paper before flagging or accepting.
1516
- **Missing index on a hot table.** New Prisma queries against `TaskRun`, `TaskRunExecutionSnapshot`, `JobRun`, `Project`, etc. must use an existing index. Check `internal-packages/database/prisma/schema.prisma` for the relevant `@@index` lines — don't guess and don't propose `EXPLAIN`.
1617
- **Recovery-path queries.** Any `TaskRun.findFirst` / `findMany` added to a schedule, run-recovery, or restart loop. Recovery fan-outs (Redis crash, restart storms) turn "rare indexed query" into a DB incident. 🔴 even if indexed.

internal-packages/clickhouse/CLAUDE.md

Lines changed: 31 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -4,18 +4,46 @@
44

55
## Migrations
66

7-
Goose-format SQL migrations live in `schema/`. Create new numbered files:
7+
Goose-format SQL migrations live in `schema/`. Two rules below are load-bearing — both can block a deploy.
8+
9+
### Rule 1: number to `max + 1`, never slot in
10+
11+
Goose runs in strict mode in the deploy pipeline. If a migration file numbered *below* the version currently recorded in `goose_db_version` ever shows up, goose refuses to apply it and the deploy fails:
12+
13+
```
14+
goose run: error: found 1 missing migrations before current version 30:
15+
version 29: 029_add_task_kind_to_task_runs_v2.sql
16+
```
17+
18+
When adding a migration:
19+
20+
1. Look at `schema/` and take the largest existing number, call it `N`.
21+
2. Name your file `0(N+1)_descriptive_name.sql`.
22+
3. If you've been on a branch while main added migrations, **rebase and renumber** before opening the PR — a file numbered below the new max will block the next deploy after your PR merges.
23+
24+
### Rule 2: DDL must be idempotent
25+
26+
Migrations can be applied out of order in some environments (`goose up --allow-missing` for local recovery, manual fixups, etc.) and may be retried. Always use idempotent forms so a re-apply is a no-op:
827

928
```sql
1029
-- +goose Up
1130
ALTER TABLE trigger_dev.your_table
12-
ADD COLUMN new_column String DEFAULT '';
31+
ADD COLUMN IF NOT EXISTS new_column String DEFAULT '';
1332

1433
-- +goose Down
1534
ALTER TABLE trigger_dev.your_table
16-
DROP COLUMN new_column;
35+
DROP COLUMN IF EXISTS new_column;
1736
```
1837

38+
Equivalent forms for other DDL:
39+
40+
- `CREATE TABLE IF NOT EXISTS …`
41+
- `DROP TABLE IF EXISTS …`
42+
- `ADD INDEX IF NOT EXISTS …` / `DROP INDEX IF EXISTS …`
43+
- `CREATE MATERIALIZED VIEW IF NOT EXISTS …` / `DROP VIEW IF EXISTS …`
44+
45+
ClickHouse supports `IF [NOT] EXISTS` on all of the above. Older migrations in this directory predate the rule and are not idempotent — leave them as-is unless you're explicitly hardening one.
46+
1947
## Naming Conventions
2048

2149
- `raw_` prefix for input tables (where data lands first)

0 commit comments

Comments
 (0)