
Postcodes.io 19.0.0 #1351

Open
cblanc wants to merge 66 commits into main from next_major


@cblanc commented May 22, 2026

Postcodes.io 19.0.0

A major release focused on a leaner data model, broader coverage, and faster lookups.

Breaking: Simplified, denormalised schema

Canonical tables now live under public.* (postcodes, scottish_postcodes, places, outcodes) with every code's human-readable name denormalised alongside it. Support / lookup tables are gone, the model layer is deleted, and controllers issue named prepared statements directly against public.*. Easier to query from anything that speaks SQL, simpler to reason about, and decoupled from the ingest pipeline — the database is now a pre-built pg_dump produced upstream.

Richer postcode data

  • Postcode now includes date_of_termination and IMD fields.
  • ScottishPostcode expanded to cover the full SPD shape rather than a thin subset.
  • Outcode responses read from a denormalised public.outcodes table instead of aggregating on the fly.

Welsh constituencies

Postcode lookups now return the Senedd Cymru (Welsh Parliament) constituency alongside the existing Westminster, council, and ward data. OpenAPI schema and docs updated to match.

Performance

  • Bulk postcode and reverse-geocoding endpoints batched into single SQL round-trips.
  • Outcode endpoint backed by a pre-aggregated table rather than per-request GROUP BY.
  • Named prepared statements throughout the query layer so Postgres caches plans.
  • PostGIS upgraded to 17-3.5; test container moved to debian slim.
  • Postgres tuned for a read-only workload, for throughput and tail latency: shared_buffers and work_mem raised, autovacuum and JIT off, and pg_prewarm run after load so memory-intensive queries start warm. Benchmarked against stock Postgres 17.5 settings (k6 load test, 50 VUs × 60s per script, 2.7M-postcode dataset):
    • bulk-reverse-geo p99: 2.62s → 0.99s (−62%); spatial queries are now 2.6× faster at the tail
    • single p99: 24.28ms → 20.28ms (−16%) for point lookups
    • Throughput gains are marginal (1–4%); the Node/Express tier, not Postgres, is the bottleneck for most endpoints
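The quoted percentages and the 2.6× factor can be re-derived from the raw p99 figures. A small TypeScript check of that arithmetic (the helper names are illustrative, not from the codebase):

```typescript
// Illustrative helpers, not part of postcodes.io.

// Relative change in percent: negative means the new value is lower.
function relativeChangePct(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

// Speed-up factor for a latency: how many times faster "after" is.
function speedup(before: number, after: number): number {
  return before / after;
}

// p99 latencies quoted above (seconds and milliseconds respectively).
const bulkReverseGeo = { before: 2.62, after: 0.99 };
const singleLookupMs = { before: 24.28, after: 20.28 };

console.log(relativeChangePct(bulkReverseGeo.before, bulkReverseGeo.after).toFixed(1)); // -62.2
console.log(speedup(bulkReverseGeo.before, bulkReverseGeo.after).toFixed(2)); // 2.65
console.log(relativeChangePct(singleLookupMs.before, singleLookupMs.after).toFixed(1)); // -16.5
```

The release notes round these to −62%, 2.6× and −16%.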

Other

  • Package manager migrated to pnpm.
  • Test seed regenerated from the new public.* dump; make test runs the full Docker-based suite end-to-end.
  • Ingest tooling removed — this repo no longer owns CSV ingestion.

cblanc added 30 commits May 10, 2026 11:54
BREAKING CHANGE:

This update restructures the database into just three tables:

- onspd (ONS Postcode Directory)
- spd (Scottish Postcode Directory)
- osopennames (OS Open Names)

Tables are now fully denormalised for performance.

bin/generate_test_seed slices pcio.{onspd,opennames,spd} from a loaded
postcodesiodb (filters: AB1% + 6 specific postcodes for onspd,
KW16/KW17 for opennames, AB1% for spd) and emits a gzipped SQL stream
of pre-data DDL + COPY FROM STDIN + post-data indexes.

bin/load_test_seed drops + recreates postcodeio_testing and pipes the
seed in. End-to-end bootstrap in ~1.3s vs minutes for the previous
model-driven path.

Removes the orphaned shims around the old model layer:
  bin/create_test_db, bin/clear_test_db,
  api/bin/create_test_db.ts, api/bin/clear_test_db.ts

Drops dead scripts from package.json (setup_test_db, test:seed,
test:clear) that referenced binaries that no longer exist or are about
to be removed.

BREAKING CHANGE: npm scripts setup_test_db, test:seed and test:clear
have been removed. test:create no longer compiles TypeScript; it
shells out to bin/load_test_seed and expects test/seed/v13.sql.gz to be
present (regenerate with bin/generate_test_seed against a fully loaded
dev DB).

Avoids the conflict with a host-side PostgreSQL on 5432, which silently
masked the Docker container during make psql and dump loads. make psql
target updated to match (port 5433, explicit password env).

The denormalised pcio.onspd table exposes two fields not previously
documented in the Postcode response schema:

  date_of_termination — YYYYMM string, null for live postcodes. The
    same termination signal currently surfaced via the dedicated
    /terminated_postcodes/:postcode endpoint, now available alongside
    the live record.

  index_of_multiple_deprivation — integer rank within country
    (England 1..32844, Scotland 1..6976, etc.). Lower = more deprived.
    Scales are independent per country and not directly comparable.
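Because the rank scales differ, a rank only means something relative to its own country's maximum. A hypothetical client-side helper, using only the two maxima quoted above (the other countries are elided in the commit, so they are left out rather than guessed):

```typescript
// Maximum IMD ranks quoted in this commit. Other countries are elided
// there ("etc."), so this hypothetical table deliberately omits them.
const IMD_MAX_RANK: Readonly<Record<string, number>> = {
  England: 32844,
  Scotland: 6976,
};

// Hypothetical helper: position of a rank within its own country's scale,
// in (0, 1]. Values near 0 sit at the most deprived end of that scale.
function imdPositionWithinCountry(rank: number, country: string): number | null {
  const max = IMD_MAX_RANK[country];
  if (max === undefined) return null; // country not covered by this sketch
  if (!Number.isInteger(rank) || rank < 1 || rank > max) return null;
  return rank / max;
}

console.log(imdPositionWithinCountry(6976, "Scotland")); // 1
```

Even after this normalisation the values are not comparable across countries, since each country's index is built independently.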

Static bundle (static/openapi.{json,yaml}) and Docusaurus reference
pages regenerated.

The denormalised pcio.spd table holds ~100 columns covering council
areas, electoral wards, health boards, data zones / output areas /
intermediate zones (multiple census vintages), localities, settlements,
parishes, islands, national parks, ITL/LAU geographies, urban-rural
classifications, SIMD, ROA, and per-census household and population
counts.

Mirrors the Postcode pattern: top-level fields hold the latest
canonical value (e.g. council_area, data_zone, output_area), year-
suffixed fields hold prior versions where SPD provides them
(data_zone_2011, output_area_2001, census_population_count_1991, etc),
and the codes block carries the GSS / official codes for each
datapoint.

Static bundle and Docusaurus reference page regenerated.

BREAKING CHANGE: GET /scotland/postcodes/:postcode response shape
expands from 3 fields to a full SPD record. Existing fields
(postcode, scottish_parliamentary_constituency, codes.scottish_parliamentary_constituency)
are preserved under their canonical names.

The v13 release ships an API-only repo: postcodes.io no longer ingests
ONSPD, SPD, or OS Open Names CSVs. A pre-built pg_dump produced by
upstream tooling is the only supported way to populate the database.

Removed:
  bin/import, bin/import_scot, bin/import_oson, bin/find_missing,
  bin/rebuild_support_tables, bin/import_boundaries, bin/setup_test,
  bin/exec.js, and their api/bin/*.ts compiled targets.

Removed all support-table models and the helpers that drove them:
  api/app/models/attribute_base.ts, api/app/lib/setup.ts, and the 23
  per-attribute model files (ccg, ced, ward, district, parish, county,
  constituency, scottish_constituency, nuts, lsoa, msoa,
  police_force_area, nhs_region, ttwa, national_park, bua, icb,
  cancer_alliance, lsoa21, msoa21, ruc11, ruc21, lep). The denormalised
  pcio.onspd / pcio.spd tables carry every name and code field directly,
  so the join-via-lookup pattern is obsolete.

Removed the corresponding tests:
  test/attribute_base.unit.ts, test/attribute_base.models.unit.ts,
  test/setup.unit.ts, test/code-parser.unit.ts.

api/app/models/index.ts now only re-exports the bespoke models
(Postcode, ScottishPostcode, Outcode, Place, TerminatedPostcode), which
will themselves be replaced by api/app/queries/* in step e of the v13
plan. test/helper/setup.ts updated to drop the deleted imports and
seed/destroy calls; bespoke-model integration tests still rely on it
and will be migrated alongside their controllers.

package.json:
  - Removed the bin block (postcodesio-onspd, postcodesio-scotpd,
    postcodesio-oson) since the binaries are gone.
  - Removed the rebuild_support_tables script.

pg-copy-streams, csv, csv-parse and ospoint dependencies are still
referenced by api/app/models/base.ts and place.ts (the bespoke models)
and will be dropped when those go in step f.

Tree typechecks (npm run build clean). Bespoke-model tests are now red
because they query public.postcodes etc. while the new dump populates
pcio.onspd; that's expected and is fixed in step e as each controller
is migrated to api/app/queries/*.

BREAKING CHANGE: removed CLIs postcodesio-onspd, postcodesio-scotpd,
and postcodesio-oson. Ingestion is no longer this package's
responsibility - consume a published pg_dump instead.

The 23 support-table lookup JSONs (bua, cancer_alliances, ccgs, ceds,
constituencies, counties, districts, icb, lep, lsoa, lsoa21, msoa,
msoa21, national_parks, nhs_regions, nuts, parishes,
police_force_areas, ruc11, ruc21, scottish_constituencies, ttwa,
wards) are no longer consumed - the support-table models that read
them were removed in the previous commit.

Also drops:
  data/scripts/  - utility scripts that generated the JSONs from
                   raw ONS data; ingest is no longer this repo's job.
  data/2024_*_westminster_patch.sql.gz  - manual schema patches from
                   the old ingest pipeline, no remaining references.
  test/helper/attribute_base.suite.ts  - shared assertion suite for
                   the deleted support-table unit tests.

Kept (still referenced by the bespoke Postcode / ScottishPostcode /
TerminatedPostcode models, which go in step f):
  onspd_schema.json, spd_small_schema.json, spd_large_schema.json,
  countries.json, european_registers.json, nhsHa.json, pcts.json,
  regions.json.

Tree typechecks.

api/app/queries/ is the new home for raw SQL access (per the v13 plan,
step e). db.ts owns the single pg pool and a thin query() helper that
forwards to pool.query - including support for prepared statement
configs (pg.QueryConfig) so per-resource query files can declare named
queries that the driver caches.

api/app/controllers/utils_controller now imports query from
api/app/queries/db. api/app/models/base.ts imports the pool from the
same place so the legacy and new code paths share one pool while the
remaining controllers are migrated. base.ts no longer constructs its
own Pool.

This is the first migration of step e. The rest follow easiest-to-
riskiest: places, outcodes, terminated_postcodes, scottish_postcodes,
then postcodes (bulk, nearest, reverse-geo last).

show / random / query now hit pcio.opennames via prepared queries in
api/app/queries/places.ts. Three named statements (findByCode,
termsSearch, prefixSearch) so the pg driver caches the plans on the
hot path.

Search behaviour preserved: phraseto_tsquery first, prefix-regex
fallback, same lowercase/strip-quote/replace-hyphen normalisation.
Random uses the same lazy code-cache approach as the legacy model.
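A sketch of that normalisation and of a prefix-regex fallback pattern. The exact characters stripped, the use of a space as the hyphen replacement, and the regex escaping are assumptions; api/app/queries/places.ts may differ in detail.

```typescript
// Hypothetical: lowercase, strip quote characters, replace hyphens.
// The commit names these steps but not the exact character sets.
function normalisePlaceQuery(raw: string): string {
  return raw
    .toLowerCase()
    .replace(/['"]/g, "") // strip single and double quotes (assumed set)
    .replace(/-/g, " ") // hyphens become word separators (assumed)
    .trim();
}

// Hypothetical prefix-fallback pattern: anchor at the start and escape
// regex metacharacters so user input is matched literally.
function prefixPattern(term: string): string {
  return "^" + term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const term = normalisePlaceQuery("Bishop's-Stortford");
console.log(term); // bishops stortford
console.log(prefixPattern("st. ives")); // ^st\. ives
```

The normalised term would feed phraseto_tsquery first, with the prefix pattern used only when the phrase search finds nothing.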

test/places.integration.ts dropped its model-driven before/after
hooks (clearPostcodeDb, seedPostcodeDb, Place.setupTable) - the test
DB is now bootstrapped once via bin/load_test_seed before mocha runs.
The 15 place tests pass against the seeded pcio.opennames data.

Removed the model-coupled unit tests (test/place.unit.ts,
test/place.search.unit.ts) - they exercised Place.* methods on the
old `places` table which is gone with the new schema.

The legacy schema kept a separate `outcodes` summary table built at
ingest time. The denormalised pcio.onspd lets us derive the same shape
on the fly: outcode rows become AVG(lon/lat), AVG(eastings/northings)
and array_agg(DISTINCT) of admin_district / parish / admin_county /
admin_ward / country / parliamentary_constituency.

api/app/queries/outcodes.ts has two named queries:

  outcodes_find         GROUP BY outcode aggregate; uses the
                        onspd_outcode_idx for the WHERE outcode = $1
                        scan.
  outcodes_nearest      same aggregate, scoped to outcodes whose
                        postcodes are within $3 metres of the centroid.
                        Inner subquery uses the GIST onspd_location_idx.

Both filter out terminated and location-less rows so admin arrays don't
include stale or unmappable values.
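For readers more at home in TypeScript than SQL, an in-memory analogue of the outcodes_find aggregate. This is an illustration under assumptions (a reduced row shape, only admin_district aggregated, nulls dropped from the distinct list), not the query itself.

```typescript
// Reduced row shape for this sketch (an assumption); the real rows carry
// many more admin fields, which would aggregate the same way.
interface OnspdRow {
  outcode: string;
  longitude: number | null;
  latitude: number | null;
  admin_district: string | null;
  date_of_termination: string | null;
}

interface OutcodeSummary {
  outcode: string;
  longitude: number;
  latitude: number;
  admin_district: string[];
}

const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

// Average the coordinates and collect distinct admin values, using only
// live rows that have a location (mirroring the filters described above).
function summariseOutcode(rows: OnspdRow[], outcode: string): OutcodeSummary | null {
  const live = rows.filter(
    (r) =>
      r.outcode === outcode &&
      r.date_of_termination === null && // skip terminated postcodes
      r.longitude !== null &&
      r.latitude !== null, // skip location-less rows
  );
  if (live.length === 0) return null;

  // Nulls dropped and the list sorted for stable output; both are choices
  // of this sketch rather than statements about the SQL.
  const districts = Array.from(
    new Set(live.map((r) => r.admin_district).filter((d): d is string => d !== null)),
  ).sort();

  return {
    outcode,
    longitude: mean(live.map((r) => r.longitude as number)),
    latitude: mean(live.map((r) => r.latitude as number)),
    admin_district: districts,
  };
}
```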

Tests: outcodes.integration + outcodes.nearest.integration drop their
seedPostcodeDb hooks (test DB is dump-loaded via bin/load_test_seed)
and pass against the AB1% slice in the seed (24 tests). outcode.unit.ts
removed - it tested the now-gone Outcode.populateLocation /
Outcode.seedData lifecycle.

In v13 terminated postcodes live alongside live ones in pcio.onspd,
distinguished by `date_of_termination IS NOT NULL`. The new query
matches postcodes via the existing `replace(postcode, ' ', '')`
functional index (onspd_replace_idx) so the lookup stays a single
b-tree probe.

api/app/queries/terminated_postcodes.ts splits date_of_termination
(YYYYMM) into year_terminated / month_terminated to preserve the
existing API response shape.
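The commit doesn't say whether the split happens in SQL or in toJson(); a TypeScript sketch of the transformation itself, assuming both parts are returned as integers:

```typescript
interface TerminationDate {
  year_terminated: number;
  month_terminated: number;
}

// Splits a YYYYMM string such as "202105" into year and month (integer
// outputs are an assumption). Returns null for live postcodes, where
// date_of_termination is null, and for malformed input.
function splitTermination(dateOfTermination: string | null): TerminationDate | null {
  if (dateOfTermination === null) return null;
  const match = /^(\d{4})(\d{2})$/.exec(dateOfTermination);
  if (!match) return null;
  const month = Number(match[2]);
  if (month < 1 || month > 12) return null;
  return { year_terminated: Number(match[1]), month_terminated: month };
}

console.log(splitTermination("202105")); // { year_terminated: 2021, month_terminated: 5 }
```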

Tests: terminated_postcodes.integration drops its
seedTerminatedPostcodeDb hooks; randomTerminatedPostcode helper
rewritten to query pcio.onspd. 6/6 passing.

Removed terminated_postcode.unit.ts and
terminated_postcode.regression.unit.ts - both exercised the legacy
TerminatedPostcode model (separate `terminated_postcodes` table) which
no longer exists.

api/app/queries/scottish_postcodes.ts maps every column in pcio.spd to
the canonical+year-suffixed response shape from the v13 OpenAPI spec.
Field mapping happens in SQL via column aliases (e.g.
council_area_2019_name AS council_area, output_area_2022_code AS
output_area), so the JS toJson() is a thin object reshape rather than
a wide rename table.

The lookup query uses the existing pc_compact unique index. Filters
date_of_deletion IS NULL so deleted postcodes don't surface here -
they belong on /terminated_postcodes/:postcode.

scottish_postcodes_controller still falls through to the legacy
Postcode model when a postcode exists in ONSPD but not in SPD. That
branch is gated by a TODO and will be unwound when
postcodes_controller migrates.

Tests: scottish_postcodes.integration drops the seedScottishPostcodeDb
hooks and uses AB10 1AB (in the seed) instead of ML11 0GH. The
"exists in ONSPD but not in SPD" test is .skipped pending the
postcodes migration. 4 passing, 1 pending.

Removed scottish_postcode.unit.ts (model-coupled).

BREAKING CHANGE: GET /scotland/postcodes/:postcode now returns the
full SPD record. Existing fields (postcode,
scottish_parliamentary_constituency,
codes.scottish_parliamentary_constituency) preserved at their
canonical names.

The largest controller migration in step e: show, valid, random, bulk
(both lookup and reverse-geo), query, autocomplete, lonlat, nearest
all now hit pcio.onspd via prepared statements in
api/app/queries/postcodes.

Five named queries cache plans for the hot path:
  postcodes_find                 = postcode lookup via the
                                   replace(postcode, ' ', '') functional
                                   index.
  postcodes_search_by_postcode   = btree range scan on `postcode` for
                                   space-sensitive autocomplete.
  postcodes_search_by_pc_compact = btree range scan on the functional
                                   index for space-insensitive
                                   autocomplete.
  postcodes_nearest              = ST_DWithin / ST_Distance against the
                                   GIST onspd_location_idx.
  postcodes_nearest_count        = same predicate, narrow projection,
                                   used by deriveMaxRange (widesearch).
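For reference, a named query is the `{ name, text, values }` config object that node-postgres accepts in `pool.query`; on first use it is prepared on that connection under the given name. A sketch of what postcodes_find might build. The SELECT list and the input normalisation are assumptions; the query name, the functional-index expression and the termination filter come from this commit.

```typescript
// Local mirror of the QueryConfig fields used here, so the sketch runs
// without the pg package installed.
interface NamedQuery {
  name: string; // node-postgres prepares the statement once per connection under this name
  text: string;
  values: unknown[];
}

// Hypothetical builder for postcodes_find. Stripping whitespace and
// upper-casing (assumed normalisation) lets the WHERE clause match the
// replace(postcode, ' ', '') functional index however the caller spaced it.
function postcodesFind(input: string): NamedQuery {
  const compact = input.replace(/\s+/g, "").toUpperCase();
  return {
    name: "postcodes_find",
    text:
      "SELECT * FROM pcio.onspd " + // projection is an assumption
      "WHERE replace(postcode, ' ', '') = $1 " +
      "AND date_of_termination IS NULL",
    values: [compact],
  };
}

console.log(postcodesFind("ab10 1ab").values); // [ 'AB101AB' ]
```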

Filtering: every query excludes terminated postcodes
(`date_of_termination IS NULL`). The show handler still returns the
terminated record on 404 by routing through findTerminated from
queries/terminated_postcodes.
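The 404 fallback can be sketched with injected lookups so the control flow is visible without a database. find and findTerminated are named after the functions in this commit, but their signatures and the response shape here are hypothetical.

```typescript
// Hypothetical stand-ins: the real controller, query functions and
// PostcodeNotFoundError have their own signatures and shapes.
type Lookup = (postcode: string) => Promise<Record<string, unknown> | null>;

interface ShowResponse {
  status: 200 | 404;
  result?: Record<string, unknown>;
  terminated?: Record<string, unknown>;
}

// Live lookup first (the live query already excludes terminated rows);
// on a miss, attach any terminated record to the 404.
async function showPostcode(
  postcode: string,
  find: Lookup,
  findTerminated: Lookup,
): Promise<ShowResponse> {
  const live = await find(postcode);
  if (live !== null) return { status: 200, result: live };

  const terminated = await findTerminated(postcode);
  return terminated !== null ? { status: 404, terminated } : { status: 404 };
}
```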

toJson preserves the legacy public response shape, with two new
fields surfaced from the dump: date_of_termination and
index_of_multiple_deprivation. lsoa11/msoa11 codes wired to lsoa11_id
/ msoa11_id (in the new schema lsoa_id is canonical/2021, not 2011).

Test fixtures updated to expect null where the new dump stores no
value. Pseudo-codes (E99999999/S99999999/...) are no longer returned
- the upstream pipeline drops them and the API now reflects the
absence as null.
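API consumers that used to test for pseudo codes can cover both the old and new behaviour with a single check. A minimal sketch; hasGeography is a hypothetical helper, and the pseudo-code list is the one given in the breaking-change note of this commit.

```typescript
// Pseudo codes previously returned where a geography did not apply
// (E/S/W/N/L/M followed by eight nines).
const PSEUDO_CODE = /^[ESWNLM]99999999$/;

// Hypothetical migration helper: treats both the old pseudo codes and
// the new null as "no geography for this postcode".
function hasGeography(code: string | null | undefined): code is string {
  return typeof code === "string" && !PSEUDO_CODE.test(code);
}

console.log(hasGeography("E99999999")); // false
console.log(hasGeography(null)); // false
console.log(hasGeography("S12000033")); // true
```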

Helper noop'd:
  test/helper/setup.ts  - clear*Db / seed*Db are now no-ops; the test
                          DB is dump-loaded by bin/load_test_seed.
  test/helper/index.ts  - randomPostcode / lookupRandomPostcode /
                          locationWithNearbyPostcodes route through
                          the new queries.

isPostcodeObject's required attribute list extended with the two
new fields.

PostcodeNotFoundError's TerminatedPostcode shape narrowed to the
public fields it actually serialises.

Removed:
  test/postcode.unit.ts, postcode.regression.unit.ts,
  postcode.search.unit.ts - all model-coupled.

226 mocha tests passing, 1 pending (the scotland not-in-SPD branch
covered by step f cleanup).

BREAKING CHANGE: all postcode endpoints' `codes.*` fields now return
null where a geography does not exist for the postcode's country,
instead of pseudo codes (E99999999/S99999999/W99999999/N99999999/
L99999999/M99999999). New top-level fields date_of_termination and
index_of_multiple_deprivation surface from the denormalised dump.

Step f of the v13 plan. With every controller migrated to
api/app/queries/* in the preceding commits, the bespoke models
(api/app/models/{postcode,scottish_postcode,terminated_postcode,outcode,
place,base,index}.ts) have no consumers and are deleted.

scottish_postcodes_controller drops its TODO'd Postcode.find fallback
and routes through queries/postcodes.find instead, which re-enables
the previously-skipped "exists in ONSPD but not in SPD" test.

data/ goes entirely (the remaining schema/lookup JSONs were only
consumed by the deleted CSV-driven setupTable code paths).

test/helper:
  - index.ts no longer re-exports models, drops inferSchemaData /
    inferIndexInfo / sortByIndexColumns / getCustomRelation /
    seedPaths and the removeDiacritics helper (all only consumed by
    the now-gone base.unit.ts).
  - type_checking.ts drops isRawPostcodeObject /
    isRawPostcodeObjectWithFC{,andDistance} / isRawTerminatedPostcodeObject
    (also base.unit.ts-only consumers).
  - pg.ts swaps Base.query for queries/db.query.

base.unit.ts deleted - it exercised csvExtractor and generateMethods,
both gone with base.ts.

package.json drops dependencies that the model layer was the sole
consumer of: pg-copy-streams, pg-query-stream, csv, csv-parse,
ospoint, node.extend, plus @types/pg-copy-streams. async moves to
devDependencies (only used by postcodes.bulk.integration.ts).

216 mocha tests passing, 0 pending.

BREAKING CHANGE: api/app/models/* removed. Any external consumer
that imported from postcodes.io's compiled output (we don't think
there is one) must migrate to api/app/queries/*.

Drop docker exec coupling from bin/load_test_seed and bin/generate_test_seed.
Both now use psql/pg_dump with standard POSTGRES_* env vars, making them
portable: works against dev compose pg (host shell) and inside test container.

Default env vars match app config and docker/test/docker-compose.yml settings.
- Add postgresql-client to test Dockerfile (enables psql in container)
- Update entrypoint.sh to run npm run test:create before npm test
  so make test seeds and runs tests in one shot
- Drop NO_RELOAD_DB=true from package.json test script (unread env var)

The v13 seed is loaded once via bin/load_test_seed; tests no longer need
to seed or clear per-suite. Remove:

- clearPostcodeDb / seedPostcodeDb / clearTerminatedPostcodesDb /
  seedTerminatedPostcodeDb calls from before/after hooks across 7
  integration files
- test/helper/setup.ts (noop exports + unused seed paths)
- the re-export from test/helper/index.ts

mocha 11 uses dynamic import() for test files; ts-node/register is
CJS-only, so Node's ESM resolver fails on extensionless imports like
./helper/index. Switching to tsx via --import=tsx restores resolution.

tsx's hook prefers userland packages over Node builtins when both
exist, so the legacy `path` polyfill (transitive dep of
postman-code-generators) shadowed the builtin and broke on the
util.isString removed in Node 24. Pin builtin imports to the node:
prefix where the userland shadow exists.

async@3 ignores the (n, next) signature on async iteratees and expects
a returned promise. Update the two bulk integration fixtures.

Drop the test pg port stanza to avoid host:5432 conflicts; tests reach
pg over the compose network.

Drop the "regularly ingests" framing: ingestion moved upstream in v13.
Replace the test instructions that referenced the removed `npm run mocha`
+ NO_RELOAD_DB workflow.

Dropped type_checking.ts and replaced 33 call sites across 8 integration test files with simple assertions (isString/isNumber on identifying fields). Also silenced logger in test environment to reduce noise.

- test/helper/type_checking.ts: removed
- All test files: postcode shape checks → assert.isString(x.postcode)
- Outcode loops: assert.isString(x.outcode)
- Distance lookups: added assert.isNumber(x.distance)
- Places: assert.isString(p.code)
- Logger: silence pino when NODE_ENV=test

Dropped helper/pg.ts (listDatabaseRelations/listDatabaseIndexes unused) and helper/places_seed.ts (stale, refers to pre-v13 schema). Removed re-export from helper/index.ts.
cblanc added 30 commits May 13, 2026 13:36
- Rebuild static/openapi.json and openapi.yaml with Redocly bundler
- Regenerate docs/api/*.mdx with docusaurus-openapi-docs plugin

Enable corepack and switch install/build steps in both the production
Dockerfile and docker/test/Dockerfile to pnpm. Update entrypoint.sh
to call pnpm scripts instead of npm.

Replace the manual ~/.npm cache and npm install steps with
pnpm/action-setup@v4 plus actions/setup-node@v4's built-in pnpm
cache. Switch install/build/release commands to pnpm.

- Replace package-lock.json with pnpm-lock.yaml
- Pin pnpm@10.33.3 via packageManager
- Pin @types/express to 5.0.3 (exact) to keep lockfile-equivalent
  behavior across managers
- Override @types/express-serve-static-core to 5.0.6: newer 5.1.x
  bumped req.params to string | string[] and breaks controllers

Migrate from Mocha to Vitest as the primary test runner. Vitest provides
native TypeScript support (no tsx loader needed), native snapshot
support, and efficient parallel execution via forked processes.

- Add vitest ~4.1.6 dev dependency
- Remove mocha, chai, tsx, source-map-support dev dependencies
- Create vitest.config.ts with fork-based parallelism and 15s timeouts
- Update package.json test scripts to use vitest run/watch

Convert all 29 test files to use Vitest's expect API instead of
Chai assertions. Replace done callbacks with async/await throughout
to align with idiomatic Vitest patterns.

- chai.assert.* -> expect().* equivalents
- before/after -> beforeAll/afterAll
- done callbacks -> async/await with supertest promises
- Drop this.timeout() in favour of vitest config defaults
- Add explicit vitest imports (describe, expect, it, beforeAll)

Add contract tests for api/app/queries/* modules using Vitest's
toMatchInlineSnapshot. Captures the exact row shape returned by
each query so schema/projection changes surface in review rather
than at runtime.

Re-capture snapshots with: pnpm vitest run <file> -u

Generates a portable DuckDB artifact from the live Postgres test DB
by streaming pcio.onspd, pcio.opennames and pcio.spd through DuckDB's
postgres_query() function. Geometry columns are converted from WKB
on the fly so the resulting file is self-contained and queryable
without PostGIS.

Pipe a known row from find()/findByCode() through toJson() and lock
the result with toMatchInlineSnapshot. These tests pin the public
API response shape — any accidental field rename, drop, or codes
sub-object change surfaces before it reaches the controller.

Convert the query contract tests away from toMatchInlineSnapshot:

- Object/array snapshots -> toEqual({...}) / toEqual([...])
- null snapshots -> toBeNull()
- Scalar snapshots -> toBe(<value>)
- toThrowErrorMatchingInlineSnapshot -> toThrow(<message>)

The previous snapshot form encouraged 'rerun with -u, ship it'
review culture. Plain matchers force the test author to write the
expected value themselves, which makes intent legible and changes
auditable in review.

…ation for v13

Add missing schema fields and types to match v13 API reality:
- Postcode: Add codes.parliamentary_constituency (alias for codes.parliamentary_constituency_2024)
- Outcode: Add parliamentary_constituency array field (UK Parliamentary constituencies that intersect the outcode)
- ScottishPostcodes: Change user_type from integer to string | null to match DB varchar(5) column
- TerminatedPostcode: Add eastings and northings (nullable integers) to schema and API query

Update documentation examples throughout to include all v13 fields:
- Add index_of_multiple_deprivation and date_of_termination to all postcode examples
- Add parliamentary_constituency arrays to all outcode examples
- Add spatial bounds (min/max eastings/northings) to place examples
- Rewrite place/query.mdx with correct v13 shape (was using obsolete fields)
- Rewrite scottish-postcode/lookup.mdx with comprehensive v13 payload example (~60 fields)

Update API layer:
- Extend terminated_postcodes.ts query to fetch and return eastings/northings
- Update test fixtures with real values from live DB

Fix documentation inaccuracies:
- Clarify outcode shape in openapi/md/postcode.md
- Remove outdated Data Extraction section from overview.mdx

Regenerate bundled artifacts:
- Redocly: static/openapi.{json,yaml}
- Docusaurus: docs/api/*.api.mdx (15 files)

Add make wipe to drop the development database volume (dev env only).
Add make test-seed to reseed the test database from test/seed/v13.sql.gz.

Improves interactive test workflow: make test-up && make test-seed && make test-exec pnpm test

…o debian slim

Bumps all postgis base images from 16-3.4 to 17-3.5 (Postgres 17, PostGIS 3.5).
Switches the test container from node:24-alpine to node:24-slim to avoid pnpm
optional-dependency platform detection issues with rolldown's musl native binding
that surfaced after the vitest 4.x migration.

BREAKING CHANGE: Postgres major version upgrade. Existing PG16 data volumes
are not in-place compatible with the PG17 server — operators must drop the
volume (or pg_upgrade) before pulling the new images. The published pg_dump
under `latest` must be regenerated from PG17 to keep the format aligned.

Adds senedd_constituency (name) and senedd_constituency_no (integer) as
nullable top-level fields on the Postcode response. Sourced from the new
ONSPD columns shipping in the next pcio dump. Null for postcodes outside
Wales.

Regenerates static/openapi.{json,yaml} and the Docusaurus API reference
pages from the updated OpenAPI source.

Extends the postcode SELECT projection and toJson() to surface
senedd_constituency (name) and senedd_constituency_no (integer) from
pcio.onspd. Both null for postcodes outside Wales.

Updates the e2e and query-layer integration fixtures (AB10 1AB,
SE1P 5ZZ), which are non-Welsh: both new fields expected null.

Aligns the codebase with the schema layout the upstream pg_dump pipeline
actually produces. All query modules, the test seed generator, the
DuckDB build script, the outcodes matview fallback, and the self-host
docs now reference:

- pcio.onspd        -> public.postcodes
- pcio.opennames    -> public.places
- pcio.spd          -> public.scottish_postcodes
- pcio.outcodes     -> public.outcodes

The outcodes matview in db/matviews.sql is now a fallback for environments
without a denormalised outcodes table in the dump — IF NOT EXISTS skips
the creation when one is already present.

BREAKING CHANGE: All canonical data now lives under the public schema.
Existing deployments using the old pcio.* layout must reload from the
new dump (or migrate the schema) before pulling this version. The
test seed (test/seed/v13.sql.gz) needs regenerating against a public.*
source dump.

Rebuilds test/seed/v13.sql.gz from the May 2026 dump
(postcodesio-2026-05-21-1253) loaded into PG17. Now ships under the
new public.* schema with the senedd_constituency / senedd_constituency_no
columns populated for Welsh postcodes.

Row counts unchanged from previous seed:
- public.postcodes:          8072
- public.places:              159
- public.scottish_postcodes: 4212

Patches bin/generate_test_seed to rewrite CREATE SCHEMA public to
CREATE SCHEMA IF NOT EXISTS public — PG17's pg_dump now emits the
former, which conflicts with the default public schema in any
freshly-created database.
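The rewrite amounts to a single line substitution. An illustrative TypeScript version, assuming pg_dump emits the statement on its own line as `CREATE SCHEMA public;` (how bin/generate_test_seed actually performs it isn't shown here):

```typescript
// Illustrative only: turns a bare `CREATE SCHEMA public;` line into an
// idempotent form so loading into a fresh database (which already has a
// public schema) does not fail. Assumes the statement sits on its own line.
function makePublicSchemaIdempotent(sql: string): string {
  return sql.replace(/^CREATE SCHEMA public;$/gm, "CREATE SCHEMA IF NOT EXISTS public;");
}
```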
Adds make seed which drops/recreates postcodesiodb, installs PostGIS,
then streams the gzipped pg_dump pointed to by ./latest directly into
psql. Strips the \restrict markers emitted by pg_dump >=16.13 so a
mismatched psql client doesn't choke on them.
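An illustrative TypeScript version of that filter, operating on a whole string rather than a stream. It assumes the markers sit on their own lines and, beyond what this commit states, also drops the matching `\unrestrict` line that pg_dump is assumed to write at the end of the dump.

```typescript
// Illustrative only: removes `\restrict <key>` lines and (an assumption
// beyond this commit) `\unrestrict <key>` lines, which a mismatched psql
// client may not understand. Every other line is passed through unchanged.
function stripRestrictMarkers(sql: string): string {
  return sql
    .split("\n")
    .filter((line) => !/^\\(un)?restrict\b/.test(line))
    .join("\n");
}
```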
Moves the pnpm.overrides field out of package.json (no longer read by
pnpm 11) into pnpm-workspace.yaml, alongside the new build-script
allowlist that pnpm 11 requires for native-binding deps (esbuild,
@parcel/watcher, postman-code-generators, protobufjs).

Disables the new minimumReleaseAge supply-chain check by setting it to
0. A handful of transitive deps in the existing lockfile were published
within the default 24h window; raise this once those age out if you
want the protection back.

The new pg_dump now ships public.outcodes as a denormalised base table
rather than a matview derived at load time, so the test seed has to
emit it alongside postcodes/places/scottish_postcodes. Without this the
test DB had an empty outcodes table (the schema definition was applied
but no rows copied), and every outcodes query returned null.

Three sets of drift after regenerating from the public.* dump:

- senedd_constituency / senedd_constituency_no expected on every postcode
  fixture (null for non-Welsh test data)
- findMany() row order swapped: AB10 1AB now precedes AB10 1AF in the
  new dump's physical row order
- outcodes lat/lng/distance precision differs because outcodes is now
  aggregated over the full ONSPD upstream rather than the filtered seed
  subset (matview path)

The test Dockerfile copies only package.json and pnpm-lock.yaml before
running pnpm install --frozen-lockfile. pnpm 11 now reads overrides and
the minimumReleaseAge supply-chain check from pnpm-workspace.yaml, so
the install layer needs it too — otherwise the policy fires against
freshly-published transitive deps that are accepted locally.

The docker-compose mount bind-mounts the host repo over /app at runtime,
so anything we put in /app during build gets shadowed - including 620MB
of node_modules from `pnpm install --frozen-lockfile`. That layer
takes ~75s to export and is then discarded.

Drop the install step. pnpm 11's deps-status-check auto-installs on
the first script run (test:create / test), so the developer or CI
workflow is unaffected. Cold image build drops from 2m11s to 22s.

Sets CI=true in the test container env so pnpm 11 can purge a stale
node_modules dir without prompting for TTY confirmation.

Adjust shared_buffers to 1GB, effective_cache_size to 3GB, disable
autovacuum and full_page_writes, set work_mem=32MB,
max_parallel_workers=4, jit=off, random_page_cost=1.1 and
wal_level=minimal. Run VACUUM ANALYZE and pg_prewarm on all public.*
tables and indexes post-seed.

Measured on 2.7M row dataset (latest dump):
- bulk-reverse-geo p99: 2.62s -> 0.99s (-62%)
- single postcode p99: 24ms -> 20ms (-16%)
- throughput: +1-4% on most endpoints

Applied to both dev (via docker-compose bind-mount for iteration) and
prod (Dockerfile.pg baked in). Data-neutral, no schema change.