docs: architecture overview — concepts, lifecycle, sync kinds, plugin contracts#128
turban wants to merge 5 commits into
Conversation
Pull request overview
Adds a new developer-facing architecture reference and links it into the MkDocs navigation to help contributors understand Climate API core concepts, ingestion lifecycle, sync behavior, and plugin extension points.
Changes:
- Adds `docs/architecture.md` describing core concepts (templates, artifacts, managed datasets, extent), lifecycle, sync kinds, and plugin contracts.
- Documents ingestion/transform pipeline responsibilities and GeoZarr metadata expectations.
- Wires the new document into the MkDocs Concepts nav.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| mkdocs.yml | Adds the new Architecture page under Concepts in the docs nav. |
| docs/architecture.md | Introduces a platform architecture overview, including lifecycle and plugin contract documentation. |
Comments suppressed due to low confidence (7)
docs/architecture.md:37
- This path is not always `data/artifacts/records.json`: the artifact index path is derived from `data_dir` in CLIMATE_API_CONFIG (e.g. `./data/artifacts/records.json`) and otherwise falls back to an XDG data directory. Consider describing it as "`{data_dir}/artifacts/records.json` (or XDG fallback)" to match the implementation.
Artifacts are stored in `data/artifacts/records.json`. This is an internal implementation detail — consumers should never depend on artifact IDs or artifact paths directly.
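The fallback behavior the comment describes could be sketched as follows. The helper name `artifact_index_path` and the `climate-api` XDG subdirectory are assumptions for illustration, not the project's actual code:

```python
import os
from pathlib import Path
from typing import Optional

def artifact_index_path(data_dir: Optional[str]) -> Path:
    # Hypothetical sketch: {data_dir}/artifacts/records.json when
    # data_dir is set in the config, else an XDG data-directory fallback.
    if data_dir:
        base = Path(data_dir)
    else:
        xdg = os.environ.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
        base = Path(xdg) / "climate-api"  # subdirectory name is assumed
    return base / "artifacts" / "records.json"
```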
docs/architecture.md:145
- The download function contract here omits `country_code`, but the framework may pass a `country_code` kwarg when the provider function declares it (see `downloader.download_dataset`). To avoid misleading plugin authors, please include `country_code: str` as an optional parameter (similar to `bbox`), or document that additional optional kwargs like `country_code` may be provided based on the function signature.
```python
def download(
    *,
    start: str,         # ISO 8601 date or datetime
    end: str,
    dirname: Path,      # write output files here
    prefix: str,        # use as filename prefix, e.g. f"{prefix}_{year}.nc"
    overwrite: bool,
    bbox: list[float],  # optional; only if the source needs a spatial filter
    **kwargs,           # default_params from the YAML template
) -> None: ...
```
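A signature that makes the optional `country_code` kwarg explicit, as the comment suggests, might look like this sketch. The stub body and placeholder file name are illustrative only, not the framework's behavior:

```python
from __future__ import annotations
from pathlib import Path

def download(
    *,
    start: str,
    end: str,
    dirname: Path,
    prefix: str,
    overwrite: bool,
    bbox: list[float] | None = None,
    country_code: str | None = None,  # passed only when the function declares it
    **kwargs,
) -> None:
    # Stub body for illustration: a real provider function would fetch
    # source data and write NetCDF files under dirname with the prefix.
    dirname.mkdir(parents=True, exist_ok=True)
    (dirname / f"{prefix}_placeholder.nc").touch()
```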
docs/architecture.md:158
- Transform functions can receive extra keyword parameters: the transforms pipeline forwards any YAML-declared `params` via `func(ds, dataset, **params)`. The signature shown here (`(ds, dataset)`) will raise `TypeError` if template authors pass params. Please document that transforms should accept `**kwargs` or explicit keyword-only parameters.
```python
def my_transform(ds: xr.Dataset, dataset: dict) -> xr.Dataset:
    # Receive the dataset after download; return a modified dataset.
    # Modify ds[dataset["variable"]] values and variable attributes.
    # Do not modify dataset-level ds.attrs; the framework manages those.
    ...
```
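A transform written to tolerate YAML-declared params might look like this sketch. The name `scale_offset` and its parameters are hypothetical, and the annotations are strings so the example runs without xarray installed (in practice `ds` is an `xr.Dataset`):

```python
from __future__ import annotations

def scale_offset(ds: "xr.Dataset", dataset: dict, *, scale: float = 1.0,
                 offset: float = 0.0, **kwargs) -> "xr.Dataset":
    # The pipeline calls func(ds, dataset, **params), so YAML params land
    # in the keyword-only arguments; **kwargs absorbs any extras instead
    # of raising TypeError.
    var = dataset["variable"]
    ds[var] = ds[var] * scale + offset
    return ds
```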
docs/architecture.md:193
* GeoZarr root attributes are not always stored in `zarr.json`: flat Zarr writes typically produce Zarr v2 metadata (`.zattrs`), while pyramid writes use Zarr v3 (`zarr.json`). The code explicitly supports reading attrs from either file. Please update this text to mention both layouts so readers don’t assume `zarr.json` will always exist.
Every zarr artifact must have GeoZarr root attributes for map rendering to work correctly. These are written into `zarr.json` at the store root:
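Since root attributes may live in either Zarr v2 `.zattrs` or Zarr v3 `zarr.json`, a reader tolerant of both layouts might look like this sketch (the helper name is hypothetical; only the two metadata file names come from the comment above):

```python
import json
from pathlib import Path

def read_root_attrs(store: Path) -> dict:
    # Zarr v3 keeps root attrs in zarr.json under the "attributes" key;
    # Zarr v2 keeps them directly in .zattrs. Check both layouts.
    v3 = store / "zarr.json"
    if v3.exists():
        return json.loads(v3.read_text()).get("attributes", {})
    v2 = store / ".zattrs"
    if v2.exists():
        return json.loads(v2.read_text())
    return {}
```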
docs/architecture.md:213
* The instance CRS is configured via a top-level `crs:` key in `climate-api.yaml` (see `climate-api.yaml.example` and `climate_api.config.get_crs()`), not as `extent.crs`. This example/config description should be updated to match the actual configuration shape.
The instance CRS is configured in climate-api.yaml:
```yaml
extent:
  name: Norway
  bbox: [3.0, 57.0, 32.0, 72.5]
  crs: EPSG:32633  # optional; defaults to EPSG:4326
```
docs/architecture.md:228
* Artifact reuse/deduplication is described as matching an “overlapping time range”, but the implementation only reuses artifacts when `request_scope` matches exactly and temporal coverage matches the requested start (and end, unless the request is open-ended). Please adjust these bullets to reflect exact-scope reuse rather than overlap.
When a new ingestion request arrives, the framework checks whether an existing artifact already covers the requested scope:
- same `dataset_id`
- same bbox (from the configured extent)
- overlapping time range
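The exact-scope reuse rule the reviewer describes (exact `request_scope` match, coverage of the requested start, and coverage of the requested end unless the request is open-ended) could be sketched as follows; the field names are hypothetical, not the project's actual record schema:

```python
def can_reuse(artifact: dict, request: dict) -> bool:
    # Reuse requires an exact request_scope match, not mere overlap.
    if artifact["request_scope"] != request["request_scope"]:
        return False
    # Temporal coverage must reach back to the requested start ...
    if artifact["coverage_start"] > request["start"]:
        return False
    # ... and forward to the requested end, unless the request is open-ended.
    if request["end"] is not None and artifact["coverage_end"] < request["end"]:
        return False
    return True
```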
docs/architecture.md:248
* Two internal function names in this guarantee table don’t match the code: chunk sizing is implemented via `_compute_time_space_chunks` (plus xarray auto-chunking), and artifact persistence is handled by `_store_artifact_record` / `_mutate_records` rather than `_store_artifact`. Either update the names or remove the specific helper references to avoid sending readers to non-existent functions.
| Guarantee | Implemented by |
|---|---|
| Coordinate name normalisation (lat → y, etc.) | `build_dataset_zarr` |
| Reprojection to instance CRS | `reproject_to_instance_crs` |
| Zarr chunking (auto-sized per `period_type`) | `_compute_chunk_sizes` |
| Multiscale pyramid generation (when dims > 2048×2048) | `build_dataset_zarr` |
| GeoZarr root attributes (`spatial:bbox`, `proj:code`) | `build_dataset_zarr` |
| Artifact coverage computation | `_coverage_from_dataset` |
| Artifact record persistence | `_store_artifact` |
| pygeoapi publication | `publish_artifact_record` if `publish=true` |
</details>
turban left a comment
Fixed in 8cb6f6f: built-ins now reference `climate_api/data/datasets/` (package path via `importlib.resources`) and artifacts reference `{data_dir}/artifacts/records.json`, with a note that `data_dir` is configured in `climate-api.yaml`.
yambanso left a comment
The documentation refers to "download function" throughout, but this conflates two separate concerns. The function is actually responsible for ingestion (fetching source data + writing NetCDF files), while the framework handles the true download coordination (HTTP fetching, caching, retries). Additionally, there's no clear pattern for reusing ingestion logic across multiple YAML templates with similar source patterns.
Good points, both addressed:
- Terminology: "download function" was a poor name. The function is responsible for ingestion (fetching source data and writing NetCDF files), while the framework owns the HTTP coordination, caching, and retries. Renamed to "ingestion function" throughout the doc. This also aligns with the YAML key, which is already
- Reuse pattern: the intended pattern is to reference the same Python function from multiple YAML templates and differentiate via
- One more clarification added: the ingestion function is called identically by both
yambanso left a comment
I think on the reuse pattern we can go for "code once, reuse many": a single generic ingest function, with the default parameters shared (specified) across multiple YAML templates.
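The "code once, reuse many" idea might look like this sketch: one generic ingestion function, with each YAML template supplying its own `default_params`. The function name, parameters, and placeholder body are hypothetical:

```python
from __future__ import annotations
from pathlib import Path

def ingest_http_netcdf(*, start: str, end: str, dirname: Path, prefix: str,
                       overwrite: bool, base_url: str, variable: str,
                       **kwargs) -> None:
    # One generic ingestion function; several YAML templates reference it
    # and differentiate via default_params (base_url, variable, ...).
    dirname.mkdir(parents=True, exist_ok=True)
    out = dirname / f"{prefix}_{variable}.nc"
    if out.exists() and not overwrite:
        return
    # A real implementation would fetch from base_url for [start, end]
    # and write NetCDF here; we only create the target file to illustrate.
    out.touch()
```

Two templates could then point at the same function path and differ only in their `default_params` blocks.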
yambanso left a comment
I think it would be good to add the function signature for "lagged_latest_available".
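The actual signature is not shown in this thread; a plausible sketch, based only on the name, might be:

```python
from datetime import date, timedelta
from typing import Optional

def lagged_latest_available(*, lag_days: int, today: Optional[date] = None) -> str:
    # Hypothetical sketch: the latest date assumed available at the
    # source is "today" minus the source's publication lag.
    today = today or date.today()
    return (today - timedelta(days=lag_days)).isoformat()
```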
Summary
Adds `docs/architecture.md`, a developer-facing reference that explains the platform's design, the relationships between its core concepts, and the consequences of key architectural decisions. Wired into the mkdocs nav under Concepts.
Why this document
Before merging feature work (especially new sync kinds from #127), we need a shared understanding of what the platform is, why it is structured this way, and what the rules are. This document is the foundation for that.
What it covers
`temporal`, `release`, and `static`: a decision table with when to use each, what happens on each sync, and the trade-offs between `append` and `rematerialize`.
What it does not cover
`sync.kind: derived` and `sync.kind: remote` are covered in #127, which depends on this PR. API endpoint reference, setup instructions, and provider-specific dataset details remain in their existing docs.