feat: zarr generation improvements and Zarr/GeoZarr concepts doc#129
feat: zarr generation improvements and Zarr/GeoZarr concepts doc#129turban wants to merge 21 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the platform’s Zarr/GeoZarr ingestion and STAC time-step handling to be ISO-8601-resolution-driven (via extents.temporal.resolution), standardizes store writes on Zarr v3, and adds a developer-facing concepts document describing Zarr/GeoZarr storage and serving.
Changes:
- Add a new Concepts doc (
docs/zarr_and_geozarr.md) and link it inmkdocs.yml. - Use shared
resolve_iso_step()for STAC temporalstep, removing the local period-type lookup. - Improve Zarr generation: Zarr v3 for flat stores, larger spatial chunks (512), and ISO-duration-derived time chunk sizing.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| mkdocs.yml | Adds “Zarr and GeoZarr” page under Concepts navigation. |
| docs/zarr_and_geozarr.md | New concepts/reference doc explaining Zarr/GeoZarr rationale, layout, chunking, and serving. |
| climate_api/stac/services.py | Uses resolve_iso_step() instead of a period_type→step lookup for STAC temporal step override. |
| climate_api/shared/time.py | Adds ISO-8601 duration parsing and helpers for resolution-driven chunk sizing. |
| climate_api/data_manager/services/downloader.py | Updates chunk sizing logic (512 spatial), removes auto-chunk intermediate, and writes flat stores as Zarr v3. |
Comments suppressed due to low confidence (1)
docs/zarr_and_geozarr.md:80
- This table also uses double pipes (
|| ... ||), which can break markdown rendering. Convert it to standard single-pipe markdown table syntax.
| Attribute | Example value | Purpose |
|-----------|--------------|---------|
| `spatial:bbox` | `[270000, 6450000, 1120000, 7950000]` | Bounding box in the native CRS |
| `proj:code` | `EPSG:32633` | CRS of the stored coordinates |
| `zarr_conventions` | `[{...}]` | Convention declarations |
…d types, remove auto-chunk fallback
Add _PERIOD_TYPE_ISO_STEP dict and time_chunk_for_iso_step() to shared/time.py as the single source of truth for period type → ISO 8601 step mapping. Chunk sizes are now derived from the duration via a tiered formula (~1 week for sub-daily, ~1 month for daily/sub-weekly, ~1 year for weekly and coarser) rather than a hardcoded if/elif table. Remove the duplicate _period_step() from stac/services.py and replace with the shared period_type_to_iso_step(). Adding a new period type now requires only an entry in _PERIOD_TYPE_ISO_STEP — chunking and STAC metadata follow automatically.
…ookup table Custom datasets can now declare their own temporal step in the template YAML (period_step: P14D) without requiring any changes to core code. The built-in lookup table remains as a convenience default for standard period type names.
extents.temporal.resolution is already ISO 8601 and already present in every template — it is the correct source of truth. The period_step field added in the previous commit is unnecessary; extents.temporal.resolution covers the same case, including custom period types in plugin templates. The built-in lookup table remains as a fallback for templates that predate the extents.temporal.resolution field. See issue #94 for the longer-term cleanup.
Remove the period_type lookup table fallback. Dataset templates must now declare extents.temporal.resolution as an ISO 8601 duration. All three built-in templates already have this field.
Eliminates the v2/v3 inconsistency where flat stores used Zarr v2 (.zmetadata) and pyramid stores used Zarr v3 (zarr.json). All stores are now v3. consolidated=True is kept as a performance hint; consolidated metadata in v3 is an experimental zarr extension already in use for pyramid stores.
Replace period_type table with ISO 8601 duration tier explanation, remove v2 metadata filename references (.zarray, .zattrs, .zgroup), and drop the "(or .zattrs for v2)" qualifier now that all stores are v3.
…fixtures and doc paths
8c667b2 to
ba5cdf7
Compare
| | Daily to sub-weekly | 24 h – 168 h | ~1 month | `P1D` (daily) → 30 steps | | ||
| | Weekly and coarser | ≥ 168 h | ~1 year | `P1M` (monthly) → 12 steps | | ||
|
|
||
| This calculation is fully data-driven: any dataset — including custom or plugin datasets — only needs to declare `extents.temporal.resolution` and the correct chunk size is computed automatically. |
There was a problem hiding this comment.
Does this mean extents.temporal.resolution is required? downloader.py accepts missing resolution it just logs a warning.
There was a problem hiding this comment.
Fixed — updated the wording to 'When present and valid' and added a note that the fallback to period_type is used when the field is absent or invalid.
| _remove_helper_variables(collection_payload) | ||
| _round_spatial_steps(collection_payload) | ||
| _override_time_step(collection_payload, _period_step(source_dataset.get("period_type"))) | ||
| _override_time_step(collection_payload, resolve_iso_period_step(source_dataset)) |
There was a problem hiding this comment.
seems we are consuming whatever extents.temporal.resolution we get from resolve_iso_period_step. We need to validate ... probably best we do this centrally in the shared time.py
There was a problem hiding this comment.
Fixed in time.py — resolve_iso_period_step now validates the string with _iso_step_to_approx_hours before returning it, logging a warning and returning None on invalid input. The call site in stac/services.py is already safe since _override_time_step no-ops on None. The broader period_type migration is tracked in #136.
Validates the ISO 8601 duration string using _iso_step_to_approx_hours before returning it, logging a warning and returning None on invalid input rather than passing a malformed string to callers. Updates docs to reflect that the field is optional with a fallback, not required. Closes the validation gap noted in PR #129 review. Ref: #136
Depends on
#128 (architecture documentation) — branched from `docs/platform-concepts`.
Summary
Adds `docs/zarr_and_geozarr.md` — a developer-facing reference explaining why the platform uses Zarr, how stores are structured and served, and how GeoZarr root attributes enable map rendering. Wired into the mkdocs nav under Concepts.
Also improves the zarr store generation pipeline based on a review against open issues.
Changes
Zarr generation improvements
Documentation