Skip to content

feat: zarr generation improvements and Zarr/GeoZarr concepts doc#129

Open
turban wants to merge 21 commits into
docs/platform-conceptsfrom
docs/zarr-and-geozarr
Open

feat: zarr generation improvements and Zarr/GeoZarr concepts doc#129
turban wants to merge 21 commits into
docs/platform-conceptsfrom
docs/zarr-and-geozarr

Conversation

@turban
Copy link
Copy Markdown
Contributor

@turban turban commented May 13, 2026

Depends on

#128 (architecture documentation) — branched from `docs/platform-concepts`.

Summary

Adds `docs/zarr_and_geozarr.md` — a developer-facing reference explaining why the platform uses Zarr, how stores are structured and served, and how GeoZarr root attributes enable map rendering. Wired into the mkdocs nav under Concepts.

Also improves the zarr store generation pipeline based on a review against open issues.

Changes

Zarr generation improvements

  • All stores now use Zarr v3 format — flat stores pass `zarr_format=3` to `to_zarr`, making them consistent with pyramid stores (which were already forced to v3 by topozarr). Consolidated metadata is embedded in `zarr.json` for both store types.
  • Spatial chunk size increased from 256×256 to 512×512 — better balance between tile rendering (fewer HTTP requests) and analysis workloads. Closes GeoZarr: optimal chunking and multiscales strategy for web tiles and analysis #45.
  • `extents.temporal.resolution` drives chunk sizing and STAC metadata — `resolve_iso_period_step()` in `shared/time.py` reads the ISO 8601 duration directly from the template, so any period type is supported without touching core code. The old `period_type` lookup table in `stac/services.py` is removed.
  • Time chunk sizes derived from ISO 8601 duration formula — tiered by magnitude (~1 week for sub-daily, ~1 month for daily/sub-weekly, ~1 year for weekly and coarser) rather than hardcoded per period type name.
  • `extents.temporal.resolution` validated centrally — `resolve_iso_period_step()` validates the value against the ISO 8601 duration regex before returning it. An absent or malformed value logs a warning and returns `None`; callers fall back gracefully. Closes review comments from feat: zarr generation improvements and Zarr/GeoZarr concepts doc #129. Long-term migration of remaining `period_type` usages tracked in Replace period_type enum with ISO 8601 duration-driven period handling #136.
  • Removed `ds.chunk("auto")` intermediate step — the auto-chunk block computed dask chunk sizes for all dimensions and then immediately overrode all three with the period-type-specific values. Now `_compute_time_space_chunks` is called directly.

Documentation

  • Why Zarr — chunk-level random access, HTTP-native file serving, cloud compatibility, xarray integration
  • Store layout — flat vs. pyramid stores, where they live on disk
  • Chunk sizing — 512×512 spatial, ISO 8601 duration-based time chunks derived from `extents.temporal.resolution`, rationale for each
  • Multiscale pyramids — the 2048×2048 threshold, level computation formula, Zarr v3 format, and the time-coordinate-at-root workaround for ZarrLayer
  • GeoZarr root attributes — `spatial:bbox`, `proj:code`, `zarr_conventions`; what breaks without them and why the framework writes them
  • CRS handling, zarr serving, map viewer integration, fill value handling

@turban turban changed the title docs: Zarr and GeoZarr concepts page feat: zarr generation improvements and Zarr/GeoZarr concepts doc May 13, 2026
@turban turban requested review from abyot and Copilot May 13, 2026 17:03
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the platform’s Zarr/GeoZarr ingestion and STAC time-step handling to be ISO-8601-resolution-driven (via extents.temporal.resolution), standardizes store writes on Zarr v3, and adds a developer-facing concepts document describing Zarr/GeoZarr storage and serving.

Changes:

  • Add a new Concepts doc (docs/zarr_and_geozarr.md) and link it in mkdocs.yml.
  • Use shared resolve_iso_step() for STAC temporal step, removing the local period-type lookup.
  • Improve Zarr generation: Zarr v3 for flat stores, larger spatial chunks (512), and ISO-duration-derived time chunk sizing.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
mkdocs.yml Adds “Zarr and GeoZarr” page under Concepts navigation.
docs/zarr_and_geozarr.md New concepts/reference doc explaining Zarr/GeoZarr rationale, layout, chunking, and serving.
climate_api/stac/services.py Uses resolve_iso_step() instead of a period_type→step lookup for STAC temporal step override.
climate_api/shared/time.py Adds ISO-8601 duration parsing and helpers for resolution-driven chunk sizing.
climate_api/data_manager/services/downloader.py Updates chunk sizing logic (512 spatial), removes auto-chunk intermediate, and writes flat stores as Zarr v3.
Comments suppressed due to low confidence (1)

docs/zarr_and_geozarr.md:80

  • This table also uses double pipes (|| ... ||), which can break markdown rendering. Convert it to standard single-pipe markdown table syntax.
| Attribute | Example value | Purpose |
|-----------|--------------|---------|
| `spatial:bbox` | `[270000, 6450000, 1120000, 7950000]` | Bounding box in the native CRS |
| `proj:code` | `EPSG:32633` | CRS of the stored coordinates |
| `zarr_conventions` | `[{...}]` | Convention declarations |

Comment thread climate_api/data_manager/services/downloader.py
Comment thread climate_api/data_manager/services/downloader.py Outdated
Comment thread climate_api/shared/time.py
Comment thread climate_api/stac/services.py
Comment thread docs/zarr_and_geozarr.md Outdated
Comment thread docs/zarr_and_geozarr.md
turban added 20 commits May 14, 2026 09:52
Add _PERIOD_TYPE_ISO_STEP dict and time_chunk_for_iso_step() to shared/time.py
as the single source of truth for period type → ISO 8601 step mapping. Chunk
sizes are now derived from the duration via a tiered formula (~1 week for
sub-daily, ~1 month for daily/sub-weekly, ~1 year for weekly and coarser)
rather than a hardcoded if/elif table.

Remove the duplicate _period_step() from stac/services.py and replace with
the shared period_type_to_iso_step(). Adding a new period type now requires
only an entry in _PERIOD_TYPE_ISO_STEP — chunking and STAC metadata follow
automatically.
…ookup table

Custom datasets can now declare their own temporal step in the template YAML
(period_step: P14D) without requiring any changes to core code. The built-in
lookup table remains as a convenience default for standard period type names.
extents.temporal.resolution is already ISO 8601 and already present in every
template — it is the correct source of truth. The period_step field added in
the previous commit is unnecessary; extents.temporal.resolution covers the
same case, including custom period types in plugin templates.

The built-in lookup table remains as a fallback for templates that predate the
extents.temporal.resolution field. See issue #94 for the longer-term cleanup.
Remove the period_type lookup table fallback. Dataset templates must now
declare extents.temporal.resolution as an ISO 8601 duration. All three
built-in templates already have this field.
Eliminates the v2/v3 inconsistency where flat stores used Zarr v2 (.zmetadata)
and pyramid stores used Zarr v3 (zarr.json). All stores are now v3.

consolidated=True is kept as a performance hint; consolidated metadata in v3 is
an experimental zarr extension already in use for pyramid stores.
Replace period_type table with ISO 8601 duration tier explanation,
remove v2 metadata filename references (.zarray, .zattrs, .zgroup),
and drop the "(or .zattrs for v2)" qualifier now that all stores are v3.
@turban turban force-pushed the docs/zarr-and-geozarr branch from 8c667b2 to ba5cdf7 Compare May 14, 2026 07:53
Comment thread docs/zarr_and_geozarr.md Outdated
| Daily to sub-weekly | 24 h – 168 h | ~1 month | `P1D` (daily) → 30 steps |
| Weekly and coarser | ≥ 168 h | ~1 year | `P1M` (monthly) → 12 steps |

This calculation is fully data-driven: any dataset — including custom or plugin datasets — only needs to declare `extents.temporal.resolution` and the correct chunk size is computed automatically.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean extents.temporal.resolution is required? downloader.py accepts missing resolution it just logs a warning.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — updated the wording to 'When present and valid' and added a note that the fallback to period_type is used when the field is absent or invalid.

_remove_helper_variables(collection_payload)
_round_spatial_steps(collection_payload)
_override_time_step(collection_payload, _period_step(source_dataset.get("period_type")))
_override_time_step(collection_payload, resolve_iso_period_step(source_dataset))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems we are consuming whatever extents.temporal.resolution we get from resolve_iso_period_step. We need to validate ... probably best we do this centrally in the shared time.py

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in time.pyresolve_iso_period_step now validates the string with _iso_step_to_approx_hours before returning it, logging a warning and returning None on invalid input. The call site in stac/services.py is already safe since _override_time_step no-ops on None. The broader period_type migration is tracked in #136.

Validates the ISO 8601 duration string using _iso_step_to_approx_hours
before returning it, logging a warning and returning None on invalid
input rather than passing a malformed string to callers.

Updates docs to reflect that the field is optional with a fallback,
not required. Closes the validation gap noted in PR #129 review.

Ref: #136
@turban turban requested a review from abyot May 15, 2026 10:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants