Skip to content

geotiff: cross-library accuracy test (rasterio vs xrspatial vs zarr) (#1961)#1963

Merged
brendancol merged 2 commits into
mainfrom
issue-1961
May 15, 2026
Merged

geotiff: cross-library accuracy test (rasterio vs xrspatial vs zarr) (#1961)#1963
brendancol merged 2 commits into
mainfrom
issue-1961

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #1961.

Adds xrspatial/geotiff/tests/test_round_trip_parity_rasterio_zarr_1961.py.
Reads the same TIFF with rasterio, open_geotiff, and a to_zarr/open_zarr
round trip, and asserts pixel values, shape, dtype, nodata sentinel, x/y
coords, CRS (normalised through _resolve_crs_to_wkt), and the affine
transform agree.

Cases:

rasterio and zarr go through pytest.importorskip so the CPU-only matrix
without those deps stays green.

Test plan

  • pytest xrspatial/geotiff/tests/test_round_trip_parity_rasterio_zarr_1961.py - 8 passed

Copilot AI review requested due to automatic review settings May 15, 2026 17:05
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 15, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@brendancol
Copy link
Copy Markdown
Contributor Author

PR Review: geotiff: cross-library accuracy test (rasterio vs xrspatial vs zarr) (#1961)

Blockers (must fix before merge)

  • None

Suggestions (should fix, not blocking)

  • xrspatial/geotiff/tests/test_round_trip_parity_rasterio_zarr_1961.py:292TestMultibandUint16PerBandNodata advertises per-band nodata but writes a single shared nodata=0 via rasterio's dataset-level parameter. Either rename the class to TestMultibandUint16SharedNodata, or set per-band nodata via dst.nodatavals / dst.update_tags and assert the reader picks up the distinct values.
  • xrspatial/geotiff/tests/test_round_trip_parity_rasterio_zarr_1961.py:33-34pytest.importorskip('rasterio') and pytest.importorskip('zarr') work today but pytest 9.1 deprecates implicit ImportError matching; passing exc_type=ImportError makes the skip stable across pytest upgrades. Worth aligning all geotiff importorskip calls in a separate sweep.

Nits (optional improvements)

  • xrspatial/geotiff/tests/test_round_trip_parity_rasterio_zarr_1961.py:214-217 — the nodata equality check relies on np.isnan(xrs.attrs.get('nodata', 0.0)) to dodge np.isnan(None). Pulling the value into a local first and gating on is not None reads more clearly than relying on the default.
  • xrspatial/geotiff/tests/test_round_trip_parity_rasterio_zarr_1961.py:340-368 — the south-up case uses positive pixel_height=+1.0, which rasterio reports but most GeoTIFF writers never produce. Worth a one-line comment noting this is a synthetic case, since real south-up data tends to come from non-standard sources.
  • xrspatial/geotiff/tests/test_round_trip_parity_rasterio_zarr_1961.py:46ATOL_FLOAT32 = 0.0 is fine for the round-trip path but the constant name implies a dtype-specific tolerance. Rename to ATOL_PIXEL = 0.0 or inline it where used.

What looks good

  • The three-way structure (rasterio raw read, xrspatial reader, xrspatial -> zarr -> xarray reopen) is a real cross-validation, not self-comparison.
  • Coord comparisons rebuild pixel-centre coords from the rasterio transform rather than reusing xrspatial's coords as the reference.
  • CRS comparison goes through pyproj.CRS equality, which dodges WKT string drift across PROJ versions.
  • Integer vs float nodata semantics are handled separately and the dtype promotion expectations match the reader's documented behaviour.
  • Temp filenames use the tmp_1961_* prefix, matching the project convention for collision-resistant temp paths.
  • Each test class targets a case that has regressed before (stripes geotiff: 1xN / Nx1 georeferenced writes silently strip transform #1945, no-georef geotiff: to_geotiff turns no-georef int64 pixel coords into a fake transform on round-trip #1949, south-up, tiled COG).

Cross-library parity notes

The test reads each fixture with rasterio.open(...).read() to get a reference array, transform, CRS, and nodata that come from a different code path than xrspatial's reader. Pixel arrays go through _assert_pixels_equal with NaN-position equality, transform tuples go through _assert_transforms_match against the rasterio Affine, and CRS goes through pyproj equality. The Zarr leg writes the xrspatial DataArray, reopens it, and re-checks pixels, coords, and the crs / crs_wkt / nodata / transform attrs survive the trip. Coverage gaps worth noting for follow-up: no compressed inputs (LZW, Deflate, predictor=2/3), no overviews, no signed integer dtypes, no datetime or extra-dim coords on the zarr round-trip, and the multi-band case does not actually exercise per-band nodata despite the class name. The parity guarantees are meaningful for the cases listed.

Checklist

  • Algorithm matches reference/paper
  • All implemented backends produce consistent results (numpy only; no cupy / dask leg)
  • NaN handling is correct
  • Edge cases are covered by tests
  • Dask chunk boundaries handled correctly (not exercised)
  • No premature materialization or unnecessary copies
  • Benchmark exists or is not needed (not needed for a parity test)
  • README feature matrix updated (if applicable) (N/A)
  • Docstrings present and accurate

brendancol added a commit that referenced this pull request May 15, 2026
…skip exc_type)

Renames TestMultibandUint16PerBandNodata to TestMultibandUint16SharedNodata
since GDAL_NODATA is a single dataset-level tag and rasterio's GTiff path
collapses per-band writes to one value. Adds a docstring explaining the
format limitation.

Passes exc_type=ImportError to the rasterio and zarr importorskip calls so
pytest 9.1 does not raise PytestDeprecationWarning when the package
imports but a native extension is missing.
@brendancol
Copy link
Copy Markdown
Contributor Author

Addressed the review in 0da084c:

  • Renamed TestMultibandUint16PerBandNodata to TestMultibandUint16SharedNodata. The GTiff format stores one GDAL_NODATA tag per file, and rasterio's _set_nodatavals overwrites that single tag on disk, so a real per-band sentinel cannot round trip through this format. Docstring explains the limitation.
  • Passed exc_type=ImportError to both pytest.importorskip calls.

All 8 tests pass locally.

@brendancol
Copy link
Copy Markdown
Contributor Author

Rebased onto main at 9305773 to pick up the int-coord hotfix from #1969.

brendancol added a commit that referenced this pull request May 15, 2026
…skip exc_type)

Renames TestMultibandUint16PerBandNodata to TestMultibandUint16SharedNodata
since GDAL_NODATA is a single dataset-level tag and rasterio's GTiff path
collapses per-band writes to one value. Adds a docstring explaining the
format limitation.

Passes exc_type=ImportError to the rasterio and zarr importorskip calls so
pytest 9.1 does not raise PytestDeprecationWarning when the package
imports but a native extension is missing.
…1961)

Reads the same TIFF three ways (rasterio, open_geotiff, to_zarr/open_zarr)
and asserts they agree on pixels, coords, CRS, transform, and nodata.
Covers cases that have drifted before: float32 with non-NaN nodata,
multi-band uint16 with per-band nodata, north-up vs south-up, 1xN/Nx1
stripes (#1945), tiled no-overview, and no-georef integer coords (#1949).
rasterio and zarr are gated with importorskip.
…skip exc_type)

Renames TestMultibandUint16PerBandNodata to TestMultibandUint16SharedNodata
since GDAL_NODATA is a single dataset-level tag and rasterio's GTiff path
collapses per-band writes to one value. Adds a docstring explaining the
format limitation.

Passes exc_type=ImportError to the rasterio and zarr importorskip calls so
pytest 9.1 does not raise PytestDeprecationWarning when the package
imports but a native extension is missing.
@brendancol brendancol merged commit 5ffc763 into main May 15, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

geotiff: cross-library accuracy test (rasterio vs xarray-spatial vs zarr round-trip)

2 participants