
geotiff: cubic overview on integer COG masks sentinel before zoom (#1975)#1979

Merged
brendancol merged 3 commits into main from deep-sweep-accuracy-geotiff-2026-05-15-1778875892
May 16, 2026

Conversation

@brendancol (Contributor)

Summary

  • Fixes #1975 (geotiff: cubic overview on integer COG poisons pixels
    near nodata border). _block_reduce_2d's cubic branch in
    xrspatial/geotiff/_writer.py gated sentinel masking on
    arr2d.dtype.kind == 'f', so integer rasters with a finite nodata
    sentinel hit an unmasked scipy.ndimage.zoom(...) and the bicubic
    spline blended the sentinel value into neighbouring cells.
  • Symptom: on a 1024x1024 int16 raster with a 256x256 nodata corner
    and nodata=-9999, the level-1 overview boundary at row 128 read
    [1082, 1082, 1085, 1134, 5, 93, 100, 100] against an actual data
    value of 100, with values as low as -11104 near the border. The
    read-side int-to-NaN mask only matches exact sentinel hits, so the
    poisoned values survived as legitimate measurements.
  • Same root cause as #1623 (COG cubic overview poisoned by nodata
    sentinel, i.e. float cubic + nodata), but for the integer dtype
    branch. Both CPU and GPU writers were affected because
    _block_reduce_2d_gpu's cubic path falls back to the CPU helper.
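The leak can be reproduced without the writer. This sketch assumes only numpy and scipy; the 16x16 tile and the `poisoned` check are illustrative and not taken from the PR's tests. It runs the pre-fix path (an unmasked `scipy.ndimage.zoom(..., 0.5, order=3)` on an int16 array) and then applies an exact-match sentinel mask, as the read side does.

```python
import numpy as np
from scipy import ndimage

# Illustrative 16x16 int16 tile: valid value 100, top-left 8x8 corner is nodata.
NODATA = -9999
tile = np.full((16, 16), 100, dtype=np.int16)
tile[:8, :8] = NODATA

# Pre-fix behaviour: bicubic zoom straight on the integer array, no masking.
# zoom interpolates in floating point, then writes back into an int16 output.
lvl1 = ndimage.zoom(tile, 0.5, order=3)

# A read-side mask that only catches exact sentinel hits...
exact_nodata = lvl1 == NODATA
# ...leaves every blended cell looking like a legitimate measurement.
poisoned = (~exact_nodata) & (lvl1 != 100)
print(lvl1.dtype, int(poisoned.sum()), "poisoned cells")
```

Any nonzero count is a cell that is neither the data value nor the sentinel, so an `== nodata` mask cannot flag it.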

Fix

The cubic branch now mirrors the float branch for integer inputs:

  1. Gate on the sentinel being finite (np.isfinite(nodata)), integer-valued,
    and in range for the source dtype (matches _int_nodata_in_range in _reader.py).
  2. Promote the cropped block to float64 so NaN can survive the spline.
  3. Mask the sentinel to NaN.
  4. Run scipy.ndimage.zoom(... prefilter=False) so a single NaN does
    not poison the entire row/column.
  5. Rewrite NaN back to the sentinel.
  6. np.round(...).astype(source_int_dtype) so the integer cast is
    well-defined, matching the existing integer-mean/min/max/median tail.
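A minimal sketch of the six steps, assuming a standalone helper that halves a 2-D integer block. `cubic_half_int_masked` is a hypothetical name, not the function in `_writer.py`, and `mode="nearest"` is this sketch's own choice (it keeps a constant region constant at the array edge); the real writer may pass different zoom arguments.

```python
import numpy as np
from scipy import ndimage


def cubic_half_int_masked(block, nodata):
    """Hypothetical sketch of the integer cubic overview path.

    block: 2-D integer array. nodata: sentinel value or None.
    Returns the half-resolution overview in block.dtype.
    """
    src_dtype = block.dtype
    info = np.iinfo(src_dtype)
    # Step 1: only mask a finite, integer-valued, in-range sentinel.
    use_mask = (
        nodata is not None
        and np.isfinite(nodata)
        and float(nodata).is_integer()
        and info.min <= nodata <= info.max
    )
    if not use_mask:
        # No usable sentinel: plain cubic zoom, as before the fix.
        return ndimage.zoom(block, 0.5, order=3)

    # Step 2: promote to float64 so NaN can survive the spline.
    work = block.astype(np.float64)
    # Step 3: mask the sentinel to NaN.
    work[block == nodata] = np.nan
    # Step 4: skip the recursive prefilter so NaN only reaches outputs whose
    # local spline window touches it. mode="nearest" is an assumption here.
    out = ndimage.zoom(work, 0.5, order=3, prefilter=False, mode="nearest")
    # Step 5: rewrite NaN back to the sentinel.
    out[np.isnan(out)] = nodata
    # Step 6: round before casting so the integer conversion is well-defined.
    return np.round(out).astype(src_dtype)


tile = np.full((16, 16), 100, dtype=np.int16)
tile[:8, :8] = -9999
print(np.unique(cubic_half_int_masked(tile, -9999)))
```

With prefilter=False the spline treats the input samples as B-spline coefficients, so the result is slightly smoother than true interpolation. The trade-off is that a NaN stays local instead of being smeared along whole rows and columns by the prefilter.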

Test plan

  • 12 regression tests in
    xrspatial/geotiff/tests/test_cog_cubic_int_overview_nodata_1975.py:
    • Helper-level cubic per int dtype (int16, uint16, int32) with a finite
      sentinel produces only 100 or the sentinel, with no ringing.
    • No-nodata regression: cubic on int still runs the plain zoom path.
    • Out-of-range sentinel (e.g. uint16 + nodata=-1) is a no-op.
    • Fractional sentinel on integer dtype is a no-op.
    • All-sentinel block rounds back to the sentinel.
    • Float cubic regression guard (the #1623 float cubic + nodata contract is preserved).
    • End-to-end 1024x1024 to_geotiff / open_geotiff round-trip:
      lvl1 finite values are exactly 100.
    • Non-constant int + cubic + no nodata: dtype preserved, shape halved.
    • Cubic-vs-mean parity: identical nodata mask, identical finite cells.
    • GPU/CPU byte parity: GPU cubic falls back to the same CPU helper.
  • Full geotiff suite re-run: 3186 passed, 11 skipped. Two
    pre-existing failures (test_predictor2_big_endian_gpu_1517,
    test_size_param_validation_gpu_vrt_1776) are unrelated and
    already documented in the sweep state CSV.


_block_reduce_2d's cubic branch in xrspatial/geotiff/_writer.py gated
the sentinel-to-NaN mask on arr2d.dtype.kind=='f'. For integer rasters
the function fell through to an unmasked scipy.ndimage.zoom(arr2d, 0.5,
order=3), and the bicubic spline blended the sentinel value (e.g.
-9999) into neighbouring valid cells. Cast back to the source integer
dtype, boundary pixels surfaced as silent garbage; on a 1024x1024 int16
test the boundary at lvl1 row 128 read [1082, 1082, 1085, 1134, 5, 93,
100, 100] against an actual data value of 100. The read-side int-to-NaN
mask only catches exact sentinel hits, so the poisoned values survive
the round-trip as legitimate measurements.

Same root cause as #1623 (float cubic + nodata) but for the integer
dtype branch. Both CPU and GPU writers are affected because
_block_reduce_2d_gpu's cubic path falls back to _block_reduce_2d on CPU.

Fix mirrors the float branch: promote the cropped block to float64 so
NaN can carry through the spline, mask the sentinel to NaN via the
integer-range guard (matches _int_nodata_in_range in _reader.py), run
zoom(... prefilter=False) so a single NaN does not poison the entire
row/column, rewrite NaN back to the sentinel, then np.round(...).astype
to the source integer dtype so the cast is well-defined.

12 regression tests in test_cog_cubic_int_overview_nodata_1975.py:
helper-level cubic per int dtype (int16, uint16, int32), no-nodata
regression, out-of-range sentinel no-op, fractional sentinel no-op,
all-sentinel block fallback, float cubic regression guard, end-to-end
1024x1024 round-trip, non-constant int regression, cubic-vs-mean
sentinel-mask parity, and GPU/CPU byte parity.

Closes #1975.
github-actions (bot) added the performance label (PR touches performance-sensitive code) on May 15, 2026
brendancol requested a review from Copilot on May 15, 2026 at 20:27

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

The integer sentinel-to-NaN gate (np.isfinite + is_integer + dtype-range
check) now appears three times in _block_reduce_2d after the cubic-int
fix: the new cubic branch, the mean/median int mask, and the
post-reduction NaN rewrite. Extract _resolve_int_nodata(dtype, nodata)
so the three sites stay in sync if the contract changes again.

Also extend the _block_reduce_2d docstring's cubic paragraph to
document the integer branch (float64 promotion + np.round-then-cast)
alongside the existing #1623 float-branch description.

No behaviour change: the 12 regression tests in
test_cog_cubic_int_overview_nodata_1975.py still pass, and the wider
overview/nodata test set passes (154 tests).
brendancol merged commit 279c402 into main on May 16, 2026
4 checks passed

Labels

performance (PR touches performance-sensitive code)

Development

Successfully merging this pull request may close these issues.

geotiff: cubic overview on integer COG poisons pixels near nodata border

2 participants