geotiff: cubic overview on integer COG masks sentinel before zoom (#1975) #1979
Merged
brendancol merged 3 commits on May 16, 2026
_block_reduce_2d's cubic branch in xrspatial/geotiff/_writer.py gated the sentinel-to-NaN mask on arr2d.dtype.kind == 'f'. For integer rasters the function fell through to an unmasked scipy.ndimage.zoom(arr2d, 0.5, order=3), and the bicubic spline blended the sentinel value (e.g. -9999) into neighbouring valid cells. Once the result was cast back to the source integer dtype, boundary pixels surfaced as silent garbage: on a 1024x1024 int16 test, the boundary at lvl1 row 128 read [1082, 1082, 1085, 1134, 5, 93, 100, 100] against an actual data value of 100. The read-side int-to-NaN mask only catches exact sentinel hits, so the poisoned values survive the round-trip as legitimate measurements.

Same root cause as #1623 (float cubic + nodata), but for the integer dtype branch. Both the CPU and GPU writers are affected because _block_reduce_2d_gpu's cubic path falls back to _block_reduce_2d on CPU.

The fix mirrors the float branch: promote the cropped block to float64 so NaN can carry through the spline, mask the sentinel to NaN via the integer-range guard (matches _int_nodata_in_range in _reader.py), run zoom(... prefilter=False) so a single NaN does not poison the entire row/column, rewrite NaN back to the sentinel, then np.round(...).astype to the source integer dtype so the cast is well-defined.

12 regression tests in test_cog_cubic_int_overview_nodata_1975.py: helper-level cubic per int dtype (int16, uint16, int32), no-nodata regression, out-of-range sentinel no-op, fractional sentinel no-op, all-sentinel block fallback, float cubic regression guard, end-to-end 1024x1024 round-trip, non-constant int regression, cubic-vs-mean sentinel-mask parity, and GPU/CPU byte parity.

Closes #1975.
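The failure mode described in this commit message can be reproduced outside xrspatial with a few lines of NumPy and SciPy. The snippet below is a minimal sketch, not the project's test: the 64x64 raster, its half-nodata layout, and the variable names are assumptions chosen for illustration. Only the zoom(arr2d, 0.5, order=3) call is taken from the description.

```python
import numpy as np
from scipy.ndimage import zoom

NODATA = -9999  # sentinel value used as the example in the commit message

# Synthetic int16 raster (assumed layout): top half nodata, bottom half 100.
arr2d = np.full((64, 64), 100, dtype=np.int16)
arr2d[:32, :] = NODATA

# Pre-fix behaviour: the integer array goes straight into the cubic spline,
# so the sentinel is treated as an ordinary measurement.
overview = zoom(arr2d, 0.5, order=3)

# A reader that only masks exact sentinel hits keeps every blended value.
survivors = overview[overview != NODATA]
polluted = np.unique(survivors[survivors != 100])
print("values that are neither 100 nor nodata:", polluted)
```

Any value printed here is an artefact of the spline mixing -9999 with 100. Because none of them equals the sentinel exactly, an exact-match nodata mask on read would pass them through as data.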
The integer sentinel-to-NaN gate (np.isfinite + is_integer + dtype-range check) now appears three times in _block_reduce_2d after the cubic-int fix: the new cubic branch, the mean/median int mask, and the post-reduction NaN rewrite. Extract _resolve_int_nodata(dtype, nodata) so the three sites stay in sync if the contract changes again.

Also extend the cubic paragraph of the _block_reduce_2d docstring to document the integer branch (float64 promotion + np.round-then-cast) alongside the existing #1623 float-branch description.

No behaviour change; the 12 regression tests in test_cog_cubic_int_overview_nodata_1975.py still pass, and the wider overview/nodata test set reports 154 passed.
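The three-part gate named in this commit (finite, integer-valued, within the dtype's range) could look roughly like the sketch below. The function name echoes _resolve_int_nodata from the commit message, but the body, the return convention (the sentinel as a Python int, or None when masking should be skipped), and the handling of non-integer dtypes are assumptions, not the code in _writer.py.

```python
import numpy as np


def resolve_int_nodata(dtype, nodata):
    """Hypothetical sketch: return nodata as an int if it can occur in an
    integer array of `dtype`, otherwise None (meaning: skip the mask)."""
    dtype = np.dtype(dtype)
    if nodata is None or dtype.kind not in "iu":
        return None
    value = float(nodata)
    # NaN/inf sentinels and fractional sentinels can never match an integer cell.
    if not np.isfinite(value) or not value.is_integer():
        return None
    # A sentinel outside the dtype's range (e.g. -1 for uint16) is a no-op too.
    info = np.iinfo(dtype)
    if not (info.min <= value <= info.max):
        return None
    return int(value)


print(resolve_int_nodata(np.int16, -9999))   # in range: usable sentinel
print(resolve_int_nodata(np.uint16, -1))     # out of range: None
print(resolve_int_nodata(np.int16, 0.5))     # fractional: None
```

Returning None for every "cannot match" case lets each call site use a single `if sentinel is not None:` check, which is one way to keep the three sites consistent.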
Summary
- _block_reduce_2d's cubic branch in xrspatial/geotiff/_writer.py gated sentinel masking on arr2d.dtype.kind == 'f', so integer rasters with a finite nodata sentinel hit an unmasked scipy.ndimage.zoom(...) and the bicubic spline blended the sentinel value into neighbouring cells.
- On a 1024x1024 int16 raster with nodata=-9999, the level-1 overview boundary at row 128 read [1082, 1082, 1085, 1134, 5, 93, 100, 100] against an actual data value of 100, with values as low as -11104 near the border. The read-side int-to-NaN mask only matches exact sentinel hits, so the poisoned values survived as legitimate measurements.
- Same root cause as #1623 (float cubic + nodata), but for the integer dtype branch. Both CPU and GPU writers were affected because _block_reduce_2d_gpu's cubic path falls back to the CPU helper.
Fix
The cubic branch now mirrors the float branch for integer inputs:
- Mask only when the sentinel is np.isfinite(nodata), integer-valued, and in-range for the source dtype (matches _int_nodata_in_range in _reader.py).
- Promote the cropped block to float64 so NaN can survive the spline, and mask the sentinel to NaN.
- Run scipy.ndimage.zoom(... prefilter=False) so a single NaN does not poison the entire row/column.
- Rewrite NaN back to the sentinel, then np.round(...).astype(source_int_dtype) so the integer cast is well-defined, matching the existing integer-mean/min/max/median tail.
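The sequence above can be sketched as a standalone function. This is an illustration under stated assumptions, not the code in _writer.py: cubic_halve_int is a hypothetical name, the sentinel is assumed to have already passed the finite / integer-valued / in-range gate, and mode='nearest' is this sketch's own choice for edge handling (the PR text elides the remaining zoom arguments).

```python
import numpy as np
from scipy.ndimage import zoom


def cubic_halve_int(arr2d, nodata):
    """Hypothetical sketch: halve an integer block with a cubic spline while
    keeping the nodata sentinel out of the interpolation."""
    src_dtype = arr2d.dtype
    # Promote to float64 so the sentinel can be represented as NaN.
    work = arr2d.astype(np.float64)
    work[arr2d == nodata] = np.nan
    # prefilter=False skips the recursive spline prefilter, so a NaN only
    # affects outputs whose local 4x4 support touches it.
    # mode='nearest' is an assumption of this sketch, not taken from the PR.
    out = zoom(work, 0.5, order=3, prefilter=False, mode="nearest")
    # NaN goes back to the sentinel; everything else is rounded before the
    # integer cast so the cast never sees NaN or a fractional value.
    out = np.where(np.isnan(out), nodata, np.round(out))
    return out.astype(src_dtype)


# Same synthetic layout as the bug report: half nodata, half constant 100.
block = np.full((64, 64), 100, dtype=np.int16)
block[:32, :] = -9999
overview = cubic_halve_int(block, -9999)
print(np.unique(overview))
```

In this sketch an output pixel becomes nodata whenever any input cell in its spline support is nodata, so the mask can grow by roughly one overview pixel at a boundary. That trade keeps blended values out of the result entirely; whether the real helper behaves identically at block edges is not something the PR text confirms.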
Test plan
- New tests in xrspatial/geotiff/tests/test_cog_cubic_int_overview_nodata_1975.py:
  - Helper-level cubic on an integer block with a sentinel produces only 100 or the sentinel; no ringing.
  - Out-of-range sentinel (uint16 + nodata=-1) is a no-op.
  - End-to-end to_geotiff / open_geotiff round-trip: lvl1 finite values are exactly 100.
- Test run: 3186 passed, 11 skipped. Two pre-existing failures (test_predictor2_big_endian_gpu_1517, test_size_param_validation_gpu_vrt_1776) are unrelated and already documented in the sweep state CSV.