Skip to content

geotiff: refuse non-numeric nodata / _FillValue early (#1973)#1983

Merged
brendancol merged 2 commits into
mainfrom
issue-1973
May 16, 2026
Merged

geotiff: refuse non-numeric nodata / _FillValue early (#1973)#1983
brendancol merged 2 commits into
mainfrom
issue-1973

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #1973.

Summary

  • _validate_nodata_arg rejects non-numeric nodata= kwargs at the writer entry point. to_geotiff, _write_vrt_tiled, and write_geotiff_gpu all call it before any downstream np.isnan usage.
  • _resolve_nodata_attr now raises ValueError naming the offending attribute when attrs['nodata'] or attrs['_FillValue'] is non-numeric, instead of returning the value verbatim and letting NumPy crash later with a generic ufunc TypeError.
  • attrs['nodatavals'] (rioxarray's per-band tuple) keeps its existing skip-on-non-numeric behaviour, since those values often come from arbitrary upstream pipelines and a single bad entry should not block writing.

Test plan

  • New xrspatial/geotiff/tests/test_nodata_validation_1973.py covers _validate_nodata_arg, _resolve_nodata_attr (both attrs), the writer kwarg path, the _FillValue attr path, and the .vrt dispatch path.
  • Full pytest xrspatial/geotiff/tests/ — same 8 pre-existing failures as main, none related.

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 15, 2026
attrs['_FillValue']='missing' (or attrs['nodata']='missing', or a
non-numeric nodata= kwarg) used to crash deep inside the writer with
``ufunc 'isnan' not supported``. The TypeError did not name the
offending attribute and gave no hint that nodata was the cause.

Add _validate_nodata_arg in _validation.py and call it from
to_geotiff, _write_vrt_tiled, and write_geotiff_gpu so the kwarg
path is checked at the boundary. Tighten _resolve_nodata_attr to
raise ValueError naming the bad attr when attrs['nodata'] or
attrs['_FillValue'] is non-numeric. The rioxarray-style
attrs['nodatavals'] branch keeps its existing skip-on-non-numeric
behaviour (per-band tuples often carry placeholder entries from
arbitrary upstream pipelines, so a single bad entry should not
block writing).
@brendancol
Copy link
Copy Markdown
Contributor Author

PR Review (self-review)

Suggestions

  • Error-type drift versus the pre-existing bool check. _writers/eager.py:261 already rejects nodata=True with TypeError, and _geotags.py:1165 does the same. The new _validate_nodata_arg raises ValueError. In practice the bool check fires first, so nodata=True still raises TypeError from the existing site — but a future caller (write_geotiff_gpu, _write_vrt_tiled) hits my validator before the bool check exists in their path, and a bool would slip through (float(True) == 1.0). Worth adding a isinstance(nodata, bool) branch to _validate_nodata_arg so all three writer entry points reject bool the same way (consistent with _validate_crs_arg in geotiff: reject bool and unresolvable EPSG in writer crs= (#1971) #1978).
  • attrs['nodatavals'] = ('foo', 'bar') (all non-numeric) silently resolves to nodata=None — the per-band skip loop never produces a numeric and the function returns None from the final fall-through. That matches the docstring ("skips non-numeric entries"), but a tuple where every entry is non-numeric is probably user error rather than a legitimate "no sentinel" case. Worth flagging when at least one non-numeric entry was seen and none were usable. Not blocking.

Nits

  • The error messages on the two _resolve_nodata_attr branches (attrs['nodata'] and attrs['_FillValue']) are near-duplicates. Could factor into a helper or share a string. Minor.
  • _FillValue='NaN' (the string) is accepted because float('NaN') succeeds, then treated as NaN and returns None. That's the documented behaviour for NaN values, just worth being aware that a string 'NaN' is now indistinguishable from a real NaN. Probably fine.

What looks good

  • Validator runs at the boundary (to_geotiff, _write_vrt_tiled, write_geotiff_gpu), so the kwarg path is closed at each writer.
  • _resolve_nodata_attr raises with the attr name in the message (attrs['_FillValue']=...), so the caller can tell which attr is wrong.
  • attrs['nodatavals'] keeps its per-entry skip behaviour — correct, since rioxarray's per-band tuples often carry partial sentinels.
  • Tests cover the validator, both attr branches, the kwarg path, and the .vrt dispatch.

Checklist

  • Algorithm matches reference/paper — not applicable
  • All implemented backends produce consistent results — validator runs in eager / GPU / VRT writers
  • NaN handling — _FillValue=NaN correctly returns None
  • Edge cases covered by tests — None, numeric, NaN-string, non-numeric, sequence
  • Dask chunk boundaries — not applicable
  • No premature materialisation
  • No new benchmark needed
  • Docstrings present and accurate
  • nodata=True reaches the new validator unguarded in the GPU and VRT writers — see suggestion

…numeric nodatavals

- add an isinstance(nodata, bool) guard to _validate_nodata_arg so the
  GPU and VRT writers refuse nodata=True with the same TypeError the
  eager writer already raises for #1911; previously float(True) == 1.0
  silently coerced the bool past the numeric branch on those paths
- warn (UserWarning) from _resolve_nodata_attr when every
  attrs['nodatavals'] entry is non-numeric, since a tuple with zero
  usable sentinels is more likely a user error than an intentional
  no-sentinel signal; return value stays None to honour the function's
  skip-on-non-numeric contract
- factor the duplicate non-numeric error string used by the
  attrs['nodata'] and attrs['_FillValue'] branches into a shared
  _nodata_attr_non_numeric_msg helper
- add regression tests for the bool branch (validator + eager / VRT /
  GPU entry points), the all-non-numeric warning, and the no-warn
  paths (usable entry present, all-NaN tuple)
@brendancol brendancol merged commit 1ada658 into main May 16, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

geotiff: non-numeric _FillValue crashes with opaque TypeError from np.isnan

1 participant