Skip to content

geotiff: parallelise tile decode in _fetch_decode_cog_http_tiles (#1980)#1981

Merged
brendancol merged 2 commits into
xarray-contrib:mainfrom
brendancol:deep-sweep-performance-geotiff-2026-05-15-1778875947
May 16, 2026
Merged

geotiff: parallelise tile decode in _fetch_decode_cog_http_tiles (#1980)#1981
brendancol merged 2 commits into
xarray-contrib:mainfrom
brendancol:deep-sweep-performance-geotiff-2026-05-15-1778875947

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

  • The HTTP COG reader fetched tiles concurrently (issues Fetch COG tiles concurrently in HTTP path to mask RTT #1480, Fetch COG tiles concurrently in HTTP path to mask RTT #1487) but decoded them sequentially in a Python for loop. The sibling local-file path in the same module (_read_tiles) already parallelises decode whenever tile_pixels >= 64 * 1024 and n_tiles > 1; the HTTP path was structurally similar but never adopted the same gate.
  • Mirror the local-path threshold and ThreadPoolExecutor in _fetch_decode_cog_http_tiles. Codec extensions (deflate, zstd, LZW) release the GIL inside their C implementations, so the pool overlaps decode work across cores. The placement loop that writes pixels into result stays serial to avoid contended writes.
  • 5 new tests in test_cog_http_parallel_decode_2026_05_15.py: parallel + serial round-trip correctness, structural pool-instantiation check above the threshold, single-tile path skips the decode pool, _decode_strip_or_tile call count == n_tiles.

Pass 10 of /sweep-performance on the geotiff module. Closes #1980.

Test plan

  • New tests pass: python -m pytest xrspatial/geotiff/tests/test_cog_http_parallel_decode_2026_05_15.py.
  • Existing COG/HTTP suite passes: python -m pytest xrspatial/geotiff/tests/ -k 'cog or http' -> 262 passed.
  • Full geotiff suite passes apart from two pre-existing failures (test_predictor2_big_endian_gpu_1517, test_size_param_validation_gpu_vrt_1776::test_tile_size_positive_works) flagged in prior sweep notes.

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 15, 2026
The HTTP COG read path fetches tiles concurrently via
``read_ranges_coalesced`` (xarray-contrib#1480, xarray-contrib#1487) and then decoded them
sequentially in a Python for-loop. Local ``_read_tiles`` already
parallelises decode via ``ThreadPoolExecutor`` whenever
``tile_pixels >= 64 * 1024`` and ``n_tiles > 1`` (see _reader.py:2017);
the HTTP path was structurally similar but never adopted the same
gate, so wide windowed COG reads with many tiles left deflate/zstd
decoding single-threaded after a parallel fetch.

Mirror the local-file path's threshold and pool here. Codec
extensions release the GIL inside their C implementations, so a
``ThreadPoolExecutor`` overlaps decode work across cores. The
placement loop that writes pixels into ``result`` stays serial to
avoid contended writes.

Pass 10 of the geotiff performance sweep.

Tests in test_cog_http_parallel_decode_2026_05_15.py:
- parallel-branch end-to-end correctness (deflate, tile_size=256)
- serial-branch end-to-end correctness (tile_size=128, single tile)
- ThreadPoolExecutor instantiation above the threshold
- single-tile path skips the decode pool
- _decode_strip_or_tile invoked exactly once per placement
@brendancol brendancol force-pushed the deep-sweep-performance-geotiff-2026-05-15-1778875947 branch from 6850a61 to 59ca288 Compare May 16, 2026 01:05
…e paths

Review nits on the parallel-HTTP-decode change:

- Extract the 64K-pixel cutoff into a module-level
  _PARALLEL_DECODE_PIXEL_THRESHOLD constant. The same magic number lived
  in _read_tiles and _fetch_decode_cog_http_tiles; the only docstring
  describing why it was chosen sat next to the local path.

- Lift _decode_one out of the parallel branch in
  _fetch_decode_cog_http_tiles so the serial fallback calls the same
  helper instead of inlining a duplicate _decode_strip_or_tile call.
  Mirrors how _read_tiles defines _decode_one once and reuses it for
  both paths.

- Drop the ThreadPoolExecutor-as-_Pool alias for parity with the local
  path, which imports ThreadPoolExecutor by its real name.

No behaviour change. Existing 268 COG/HTTP and 734 tile/reader tests
still pass; the 6 pre-existing failures (predictor2 big-endian GPU,
tile_size=4 validator) predate this PR per the prior sweep notes.
@brendancol
Copy link
Copy Markdown
Contributor Author

PR Review: geotiff: parallelise tile decode in _fetch_decode_cog_http_tiles

Suggestions (should fix, not blocking)

  • _reader.py:2747: the 64K-pixel threshold now appears in two places, _read_tiles (line 2017 pre-PR) and _fetch_decode_cog_http_tiles. Pull it up to a module-level constant so the comment about why the bound is inclusive lives in one spot.
  • _reader.py:2755: _decode_one is only defined inside the if decode_in_parallel: branch, and the else branch inlines the same _decode_strip_or_tile call. _read_tiles defines _decode_one once above the branch and reuses it for both paths.

Nits

  • _reader.py:2752: from concurrent.futures import ThreadPoolExecutor as _Pool. _read_tiles imports it under its real name; small consistency thing.

What looks good

  • The gate matches _read_tiles exactly (n_tiles > 1 and tile_pixels >= 64*1024), so HTTP and local paths agree on when to spin up a pool.
  • Pool is bounded by min(n_decode_tiles, cpu_count()) and cleaned up via with, so no leaked threads on decode error.
  • Placement loop stays serial, so there's no race on result.
  • Tests cover the parallel branch, the serial branch, pool instantiation count, and a monkey-patched call-count check on _decode_strip_or_tile that would catch a dropped or duplicated tile.
  • Test fixtures use a real range-aware HTTP server rather than mocks.

Checklist

  • No algorithmic change; output bytes match the serial path.
  • Backend parity not applicable (HTTP COG path is numpy-only).
  • NaN handling not applicable (decode is byte-exact).
  • Edge cases covered (single-tile path skipped, threshold boundary tested).
  • Dask chunk boundaries not exercised by this PR.
  • No new materialisation or copies.
  • No benchmark added; sweep notes cover the perf budget.
  • README feature matrix N/A.
  • Docstrings present.

I pushed a follow-up commit (164c524) applying the two suggestions and the nit.

@brendancol brendancol merged commit e5da0ef into xarray-contrib:main May 16, 2026
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf(geotiff): parallelise tile decode in _fetch_decode_cog_http_tiles

1 participant