TIFF to GPU memory via cog3pio backend entrypoint#81

Draft
weiji14 wants to merge 1 commit into main from cog3pio-backend

Conversation

@weiji14
Member

@weiji14 weiji14 commented Mar 5, 2026

Read TIFF data into GPU memory inside an xarray data structure via cog3pio's experimental CudaCogReader struct that uses nvTIFF as its backend. Based on a proof-of-concept I got working at weiji14/cog3pio#71. Would like it to live in cupy-xarray instead 😃

cog3pio needs to be installed with the 'cuda' feature flag enabled, like so, to get the CudaCogReader:

MATURIN_PEP517_ARGS="--features cuda,pyo3" pip install  -v "cog3pio[cuda] @ git+https://github.com/weiji14/cog3pio.git@178a3ffb8163c97f7af9e71bc68b6545a4e8e192"

Notes:

  • No direct-to-GPU transfer is happening. As far as I can tell, compressed data still goes through CPU RAM and is then sent to the GPU to be decompressed with nvCOMP, all of which is handled by nvTIFF
  • Decoded output is a 1-D array, so some reshaping to 3-D (Channels, Height, Width) is needed. May need to double-check the dimension order for different TIFF planar configurations
  • NaN data is not handled properly yet
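The reshape-and-dim-order concern in the notes above can be sketched as follows, with NumPy standing in for CuPy (the same calls work on a CuPy array); the band/height/width sizes are illustrative:

```python
import numpy as np

bands, height, width = 3, 4, 5
flat = np.arange(bands * height * width, dtype=np.uint8)

# PlanarConfiguration=2 (separate planes): the flat buffer is band-major,
# so a plain reshape yields (Channels, Height, Width) directly.
chw = flat.reshape(bands, height, width)

# PlanarConfiguration=1 (interleaved/chunky): samples are pixel-major,
# so reshape to (Height, Width, Channels) first, then transpose to CHW.
chw_from_hwc = flat.reshape(height, width, bands).transpose(2, 0, 1)

assert chw.shape == chw_from_hwc.shape == (3, 4, 5)
```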

TODO:

  • Initial implementation
  • Add more unit tests
  • Refactor to use rasterix for coordinate labels?
  • Document installation better (might need special handling for nvTIFF)
  • Polish API documentation with proper intersphinx links
  • Create tutorial showing example usage
  • Remove circular dependency on cog3pio side requiring cupy-cuda13x

References:

Read TIFF data into xarray via cog3pio's experimental CudaCogReader struct that uses nvTIFF as its backend. Use cupy.from_dlpack to read the DLPack tensor, and reshape the 1-D array into a 3-D array (CHW form), setting the coordinates as appropriate. Added some API docs and basic unit tests.
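The DLPack import step described above can be sketched with NumPy standing in for CuPy (`cupy.from_dlpack` takes the same single argument); the flat buffer here is an illustrative stand-in for the tensor that CudaCogReader would hand over:

```python
import numpy as np

# Stand-in for the 1-D DLPack tensor produced by the decoder
producer = np.arange(3 * 2 * 4, dtype=np.float32)

# Zero-copy import via the DLPack protocol (cupy.from_dlpack is the
# CuPy equivalent used in the backend described above)
flat = np.from_dlpack(producer)

# Reshape the 1-D decoded buffer into (Channels, Height, Width)
bands, height, width = 3, 2, 4
chw = flat.reshape(bands, height, width)
assert chw.shape == (3, 2, 4)

# The DLPack exchange is zero-copy: both views share one buffer
assert np.shares_memory(flat, producer)
```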

Cherry-picked from weiji14/cog3pio#71
@weiji14 weiji14 added this to the 0.2.0 milestone Mar 5, 2026
@weiji14 weiji14 self-assigned this Mar 5, 2026
@weiji14 weiji14 added the enhancement New feature or request label Mar 5, 2026
@weiji14 weiji14 changed the title Implement cog3pio backend entrypoint to read TIFFs TIFF to GPU memory via cog3pio backend entrypoint Mar 5, 2026
>>> dataarray: xr.DataArray = xr.open_dataarray(
...     filename_or_obj="https://github.com/OSGeo/gdal/raw/v3.11.0/autotest/gcore/data/byte_zstd.tif",
...     engine="cog3pio",
...     device_id=0,  # cuda:0
... )
Collaborator


How would this be handled on a multi-GPU system? You may want to load many tif files into a dask-cupy-xarray object where different chunks are on different GPUs. This API feels a little inflexible for this use case.

Member Author


Exactly the feedback I needed! Short answer is: I'm probably gonna change the signature of this parameter to device_id: int | None = None. Where the default of None means to get the 'current device' from cp.cuda.runtime.getDevice().

Longer answer is: I'm currently using nvtiffDecoderCreateSimple() which uses the default memory allocator. The multi-gpu case would probably mean I need to use nvtiffDecoderCreate instead that allows a custom device allocator, which I presume dask will have some way of handling. I see dask's scope as more to do with parallel compute, not I/O from a file format, so would appreciate any advice here (the xarray <-> dask integration piece has always felt very CPU-centric to me 🙂)
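The proposed `device_id: int | None = None` default can be sketched with a hypothetical `resolve_device_id` helper (not part of cupy-xarray); in the real backend the fallback would be `cp.cuda.runtime.getDevice()`, injected here as a callable so the logic is testable without a GPU:

```python
from typing import Callable, Optional

def resolve_device_id(
    device_id: Optional[int] = None,
    get_current: Callable[[], int] = lambda: 0,
) -> int:
    """Resolve which CUDA device to decode onto.

    device_id=None means "use the current device"; in the actual
    backend get_current would be cupy.cuda.runtime.getDevice.
    """
    return get_current() if device_id is None else device_id

assert resolve_device_id(1) == 1   # explicit device wins
assert resolve_device_id(None) == 0   # falls back to "current device"
assert resolve_device_id(None, get_current=lambda: 2) == 2
```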

Note

Alternatively, I also considered having the parameter as just device to take in a cupy.cuda.Device object. I didn't go with this option (yet) because I'd prefer to have something more cross-framework (e.g. allow torch.cuda.device or tf.device) to get the device_id, something touched on in data-apis/array-api#972 which proposes a __dlpack_device__() protocol.
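A sketch of how a framework-agnostic device id could be read off an array, assuming the `__dlpack_device__()` protocol discussed in data-apis/array-api#972; the array class and helper below are illustrative stand-ins, not real API:

```python
# Device-type enum value from the DLPack specification: kDLCUDA == 2
KDL_CUDA = 2

class StandInCudaArray:
    """Stand-in for any array (cupy, torch, ...) exposing the
    __dlpack_device__() protocol, which returns (device_type, device_id)."""
    def __dlpack_device__(self):
        return (KDL_CUDA, 3)

def device_id_from(array) -> int:
    # Works for any framework's array without importing that framework
    device_type, device_id = array.__dlpack_device__()
    if device_type != KDL_CUDA:
        raise ValueError("expected a CUDA-backed array")
    return device_id

assert device_id_from(StandInCudaArray()) == 3
```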

Collaborator


the default of None means to get the 'current device'

This would probably be fine for a multi-GPU setup. Generally the NVIDIA_VISIBLE_DEVICES env var is set to a unique index for each worker (in Dask this is something dask_cuda.LocalCUDACluster and dask_cuda.CUDAWorker handle), so when a worker uses the "current device" it would be different for each worker.
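A toy model of the per-worker pinning described above (the env-var name follows the comment; in practice dask_cuda.LocalCUDACluster does this wiring, so nothing here is real dask_cuda API):

```python
# Each worker's environment restricts visibility to one physical GPU,
# so the "current device" differs per worker without any explicit
# device_id being passed to the reader.
worker_envs = [{"NVIDIA_VISIBLE_DEVICES": str(gpu)} for gpu in range(4)]

def visible_gpu(env: dict) -> int:
    # With visibility restricted to a single GPU, that GPU is the
    # worker's "current device"
    return int(env["NVIDIA_VISIBLE_DEVICES"])

assert [visible_gpu(env) for env in worker_envs] == [0, 1, 2, 3]
```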

I see dask's scope as more to do with parallel compute, not I/O from a file format

It's just a task scheduler with some high-level collections. It doesn't matter if the task is compute, IO or anything else (is there anything else? 😅). But overall you need to think about how the high-level collection object filters down to the lower level Dask calls.

If I have a VM with four GPUs and I call something along the lines of xr.open_mfdataset(filename_or_obj="mytiffs/*.tiff", engine="cog3pio"), you want to avoid being explicit with the device, otherwise everything will end up on one GPU, wasting the other three.

the xarray <-> dask integration piece has always felt very CPU-centric to me 🙂

It's true that dask-cuda is a separate package that adds GPU logic to Dask. But GPUs are well supported in Dask today. There may just be work to be done wiring things up to collections like xarray.
