Skip to content

Conversation

@ilan-gold
Copy link
Collaborator

@ilan-gold ilan-gold commented Nov 3, 2025

Calling to_native_dtype + __str__ came up as one of the only python-CPU-bound things when doing some benchmarking. My use-case is quite contrived (generating thousands of WithSubset objects) but I think it's probably worth investigating getting rid of these calls. Some observations:

1. I wonder if all getting the dtype and fill_val be wrapped up in just relying on https://docs.rs/zarrs/latest/zarrs/array/struct.Array.html#method.open and then using the values directly (there are probably other benefits of doing this) but I think this is a separate PR So it turns out the array actually doesn't have to be created yet when the pipeline is generated. So I ended up doing something similar with the metadata (making it an actual Struct and then working with that to get fill and dtype).
2. Regardless, most of this refactor is around removing Basic anyway so that chunk handling is independent of the ability. I noticed that ChunkRepresentation requires ownership over its arguments which means we copy per-chunk. Not sure what would go into making that a reference, but it's no worse than the previous situation where I think we were generating copies repeatedly, but from PyO3 calling python --> Hooray! No longer!

The benefit wasn't crazy ~5% but I think going in this direction is good (see point 1)

@ilan-gold ilan-gold marked this pull request as draft November 3, 2025 14:49
@LDeakin LDeakin mentioned this pull request Jan 1, 2026
@ilan-gold ilan-gold changed the base branch from main to ld/zarrs_0.23.0 January 3, 2026 19:08
@ilan-gold ilan-gold changed the title (feat): remove dtype + fill val handling per chunk perf: remove dtype + fill val handling per chunk Jan 5, 2026
@ilan-gold ilan-gold marked this pull request as ready for review January 5, 2026 11:28
@ilan-gold ilan-gold force-pushed the ig/refactor_chunk_handling branch from c7fe896 to 4eca1b8 Compare February 1, 2026 16:53
Base automatically changed from ld/zarrs_0.23.0 to main February 1, 2026 22:12
LDeakin and others added 3 commits February 2, 2026 15:09
`UserWarning: Array is unsupported by ZarrsCodecPipeline: incompatible fill value metadata: dtype=|V7, fill_value=null`
@ilan-gold ilan-gold requested a review from LDeakin February 2, 2026 10:30
@ilan-gold
Copy link
Collaborator Author

ilan-gold commented Feb 2, 2026

@LDeakin Up to you if we wait for 0.23.1. Thanks for the help here!

@ilan-gold
Copy link
Collaborator Author

Locally I'm getting "DataTypeMetadataV2 doesn't implement std::fmt::Display" which seems incorrect based on https://github.com/zarrs/zarrs/blob/0b36a73f861bc8189ba4cde315630d828db3de80/zarrs_metadata/src/v2/array.rs#L190. Sorry about that

@LDeakin
Copy link
Member

LDeakin commented Feb 2, 2026

Locally I'm getting "DataTypeMetadataV2 doesn't implement std::fmt::Display" which seems incorrect based on https://github.com/zarrs/zarrs/blob/0b36a73f861bc8189ba4cde315630d828db3de80/zarrs_metadata/src/v2/array.rs#L190. Sorry about that

Just run a cargo update and you'll get the latest release of zarrs_metadata

Copy link
Member

@LDeakin LDeakin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one Ilan! I might rerun some perf benchmarks with this after we release

@ilan-gold
Copy link
Collaborator Author

Just run a cargo update and you'll get the latest release of zarrs_metadata

Tried that out but oddly enough didn't work. I'm cycling through the other options available but at least I see this is a local-only problem now, so I'll handle it.

@ilan-gold ilan-gold merged commit 48ffaa4 into main Feb 2, 2026
17 checks passed
@ilan-gold ilan-gold deleted the ig/refactor_chunk_handling branch February 2, 2026 12:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants