-
Notifications
You must be signed in to change notification settings - Fork 139
Add support for opening structured dtypes as void for zarr driver
#272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for opening structured dtypes as void for zarr driver
#272
Conversation
|
Ok, I imported this locally and am looking at a few things.
|
…ussion_r2663351949` and extend test coverage
…ussion_r2663353574` and extend test coverage
My bad, I don't know how I missed the compilation issue for the spec test. I've fixed the issue and resolved your comments. Thanks for taking a look and getting back to this so fast! |
…ges/BASE..a42b6f511375dc1bd402ad525519f57210546735#r2666111665` Enforce schema validation for one-of `field` and `open_as_void`
…ussion_r2666195572` Use synthesized open_as_void field
|
Sorry for the delay responding to the latest round of feedback. I believe everything raised so far has been addressed. |
tensorstore/driver/zarr/driver.cc
Outdated
| // since we're treating the data as raw bytes regardless of the actual dtype. | ||
| // Shape is allowed to differ (handled by base class for resizing). | ||
| // Other fields like compressor, order, chunks must still match. | ||
| if (existing_metadata.dtype.bytes_per_outer_element != |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could just rely on the normal validate but applied to the void metadata.
…684331789` and `https://github.com/google/tensorstore/pull/272#discussion_r2684406823` Guard cache with `absl::call_once` for thread safety
…683758437` Document and enforce `selected_field` and `open_as_void` exclusivity
…684312496` Validate schema against `open_as_void` metadata but `partial_metadata` against regular metadata
…684307471` Allow original metadata to cache pointer on first access
…684302699` Rely on normal metadata validation for void metadata
|
I think that this is getting pretty close. I have a few minor edits in my import which I can just add in. Getting jeremy to look over it. Thanks. Edits:
|
Matches the pattern from zarr v2 driver (PR google#272). When both "field" and "open_as_void" are specified in the spec, return an error since these options are mutually exclusive - field selects a specific field from a structured array, while open_as_void provides raw byte access to the entire structure.
The zarr3 URL syntax cannot represent field selection or void access mode. Following the pattern from zarr v2 driver (PR google#272), ToUrl() now returns an error when either of these options is specified instead of silently ignoring them.
…trip Following the pattern from zarr v2 driver (PR google#272), override GetBoundSpecData in ZarrDataCache to set spec.open_as_void from ChunkCacheImpl::open_as_void_. This ensures that when you open a store with open_as_void=true and then call spec(), the resulting spec correctly has open_as_void=true set. Without this fix, opening a store with open_as_void=true and then getting its spec would lose the open_as_void flag, causing incorrect behavior if the spec is used to re-open the store.
Add comprehensive tests for open_as_void functionality following the patterns from zarr v2 driver (PR google#272): Tests that PASS: - OpenAsVoidSimpleType: Verifies simple type arrays can be opened with open_as_void, gaining an extra dimension for bytes - OpenAsVoidSpecRoundtrip: Verifies open_as_void preserved in spec JSON - OpenAsVoidGetBoundSpecData: Verifies spec() on void store returns open_as_void=true (tests the GetBoundSpecData fix) - OpenAsVoidCannotUseWithField: Verifies mutual exclusivity validation - OpenAsVoidUrlNotSupported: Verifies ToUrl() rejects open_as_void - FieldSelectionUrlNotSupported: Verifies ToUrl() rejects selected_field Tests marked TODO (pending codec chain implementation): - OpenAsVoidStructuredType - OpenAsVoidWithCompression - OpenAsVoidReadWrite - OpenAsVoidWriteRoundtrip Also fixes BUILD file: adds :metadata dependency to :chunk_cache target to provide the dtype.h header that chunk_cache.h includes.
The codec chain is prepared for the original dtype and chunk shape (without the extra bytes dimension). For void access: DecodeChunk: - Strip the bytes dimension from grid's chunk_shape to get original shape - Decode using the original codec shape - Reinterpret the decoded bytes as [chunk_shape..., bytes_per_elem] EncodeChunk: - Input has shape [chunk_shape..., bytes_per_elem] of byte_t - Create a view with the original chunk shape and element_size - Encode using the original codec This follows the pattern from zarr v2 (PR google#272) where the void metadata has the chunk_layout computed to match encoded/decoded layouts.
For void access, the codec handling differs between: - Non-structured types: codec prepared for [chunk_shape] with original dtype Need to decode/encode then reinterpret bytes. - Structured types: codec already prepared for [chunk_shape, bytes_per_elem] with byte dtype. Just decode/encode directly. Add original_is_structured parameter to cache constructors to properly distinguish these cases in DecodeChunk and EncodeChunk. This follows the pattern from zarr v2 (PR google#272) where CreateVoidMetadata() creates a modified metadata for void access.
Apply changes based on feedback from google#272
Supersedes #264
Removes
bool open_as_void = falsedefault in C++ function declarations and definitions.Uses a derived DataCache implementation similar to strategy used in
neuroglancer_precomputeddriver forUnshardedDataCacheandShardedDataCache