-
Notifications
You must be signed in to change notification settings - Fork 93
Merge upgraded Type Management System #1299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
dimitri-yatsenko
wants to merge
61
commits into
datajoint:claude/spec-issue-1243-YvqmF
from
dimitri-yatsenko:spec-issue-1243-rebased
Closed
Merge upgraded Type Management System #1299
dimitri-yatsenko
wants to merge
61
commits into
datajoint:claude/spec-issue-1243-YvqmF
from
dimitri-yatsenko:spec-issue-1243-rebased
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…t' into claude/upgrade-adapted-type-1W3ap
Design document for reimplementing blob, attach, filepath, and object types as a coherent AttributeType system. Separates storage location (@store) from encoding behavior.
- Clarify OAS (object type) as distinct system - Propose storing blob@store/attach@store in OAS _external/ folder - Content-addressed deduplication via hash stored in varchar(64) - Propose <ref@store> to replace filepath@store - Add open questions and implementation phases Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
- All external storage uses OAS infrastructure - Path-addressed: regular object@store (existing) - Content-addressed: _content/ folder for <djblob@store>, <attach@store> - ContentRegistry table for reference counting and GC - ObjectRef returned for all external types (lazy access) - Deduplication via SHA256 content hash Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
- <djblob@store> returns Python object (fetched and deserialized) - <attach@store> returns local file path (downloaded automatically) - Only object@store returns ObjectRef for explicit lazy access - External storage is transparent - @store only affects where, not how Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
Three-layer architecture: 1. MySQL types: longblob, varchar, etc. 2. Core DataJoint types: object, content (and @store variants) 3. AttributeTypes: <djblob>, <xblob>, <attach>, <xattach> New core type `content` for content-addressed storage: - Accepts bytes, returns bytes - Handles hashing, deduplication, and GC registration - AttributeTypes like <xblob> build serialization on top Naming convention: - <djblob> = internal serialized (database) - <xblob> = external serialized (content-addressed) - <attach> = internal file - <xattach> = external file Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
- content type is single-blob only (no folders) - Parameterized syntax: <type@param> passes param to dtype - Add content vs object comparison table - Clarify when to use each type Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
- Content-addressed storage is now per-project (not per-schema)
- Deduplication works across all schemas in a project
- ContentRegistry is project-level (e.g., {project}_content database)
- GC scans all schemas in project for references
- Add migration utility for legacy ~external_* per-schema stores
- Document migration from binary(16) UUID to char(64) SHA256 hash
Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
Three OAS storage regions:
1. object: {schema}/{table}/{pk}/ - PK-addressed, DataJoint controls
2. content: _content/{hash} - content-addressed, deduplicated
3. filepath: _files/{user-path} - user-addressed, user controls
Upgraded filepath@store:
- Returns ObjectRef (lazy) instead of copying files
- Supports streaming via ref.open()
- Supports folders (like object)
- Stores checksum in JSON column for verification
- No more automatic copy to local stage
Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
filepath changes: - No longer an OAS region - tracks external URIs anywhere - Supports any fsspec-compatible URI (s3://, https://, gs://, etc.) - Returns ObjectRef for lazy access via fsspec - No integrity guarantees (external resources may change) - Uses json core type for storage json core type: - Cross-database compatible (MySQL JSON, PostgreSQL JSONB) - Used by filepath and object types Two OAS regions remain: - object: PK-addressed, DataJoint controlled - content: hash-addressed, deduplicated Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
- Remove general URI tracker concept from filepath - filepath@store now requires a store parameter and uses relative paths - Key benefit: portability across environments by changing store config - For arbitrary URLs, recommend using varchar (simpler, more transparent) - Add comparison table for filepath@store vs varchar use cases - Update all diagrams and tables to reflect the change Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
- Remove "core types" concept - all storage types are now AttributeTypes - Built-in AttributeTypes (object, content, filepath@store) use json dtype - JSON stores metadata: path, hash, store name, size, etc. - User-defined AttributeTypes can compose built-in ones (e.g., <xblob> uses content) - Clearer separation: database types (json, longblob) vs AttributeTypes (encode/decode) Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
Layer 1: Native database types (FLOAT, TINYINT, etc.) - backend-specific, discouraged Layer 2: Core DataJoint types (float32, uint8, bool, json) - standardized, scientist-friendly Layer 3: AttributeTypes (object, content, <djblob>, etc.) - encode/decode, composable Core types provide: - Consistent interface across MySQL and PostgreSQL - Scientist-friendly names (float32 vs FLOAT, uint8 vs TINYINT UNSIGNED) - Automatic backend translation Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
All AttributeTypes (Layer 3) now use angle bracket syntax in table definitions: - Core types (Layer 2): int32, float64, varchar(255) - no brackets - AttributeTypes (Layer 3): <object>, <djblob>, <filepath@main> - angle brackets This clear visual distinction helps users immediately identify: - Core types: direct database mapping - AttributeTypes: encode/decode transformation Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
Seven-phase implementation plan covering: - Phase 1: Core type system foundation (type mappings, store parameters) - Phase 2: Content-addressed storage (<content> type, ContentRegistry) - Phase 3: User-defined AttributeTypes (<xblob>, <attach>, <xattach>, <filepath>) - Phase 4: Insert and fetch integration (type composition) - Phase 5: Garbage collection (project-wide GC scanner) - Phase 6: Migration utilities (legacy external stores) - Phase 7: Documentation and testing Estimated effort: 24-32 days across all phases Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
Phase 1.1 - Core type mappings already complete in declare.py Phase 1.2 - Enhanced AttributeType with store parameter support: - Added parse_type_spec() to parse "<type@store>" into (type_name, store_name) - Updated get_type() to handle parameterized types - Updated is_type_registered() to ignore store parameters - Updated resolve_dtype() to propagate store through type chains - Returns (final_dtype, type_chain, store_name) tuple - Store from outer type overrides inner type's store Phase 1.3 - Updated heading and declaration parsing: - Updated get_adapter() to return (adapter, store_name) tuple - Updated substitute_special_type() to capture store from ADAPTED types - Store parameter is now properly passed through type resolution Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
- Remove AttributeAdapter class and context-based lookup from attribute_adapter.py - Simplify attribute_adapter.py to compatibility shim that re-exports from attribute_type - Remove AttributeAdapter from package exports in __init__.py - Update tests/schema_adapted.py to use @dj.register_type decorator - Update tests/test_adapted_attributes.py to work with globally registered types - Remove test_attribute_adapter_deprecated test from test_attribute_type.py Types are now registered globally via @dj.register_type decorator, eliminating the need for context-based adapter lookup. Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
…ntics Core types (uuid, json, blob) now map directly to native database types without any implicit serialization. Serialization is handled by AttributeTypes like <djblob> via encode()/decode() methods. Changes: - Rename SERIALIZED_TYPES to BINARY_TYPES in declare.py (clearer naming) - Update check for default values in compile_attribute() - Clarify in spec that core blob types store raw bytes Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
Major simplification of the type system to two categories: 1. Core DataJoint types (no brackets): float32, uuid, bool, json, blob, etc. 2. AttributeTypes (angle brackets): <djblob>, <object>, <attach>, etc. Changes: - declare.py: Remove EXTERNAL_TYPES, BINARY_TYPES; simplify to CORE_TYPE_ALIASES + ADAPTED - heading.py: Remove is_attachment, is_filepath, is_object, is_external flags - fetch.py: Simplify _get() to only handle uuid, json, blob, and adapters - table.py: Simplify __make_placeholder() to only handle uuid, json, blob, numeric - preview.py: Remove special object field handling (will be AttributeType) - staged_insert.py: Update object type check to use adapter All special handling (attach, filepath, object, external storage) will be implemented as built-in AttributeTypes in subsequent phases. Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
Core DataJoint types (fully supported, recorded in :type: comments): - Numeric: float32, float64, int64, uint64, int32, uint32, int16, uint16, int8, uint8 - Boolean: bool - UUID: uuid → binary(16) - JSON: json - Binary: blob → longblob - Temporal: date, datetime - String: char(n), varchar(n) - Enumeration: enum(...) Changes: - declare.py: Define CORE_TYPES with (pattern, sql_mapping) pairs - declare.py: Add warning for non-standard native type usage - heading.py: Update to use CORE_TYPE_NAMES - storage-types-spec.md: Update documentation to reflect core types Native database types (text, mediumint, etc.) pass through with a warning about non-standard usage. Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
Add content-addressed storage with deduplication for the <content> and <xblob> AttributeTypes. New files: - content_registry.py: Content storage utilities - compute_content_hash(): SHA256 hashing - build_content_path(): Hierarchical path generation (_content/xx/yy/hash) - put_content(): Store with deduplication - get_content(): Retrieve with hash verification - content_exists(), delete_content(), get_content_size() New built-in AttributeTypes in attribute_type.py: - ContentType (<content>): Content-addressed storage for raw bytes - dtype = "json" (stores metadata: hash, store, size) - Automatic deduplication via SHA256 hashing - XBlobType (<xblob>): Serialized blobs with external storage - dtype = "<content>" (composition with ContentType) - Combines djblob serialization with content-addressed storage Updated insert/fetch for type chain support: - table.py: Apply encoder chain from outermost to innermost - fetch.py: Apply decoder chain from innermost to outermost - Both pass store_name through the chain for external storage Example usage: data : <content@mystore> # Raw bytes, deduplicated array : <xblob@mystore> # Serialized objects, deduplicated Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
…lization Breaking changes: - Remove attribute_adapter.py entirely (hard deprecate) - Remove bypass_serialization flag from blob.py - blobs always serialize now - Remove unused 'database' field from Attribute in heading.py Import get_adapter from attribute_type instead of attribute_adapter. Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
- Document function-based content storage (not registry class) - Add implementation status table - Explain design decision: functions vs database table - Update Phase 5 GC design for scanning approach - Document removed/deprecated items Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
- Create builtin_types.py with DJBlobType, ContentType, XBlobType - Types serve as examples for users creating custom types - Module docstring includes example of defining a custom GraphType - Add get_adapter() function to attribute_type.py for compatibility - Auto-register built-in types via import at module load Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
Add <object> type for files and folders (Zarr, HDF5, etc.):
- Path derived from primary key: {schema}/{table}/objects/{pk}/{field}_{token}
- Supports bytes, files, and directories
- Returns ObjectRef for lazy fsspec-based access
- No deduplication (unlike <content>)
Update implementation plan with Phase 2b documenting ObjectType.
Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
Migration utilities are out of scope for now. This is a breaking change version - users will need to recreate tables with new types. Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
- Document staged_insert.py for direct object storage writes - Add flow comparison: normal insert vs staged insert - Include staged_insert.py in critical files summary Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
Add remaining built-in AttributeTypes: - <attach>: Internal file attachment stored in longblob - <xattach>: External file attachment via <content> with deduplication - <filepath@store>: Reference to existing file (no copy, returns ObjectRef) Update implementation plan to mark Phase 3 complete. Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
Replace pytest-managed Docker containers with external docker-compose services. This removes complexity, improves reliability, and allows running tests both from the host machine and inside the devcontainer. - Remove docker container lifecycle management from conftest.py - Add pixi tasks for running tests (services-up, test, test-cov) - Expose MySQL and MinIO ports in docker-compose.yaml for host access - Simplify devcontainer to extend the main docker-compose.yaml - Remove docker dependency from test requirements 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix Table.table_name property to delegate to metaclass for UserTable subclasses (table_name was returning None instead of computed name) - Fix heading type loading to preserve database type for core types (uuid, etc.) instead of overwriting with alias from comment - Add original_type field to Attribute for storing the alias while keeping the actual SQL type in type field - Fix tests: remove obsolete test_external.py, update resolve_dtype tests to expect 3 return values, update type alias tests to use CORE_TYPE_SQL - Update pyproject.toml pytest_env to use D: prefix for default-only vars Test results improved from 174 passed/284 errors to 381 passed/62 errors. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Type system changes: - Core type `blob` stores raw bytes without serialization - Built-in type `<djblob>` handles automatic serialization/deserialization - Update jobs table to use <djblob> for key and error_stack columns - Remove enable_python_native_blobs config check (always enabled) Bug fixes: - Fix is_blob detection to include NATIVE_BLOB types (longblob, mediumblob, etc.) - Fix original_type fallback when None - Fix test_type_aliases to use lowercase keys for CORE_TYPE_SQL lookup - Allow None context for built-in types in heading initialization - Update native type warning message wording 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update settings access tests to check type instead of specific value (safemode is set to False by conftest fixtures) - Fix config.load() to handle nested JSON dicts in addition to flat dot-notation keys Test results: 417 passed (was 414) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update GraphType and LayoutToFilepathType to use <djblob> dtype (old filepath@store syntax no longer supported) - Fix local_schema and schema_virtual_module fixtures to pass connection - Remove unused imports Test results: 421 passed, 58 errors, 13 failed (was 417/62/13) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Source code fixes: - Add download_path setting and squeeze handling in fetch.py - Add filename collision handling in AttachType and XAttachType - Fix is_blob detection to check both BLOB and NATIVE_BLOB patterns - Fix FilepathType.validate to accept Path objects - Add proper error message for undecorated tables Test infrastructure updates: - Update schema_external.py to use new <xblob@store>, <xattach@store>, <filepath@store> syntax - Update all test tables to use <djblob> instead of longblob for serialization - Configure object_storage.stores in conftest.py fixtures - Remove obsolete test_admin.py (set_password was removed) - Fix connection passing in various tests to avoid credential prompts - Fix test_query_caching to handle existing directories README: - Add Developer Guide section with setup, test, and pre-commit instructions Test results: 408 passed, 2 skipped (macOS multiprocessing limitation) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Override drop_quick() in Imported and Computed to also drop the associated jobs table when the main table is dropped.
Comprehensive test suite for the new per-table jobs system: - JobsTable structure and initialization - refresh() method with priority and delay - reserve() method and reservation conflicts - complete() method with keep option - error() method and message truncation - ignore() method - Status filter properties (pending, reserved, errors, ignored, completed) - progress() method - populate() with reserve_jobs=True - schema.jobs property - Configuration settings
- Remove unused `job` dict and `now` variable in refresh() - Remove unused `pk_attrs` in fetch_pending() - Remove unused datetime import - Apply ruff-format formatting changes
Replace schema-wide `~jobs` table with per-table JobsTable (Autopopulate 2.0): - Delete src/datajoint/jobs.py (old JobTable class) - Remove legacy_jobs property from Schema class - Delete tests/test_jobs.py (old schema-wide tests) - Remove clean_jobs fixture and schema.jobs.delete() cleanup calls - Update test_autopopulate.py to use new per-table jobs API The new system provides per-table job queues with FK-derived primary keys, rich status tracking (pending/reserved/success/error/ignore), priority scheduling, and proper handling of job collisions.
Now that the legacy schema-wide jobs system has been removed, rename the new per-table jobs module to its canonical name: - src/datajoint/jobs_v2.py -> src/datajoint/jobs.py - tests/test_jobs_v2.py -> tests/test_jobs.py - Update imports in autopopulate.py and test_jobs.py
- Use variable assignment for pk_section instead of chr(10) in f-string - Change error_stack type from mediumblob to <djblob> - Use update1() in error() instead of raw SQL and deprecated _update() - Remove config.override(enable_python_native_blobs=True) wrapper Note: reserve() keeps raw SQL for atomic conditional update with rowcount check - this is required for safe concurrent job reservation.
- reserve() now uses update1 instead of raw SQL - Remove status='pending' check since populate verifies this - Change return type from bool to None - Update autopopulate.py to not check reserve return value - Update tests to reflect new behavior
The new implementation always populates self - the target property is no longer needed. All references to self.target replaced with self.
- Inline the logic directly in populate() and progress() - Move restriction check to populate() - Use (self.key_source & AndList(restrictions)).proj() directly - Remove unused QueryExpression import
- Remove early jobs_table assignment, use self.jobs directly - Fix comment: key_source is correct behavior, not legacy - Use self.jobs directly in _get_pending_jobs
Method only called from one place, no need for separate function.
- Remove 'order' parameter (conflicts with priority/scheduled_time) - Remove 'limit' parameter, keep only 'max_calls' for simplicity - Remove unused 'random' import
- Update datetime pattern to allow precision (datetime(6)) - Fix jobs.delete() and jobs.drop() to handle undeclared tables - Fix jobs.ignore() to update existing pending jobs to ignore status - Rename _make_tuples to make (deprecated in AutoPopulate 2.0) - Update conftest.py jobs cleanup to handle list return type - Fix test_jobs_table_primary_key to ensure table is declared first - Broaden exception handling in fixture teardown 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Member
Author
|
Will try a different approach |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.