
Commit 40c02ff

Add VERSION2.md summarizing key features for 2.0 pre-release
Documents major features:

- Redesigned three-layer type system
- Object storage (Augmented Schema)
- Autopopulate 2.0 jobs system
- Semantic matching for joins
- Simplified settings management
- Python 3.10+ and MySQL 8.0+ requirements

Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
1 parent 7bb7d47 commit 40c02ff

1 file changed: +152 additions, 0 deletions

VERSION2.md

# DataJoint 2.0 Pre-Release Notes

This document summarizes the key features and changes in DataJoint 2.0.

## Major Features

### 1. Redesigned Type System

A three-layer type architecture provides clarity and flexibility:

- **Layer 1 - Native Database Types**: Backend-specific types (`FLOAT`, `TINYINT UNSIGNED`, `LONGBLOB`)
- **Layer 2 - Core DataJoint Types**: Scientist-friendly, standardized across backends (`float32`, `uint8`, `bool`, `json`, `int64`, `uint64`)
- **Layer 3 - AttributeTypes**: Composable types with `encode()`/`decode()` semantics using angle-bracket syntax (`<djblob>`, `<object>`, `<content>`)
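
To make the layering concrete, here is a minimal sketch of how Layer 2 core types might resolve to Layer 1 native types on a MySQL backend. The `CORE_TO_MYSQL` table and the `native_type` function are hypothetical illustrations, not DataJoint's API; the MySQL column types are standard, but the exact mapping DataJoint uses is an assumption.

```python
# Hypothetical mapping from DataJoint core types (Layer 2) to MySQL
# native types (Layer 1). Illustrative only; not DataJoint's actual table.
CORE_TO_MYSQL = {
    "int8": "TINYINT",
    "int16": "SMALLINT",
    "int32": "INT",
    "int64": "BIGINT",
    "uint8": "TINYINT UNSIGNED",
    "uint16": "SMALLINT UNSIGNED",
    "uint32": "INT UNSIGNED",
    "uint64": "BIGINT UNSIGNED",
    "float32": "FLOAT",
    "float64": "DOUBLE",
    "bool": "TINYINT",  # assumed mapping; in MySQL, BOOL is a synonym for TINYINT(1)
    "json": "JSON",
}


def native_type(core_type: str) -> str:
    """Resolve a core type name to its (assumed) MySQL column type."""
    try:
        return CORE_TO_MYSQL[core_type]
    except KeyError:
        raise ValueError(f"unknown core type: {core_type!r}") from None


print(native_type("uint8"))    # TINYINT UNSIGNED
print(native_type("float32"))  # FLOAT
```

Keeping the mapping in a single table means a table definition written in core types would not need to change if another backend supplied its own table.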

**Key changes:**
- `blob` now stores raw bytes; use `<djblob>` for serialized Python objects
- New type aliases: `int8`, `int16`, `int32`, `int64`, `uint8`, `uint16`, `uint32`, `uint64`, `float32`, `float64`, `bool`
- AttributeTypes are composable: `<xblob>` wraps `<content>` which wraps `<djblob>`
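
The composition in the last bullet can be pictured as a chain of `encode()`/`decode()` steps. The sketch below shows one plausible reading and is not DataJoint's implementation: the class names `DjBlob`, `Content`, and `XBlob` are hypothetical, `pickle` stands in for DataJoint's own serialization format, and an in-memory dict stands in for an external store. `XBlob` first serializes the value with `DjBlob`, then hands the bytes to `Content`, which stores them under their SHA-256 hash.

```python
import hashlib
import pickle
from typing import Any


class DjBlob:
    """Hypothetical stand-in for <djblob>: Python object <-> bytes."""

    def encode(self, value: Any) -> bytes:
        return pickle.dumps(value)  # pickle is a placeholder serializer

    def decode(self, data: bytes) -> Any:
        return pickle.loads(data)


class Content:
    """Hypothetical stand-in for <content>: bytes <-> content hash."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}  # in-memory stand-in for a store

    def encode(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.store[digest] = data  # identical bytes map to the same key
        return digest

    def decode(self, digest: str) -> bytes:
        return self.store[digest]


class XBlob:
    """Hypothetical <xblob>: DjBlob serialization followed by Content storage."""

    def __init__(self, content: Content) -> None:
        self.blob = DjBlob()
        self.content = content

    def encode(self, value: Any) -> str:
        return self.content.encode(self.blob.encode(value))

    def decode(self, digest: str) -> Any:
        return self.blob.decode(self.content.decode(digest))


content = Content()
xblob = XBlob(content)
ref = xblob.encode({"trace": [1.0, 2.0, 3.0]})  # what the column would hold
print(xblob.decode(ref))  # {'trace': [1.0, 2.0, 3.0]}
xblob.encode({"trace": [1.0, 2.0, 3.0]})  # the same value again...
print(len(content.store))  # ...is stored only once: 1
```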

### 2. Object Storage (Augmented Schema)

The new `object` type provides managed file and folder storage tightly coupled with the database:

```python
import datajoint as dj

schema = dj.Schema("my_schema")  # example schema name


@schema
class Recording(dj.Manual):
    definition = """
    subject_id : int
    session_id : int
    ---
    raw_data : object          # uses default store
    published : object@public  # uses named store
    """
```

**Features:**
- Supports both files and folders (including Zarr and TileDB)
- Two insert modes: copy (for small files) and staged (large objects written directly to storage)
- Stored objects are deleted automatically when their records are deleted
- Content-addressed storage for deduplication (`<content>` type)
- Path-addressed storage organized by primary key (`<object>` type)
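
The last two bullets differ in where a stored object's location comes from. The sketch below contrasts the two schemes using hypothetical path layouts (`_content/<sha256>` and `<schema>/<table>/<key>/<attribute>`); these layouts and the helper names are assumptions for illustration, not the layout DataJoint actually writes.

```python
import hashlib
from pathlib import PurePosixPath


def content_address(data: bytes) -> PurePosixPath:
    """Content-addressed: the path depends only on the bytes (deduplicates)."""
    digest = hashlib.sha256(data).hexdigest()
    return PurePosixPath("_content", digest)  # hypothetical layout


def path_address(schema: str, table: str, key: dict, attribute: str) -> PurePosixPath:
    """Path-addressed: the path is derived from the record's primary key."""
    key_parts = [f"{name}={value}" for name, value in key.items()]
    return PurePosixPath(schema, table, *key_parts, attribute)  # hypothetical layout


a = content_address(b"same bytes")
b = content_address(b"same bytes")
print(a == b)  # True: identical content shares one location

print(path_address("lab", "recording", {"subject_id": 1, "session_id": 2}, "raw_data"))
# lab/recording/subject_id=1/session_id=2/raw_data
```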

### 3. Autopopulate 2.0 Jobs System

Per-table job management replaces the schema-wide `~jobs` table:

```python
# Access the jobs table of a specific auto-populated table
FilteredImage.jobs

# Restrict to jobs that ended in an error
FilteredImage.jobs & 'status="error"'

# Refresh the jobs table, which includes cleaning up stale jobs
FilteredImage.jobs.refresh()
```

**Improvements:**
- Rich status tracking: `pending`, `reserved`, `success`, `error`, `ignore`
- Per-table jobs tables with full primary key visibility (no key hashing)
- Priority scheduling and delayed execution
- Worker tracking: host, PID, connection ID, and version
- Automatic stale job cleanup via `refresh()`

### 4. Semantic Matching for Joins

Joins now use **lineage-based matching** instead of pure name matching:

- Attributes are matched only when they share both the same name AND the same origin
- Prevents accidental joins on unrelated attributes that happen to share a name
- Enables valid joins that were previously blocked
- Lineage is tracked in a hidden `~lineage` table per schema

**Example:**
```python
# Student.id and Course.id share a name but have different lineage.
# Previously this join silently matched on 'id'; now it raises an error.
Student * Course

# Both course_id attributes trace back to Course.course_id, so the join proceeds.
FavoriteCourse * DependentCourse
```
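
The matching rule can be sketched in a few lines. Here each table's heading is modeled as a mapping from attribute name to a lineage string naming the attribute's origin (`schema.table.attribute`); this representation and the `join_attributes` helper are hypothetical, chosen only to show the rule, not how DataJoint stores lineage.

```python
def join_attributes(left: dict[str, str], right: dict[str, str]) -> list[str]:
    """Return the attributes to join on, matching by name AND lineage.

    Each argument maps attribute name -> lineage (the attribute's origin).
    Raises ValueError when a shared name has different lineage.
    """
    shared = sorted(left.keys() & right.keys())
    conflicts = [name for name in shared if left[name] != right[name]]
    if conflicts:
        raise ValueError(f"attributes {conflicts} share a name but differ in lineage")
    return shared


student = {"id": "school.student.id", "name": "school.student.name"}
course = {"id": "school.course.id", "title": "school.course.title"}
favorite = {"student_id": "school.student.student_id", "course_id": "school.course.course_id"}
dependent = {"course_id": "school.course.course_id", "prereq_id": "school.course.course_id"}

print(join_attributes(favorite, dependent))  # ['course_id']
try:
    join_attributes(student, course)
except ValueError as err:
    print(err)  # attributes ['id'] share a name but differ in lineage
```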

### 5. Simplified Settings Management

- Pure Pydantic-based configuration
- Recursive config file search (`datajoint.json`)
- Support for storing secrets separately from other settings
- Deprecated `save()` methods removed

### 6. Modern Python and MySQL Support

- **Python 3.10+** required (dropped support for earlier versions)
- **MySQL 8.0+** required (dropped support for earlier versions)
- Source layout (`src/datajoint/`)
- Pre-commit hooks with ruff linting and formatting

## Breaking Changes

### Removed Features

- **`~log` table**: Use Python logging instead
- **Legacy `AttributeAdapter`**: Replaced by the `AttributeType` system
- **`blob` serialization**: `blob`/`longblob` now store raw bytes; use `<djblob>` for serialized objects
- **Schema-level jobs table**: Replaced by per-table jobs (`MyTable.jobs`)
- **`AutoPopulate.target` property**: Removed
- **`AutoPopulate._make_tuples`**: Use the `make()` method exclusively
- **`dj.config.save()` methods**: Configuration is now read-only at runtime
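
With the `~log` table gone, events that used to be recorded there are left to Python's standard `logging` module. A minimal setup might look like the sketch below; it assumes the package emits its messages through a logger named `"datajoint"` (verify this against your installed version), and the log file name and message are only examples.

```python
import logging

# Assumption: DataJoint emits its messages through the "datajoint" logger.
logger = logging.getLogger("datajoint")
logger.setLevel(logging.INFO)

# Persist log records to a file instead of a database table.
handler = logging.FileHandler("datajoint.log")  # example file name
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)

logger.info("populate started")  # example record
```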

### API Changes

- `dj.U` join is deprecated (use restriction instead)
- Jobs API moved from `schema.jobs` to `Table.jobs`
- External storage requires explicit store configuration

## Migration Notes

### Blob Columns

Existing `longblob` columns with serialized data should be migrated to `<djblob>`:

```python
from datajoint.migrate import migrate_blob_columns

# `schema` is the dj.Schema whose blob columns should be migrated
migrate_blob_columns(schema)
```
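
Before migrating, it can help to check which `longblob` values actually hold serialized data. The heuristic below is a sketch that assumes the pre-2.0 DataJoint blob format: serialized values start with a `mYm\0` or `dj0\0` protocol header, or with a `ZL123\0` prefix when zlib-compressed. The `looks_serialized` helper is hypothetical and is not part of `datajoint.migrate`; raw bytes could in principle begin with the same prefixes.

```python
# Assumed prefixes from the pre-2.0 DataJoint blob format.
PROTOCOL_HEADERS = (b"mYm\0", b"dj0\0")
COMPRESSION_PREFIX = b"ZL123\0"


def looks_serialized(value: bytes) -> bool:
    """Heuristically decide whether raw bytes hold a serialized DataJoint blob."""
    if value.startswith(COMPRESSION_PREFIX):
        return True  # compressed blobs carry the protocol header inside the payload
    return value.startswith(PROTOCOL_HEADERS)


print(looks_serialized(b"dj0\0" + b"..."))  # True
print(looks_serialized(b"\x89PNG\r\n"))     # False: raw bytes, e.g. an image file
```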

### Jobs Tables

The schema-wide `~jobs` table is no longer used. Jobs are now per-table and are created automatically when needed. Old `~jobs` data can be migrated manually if historical error logs are needed.

## Configuration

Example `datajoint.json`:

```json
{
  "database.host": "localhost",
  "database.user": "datajoint",
  "database.password": "password",
  "stores": {
    "main": {
      "protocol": "s3",
      "endpoint": "s3.amazonaws.com",
      "bucket": "my-bucket",
      "location": "datajoint-store"
    }
  }
}
```
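
Because external storage requires explicit store configuration, checking each store entry before connecting can catch typos early. The required-field list below is an assumption taken from the example above (`protocol`, `endpoint`, `bucket`, `location` for S3), and `check_stores` is a hypothetical helper, not part of DataJoint.

```python
import json

# Assumption: required keys per protocol, inferred from the example config.
REQUIRED_FIELDS = {"s3": {"protocol", "endpoint", "bucket", "location"}}


def check_stores(config: dict) -> list[str]:
    """Return a list of problems found in the "stores" section."""
    problems = []
    for name, store in config.get("stores", {}).items():
        protocol = store.get("protocol")
        required = REQUIRED_FIELDS.get(protocol)
        if required is None:
            problems.append(f"store {name!r}: unknown protocol {protocol!r}")
            continue
        missing = sorted(required - set(store))
        if missing:
            problems.append(f"store {name!r}: missing {', '.join(missing)}")
    return problems


example = json.loads("""
{
  "database.host": "localhost",
  "stores": {
    "main": {"protocol": "s3", "endpoint": "s3.amazonaws.com",
             "bucket": "my-bucket", "location": "datajoint-store"},
    "public": {"protocol": "s3", "bucket": "public-bucket"}
  }
}
""")
print(check_stores(example))
# ["store 'public': missing endpoint, location"]
```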

## Version

```
2.0.0a0
```

This is an alpha pre-release for testing and integration purposes.
