feat(schema): add rename_column to UpdateSchemaAction#2563
Conversation
|
Implemented most of the schema evolution features in #2451. |
411703e to
c42b685
Compare
Extends UpdateSchemaAction with a rename(RenameColumn) builder method,
completing the add/delete/rename triad for non-incompatible schema
evolution.
API shape:
action.rename(
RenameColumn::builder().name("old").new_name("new").build(),
);
Mirrors the typed-builder shape used by apache#2451's broader schema-evolution
refactor, so callers adopting this PR today can migrate to apache#2451 with
no source change once it lands.
Renames preserve field IDs — only the leaf name changes. RenameColumn::name
uses SCHEMA_NAME_DELIMITER for nested fields (e.g. "person.name");
RenameColumn::new_name must be unqualified.
Ordering in commit():
1. deletes validated
2. renames validated — runs after deletes so a rename can re-use a
name being deleted, before additions so an addition can re-use a
name being renamed away
3. additions validated + ID-assigned
4. schema tree rebuilt with renames threaded through
rebuild_fields / rebuild_field
Validation rules (match pyiceberg's UpdateSchema.rename_column):
- missing source field → PreconditionFailed
- source field also staged for deletion → PreconditionFailed
- new_name contains SCHEMA_NAME_DELIMITER → PreconditionFailed
- new_name collides with a non-deleted, non-renamed sibling →
PreconditionFailed
- same field renamed twice → last rename wins
Identifier-field handling falls out for free — Rust keys identifier
fields by ID, not name, so with_identifier_field_ids(base_schema
.identifier_field_ids()) propagates the set unchanged across rename.
Tests: 10 new tests covering simple root rename, nested rename,
missing field, delete-conflict, sibling-collision, dotted-name
rejection, identifier-field preservation, rename-frees-old-name
(combined with add), repeated-rename-last-wins, no-op self-rename.
Full iceberg lib suite: 1306/1306 passing. Clippy + rustfmt clean.
c42b685 to
eca2ba4
Compare
|
Thanks for pointing at #2451, @TwinklerG — I went and read through it carefully. After studying both PRs, I think your refactor is the more architecturally sound long-term direction. The operations-buffer-then- To make that easier, I've updated this PR to use the action.rename(RenameColumn::builder().name("old").new_name("new").build())That way this PR functions as a clean stepping stone: callers adopting it today won't need a source change when #2451 lands — only the dispatch underneath We've been carrying this on a fork for a downstream consumer; happy to migrate to #2451's full API as soon as it merges. If contributing test coverage for any of your |
|
Thanks @nazq for taking time to review my PR. Your PR is a brilliant move to ensure a smooth transition. I really appreciate your validation of the architecture in #2451. |
|
Hey @TwinklerG, PR'ed the first batch as TwinklerG#1 against #2451's branch. Three of the #[ignore = "not yet implemented"] stubs now have real test bodies and pass against tests
~6 more of the ignored tests look doable against the current implementation without needing new features (add_nested_list_of_structs, the *_preserves_other_metadata family, delete_fields). The remaining ~6 are the case-insensitive variants, which depend on a flag schema_update doesn't yet thread through name resolution — that's a real implementation contribution, not just a test. I have some free time later this afternoon so drop a comment if this direction looks sound. Cheers |
Thanks @nazq for the contribution. I will review it soon. |
Which issue does this PR close?
What changes are included in this PR?
Adds column rename to
UpdateSchemaAction, completing the add/delete/rename triad for non-incompatible schema evolution intransaction::update_schema.Relationship to #2451 — the bigger refactor
Since opening this PR, @TwinklerG's #2451 has proposed a broader refactor of
UpdateSchemaActioninto an operations-buffer model that subsumes rename plusupdate_column,move, type promotion, andallow_incompatible_changes— the full PyIceberg / JavaUpdateSchemasurface that #697 calls for. I think #2451 is the architecturally sound long-term direction, mirroring what @Fokko built in pyiceberg.To make this PR a clean stepping stone rather than a divergent path, I've aligned the rename API to #2451's
RenameColumntyped-builder shape. The user-facing call is now:Identical to how rename works under #2451, so callers who adopt this PR today won't need a source change when #2451 lands — only the dispatch underneath the
.rename()method changes (from imperative-vec to operations-buffer). Field IDs are preserved across rename; only the leaf name changes.Implementation
A new
renames: Vec<(String, String)>accumulator onUpdateSchemaAction, plus a new step incommit()between deletes and additions:The renames are resolved to a
HashMap<i32, String>keyed by field ID, then threaded throughrebuild_fields/rebuild_field— same recursive walk that handles deletes and nested additions. Each rebuild call site that previously copiedfield.name.clone()now looks up the rename map first and falls back to the original name. Identifier-field handling falls out for free because Rust keysidentifier_field_idsby ID, not name — the existingwith_identifier_field_ids(base_schema.identifier_field_ids())step propagates the set unchanged.RenameColumnitself is aTypedBuilderstruct withname/new_namefields, bothsetter(into), re-exported fromiceberg::transactionalongsideAddColumn. Same shape as #2451'sRenameColumn.Semantics
Modeled on
pyiceberg.table.update.schema.UpdateSchema.rename_column:namemust exist in the current schema → otherwisePreconditionFailed.PreconditionFailed(intent is ambiguous; matches PyIceberg).new_namecannot containSCHEMA_NAME_DELIMITER→PreconditionFailed(the unqualified-name requirement; preventing it from silently looking like a move-across-structs).new_namecannot collide with a sibling that is not itself being deleted or renamed away →PreconditionFailed.Scope
Rename only. PyIceberg's
UpdateSchemahas other ops (update_column,make_column_required,move_*,set_identifier_fields) — those are #2451's scope, not this PR's. Treat this as the rename on-ramp; treat #2451 as the destination.Are these changes tested?
Yes — 10 new unit tests in
transaction::update_schema::tests, matching the style and depth of the existingtest_*_add_column*/test_*_delete_column*tests:test_rename_columntest_rename_nested_columnperson.name→person.fullname), preserves sibling fieldstest_rename_missing_column_failsPreconditionFailedtest_rename_and_delete_same_column_failsPreconditionFailedtest_rename_to_existing_sibling_failsPreconditionFailedtest_rename_path_with_dot_in_new_name_failsnew_namerejected →PreconditionFailedtest_rename_preserves_identifier_fieldtest_rename_frees_name_for_additionz→z_oldthen add newzin same actiontest_rename_same_column_twice_last_winstest_rename_self_is_noopz→zis a no-op (no schema-version bump emitted)All 18 existing
transaction::update_schematests still pass. Full iceberg lib suite: 1306/1306. Clippy + rustfmt clean.We've been carrying this on a fork branch for a downstream consumer; happy to migrate that consumer to #2451's API surface as soon as it lands.