[python][ray] Support partial SET and INSERT in merge_into#8085

Open
XiaoHongbo-Hope wants to merge 8 commits into apache:master from XiaoHongbo-Hope:ray_merge_into_support_mapping

@XiaoHongbo-Hope
Contributor

Purpose

Tests

@XiaoHongbo-Hope XiaoHongbo-Hope force-pushed the ray_merge_into_support_mapping branch 4 times, most recently from cd45904 to 30daf8f Compare June 3, 2026 12:18
Allow mapping-based update/insert specs with a typed expression API:

  from pypaimon.ray import source_col, target_col, lit

  WhenMatched(update={"age": source_col("age"), "name": target_col("name")})
  WhenNotMatched(insert={"id": source_col("id"), "status": lit("new")})

- Add SourceColumnRef, TargetColumnRef, LiteralValue types
- Add source_col(), target_col(), lit() helpers
- _normalize_set_spec converts all values to typed refs
- Shorthand "s.col"/"t.col" strings still accepted
- Validate keys against target schema, reject callables
- Validate t.col refs exist, reject t.col in insert specs
- Reject empty mapping specs
- Only require referenced source columns, not all columns
- Update docs with expression API examples
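
The normalization rules listed above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this PR: the type and helper names follow the commit message, but the function name `normalize_set_spec`, its signature, the `allow_target_refs` flag, and the choice to reject any value that is not a typed ref or an "s."/"t." string are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass(frozen=True)
class SourceColumnRef:
    name: str


@dataclass(frozen=True)
class TargetColumnRef:
    name: str


@dataclass(frozen=True)
class LiteralValue:
    value: Any


def source_col(name: str) -> SourceColumnRef:
    return SourceColumnRef(name)


def target_col(name: str) -> TargetColumnRef:
    return TargetColumnRef(name)


def lit(value: Any) -> LiteralValue:
    return LiteralValue(value)


def normalize_set_spec(spec: Dict[str, Any],
                       target_columns: Iterable[str],
                       allow_target_refs: bool) -> Dict[str, Any]:
    """Hypothetical helper: convert every spec value to a typed ref."""
    if not spec:
        raise ValueError("mapping spec must not be empty")
    target_columns = set(target_columns)
    normalized = {}
    for key, value in spec.items():
        # Keys must name columns of the target table.
        if key not in target_columns:
            raise ValueError(f"unknown target column: {key!r}")
        # Shorthand strings "s.col" / "t.col" become typed refs.
        if isinstance(value, str) and value.startswith("s."):
            value = SourceColumnRef(value[2:])
        elif isinstance(value, str) and value.startswith("t."):
            value = TargetColumnRef(value[2:])
        if isinstance(value, TargetColumnRef):
            # Target refs are only meaningful for matched rows.
            if not allow_target_refs:
                raise ValueError("t.col is not allowed in insert specs")
            if value.name not in target_columns:
                raise ValueError(f"unknown target column: {value.name!r}")
        elif not isinstance(value, (SourceColumnRef, LiteralValue)):
            # Callables and any other type are rejected (assumption).
            raise TypeError(f"unsupported spec value for {key!r}: {value!r}")
        normalized[key] = value
    return normalized
```

With this sketch, an update spec would be normalized with `allow_target_refs=True` and an insert spec with `allow_target_refs=False`, so a "t.col" reference in an insert fails early with a `ValueError`.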
@XiaoHongbo-Hope XiaoHongbo-Hope force-pushed the ray_merge_into_support_mapping branch from 30daf8f to a5cd7e7 Compare June 3, 2026 12:26
- Partial insert specs auto-fill missing ON key columns from source
- Mapping specs with renamed ON keys apply on_map to SourceColumnRef
  so {"id": "s.id"} resolves correctly when on={"id": "uid"}
- Add tests for both scenarios
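
As a rough illustration of the ON-key auto-fill for partial insert specs, the sketch below uses plain tuples such as `("source", "uid")` as stand-ins for the typed refs. The function name `autofill_on_keys` and the assumption that `on` maps target key names to source key names (so `on={"id": "uid"}` joins target `id` with source `uid`) are made up for this example and are not taken from the PR's code.

```python
from typing import Any, Dict, Tuple

Expr = Tuple[str, Any]  # ("source", column_name) or ("lit", value); stand-in only


def autofill_on_keys(insert_spec: Dict[str, Expr],
                     on_map: Dict[str, str]) -> Dict[str, Expr]:
    """Hypothetical helper: add missing ON key columns to a partial insert spec.

    on_map is assumed to map target key column -> source key column.
    Keys the user already specified are left untouched.
    """
    filled = dict(insert_spec)
    for target_key, source_key in on_map.items():
        filled.setdefault(target_key, ("source", source_key))
    return filled


# A partial insert that omits the ON key "id" gets it filled from source "uid".
spec = autofill_on_keys({"status": ("lit", "new")}, {"id": "uid"})
```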
Raise a clear error for unexpected spec value types instead of
implicitly returning None, which would cause confusing downstream errors.
Explicit source_col("id") or "s.id" in mapping specs should refer
to the source column as written, not be silently rewritten via the
ON key rename map. The remap is only needed for update="*" expansion
and insert ON-key auto-fill, not for user-specified mappings.

Add test: source has both uid (ON key) and id columns,
update={"id": source_col("id")} should write source id=999,
not source uid=1.
source_col("id") with on={"id": "uid"} should raise ValueError
when source has no "id" column, not silently remap to "uid".
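
The "no silent remap" behavior described above can be illustrated with a small check that takes source references literally. This is a sketch under assumptions: refs are modeled as plain tuples like `("source", "id")`, and the name `referenced_source_columns` is hypothetical, not a function from this PR.

```python
from typing import Any, Dict, Iterable, Set, Tuple

Expr = Tuple[str, Any]  # ("source", name), ("target", name) or ("lit", value)


def referenced_source_columns(spec: Dict[str, Expr],
                              source_columns: Iterable[str]) -> Set[str]:
    """Hypothetical helper: collect the source columns a mapping spec needs.

    Names are used exactly as written; the ON key rename map is deliberately
    not consulted here, so a missing column is an error, not a remap.
    """
    available = set(source_columns)
    needed = set()
    for target_name, (kind, value) in spec.items():
        if kind != "source":
            continue
        if value not in available:
            raise ValueError(
                f"source column {value!r} (for target {target_name!r}) not found")
        needed.add(value)
    return needed
```

For example, with `on={"id": "uid"}` and a source that has both `uid` and `id`, the spec `{"age": ("source", "id")}` needs exactly the source column `id`. If the source has only `uid`, the same spec raises `ValueError` rather than reading `uid`.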
- t.col only accepted in WhenMatched update, not insert
- Omitted cols: matched preserves target, insert writes NULL
- ON key auto-filled from source in partial insert
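
The omitted-column semantics summarized above can be sketched at the row level with plain dicts. This is only an illustration of the described behavior, not how the Ray implementation works: the tuple-based expressions and the function names `evaluate`, `apply_matched`, and `apply_not_matched` are assumptions, and ON-key auto-fill is left out to keep the example short.

```python
from typing import Any, Dict, List, Optional, Tuple

Expr = Tuple[str, Any]  # ("source", name), ("target", name) or ("lit", value)
Row = Dict[str, Any]


def evaluate(expr: Expr, source_row: Row, target_row: Optional[Row] = None) -> Any:
    kind, value = expr
    if kind == "source":
        return source_row[value]
    if kind == "target":
        if target_row is None:
            # No target row exists for a not-matched insert.
            raise ValueError("target column refs are not allowed in insert specs")
        return target_row[value]
    if kind == "lit":
        return value
    raise TypeError(f"unsupported expression kind: {kind!r}")


def apply_matched(target_row: Row, source_row: Row, update_spec: Dict[str, Expr]) -> Row:
    # Start from the target row so omitted columns keep their current values.
    new_row = dict(target_row)
    for col, expr in update_spec.items():
        new_row[col] = evaluate(expr, source_row, target_row)
    return new_row


def apply_not_matched(source_row: Row, insert_spec: Dict[str, Expr],
                      target_columns: List[str]) -> Row:
    # Omitted columns are written as NULL (None).
    return {
        col: evaluate(insert_spec[col], source_row) if col in insert_spec else None
        for col in target_columns
    }
```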
In the test, update a non-key column (age) instead of the ON key (id)
to show that source_col("id") is not remapped, without implying that
changing join keys is supported.
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as ready for review June 3, 2026 13:26