[WIP][ray] Ray merge into #8028

Draft
XiaoHongbo-Hope wants to merge 10 commits into apache:master from XiaoHongbo-Hope:ray_merge_into
Conversation

@XiaoHongbo-Hope
Contributor

Purpose

Tests

Adds a Pythonic MERGE INTO for Ray Datasets, mirroring the Spark/Flink merge-into.
UPSERT-flavored clauses (matched-update, not-matched-insert,
not-matched-by-source-update) are supported; DELETE clauses raise
NotImplementedError pending KeyValueDataWriter row-kind work.

API:
    from pypaimon.ray import merge_paimon
    merge_paimon(target, source, catalog_options,
                 on=[...],
                 when_matched_update={...},
                 when_not_matched_insert="*")

Algorithm: read target -> tag _side -> union -> groupby(on).map_groups
to classify matched/not-matched and apply SET; write back via write_paimon
(PK upsert through _SEQUENCE_NUMBER).
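
The classification step above can be modeled in plain Python, without Ray or Paimon. This is only an illustrative sketch, not the code in this PR: `merge_rows` is a hypothetical helper, rows are plain dicts, `when_matched_update` is assumed to map a target column to the source column that supplies its new value, and each key is assumed to appear at most once per side. The not-matched-by-source clause is left out for brevity.

```python
from collections import defaultdict


def merge_rows(target_rows, source_rows, on,
               when_matched_update=None, when_not_matched_insert=None):
    """Hypothetical model of: tag _side -> union -> group by key -> classify."""
    # Tag every row with the side it came from, then union both inputs.
    tagged = [("target", r) for r in target_rows]
    tagged += [("source", r) for r in source_rows]

    # Group the unioned rows by the join key (the `on` columns).
    groups = defaultdict(lambda: {"target": [], "source": []})
    for side, row in tagged:
        key = tuple(row[col] for col in on)
        groups[key][side].append(row)

    changed = []
    for sides in groups.values():
        t_rows, s_rows = sides["target"], sides["source"]
        if t_rows and s_rows:
            # Matched: apply the SET mapping (assumed target_col -> source_col).
            if when_matched_update is not None:
                merged = dict(t_rows[0])
                for target_col, source_col in when_matched_update.items():
                    merged[target_col] = s_rows[0][source_col]
                changed.append(merged)
        elif s_rows:
            # Not matched: insert the source row as-is when "*" is requested.
            if when_not_matched_insert == "*":
                changed.extend(dict(r) for r in s_rows)
        # Target-only groups are untouched and need no write in an upsert.
    return changed


target = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
source = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
result = merge_rows(target, source, on=["id"],
                    when_matched_update={"v": "v"},
                    when_not_matched_insert="*")
```

Only changed rows are returned here because the write-back is described as a primary-key upsert, so untouched target rows would not need rewriting under that assumption.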

Known bugs to fix in follow-up:
- _schema_type_map referenced but never defined (NameError on call)
- for f in batch.schema iterates pa.Schema (TypeError on pyarrow >= 18)
- type-mismatch fallback to pa.null() destroys join keys
- test helper _make_pk_table_with_flag returns 1 value, test unpacks 2
- _schema_type_map called but undefined: NameError on any cross-schema merge.
- for f in batch.schema raises TypeError on pyarrow >= 18.
- type-mismatch fallback to pa.null() drops join key values.
- _make_pk_table_with_flag returned 1 value but caller unpacks 2.
…rop API

- pa.Table.drop deprecated in newer pyarrow; switch to drop_columns.
- matched branch silently produced cartesian product on multiple source rows.
- _required_target_cols_for_passthrough widened projection to all columns
  when its spec was None, defeating the projection optimization.
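
One way to guard against the cartesian-product case noted above is to reject a source whose join keys are not unique before classifying groups. The sketch below is an assumption about a possible approach, not the fix applied in this PR: both helper names are hypothetical, rows are plain dicts, and raising `ValueError` is an illustrative choice.

```python
from collections import Counter


def find_duplicate_source_keys(source_rows, on):
    """Return the join keys that occur more than once in the source (hypothetical helper)."""
    counts = Counter(tuple(row[col] for col in on) for row in source_rows)
    return sorted(key for key, n in counts.items() if n > 1)


def check_unique_source_keys(source_rows, on):
    """Raise instead of letting one target row match several source rows."""
    duplicates = find_duplicate_source_keys(source_rows, on)
    if duplicates:
        raise ValueError(
            f"multiple source rows share join key(s) {duplicates}; "
            "the matched update would be ambiguous"
        )


rows = [{"id": 1, "v": "a"}, {"id": 1, "v": "b"}, {"id": 2, "v": "c"}]
duplicates = find_duplicate_source_keys(rows, on=["id"])
```

Failing fast keeps the matched branch at one output row per key; deduplicating the source by some ordering column would be an alternative design if silent resolution is preferred.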
@XiaoHongbo-Hope changed the title from "[python][ray] Ray merge into" to "[WIP][ray] Ray merge into" on May 28, 2026