[python] introduce BlobConsumer mirroring Java module #8105

Merged
JingsongLi merged 3 commits into apache:master from steFaiz:py_blob_consumer
Jun 3, 2026

Conversation


steFaiz commented Jun 3, 2026

Purpose

This PR mirrors the Java-side BlobConsumer introduced in #7074.

The same restriction applies:

  1. If a blob consumer is set (not None), temporary files are not deleted on abort.
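
A minimal sketch of this restriction, assuming a hypothetical writer class (`SketchBlobWriter`, its `blob_consumer` argument, and `demo_abort` are illustrative names, not the pypaimon API). It only models the rule stated above: on abort, temporary files are cleaned up when no consumer is registered and are left in place otherwise.

```python
import os
import tempfile
from typing import Callable, List, Optional


class SketchBlobWriter:
    """Hypothetical writer, NOT the pypaimon API; it only models the abort rule."""

    def __init__(self, directory: str,
                 blob_consumer: Optional[Callable[[str], None]] = None):
        self.directory = directory
        self.blob_consumer = blob_consumer
        self.temp_files: List[str] = []

    def write(self, name: str, payload: bytes) -> str:
        path = os.path.join(self.directory, name)
        with open(path, "wb") as f:
            f.write(payload)
        self.temp_files.append(path)
        # Report the written file to the consumer, if one is registered.
        if self.blob_consumer is not None:
            self.blob_consumer(path)
        return path

    def abort(self) -> None:
        # Rule from the PR description: with a consumer set, keep the files.
        if self.blob_consumer is None:
            for path in self.temp_files:
                if os.path.exists(path):
                    os.remove(path)
        self.temp_files.clear()


def demo_abort(with_consumer: bool) -> bool:
    """Return True if the written file still exists after abort()."""
    seen: List[str] = []
    with tempfile.TemporaryDirectory() as directory:
        consumer = seen.append if with_consumer else None
        writer = SketchBlobWriter(directory, consumer)
        path = writer.write("blob-0.tmp", b"payload")
        writer.abort()
        return os.path.exists(path)
```

In this sketch, `demo_abort(True)` reports that the file survives the abort, while `demo_abort(False)` reports that it was deleted.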

Tests

@JingsongLi
Contributor

Are there any application scenarios?


steFaiz commented Jun 3, 2026

> Are there any application scenarios?

I think this is currently a rather temporary solution. The existing raw data tables (the original tables) are typically built on an ODPS + OSS pipeline, and the downstream dependency chain is highly complex: for instance, dozens of ODPS tables might depend on a single raw table.

After the original table is switched to Paimon, the downstream ODPS tables cannot be replaced by Paimon immediately. We need to switch the whole chain gradually.

For example:

  1. The original source, ODPS + OSS, is replaced by ODPS + Paimon:
    a. The original data is dual-written to both Paimon and ODPS.
    b. Previously, ODPS stored structured columns + an OSS path.
    c. Now, ODPS stores structured columns + a Paimon BlobDescriptor.
  2. Downstream ODPS tables only need to change their parsing logic, from parsing OSS paths to parsing Paimon BlobDescriptors.
  3. Gradually replace all ODPS tables with Paimon tables.

BlobConsumer only serves the first step: after writing a batch of Paimon records, we can write the blob descriptors into ODPS immediately.
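
The first step could be sketched roughly as follows. Everything here is an assumption made for illustration: `SketchDescriptor` (with assumed `uri`, `offset`, `length` fields), `SketchBatchWriter`, `mirror_batch`, and the placeholder URI are hypothetical and do not reflect the actual pypaimon classes or signatures. A plain list stands in for the ODPS table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SketchDescriptor:
    """Hypothetical stand-in for a Paimon blob descriptor (fields are assumed)."""
    uri: str
    offset: int
    length: int


class SketchBatchWriter:
    """Hypothetical batch writer that reports each written blob to a consumer."""

    def __init__(self, data_file_uri: str,
                 consumer: Callable[[str, SketchDescriptor], None]):
        self.data_file_uri = data_file_uri
        self.consumer = consumer
        self.position = 0  # running byte offset inside the (imaginary) blob file

    def write_batch(self, blobs: Dict[str, bytes]) -> None:
        for key, payload in blobs.items():
            descriptor = SketchDescriptor(self.data_file_uri, self.position, len(payload))
            self.position += len(payload)
            # Hand the descriptor over as soon as the blob has been "written".
            self.consumer(key, descriptor)


def mirror_batch(blobs: Dict[str, bytes]) -> List[dict]:
    """Write one batch and collect the rows that would be sent to ODPS."""
    odps_rows: List[dict] = []  # stand-in for the ODPS table

    def to_odps(key: str, descriptor: SketchDescriptor) -> None:
        odps_rows.append({
            "id": key,
            "uri": descriptor.uri,
            "offset": descriptor.offset,
            "length": descriptor.length,
        })

    # Placeholder URI, not a real location.
    writer = SketchBatchWriter("oss://example-bucket/sketch/blob-0", to_odps)
    writer.write_batch(blobs)
    return odps_rows
```

The point of the callback shape is that the ODPS side receives each descriptor during the batch write, instead of reading the Paimon table back afterwards.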


JingsongLi left a comment


Thanks for adding the Python BlobConsumer path. I left two comments about the new tests so the coverage actually protects the lifecycle behavior.

Comment thread paimon-python/pypaimon/tests/blob_table_test.py
Comment thread paimon-python/pypaimon/tests/blob_table_test.py Outdated

JingsongLi left a comment


Thanks for the update. The previous test-placement and abort-coverage issues are fixed now, and I verified the new BlobConsumer tests locally with python3 -m unittest pypaimon.tests.blob_table_test.BlobConsumerTest.

@JingsongLi
Contributor

+1

JingsongLi merged commit 4a71298 into apache:master Jun 3, 2026
6 checks passed