[python] introduce BlobConsumer mirroring Java module #8105
Are there any application scenarios?
I think this is currently a rather temporary solution. The existing raw data tables (the original tables) are typically built on an ODPS + OSS pipeline, and there is a highly complex downstream dependency chain: for instance, dozens of ODPS tables might depend on a single raw table. After the original table is switched to Paimon, the downstream ODPS tables cannot be replaced by Paimon immediately. We need to switch the whole chain gradually. Like:
BlobConsumer is just for the first step: after writing a batch of Paimon records, we can write the blob descriptors into ODPS immediately.
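To make the "first step" concrete, here is a minimal sketch of the callback pattern described above, under stated assumptions: every name in it (`BlobRef`, `BatchWriter`, `forward_to_downstream`) is hypothetical and is not the pypaimon or Java API. It only illustrates forwarding blob references to a downstream sink once a batch is flushed, and dropping them when the batch is aborted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class BlobRef:
    """Hypothetical stand-in for a blob descriptor: where the blob bytes live."""
    uri: str
    offset: int
    length: int


class BatchWriter:
    """Toy writer (not the real API): buffers blob references and hands
    them to a consumer callback only when the batch is flushed."""

    def __init__(self, consumer: Callable[[str, BlobRef], None]) -> None:
        self._consumer = consumer
        self._pending: List[Tuple[str, BlobRef]] = []

    def write(self, field: str, ref: BlobRef) -> None:
        self._pending.append((field, ref))

    def flush(self) -> int:
        # Notify the consumer once per buffered blob, then reset the buffer.
        for field, ref in self._pending:
            self._consumer(field, ref)
        count = len(self._pending)
        self._pending.clear()
        return count

    def abort(self) -> None:
        # Discard the batch; the consumer never sees these references.
        self._pending.clear()


# A plain list stands in for the downstream (e.g. ODPS) table.
downstream_rows: List[Dict[str, object]] = []


def forward_to_downstream(field: str, ref: BlobRef) -> None:
    downstream_rows.append(
        {"field": field, "uri": ref.uri, "offset": ref.offset, "length": ref.length}
    )


writer = BatchWriter(forward_to_downstream)
writer.write("image", BlobRef("oss://bucket/data-0.blob", 0, 1024))
writer.write("image", BlobRef("oss://bucket/data-0.blob", 1024, 2048))
flushed = writer.flush()

# An aborted batch must not reach the downstream table.
writer.write("image", BlobRef("oss://bucket/data-1.blob", 0, 512))
writer.abort()
```

The abort path is included because lifecycle behavior is what the review asks the tests to protect; the real consumer's contract may differ from this sketch.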
JingsongLi
left a comment
Thanks for adding the Python BlobConsumer path. I left two comments about the new tests so the coverage actually protects the lifecycle behavior.
JingsongLi
left a comment
Thanks for the update. The previous test-placement and abort-coverage issues are fixed now, and I verified the new BlobConsumer tests locally with python3 -m unittest pypaimon.tests.blob_table_test.BlobConsumerTest.
+1
Purpose
This PR mirrors the Java-side BlobConsumer introduced in #7074. The same restriction:
Tests