Skip to content

feat(format): partitioned bulk ingest API#4317

Draft
tokoko wants to merge 1 commit into
apache:spec-1.2.0from
tokoko:partitioned-ingest
Draft

feat(format): partitioned bulk ingest API#4317
tokoko wants to merge 1 commit into
apache:spec-1.2.0from
tokoko:partitioned-ingest

Conversation

@tokoko
Copy link
Copy Markdown
Contributor

@tokoko tokoko commented May 17, 2026

demo PR for a new partitioned ingest API.

@lidavidm lidavidm added this to the ADBC API Specification 1.2.0 milestone May 20, 2026
@lidavidm
Copy link
Copy Markdown
Member

Do you want to target this against the spec-1.2.0 branch instead?

Comment on lines +2107 to +2108
struct AdbcConnection* connection, const char* target_catalog,
const char* target_db_schema, const char* target_table, const char* mode,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather we use the existing options here instead of duplicating them into the signature

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one caveat there is that existing options are statement-level in practice rather than connection-level. all these new functions are connection-level. are you also suggesting moving all (or some) to statement-level?

/// write; that happens at Commit (commit it) or Abort (drop it).
///
/// \since ADBC API revision 1.2.0
struct AdbcIngestReceipt {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to explain the receipt/handle explicitly here instead of leaving it implicit from the below definitions.

Additionally I think the comments could generally be cleaned up.

Comment on lines +2198 to +2199
/// Abort is best-effort. If cleanup is incomplete, the driver
/// returns a warning status and orphaned storage may remain; it is
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a "warning status"?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this would leverage ConnectionSetWarningHandler?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant some sort of an "okish" adbc status, but warning handler sounds like a more appropriate solution. unless you think it's not necessary at all.. just documenting it as best-effort could be enough as well.

Comment on lines +2200 to +2201
/// the driver's responsibility to provide housekeeping (e.g. TTL,
/// background GC, or documented manual cleanup). Callers may also
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds more like the semantics of the backend system

Copy link
Copy Markdown
Contributor

@CurtHagenlocher CurtHagenlocher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial thoughts.

AdbcStatusCode (*ConnectionWriteIngestPartition)(
struct AdbcConnection*, const uint8_t*, size_t, struct ArrowArrayStream*,
struct AdbcIngestReceipt*, struct AdbcError*);
AdbcStatusCode (*ConnectionCommitIngestPartitions)(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeating my comment from the mailing list, is this use of Commit going to be misleading around its relationship with a database transactional commit?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdyt about "complete" or maybe "apply"? I'm fine with changing to "complete" if you agree.

Comment on lines +2198 to +2199
/// Abort is best-effort. If cleanup is incomplete, the driver
/// returns a warning status and orphaned storage may remain; it is
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this would leverage ConnectionSetWarningHandler?

Comment on lines +2202 to +2203
/// call Abort if the coordinator crashed and was restarted without
/// the original receipts.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth distinguishing the case of "Abort an unknown handle" from the case of "something went wrong during Abort" with a specific error code for the former?

Comment on lines +990 to +991
struct AdbcIngestHandle;
struct AdbcIngestReceipt;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only role these seem to play is to have an API that allows the corresponding opaque buffer to be deallocated. We're also not getting any type safety from having two of them because both are just byte arrays post-serialization. Could we simplify and generalize a little and declare a single type for this e.g. AdbcSerializableHandle?

/// point to an AdbcDriver.
///
/// \since ADBC API revision 1.2.0
#define ADBC_VERSION_1_2_0 1002000
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please resubmit this as a PR against the spec-1.2.0 branch where this should already be defined?

@tokoko tokoko force-pushed the partitioned-ingest branch from 95d027c to 310e138 Compare May 29, 2026 04:11
@tokoko tokoko changed the base branch from main to spec-1.2.0 May 29, 2026 04:11
@tokoko
Copy link
Copy Markdown
Contributor Author

tokoko commented May 29, 2026

@lidavidm should I leave only the spec changes in the PR against spec branch? or postgres impl as well?

@lidavidm
Copy link
Copy Markdown
Member

Impl is fine as well, but the branch seems to have picked up a lot of extra commits

Adds the spec document and adbc.h declarations for partitioned bulk
ingest — the write-side mirror of ExecutePartitions/ReadPartition.
@tokoko tokoko force-pushed the partitioned-ingest branch from 310e138 to c264c82 Compare June 2, 2026 20:41
@tokoko tokoko changed the title Partitioned ingest API feat(format): partitioned bulk ingest API Jun 2, 2026
@tokoko
Copy link
Copy Markdown
Contributor Author

tokoko commented Jun 2, 2026

some of the core cpp adbc manager codebase differs between main and spec-1.2.0. so I left only spec changes here. kept the impl in another branch on my repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants