feat(format): partitioned bulk ingest API#4317
Conversation
|
Do you want to target this against the spec-1.2.0 branch instead? |
| struct AdbcConnection* connection, const char* target_catalog, | ||
| const char* target_db_schema, const char* target_table, const char* mode, |
There was a problem hiding this comment.
I'd rather we use the existing options here instead of duplicating them into the signature
There was a problem hiding this comment.
one caveat there is that existing options are statement-level in practice rather than connection-level. all these new functions are connection-level. are you also suggesting moving all (or some) to statement-level?
| /// write; that happens at Commit (commit it) or Abort (drop it). | ||
| /// | ||
| /// \since ADBC API revision 1.2.0 | ||
| struct AdbcIngestReceipt { |
There was a problem hiding this comment.
It would be good to explain the receipt/handle explicitly here instead of leaving it implicit from the below definitions.
Additionally I think the comments could generally be cleaned up.
| /// Abort is best-effort. If cleanup is incomplete, the driver | ||
| /// returns a warning status and orphaned storage may remain; it is |
There was a problem hiding this comment.
Perhaps this would leverage ConnectionSetWarningHandler?
There was a problem hiding this comment.
I meant some sort of an "okish" adbc status, but warning handler sounds like a more appropriate solution. unless you think it's not necessary at all.. just documenting it as best-effort could be enough as well.
| /// the driver's responsibility to provide housekeeping (e.g. TTL, | ||
| /// background GC, or documented manual cleanup). Callers may also |
There was a problem hiding this comment.
This sounds more like the semantics of the backend system
CurtHagenlocher
left a comment
There was a problem hiding this comment.
Some initial thoughts.
| AdbcStatusCode (*ConnectionWriteIngestPartition)( | ||
| struct AdbcConnection*, const uint8_t*, size_t, struct ArrowArrayStream*, | ||
| struct AdbcIngestReceipt*, struct AdbcError*); | ||
| AdbcStatusCode (*ConnectionCommitIngestPartitions)( |
There was a problem hiding this comment.
Repeating my comment from the mailing list, is this use of Commit going to be misleading around its relationship with a database transactional commit?
There was a problem hiding this comment.
wdyt about "complete" or maybe "apply"? I'm fine with changing to "complete" if you agree.
| /// Abort is best-effort. If cleanup is incomplete, the driver | ||
| /// returns a warning status and orphaned storage may remain; it is |
There was a problem hiding this comment.
Perhaps this would leverage ConnectionSetWarningHandler?
| /// call Abort if the coordinator crashed and was restarted without | ||
| /// the original receipts. |
There was a problem hiding this comment.
Is it worth distinguishing the case of "Abort an unknown handle" from the case of "something went wrong during Abort" with a specific error code for the former?
| struct AdbcIngestHandle; | ||
| struct AdbcIngestReceipt; |
There was a problem hiding this comment.
The only role these seem to play is to have an API that allows the corresponding opaque buffer to be deallocated. We're also not getting any type safety from having two of them because both are just byte arrays post-serialization. Could we simplify and generalize a little and declare a single type for this e.g. AdbcSerializableHandle?
| /// point to an AdbcDriver. | ||
| /// | ||
| /// \since ADBC API revision 1.2.0 | ||
| #define ADBC_VERSION_1_2_0 1002000 |
There was a problem hiding this comment.
Can you please resubmit this as a PR against the spec-1.2.0 branch where this should already be defined?
95d027c to
310e138
Compare
|
@lidavidm should I leave only the spec changes in the PR against spec branch? or postgres impl as well? |
|
Impl is fine as well, but the branch seems to have picked up a lot of extra commits |
Adds the spec document and adbc.h declarations for partitioned bulk ingest — the write-side mirror of ExecutePartitions/ReadPartition.
310e138 to
c264c82
Compare
|
some of the core cpp adbc manager codebase differs between main and spec-1.2.0. so I left only spec changes here. kept the impl in another branch on my repo. |
demo PR for a new partitioned ingest API.