Skip to content

fix chunked copying of large csv rows#1062

Merged
levkk merged 2 commits into
pgdogdev:mainfrom
LnL7:fix/chunked-csv-copy
Jun 12, 2026
Merged

fix chunked copying of large csv rows#1062
levkk merged 2 commits into
pgdogdev:mainfrom
LnL7:fix/chunked-csv-copy

Conversation

@LnL7

@LnL7 LnL7 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

I ran into this issue when importing data using psql from a csv for a table with a relatively big jsonb config column.

\copy public.table from 'table.csv' WITH NULL 'CSV_NONE_SENTINEL' CSV HEADER;

For large rows a partial message seems to be forwarded to the postgres backend resulting in an error like this. I'm not sure if this patch is the best approach to handle this case but so far it seems to be working for my usecase.

[88115] ERROR:  unterminated CSV quoted field
[88115] CONTEXT:  COPY copy_csv_large_test, line 1: "1,{"value":"eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee..."
[88115] STATEMENT:  COPY copy_csv_large_test (id, data) FROM STDIN (FORMAT CSV)

@levkk

levkk commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

Interesting. I need to dig into this to understand the root cause (unless you have already).

@codecov

codecov Bot commented Jun 11, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
pgdog/src/backend/pool/connection/mod.rs 0.00% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

@LnL7

LnL7 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

Not entirely to be honest, maybe part of the query gets sent twice once partial and once fully buffered the entire row?
The test case reproduces the scenario so that hopefully helps as a starting point.

@CLAassistant

CLAassistant commented Jun 11, 2026

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

@levkk

levkk commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

I understand what's going on. The row is crossing the CooyData message boundary and we're sending it as-is to all shards instead of parsing it. Nice catch. I'll just double check the implementation of the fix one more time for performance, but otherwise... really good find!

} else {
self.send(client_request).await?;
}
self.send(&client_request.without_copy_data()).await?;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an assumption of protocol correctness here: if the client sends incomplete CopyData rows and a CopyDone, we will forward the CopyDone to the shards, causing them to close the copy operation. We should make an assertion here that the request doesn't contain both CopyData and CopyDone messages.

It would technically be the client's fault for sending an incomplete request, but we still shouldn't allow it go fly through.

@levkk levkk merged commit 41c6609 into pgdogdev:main Jun 12, 2026
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants