fix chunked copying of large csv rows #1062
Interesting. I need to dig into this to understand the root cause (unless you have already).
Not entirely, to be honest. Maybe part of the query gets sent twice: once as a partial row and once after the entire row has been buffered?
I understand what's going on. The row is crossing the CopyData message boundary and we're sending it as-is to all shards instead of parsing it. Nice catch. I'll just double check the implementation of the fix one more time for performance, but otherwise... really good find!
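To illustrate the boundary problem described above, here is a minimal Rust sketch of buffering rows across CopyData messages. It is not the project's actual implementation: `RowBuffer`, `push`, and `has_partial` are hypothetical names, and the sketch assumes every row ends with a newline, ignoring newlines inside quoted CSV fields.

```rust
/// Hypothetical helper: accumulates CopyData payloads and yields only complete rows.
#[derive(Default)]
struct RowBuffer {
    /// Bytes received so far that do not yet form a complete row.
    pending: Vec<u8>,
}

impl RowBuffer {
    /// Append one CopyData payload and return every complete row now available.
    /// Bytes after the last newline stay buffered until the next payload arrives.
    fn push(&mut self, payload: &[u8]) -> Vec<Vec<u8>> {
        self.pending.extend_from_slice(payload);
        let mut rows = Vec::new();
        // Assumption: a newline always terminates a row (no quoted newlines).
        while let Some(pos) = self.pending.iter().position(|&b| b == b'\n') {
            // `drain(..=pos)` removes the row together with its trailing newline.
            rows.push(self.pending.drain(..=pos).collect());
        }
        rows
    }

    /// True while a partial row is still waiting for more data.
    fn has_partial(&self) -> bool {
        !self.pending.is_empty()
    }
}
```

With this approach, a large row split across two CopyData messages yields nothing on the first `push` and a complete row on the second, so a partial row is never forwarded on its own.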
} else {
    self.send(client_request).await?;
}
self.send(&client_request.without_copy_data()).await?;
There is an assumption of protocol correctness here: if the client sends incomplete CopyData rows and a CopyDone, we will forward the CopyDone to the shards, causing them to close the copy operation. We should make an assertion here that the request doesn't contain both CopyData and CopyDone messages.
It would technically be the client's fault for sending an incomplete request, but we still shouldn't let it through.
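The check suggested above could look roughly like the following Rust sketch. Everything here is an assumption for illustration: `Message`, `CopyError`, and `check_request` are hypothetical names rather than the project's types, and the sketch returns an error instead of panicking so the caller can decide how to report the problem to the client.

```rust
/// Hypothetical, simplified view of the client messages relevant to COPY.
#[derive(Debug, PartialEq)]
enum Message {
    CopyData(Vec<u8>),
    CopyDone,
}

/// Hypothetical error type for rejected requests.
#[derive(Debug, PartialEq)]
enum CopyError {
    /// The request contains both CopyData and CopyDone messages.
    DataWithDone,
}

/// Reject a request that carries both CopyData and CopyDone, so that a
/// CopyDone is never forwarded to the shards alongside unparsed row data.
fn check_request(messages: &[Message]) -> Result<(), CopyError> {
    let has_data = messages.iter().any(|m| matches!(m, Message::CopyData(_)));
    let has_done = messages.iter().any(|m| matches!(m, Message::CopyDone));
    if has_data && has_done {
        Err(CopyError::DataWithDone)
    } else {
        Ok(())
    }
}
```

A request holding only CopyData or only CopyDone passes this check; only the mixed case is rejected.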
I ran into this issue when importing data with psql from a CSV file into a table with a relatively large jsonb config column.
For large rows, a partial message seems to be forwarded to the Postgres backend, resulting in an error like this. I'm not sure this patch is the best way to handle the case, but so far it seems to work for my use case.