[spark] Support FROM (query) export in COPY INTO location #8096
Open
JunRuiLee wants to merge 1 commit into
cc @Zouxxyy to take a look.
Purpose
Extend `COPY INTO <location>` (export) to accept an inline query as the source, not just a table. Previously only `FROM table_name` was supported.

The inline query is parsed through the session (Paimon) parser, so it behaves exactly like the same query run via `spark.sql`, including Paimon parser rules such as the v1 function rewrite.
Design notes:
- A `parenBlock` rule captures the parenthesized source with balanced parentheses, so nested parens (e.g. `WHERE id IN (1, 2)`) are matched correctly. The raw text is re-parsed by the AST builder.
- Only queries are accepted as the source (`SELECT`, `WITH ... SELECT`, `VALUES`, ...); statements with side effects are rejected by inspecting the parsed plan for `Command` / `ParsedStatement` / `InsertIntoDir` nodes. DDL/DML such as `DROP`, `INSERT`, and `INSERT OVERWRITE DIRECTORY` are rejected before any execution, so they cannot run.
- [...] the logical command and the physical exec, so impossible states cannot be constructed.
- `rows_written` is counted before the write; for a non-deterministic query it may differ from the actual output. This is documented; the result is intentionally not staged, so the export does not consume extra executor disk.
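The `rows_written` caveat from the design notes can be illustrated with a small Python sketch. This is not the PR's code: it assumes the count and the write are two separate evaluations of the same query with nothing staged in between, and `FlakyQuery`, `export`, and the in-memory `sink` are hypothetical names invented for illustration.

```python
import itertools
from typing import Iterator, List


class FlakyQuery:
    """Hypothetical stand-in for a non-deterministic query.

    Each evaluation yields one more row than the previous one, mimicking
    a source whose result changes between runs.
    """

    def __init__(self) -> None:
        self._runs = itertools.count(1)

    def evaluate(self) -> Iterator[int]:
        n = next(self._runs)
        return iter(range(n))


def export(query: FlakyQuery, sink: List[int]) -> int:
    """Count first, then write, without staging the result in between."""
    # First evaluation: only used to compute the reported count.
    rows_written = sum(1 for _ in query.evaluate())
    # Second evaluation: the rows that actually reach the output.
    sink.extend(query.evaluate())
    return rows_written


sink: List[int] = []
reported = export(FlakyQuery(), sink)
print(reported, len(sink))
```

In this toy run the reported count is 1 while 2 rows reach the sink. Staging the result once would make the two numbers agree, at the cost of the extra disk usage the design notes choose to avoid.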
This is part of #8005.
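The balanced-parentheses capture described in the design notes can be sketched as a hand-written scanner. The PR itself uses a grammar rule (`parenBlock`), so the Python below is only an illustration of the idea under simplifying assumptions: `capture_paren_block` is a hypothetical helper, it only understands single-quoted string literals with `''` escapes, and it ignores comments and other quoting styles.

```python
def capture_paren_block(text: str, start: int) -> str:
    """Return the text between the '(' at `start` and its matching ')'.

    Single-quoted string literals are skipped so that a parenthesis
    inside a literal does not affect the nesting depth.
    """
    if start >= len(text) or text[start] != "(":
        raise ValueError("expected '(' at start position")
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "'":
            # Skip to the closing quote; '' inside a literal is an escaped quote.
            i += 1
            while i < len(text):
                if text[i] == "'":
                    if i + 1 < len(text) and text[i + 1] == "'":
                        i += 1  # escaped quote, stay inside the literal
                    else:
                        break
                i += 1
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return text[start + 1 : i]
        i += 1
    raise ValueError("unbalanced parentheses")


# Illustrative fragment only; the trailing text stands in for any options.
fragment = "FROM (SELECT * FROM t WHERE id IN (1, 2)) trailing options"
inner = capture_paren_block(fragment, fragment.index("("))
print(inner)  # SELECT * FROM t WHERE id IN (1, 2)
```

The captured raw text is what would then be handed to a parser, which matches the note that the source is re-parsed rather than interpreted by the capturing rule.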
Tests
- `CopyIntoTestBase`: CSV/JSON/Parquet export from a query, aggregation, nested parentheses, `OVERWRITE = TRUE`, `VALUES` source, empty-query rejection, and rejection of side-effecting statements (with assertions that the source table is untouched and no files are written).
- `PaimonV1FunctionTestBase`: exporting `FROM (SELECT <v1_function>(...))` resolves correctly through the session parser.
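The side-effect rejection exercised by these tests amounts to walking the parsed plan before anything executes. The Python sketch below models that idea with toy classes. It is not the PR's implementation: `PlanNode`, `Relation`, `Project`, `walk`, and `reject_side_effects` are hypothetical, and the three rejected classes only borrow the node names mentioned in the description.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class PlanNode:
    """Hypothetical, simplified logical-plan node with child nodes."""
    children: List["PlanNode"] = field(default_factory=list)


# Stand-ins for ordinary query nodes (names chosen for illustration).
class Relation(PlanNode): ...
class Project(PlanNode): ...

# Stand-ins named after the node kinds the description lists as rejected.
class Command(PlanNode): ...
class ParsedStatement(PlanNode): ...
class InsertIntoDir(PlanNode): ...

SIDE_EFFECTING = (Command, ParsedStatement, InsertIntoDir)


def walk(plan: PlanNode) -> Iterator[PlanNode]:
    """Yield the node and all of its descendants, depth first."""
    yield plan
    for child in plan.children:
        yield from walk(child)


def reject_side_effects(plan: PlanNode) -> None:
    """Raise before execution if any node in the plan has side effects."""
    for node in walk(plan):
        if isinstance(node, SIDE_EFFECTING):
            raise ValueError(
                f"COPY INTO source must be a query, found {type(node).__name__}"
            )


# A plain query plan passes the check silently.
reject_side_effects(Project(children=[Relation()]))

# A plan containing a side-effecting node is rejected without being run.
try:
    reject_side_effects(InsertIntoDir(children=[Project(children=[Relation()])]))
    message = ""
except ValueError as err:
    message = str(err)
print(message)
```

Because the check only inspects the plan, nothing has been executed at the point of rejection, which is what lets a test assert that the source table is untouched and no files were written.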