[SPARK-55748][SQL] Use `DSv2` for `avro|csv|json|kafka|orc|parquet|text` by default by dongjoon-hyun · Pull Request #54547 · apache/spark

dongjoon-hyun · 2026-02-27T17:13:54Z

What changes were proposed in this pull request?

As the title states, this PR proposes using `DSv2` by default for the `avro`, `csv`, `json`, `kafka`, `orc`, `parquet`, and `text` sources.

Why are the changes needed?

DSv2 is now recommended over DSv1.

Does this PR introduce any user-facing change?

Yes.
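
For readers who want to see what this user-facing change could mean in practice, here is a minimal sketch. It assumes the V1 fallback is controlled by the internal Spark SQL setting `spark.sql.sources.useV1SourceList`, whose default lists the same seven sources named in the title; the PR description does not say how the default is changed, so treat that link as an assumption. The helper names `v1_fallback_conf` and `as_submit_flag` are hypothetical and only build the configuration value; they do not call Spark.

```python
# Sources named in the PR title. Assumption: these are the ones currently
# listed in spark.sql.sources.useV1SourceList and therefore fall back to DSv1.
PR_SOURCES = ["avro", "csv", "json", "kafka", "orc", "parquet", "text"]

# Internal Spark SQL setting assumed to control the V1 fallback.
V1_LIST_KEY = "spark.sql.sources.useV1SourceList"


def v1_fallback_conf(keep_on_v1):
    """Return a (key, value) pair that pins the given sources to DSv1."""
    wanted = set(keep_on_v1)
    unknown = sorted(wanted - set(PR_SOURCES))
    if unknown:
        raise ValueError(f"not covered by this PR: {unknown}")
    # Keep the order used in the PR title so the value is stable.
    value = ",".join(s for s in PR_SOURCES if s in wanted)
    return V1_LIST_KEY, value


def as_submit_flag(keep_on_v1):
    """Format the setting as a spark-submit style --conf argument."""
    key, value = v1_fallback_conf(keep_on_v1)
    return f"--conf {key}={value}"


if __name__ == "__main__":
    # Pin every source from the title back to DSv1.
    print(as_submit_flag(PR_SOURCES))
    # Pin only the sources a job writes with, leaving the rest on DSv2.
    print(as_submit_flag(["parquet", "orc"]))
```

Under that assumption, passing the full list would pin all seven sources to the V1 path, while a shorter list would pin only the sources where a job depends on V1-only behavior.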

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Gemini 3.1 Pro (High) on Antigravity

dongjoon-hyun · 2026-02-27T17:17:16Z

cc @huaxingao as the release manager of Apache Spark 4.2.0.

peter-toth · 2026-02-27T17:22:31Z

Are DSv2 file sources on par with v1? I thought features like DPP haven't been implemented yet.

yaooqinn · 2026-02-27T17:26:11Z

I'm not very confident about this change :)

LuciferYang · 2026-02-27T17:27:36Z

> Are DSv2 file sources on par with v1? I thought features like DPP haven't been implemented yet.

There was previously a PR adding DPP support to DSv2 file sources, but it wasn't merged:

[SPARK-30628][SQL] Support Subquery partition pruning and DPP for V2 file source #37514

dongjoon-hyun · 2026-02-27T17:30:48Z

Thank you for the feedback, @peter-toth, @yaooqinn, @LuciferYang.

gengliangwang · 2026-02-27T17:31:31Z

@dongjoon-hyun When I previously worked on File Source V2, the DSV2 framework was still relatively immature.

Even today, there are still several missing pieces on the write path, including:
• Partitioned writes
• Bucketing
• Custom partition locations (e.g., ALTER TABLE ... SET LOCATION)
• Sort-before-write
• Overwrite modes
• etc.

Revisiting and completing the migration could be an interesting project now that the DSV2 framework is much more mature. In addition, LLM-assisted development could help accelerate parts of the implementation.

viirya · 2026-02-27T18:03:43Z

> @dongjoon-hyun When I previously worked on File Source V2, the DSV2 framework was still relatively immature.
>
> Even today, there are still several missing pieces on the write path, including:
> • Partitioned writes
> • Bucketing
> • Custom partition locations (e.g., ALTER TABLE ... SET LOCATION)
> • Sort-before-write
> • Overwrite modes
> • etc.
>
> Revisiting and completing the migration could be an interesting project now that the DSV2 framework is much more mature. In addition, LLM-assisted development could help accelerate parts of the implementation.

If we don't have them yet, can we open tickets for these known limitations of File Source V2? That would give us an explicit list to track and work through on the way to this goal.

dongjoon-hyun mentioned this pull request Feb 27, 2026 in [SPARK-55716][SQL] Support NOT NULL constraint enforcement for V1 file source table inserts #54517 (Open)

dongjoon-hyun requested review from HyukjinKwon, LuciferYang, cloud-fan, gengliangwang, kiszk, maropu, peter-toth, sarutak, sunchao, viirya and yaooqinn February 27, 2026 17:15

dongjoon-hyun closed this Feb 28, 2026

dongjoon-hyun deleted the SPARK-55748 branch February 28, 2026 08:54

dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-55748


6 participants
