[SPARK-55748][SQL] Use DSv2 for avro|csv|json|kafka|orc|parquet|text by default #54547

Closed
dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-55748

Conversation

@dongjoon-hyun
Member

What changes were proposed in this pull request?

This PR aims to use DSv2 for avro|csv|json|kafka|orc|parquet|text by default.
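
For context, current Spark releases decide which built-in sources fall back to DSv1 through the `spark.sql.sources.useV1SourceList` configuration, whose default lists exactly these seven formats. The PR description does not say how the default is changed, so the sketch below only assumes that this list is the mechanism involved. It is a simplified model written for illustration, not Spark's code; the helper names `parse_source_list` and `uses_v1` are hypothetical.

```python
# Simplified model (an assumption, not Spark's actual implementation) of how a
# comma-separated V1 fallback list decides which API a source uses.

# Default value of spark.sql.sources.useV1SourceList in current releases.
CURRENT_DEFAULT = "avro,csv,json,kafka,orc,parquet,text"


def parse_source_list(value):
    """Split a comma-separated list into a set of lowercase, trimmed names."""
    return {name.strip().lower() for name in value.split(",") if name.strip()}


def uses_v1(source, use_v1_source_list):
    """Return True if `source` is listed for V1 fallback, False otherwise."""
    return source.strip().lower() in parse_source_list(use_v1_source_list)


if __name__ == "__main__":
    # With the current default every listed format resolves to V1;
    # with an empty list none of them do.
    for fmt in ("parquet", "csv", "kafka"):
        for label, value in (("current default", CURRENT_DEFAULT), ("empty list", "")):
            api = "V1" if uses_v1(fmt, value) else "V2"
            print(f"{fmt} with {label}: {api}")
```

On an existing session, the DSv2 path can already be tried by clearing the list, for example `spark.conf.set("spark.sql.sources.useV1SourceList", "")`, and the previous behavior can be kept by setting it back to the seven-format value.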

Why are the changes needed?

DSv2 is now recommended over DSv1.

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Gemini 3.1 Pro (High) on Antigravity

@dongjoon-hyun
Member Author

cc @huaxingao as the release manager of Apache Spark 4.2.0.

@peter-toth
Contributor

Are DSv2 file sources on par with v1? I thought features like DPP haven't been implemented yet.

@yaooqinn
Member

I'm not very confident about this change :).

@LuciferYang
Contributor

> Are DSv2 file sources on par with v1? I thought features like DPP haven't been implemented yet.

There was previously a PR to add DPP support to DSv2, but it wasn't merged.

@dongjoon-hyun
Member Author

Thank you for the feedback, @peter-toth, @yaooqinn, @LuciferYang.

@gengliangwang
Member

@dongjoon-hyun When I previously worked on File Source V2, the DSV2 framework was still relatively immature.

Even today, there are still several missing pieces on the write path, including:
• Partitioned writes
• Bucketing
• Custom partition locations (e.g., ALTER TABLE ... SET LOCATION)
• Sort-before-write
• Overwrite modes
• etc.

Revisiting and completing the migration could be an interesting project now that the DSV2 framework is much more mature. In addition, LLM-assisted development could help accelerate parts of the implementation.
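
To make the write-path items above concrete, here is a minimal PySpark sketch of a single write that touches several of them through the public `DataFrameWriter` API. The function name, the column names (`event_date`, `user_id`), and the bucket count are hypothetical. Mapping "sort-before-write" to `sortBy` is an assumption about the closest user-facing analogue and may not be exactly what the comment refers to.

```python
def write_with_v1_features(df, table_name):
    """Write `df` with partitioning, bucketing, sorting, and overwrite mode.

    `df` is expected to be a pyspark.sql.DataFrame. The column names and
    the bucket count below are hypothetical and chosen only for illustration.
    """
    (
        df.write
        .format("parquet")
        .partitionBy("event_date")  # partitioned write
        .bucketBy(8, "user_id")     # bucketing (only valid with saveAsTable)
        .sortBy("user_id")          # sort rows within each bucket
        .mode("overwrite")          # overwrite mode
        .saveAsTable(table_name)
    )
```

A call such as `write_with_v1_features(events_df, "events")`, where `events_df` is any DataFrame containing those two columns, is the kind of job that would need to behave identically on both paths before the default could change.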

@viirya
Member

viirya commented Feb 27, 2026

> @dongjoon-hyun When I previously worked on File Source V2, the DSV2 framework was still relatively immature.
>
> Even today, there are still several missing pieces on the write path, including:
> • Partitioned writes
> • Bucketing
> • Custom partition locations (e.g., ALTER TABLE ... SET LOCATION)
> • Sort-before-write
> • Overwrite modes
> • etc.
>
> Revisiting and completing the migration could be an interesting project now that the DSV2 framework is much more mature. In addition, LLM-assisted development could help accelerate parts of the implementation.

If we don't have them yet, can we open tickets for these known limitations of File Source V2? That way we would have an explicit list to track and work through in order to reach this goal.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-55748 branch February 28, 2026 08:54