[SPARK-55748][SQL] Use DSv2 for avro|csv|json|kafka|orc|parquet|text by default#54547
dongjoon-hyun wants to merge 1 commit into apache:master
cc @huaxingao as the release manager of Apache Spark 4.2.0.
Are DSv2 file sources on par with v1? I thought features like DPP haven't been implemented yet.
I'm not very confident about this change :).
There was previously a PR about DSv2 support for DPP, but it wasn't merged.
Thank you for the feedback, @peter-toth, @yaooqinn, @LuciferYang.
@dongjoon-hyun When I previously worked on File Source V2, the DSv2 framework was still relatively immature. Even today, there are still several missing pieces on the write path, including:
Revisiting and completing the migration could be an interesting project now that the DSv2 framework is much more mature. In addition, LLM-assisted development could help accelerate parts of the implementation.
If we don't have them yet, can we open tickets for the known limitations of File Source V2 today? That would give us an explicit list to track and work through in order to reach this goal.
What changes were proposed in this pull request?
This PR aims to use DSv2 for avro|csv|json|kafka|orc|parquet|text by default.
Why are the changes needed?
DSv2 is recommended instead of DSv1 these days.
Does this PR introduce any user-facing change?
Yes.
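To make the user-facing effect concrete, here is a minimal sketch of how a comma-separated fallback list decides between DSv1 and DSv2 for each source. It assumes, without confirmation from this PR's description, that the change works by emptying the default of the existing `spark.sql.sources.useV1SourceList` setting. The helper names `parse_v1_list` and `api_for` are hypothetical and exist only for illustration; this is plain Python modelling the lookup, not Spark code.

```python
# Hypothetical model, not Spark code: illustrates how a comma-separated
# fallback list (such as spark.sql.sources.useV1SourceList) can be read.
SOURCES = ["avro", "csv", "json", "kafka", "orc", "parquet", "text"]


def parse_v1_list(value: str) -> set[str]:
    """Split a comma-separated config value into lowercase source names."""
    return {name.strip().lower() for name in value.split(",") if name.strip()}


def api_for(source: str, v1_list_value: str) -> str:
    """Return "DSv1" if the source is on the fallback list, else "DSv2"."""
    return "DSv1" if source.lower() in parse_v1_list(v1_list_value) else "DSv2"


# Assumed pre-change default: every listed source falls back to DSv1.
old_default = ",".join(SOURCES)
# Assumed post-change default: an empty list, so every source uses DSv2.
new_default = ""
# A user pinning only parquet and orc back to DSv1.
pinned = "parquet, ORC"

for src in SOURCES:
    print(src, api_for(src, old_default), api_for(src, new_default), api_for(src, pinned))
```

Under that assumption, a user who needs the previous behavior for a given source would list it in the setting again, for example by passing it with `--conf` at submit time.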
How was this patch tested?
Pass the CIs.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Gemini 3.1 Pro (High) on Antigravity