
Spark, Hive: Fix snapshot procedure for tables with Variant columns#15964

Open
nssalian wants to merge 2 commits into apache:main from nssalian:snapshot-procedure-variant-fix

Conversation

@nssalian
Contributor

@nssalian nssalian commented Apr 13, 2026

Fixes #14123 and #15220.

Summary

The snapshot procedure fails on Spark tables with Variant columns because the Hive catalog stores LazySimpleSerDe instead of ParquetHiveSerDe for these tables, and the SerDe-based format detection does not recognize it, so it throws.
After fixing that, a second failure surfaces when running the test from the issue: HiveSchemaUtil.convertToTypeString has no case for VARIANT.

Changes

This PR adds a resolveFileFormat helper that falls back to table.provider() when the SerDe does not match a known format, and maps VARIANT to "unknown" in the Hive schema conversion, following the discussion in #15228.

The changes are applied to both Spark v4.0 and v4.1.
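For reviewers, here is a minimal sketch of the two fixes described above. The `CatalogTable` record, the set of recognized formats, and the type mappings are illustrative stand-ins, not the actual Iceberg classes; only `resolveFileFormat`, `table.provider()`, and the VARIANT-to-"unknown" mapping come from the PR description.

```java
import java.util.Locale;

public class SnapshotVariantSketch {
  // Hypothetical stand-in for the catalog table metadata consulted here.
  record CatalogTable(String serdeClass, String provider) {}

  // Sketch of the fallback: try SerDe-based detection first, then fall back
  // to the Spark table provider when the SerDe (e.g. LazySimpleSerDe, as
  // stored for tables with Variant columns) is not a known format.
  static String resolveFileFormat(CatalogTable table) {
    String serde =
        table.serdeClass() == null ? "" : table.serdeClass().toLowerCase(Locale.ROOT);
    if (serde.contains("parquet")) {
      return "parquet";
    } else if (serde.contains("orc")) {
      return "orc";
    } else if (serde.contains("avro")) {
      return "avro";
    } else if (table.provider() != null) {
      // SerDe did not match a known format: fall back to the provider.
      return table.provider().toLowerCase(Locale.ROOT);
    }
    throw new IllegalArgumentException("Cannot determine file format");
  }

  // Sketch of the schema-conversion fix: VARIANT maps to "unknown" since
  // Hive has no corresponding type. Other cases shown are illustrative.
  static String convertToTypeString(String icebergType) {
    switch (icebergType.toLowerCase(Locale.ROOT)) {
      case "variant":
        return "unknown";
      case "string":
        return "string";
      case "long":
        return "bigint";
      default:
        throw new UnsupportedOperationException("Unsupported type: " + icebergType);
    }
  }

  public static void main(String[] args) {
    CatalogTable variantTable =
        new CatalogTable("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe", "parquet");
    System.out.println(resolveFileFormat(variantTable)); // prints "parquet"
    System.out.println(convertToTypeString("variant"));  // prints "unknown"
  }
}
```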

Test plan

  • Expanded the test from the issue to cover both partitioned and unpartitioned tables in Spark 4.0 and 4.1; skipped for the Hive and Spark catalogs until Hive 4 lands
  • Added a unit test for the HiveSchemaUtil conversion

@nssalian nssalian marked this pull request as ready for review April 13, 2026 17:33
@nssalian
Contributor Author

nssalian commented Apr 13, 2026

@steveloughran
Contributor

@nssalian will do



Development

Successfully merging this pull request may close these issues.

Migrating existing Spark tables with Variant type fails in Iceberg
