[RFC] Code changes to prevent crashes from lack of `iceberg-type` column in PyIceberg-produced SQL catalogs #2092

brodiealexander · 2026-01-29T21:50:21Z

Which issue does this PR close?

Refer to #2068

SQL catalogs created by PyIceberg do not contain the `iceberg_type` column in the `iceberg_tables` table. From a brief test, this issue doesn't affect Glue but does affect SQLite and MySQL. I haven't tested Postgres or any other backends yet.

What changes are included in this PR?

This request for comment is meant to discuss how we can rewrite the relevant pieces of code to detect whether this column is present and react accordingly. This is a hard problem to tackle because, as far as I know, there's no backend-independent, pure-SQL way to check whether a column exists.

In the attached solution, I've added a variable to SqlCatalog that tracks which backend SQLx is using under the hood, and the catalog runs a backend-specific SQL query to get the list of columns. If `iceberg_type` is not one of the columns, the check that uses it is omitted from the query.
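To make the idea concrete, here is a minimal sketch of that detection step. It is written in Python with the standard-library `sqlite3` module purely for illustration and is not the Rust code in this PR. The helper names (`has_type_column`, `list_tables_sql`, `make_catalog`) are hypothetical, the filter clause is an assumption about what the "check" looks like, and only the SQLite path is exercised here; the MySQL and Postgres strings are standard `information_schema` lookups included for comparison.

```python
import sqlite3

# Hypothetical constants for illustration; the PR's identifiers may differ.
CATALOG_TABLE = "iceberg_tables"
TYPE_COLUMN = "iceberg_type"

# One column-listing query per backend. Only the SQLite entry is run below.
COLUMN_QUERIES = {
    "sqlite": f"SELECT name FROM pragma_table_info('{CATALOG_TABLE}')",
    "mysql": (
        "SELECT column_name FROM information_schema.columns "
        f"WHERE table_schema = DATABASE() AND table_name = '{CATALOG_TABLE}'"
    ),
    "postgres": (
        "SELECT column_name FROM information_schema.columns "
        f"WHERE table_schema = current_schema() AND table_name = '{CATALOG_TABLE}'"
    ),
}


def has_type_column(conn, backend="sqlite"):
    """Return True if the catalog table has an iceberg_type column."""
    rows = conn.execute(COLUMN_QUERIES[backend]).fetchall()
    return any(str(row[0]).lower() == TYPE_COLUMN for row in rows)


def list_tables_sql(with_type_filter):
    """Build the listing query, adding the type filter only when requested."""
    sql = (
        f"SELECT table_name FROM {CATALOG_TABLE} "
        "WHERE catalog_name = ? AND table_namespace = ?"
    )
    if with_type_filter:
        # Assumed shape of the check: rows with a NULL type count as tables.
        sql += f" AND ({TYPE_COLUMN} = 'TABLE' OR {TYPE_COLUMN} IS NULL)"
    return sql


def make_catalog(with_type_column):
    """Create an in-memory catalog table, with or without iceberg_type."""
    conn = sqlite3.connect(":memory:")
    columns = (
        "catalog_name TEXT, table_namespace TEXT, table_name TEXT, "
        "metadata_location TEXT, previous_metadata_location TEXT"
    )
    if with_type_column:
        columns += ", iceberg_type TEXT"
    conn.execute(f"CREATE TABLE {CATALOG_TABLE} ({columns})")
    return conn


legacy = make_catalog(with_type_column=False)
legacy.execute("INSERT INTO iceberg_tables VALUES ('c', 'ns', 't1', 'loc', NULL)")
query = list_tables_sql(has_type_column(legacy))
print(legacy.execute(query, ("c", "ns")).fetchall())
```

The cost of this approach is the per-backend dictionary: every new backend needs its own introspection query, which is the concern raised in the next paragraph.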

I want to know what your thoughts are on the implemented solution and if there's a better way to go about it. I know it's not ideal to have backend-specific code in the SQL catalog but I'm at a loss for how else to do this.

Are these changes tested?

Yes. I've tested this with an SQLite catalog exported from PyIceberg. I then imported the tables to a MySQL server to test the MySQL backend-specific code I added. These tests are not exhaustive but I wanted to get comments on this approach before going further.

It also passes the standard `make test` checks on my machine.

PS: There's some code mixed in here from #2079, partly because I'm not so good at git yet and partly because that code is necessary to get the SQL backends working for testing.

Thanks!
Brodie

brodiealexander · 2026-01-30T15:24:17Z

So, after getting a good night's rest, I've worked out that for most of these cases it's better and easier to just `SELECT *` and then check the query results for the presence of the column.
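A rough sketch of that `SELECT *` approach, again in Python with the standard-library `sqlite3` module rather than the Rust code in this PR. The function names and the rule that a missing or NULL `iceberg_type` means "table" are assumptions made for illustration.

```python
import sqlite3


def open_catalog(with_type_column):
    """Create an in-memory catalog table, with or without iceberg_type."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows expose their column names
    columns = (
        "catalog_name TEXT, table_namespace TEXT, table_name TEXT, "
        "metadata_location TEXT, previous_metadata_location TEXT"
    )
    if with_type_column:
        columns += ", iceberg_type TEXT"
    conn.execute(f"CREATE TABLE iceberg_tables ({columns})")
    return conn


def list_tables(conn, catalog, namespace):
    """Select every column, then filter on iceberg_type only if it exists."""
    rows = conn.execute(
        "SELECT * FROM iceberg_tables "
        "WHERE catalog_name = ? AND table_namespace = ? "
        "ORDER BY table_name",
        (catalog, namespace),
    ).fetchall()
    names = []
    for row in rows:
        # Legacy catalogs have no iceberg_type column: treat rows as tables.
        kind = row["iceberg_type"] if "iceberg_type" in row.keys() else None
        if kind is None or kind == "TABLE":
            names.append(row["table_name"])
    return names


current = open_catalog(with_type_column=True)
current.executemany(
    "INSERT INTO iceberg_tables VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("c", "ns", "t1", "loc", None, "TABLE"),
        ("c", "ns", "v1", "loc", None, "VIEW"),
    ],
)
print(list_tables(current, "c", "ns"))
```

The same function works unchanged against a catalog without the column, which is what removes the need for backend-specific SQL in the read paths.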

However, it's still unclear what should be done for register_table, drop_table, and rename_table.

Can there be a table and view with the same name, or are they mutually exclusive? If they're mutually exclusive, drop_table and rename_table are easily solvable.

register_table is a little more complicated. Should we keep the backend-specific code for detecting iceberg_type and have two separate code paths in register_table, one for each case?

brodiealexander · 2026-01-30T17:33:31Z

@kevinjqliu What's your opinion on detecting the column by either (1) scanning the table schema using backend-specific SQL or (2) selecting * from the table and checking whether the result has an iceberg_type column, then using the result to set a flag for whether we should query it?

I'm a little skeptical of approach 2, since there could theoretically be catalogs where the schema is defined but no tables have been registered yet, in which case the query returns no rows to inspect.
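The empty-catalog concern can be illustrated with a small sketch, once more in Python with the standard-library `sqlite3` module and hypothetical helper names. It contrasts detection based on a fetched row with detection based on result-set metadata. Python's driver reports column names even when zero rows come back; whether the Rust driver offers equivalent metadata for an empty result is an assumption that would need to be verified.

```python
import sqlite3


def open_catalog(with_type_column):
    """Create an empty in-memory catalog table, with or without iceberg_type."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    columns = (
        "catalog_name TEXT, table_namespace TEXT, table_name TEXT, "
        "metadata_location TEXT, previous_metadata_location TEXT"
    )
    if with_type_column:
        columns += ", iceberg_type TEXT"
    conn.execute(f"CREATE TABLE iceberg_tables ({columns})")
    return conn


def detect_from_rows(conn):
    """Row-based check: returns None when there is no row to inspect."""
    row = conn.execute("SELECT * FROM iceberg_tables LIMIT 1").fetchone()
    if row is None:
        return None  # empty catalog: the answer is unknown
    return "iceberg_type" in row.keys()


def detect_from_metadata(conn):
    """Metadata-based check: column names are known even with zero rows."""
    cursor = conn.execute("SELECT * FROM iceberg_tables LIMIT 0")
    return any(col[0] == "iceberg_type" for col in cursor.description)


empty_legacy = open_catalog(with_type_column=False)
print(detect_from_rows(empty_legacy), detect_from_metadata(empty_legacy))
```

If no such metadata is available, the row-based flag would have to stay "unknown" until the first row appears, or fall back to the backend-specific schema query from approach 1.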

brodiealexander · 2026-01-30T19:28:17Z

Added migration and schema-version handling, in line with the discussion in #2068.

brodiealexander added 3 commits January 29, 2026 15:37

Added backend-specific SQL code to check for iceberg-type

bda5411

Removed unused import

41d59a3

Refactored many cases where backend-specific SQL code is not required

67358b4

added migration and schema version-specific handling for tables

b822d28

remove trailing whitespace

8de2a87

brodiealexander mentioned this pull request Jan 30, 2026

column "iceberg_type" does not exist #2068

Open

add concrete type for iceberg_type

ce88708
