Skip to content

Conversation

@brodiealexander
Copy link
Contributor

@brodiealexander brodiealexander commented Jan 29, 2026

Which issue does this PR close?

Refer to #2068

SQL Catalogs from PyIceberg do not contain the iceberg-type column in the iceberg-tables table. I did a brief test, and from what I can tell this issue doesn't affect Glue but does impact SQLite and MySQL. I haven't tested Postgres or any other backends yet.

What changes are included in this PR?

This request for comment is written to discuss how we can re-write the relevant pieces of code to detect whether this column is present and react accordingly. This is a hard problem to tackle because there's no pure-SQL way to do this that I know of.

In the attached solution, I've added a variable to SqlCatalog which tracks what backend SQLX is using under the hood, and calls a backend-specific SQL query to get a list of columns. If iceberg-type is not one of the columns, then the check that uses it is omitted from the query.

I want to know what your thoughts are on the implemented solution and if there's a better way to go about it. I know it's not ideal to have backend-specific code in the SQL catalog but I'm at a loss for how else to do this.

Are these changes tested?

Yes. I've tested this with an SQLite catalog exported from PyIceberg. I then imported the tables to a MySQL server to test the MySQL backend-specific code I added. These tests are not exhaustive but I wanted to get comments on this approach before going further.

It also passes the standard make test checks on my machine.

PS--There's some code mixed in here from #2079 because I'm not so good at git yet and because the code is necessary to get the SQL backends working for testing.

Thanks!
Brodie

@brodiealexander
Copy link
Contributor Author

So, after getting a good night's rest I've worked out that for most of these cases it's better and easier to just SELECT * and then use the query results to check for column presence.

However, it's still unclear what should be done for register_table, drop_table, and rename_table.

Can there be a table and view with the same name, or are they mutually exclusive? If they're mutually exclusive, drop_table and rename_table are easily solvable.

register_table is a little more complicated. Should we keep backend-specific code for detecting iceberg_type and have two separate code paths in register_table for either case?

@brodiealexander
Copy link
Contributor Author

brodiealexander commented Jan 30, 2026

@kevinjqliu What's your opinion on the idea of scanning either (1) the table schema using backend-specific SQL or (2) selecting * from the table and seeing if there's an iceberg_type, then using the result to set a flag for if we should query it?

I'm a little skeptical of approach 2 since there could theoretically be databases with defined schema but no tables yet.

@brodiealexander
Copy link
Contributor Author

Added migration and schema versioning stuff in line with discussion in #2068

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant