[RFC] Code changes to prevent crashes from lack of iceberg-type column in PyIceberg-produced SQL catalogs
#2092
+438
−142
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Refer to #2068
SQL Catalogs from PyIceberg do not contain the
iceberg-typecolumn in theiceberg-tablestable. I did a brief test, and from what I can tell this issue doesn't affect Glue but does impact SQLite and MySQL. I haven't tested Postgres or any other backends yet.What changes are included in this PR?
This request for comment is written to discuss how we can re-write the relevant pieces of code to detect whether this column is present and react accordingly. This is a hard problem to tackle because there's no pure-SQL way to do this that I know of.
In the attached solution, I've added a variable to SqlCatalog which tracks what backend SQLX is using under the hood, and calls a backend-specific SQL query to get a list of columns. If
iceberg-typeis not one of the columns, then the check that uses it is omitted from the query.I want to know what your thoughts are on the implemented solution and if there's a better way to go about it. I know it's not ideal to have backend-specific code in the SQL catalog but I'm at a loss for how else to do this.
Are these changes tested?
Yes. I've tested this with an SQLite catalog exported from PyIceberg. I then imported the tables to a MySQL server to test the MySQL backend-specific code I added. These tests are not exhaustive but I wanted to get comments on this approach before going further.
It also passes the standard
make testchecks on my machine.PS--There's some code mixed in here from #2079 because I'm not so good at git yet and because the code is necessary to get the SQL backends working for testing.
Thanks!
Brodie