out of the generator proposer
DuckDB usage documentation Fixed DuckDB parquet output dump-data --output as-directory choice proposal crash fixed
- Generator writers `go_to` can cope with table names that have dots.
- `dump-data --parquet` can cope with `TIMESTAMP`s.
- Foreign Keys to ignored tables fixed.
Finally this all works! It's actually fairly easy to fake Parquet files now; see the
Ahh nice, I'll pencil in some time next week, hopefully, to review. Appreciate it, Tim.
stefpiatek
left a comment
Oooh this is very fun. Thanks for working on this and getting the translation so it works ❤️

    except TypeError:
        pass

Ooh when are we expecting this to happen, and if so do we want to log it?

    RowCounts = Counter[str]

    @compiles(CreateColumn, "duckdb")

Pretty nasty actually. But yes, fun that this hook exists!

    if fk_bits[0] not in tables_dict:
        return False
    return bool(tables_dict[fk_bits[0]].get("ignore", False))
    (table, _column) = split_column_full_name(fk)

Ooh I'm not too sure what was happening before but I think this makes sense
Yes, one of those "did this ever work?" moments...
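
For readers following this thread, here is a minimal sketch of how a foreign-key check built on `split_column_full_name` might look. It is not the project's actual code: the body of `split_column_full_name` is an assumption (split at the last dot, so that table names containing dots stay intact, and return `None` for the table when there is only one part), and `fk_refers_to_ignored_table` is a hypothetical name chosen for illustration.

```python
from typing import Any, Mapping, Optional, Tuple


def split_column_full_name(full_name: str) -> Tuple[Optional[str], str]:
    """Assumed behaviour: split "table.column" at the LAST dot.

    A name with no dot is treated as a bare column, so the table is None.
    """
    table, sep, column = full_name.rpartition(".")
    return (table if sep else None, column)


def fk_refers_to_ignored_table(
    fk: str, tables_dict: Mapping[str, Mapping[str, Any]]
) -> bool:
    """Hypothetical helper: True if the foreign key targets an ignored table."""
    (table, _column) = split_column_full_name(fk)
    if table is None or table not in tables_dict:
        return False
    return bool(tables_dict[table].get("ignore", False))
```

Splitting at the last dot rather than the first is what lets a target such as `schema.person.id` resolve to the table `schema.person` instead of `schema`.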

    column_types = {
        column: _dtype_to_sql(dtype) for column, dtype in table.dtypes.items()
    }
    name_pref = name[: name.rfind(".")]

Could we get to a point where the name doesn't have a .?
This would be if the file doesn't have an extension such as .parquet, but you are right that this needs some sort of defense.
Actually, it's fine. That expression works even if no dot is found.
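
For reference, a small sketch of how that slice behaves in plain Python. `str.rfind` returns `-1` when no dot is present, so the expression does not raise; it evaluates to `name[:-1]`, which drops the final character. The helper names below are hypothetical, and `os.path.splitext` is shown only as one possible alternative, not as what this PR does.

```python
import os.path


def prefix_by_rfind(name: str) -> str:
    """The expression from the diff: everything before the last dot."""
    return name[: name.rfind(".")]


def prefix_by_splitext(name: str) -> str:
    """Alternative sketch: leaves a name without an extension unchanged."""
    return os.path.splitext(name)[0]


for example in ("people.parquet", "people"):
    print(example, "->", prefix_by_rfind(example), "|", prefix_by_splitext(example))
```

Whether the dot-free case matters depends on whether a file without an extension can ever reach this code path.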

    if last_part in table_names:
        table_names.append(f"{last_part}.")

oh interesting that this has swapped from first to last
It's just that previously, if there was only one part, it was called `first_part`; now it's called `last_part`, because that's how the new `split_column_full_name` function does it.
DuckDB does not work without this change; it uses the PostgreSQL dialect with minor changes, but it really needs a couple more.
This change adds DuckDB as a SQLAlchemy plugin and hooks into the SQL compilation process to remove the PostgreSQL code that DuckDB does not understand.
`dump-data` has also been updated to allow the dumping of all non-ignored non-vocabulary tables in one call, and also to dump the data as Parquet. So with `dump-data` for the destination and DuckDB's in-memory database for the source, it is now possible to do Parquet-to-Parquet data faking without interacting directly with DuckDB at all! See `duckdb.rst` for details.