
Type hints overhaul #352

Open
OutSquareCapital wants to merge 25 commits into duckdb:v1.5-variegata from OutSquareCapital:expr-typing


OutSquareCapital commented Feb 27, 2026

This PR provides numerous improvements regarding type hints.

This is my follow-up to our discussion, @evertlammerts: #341 (comment)

All changes are in the type stubs only, so there is no impact whatsoever on runtime logic.

Changes

  1. New _expression.pyi file that separates out the Expression class and allows circular imports and references. It also slims down the __init__ file a bit, which is nice.
  2. Two new Protocols for numpy arrays and dtypes. They allow type checking those objects without emitting errors when the user doesn't have the library installed. Array is useful for Expression conversions, Dtype for DuckDBPyType conversions.
  3. Refactored and expanded the _ExpressionLike type alias. Renamed it to IntoExpr, and added various new type aliases covering as many situations as possible for Expression conversions.
  4. Added a few Literals to cover the ids and str conversions to DuckDBPyType, providing nice autocompletion for arguments and a nice interaction with pattern matching when checking the id value.
    Also provides convenient JSON and BIGNUM instantiation as an added bonus (ATM they are absent from the sqltypes constants).
  5. Added various new type aliases to cover all paths for DuckDBPyType conversions: as Python/numpy static type hints, as dict instances, or as Literal | str. This significantly improves the type hints for datatype arguments, which very often accepted only str or DuckDBTypes in the signatures.
  6. Added various new Literals for the argument options of file methods/functions.
  7. Centralized type aliases, Literals, and Protocols in a _typing.pyi file, to avoid bloating the __init__.

Notes

  • I tried to document this as best I could, with docstrings for users and "private" comments.
    I left a few observations, but one thing is clear: the types accepted at runtime are inconsistent (sometimes Mapping is OK, sometimes only dict is OK, etc.).
    As I said in "Typing stubs are too strict about arguments of type Expression" #341, prioritizing collections.abc as much as possible would be the best way to go in the future.

  • Centralizing the type aliases and using them as much as possible makes sense IMO, especially with an API that has repeated signatures (connection methods vs. module-level functions, for example).

  • The next step would be to move the type definitions into a concrete .py file, allowing users to import them if they want to annotate custom functions or do runtime type introspection.
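
As a rough illustration of the collections.abc point, the sketch below shows why annotating with `Mapping` is more permissive than `dict`: a read-only `MappingProxyType` is a `Mapping` but not a `dict`. The helper `field_names` is hypothetical and is not part of the duckdb API.

```python
from collections.abc import Mapping
from types import MappingProxyType


def field_names(fields: Mapping[str, str]) -> list[str]:
    """Hypothetical helper: accepts any mapping of field name to type name."""
    return list(fields)


plain = {"a": "INTEGER", "b": "VARCHAR"}
frozen = MappingProxyType(plain)  # read-only view, not a dict subclass

print(field_names(plain))
print(field_names(frozen))
print(isinstance(frozen, dict), isinstance(frozen, Mapping))
```

A signature annotated with `dict[str, str]` would reject `frozen` at type-check time even though the function body only needs to iterate over the keys.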

…d allow circular imports between files.

- added nested dtypes, bytearray, and memoryview as literal, convertible python types
- PythonLiteral is a recursive type, to allow dict of list, list of list, etc...
- _ExpressionLike -> IntoExpr
- Expression | str -> IntoExprColumn
…mpy ndarray without creating unknown type errors if the library isn't installed in the venv
- Using IntoExprColumn on StarExpression
- fixed lhs type for LambdaExpression, and value type for ConstantExpression
- fixed all places where it was too narrow. Most of the time str is accepted for sqltypes; the odd exception seems to be the map method on Relation
- using Self for annotations on arguments when pertinent
- reorganized expressions/values conversions types, improved their doc
- added Literals for sqltypes ids and string conversion, and various type aliases, covering all paths.
- using aforementioned literals in _sqltypes signatures
- added various new literals for file arguments
- moved join "how" literal in _typing for centralization
- renamed IntoNestedDType -> IntoFields
- added all new literals and type aliases in the main init file
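
The recursive alias mentioned in the commit notes can be sketched roughly as follows. This is a simplified, hypothetical version (`Scalar`, `PyLiteral`, and `depth` are illustrative names, not the PR's `PythonLiteral` definition), showing how a forward reference lets the alias describe a dict of lists, a list of lists, and so on.

```python
from __future__ import annotations

from typing import Union

# Hypothetical, simplified scalar set; the real alias covers more types.
Scalar = Union[int, float, str, bytes, bool, None]

# The string forward reference makes the alias refer to itself.
PyLiteral = Union[Scalar, list["PyLiteral"], dict[str, "PyLiteral"]]


def depth(value: PyLiteral) -> int:
    """Return the nesting depth of a literal (0 for a scalar)."""
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    return 0


print(depth(1))
print(depth({"a": [[1], 2]}))
```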