WIP: Extension type casts using extension registry #21071
paleolimbot wants to merge 19 commits into apache:main
    let state = SessionStateBuilder::default()
        .with_canonical_extension_types()?
        .with_type_planner(Arc::new(CustomTypePlanner {}))
        .build();
    let ctx = SessionContext::new_with_state(state);

    ctx.register_batch("test", batch)?;

    let df = ctx.sql("SELECT my_uuids::VARCHAR FROM test").await?;
    let batches = df.collect().await?;

    assert_batches_eq!(
        vec![
            "+--------------------------------------+",
            "| test.my_uuids                        |",
            "+--------------------------------------+",
            "| 00000000-0000-0000-0000-000000000000 |",
            "| 00010203-0405-0607-0809-000102030506 |",
            "+--------------------------------------+",
        ],
        &batches
    );
Casting from a UUID to something else works!
    async fn create_cast_char_to_uuid() -> Result<()> {
        let state = SessionStateBuilder::default()
            .with_canonical_extension_types()?
            .with_type_planner(Arc::new(CustomTypePlanner {}))
            .build();
        let ctx = SessionContext::new_with_state(state);

        let df = ctx
            .sql("SELECT '00010203-0405-0607-0809-000102030506'::UUID AS uuid")
            .await?;
        let batches = df.collect().await?;
        assert_batches_eq!(
            vec![
                "+----------------------------------+",
                "| uuid                             |",
                "+----------------------------------+",
                "| 00010203040506070809000102030506 |",
                "+----------------------------------+",
            ],
            &batches
        );
We can also go the other direction!
        fn cast_from(&self) -> Result<Arc<dyn CastExtension>> {
            Ok(Arc::new(DefaultExtensionCast {}))
        }

        fn cast_to(&self) -> Result<Arc<dyn CastExtension>> {
            Ok(Arc::new(DefaultExtensionCast {}))
        }
    }

    pub trait CastExtension: Debug + Send + Sync {
        fn can_cast(&self, from: &Field, to: &Field, options: &CastOptions) -> Result<bool>;

        // None for fallback
        fn cast(
            &self,
            value: ArrayRef,
            from: &Field,
            to: &Field,
            options: &CastOptions,
        ) -> Result<ArrayRef>;
    }
This is the interface an extension type can implement to define how it casts to and from other types.
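To make the shape of that interface concrete, here is a minimal, std-only sketch of the same two-method pattern: a cheap `can_cast` check plus a `cast` that does the work. Everything in it is hypothetical and chosen for illustration: `SimpleType` and `Column` stand in for Arrow's `Field` and `ArrayRef`, `CastSketch` and `UuidToUtf8` are made-up names, and `CastOptions` is left out. It does not use DataFusion or arrow APIs and is not the PR's implementation.

```rust
use std::fmt::Debug;

/// Hypothetical stand-in for the logical type carried by an Arrow `Field`.
#[derive(Debug, Clone, PartialEq)]
enum SimpleType {
    Utf8,
    Uuid,
}

/// Hypothetical stand-in for an `ArrayRef`: a column of values.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    Utf8(Vec<String>),
    Uuid(Vec<[u8; 16]>),
}

/// Simplified analogue of the `CastExtension` trait
/// (no `CastOptions`, plain `String` errors).
trait CastSketch: Debug + Send + Sync {
    /// Reports whether this implementation handles the `from` -> `to` cast.
    fn can_cast(&self, from: &SimpleType, to: &SimpleType) -> bool;

    /// Performs the cast, or returns an error if it is unsupported.
    fn cast(&self, value: Column, from: &SimpleType, to: &SimpleType) -> Result<Column, String>;
}

/// Illustrative cast that only knows how to turn UUID bytes into text.
#[derive(Debug)]
struct UuidToUtf8;

impl CastSketch for UuidToUtf8 {
    fn can_cast(&self, from: &SimpleType, to: &SimpleType) -> bool {
        matches!((from, to), (SimpleType::Uuid, SimpleType::Utf8))
    }

    fn cast(&self, value: Column, from: &SimpleType, to: &SimpleType) -> Result<Column, String> {
        if !self.can_cast(from, to) {
            return Err(format!("unsupported cast: {from:?} -> {to:?}"));
        }
        match value {
            Column::Uuid(values) => Ok(Column::Utf8(values.iter().map(format_uuid).collect())),
            other => Err(format!("expected a UUID column, got {other:?}")),
        }
    }
}

/// Formats 16 bytes as the usual 8-4-4-4-12 hyphenated hex string.
fn format_uuid(bytes: &[u8; 16]) -> String {
    let hex: String = bytes.iter().map(|b| format!("{b:02x}")).collect();
    format!(
        "{}-{}-{}-{}-{}",
        &hex[0..8],
        &hex[8..12],
        &hex[12..16],
        &hex[16..20],
        &hex[20..32]
    )
}
```

Keeping `can_cast` separate from `cast` lets a planner ask whether an extension wants to handle a cast before committing to it, which matches how `can_cast` is called at planning time in this PR.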
    if let Some(registry) = &execution_props.extension_types
        && let Some(extension_type) =
            registry.create_extension_type_for_field(&field)?
    {
        let cast_extension = extension_type.cast_from()?;
This is where the cast from an extension type to something else (via cast_from()) gets planned into a physical expression.
    let cast_extension = extension_type.cast_to()?;
    if cast_extension.can_cast(&src_field, &field, &DEFAULT_CAST_OPTIONS)? {
        return expressions::cast_with_extension(
This is where the cast from something else to an extension type is resolved to a physical expression.
(I didn't handle the case where one extension type is cast to another extension type. That needs to be handled if this is ever going to merge: the interface has to let either the source side or the target side define the cast, without erroring if the other side doesn't handle it.)
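One way the extension-to-extension case could be resolved is sketched below, as a std-only illustration rather than anything this PR implements. It assumes a "target first, then source" order, which is an arbitrary choice made for the example; `CastRules`, `HandledBy`, and `resolve_cast` are hypothetical names, and types are identified by plain strings instead of Arrow fields.

```rust
/// Which side ended up handling the cast (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum HandledBy {
    Target,
    Source,
    Neither,
}

/// Hypothetical, string-keyed stand-in for an extension type's cast rules.
struct CastRules {
    /// Supported (from, to) pairs.
    supported: Vec<(&'static str, &'static str)>,
}

impl CastRules {
    fn can_cast(&self, from: &str, to: &str) -> bool {
        self.supported.iter().any(|(f, t)| *f == from && *t == to)
    }
}

/// Asks the target side first, then the source side. A side that declines
/// is skipped rather than treated as an error, so either side may define
/// the cast. `Neither` leaves the decision (default cast or error) to the caller.
fn resolve_cast(
    source: Option<&CastRules>,
    target: Option<&CastRules>,
    from: &str,
    to: &str,
) -> HandledBy {
    if target.is_some_and(|rules| rules.can_cast(from, to)) {
        return HandledBy::Target;
    }
    if source.is_some_and(|rules| rules.can_cast(from, to)) {
        return HandledBy::Source;
    }
    HandledBy::Neither
}
```

The point of the sketch is only that declining (`can_cast` returning false) and failing are kept distinct, so one side not knowing about the other does not abort planning.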
    if let Some(cast_extension) = &self.cast_extension {
        let from_field = self.expr.return_field(&batch.schema())?;
        let to_field = self.return_field(&batch.schema())?;
        match value {
            ColumnarValue::Array(array) => {
                Ok(ColumnarValue::Array(cast_extension.cast(
                    array,
                    &from_field,
                    &to_field,
                    &self.cast_options,
                )?))
            }
This is where the cast is executed. I chose to tack the CastExtension onto the CastExpr, but it could also be its own PhysicalExpr (which may be safer, in case I haven't considered some of the things that could happen during physical optimizer passes).
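The "optional hook on the existing expression" choice described here can be illustrated with a small std-only sketch. The names `CastHook`, `CastExprSketch`, and `HexCast` are hypothetical, and the `i64`-to-`String` conversion is a toy stand-in for a real cast kernel; this is not DataFusion's `CastExpr`.

```rust
use std::sync::Arc;

/// Hypothetical hook an extension type could supply to override a cast.
trait CastHook: Send + Sync {
    fn cast(&self, values: &[i64]) -> Result<Vec<String>, String>;
}

/// Toy cast expression: uses the hook when one is attached,
/// otherwise falls back to the built-in conversion.
struct CastExprSketch {
    cast_extension: Option<Arc<dyn CastHook>>,
}

impl CastExprSketch {
    fn evaluate(&self, values: &[i64]) -> Result<Vec<String>, String> {
        match &self.cast_extension {
            // Extension-defined behaviour takes precedence.
            Some(hook) => hook.cast(values),
            // Default behaviour: plain decimal formatting.
            None => Ok(values.iter().map(|v| v.to_string()).collect()),
        }
    }
}

/// Illustrative hook that renders values as prefixed hexadecimal.
struct HexCast;

impl CastHook for HexCast {
    fn cast(&self, values: &[i64]) -> Result<Vec<String>, String> {
        Ok(values.iter().map(|v| format!("{v:#x}")).collect())
    }
}
```

The trade-off with this layout is that any code that rebuilds or rewrites the expression must remember to carry the optional hook along, whereas a dedicated expression type would make the extension cast impossible to drop silently.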
Which issue does this PR close?
This PR is a proof of concept stacked on top of #20312 to demonstrate how casting to and from extension types might be supported with the registry design in that PR. All details are up for grabs. Most of the work here is just piping the registry through so that a cast to or from an extension type can be resolved when creating a physical expression from a logical one (the work to pipe SQL and logical plan casts to extension types was already done before this PR).
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?