WIP: Extension type casts using extension registry #21071

Draft
paleolimbot wants to merge 19 commits into apache:main from paleolimbot:extension-type-registry-cast

@paleolimbot
Member

@paleolimbot paleolimbot commented Mar 20, 2026

Which issue does this PR close?

This PR is a proof of concept stacked on top of #20312 to demonstrate how casting to and from extension types might be supported with the registry design in that PR. All details are up for discussion. Most of the work here is piping the registry through so that a cast to or from an extension type can be resolved when a physical expression is created from a logical one (the work to pipe SQL and logical plan casts to extension types was already done before this PR).

  • Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the logical-expr (Logical plan and expressions), physical-expr (Changes to the physical-expr crates), optimizer (Optimizer rules), core (Core DataFusion crate), common (Related to common crate), datasource (Changes to the datasource crate), and ffi (Changes to the ffi crate) labels Mar 20, 2026
Comment on lines +94 to +115
let state = SessionStateBuilder::default()
    .with_canonical_extension_types()?
    .with_type_planner(Arc::new(CustomTypePlanner {}))
    .build();
let ctx = SessionContext::new_with_state(state);

ctx.register_batch("test", batch)?;

let df = ctx.sql("SELECT my_uuids::VARCHAR FROM test").await?;
let batches = df.collect().await?;

assert_batches_eq!(
    vec![
        "+--------------------------------------+",
        "| test.my_uuids                        |",
        "+--------------------------------------+",
        "| 00000000-0000-0000-0000-000000000000 |",
        "| 00010203-0405-0607-0809-000102030506 |",
        "+--------------------------------------+",
    ],
    &batches
);
Member Author

Casting from a UUID to something else works!
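
For readers unfamiliar with the output above, here is a minimal, std-only sketch of the byte-to-text conversion that a UUID to VARCHAR cast has to perform. `format_uuid` is a hypothetical helper written for illustration; it is not taken from this PR, DataFusion, or Arrow, and the real cast operates on Arrow arrays rather than a single `[u8; 16]`.

```rust
/// Format 16 raw UUID bytes as the 8-4-4-4-12 hyphenated hex form.
/// Hypothetical helper for illustration only (not a DataFusion/Arrow API).
fn format_uuid(bytes: &[u8; 16]) -> String {
    let mut out = String::with_capacity(36);
    for (i, byte) in bytes.iter().enumerate() {
        // Hyphens separate the groups, so they go before bytes 4, 6, 8, and 10.
        if matches!(i, 4 | 6 | 8 | 10) {
            out.push('-');
        }
        // Two lowercase hex digits per byte.
        out.push_str(&format!("{:02x}", byte));
    }
    out
}
```

Calling it on sixteen zero bytes yields the all-zero string in the first row of the expected output.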

Comment on lines +121 to +141
async fn create_cast_char_to_uuid() -> Result<()> {
    let state = SessionStateBuilder::default()
        .with_canonical_extension_types()?
        .with_type_planner(Arc::new(CustomTypePlanner {}))
        .build();
    let ctx = SessionContext::new_with_state(state);

    let df = ctx
        .sql("SELECT '00010203-0405-0607-0809-000102030506'::UUID AS uuid")
        .await?;
    let batches = df.collect().await?;
    assert_batches_eq!(
        vec![
            "+----------------------------------+",
            "| uuid                             |",
            "+----------------------------------+",
            "| 00010203040506070809000102030506 |",
            "+----------------------------------+",
        ],
        &batches
    );
Member Author

We can also go the other direction!
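
The reverse direction amounts to parsing the hyphenated text into 16 bytes, which the expected output above then shows as plain hex. A std-only sketch, assuming strict 36-character input; `parse_uuid` and `to_hex` are hypothetical names for illustration, not functions from this PR or from Arrow:

```rust
/// Parse a hyphenated UUID string into 16 raw bytes.
/// Hypothetical helper; returns None for malformed input.
fn parse_uuid(text: &str) -> Option<[u8; 16]> {
    let raw = text.as_bytes();
    if raw.len() != 36 {
        return None;
    }
    let mut out = [0u8; 16];
    let mut digits = 0; // number of hex digits consumed so far
    for (i, &c) in raw.iter().enumerate() {
        // Hyphens must sit at offsets 8, 13, 18, and 23.
        if matches!(i, 8 | 13 | 18 | 23) {
            if c != b'-' {
                return None;
            }
            continue;
        }
        let digit = (c as char).to_digit(16)? as u8;
        if digits % 2 == 0 {
            out[digits / 2] = digit << 4; // high nibble
        } else {
            out[digits / 2] |= digit; // low nibble
        }
        digits += 1;
    }
    Some(out)
}

/// Render bytes as contiguous lowercase hex (no hyphens).
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```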

Comment on lines +74 to +94
    fn cast_from(&self) -> Result<Arc<dyn CastExtension>> {
        Ok(Arc::new(DefaultExtensionCast {}))
    }

    fn cast_to(&self) -> Result<Arc<dyn CastExtension>> {
        Ok(Arc::new(DefaultExtensionCast {}))
    }
}

pub trait CastExtension: Debug + Send + Sync {
    fn can_cast(&self, from: &Field, to: &Field, options: &CastOptions) -> Result<bool>;

    // None for fallback
    fn cast(
        &self,
        value: ArrayRef,
        from: &Field,
        to: &Field,
        options: &CastOptions,
    ) -> Result<ArrayRef>;
}
Member Author

This is the interface an extension type can implement to define its interactions with other types.
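
To make the shape of an implementation concrete, here is a rough sketch of a UUID-to-string cast written against a trait with the same two methods. Everything other than the trait's method names is an assumption: `DataType`, `Field`, `Array`, and `CastOptions` are simplified std-only stand-ins for the Arrow types, errors are plain `String`s, and `UuidToUtf8` and `hyphenate` are hypothetical names.

```rust
use std::fmt::Debug;

// Simplified stand-ins (assumptions) for the Arrow types in the real trait.
#[derive(Debug, Clone, PartialEq)]
enum DataType { Utf8, FixedSizeBinary16 }

#[derive(Debug, Clone)]
struct Field { data_type: DataType, extension_name: Option<&'static str> }

#[derive(Debug, Clone, PartialEq)]
enum Array { Utf8(Vec<Option<String>>), Binary16(Vec<Option<[u8; 16]>>) }

#[derive(Debug, Default)]
struct CastOptions;

// Same method names as the trait in the diff, with the stand-in types.
trait CastExtension: Debug + Send + Sync {
    fn can_cast(&self, from: &Field, to: &Field, options: &CastOptions) -> Result<bool, String>;
    fn cast(&self, value: Array, from: &Field, to: &Field, options: &CastOptions)
        -> Result<Array, String>;
}

/// Hypothetical cast from the `arrow.uuid` extension type to plain strings.
#[derive(Debug)]
struct UuidToUtf8;

impl CastExtension for UuidToUtf8 {
    fn can_cast(&self, from: &Field, to: &Field, _options: &CastOptions) -> Result<bool, String> {
        Ok(from.extension_name == Some("arrow.uuid")
            && from.data_type == DataType::FixedSizeBinary16
            && to.extension_name.is_none()
            && to.data_type == DataType::Utf8)
    }

    fn cast(&self, value: Array, from: &Field, to: &Field, options: &CastOptions)
        -> Result<Array, String> {
        if !self.can_cast(from, to, options)? {
            return Err(format!("unsupported cast from {:?} to {:?}", from, to));
        }
        match value {
            // Nulls stay null; non-null values are rendered as hyphenated hex.
            Array::Binary16(items) => Ok(Array::Utf8(
                items.into_iter().map(|item| item.map(|b| hyphenate(&b))).collect(),
            )),
            other => Err(format!("expected 16-byte binary input, got {:?}", other)),
        }
    }
}

/// Render 16 bytes in 8-4-4-4-12 form.
fn hyphenate(bytes: &[u8; 16]) -> String {
    let mut out = String::with_capacity(36);
    for (i, byte) in bytes.iter().enumerate() {
        if matches!(i, 4 | 6 | 8 | 10) {
            out.push('-');
        }
        out.push_str(&format!("{:02x}", byte));
    }
    out
}
```

In this sketch `can_cast` returning `Ok(false)` means "not handled here", which leaves room for the planner to try something else, while `Err` is reserved for genuine failures.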

Comment on lines +301 to +305
if let Some(registry) = &execution_props.extension_types
    && let Some(extension_type) =
        registry.create_extension_type_for_field(&field)?
{
    let cast_extension = extension_type.cast_from()?;
Member Author

This is where the cast (to an extension type from something else) gets planned into a physical expression.

Comment on lines +332 to +334
let cast_extension = extension_type.cast_to()?;
if cast_extension.can_cast(&src_field, &field, &DEFAULT_CAST_OPTIONS)? {
    return expressions::cast_with_extension(
Member Author

This is where the cast from something else to an extension type is resolved to a physical expression.

(I didn't handle the case where one extension type is cast to another extension type. That needs to be handled before this could merge: the interface has to let either the right OR the left side define the cast, without erroring if the other side doesn't handle it.)
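
One possible way to resolve the extension-to-extension case is to ask the target type first, then the source type, and only then fall back to the built-in cast. The sketch below is purely illustrative and not what this PR implements: `ExtType`, `TypeRef`, `CastPlan`, and `plan_cast` are hypothetical names, types are identified by plain strings, and "declining" is modelled as a simple list lookup rather than a `can_cast` call.

```rust
/// Hypothetical description of an extension type and the casts it defines.
#[derive(Debug)]
struct ExtType {
    name: &'static str,
    /// Names of types this extension type knows how to be built from.
    accepts_from: &'static [&'static str],
    /// Names of types this extension type knows how to convert into.
    converts_to: &'static [&'static str],
}

/// Either a plain storage type or a registered extension type.
#[derive(Debug, Clone, Copy)]
enum TypeRef<'a> {
    Plain(&'a str),
    Extension(&'a ExtType),
}

impl<'a> TypeRef<'a> {
    fn name(&self) -> &'a str {
        match *self {
            TypeRef::Plain(name) => name,
            TypeRef::Extension(ext) => ext.name,
        }
    }
}

/// Which side ends up defining the cast.
#[derive(Debug, PartialEq)]
enum CastPlan { DefinedByTarget, DefinedBySource, BuiltIn }

/// Try the target, then the source; a side that declines is not an error.
fn plan_cast(source: TypeRef, target: TypeRef) -> CastPlan {
    if let TypeRef::Extension(ext) = target {
        if ext.accepts_from.iter().any(|n| *n == source.name()) {
            return CastPlan::DefinedByTarget;
        }
    }
    if let TypeRef::Extension(ext) = source {
        if ext.converts_to.iter().any(|n| *n == target.name()) {
            return CastPlan::DefinedBySource;
        }
    }
    // Neither side claimed the cast; defer to the ordinary cast logic,
    // which may itself reject the combination.
    CastPlan::BuiltIn
}
```

The ordering (target before source) is an arbitrary choice in this sketch; the point is only that a side returning "not handled" lets the other side try.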

Comment on lines +275 to +286
if let Some(cast_extension) = &self.cast_extension {
    let from_field = self.expr.return_field(&batch.schema())?;
    let to_field = self.return_field(&batch.schema())?;
    match value {
        ColumnarValue::Array(array) => {
            Ok(ColumnarValue::Array(cast_extension.cast(
                array,
                &from_field,
                &to_field,
                &self.cast_options,
            )?))
        }
Member Author

This is where the cast is executed. I chose to tack the CastExtension onto the CastExpr, but it could also be its own PhysicalExpr (maybe safer, in case I haven't considered some of the things that could happen during physical optimizer passes).
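
For comparison, the "own PhysicalExpr" alternative could look roughly like the following. This is a std-only sketch under heavy simplification: the `PhysicalExpr` trait here is a one-method stand-in rather than DataFusion's real trait, columns are `Vec<String>`, and `Column`, `ExtensionCastExpr`, and `strip_hyphens` are hypothetical names.

```rust
use std::fmt::Debug;
use std::sync::Arc;

/// Stand-in for a column of values; the real code works on Arrow arrays.
type Values = Vec<String>;

/// One-method stand-in for a physical expression (assumption, not the real trait).
trait PhysicalExpr: Debug {
    fn evaluate(&self, input: &Values) -> Result<Values, String>;
}

/// Passes the input column through unchanged.
#[derive(Debug)]
struct Column;

impl PhysicalExpr for Column {
    fn evaluate(&self, input: &Values) -> Result<Values, String> {
        Ok(input.clone())
    }
}

/// Hypothetical dedicated node: it owns the extension cast, so passes that
/// only know about the ordinary cast expression never see or rewrite it.
#[derive(Debug)]
struct ExtensionCastExpr {
    child: Arc<dyn PhysicalExpr>,
    cast: fn(Values) -> Result<Values, String>,
}

impl PhysicalExpr for ExtensionCastExpr {
    fn evaluate(&self, input: &Values) -> Result<Values, String> {
        // Evaluate the child first, then apply the extension-defined cast.
        let value = self.child.evaluate(input)?;
        (self.cast)(value)
    }
}

/// Example cast function: drop the hyphens from each value.
fn strip_hyphens(values: Values) -> Result<Values, String> {
    Ok(values.into_iter().map(|v| v.replace('-', "")).collect())
}
```

The trade-off is the one described in the comment: a separate node is isolated from rewrites that target the built-in cast, at the cost of a second expression type to maintain.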

2 participants