feat: add user-facing CometUDF registration [experimental] #4387
andygrove wants to merge 1 commit into
[skip ci] Adds `CometUDFRegistry` so users can register a vectorized `CometUDF` implementation against a Spark UDF name. The `CometScalaUDF` serde now checks the registry first and emits a `JvmScalarUdf` targeting the registered class directly, bypassing the Janino codegen dispatcher. Unregistered UDFs continue through codegen when enabled, otherwise fall back to Spark. The `CometUDF` trait stays as defined on apache/main (the JVM UDF framework already handles per-task instantiation and caching via `CometUdfBridge`). `CometUDF` and `CometUDFRegistry` are marked `@org.apache.spark.annotation.Unstable`.
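The routing precedence described above (registry first, then codegen when enabled, then Spark) can be illustrated with a minimal Java sketch. This is not the PR's code: `RoutingSketch`, `Route`, and `route` are hypothetical names, and the registry is reduced to a plain set of UDF names for illustration.

```java
import java.util.Set;

// Hypothetical illustration of the routing order only; not the CometScalaUDF serde.
final class RoutingSketch {
    enum Route { REGISTERED_COMET_UDF, CODEGEN_DISPATCHER, SPARK_FALLBACK }

    // registeredNames stands in for the registry; codegenEnabled stands in for
    // whatever configuration enables the Janino codegen dispatcher.
    static Route route(String udfName, Set<String> registeredNames, boolean codegenEnabled) {
        if (registeredNames.contains(udfName)) {
            // A registered name wins: target the user class directly.
            return Route.REGISTERED_COMET_UDF;
        }
        if (codegenEnabled) {
            // Unregistered UDFs go through codegen when it is enabled.
            return Route.CODEGEN_DISPATCHER;
        }
        // Otherwise the expression is left to Spark.
        return Route.SPARK_FALLBACK;
    }
}
```

Note that in this sketch a registered name takes precedence even when codegen is enabled, which matches the "checks the registry first" wording.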
@mbutrovich Do you think this approach is still needed, or is the existing Spark UDF support just as good now?
I'm not sure it's worth maintaining as an access path. In theory, if someone could adapt their UDF logic to be applied directly to Arrow types, then a user could elide a type change if they reimplemented as a
Makes sense. Thanks, I'll go ahead and close this.
Which issue does this PR close?
Part of #4193
Supersedes #4233 (closed).
Rationale for this change
apache/main already ships the underlying JVM UDF framework: the `CometUDF` trait, the `JvmScalarUdf` proto, native dispatch via `CometUdfBridge`, and the Janino codegen dispatcher (#4267) for automatic `ScalaUDF` handling. What's missing is a way for end users to plug their own vectorized `CometUDF` implementation in directly, so they can hand-tune a columnar kernel for a specific function instead of going through codegen.
What changes are included in this PR?
- `CometUDFRegistry` (new): a thread-safe registry mapping a Spark UDF name to a user-supplied `CometUDF` implementation class.
- `CometScalaUDF.convert` checks the registry first; if a registered name matches, it emits a `JvmScalarUdf` proto targeting the user class directly with the children expressions as args. Unregistered UDFs continue through the codegen dispatcher (when enabled).
- `@org.apache.spark.annotation.Unstable` on `CometUDF` and `CometUDFRegistry` to signal that the user-facing surface may evolve.
- `custom_comet_udfs.md` documenting the contract, registration, routing precedence, and cluster deployment.

User-facing API:
How are these changes tested?
`CometRegisteredUdfSuite`:
- `CometUDF` runs on the native path end-to-end (`checkSparkAnswerAndOperator`).
- `register` / `isRegistered` / `unregister` round-trip.

`[skip ci]` on this commit while iterating; will drop the tag once the design is settled.
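
For readers unfamiliar with the pattern, a thread-safe name-to-class registry with a `register` / `isRegistered` / `unregister` round-trip might look roughly like the Java sketch below. It is an illustration under stated assumptions, not the actual `CometUDFRegistry`: the class name `UdfRegistrySketch`, the `lookup` helper, storing the implementation as a class-name string, and exact (case-sensitive) name matching are all hypothetical choices.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch; the real CometUDFRegistry signatures may differ.
final class UdfRegistrySketch {
    // ConcurrentHashMap makes individual put/get/remove calls safe across threads.
    private static final ConcurrentMap<String, String> REGISTRY = new ConcurrentHashMap<>();

    private UdfRegistrySketch() {}

    // Maps a Spark UDF name to the fully qualified name of an implementation class.
    // Assumption: re-registering a name replaces the previous mapping.
    static void register(String udfName, String implClassName) {
        REGISTRY.put(udfName, implClassName);
    }

    static boolean isRegistered(String udfName) {
        return REGISTRY.containsKey(udfName);
    }

    // Returns the registered class name, or empty if the name is unknown.
    static Optional<String> lookup(String udfName) {
        return Optional.ofNullable(REGISTRY.get(udfName));
    }

    // Returns true only if a mapping was actually removed.
    static boolean unregister(String udfName) {
        return REGISTRY.remove(udfName) != null;
    }
}
```

A round-trip then amounts to calling `register("my_upper", "com.example.MyUpperUdf")`, confirming `isRegistered("my_upper")`, and calling `unregister("my_upper")`, after which `isRegistered` returns false; the UDF name and class name here are made up for the example.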