Merged
45 changes: 45 additions & 0 deletions docs/source/user-guide/latest/compatibility.md
@@ -58,6 +58,51 @@ Expressions that are not 100% Spark-compatible will fall back to Spark by default
`spark.comet.expression.EXPRNAME.allowIncompatible=true`, where `EXPRNAME` is the Spark expression class name. See
the [Comet Supported Expressions Guide](expressions.md) for more information on this configuration setting.
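
For example, to opt in to Comet's `Corr` implementation despite its known edge cases (as the test files in this PR do), the setting can be applied on the session. This is a minimal sketch assuming a `spark` session variable, as in `spark-shell`:

```scala
// EXPRNAME is the Spark expression class name, here "Corr".
// Comet will then accelerate corr() despite the documented incompatibility.
spark.conf.set("spark.comet.expression.Corr.allowIncompatible", "true")
```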

### Array Expressions

- **ArrayContains**: Returns null instead of false for empty arrays with literal values.
[#3346](https://github.com/apache/datafusion-comet/issues/3346)
- **ArrayRemove**: Returns null when the element to remove is null, instead of removing null elements from the array.
[#3173](https://github.com/apache/datafusion-comet/issues/3173)
- **GetArrayItem**: Known correctness issues with index handling, including off-by-one errors and incorrect results
with dynamic (non-literal) index values.
[#3330](https://github.com/apache/datafusion-comet/issues/3330),
[#3332](https://github.com/apache/datafusion-comet/issues/3332)
- **ArraysOverlap**: Inconsistent behavior when arrays contain NULL values.
[#3645](https://github.com/apache/datafusion-comet/issues/3645),
[#2036](https://github.com/apache/datafusion-comet/issues/2036)
- **ArrayUnion**: Sorts input arrays before performing the union, while Spark preserves the order of the first array
and appends unique elements from the second.
[#3644](https://github.com/apache/datafusion-comet/issues/3644)
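
The `ArrayUnion` ordering difference can be sketched in plain Scala. `sparkStyleUnion` and `sortedUnion` are hypothetical helpers that mimic the two behaviors; this is not Comet's or Spark's actual implementation:

```scala
// Spark's array_union keeps the first array's order and appends the
// unique elements of the second; a sorted union loses that order.
def sparkStyleUnion(a: Seq[Int], b: Seq[Int]): Seq[Int] =
  (a ++ b).distinct

def sortedUnion(a: Seq[Int], b: Seq[Int]): Seq[Int] =
  (a ++ b).distinct.sorted

val sparkResult = sparkStyleUnion(Seq(3, 1, 2), Seq(2, 4))  // Seq(3, 1, 2, 4)
val sortedResult = sortedUnion(Seq(3, 1, 2), Seq(2, 4))     // Seq(1, 2, 3, 4)
```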

### Date/Time Expressions

- **Hour, Minute, Second**: Incorrectly apply timezone conversion to TimestampNTZ inputs. TimestampNTZ stores local
time without timezone, so no conversion should be applied. These expressions work correctly with Timestamp inputs.
[#3180](https://github.com/apache/datafusion-comet/issues/3180)
- **TruncTimestamp (date_trunc)**: Produces incorrect results when used with non-UTC timezones. Compatible when
timezone is UTC.
[#2649](https://github.com/apache/datafusion-comet/issues/2649)
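
The TimestampNTZ distinction above can be illustrated with `java.time`, where `LocalDateTime` plays the role of TimestampNTZ and `Instant` the role of a zoned Timestamp:

```scala
import java.time.{Instant, LocalDateTime, ZoneId}

// TimestampNTZ stores a local wall-clock time: the hour is read directly,
// with no timezone conversion.
val ntz = LocalDateTime.parse("2024-01-01T23:30:00")
val ntzHour = ntz.getHour // 23, regardless of session timezone

// A zoned Timestamp is converted to the session timezone before the
// hour is extracted: 23:30 UTC is 08:30 the next day in Tokyo.
val ts = Instant.parse("2024-01-01T23:30:00Z")
val tsHourInTokyo = ts.atZone(ZoneId.of("Asia/Tokyo")).getHour // 8
```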

### Math Expressions

- **Ceil, Floor**: Incorrect results for Decimal type inputs.
[#1729](https://github.com/apache/datafusion-comet/issues/1729)
- **Tan**: `tan(-0.0)` produces `0.0` instead of `-0.0`.
[#1897](https://github.com/apache/datafusion-comet/issues/1897)
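
The `Tan` issue concerns signed zero. Spark, backed by `java.lang.Math`, preserves the sign of a zero argument, so `tan(-0.0)` is `-0.0`; since `-0.0 == 0.0` under `==`, the sign must be checked via the bit pattern or by dividing into it:

```scala
// java.lang.Math.tan returns a zero with the same sign as a zero argument.
val t = math.tan(-0.0)

// -0.0 and 0.0 compare equal, so inspect the raw bits instead.
val isNegativeZero = java.lang.Double.doubleToRawLongBits(t) != 0L // true
```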

### Aggregate Expressions

- **Corr**: Returns null instead of NaN in some edge cases.
[#2646](https://github.com/apache/datafusion-comet/issues/2646)
- **First, Last**: These functions are not deterministic. When `ignoreNulls` is set, results may not match Spark.
[#1630](https://github.com/apache/datafusion-comet/issues/1630)
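
The `Corr` edge case arises when a column has zero variance, where Pearson correlation is undefined and Spark yields `NaN` (a `0.0 / 0.0`). A minimal sketch with a hypothetical `corr` helper, not Comet or Spark code:

```scala
// Pearson correlation: covariance over the product of standard deviations.
// With a constant column, both numerator and denominator are 0.0 -> NaN.
def corr(xs: Seq[Double], ys: Seq[Double]): Double = {
  val n = xs.length
  val mx = xs.sum / n
  val my = ys.sum / n
  val cov = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
  val sx = math.sqrt(xs.map(x => (x - mx) * (x - mx)).sum)
  val sy = math.sqrt(ys.map(y => (y - my) * (y - my)).sum)
  cov / (sx * sy)
}

val c = corr(Seq(1.0, 1.0, 1.0), Seq(1.0, 2.0, 3.0)) // NaN: first column is constant
```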

### Struct Expressions

- **StructsToJson (to_json)**: Does not support `+Infinity` and `-Infinity` for numeric types (float, double).
[#3016](https://github.com/apache/datafusion-comet/issues/3016)
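
The `StructsToJson` limitation stems from strict JSON having no literal for infinities, so serializing them requires special handling. `Double`'s string form is not a valid JSON token:

```scala
// Java/Scala render the IEEE 754 infinities as bare words, which strict
// JSON (RFC 8259) does not allow as number literals.
val posInf = Double.PositiveInfinity.toString // "Infinity"
val negInf = Double.NegativeInfinity.toString // "-Infinity"
```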

## Regular Expressions

Comet uses the Rust regexp crate for evaluating regular expressions, and this has different behavior from Java's
196 changes: 98 additions & 98 deletions docs/source/user-guide/latest/expressions.md

Large diffs are not rendered by default.

@@ -116,7 +116,7 @@ object QueryPlanSerde extends Logging with CometExprShim {
classOf[Sinh] -> CometScalarFunction("sinh"),
classOf[Sqrt] -> CometScalarFunction("sqrt"),
classOf[Subtract] -> CometSubtract,
-    classOf[Tan] -> CometScalarFunction("tan"),
+    classOf[Tan] -> CometTan,
classOf[Tanh] -> CometScalarFunction("tanh"),
classOf[Cot] -> CometScalarFunction("cot"),
classOf[UnaryMinus] -> CometUnaryMinus,
7 changes: 7 additions & 0 deletions spark/src/main/scala/org/apache/comet/serde/aggregates.scala
@@ -584,6 +584,13 @@ object CometStddevPop extends CometAggregateExpressionSerde[StddevPop] with Come
}

object CometCorr extends CometAggregateExpressionSerde[Corr] {

override def getSupportLevel(expr: Corr): SupportLevel =
Incompatible(
Some(
"Returns null instead of NaN in some edge cases" +
" (https://github.com/apache/datafusion-comet/issues/2646)"))

override def convert(
aggExpr: AggregateExpression,
corr: Corr,
21 changes: 21 additions & 0 deletions spark/src/main/scala/org/apache/comet/serde/arrays.scala
@@ -35,6 +35,12 @@ object CometArrayRemove
with CometExprShim
with ArraysBase {

override def getSupportLevel(expr: ArrayRemove): SupportLevel =
Incompatible(
Some(
"Returns null when element is null instead of removing null elements" +
" (https://github.com/apache/datafusion-comet/issues/3173)"))

override def convert(
expr: ArrayRemove,
inputs: Seq[Attribute],
@@ -131,6 +137,13 @@ object CometArrayAppend extends CometExpressionSerde[ArrayAppend] {
}

object CometArrayContains extends CometExpressionSerde[ArrayContains] {

override def getSupportLevel(expr: ArrayContains): SupportLevel =
Incompatible(
Some(
"Returns null instead of false for empty arrays with literal values" +
" (https://github.com/apache/datafusion-comet/issues/3346)"))

override def convert(
expr: ArrayContains,
inputs: Seq[Attribute],
@@ -472,6 +485,14 @@ object CometCreateArray extends CometExpressionSerde[CreateArray] {
}

object CometGetArrayItem extends CometExpressionSerde[GetArrayItem] {

override def getSupportLevel(expr: GetArrayItem): SupportLevel =
Incompatible(
Some(
"Known correctness issues with index handling" +
" (https://github.com/apache/datafusion-comet/issues/3330," +
" https://github.com/apache/datafusion-comet/issues/3332)"))

override def convert(
expr: GetArrayItem,
inputs: Seq[Attribute],
47 changes: 46 additions & 1 deletion spark/src/main/scala/org/apache/comet/serde/datetime.scala
@@ -177,6 +177,18 @@ object CometQuarter extends CometExpressionSerde[Quarter] with CometExprGetDateF
}

object CometHour extends CometExpressionSerde[Hour] {

override def getSupportLevel(expr: Hour): SupportLevel = {
if (expr.child.dataType.typeName == "timestamp_ntz") {
Incompatible(
Some(
"Incorrectly applies timezone conversion to TimestampNTZ inputs" +
" (https://github.com/apache/datafusion-comet/issues/3180)"))
} else {
Compatible()
}
}

override def convert(
expr: Hour,
inputs: Seq[Attribute],
@@ -203,6 +215,18 @@ object CometHour extends CometExpressionSerde[Hour] {
}

object CometMinute extends CometExpressionSerde[Minute] {

override def getSupportLevel(expr: Minute): SupportLevel = {
if (expr.child.dataType.typeName == "timestamp_ntz") {
Incompatible(
Some(
"Incorrectly applies timezone conversion to TimestampNTZ inputs" +
" (https://github.com/apache/datafusion-comet/issues/3180)"))
} else {
Compatible()
}
}

override def convert(
expr: Minute,
inputs: Seq[Attribute],
@@ -229,6 +253,18 @@ object CometMinute extends CometExpressionSerde[Minute] {
}

object CometSecond extends CometExpressionSerde[Second] {

override def getSupportLevel(expr: Second): SupportLevel = {
if (expr.child.dataType.typeName == "timestamp_ntz") {
Incompatible(
Some(
"Incorrectly applies timezone conversion to TimestampNTZ inputs" +
" (https://github.com/apache/datafusion-comet/issues/3180)"))
} else {
Compatible()
}
}

override def convert(
expr: Second,
inputs: Seq[Attribute],
@@ -402,10 +438,19 @@ object CometTruncTimestamp extends CometExpressionSerde[TruncTimestamp] {
"microsecond")

override def getSupportLevel(expr: TruncTimestamp): SupportLevel = {
val timezone = expr.timeZoneId.getOrElse("UTC")
val isUtc = timezone == "UTC" || timezone == "Etc/UTC"
expr.format match {
case Literal(fmt: UTF8String, _) =>
if (supportedFormats.contains(fmt.toString.toLowerCase(Locale.ROOT))) {
-          Compatible()
+          if (isUtc) {
+            Compatible()
+          } else {
+            Incompatible(
+              Some(
+                s"Incorrect results in non-UTC timezone '$timezone'" +
+                  " (https://github.com/apache/datafusion-comet/issues/2649)"))
+          }
} else {
Unsupported(Some(s"Format $fmt is not supported"))
}
44 changes: 43 additions & 1 deletion spark/src/main/scala/org/apache/comet/serde/math.scala
@@ -19,7 +19,7 @@

package org.apache.comet.serde

-import org.apache.spark.sql.catalyst.expressions.{Abs, Atan2, Attribute, Ceil, CheckOverflow, Expression, Floor, Hex, If, LessThanOrEqual, Literal, Log, Log10, Log2, Unhex}
+import org.apache.spark.sql.catalyst.expressions.{Abs, Atan2, Attribute, Ceil, CheckOverflow, Expression, Floor, Hex, If, LessThanOrEqual, Literal, Log, Log10, Log2, Tan, Unhex}
import org.apache.spark.sql.types.{DecimalType, NumericType}

import org.apache.comet.CometSparkSessionExtensions.withInfo
@@ -38,6 +38,18 @@ object CometAtan2 extends CometExpressionSerde[Atan2] {
}

object CometCeil extends CometExpressionSerde[Ceil] {

override def getSupportLevel(expr: Ceil): SupportLevel = {
expr.child.dataType match {
case _: DecimalType =>
Incompatible(
Some(
"Incorrect results for Decimal type inputs" +
" (https://github.com/apache/datafusion-comet/issues/1729)"))
case _ => Compatible()
}
}

override def convert(
expr: Ceil,
inputs: Seq[Attribute],
@@ -58,6 +70,18 @@ object CometCeil extends CometExpressionSerde[Ceil] {
}

object CometFloor extends CometExpressionSerde[Floor] {

override def getSupportLevel(expr: Floor): SupportLevel = {
expr.child.dataType match {
case _: DecimalType =>
Incompatible(
Some(
"Incorrect results for Decimal type inputs" +
" (https://github.com/apache/datafusion-comet/issues/1729)"))
case _ => Compatible()
}
}

override def convert(
expr: Floor,
inputs: Seq[Attribute],
@@ -174,6 +198,24 @@ object CometAbs extends CometExpressionSerde[Abs] with MathExprBase {
}
}

object CometTan extends CometExpressionSerde[Tan] {

override def getSupportLevel(expr: Tan): SupportLevel =
Incompatible(
Some(
"tan(-0.0) produces incorrect result" +
" (https://github.com/apache/datafusion-comet/issues/1897)"))

override def convert(
expr: Tan,
inputs: Seq[Attribute],
binding: Boolean): Option[ExprOuterClass.Expr] = {
val childExpr = expr.children.map(exprToProtoInternal(_, inputs, binding))
val optExpr = scalarFunctionExprToProto("tan", childExpr: _*)
optExprWithInfo(optExpr, expr, expr.children: _*)
}
}

sealed trait MathExprBase {
protected def nullIfNegative(expression: Expression): Expression = {
val zero = Literal.default(expression.dataType)
6 changes: 6 additions & 0 deletions spark/src/main/scala/org/apache/comet/serde/structs.scala
@@ -105,6 +105,12 @@ object CometGetArrayStructFields extends CometExpressionSerde[GetArrayStructFiel

object CometStructsToJson extends CometExpressionSerde[StructsToJson] {

override def getSupportLevel(expr: StructsToJson): SupportLevel =
Incompatible(
Some(
"Does not support Infinity/-Infinity for numeric types" +
" (https://github.com/apache/datafusion-comet/issues/3016)"))

override def convert(
expr: StructsToJson,
inputs: Seq[Attribute],
@@ -15,6 +15,7 @@
-- specific language governing permissions and limitations
-- under the License.

-- Config: spark.comet.expression.Corr.allowIncompatible=true
-- ConfigMatrix: parquet.enable.dictionary=false,true

statement
@@ -15,6 +15,7 @@
-- specific language governing permissions and limitations
-- under the License.

-- Config: spark.comet.expression.ArrayRemove.allowIncompatible=true
-- ConfigMatrix: parquet.enable.dictionary=false,true

statement
@@ -15,6 +15,7 @@
-- specific language governing permissions and limitations
-- under the License.

-- Config: spark.comet.expression.TruncTimestamp.allowIncompatible=true
-- ConfigMatrix: parquet.enable.dictionary=false,true

statement
@@ -15,6 +15,7 @@
-- specific language governing permissions and limitations
-- under the License.

-- Config: spark.comet.expression.ArrayContains.allowIncompatible=true
-- ConfigMatrix: parquet.enable.dictionary=false,true

-- TODO: replace map_from_arrays with map whenever map is supported in Comet
@@ -15,6 +15,7 @@
-- specific language governing permissions and limitations
-- under the License.

-- Config: spark.comet.expression.Ceil.allowIncompatible=true
-- ConfigMatrix: parquet.enable.dictionary=false,true

statement
@@ -15,6 +15,7 @@
-- specific language governing permissions and limitations
-- under the License.

-- Config: spark.comet.expression.Floor.allowIncompatible=true
-- ConfigMatrix: parquet.enable.dictionary=false,true

statement