-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[CALCITE-7270] Add support for a DIVIDE_0_NULL operation #4653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
bb01644 to
7fb33a6
Compare
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
7fb33a6 to
5d687ce
Compare
|
xiedeyantu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with this particular module, so I'm not entirely clear on the correct approach. I've only performed a basic check—it might require someone with expertise in this area to review it.
| } | ||
|
|
||
| // nullable divide | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this empty line.
| !use post | ||
| !set outputformat mysql | ||
|
|
||
| select CAST(5 AS DOUBLE) / 0 as a; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mysql result is here. Was this done intentionally here?
+------+
| a |
+------+
| NULL |
+------+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is terrible, it's against the IEEE 754 rules. we can probably emulate this behavior, but this means that there is yet another DIVISION operator besides the ones I have introduced...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure I understand. Are you suggesting that MySQL's behavior might be problematic? However, when this SQL is executed in PostgreSQL, it directly throws an error (ERROR: division by zero).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, MySQL is really weird.
5e0 / 0e0 does not conform to the IEEE 754 rules, where it should produce Infinity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are correct. DuckDB also returns inf. You can use this link to check it.
duckdb> select CAST(5 AS DOUBLE) / 0 as a;
┌─────┐
│ a │
╞═════╡
│ inf │
└─────┘
duckdb> select CAST(-5 AS DOUBLE) / 0 as a;
┌──────┐
│ a │
╞══════╡
│ -inf │
└──────┘
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, Apache Hive also returns NULL whenever you divide by zero no matter the data type.
CREATE TABLE types
(
cint INT,
cbint BIGINT,
cfloat FLOAT,
cdouble DOUBLE,
cdecimal DECIMAL(10, 2)
);
INSERT INTO types
VALUES (5, 5, 5.5, 5.5, 5.5);
SELECT cint / 0 AS cint_div_zero,
cbint / 0 AS cbint_div_zero,
cfloat / 0 AS cfloat_div_zero,
cdouble / 0 AS cdouble_div_zero,
cdecimal / 0 AS cdecimal_div_zeroNULL NULL NULL NULL NULL
| !use scott-mysql | ||
|
|
||
| # [CALCITE-7270] Add support for a DIVIDE_0_NULL operator | ||
| select 5 / 0 as a; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not same as mysql result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The link you submitted does not execute the same operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I have fixed it. it also return NULL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The duckdb return Inf, maybe I didn't understand what DIVIDE_0_NULL means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's supposed to return NULL for integer or decimal division by 0.
For double values it's supposed to follow IEEE 794.
But you see, there are actually lots of DIVISION operators.
Even MySQL has strict and non-strict modes.
| case MINUS: | ||
| case TIMES: | ||
| case DIVIDE: | ||
| case DIVIDE_0_NULL: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to use DIVIDE_ZERO_NULL?
| if (CHECKED_OPERATORS.contains(op) | ||
| || op.kind == SqlKind.DIVIDE_0_NULL | ||
| || op.kind == SqlKind.MOD_0_NULL | ||
| || op.kind == SqlKind.CHECKED_DIVIDE_0_NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to target checked arithmetic. Are DIVIDE_0_NULL and MOD_0_NULL part of that? Can we add them to CHECKED_OPERATORS?
| // sqlite, postgres, duckdb, mysql and MariaDB (non-strict mode). Note that MYSQL is actually | ||
| // dynamically-typed, so Calcite cannot implement its behavior accurately, since Calcite is | ||
| // statically-typed. | ||
| case MYSQL_5: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make LENIENT support this new division behavior as Calcite's default? Because the current behavior is inconsistent with MySQL (although MySQL might be incorrect). I tried to verify this across several databases: PostgreSQL throws an error directly, while MariaDB, SQLite and MySQL behave consistently. DuckDB actually differs from both MySQL and the current implementation as well.
| !use post | ||
| !set outputformat mysql | ||
|
|
||
| select CAST(5 AS DOUBLE) / 0 as a; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, Apache Hive also returns NULL whenever you divide by zero no matter the data type.
CREATE TABLE types
(
cint INT,
cbint BIGINT,
cfloat FLOAT,
cdouble DOUBLE,
cdecimal DECIMAL(10, 2)
);
INSERT INTO types
VALUES (5, 5, 5.5, 5.5, 5.5);
SELECT cint / 0 AS cint_div_zero,
cbint / 0 AS cbint_div_zero,
cfloat / 0 AS cfloat_div_zero,
cdouble / 0 AS cdouble_div_zero,
cdecimal / 0 AS cdecimal_div_zeroNULL NULL NULL NULL NULL
| /** | ||
| * Unchecked nullable version of DIVIDE, which produces NULL when dividing by zero. | ||
| */ | ||
| DIVIDE_0_NULL, | ||
|
|
||
| /** | ||
| * Checked nullable version of DIVIDE, which produces NULL when dividing by zero. | ||
| */ | ||
| CHECKED_DIVIDE_0_NULL, | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to introduce a new SqlKind entry for each division variant? It's true that we have used this pattern before but maybe its not necessary in every case. In some cases the exact division peculiarities may not be that important so it could be fine to keep the existing SqlKind.DIVIDE.
| * Most SQL dialects will produce a runtime exception on division by zero, | ||
| * but some dialects return NULL instead (e.g. sqlite). | ||
| */ | ||
| boolean nullableDivide(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make this a default method otherwise this becomes a breaking change.
| */ | ||
| public static final SqlTypeTransform FORCE_NULLABLE_NON_FP = | ||
| (opBinding, typeToTransform) -> { | ||
| boolean nullable = !SqlTypeName.APPROX_TYPES.contains(typeToTransform.getSqlTypeName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For approximate types nullable becomes false which will end up making the resulting type non-nullable which I don't think its correct. I have the impression that the result of the division is always nullable and this is also unrelated to the fact that we divide by zero or not.
Example
2.5/null
null/2.5
(float) null/ (float) nullThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all these results inherit the nullability of the operands, but divide_0_null is always nullable
| case DIVIDE_0_NULL: | ||
| case CHECKED_DIVIDE_0_NULL: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that we don't have separate handling for DIVIDE_0_NULL and CHECKED_DIVIDE_0_NULL anywhere in the code is a hint that maybe we don't need separate entries in SqlKind.
| defineBinary(CHECKED_DIVIDE_INTEGER, DivideChecked, NullPolicy.STRICT, "checkedDivide"); | ||
| defineUnary(CHECKED_UNARY_MINUS, NegateChecked, NullPolicy.STRICT, "checkedUnaryMinus"); | ||
| // nullable division | ||
| defineBinary(DIVIDE_0_NULL, Divide0Null, NullPolicy.SEMI_STRICT, "nullableDivide"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DIVIDE_0_NULL is already defined in line 836. Why do we need two implementors for the same operator?
| // Unfortunately there cannot exist two functions in the standard operator | ||
| // table with the same exact name, so we need to use a different name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this statement really true? How come we don't have the same problem for division?
| // Unfortunately there cannot exist two functions in the standard operator | ||
| // table with the same exact name, so we need to use a different name, | ||
| // although in SQL this would be shown just as MOD | ||
| SqlBasicFunction.create("NULLABLE_MOD", SqlKind.MOD_0_NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also PERCENT_REMAINDER with name % why did we pick "NULLABLE_MOD" as a name?
| // Return type is same as divisor (2nd operand) | ||
| // SQL2003 Part2 Section 6.27, Syntax Rules 9 | ||
| SqlBasicFunction.create(SqlKind.MOD, | ||
| // A rather unfortunate name for this return type strategy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the problem with the name?
| map.put(SqlKind.DIVIDE_0_NULL, Policy.AS_IS); | ||
| map.put(SqlKind.MOD_0_NULL, Policy.AS_IS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could override the SqlOperator.getStrongPolicyInference for the new operators and avoid the need for introducing a new SqlKind entry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also do something about CHECKED_DIVIDE_0_NULL?
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 90 days if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@calcite.apache.org list. Thank you for your contributions. |



https://issues.apache.org/jira/browse/CALCITE-7270