Skip to content

[ABA-21] test(vortex-array): repro for float compare semantics divergence#18

Open
abnobdoss wants to merge 1 commit into
developfrom
fix/aba-21-float-compare-total-order
Open

[ABA-21] test(vortex-array): repro for float compare semantics divergence#18
abnobdoss wants to merge 1 commit into
developfrom
fix/aba-21-float-compare-total-order

Conversation

@abnobdoss
Copy link
Copy Markdown
Owner

Summary

  • Adds a single #[ignore]'d test issue_aba21_float_compare_uses_ieee754_semantics to vortex-array/src/scalar_fn/fns/binary/compare.rs documenting that vortex-array's float compare path uses Arrow's totalOrder semantics rather than IEEE 754 / SQL default comparison semantics.
  • Bug is confirmed present on develop — the test fails when un-ignored.
  • This is a REPRO-ONLY PR — no fix is included, pending a design decision from upstream (see open question below).

Linear: https://linear.app/abanoubdoss/issue/ABA-21

What the test covers

Four probes against the arrow_compare_arrays path (compare.rs lines 174-181 / arrow_ord::cmp::*):

Probe IEEE 754 / SQL Current (totalOrder)
NaN == NaN false true (FAILS)
+0.0 == -0.0 true false (FAILS)
NaN < 1.0 false false (passes — both agree)
NaN > 1.0 false true (FAILS — WHERE col > 0 silently includes NaN rows)

Bug sites on develop

  • vortex-array/src/scalar_fn/fns/binary/compare.rs:174-181arrow_ord::cmp::{eq, neq, gt, gt_eq, lt, lt_eq} used for primitive floats (totalOrder via Arrow ≥ 57)
  • scalar_cmp at line 195 (partial_cmp) → PValue::partial_cmptotal_compare / is_eq via NativePType

Open question

The intentional choice was made in vortex vortex-data#808 and confirmed in vortex-data#3958 ("totalOrder is what dict/hash/sort needs"). The unresolved question for this PR:

  1. Which surface should change? The SQL-facing scalar_fn compare path, or should the entire PValue PartialOrd/PartialEq move to IEEE 754?
  2. Split semantics: Should arrow_ord remain for sort/pruning (where totalOrder is desirable) but the Operator::{Eq, Lt, Gt, …} SQL-filter path switch to IEEE 754? Or is a single semantic across all surfaces acceptable?
  3. NaN handling: Should NaN comparisons return null (SQL three-valued logic) rather than false?

🤖 Generated with Claude Code

… not IEEE 754

Add an ignored regression test documenting that vortex-array's float
compare path uses Arrow's totalOrder semantics (NaN == NaN → true,
-0.0 ≠ +0.0) rather than IEEE 754 / SQL default comparison semantics.

Four probes cover the bug surface:
- NaN == NaN → false (IEEE 754) but currently true (totalOrder)
- +0.0 == -0.0 → true (IEEE 754) but currently false (totalOrder)
- NaN < 1.0 → false (both; non-discriminating, pins the Lt contract)
- NaN > 1.0 → false (IEEE 754) but currently true (totalOrder) —
  the SQL-visible failure: WHERE col > 0 silently includes NaN rows

The test is left #[ignore]'d because the choice of semantics
(totalOrder for dict/sort/pruning vs. IEEE 754 for SQL filters)
needs an upstream design decision before a fix can land.

See: https://linear.app/abanoubdoss/issue/ABA-21

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Abanoub Doss <abanoub.doss@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant