Skip to content

HIVE-29637: Incorrect Results for IS_NULL Predicate on Partition Column#6515

Open
deniskuzZ wants to merge 3 commits into
apache:masterfrom
deniskuzZ:HIVE-29637
Open

HIVE-29637: Incorrect Results for IS_NULL Predicate on Partition Column#6515
deniskuzZ wants to merge 3 commits into
apache:masterfrom
deniskuzZ:HIVE-29637

Conversation

@deniskuzZ
Copy link
Copy Markdown
Member

@deniskuzZ deniskuzZ commented May 29, 2026

…n Column

What changes were proposed in this pull request?

IcebergTableUtil.makeSpecFromName() — a new helper that builds a Hive-compatible partition spec map from Iceberg's PartitionData. When a partition field value is null, it maps it to an actual NULL instead of the literal string "null".

Why are the changes needed?

Queries with WHERE <partition_col> IS NULL on Iceberg tables returned incorrect results (empty result set).

Does this PR introduce any user-facing change?

No

How was this patch tested?

mvn test -Dtest=TestIcebergCliDriver -Dqfile=iceberg_isnull_partition_pruning.q

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes incorrect results for WHERE <partition_col> IS NULL on Iceberg tables by ensuring null partition values are represented as actual Java null during partition predicate evaluation, and adds a regression qtest to cover the scenario.

Changes:

  • Add IcebergTableUtil.makeSpecFromName(...) to build a partition spec map from an Iceberg partition path while translating Iceberg null partition values to Java null.
  • Update HiveIcebergStorageHandler#getPartitionsByExpr to use the new helper when constructing DummyPartition specs for expression evaluation.
  • Add a new positive qtest (iceberg_isnull_partition_pruning.q + expected output) validating IS NULL partition pruning behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergTableUtil.java Adds a helper for parsing Iceberg partition paths into a spec map with correct null handling.
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java Switches partition spec construction in getPartitionsByExpr to the new helper to fix IS NULL results.
iceberg/iceberg-handler/src/test/queries/positive/iceberg_isnull_partition_pruning.q Adds a regression query file covering WHERE str_col IS NULL on an Iceberg-partitioned table.
iceberg/iceberg-handler/src/test/results/positive/iceberg_isnull_partition_pruning.q.out Adds the expected output for the new regression test.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Member

@zabetak zabetak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some high level comments/questions cause I am not very familiar with the modified APIs.

@deniskuzZ deniskuzZ changed the title HIVE-29637: Iceberg: Incorrect Results for NULL Predicate on Partitio… HIVE-29637: Incorrect Results for NULL Predicate on Partitio… Jun 2, 2026
@deniskuzZ deniskuzZ changed the title HIVE-29637: Incorrect Results for NULL Predicate on Partitio… HIVE-29637: Incorrect Results for NULL Predicate on Partition Column Jun 2, 2026
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Jun 4, 2026

Copy link
Copy Markdown
Member

@zabetak zabetak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tentatively approve assuming that we all agree that the new behavior is the one desired by all users and the old one is merely a bug.

Comment on lines +90 to +98

String partitionValue = entry.getValue();
if (partitionValue.equals(defaultPartitionName)) {
partValues.add(null); // Null for default partition.
} else {
partValues.add(ObjectInspectorConverters.getConverter(
PrimitiveObjectInspectorFactory.javaStringObjectInspector, oi)
.convert(partitionValue));
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description of hive.exec.default.partition.name property says the following:

"The default partition name in case the dynamic partition column value is null/empty string or any other values that cannot be escaped."

This code assumes that the value is null but according to the description it may be other things as well.

The change here will fix the IS [NOT] NULL predicate but may potentially change the behavior of some queries:

SELECT key, value, ds FROM pcr_t1 WHERE ds > 'A';
SELECT key, value, ds FROM pcr_t1 WHERE ds < 'A';

Do the queries return the same results before/after the changes in the PR for Iceberg and non-Iceberg tables ?
Which behavior should prevail?

Copy link
Copy Markdown
Member Author

@deniskuzZ deniskuzZ Jun 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-Iceberg pcr_t1

  (NULL and '' both land in ds=__HIVE_DEFAULT_PARTITION__)
  
  ┌────────────────┬─────────────────────────────────────────┬─────────────────────────────────────────┬─────────────────┐
  │     Query      │               WITHOUT fix               │                WITH fix                 │ Changed by fix? │
  ├────────────────┼─────────────────────────────────────────┼─────────────────────────────────────────┼─────────────────┤
  │ ds is null     │ (none) ❌                                │ C, D → ds=NULL                          │ ✅  fixed        │
  ├────────────────┼─────────────────────────────────────────┼─────────────────────────────────────────┼─────────────────┤
  │ ds is not null │ A, B                                    │ A, B                                    │ no              │
  ├────────────────┼─────────────────────────────────────────┼─────────────────────────────────────────┼─────────────────┤
  │ ds = 'null'    │ B                                       │ B                                       │ no              │
  ├────────────────┼─────────────────────────────────────────┼─────────────────────────────────────────┼─────────────────┤
  │ ds > 'A'       │ C, D, B → ds=__HIVE_DEFAULT_PARTITION__ │ C, D, B → ds=__HIVE_DEFAULT_PARTITION__ │ no              │
  ├────────────────┼─────────────────────────────────────────┼─────────────────────────────────────────┼─────────────────┤
  │ ds < 'A'       │ A                                       │ A                                       │ no              │
  └────────────────┴─────────────────────────────────────────┴─────────────────────────────────────────┴─────────────────┘
  
  Iceberg ice_01

  (NULL → null partition; '' → real ds='')

  ┌────────────────┬─────────────┬──────────┬─────────────────┐
  │     Query      │ WITHOUT fix │ WITH fix │ Changed by fix? │
  ├────────────────┼─────────────┼──────────┼─────────────────┤
  │ ds is null     │ (none) ❌    │ C        │ ✅  fixed        │
  ├────────────────┼─────────────┼──────────┼─────────────────┤
  │ ds is not null │ A, B, D     │ A, B, D  │ no              │
  ├────────────────┼─────────────┼──────────┼─────────────────┤
  │ ds = 'null'    │ B           │ B        │ no              │
  ├────────────────┼─────────────┼──────────┼─────────────────┤
  │ ds > 'A'       │ B           │ B        │ no              │
  ├────────────────┼─────────────┼──────────┼─────────────────┤
  │ ds < 'A'       │ D, A        │ A, D     │ no (same set)   │
  └────────────────┴─────────────┴──────────┴─────────────────┘

  • The fix changes exactly one query on each table type: ds is null (previously returned nothing — the bug)
  • Range queries (>, <) are unchanged before/after on both table types → no regression.
  • Iceberg is fully SQL-correct throughout because it never conflates NULL with ''.

PR scope is NULL semantics only.

insert into pcr_t1 partition (ds) select 'B', 'V2', 'null';
insert into pcr_t1 partition (ds) select 'C', 'V3', null;

explain select key, value, ds from pcr_t1 where ds is null;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there something useful in the EXPLAIN output? If not we can drop those.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, filterExpr: ds is not null (type: boolean)


insert into pcr_t1 partition (ds) select 'A', 'V1', '2000-04-08';
insert into pcr_t1 partition (ds) select 'B', 'V2', 'null';
insert into pcr_t1 partition (ds) select 'C', 'V3', null;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The empty string value is another edge case that may create problems

insert into pcr_t1 partition (ds) select 'D', 'V4', '';

explain select key, value, ds from pcr_t1 where ds is not null;
select key, value, ds from pcr_t1 where ds is not null order by key;

select key, value, ds from pcr_t1 where ds = 'null';
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider enriching the tests with:

SELECT key, value, ds FROM pcr_t1 WHERE ds > 'A';
SELECT key, value, ds FROM pcr_t1 WHERE ds < 'A';

@deniskuzZ deniskuzZ changed the title HIVE-29637: Incorrect Results for NULL Predicate on Partition Column HIVE-29637: Incorrect Results for IS_NULL Predicate on Partition Column Jun 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants