HIVE-29649: Iceberg: Support Parquet DECIMAL_64 vectorization by deniskuzZ · Pull Request #6527 · apache/hive

deniskuzZ · 2026-06-05T18:55:35Z

What changes were proposed in this pull request?

Adds support for Parquet DECIMAL_64 vectorization

Why are the changes needed?

Performance

Does this PR introduce any user-facing change?

No

How was this patch tested?

parquet_decimal64.q, vectorized_iceberg_read_multitable.q

Copilot

Pull request overview

This PR aims to enable and validate Parquet DECIMAL_64 vectorization (including for Iceberg tables) by advertising DECIMAL_64 feature support from Parquet/Iceberg input formats, adding a Decimal64 read path to the Parquet vectorized reader, and extending test coverage and golden outputs to assert the new vectorization behavior.

Changes:

- Advertise DECIMAL_64 as a supported vectorization feature for Parquet (and Iceberg when enabled via table properties).
- Add a Decimal64ColumnVector read path to the Parquet vectorized primitive column reader (including dictionary decode support).
- Add/adjust LLAP query tests and Java unit tests to exercise DECIMAL_64 vectorization and update expected outputs.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.


File	Description
ql/src/test/results/clientpositive/llap/parquet_decimal64.q.out	New golden output validating Parquet `DECIMAL_64` vectorization plan + results.
ql/src/test/queries/clientpositive/parquet_decimal64.q	New query to assert Parquet vectorized reader engages `DECIMAL_64`.
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java	Adds wiring for physical variations + a Decimal64 read verification helper.
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedDictionaryEncodingColumnReader.java	Adds a unit test entry point for Decimal64 reads (dictionary).
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java	Adds a unit test entry point for Decimal64 reads (non-dictionary).
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java	Implements Parquet Decimal64 read + dictionary decode logic.
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java	Advertises `DECIMAL_64` support for Parquet vectorization.
iceberg/iceberg-handler/src/test/results/positive/llap/vectorized_iceberg_read_parquet.q.out	Updates plan/output expectations to show `DECIMAL_64` in use for Iceberg Parquet.
iceberg/iceberg-handler/src/test/results/positive/llap/vectorized_iceberg_read_multitable.q.out	Updates expected output (removes now-unneeded ORC-only toggling sections).
iceberg/iceberg-handler/src/test/results/positive/llap/vectorized_iceberg_read_mixed.q.out	Updates plan/output expectations for mixed ORC/Parquet Iceberg vectorization.
iceberg/iceberg-handler/src/test/queries/positive/vectorized_iceberg_read_multitable.q	Removes ORC-only toggling queries that are no longer required.
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java	Removes ORC-only parameter propagation used for prior vectorization gating.
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergInputFormat.java	Enables advertising `DECIMAL_64` for Iceberg when table property is enabled (regardless of ORC/Parquet).
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/BaseHiveIcebergMetaHook.java	Removes ORC-only parameter constant + helper methods used for vectorization gating.


+  // INT32/INT64-backed decimals already give the unscaled value via the Parquet reader; for the
+  // (rare) byte-array-backed case reuse HiveDecimalWritable.serialize64 -- the same encoding
+  // Decimal64ColumnVector.set uses -- rather than decoding the bytes by hand.
+  private long readUnscaledLong(PrimitiveTypeName physical, short scale) {
+    return switch (physical) {
+      case INT32 -> dataColumn.readInteger();
+      case INT64 -> dataColumn.readLong();
+      default -> {
+        scratchDecimal.set(dataColumn.readDecimal(), scale);
+        yield scratchDecimal.serialize64(scale);
+      }
+    };
+  }
+
+  private long readUnscaledLongFromDict(PrimitiveTypeName physical, int dictId, short scale) {
+    return switch (physical) {
+      case INT32 -> dictionary.readInteger(dictId);
+      case INT64 -> dictionary.readLong(dictId);
+      default -> {
+        scratchDecimal.set(dictionary.readDecimal(dictId), scale);
+        yield scratchDecimal.serialize64(scale);
+      }
+    };
+  }
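
For readers less familiar with the representation these two helpers return: a Decimal64 value is the decimal's unscaled integer, that is, the value multiplied by 10^scale, held in a `long`. Parquet's INT32/INT64-backed decimals already store that unscaled integer, which is why those branches need no conversion. The following is a minimal sketch using only `java.math`; the class `Decimal64Sketch` and its methods are hypothetical and are not Hive APIs. It is stricter than a real reader would need to be (it throws rather than rounds) and makes no claim to mirror `HiveDecimalWritable.serialize64`.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical helper, not part of Hive: illustrates the unscaled-long representation. */
final class Decimal64Sketch {
  private Decimal64Sketch() {}

  /** Returns value * 10^scale as a long; throws if digits would be lost or 64 bits overflow. */
  static long toUnscaledLong(BigDecimal value, int scale) {
    // RoundingMode.UNNECESSARY makes setScale throw ArithmeticException instead of rounding.
    return value.setScale(scale, RoundingMode.UNNECESSARY).unscaledValue().longValueExact();
  }

  /** Inverse: rebuilds the decimal from the unscaled long and the column scale. */
  static BigDecimal fromUnscaledLong(long unscaled, int scale) {
    return BigDecimal.valueOf(unscaled, scale);
  }

  public static void main(String[] args) {
    long unscaled = toUnscaledLong(new BigDecimal("10.01"), 2);
    System.out.println(unscaled);                      // 1001
    System.out.println(fromUnscaledLong(unscaled, 2)); // 10.01
  }
}
```

The scale is not stored per value; it has to come from the column metadata, which is why both helpers in the diff take `scale` as a parameter.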


+    while (left > 0) {
+      readRepetitionAndDefinitionLevels();
+      if (definitionLevel >= maxDefLevel) {
+        c.vector[rowId] = readUnscaledLong(physical, c.scale);
+        c.isNull[rowId] = false;
+        c.isRepeating = c.isRepeating && (c.vector[0] == c.vector[rowId]);
+      } else {
+        setNullValue(c, rowId);
+      }
+      rowId++;
+      left--;
    }
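
The `isRepeating` bookkeeping in this loop can be illustrated in isolation. The sketch below uses plain arrays rather than `Decimal64ColumnVector`; the class name is hypothetical, and two things are assumptions rather than facts taken from the diff: that the flag starts out `true` before the first row, and that the null path clears it (which is what `setNullValue` is assumed to do here).

```java
/** Hypothetical stand-in for the loop above, using plain arrays instead of a column vector. */
final class RepeatingFlagSketch {
  private RepeatingFlagSketch() {}

  /** Copies values into vector/isNull and returns the resulting isRepeating flag. */
  static boolean fill(Long[] values, long[] vector, boolean[] isNull) {
    boolean isRepeating = true; // assumption: the flag is true before the first row is read
    for (int rowId = 0; rowId < values.length; rowId++) {
      if (values[rowId] != null) {
        vector[rowId] = values[rowId];
        isNull[rowId] = false;
        // Same comparison as the diff: every row is checked against row 0.
        isRepeating = isRepeating && (vector[0] == vector[rowId]);
      } else {
        isNull[rowId] = true;
        isRepeating = false; // assumption: a null row clears the flag
      }
    }
    return isRepeating;
  }
}
```

Because the comparison is always against `vector[0]`, the flag stays set only while every row seen so far equals the first row.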


+      if (column instanceof Decimal64ColumnVector dec64) {
+        fillDecimal64PrecisionScale(dec64);
+        PrimitiveTypeName dictPhysical = type.asPrimitiveType().getPrimitiveTypeName();
+        for (int i = rowId; i < rowId + num; ++i) {
+          if (!column.isNull[i]) {
+            dec64.vector[i] = readUnscaledLongFromDict(dictPhysical, (int) dictionaryIds.vector[i], dec64.scale);
+          }
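
The dictionary path in this hunk follows the usual two-step pattern: the page supplies one dictionary id per row, and each non-null row is resolved by a lookup into the dictionary. A minimal sketch with plain arrays follows; `DictionaryDecodeSketch` is hypothetical, and the `long[]` dictionary stands in for the Parquet dictionary object, whose real API is not reproduced here.

```java
/** Hypothetical illustration of dictionary decoding into a long-backed vector. */
final class DictionaryDecodeSketch {
  private DictionaryDecodeSketch() {}

  /**
   * Resolves ids[rowId .. rowId+num) against the dictionary and writes the unscaled values
   * into out, leaving slots marked null untouched.
   */
  static void decode(long[] dictionary, long[] ids, boolean[] isNull, long[] out,
                     int rowId, int num) {
    for (int i = rowId; i < rowId + num; ++i) {
      if (!isNull[i]) {
        // Ids are carried in a long vector in the diff, hence the narrowing cast.
        out[i] = dictionary[(int) ids[i]];
      }
    }
  }
}
```

Null rows are skipped because their id slot carries no meaningful dictionary index.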


 POSTHOOK: Input: default@tbl_ice_parquet_all_types
 #### A masked pattern was here ####
-1.1	1.2	false	4	567890123456789	6	col7	2012-10-03 19:58:08	1234-09-02	10.01
+1.1	1.2	false	4	567890123456789	6	col7	2012-10-03 19:58:08	1234-09-02	0.00


 #### A masked pattern was here ####
 1.1	1.2	false	4	567890123456789	6	col7	2012-10-03 19:58:08	1234-09-02	10.01
-5.1	6.2	true	40	567890123456780	8	col07	2012-10-03 19:58:09	1234-09-03	10.02
+5.1	6.2	true	40	567890123456780	8	col07	2012-10-03 19:58:09	1234-09-03	0.00


+  /**
+   * Decimal64 fast path: read the unscaled value straight into the long-backed vector instead of
+   * materializing a HiveDecimal per row. Only reached for decimal columns the vectorizer tagged
+   * DECIMAL_64 (precision <= 18); higher precision still uses {@link #readDecimal}.
+   */
+  private void readDecimal64(int total, Decimal64ColumnVector c, int rowId) {
+    fillDecimal64PrecisionScale(c);
+    PrimitiveTypeName physical = type.asPrimitiveType().getPrimitiveTypeName();
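
The `precision <= 18` bound in the Javadoc follows from the range of a signed 64-bit integer: the largest 18-digit unscaled value, 10^18 - 1, fits in a `long`, while the largest 19-digit value, 10^19 - 1, exceeds `Long.MAX_VALUE`. A small check of that arithmetic, with a hypothetical class name:

```java
import java.math.BigInteger;

/** Hypothetical check of why 18 digits is the cutoff for a long-backed decimal. */
final class Decimal64RangeSketch {
  private Decimal64RangeSketch() {}

  /** True if every unscaled value with the given number of digits fits in a signed long. */
  static boolean fitsInLong(int precision) {
    BigInteger maxUnscaled = BigInteger.TEN.pow(precision).subtract(BigInteger.ONE);
    // A signed long holds magnitudes of up to 63 bits.
    return maxUnscaled.bitLength() <= 63;
  }

  public static void main(String[] args) {
    System.out.println(fitsInLong(18)); // true
    System.out.println(fitsInLong(19)); // false
  }
}
```

Some 19-digit values do fit in a `long`, but not all of them, so a whole-column guarantee is only possible up to 18 digits.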


sonarqubecloud · 2026-06-05T19:59:42Z

Quality Gate passed

Issues
20 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code


HIVE-29649: Iceberg: Support Parquet DECIMAL_64 vectorization

28d13e8

asf-ci-hive added the tests pending label Jun 5, 2026

deniskuzZ requested a review from Copilot June 5, 2026 19:49

Copilot started reviewing on behalf of deniskuzZ June 5, 2026 19:49

Copilot AI reviewed Jun 5, 2026


asf-ci-hive added tests unstable and removed tests pending labels Jun 5, 2026

deniskuzZ wants to merge 1 commit into apache:master from deniskuzZ:HIVE-29649

3 participants
