[hive] Avoid treating empty partitioned tables as unpartitioned during migration by huangxiaopingRD · Pull Request #8100 · apache/paimon

huangxiaopingRD · 2026-06-03T07:27:23Z

Purpose

Fix Hive table migration for partitioned tables with no partition metadata, preventing empty partitioned tables from being incorrectly migrated as non-partitioned tables.

Root Cause

The previous logic used client.listPartitions(...).isEmpty() to decide whether to take the non-partitioned migration path. A Hive table can declare partition keys and still contain no data. In that case the metastore has no partitions to list (SHOW PARTITIONS returns nothing), so the table was incorrectly treated as non-partitioned.

The non-partitioned path used BinaryRow.EMPTY_ROW as the partition, but the target Paimon table was still partitioned. When generating the bucket path, Paimon read the partition fields from this empty row. This produced invalid partition values and eventually an HDFS "Pathname too long" error:

Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): mkdirs: Pathname too long.  Limit 8000 characters, 1000 levels.
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1435)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:772)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:641)
	at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:609)
	at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:593)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1153)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1324)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1217)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:714)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:525)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2031)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3610)

	at org.apache.hadoop.ipc.Client.warpIOException(Client.java:1722)
	at org.apache.hadoop.ipc.Client.lambda$call$2(Client.java:1640)
	at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
	at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194)
	at org.apache.hadoop.ipc.Client$Call.setException(Client.java:381)
	at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1375)
	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1149)

	at org.apache.hadoop.ipc.Client.warpIOException(Client.java:1643)
	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1634)
	at org.apache.hadoop.ipc.Client.call(Client.java:1568)
	at org.apache.hadoop.ipc.Client.call(Client.java:1477)
	at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:268)
	at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:142)
	at jdk.proxy2/jdk.proxy2.$Proxy27.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:732)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:75)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:52)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:465)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:185)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:177)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:106)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:385)
	at jdk.proxy2/jdk.proxy2.$Proxy28.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2795)
	at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2771)
	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1568)
	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1565)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1582)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1557)
	at org.apache.hadoop.hdfs.ForwardDistributedFileSystem.mkdirs(ForwardDistributedFileSystem.java:858)
	at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2420)
	at org.apache.paimon.fs.hadoop.HadoopFileIO.mkdirs(HadoopFileIO.java:169)
	at org.apache.paimon.hive.migrate.HiveMigrator$MigrateTask.call(HiveMigrator.java:402)
	at org.apache.paimon.hive.migrate.HiveMigrator$MigrateTask.call(HiveMigrator.java:372)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)

	at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:89)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:444)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:562)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:556)
	at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:630)
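To illustrate why a partition row that does not match the table's partition keys is dangerous, here is a minimal sketch of a partition-aware path builder. It assumes a Paimon-style `key=value/.../bucket-N` directory layout. `bucketPath` and `withinHdfsLimits` are hypothetical names, not Paimon APIs. Unlike the code path described above, the sketch rejects an arity mismatch up front rather than reading fields that do not exist. The 8000-character and 1000-level limits are taken from the NameNode message in the trace.

```java
import java.util.List;
import java.util.StringJoiner;

public class PartitionPathSketch {

    // Limits quoted in the NameNode "Pathname too long" message above.
    static final int MAX_PATH_LENGTH = 8000;
    static final int MAX_PATH_DEPTH = 1000;

    // Hypothetical builder for "k1=v1/k2=v2/bucket-N" (layout assumed, not Paimon code).
    // Rejects a partition row whose arity does not match the partition keys
    // instead of reading values that are not there.
    static String bucketPath(List<String> keys, List<String> values, int bucket) {
        if (keys.size() != values.size()) {
            throw new IllegalArgumentException(
                    "partition row has " + values.size() + " fields but table has "
                            + keys.size() + " partition keys");
        }
        StringJoiner path = new StringJoiner("/");
        for (int i = 0; i < keys.size(); i++) {
            path.add(keys.get(i) + "=" + values.get(i));
        }
        path.add("bucket-" + bucket);
        return path.toString();
    }

    // Approximates the two NameNode checks on a relative path:
    // total character length and number of path components.
    static boolean withinHdfsLimits(String path) {
        int depth = path.split("/").length;
        return path.length() <= MAX_PATH_LENGTH && depth <= MAX_PATH_DEPTH;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("dt");

        String ok = bucketPath(keys, List.of("2026-06-03"), 0);
        System.out.println(ok + " within limits: " + withinHdfsLimits(ok));

        try {
            // Mirrors the bug: an empty row handed to a table with one partition key.
            bucketPath(keys, List.of(), 0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast on the arity mismatch would surface the misclassification as a clear planning error rather than an opaque NameNode rejection. The PR takes the more direct route and removes the mismatch at its source.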

Changes

Use sourceHiveTable.getPartitionKeys().isEmpty() to determine whether the Hive table is non-partitioned.
Call client.listPartitions(...) only for partitioned Hive tables.
Add a regression test for migrating an empty partitioned Hive table.
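The following minimal, self-contained sketch contrasts the old and new checks and shows the dispatch they feed. It is not the HiveMigrator code. `HiveTableMeta`, `MigrateTask`, and `plan` are hypothetical stand-ins. In the real migrator, the partition keys come from the Hive table (getPartitionKeys()) and the partition list comes from client.listPartitions(...).

```java
import java.util.ArrayList;
import java.util.List;

public class MigrationPlanSketch {

    // Hypothetical, simplified view of a Hive table: its declared partition keys
    // and the partition directories the metastore currently lists.
    record HiveTableMeta(String location, List<String> partitionKeys, List<String> partitionDirs) {}

    // Hypothetical unit of migration work: a source directory and its partition spec.
    record MigrateTask(String sourceDir, String partitionSpec) {}

    // Old check: an empty partition listing was read as "not partitioned".
    static boolean isUnpartitionedOld(HiveTableMeta t) {
        return t.partitionDirs().isEmpty();
    }

    // New check: decide from the declared schema, as described in Changes.
    static boolean isUnpartitionedNew(HiveTableMeta t) {
        return t.partitionKeys().isEmpty();
    }

    static List<MigrateTask> plan(HiveTableMeta t) {
        List<MigrateTask> tasks = new ArrayList<>();
        if (isUnpartitionedNew(t)) {
            // Truly unpartitioned: one task over the table location, empty partition spec.
            tasks.add(new MigrateTask(t.location(), ""));
        } else {
            // Partitioned: the partition listing is consulted only here. Zero partitions
            // yields zero tasks, so nothing is written under an empty partition row.
            for (String dir : t.partitionDirs()) {
                tasks.add(new MigrateTask(t.location() + "/" + dir, dir));
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Declared PARTITIONED BY (dt), but no data has been loaded yet.
        HiveTableMeta emptyPartitioned =
                new HiveTableMeta("/warehouse/db.db/t", List.of("dt"), List.of());

        System.out.println(isUnpartitionedOld(emptyPartitioned)); // true (the bug)
        System.out.println(isUnpartitionedNew(emptyPartitioned)); // false
        System.out.println(plan(emptyPartitioned));               // []
    }
}
```

Keying the decision on the schema keeps the two questions separate. "Is this table partitioned?" is answered by the table definition. "Which partitions exist?" is answered by the metastore listing, and an empty answer then simply means there is nothing to copy.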

Tests

Added a regression test for migrating an empty partitioned Hive table.

JingsongLi · 2026-06-03T13:45:31Z

+1

[hive] Fix migration for empty partitioned tables

31a1907

huangxiaopingRD changed the title ~~[hive] Fix migration for empty partitioned tables~~ [hive] Avoid treating empty partitioned tables as unpartitioned during migration Jun 3, 2026

Merge branch 'apache:master' into fix-migration-empty-partiiton-tables

1ea527a

JingsongLi merged commit 4d0a651 into apache:master Jun 3, 2026
12 checks passed

huangxiaopingRD deleted the fix-migration-empty-partiiton-tables branch June 3, 2026 13:50

JingsongLi merged 2 commits into apache:master from huangxiaopingRD:fix-migration-empty-partiiton-tables
