[hive] Avoid treating empty partitioned tables as unpartitioned during migration #8100

Merged
JingsongLi merged 2 commits into apache:master from huangxiaopingRD:fix-migration-empty-partiiton-tables
Jun 3, 2026

@huangxiaopingRD
Contributor

Purpose

Fix Hive table migration for partitioned tables with no partition metadata, preventing empty partitioned tables from being incorrectly migrated as non-partitioned tables.

Root Cause

The previous logic used client.listPartitions(...).isEmpty() to decide whether to take the non-partitioned migration path. For a Hive table that defines partition keys but has no partitions registered yet, that listing is empty (SHOW PARTITIONS likewise returns nothing), so the table was incorrectly treated as non-partitioned.
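
To make the misclassification concrete, here is a minimal sketch that contrasts the two checks. `TableInfo` and both method names are hypothetical stand-ins invented for illustration; they are not the Hive `Table` class or any Paimon code, and the partition key `dt` is an assumed example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: a toy model of the decision, not HiveMigrator itself.
final class PartitionCheckSketch {

    /** Hypothetical model: declared partition keys plus partitions currently registered. */
    static final class TableInfo {
        final List<String> partitionKeys;
        final List<String> registeredPartitions;

        TableInfo(List<String> partitionKeys, List<String> registeredPartitions) {
            this.partitionKeys = partitionKeys;
            this.registeredPartitions = registeredPartitions;
        }
    }

    // Old check as described above: "no partitions listed" is read as "non-partitioned".
    static boolean oldLooksUnpartitioned(TableInfo table) {
        return table.registeredPartitions.isEmpty();
    }

    // Schema-based check: non-partitioned only when no partition keys are declared.
    static boolean newLooksUnpartitioned(TableInfo table) {
        return table.partitionKeys.isEmpty();
    }

    public static void main(String[] args) {
        // A table partitioned by "dt" that has no partitions yet.
        TableInfo emptyPartitioned =
                new TableInfo(Arrays.asList("dt"), Collections.<String>emptyList());

        System.out.println("old check: " + oldLooksUnpartitioned(emptyPartitioned));
        System.out.println("new check: " + newLooksUnpartitioned(emptyPartitioned));
    }
}
```

For the empty partitioned table the old check answers true and the schema-based check answers false, which is the difference this PR relies on.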

That path used BinaryRow.EMPTY_ROW even though the target Paimon table was still partitioned. When generating the bucket path, Paimon tried to read partition fields from the empty row, which produced invalid partition values and eventually caused HDFS to reject the mkdirs call with a "Pathname too long" error:

Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): mkdirs: Pathname too long.  Limit 8000 characters, 1000 levels.
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1435)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:772)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:641)
	at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:609)
	at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:593)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1153)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1324)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1217)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:714)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:525)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2031)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3610)

	at org.apache.hadoop.ipc.Client.warpIOException(Client.java:1722)
	at org.apache.hadoop.ipc.Client.lambda$call$2(Client.java:1640)
	at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
	at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194)
	at org.apache.hadoop.ipc.Client$Call.setException(Client.java:381)
	at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1375)
	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1149)

	at org.apache.hadoop.ipc.Client.warpIOException(Client.java:1643)
	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1634)
	at org.apache.hadoop.ipc.Client.call(Client.java:1568)
	at org.apache.hadoop.ipc.Client.call(Client.java:1477)
	at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:268)
	at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:142)
	at jdk.proxy2/jdk.proxy2.$Proxy27.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:732)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:75)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:52)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:465)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:185)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:177)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:106)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:385)
	at jdk.proxy2/jdk.proxy2.$Proxy28.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2795)
	at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2771)
	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1568)
	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1565)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1582)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1557)
	at org.apache.hadoop.hdfs.ForwardDistributedFileSystem.mkdirs(ForwardDistributedFileSystem.java:858)
	at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2420)
	at org.apache.paimon.fs.hadoop.HadoopFileIO.mkdirs(HadoopFileIO.java:169)
	at org.apache.paimon.hive.migrate.HiveMigrator$MigrateTask.call(HiveMigrator.java:402)
	at org.apache.paimon.hive.migrate.HiveMigrator$MigrateTask.call(HiveMigrator.java:372)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)

	at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:89)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:444)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:562)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:556)
	at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:630)

Changes

Use sourceHiveTable.getPartitionKeys().isEmpty() to determine whether the Hive table is non-partitioned.
Call client.listPartitions(...) only for partitioned Hive tables.
Add a regression test for migrating an empty partitioned Hive table.
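
The control flow described by these changes can be sketched roughly as follows. This is an assumption-laden illustration, not the code in HiveMigrator: `FakeClient`, `planTasks`, and the task labels are hypothetical, and the idea that an empty partitioned table simply yields no per-partition work is an assumption about the intended outcome rather than something the PR text states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: branch on declared partition keys, list partitions lazily.
final class MigrationPlanSketch {

    /** Hypothetical stand-in for the metastore client; counts listing calls. */
    static final class FakeClient {
        int listCalls = 0;
        final List<String> partitions;

        FakeClient(List<String> partitions) {
            this.partitions = partitions;
        }

        List<String> listPartitions() {
            listCalls++;
            return partitions;
        }
    }

    // Returns one label per unit of migration work (labels are made up for this sketch).
    static List<String> planTasks(List<String> partitionKeys, FakeClient client) {
        List<String> tasks = new ArrayList<>();
        if (partitionKeys.isEmpty()) {
            // Truly non-partitioned: single task, and the client is never asked for partitions.
            tasks.add("unpartitioned");
            return tasks;
        }
        // Partitioned: one task per registered partition, possibly none.
        for (String partition : client.listPartitions()) {
            tasks.add("partition:" + partition);
        }
        return tasks;
    }

    public static void main(String[] args) {
        FakeClient unused = new FakeClient(Collections.<String>emptyList());
        System.out.println(planTasks(Collections.<String>emptyList(), unused)
                + " listCalls=" + unused.listCalls);

        FakeClient empty = new FakeClient(Collections.<String>emptyList());
        System.out.println(planTasks(Arrays.asList("dt"), empty)
                + " listCalls=" + empty.listCalls);
    }
}
```

In this sketch the partition listing is consulted only after the schema says the table is partitioned, so an empty listing can no longer route a partitioned table onto the non-partitioned path.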

Tests

Added regression test for migrating an empty partitioned Hive table.

@huangxiaopingRD huangxiaopingRD changed the title from "[hive] Fix migration for empty partitioned tables" to "[hive] Avoid treating empty partitioned tables as unpartitioned during migration" on Jun 3, 2026
@JingsongLi
Contributor

+1

@JingsongLi JingsongLi merged commit 4d0a651 into apache:master Jun 3, 2026
12 checks passed
@huangxiaopingRD huangxiaopingRD deleted the fix-migration-empty-partiiton-tables branch June 3, 2026 13:50
