fix(pd): resolve hostname entries in IpAuthHandler allowlist #2962
bitflicker64 wants to merge 2 commits into apache:master
…at startup to avoid Netty DNS blocking
How I tested:
Results with pd2 as leader and hostnames only:
Before this fix:
After this fix:
Hostnames are resolved to IPs at startup via
Pull request overview
Fixes PD Raft inbound connection allowlisting when the configured allowlist contains hostnames (e.g., Docker bridge service names) by resolving those hostnames to IPs up front and validating connections via a simple set lookup.
Changes:
- Resolve allowlist entries via InetAddress.getAllByName() at handler construction time
- Cache resolved IPs in an immutable Set used by isIpAllowed()
- Log a warning when an allowlist hostname can't be resolved
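A minimal sketch of the approach summarized above, assuming a standalone helper rather than the handler itself. `AllowlistResolver` and `resolveAll` are hypothetical names used only for illustration, and `System.err` stands in for the project's logger; only `InetAddress.getAllByName()` and the immutable `Set` come from the PR description.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not the actual IpAuthHandler code.
final class AllowlistResolver {

    private AllowlistResolver() {
    }

    /** Resolves every entry (IP literal or hostname) to its IP address strings. */
    static Set<String> resolveAll(Set<String> entries) {
        Set<String> resolved = new HashSet<>();
        for (String entry : entries) {
            try {
                // Potentially blocking lookup: done once up front, not per connection
                for (InetAddress addr : InetAddress.getAllByName(entry)) {
                    resolved.add(addr.getHostAddress());
                }
            } catch (UnknownHostException e) {
                // The PR logs a warning here; stderr stands in for the logger
                System.err.println("Could not resolve allowlist entry: " + entry);
            }
        }
        // Immutable snapshot, so the per-connection check is a plain contains()
        return Collections.unmodifiableSet(resolved);
    }
}
```

With a set built this way, the per-connection check reduces to `resolved.contains(clientIp)`, which keeps DNS lookups off the connection-handling path.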
hugegraph-pd/hg-pd-core/src/main/java/org/apache/hugegraph/pd/raft/auth/IpAuthHandler.java
…r changes
- Resolve allowlist hostnames to IPs using InetAddress.getAllByName
- Add refresh() to update resolved IPs when Raft peer list changes
- Wire refresh into RaftEngine.changePeerList()
- Add IpAuthHandlerTest covering hostname resolution, refresh behavior, and failure cases
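One way the refresh step in this commit could look, sketched under assumptions: `RefreshableAllowlist` is a hypothetical stand-in for the handler, and swapping a `volatile` reference to a new immutable set is an illustrative design choice, not necessarily what the PR does.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the handler; names are illustrative only.
final class RefreshableAllowlist {

    // volatile so connection threads see the snapshot published by refresh()
    private volatile Set<String> resolvedIps = Collections.emptySet();

    RefreshableAllowlist(Set<String> entries) {
        refresh(entries);
    }

    /** Re-resolves the entries and swaps in a new immutable snapshot. */
    void refresh(Set<String> entries) {
        Set<String> next = new HashSet<>();
        for (String entry : entries) {
            try {
                for (InetAddress addr : InetAddress.getAllByName(entry)) {
                    next.add(addr.getHostAddress());
                }
            } catch (UnknownHostException e) {
                // Stand-in for a logged warning
                System.err.println("Could not resolve allowlist entry: " + entry);
            }
        }
        resolvedIps = Collections.unmodifiableSet(next);
    }

    boolean isIpAllowed(String ip) {
        return resolvedIps.contains(ip);
    }
}
```

Because readers only ever see a complete snapshot, a refresh triggered by a peer list change does not need to lock the connection path.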
While working on this change, I noticed a couple of related behaviors that appear to predate this PR. The /v1/members/change endpoint relies on RestAuthentication, where Authentication.authenticate() currently validates only the username portion of Basic Auth. I also noticed that GrpcAuthentication does not appear to be wired into the gRPC server. These are outside the scope of this fix, but I thought it might be helpful to mention in case they are worth tracking separately.
- Cache leader PeerId after waitingForLeader() and null-check to avoid NPE when leader election times out
- Remove incorrect fallback that derived leader gRPC address from local node's port, causing silent misroutes in multi-node clusters
- Wire config.getRpcTimeout() into RaftRpcClient's RpcOptions so Bolt transport timeout is consistent with future.get() caller timeout
- Replace hardcoded 10000ms in waitingForLeader() with config.getRpcTimeout()
- Remove unused RaftOptions variable and dead imports (ReplicatorGroup, ThreadId)

Fixes apache#2959
Related to apache#2952, apache#2962
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
    @After
    public void tearDown() {
        Whitebox.setInternalState(IpAuthHandler.class, "instance", null);
    }

    @Test
    public void testHostnameResolvesToIp() {
        // "localhost" should resolve to "127.0.0.1"
        IpAuthHandler handler = IpAuthHandler.getInstance(
                Collections.singleton("localhost"));
        Assert.assertTrue(isIpAllowed(handler, "127.0.0.1"));
    }

    @Test
    public void testGetInstanceReturnsNullBeforeInit() {
        // After tearDown resets singleton, no-arg getInstance returns null
        Assert.assertNull(IpAuthHandler.getInstance());
    }

    public class IpAuthHandlerTest {

        @After
The singleton is reset only in @After. Earlier classes in PDCoreSuiteTest initialize RaftEngine, which creates IpAuthHandler before this class runs; after that, IpAuthHandler.getInstance(Set) here just returns the pre-existing singleton and ignores the allowlist passed by the test. That makes the assertions depend on suite execution order instead of the test input. Please reset the singleton in @Before as well, or isolate this class from the suite so each test starts from a clean handler instance.
PDCoreSuiteTest
|
+-- earlier test -> RaftEngine.init()
| |
| +-- IpAuthHandler singleton created
|
+-- IpAuthHandlerTest
|
+-- getInstance(new allowlist)
|
+-- returns old singleton
|
v
test input ignored
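The ordering problem in the diagram above can be reproduced with a simplified model. This is a sketch under assumptions: `SingletonAllowlist` and `resetForTest()` are hypothetical names, and the class only imitates a lazily created singleton; it is not the real IpAuthHandler.

```java
import java.util.Collections;
import java.util.Set;

// Simplified, hypothetical model of a lazily created singleton.
final class SingletonAllowlist {

    private static SingletonAllowlist instance;
    private final Set<String> allowed;

    private SingletonAllowlist(Set<String> allowed) {
        this.allowed = allowed;
    }

    static synchronized SingletonAllowlist getInstance(Set<String> allowed) {
        if (instance == null) {
            instance = new SingletonAllowlist(allowed);
        }
        // Later callers get the first instance; their allowlist is ignored
        return instance;
    }

    // Stand-in for the Whitebox-based reset in the test's tearDown()
    static synchronized void resetForTest() {
        instance = null;
    }

    boolean isIpAllowed(String ip) {
        return allowed.contains(ip);
    }

    public static void main(String[] args) {
        // An "earlier test" creates the singleton first
        getInstance(Collections.singleton("10.0.0.1"));
        // This call's allowlist is silently dropped
        SingletonAllowlist stale = getInstance(Collections.singleton("127.0.0.1"));
        System.out.println("without reset: " + stale.isIpAllowed("127.0.0.1"));
        // Resetting before the test makes the test input take effect
        resetForTest();
        SingletonAllowlist fresh = getInstance(Collections.singleton("127.0.0.1"));
        System.out.println("with reset: " + fresh.isIpAllowed("127.0.0.1"));
    }
}
```

Resetting before each test, and not only after, removes the dependence on which class in the suite touched the singleton first.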
    });
    latch.await();

    // Refresh IpAuthHandler so newly added peers are not blocked
This refresh only runs on the RaftEngine.changePeerList() path, but peer membership can still be updated through PDService.updatePdRaft(), which calls node.changePeers() directly and never reaches this block. In that case the Raft config changes while IpAuthHandler keeps the old allowlist, so newly added hostname-based peers can still be rejected. Please centralize peer changes behind one helper that also refreshes the allowlist, or make the other update path invoke the same refresh logic on success.
peer update
|
+-- changePeerList() --------> refresh allowlist ✅
|
+-- updatePdRaft() ----------> changePeers only ❌
|
v
old allowlist remains
|
v
new peer can be blocked
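A rough sketch of the "one helper" option suggested above, assuming the membership change can be expressed as a call that reports success or failure. `PeerChangeHelper` is a hypothetical name, and the `BooleanSupplier` stands in for whatever actually wraps `node.changePeers()`.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper; real code would wrap the actual Raft membership call.
final class PeerChangeHelper {

    private final Runnable refreshAllowlist;

    PeerChangeHelper(Runnable refreshAllowlist) {
        this.refreshAllowlist = refreshAllowlist;
    }

    /** Runs the membership change and refreshes the allowlist only on success. */
    boolean changePeers(BooleanSupplier change) {
        boolean ok = change.getAsBoolean();
        if (ok) {
            // Every caller that goes through this helper gets the refresh
            refreshAllowlist.run();
        }
        return ok;
    }
}
```

If both update paths call the same helper, neither can change the Raft configuration without also updating the allowlist.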
Some context about the auth system in PD/Store: to put it simply, PD/Store didn't have a strict auth mechanism due to legacy design decisions. Since these and other gRPC components were meant to be internal-only, only the graph server was originally built with a formal public-facing auth interface. We later added a basic, quick-and-dirty auth layer for security, but the current implementation remains incomplete. We plan to refactor this systematically later, but for now, let's just add TODOs in the relevant places.
Purpose of the PR
IpAuthHandler only compared the client IP with the allowlist entries directly. When the allowlist contains hostnames, connections from their resolved IPs could be rejected.
Main Changes
- InetAddress.getAllByName
- Set.contains() lookup
Verifying these changes
Does this PR potentially affect the following parts?
Documentation Status
- Doc - TODO
- Doc - Done
- Doc - No Need