Skip to content

fix(pd): resolve hostname entries in IpAuthHandler allowlist#2962

Open
bitflicker64 wants to merge 2 commits intoapache:masterfrom
bitflicker64:fix-ipauthhandler-dns-resolution
Open

fix(pd): resolve hostname entries in IpAuthHandler allowlist#2962
bitflicker64 wants to merge 2 commits intoapache:masterfrom
bitflicker64:fix-ipauthhandler-dns-resolution

Conversation

@bitflicker64
Copy link
Contributor

@bitflicker64 bitflicker64 commented Mar 5, 2026

Purpose of the PR

IpAuthHandler only compared the client IP with the allowlist entries directly.
When the allowlist contains hostnames, connections from their resolved IPs could be rejected.

Main Changes

  • Resolve hostname allowlist entries using InetAddress.getAllByName
  • Store resolved addresses in a set used for connection validation
  • Keep runtime validation as a simple Set.contains() lookup

Verifying these changes

Does this PR potentially affect the following parts?

  • Dependencies
  • Modify configurations
  • The public API
  • Other affects
  • Nope

Documentation Status

  • Doc - TODO
  • Doc - Done
  • Doc - No Need

@bitflicker64
Copy link
Contributor Author

How I tested:

  1. Built a local Docker image from source with this fix applied (alongside the getLeaderGrpcAddress fix from fix(pd): add timeout and null-safety to getLeaderGrpcAddress() #2961)
  2. Removed static IP workaround from docker-compose — switched back to hostnames (pd0:8610, pd1:8610, pd2:8610) for all raft peers, gRPC hosts, and PD addresses
  3. Brought up the full 3-node cluster (3 PD + 3 Store + 3 Server) in bridge network mode with no ipam static IP config
  4. pd2 won leader election on first boot

Results with pd2 as leader and hostnames only:

partitionCount:12 on all 3 stores ✅
leaderCount:12 on all 3 stores ✅
{"graphs":["hugegraph"]} ✅
All 9 containers healthy ✅
No "Blocked connection" warnings in any PD logs ✅

Before this fix: IpAuthHandler compared raw hostname strings against incoming connection IPs — always failed in bridge mode, requiring static IPs as a workaround.

After this fix: Hostnames are resolved to IPs at startup via InetAddress.getAllByName() — static IP workaround no longer needed.

Related PR: #2952, #2961

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes PD Raft inbound connection allowlisting when the configured allowlist contains hostnames (e.g., Docker bridge service names) by resolving those hostnames to IPs up front and validating connections via a simple set lookup.

Changes:

  • Resolve allowlist entries via InetAddress.getAllByName() at handler construction time
  • Cache resolved IPs in an immutable Set used by isIpAllowed()
  • Log a warning when an allowlist hostname can’t be resolved

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…r changes

- Resolve allowlist hostnames to IPs using InetAddress.getAllByName
- Add refresh() to update resolved IPs when Raft peer list changes
- Wire refresh into RaftEngine.changePeerList()
- Add IpAuthHandlerTest covering hostname resolution, refresh behavior, and failure cases
@bitflicker64
Copy link
Contributor Author

While working on this change, I noticed a couple of related behaviors that appear to predate this PR. The /v1/members/change endpoint relies on RestAuthentication, where Authentication.authenticate() currently validates only the username portion of Basic Auth. I also noticed that GrpcAuthentication does not appear to be wired into the gRPC server. These are outside the scope of this fix, but I thought it might be helpful to mention in case they are worth tracking separately.
Also worth noting: IpAuthHandler.refresh() is wired through RaftEngine.changePeerList(), but PDService.updatePdRaft() calls node.changePeers() directly, so the refresh won't fire for that path.

@bitflicker64 bitflicker64 requested a review from imbajin March 12, 2026 07:25
bitflicker64 added a commit to bitflicker64/incubator-hugegraph that referenced this pull request Mar 12, 2026
- Cache leader PeerId after waitingForLeader() and null-check to avoid
  NPE when leader election times out
- Remove incorrect fallback that derived leader gRPC address from local
  node's port, causing silent misroutes in multi-node clusters
- Wire config.getRpcTimeout() into RaftRpcClient's RpcOptions so Bolt
  transport timeout is consistent with future.get() caller timeout
- Replace hardcoded 10000ms in waitingForLeader() with config.getRpcTimeout()
- Remove unused RaftOptions variable and dead imports (ReplicatorGroup, ThreadId)

Fixes apache#2959
Related to apache#2952, apache#2962
@imbajin imbajin requested a review from Copilot March 13, 2026 09:36
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +32 to +35
@After
public void tearDown() {
Whitebox.setInternalState(IpAuthHandler.class, "instance", null);
}
Comment on lines +43 to +49
@Test
public void testHostnameResolvesToIp() {
// "localhost" should resolve to "127.0.0.1"
IpAuthHandler handler = IpAuthHandler.getInstance(
Collections.singleton("localhost"));
Assert.assertTrue(isIpAllowed(handler, "127.0.0.1"));
}
Comment on lines +85 to +89
@Test
public void testGetInstanceReturnsNullBeforeInit() {
// After tearDown resets singleton, no-arg getInstance returns null
Assert.assertNull(IpAuthHandler.getInstance());
}

public class IpAuthHandlerTest {

@After
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ These tests are order-dependent because the singleton is only cleared in @After. Earlier classes in PDCoreSuiteTest initialize RaftEngine, which creates IpAuthHandler before this class runs; after that, IpAuthHandler.getInstance(Set) here just returns the pre-existing singleton and ignores the allowlist passed by the test. That makes the assertions depend on suite execution order instead of the test input. Please reset the singleton in @Before as well, or isolate this class from the suite so each test starts from a clean handler instance.

PDCoreSuiteTest
    |
    +-- earlier test -> RaftEngine.init()
    |                   |
    |                   +-- IpAuthHandler singleton created
    |
    +-- IpAuthHandlerTest
            |
            +-- getInstance(new allowlist)
                    |
                    +-- returns old singleton
                            |
                            v
                      test input ignored

});
latch.await();

// Refresh IpAuthHandler so newly added peers are not blocked
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

‼️ This refresh closes the RaftEngine.changePeerList() path, but peer membership can still be updated through PDService.updatePdRaft(), which calls node.changePeers() directly and never reaches this block. In that case the Raft config changes while IpAuthHandler keeps the old allowlist, so newly added hostname-based peers can still be rejected. Please centralize peer changes behind one helper that also refreshes the allowlist, or make the other update path invoke the same refresh logic on success.

peer update
    |
    +-- changePeerList() --------> refresh allowlist   ✅
    |
    +-- updatePdRaft() ----------> changePeers only    ❌
                                       |
                                       v
                              old allowlist remains
                                       |
                                       v
                              new peer can be blocked

@imbajin
Copy link
Member

imbajin commented Mar 13, 2026

While working on this change, I noticed a couple of related behaviors that appear to predate this PR. The /v1/members/change endpoint relies on RestAuthentication, where Authentication.authenticate() currently validates only the username portion of Basic Auth. I also noticed that GrpcAuthentication does not appear to be wired into the gRPC server. These are outside the scope of this fix, but I thought it might be helpful to mention in case they are worth tracking separately. Also worth noting: IpAuthHandler.refresh() is wired through RaftEngine.changePeerList(), but PDService.updatePdRaft() calls node.changePeers() directly, so the refresh won't fire for that path.

Some context about the auth system in PD/Store:

To put it simply, PD/Store didn't have a strict auth mechanism due to legacy design decisions. Since these and other gRPC components were meant to be internal-only, only the graph server was originally built with a formal public-facing auth interface. We later added a basic, quick-and-dirty auth layer for security, but the current implementation remains incomplete. We plan to systematically refactor this later, but for now, let's just add TODOs in the relevant places

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] IpAuthHandler blocks all cross-node raft connections in Docker bridge mode — hostname vs IP mismatch

3 participants