
@gaozhangmin
Contributor

@gaozhangmin gaozhangmin commented Dec 24, 2025

Motivation

During the shutdown process of Pulsar brokers, protocol handlers' channels are not properly closed, which allows new client requests to continue arriving even after the shutdown has been initiated. This can lead to:

  • Resource leaks as channels remain open during shutdown
  • Unexpected behavior where clients can still send requests to a shutting-down broker
  • Inconsistent shutdown behavior compared to the main broker channel management

The broker's closeAsync method closes its own listening channels by calling channel.close() on every channel in listChannels. However, the same pattern is not applied to protocol handlers, so their channels stay open during the shutdown process.
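The channel-closing pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the broker's code: ListenChannel and ListenChannels are hypothetical names, and CompletableFuture stands in for the Netty ChannelFuture that a real Channel#close() returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a Netty server channel; only close() matters here.
interface ListenChannel {
    CompletableFuture<Void> close();
}

final class ListenChannels {
    private ListenChannels() {
    }

    // Call close() on every listening channel, then return a future that
    // completes once all of the individual close futures have completed.
    static CompletableFuture<Void> closeAll(List<? extends ListenChannel> listChannels) {
        List<CompletableFuture<Void>> closeFutures = new ArrayList<>();
        for (ListenChannel channel : listChannels) {
            closeFutures.add(channel.close());
        }
        return CompletableFuture.allOf(closeFutures.toArray(new CompletableFuture<?>[0]));
    }
}
```

Returning an aggregate future lets the caller decide whether to wait for the channels or chain further shutdown steps onto them.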

Modifications

  • Added a closeProtocolHandlerChannels() method that closes all protocol handler channels before protocolHandler.close() is called
  • Modified the broker shutdown sequence to ensure protocol handler channels are closed before the protocol handler itself is closed
  • This ensures no new client connections can be established to protocol handlers during the shutdown process
  • Follows the same pattern as the existing broker channel closing mechanism in closeAsync
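The ordering proposed in the list above can be illustrated with a small sketch. It does not reproduce the PR's closeProtocolHandlerChannels(); HandlerChannel, Handler, HandlerWithChannels and ProtocolHandlersShutdown are hypothetical stand-ins for Netty channels and Pulsar's ProtocolHandler, assumed only to show the "channels first, then handler" sequence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-ins for a Netty server channel and a protocol handler.
interface HandlerChannel {
    CompletableFuture<Void> close();
}

interface Handler {
    void close();
}

// Pairs a protocol handler with the listening channels the broker bound for it.
final class HandlerWithChannels {
    final Handler handler;
    final List<HandlerChannel> channels;

    HandlerWithChannels(Handler handler, List<HandlerChannel> channels) {
        this.handler = handler;
        this.channels = channels;
    }
}

final class ProtocolHandlersShutdown {
    private ProtocolHandlersShutdown() {
    }

    // For each handler: close its listening channels, wait for those closes,
    // and only then call the handler's synchronous close().
    static void closeAll(List<HandlerWithChannels> entries) {
        for (HandlerWithChannels entry : entries) {
            List<CompletableFuture<Void>> closeFutures = new ArrayList<>();
            for (HandlerChannel channel : entry.channels) {
                closeFutures.add(channel.close());
            }
            CompletableFuture.allOf(closeFutures.toArray(new CompletableFuture<?>[0])).join();
            entry.handler.close();
        }
    }
}
```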

Verifying this change

  • Make sure that the change passes the CI checks.

This change is already covered by existing tests, such as:

  • Existing broker shutdown tests will verify the proper shutdown sequence
  • Protocol handler lifecycle tests will ensure channels are properly closed
  • Connection management tests will verify no new connections are accepted during shutdown

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment - Changes broker shutdown behavior which affects deployment lifecycle

Documentation


  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

gaozhangmin#14

github-actions bot added the doc-not-needed label Dec 24, 2025
@gaozhangmin gaozhangmin self-assigned this Dec 24, 2025
Contributor

@BewareMyPower BewareMyPower left a comment


Protocol handlers are closed before the ownership change in BrokerService#unloadNamespaceBundlesGracefully. Even if you close the channels before closing the protocol handler, clients could still send requests to the broker and have them rejected. That is actually worse, because no logs can be found on the broker side.

Member

@lhotari lhotari left a comment


LGTM

Contributor

@BewareMyPower BewareMyPower left a comment


Sorry, I checked the logic again and found that it is a bit different from the broker's close logic.

The broker's closing logic, in BrokerService#closeAsync, is:

  1. Unregister itself from the metadata store and unload owned namespace bundles synchronously in unloadNamespaceBundlesGracefully
  2. Close built-in clients asynchronously
  3. Close event loops asynchronously
  4. After the futures from steps 2 and 3 complete, close the listening channels
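The four steps above can be modeled as a short CompletableFuture chain. This is a simplified sketch, not BrokerService's actual code: BrokerCloseSequence and its parameters are hypothetical, and each Runnable or Supplier stands in for real broker work.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical model of the closeAsync ordering; the arguments stand in for
// the real broker operations.
final class BrokerCloseSequence {
    private BrokerCloseSequence() {
    }

    static CompletableFuture<Void> closeAsync(
            Runnable unloadNamespaceBundlesGracefully,
            Supplier<CompletableFuture<Void>> closeBuiltInClients,
            Supplier<CompletableFuture<Void>> closeEventLoops,
            Runnable closeListeningChannels) {
        // Step 1: unregister and unload owned bundles synchronously.
        unloadNamespaceBundlesGracefully.run();
        // Steps 2 and 3: start closing clients and event loops concurrently.
        CompletableFuture<Void> clients = closeBuiltInClients.get();
        CompletableFuture<Void> eventLoops = closeEventLoops.get();
        // Step 4: close the listening channels only after both futures complete.
        return CompletableFuture.allOf(clients, eventLoops).thenRun(closeListeningChannels);
    }
}
```

Because the listening channels are closed last, anything still finishing in steps 2 and 3 can keep reaching the broker until it is done.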

However, the closing logic of a protocol handler is a single synchronous method, ProtocolHandler#close.

In fact, until close is called, the protocol handler should not be treated as "closing". For example, the following implementation is legal:

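    @Override
    public void close() {
        try {
            // Block until the lookup client, which may be connected to this
            // broker's own listener, has finished closing.
            lookupClient.closeAsync().get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }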

The lookupClient field might wrap a client that connects to this same broker for topic lookups. If the listening channel is closed first, the client behind the closeAsync future could keep reconnecting and failing, so the get call might block for a long time.

The root cause is that the closing phase of protocol handlers is not well-defined. We should let downstream implementations determine when to close these channels.

A better solution is to pass the corresponding listening channel to each protocol handler via an interface method.
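One possible shape for such an interface method is sketched below. Everything here is hypothetical: onListeningChannelsBound, ChannelAwareProtocolHandler, BoundChannel and ExampleHandler are illustrative names, not part of Pulsar's ProtocolHandler API, and BoundChannel stands in for a Netty channel. The point is that the handler, not the broker, decides when its channels are closed.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a Netty server channel.
interface BoundChannel {
    CompletableFuture<Void> close();
}

// Hypothetical extension of the handler contract: the broker passes each
// handler the channels it bound for it. The default is a no-op so existing
// handlers would not need changes.
interface ChannelAwareProtocolHandler {
    default void onListeningChannelsBound(Map<InetSocketAddress, BoundChannel> channels) {
    }

    void close();
}

// Example handler: finishes its own shutdown work first (for instance a
// self-connecting lookup client), then closes its listening channels.
final class ExampleHandler implements ChannelAwareProtocolHandler {
    private final List<String> log;
    private Map<InetSocketAddress, BoundChannel> channels = Map.of();

    ExampleHandler(List<String> log) {
        this.log = log;
    }

    @Override
    public void onListeningChannelsBound(Map<InetSocketAddress, BoundChannel> channels) {
        this.channels = channels;
    }

    @Override
    public void close() {
        // Placeholder for work such as lookupClient.closeAsync().get().
        log.add("lookup-client-closed");
        List<CompletableFuture<Void>> closeFutures = new ArrayList<>();
        for (BoundChannel channel : channels.values()) {
            closeFutures.add(channel.close());
        }
        CompletableFuture.allOf(closeFutures.toArray(new CompletableFuture<?>[0])).join();
    }
}
```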


3 participants