Skip to content

[ISSUE #10405] Fix orderly consumer retry message routed to wrong broker via TBW102 fallback#10408

Open
qianye1001 wants to merge 2 commits into
apache:developfrom
qianye1001:fix/issue-10405
Open

[ISSUE #10405] Fix orderly consumer retry message routed to wrong broker via TBW102 fallback#10408
qianye1001 wants to merge 2 commits into
apache:developfrom
qianye1001:fix/issue-10405

Conversation

@qianye1001

Copy link
Copy Markdown
Contributor

Summary

  • Fix ConsumeMessageOrderlyService.sendMessageBack() and ConsumeMessagePopOrderlyService.sendMessageBack() to route retry messages to brokers that actually host the original topic, instead of falling back to TBW102 which routes to all brokers in the cluster.

Root Cause

When an orderly consumer's reconsume times reach maxReconsumeTimes, sendMessageBack() sends the retry message via getDefaultMQProducer().send(newMsg) — a normal producer send. Since the %RETRY% topic doesn't exist yet, the producer falls back to TBW102 (a system topic present on ALL brokers) for broker selection. This causes the %RETRY% topic to be auto-created on brokers that don't host the original topic.

Contrast with concurrent consumers: ConsumeMessageConcurrentlyService uses consumerSendMsgBack which targets msg.getBrokerName() directly — no TBW102 fallback.

Fix

Added trySendToTopicBroker() to both orderly service classes. It uses the consumer's existing topicSubscribeInfoTable (route info for the original topic) to select candidate brokers:

  1. Prefer original broker — the broker where the message was consumed
  2. Failover to other brokers hosting the topic — if the original broker is down
  3. Final fallback to default send — only when route info is unavailable (should not happen in practice)

Impact

Scenario Before After
Original broker alive May route to any broker (TBW102) Routes to original broker ✓
Original broker down Routes to any broker Failover to other brokers hosting the topic ✓
All topic brokers down Routes to any broker Same (fallback)

No behavioral change for brokers that DO host the original topic. Backward compatible.

Test plan

  • Verify trySendToTopicBroker prefers the original broker when available
  • Verify failover to another broker when original broker send fails
  • Verify fallback to default send when route info is empty
  • Integration test: proxy cluster mode with multi-broker, orderly consumer hitting maxReconsumeTimes — %RETRY% should only appear on brokers hosting the original topic

Closes #10405

🤖 Generated with Claude Code

qianye1001 and others added 2 commits May 29, 2026 16:37
…ng broker via TBW102 fallback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov-commenter

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 8.33333% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 47.92%. Comparing base (a705dbc) to head (3692d1b).

Files with missing lines Patch % Lines
...nt/impl/consumer/ConsumeMessageOrderlyService.java 0.00% 24 Missing ⚠️
...impl/consumer/ConsumeMessagePopOrderlyService.java 16.66% 18 Missing and 2 partials ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             develop   #10408      +/-   ##
=============================================
- Coverage      47.99%   47.92%   -0.07%     
+ Complexity     13271    13251      -20     
=============================================
  Files           1376     1376              
  Lines         100536   100582      +46     
  Branches       12983    12995      +12     
=============================================
- Hits           48249    48201      -48     
- Misses         46355    46439      +84     
- Partials        5932     5942      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@oss-sentinel-ai oss-sentinel-ai left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Approved ✅

PR: #10408 — Fix orderly consumer retry message routed to wrong broker via TBW102 fallback
Type: Bug fix (2 files, +74/-2)

Assessment

Well-documented fix for a real routing issue in orderly consumers. The root cause analysis is clear:

Problem: When orderly consumers hit maxReconsumeTimes, retry messages were routed via getDefaultMQProducer().send() which falls back to TBW102 (present on ALL brokers), causing %RETRY% topics to be auto-created on brokers that don't host the original topic.

Solution: Added trySendToTopicBroker() to both ConsumeMessageOrderlyService and ConsumeMessagePopOrderlyService:

  1. Prefer original broker (where message was consumed)
  2. Failover to other brokers hosting the topic
  3. Final fallback to default send (backward compatible)

Verdict

✅ Clean, focused fix with proper failover logic. Consistent with how ConsumeMessageConcurrentlyService handles this case. Fixes #10405.


🤖 Automated review by oss-sentinel-ai

@RockteMQ-AI

Copy link
Copy Markdown

Review by github-manager-bot

Summary

Fixes orderly consumer retry message routing by sending to brokers that host the original topic instead of falling back to TBW102, which routes to all brokers.

Findings

  • [Warning] ConsumeMessageOrderlyService.java:414 and ConsumeMessagePopOrderlyService.java:328 — trySendToTopicBroker modifies retryMsg state on each iteration: send(retryMsg, retryMQ) may internally set properties on the message (e.g., setDelayTimeLevel or other fields). If the first broker send fails and the loop retries on a second broker, the message object may be in a partially-modified state from the failed attempt. This depends on the producer implementation but is worth verifying.
  • [Warning] ConsumeMessageOrderlyService.java:406 and ConsumeMessagePopOrderlyService.java:320 — The table reference is obtained once but getTopicSubscribeInfoTable() returns a live concurrent map. The entry for originalMsg.getTopic() could be removed between the null/empty check and iteration. This is a minor race — the worst case is a ConcurrentModificationException or empty iteration, which would fall through to the default send. Low risk but worth noting.
  • [Info] ConsumeMessageOrderlyService.java:420 and ConsumeMessagePopOrderlyService.java:334 — new MessageQueue(retryMsg.getTopic(), brokerName, 0) uses queueId 0. For retry topics, this is fine since retry topics typically have a single queue, but the hardcoded 0 could be confusing. A comment or constant would clarify intent.
  • [Info] Both files — The trySendToTopicBroker method is duplicated verbatim across the two classes. Consider extracting it to a shared utility or a base class to reduce code duplication.
  • [Info] Both files — No unit tests are included. The PR description lists test plan items as unchecked. Given the routing logic is safety-critical for message delivery, tests covering: (1) original broker preferred, (2) failover to other broker, (3) empty route fallback, are strongly recommended.

Suggestions

  1. Verify that DefaultMQProducer.send(msg, mq) doesn't mutate the Message object in a way that affects retries on subsequent brokers. If it does, consider creating a fresh copy per attempt or resetting any modified state.
  2. Extract the identical trySendToTopicBroker into a shared method (e.g., in a utility class or via a common base) to avoid the exact duplication across the two service classes.
  3. Add unit tests — at minimum, verify the broker ordering preference and fallback behavior. The logic is subtle and hard to verify manually in integration.

Automated review by github-manager-bot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants