fix: improve reconnect recovery with async lock fix and immediate redo trigger by cxhello · Pull Request #323 · nacos-group/nacos-sdk-python

cxhello · 2026-05-21T02:38:59Z

What type of PR is this?

What does this PR do?

Two small fixes to improve reconnect recovery after gRPC connection drops:

Fix async lock bug in ConnectResetRequestHandler: The handler used synchronous with on an asyncio.Lock, which does not actually acquire the lock (asyncio.Lock.__enter__ is a no-op without await acquire()). Changed to async with to properly guard the status check and server switch when handling ConnectResetRequest from the server.
Trigger redo immediately on reconnect: After gRPC reconnection, on_connected() only set self._connected = True. The redo task polled _execute_redo_channel with a 30-second timeout, so instance re-registration could be delayed up to 30 seconds. Now on_connected() also calls the existing start_redo_task() method to signal the channel immediately, waking the redo task to re-register instances without delay.

Which issue(s) does this PR fix?

Related to alibaba/nacos#14106

How has this been tested?

Code review and static analysis of the async lock usage
Traced the redo task flow to confirm start_redo_task() → _execute_redo_channel.put(None) → _execute_redo_task wakes up → checks _connected (already True) → calls redo_task() to re-register instances
Verified no other synchronous lock usages remain in rpc_client.py

Checklist

Unit tests added/updated
Documentation updated (if needed)
No breaking changes (or described above)

…tHandler The handler used synchronous 'with' on an asyncio.Lock, which does not actually acquire the lock. Change to 'async with' to properly guard the status check and server switch when handling ConnectResetRequest. Signed-off-by: cxhello <caixiaohuichn@gmail.com>

…p to 30s After gRPC reconnection, on_connected() only set self._connected = True. The redo task polled _execute_redo_channel with a 30-second timeout, so instance re-registration could be delayed up to 30 seconds. Now on_connected() also calls start_redo_task() to signal the channel immediately, waking the redo task to re-register instances without delay. Signed-off-by: cxhello <caixiaohuichn@gmail.com>

cxhello added 2 commits May 21, 2026 10:36

cxhello requested a review from Sunrisea May 21, 2026 07:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: improve reconnect recovery with async lock fix and immediate redo trigger#323

fix: improve reconnect recovery with async lock fix and immediate redo trigger#323
cxhello wants to merge 2 commits into
nacos-group:masterfrom
cxhello:fix/improve-reconnect-recovery

cxhello commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cxhello commented May 21, 2026

What type of PR is this?

What does this PR do?

Which issue(s) does this PR fix?

How has this been tested?

Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant