PYTHON-5767 - Finalize client backpressure implementation for phase 1…#2742
PYTHON-5767 - Finalize client backpressure implementation for phase 1…#2742NoahStapp wants to merge 8 commits intomongodb:backpressurefrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates PyMongo’s phase-1 client backpressure behavior and tests by replacing the previous adaptiveRetries boolean with new overload-related options (maxAdaptiveRetries, enableOverloadRetargeting) and reducing the default maximum overload retry count used by tests/specs.
Changes:
- Introduces/validates new URI + keyword options:
maxAdaptiveRetriesandenableOverloadRetargeting. - Updates retry/backpressure prose + unified JSON specs to use a lower retry limit (5 → 2) and adjusts expected command/event sequences accordingly.
- Updates unit/integration tests to assert parsing of new options and to use the new shared retry limit constant.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| test/uri_options/client-backpressure-options.json | URI option parsing tests updated for maxAdaptiveRetries and enableOverloadRetargeting. |
| test/transactions/unified/backpressure-retryable-writes.json | Unified spec expectations updated for reduced backpressure retry attempts. |
| test/transactions/unified/backpressure-retryable-reads.json | Unified spec expectations updated for reduced backpressure retry attempts. |
| test/transactions/unified/backpressure-retryable-commit.json | Unified spec expectations updated for reduced backpressure retry attempts. |
| test/transactions/unified/backpressure-retryable-abort.json | Unified spec expectations updated for reduced backpressure retry attempts. |
| test/test_client.py | Client option tests updated to assert max_adaptive_retries + enable_overload_retargeting. |
| test/test_client_backpressure.py | Backpressure tests updated to use MAX_ADAPTIVE_RETRIES and revised retry expectations. |
| test/client-backpressure/getMore-retried.json | Spec updated for reduced retry attempts and corresponding event sequences. |
| test/client-backpressure/backpressure-retry-max-attempts.json | Extensive spec updates to reflect reduced retry attempts across many operations. |
| test/client-backpressure/backpressure-retry-loop.json | Spec updated for reduced retry attempts and corresponding event sequences. |
| test/client-backpressure/backpressure-connection-checkin.json | Spec event expectations reduced to match new retry counts. |
| test/asynchronous/test_client.py | Async client option tests updated to assert max_adaptive_retries + enable_overload_retargeting. |
| test/asynchronous/test_client_backpressure.py | Async backpressure tests updated to use MAX_ADAPTIVE_RETRIES and revised retry expectations. |
| pymongo/common.py | Adds defaults/constants and validators for new overload-related options. |
| pymongo/client_options.py | Parses/stores new options and exposes new properties on ClientOptions. |
| pymongo/asynchronous/helpers.py | Updates _RetryPolicy defaults to use MAX_ADAPTIVE_RETRIES; disables adaptive token-bucket mode. |
| pymongo/asynchronous/mongo_client.py | Updates public docstring for new options and changes _RetryPolicy construction. |
| pymongo/synchronous/helpers.py | Sync mirror of async helper changes (synchro-generated). |
| pymongo/synchronous/mongo_client.py | Sync mirror of async client docstring/policy construction (synchro-generated). |
Comments suppressed due to low confidence (1)
pymongo/asynchronous/helpers.py:145
_RetryPolicy’s docstring describes an “adaptive retries” token-bucket mode, butadaptive_retryis now hard-coded toFalseand there’s no longer a constructor parameter or other code path that enables it. Either reintroduce a supported way to enable the token-bucket behavior (if still intended) or update/remove the docstring and related dead code to match current behavior.
class _RetryPolicy:
"""A retry limiter that performs exponential backoff with jitter.
When adaptive retries are enabled, retry attempts are limited by a token bucket to prevent overwhelming the server during
a prolonged outage or high load.
"""
def __init__(
self,
token_bucket: _TokenBucket,
attempts: int = MAX_ADAPTIVE_RETRIES,
backoff_initial: float = _BACKOFF_INITIAL,
backoff_max: float = _BACKOFF_MAX,
):
self.token_bucket = token_bucket
self.attempts = attempts
self.backoff_initial = backoff_initial
self.backoff_max = backoff_max
self.adaptive_retry = False
async def record_success(self, retry: bool) -> None:
"""Record a successful operation."""
if self.adaptive_retry:
await self.token_bucket.deposit(retry)
pymongo/asynchronous/mongo_client.py
Outdated
| self._retry_policy = _RetryPolicy( | ||
| _TokenBucket(), adaptive_retry=self._options.adaptive_retries | ||
| ) | ||
| self._retry_policy = _RetryPolicy(_TokenBucket()) |
There was a problem hiding this comment.
max_adaptive_retries / enable_overload_retargeting are documented and parsed into ClientOptions, but the client always builds _RetryPolicy(_TokenBucket()) with the default attempts and never consults these options. As a result, user-provided values (e.g. maxAdaptiveRetries=10) have no effect and the docstring is currently misleading; pass the configured values into _RetryPolicy (and wire enable_overload_retargeting into retry server selection) or remove/undocument the options until implemented.
| self._retry_policy = _RetryPolicy(_TokenBucket()) | |
| max_adaptive_retries = getattr(self._options, "max_adaptive_retries", None) | |
| enable_overload_retargeting = getattr( | |
| self._options, "enable_overload_retargeting", False | |
| ) | |
| self._retry_policy = _RetryPolicy( | |
| _TokenBucket(), | |
| max_adaptive_retries=max_adaptive_retries, | |
| enable_overload_retargeting=enable_overload_retargeting, | |
| ) |
| self.__server_monitoring_mode = options.get( | ||
| "servermonitoringmode", common.SERVER_MONITORING_MODE | ||
| ) | ||
| self.__adaptive_retries = ( | ||
| options.get("adaptive_retries", common.ADAPTIVE_RETRIES) | ||
| if "adaptive_retries" in options | ||
| else options.get("adaptiveretries", common.ADAPTIVE_RETRIES) | ||
| ) | ||
| self.__max_adaptive_retries = ( | ||
| options.get("max_adaptive_retries", common.MAX_ADAPTIVE_RETRIES) | ||
| if "max_adaptive_retries" in options | ||
| else options.get("maxadaptiveretries", common.MAX_ADAPTIVE_RETRIES) | ||
| ) | ||
| self.__enable_overload_retargeting = ( | ||
| options.get("enable_overload_retargeting", common.ENABLE_OVERLOAD_RETARGETING) | ||
| if "enable_overload_retargeting" in options | ||
| else options.get("enableoverloadretargeting", common.ENABLE_OVERLOAD_RETARGETING) | ||
| ) |
There was a problem hiding this comment.
ClientOptions still parses/stores adaptive_retries/adaptiveRetries, but the adaptive_retries property was removed and nothing in the codebase reads __adaptive_retries. This makes adaptiveRetries a silently ignored option and is also a backwards-incompatible change for users accessing client.options.adaptive_retries; consider keeping a deprecated alias property (or rejecting the option) to avoid silent no-ops/breakage.
test/test_client.py
Outdated
| # Assert that adaptive retries can be enabled through connection or client options. | ||
| c = self.simple_client(connect=False, adaptive_retries=True) | ||
| self.assertTrue(c.options.adaptive_retries) | ||
| c = self.simple_client(connect=False, max_adaptive_retries=10) | ||
| self.assertEqual(c.options.max_adaptive_retries, 10) | ||
|
|
||
| c = self.simple_client(connect=False, adaptiveRetries=True) | ||
| self.assertTrue(c.options.adaptive_retries) | ||
| c = self.simple_client(connect=False, maxAdaptiveRetries=10) | ||
| self.assertEqual(c.options.max_adaptive_retries, 10) |
There was a problem hiding this comment.
The comment says “adaptive retries can be enabled”, but this test is now configuring max_adaptive_retries. Update the wording to reflect the new option to avoid confusing future maintainers/readers.
test/asynchronous/test_client.py
Outdated
| # Assert that adaptive retries can be enabled through connection or client options. | ||
| c = self.simple_client(connect=False, adaptive_retries=True) | ||
| self.assertTrue(c.options.adaptive_retries) | ||
| c = self.simple_client(connect=False, max_adaptive_retries=10) | ||
| self.assertEqual(c.options.max_adaptive_retries, 10) | ||
|
|
||
| c = self.simple_client(connect=False, adaptiveRetries=True) | ||
| self.assertTrue(c.options.adaptive_retries) | ||
| c = self.simple_client(connect=False, maxAdaptiveRetries=10) | ||
| self.assertEqual(c.options.max_adaptive_retries, 10) |
There was a problem hiding this comment.
The comment says “adaptive retries can be enabled”, but this test is now configuring max_adaptive_retries. Update the wording to reflect the new option to avoid confusing future maintainers/readers.
| @@ -304,42 +262,8 @@ def test_03_overload_retries_limited(self): | |||
| self.assertIn("RetryableError", str(error.exception)) | |||
| self.assertIn("SystemOverloadedError", str(error.exception)) | |||
|
|
|||
| # 6. Assert that the total number of started commands is MAX_RETRIES + 1. | |||
| self.assertEqual(len(self.listener.started_events), _MAX_RETRIES + 1) | |||
|
|
|||
| @client_context.require_failCommand_appName | |||
| def test_04_adaptive_retries_limited_by_tokens(self): | |||
| # Drivers should test that when enabled, adaptive retries are limited by the number of tokens in the bucket. | |||
|
|
|||
| # 1. Let `client` be a `MongoClient` with adaptiveRetries=True. | |||
| client = self.rs_or_single_client(adaptive_retries=True, event_listeners=[self.listener]) | |||
| # 2. Set `client`'s retry token bucket to have 2 tokens. | |||
| client._retry_policy.token_bucket.tokens = 2 | |||
| # 3. Let `coll` be a collection. | |||
| coll = client.pymongo_test.coll | |||
|
|
|||
| # 4. Configure the following failpoint: | |||
| failpoint = { | |||
| "configureFailPoint": "failCommand", | |||
| "mode": {"times": 3}, | |||
| "data": { | |||
| "failCommands": ["find"], | |||
| "errorCode": 462, # IngressRequestRateLimitExceeded | |||
| "errorLabels": ["RetryableError", "SystemOverloadedError"], | |||
| }, | |||
| } | |||
|
|
|||
| # 5. Perform a find operation with `coll` that fails. | |||
| with self.fail_point(failpoint): | |||
| with self.assertRaises(PyMongoError) as error: | |||
| coll.find_one({}) | |||
|
|
|||
| # 6. Assert that the raised error contains both the `RetryableError` and `SystemOverLoadedError` error labels. | |||
| self.assertIn("RetryableError", str(error.exception)) | |||
| self.assertIn("SystemOverloadedError", str(error.exception)) | |||
|
|
|||
| # 7. Assert that the total number of started commands is 3: one for the initial attempt and two for the retries. | |||
| self.assertEqual(len(self.listener.started_events), 3) | |||
| # 6. Assert that the total number of started commands is MAX_ADAPTIVE_RETRIES + 1. | |||
| self.assertEqual(len(self.listener.started_events), MAX_ADAPTIVE_RETRIES + 1) | |||
There was a problem hiding this comment.
This test’s prose says overload errors are “retried a maximum of three times”, but the assertion is MAX_ADAPTIVE_RETRIES + 1 and MAX_ADAPTIVE_RETRIES is 2 (so 2 retries / 3 total attempts). Update the comment to match the implemented semantics (e.g., “retried a maximum of 2 times” or “attempted a maximum of 3 times”).
| @@ -304,44 +262,8 @@ async def test_03_overload_retries_limited(self): | |||
| self.assertIn("RetryableError", str(error.exception)) | |||
| self.assertIn("SystemOverloadedError", str(error.exception)) | |||
|
|
|||
| # 6. Assert that the total number of started commands is MAX_RETRIES + 1. | |||
| self.assertEqual(len(self.listener.started_events), _MAX_RETRIES + 1) | |||
|
|
|||
| @async_client_context.require_failCommand_appName | |||
| async def test_04_adaptive_retries_limited_by_tokens(self): | |||
| # Drivers should test that when enabled, adaptive retries are limited by the number of tokens in the bucket. | |||
|
|
|||
| # 1. Let `client` be a `MongoClient` with adaptiveRetries=True. | |||
| client = await self.async_rs_or_single_client( | |||
| adaptive_retries=True, event_listeners=[self.listener] | |||
| ) | |||
| # 2. Set `client`'s retry token bucket to have 2 tokens. | |||
| client._retry_policy.token_bucket.tokens = 2 | |||
| # 3. Let `coll` be a collection. | |||
| coll = client.pymongo_test.coll | |||
|
|
|||
| # 4. Configure the following failpoint: | |||
| failpoint = { | |||
| "configureFailPoint": "failCommand", | |||
| "mode": {"times": 3}, | |||
| "data": { | |||
| "failCommands": ["find"], | |||
| "errorCode": 462, # IngressRequestRateLimitExceeded | |||
| "errorLabels": ["RetryableError", "SystemOverloadedError"], | |||
| }, | |||
| } | |||
|
|
|||
| # 5. Perform a find operation with `coll` that fails. | |||
| async with self.fail_point(failpoint): | |||
| with self.assertRaises(PyMongoError) as error: | |||
| await coll.find_one({}) | |||
|
|
|||
| # 6. Assert that the raised error contains both the `RetryableError` and `SystemOverLoadedError` error labels. | |||
| self.assertIn("RetryableError", str(error.exception)) | |||
| self.assertIn("SystemOverloadedError", str(error.exception)) | |||
|
|
|||
| # 7. Assert that the total number of started commands is 3: one for the initial attempt and two for the retries. | |||
| self.assertEqual(len(self.listener.started_events), 3) | |||
| # 6. Assert that the total number of started commands is MAX_ADAPTIVE_RETRIES + 1. | |||
| self.assertEqual(len(self.listener.started_events), MAX_ADAPTIVE_RETRIES + 1) | |||
There was a problem hiding this comment.
This test’s prose says overload errors are “retried a maximum of three times”, but the assertion is MAX_ADAPTIVE_RETRIES + 1 and MAX_ADAPTIVE_RETRIES is 2 (so 2 retries / 3 total attempts). Update the comment to match the implemented semantics (e.g., “retried a maximum of 2 times” or “attempted a maximum of 3 times”).
pymongo/common.py
Outdated
| "adaptiveretries": validate_boolean_or_string, | ||
| "maxadaptiveretries": validate_non_negative_integer, | ||
| "enableoverloadretargeting": validate_boolean_or_string, |
There was a problem hiding this comment.
adaptiveretries is still accepted/validated as a URI and keyword option, but there is no longer any public ClientOptions.adaptive_retries accessor and the stored value is unused. Consider removing it from the validator maps (or mapping it to the new overload retry settings) so the driver doesn’t accept a configuration option that has no effect.
| "adaptiveretries": validate_boolean_or_string, | |
| "maxadaptiveretries": validate_non_negative_integer, | |
| "enableoverloadretargeting": validate_boolean_or_string, |
| | **Overload retry options:** | ||
|
|
||
| - `adaptive_retries`: (boolean) Whether the adaptive retry mechanism is enabled for this client. | ||
| If enabled, server overload errors will use a token-bucket based system to mitigate further overload. | ||
| - `max_adaptive_retries`: (int) How many retries to allow for overload errors. Defaults to ``2``. | ||
| - `enable_overload_retargeting`: (boolean) Whether overload retargeting is enabled for this client. | ||
| If enabled, server overload errors will cause retry attempts to select a server that has not yet returned an overload error, if possible. | ||
| Defaults to ``False``. | ||
|
|
||
| .. seealso:: The MongoDB documentation on `connections <https://dochub.mongodb.org/core/connections>`_. | ||
|
|
||
| .. versionchanged:: 4.17 | ||
| Added the ``adaptive_retries`` URI and keyword argument. | ||
| Added the ``max_adaptive_retries`` and ``enable_overload_retargeting`` URI and keyword arguments. |
There was a problem hiding this comment.
The PR description template is still unfilled (no summary of behavior changes, migration notes, or test plan). Given this PR changes retry semantics and public connection options, please update the PR description with the intended defaults, compatibility story (adaptiveRetries vs maxAdaptiveRetries), and how the change was validated.
| @@ -34,7 +37,7 @@ | |||
|
|
|||
| import pymongo | |||
| from pymongo.asynchronous import helpers | |||
| from pymongo.asynchronous.helpers import _MAX_RETRIES, _RetryPolicy, _TokenBucket | |||
| from pymongo.asynchronous.helpers import _RetryPolicy | |||
| from pymongo.errors import OperationFailure, PyMongoError | |||
There was a problem hiding this comment.
ReadPreference, helpers, and _RetryPolicy are imported but never used in this file. Please remove the unused imports to avoid linter failures (especially since test/asynchronous/*.py is not covered by the test/*.py Ruff per-file ignore list).
test/test_client_backpressure.py
Outdated
| @@ -35,7 +38,7 @@ | |||
| import pymongo | |||
| from pymongo.errors import OperationFailure, PyMongoError | |||
| from pymongo.synchronous import helpers | |||
| from pymongo.synchronous.helpers import _MAX_RETRIES, _RetryPolicy, _TokenBucket | |||
| from pymongo.synchronous.helpers import _RetryPolicy | |||
|
|
|||
There was a problem hiding this comment.
ReadPreference, helpers, and _RetryPolicy are imported but never used. Even though test/*.py ignores F401, removing these unused imports would keep the test module cleaner and avoid confusion about what’s being exercised.
… rollout
[JIRA TICKET]
Changes in this PR
Test Plan
Checklist
Checklist for Author
Checklist for Reviewer