-
Notifications
You must be signed in to change notification settings - Fork 50
(improvement)Optimize DCAware/RackAware RoundRobinPolicy with host distance caching #651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
This is interesting, my change has exposed this - Need to understand this better :-/ |
if host is None:
host = self._cluster.metadata.get_host_by_host_id(host_id)
if host and host.endpoint != endpoint:
log.debug("[control connection] Updating host ip from %s to %s for (%s)", host.endpoint, endpoint, host_id)
old_endpoint = host.endpoint
host.endpoint = endpoint
self._cluster.metadata.update_host(host, old_endpoint)
reconnector = host.get_and_set_reconnection_handler(None)
if reconnector:
reconnector.cancel()
self._cluster.on_down(host, is_host_addition=False, expect_host_to_be_down=True)So first we update the host with the new endpoint, then mark it as down? |
|
This fixes it for me: which also makes sense to me. |
76ee195 to
edab823
Compare
…tate Refactor `DCAwareRoundRobinPolicy` to simplify distance calculations and memory usage. Key changes: - Remove `_hosts_by_distance` and the complex caching of LOCAL hosts. - `distance()` now checks `host.datacenter` directly for LOCAL calculation, which is correct and static. - Only cache `_remote_hosts` to efficiently handle `used_hosts_per_remote_dc`. - Optimize control plane operations (`on_up`, `on_down`) to only rebuild the remote cache when necessary (when remote hosts change or local DC changes). - This removes the overhead of maintaining a redundant LOCAL cache and ensures correct behavior even if a local host is marked down. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
When a host changes its IP address, the driver previously updated the host endpoint to the new IP before calling on_down. This caused on_down to mistakenly target the new IP for connection cleanup. This change reorders the operations to ensure on_down cleans up the old IP's resources before the host object is updated and on_up is called for the new IP. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
|
I think CI failure is unrelated and is #359 |
edab823 to
1884f59
Compare
|
By using the (not amazing) benchmark from #653 , I got the following results: For master branch as a baseline: This branch (with just DC aware improvements): ** 433 -> 781 Kops/sec improvement ** With improvement to rack aware (on top of master), I got: ** 277 -> 324 Kops/sec improvement ** And on top of this branch: ** 277 -> 344 Kops/sec improvement ** And finally, for #650 : which kinda makes me suspect that branch is no good :-/ |
Sent separate PR - #654 |
|
With rack aware added (3rd commit), these are the current numbers: |
…distances Refactor `RackAwareRoundRobinPolicy` to simplify distance calculations and memory usage. Add self._remote_hosts to cache remote hosts distance, self._non_local_rack_hosts for non-local rack host distance. This improves the performance nicely, from ~290K query plans per second to ~600K query plans per second. - Only cache `_remote_hosts` to efficiently handle `used_hosts_per_remote_dc`. - Optimize control plane operations (`on_up`, `on_down`) to only rebuild the remote cache when necessary (when remote hosts change or local DC changes). Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
cc0204d to
6282e6f
Compare
Now that I also cache non-local hosts, not just remote (duh!), perf. is better: |
…y planning. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
|
Added for TokenAware as well some optimization (need to improve commit message). So reasonable improvement, at least in this micro-benchmark. |
Refactor
DCAwareRoundRobinPolicyto use a Copy-On-Write (COW) strategy for managing host distances.Key changes:
_remote_hoststo cacheREMOTEhosts, enabling O(1) distance lookups during query planning for distance.IGNOREDhosts do not need to be stored in the cache.For 'LOCAL' we do a simple comparison.
_refresh_remote_hoststo handle node changes.This is a different attempt from #650 to add caching to host distance to make query planning faster.
I'm not sure code-wise it's less code, but it certainly does make sense to do it per policy, not just for the TokenAware? Unsure.
The 1st commit is for DCAwareRoundRobinPolicy. Once it passes CI, I have an additional commit for RackAware one, and perhaps more are needed later.
Pre-review checklist
./docs/source/.Fixes:annotations to PR description.