Optimize MapSet.symmetric_difference/2 when sizes mismatched#15471
Open
preciz wants to merge 1 commit into
By folding over the smaller set and using the larger set as the starting accumulator, the number of fold iterations is reduced from O(large) to O(small). This provides a speedup of over 100x when set sizes are mismatched.
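To make the idea concrete, here is a minimal sketch of the same fold written against the public `MapSet` API rather than `:sets`. The module name `SymDiffSketch` is hypothetical and this is not the code in the PR. It only illustrates that toggling each element of the smaller set in the larger one yields the symmetric difference.

```elixir
defmodule SymDiffSketch do
  # Hypothetical illustration, not the PR's implementation.
  # Folds over the smaller set, starting from the larger set as accumulator:
  # an element present in both sets is removed, an element only in the
  # smaller set is added.
  def symmetric_difference(%MapSet{} = a, %MapSet{} = b) do
    {small, large} = if MapSet.size(a) <= MapSet.size(b), do: {a, b}, else: {b, a}

    Enum.reduce(small, large, fn elem, acc ->
      if MapSet.member?(acc, elem) do
        MapSet.delete(acc, elem)
      else
        MapSet.put(acc, elem)
      end
    end)
  end
end

a = MapSet.new(1..10)
b = MapSet.new(5..1000)

# The sketch should agree with the standard library regardless of argument order.
expected = MapSet.symmetric_difference(a, b)
IO.inspect(MapSet.equal?(SymDiffSketch.symmetric_difference(a, b), expected))
IO.inspect(MapSet.equal?(SymDiffSketch.symmetric_difference(b, a), expected))
```

The number of reduce steps depends only on the size of the smaller set, which is where the gain in the mismatched case would come from.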
Member
Hi @preciz, for documentation purposes, can you share the benchmarks you ran, along with the input sizes?
Contributor
Author
It's skewed towards that mismatch size case.

Mix.install([{:benchee, "~> 1.0"}])

defmodule Bench do
  def old_sym_diff(map_set1 = %MapSet{map: set1}, _map_set2 = %MapSet{map: set2}) do
    {small, large} =
      if :sets.size(set1) <= :sets.size(set2), do: {set1, set2}, else: {set2, set1}

    disjointer_fun = fn elem, {small, acc} ->
      if :sets.is_element(elem, small) do
        {:sets.del_element(elem, small), acc}
      else
        {small, [elem | acc]}
      end
    end

    {new_small, list} = :sets.fold(disjointer_fun, {small, []}, large)
    %{map_set1 | map: :sets.union(new_small, :sets.from_list(list, version: 2))}
  end

  def new_sym_diff(map_set1 = %MapSet{map: set1}, _map_set2 = %MapSet{map: set2}) do
    {small, large} =
      if :sets.size(set1) <= :sets.size(set2), do: {set1, set2}, else: {set2, set1}

    map =
      :sets.fold(
        fn elem, acc ->
          if :sets.is_element(elem, acc) do
            :sets.del_element(elem, acc)
          else
            :sets.add_element(elem, acc)
          end
        end,
        large,
        small
      )

    %{map_set1 | map: map}
  end
end

equal_small = MapSet.new(1..100)
equal_large = MapSet.new(101..200)
diff_huge1 = MapSet.new(1..100000)
diff_huge2 = MapSet.new(50000..150000)
small_1 = MapSet.new(1..10)
large_1 = MapSet.new(1..100000)

Benchee.run(
  %{
    "old" => fn {set1, set2} -> Bench.old_sym_diff(set1, set2) end,
    "new" => fn {set1, set2} -> Bench.new_sym_diff(set1, set2) end
  },
  inputs: %{
    "Equal Small (100)" => {equal_small, equal_large},
    "Huge Overlapping (100,000)" => {diff_huge1, diff_huge2},
    "Mismatched Sizes (10 vs 100,000)" => {small_1, large_1}
  }
)

On my noisy heat throttling machine:

##### With input Equal Small (100) #####
Name ips average deviation median 99th %
new 217.85 K 4.59 μs ±65.02% 4.37 μs 9.13 μs
old 161.03 K 6.21 μs ±110.95% 5.95 μs 9.81 μs
Comparison:
new 217.85 K
old 161.03 K - 1.35x slower +1.62 μs
##### With input Huge Overlapping (100,000) #####
Name ips average deviation median 99th %
old 65.69 15.22 ms ±11.68% 14.78 ms 23.03 ms
new 63.87 15.66 ms ±12.26% 14.83 ms 23.21 ms
Comparison:
old 65.69
new 63.87 - 1.03x slower +0.43 ms
##### With input Mismatched Sizes (10 vs 100,000) #####
Name ips average deviation median 99th %
new 973.09 K 0.00103 ms ±875.87% 0.00094 ms 0.00169 ms
old 0.0777 K 12.88 ms ±7.94% 12.70 ms 15.89 ms
Comparison:
new 973.09 K
old 0.0777 K - 12530.18x slower +12.88 ms
Member
I see. For both scenarios (different sizes and similar sizes), we should probably test the cases where the sets have half of their elements in common, most in common, and nothing in common.
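One possible way to build those inputs is sketched below. The `OverlapInputs` module, its function names, the chosen sizes, and the reading of "most in common" as 90% of the smaller set are all assumptions for illustration. Nothing here comes from the PR itself.

```elixir
defmodule OverlapInputs do
  # Hypothetical helper, not part of the PR.
  # Builds two sets of the given sizes that share exactly `common` elements.
  def pair(size_a, size_b, common) when common <= size_a and common <= size_b do
    a = MapSet.new(1..size_a//1)

    # b reuses the first `common` elements of a, then takes fresh integers
    # above size_a so that nothing else overlaps. The `//1` step keeps the
    # ranges empty (rather than descending) when a bound is zero.
    shared = 1..common//1
    fresh = (size_a + 1)..(size_a + size_b - common)//1
    b = MapSet.new(Enum.concat(shared, fresh))

    {a, b}
  end

  # Returns a map shaped like Benchee's `inputs:` option, covering
  # similar and mismatched sizes with no, half, and most elements in common.
  # Overlap is expressed as a percentage of the smaller set (assumed values).
  def inputs do
    sizes = [{"similar sizes", {100_000, 100_000}}, {"mismatched sizes", {10, 100_000}}]
    overlaps = [{"nothing in common", 0}, {"half in common", 50}, {"most in common", 90}]

    for {size_label, {size_a, size_b}} <- sizes,
        {overlap_label, percent} <- overlaps,
        into: %{} do
      common = div(min(size_a, size_b) * percent, 100)
      {"#{size_label}, #{overlap_label}", pair(size_a, size_b, common)}
    end
  end
end

{a, b} = OverlapInputs.pair(10, 100, 5)
IO.inspect({MapSet.size(a), MapSet.size(b), MapSet.size(MapSet.intersection(a, b))})
```

The resulting map could be passed as `inputs: OverlapInputs.inputs()` to a `Benchee.run/2` call like the one in the benchmark script, since each value is a `{set1, set2}` tuple.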
Contributor
Yes, and also we should bench with |
By folding over the smaller set and using the larger set as the starting accumulator, the number of fold iterations is reduced from O(large) to O(small). This provides a speedup of over 100x when set sizes are mismatched.
When the set sizes match, performance is the same as before.
Assisted-by: Antigravity CLI : Claude Opus 4.6 & Gemini Flash 3.5