Optimize MapSet.symmetric_difference/2 when sizes mismatched by preciz · Pull Request #15471 · elixir-lang/elixir

preciz · 2026-06-13T21:00:23Z

By folding over the smaller set and using the larger set as the starting accumulator, the number of fold iterations is reduced from O(large) to O(small). This provides an over 100x speedup when set sizes are mismatched.

When the set sizes match, the performance is the same as before.
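
To illustrate the idea, here is a minimal sketch of the same toggle-style fold written against the public `MapSet` API rather than the internal `:sets` representation the PR touches. The module name `ToggleSketch` is hypothetical and exists only for this example. Each element of the smaller set is removed from the accumulator if already present, and added otherwise.

```elixir
defmodule ToggleSketch do
  # Hypothetical module for illustration only; not the code in this PR.
  def sym_diff(%MapSet{} = set1, %MapSet{} = set2) do
    # Pick the smaller set to iterate over and the larger one as the accumulator.
    {small, large} =
      if MapSet.size(set1) <= MapSet.size(set2), do: {set1, set2}, else: {set2, set1}

    # Elements of a set are unique, so each one toggles membership exactly once.
    Enum.reduce(small, large, fn elem, acc ->
      if MapSet.member?(acc, elem) do
        MapSet.delete(acc, elem)
      else
        MapSet.put(acc, elem)
      end
    end)
  end
end

small = MapSet.new([1, 2, 3])
large = MapSet.new(2..10)

# 1 is added, while 2 and 3 are removed from the larger set.
result = ToggleSketch.sym_diff(small, large)
IO.inspect(MapSet.equal?(result, MapSet.new([1, 4, 5, 6, 7, 8, 9, 10])))
```

Only the three elements of the smaller set are visited here; the nine elements of the larger set are never iterated.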

Assisted-by: Antigravity CLI : Claude Opus 4.6 & Gemini Flash 3.5

josevalim · 2026-06-13T21:08:42Z

Hi @preciz, for documentation purposes, can you share the benchmarks you ran, along with the input sizes?

preciz · 2026-06-13T21:17:40Z

It's skewed towards the mismatched-size case.

Mix.install([{:benchee, "~> 1.0"}])

defmodule Bench do
  def old_sym_diff(map_set1 = %MapSet{map: set1}, _map_set2 = %MapSet{map: set2}) do
    {small, large} = if :sets.size(set1) <= :sets.size(set2), do: {set1, set2}, else: {set2, set1}

    disjointer_fun = fn elem, {small, acc} ->
      if :sets.is_element(elem, small) do
        {:sets.del_element(elem, small), acc}
      else
        {small, [elem | acc]}
      end
    end

    {new_small, list} = :sets.fold(disjointer_fun, {small, []}, large)
    %{map_set1 | map: :sets.union(new_small, :sets.from_list(list, version: 2))}
  end

  def new_sym_diff(map_set1 = %MapSet{map: set1}, _map_set2 = %MapSet{map: set2}) do
    {small, large} = if :sets.size(set1) <= :sets.size(set2), do: {set1, set2}, else: {set2, set1}

    map =
      :sets.fold(
        fn elem, acc ->
          if :sets.is_element(elem, acc) do
            :sets.del_element(elem, acc)
          else
            :sets.add_element(elem, acc)
          end
        end,
        large,
        small
      )

    %{map_set1 | map: map}
  end
end

equal_small = MapSet.new(1..100)
equal_large = MapSet.new(101..200)

diff_huge1 = MapSet.new(1..100000)
diff_huge2 = MapSet.new(50000..150000)

small_1 = MapSet.new(1..10)
large_1 = MapSet.new(1..100000)

Benchee.run(
  %{
    "old" => fn {set1, set2} -> Bench.old_sym_diff(set1, set2) end,
    "new" => fn {set1, set2} -> Bench.new_sym_diff(set1, set2) end
  },
  inputs: %{
    "Equal Small (100)" => {equal_small, equal_large},
    "Huge Overlapping (100,000)" => {diff_huge1, diff_huge2},
    "Mismatched Sizes (10 vs 100,000)" => {small_1, large_1}
  }
)

On my noisy, heat-throttling machine:

##### With input Equal Small (100) #####
Name           ips        average  deviation         median         99th %
new       217.85 K        4.59 μs    ±65.02%        4.37 μs        9.13 μs
old       161.03 K        6.21 μs   ±110.95%        5.95 μs        9.81 μs

Comparison:
new       217.85 K
old       161.03 K - 1.35x slower +1.62 μs

##### With input Huge Overlapping (100,000) #####
Name           ips        average  deviation         median         99th %
old          65.69       15.22 ms    ±11.68%       14.78 ms       23.03 ms
new          63.87       15.66 ms    ±12.26%       14.83 ms       23.21 ms

Comparison:
old          65.69
new          63.87 - 1.03x slower +0.43 ms

##### With input Mismatched Sizes (10 vs 100,000) #####
Name           ips        average  deviation         median         99th %
new       973.09 K     0.00103 ms   ±875.87%     0.00094 ms     0.00169 ms
old       0.0777 K       12.88 ms     ±7.94%       12.70 ms       15.89 ms

Comparison:
new       973.09 K
old       0.0777 K - 12530.18x slower +12.88 ms

josevalim · 2026-06-13T21:43:16Z

I see. For both scenarios (different sizes and similar sizes), we should probably test the cases where the sets have half of their elements in common, most in common, and nothing in common.
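
One possible way to build such inputs is sketched below. The module `OverlapInputs`, its function names, the chosen sizes, and the 50% / 90% overlap levels are all assumptions made for illustration; none of them come from the PR. The overlap is expressed as a share of the smaller set.

```elixir
defmodule OverlapInputs do
  # Hypothetical helper for illustration only; not part of the PR.

  # Builds two sets of the given sizes that share exactly `shared` elements.
  def pair(size1, size2, shared)
      when size1 >= 1 and size2 >= 1 and shared >= 0 and shared <= size1 and shared <= size2 do
    set1 = MapSet.new(1..size1)
    # Start the second range so that its first `shared` values fall inside set1.
    first = size1 - shared + 1
    set2 = MapSet.new(first..(first + size2 - 1))
    {set1, set2}
  end

  # Returns a map of labelled input pairs: two size scenarios times three overlaps.
  def inputs do
    sizes = [{"similar sizes", {100_000, 100_000}}, {"mismatched sizes", {10, 100_000}}]
    overlaps = [{"nothing in common", 0}, {"half in common", 50}, {"most in common", 90}]

    for {size_label, {size1, size2}} <- sizes,
        {overlap_label, percent} <- overlaps,
        into: %{} do
      # Integer arithmetic avoids floating-point rounding surprises.
      shared = div(min(size1, size2) * percent, 100)
      {"#{size_label}, #{overlap_label}", pair(size1, size2, shared)}
    end
  end
end

{a, b} = OverlapInputs.pair(10, 100, 5)
IO.inspect(MapSet.size(MapSet.intersection(a, b)))
```

The map returned by `OverlapInputs.inputs/0` has the same shape as the `inputs:` option used in the benchmark script above, so it could be passed to `Benchee.run/2` directly.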

sabiwara · 2026-06-13T22:34:54Z

Yes, and also we should bench with memory_time as well.
In this case it seems we're mostly reducing memory usage, looks good.
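
For reference, memory measurement in Benchee is off by default and is enabled with the `memory_time` option. The snippet below is only a sketch of how that could look: the 2-second value and the single-job setup are arbitrary choices, not something specified in this thread, and the job calls the public `MapSet.symmetric_difference/2` rather than the `Bench` module above.

```elixir
Mix.install([{:benchee, "~> 1.0"}])

small = MapSet.new(1..10)
large = MapSet.new(1..100_000)

Benchee.run(
  %{
    "symmetric_difference" => fn {set1, set2} ->
      MapSet.symmetric_difference(set1, set2)
    end
  },
  inputs: %{"Mismatched Sizes (10 vs 100,000)" => {small, large}},
  # Seconds spent measuring memory per job and input; 2 is an arbitrary choice.
  memory_time: 2
)
```

With this option set, Benchee prints a memory usage table after the run time statistics for each input.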

Optimize MapSet.symmetric_difference/2 when sizes mismatched

a09720d

preciz wants to merge 1 commit into elixir-lang:main from preciz:optimize-mapset-symmetric-difference
