GEOPY-2781: Inversion stalls on tiling for large problems during redistribution of clusters by domfournier · Pull Request #365 · MiraGeoscience/simpeg-drivers

domfournier · 2026-03-24T22:25:54Z

GEOPY-2781 - Inversion stalls on tiling for large problems during redistribution of clusters

Copilot

Pull request overview

This PR updates the location-tiling behavior used to partition survey locations into tiles. It removes the previous “even population” rebalancing step (which relied on linear_sum_assignment) and adjusts the tests accordingly.

Changes:

Simplifies tile_locations() to always use raw KMeans cluster labels (removing the redistribution/balancing step).
Disables the tile-population balancing test by commenting it out and adding TODO notes.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `tests/locations_test.py` | Comments out the population-balancing test for `tile_locations()` and adds TODO notes about a future scalable balancing approach. |
| `simpeg_drivers/utils/nested.py` | Removes the Hungarian-assignment-based rebalancing logic; `tile_locations()` now returns KMeans labels directly. |

Comments suppressed due to low confidence (1)

simpeg_drivers/utils/nested.py:545

When sorting is provided, grid_locs is permuted before fitting KMeans, but the returned tile indices are positions in that permuted array. Downstream slicing (e.g., create_survey() filters survey.ordering[:, 2] against the provided indices) expects receiver IDs in the original indexing used by ordering (often the geoh5/receiver index, not the permuted position). Please map the clustered indices back through sorting before returning (or avoid permuting grid_locs and instead pass weights/ordering differently), so tiles reference the same index space as survey.ordering.

    cluster_id = kmeans.labels_

    tiles = []
    for tid in set(cluster_id):
        tiles += [np.where(cluster_id == tid)[0]]
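A minimal sketch of the mapping suggested above, assuming `sorting` is an integer permutation applied as `grid_locs[sorting]` before clustering. The arrays are hypothetical stand-ins (the labels are written by hand rather than taken from `kmeans.labels_`), so this illustrates only the index bookkeeping, not the actual `tile_locations()` code.

```python
import numpy as np

# Hypothetical permutation applied to the locations before clustering.
sorting = np.array([3, 0, 5, 1, 4, 2])

# Hand-written stand-in for kmeans.labels_ computed on grid_locs[sorting].
cluster_id = np.array([0, 0, 1, 1, 2, 2])

# Indices here are positions in the permuted array.
tiles_permuted = [np.where(cluster_id == tid)[0] for tid in np.unique(cluster_id)]

# Map each tile back through the permutation so it refers to original indices.
tiles_original = [sorting[tile] for tile in tiles_permuted]
```

With this mapping, each tile refers to rows of the unpermuted `grid_locs`, which is the index space the comment says `survey.ordering` expects.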


Copilot · 2026-03-24T22:31:12Z

simpeg_drivers/utils/nested.py


        kmeans = KMeans(n_clusters=n_tiles, random_state=0, n_init="auto")
-        cluster_size = int(np.ceil(grid_locs.shape[0] / n_tiles))
        kmeans.fit(grid_locs)

-    if labels is not None:
-        cluster_id = kmeans.labels_
-    else:
-        # Redistribute cluster centers to even out the number of points
-        centers = kmeans.cluster_centers_
-        centers = (
-            centers.reshape(-1, 1, grid_locs.shape[1])
-            .repeat(cluster_size, 1)
-            .reshape(-1, grid_locs.shape[1])
-        )
-        distance_matrix = cdist(grid_locs, centers)
-        cluster_id = linear_sum_assignment(distance_matrix)[1] // cluster_size
+    cluster_id = kmeans.labels_


This change removes the linear_sum_assignment / cdist balancing logic, but the module still imports linear_sum_assignment and cdist at the top-level. If this repo enforces flake8/pylint for unused imports, this will start failing CI; please remove the unused imports (or keep the balancing implementation behind an optional flag).
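For context, the removed balancing step can be reproduced at small scale as a standalone sketch. The data and the hand-picked `centers` (a stand-in for `kmeans.cluster_centers_`) are assumptions for illustration, and `np.repeat` replaces the reshape/repeat/reshape chain from the diff with the same result. Note that the cost matrix has one row per location and roughly one column per location as well, so its size grows quadratically with the number of points.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
grid_locs = rng.standard_normal((200, 2))  # hypothetical survey locations
n_tiles = 8
cluster_size = int(np.ceil(grid_locs.shape[0] / n_tiles))

# Stand-in for kmeans.cluster_centers_: pick n_tiles locations as centers.
centers = grid_locs[rng.choice(grid_locs.shape[0], n_tiles, replace=False)]

# Give every center `cluster_size` slots, then assign each point to one slot.
slots = np.repeat(centers, cluster_size, axis=0)
distance_matrix = cdist(grid_locs, slots)  # shape (n_points, n_tiles * cluster_size)

# The assigned slot index divided by cluster_size recovers the tile label.
cluster_id = linear_sum_assignment(distance_matrix)[1] // cluster_size
population = np.bincount(cluster_id, minlength=n_tiles)
```

Because each center owns exactly `cluster_size` slots, no tile can receive more than `cluster_size` points. The dense distance matrix is the price: at 100,000 locations it would hold about 10^10 entries.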

Copilot · 2026-03-24T22:31:12Z

tests/locations_test.py

+#         )
+#         tiles = tile_locations(pts.vertices[:, :2], n_tiles=8)
+#
+#         values = np.zeros(pts.n_vertices)
+#         pop = []
+#         for ind, tile in enumerate(tiles):
+#             values[tile] = ind
+#             pop.append(len(tile))
+#
+#         pts.add_data(
+#             {
+#                 "values": {
+#                     "values": values,
+#                 }
+#             }
+#         )
+#         assert np.std(pop) / np.mean(pop) < 0.02, (
+#             "Population of tiles are not almost equal {}."
+#         )




The test_tile_locations test has been fully commented out, which removes coverage for tile_locations() when labels is not provided (and makes it easy to forget the intended behavior). Instead of leaving a large commented block, consider keeping the test enabled but updating the assertion to match the new intended tiling guarantees (e.g., all indices are covered exactly once and tiles are reasonably balanced), or mark it @pytest.mark.xfail/skip with a reason until an alternative algorithm is implemented.

Suggested change

-#         )
-#         tiles = tile_locations(pts.vertices[:, :2], n_tiles=8)
-#
-#         values = np.zeros(pts.n_vertices)
-#         pop = []
-#         for ind, tile in enumerate(tiles):
-#             values[tile] = ind
-#             pop.append(len(tile))
-#
-#         pts.add_data(
-#             {
-#                 "values": {
-#                     "values": values,
-#                 }
-#             }
-#         )
-#         assert np.std(pop) / np.mean(pop) < 0.02, (
-#             "Population of tiles are not almost equal {}."
-#         )
+def test_tile_locations():
+    n_points = 1000
+    rng = np.random.default_rng(0)
+    locations = rng.standard_normal((n_points, 2))
+    tiles = tile_locations(locations, n_tiles=8)
+    # All indices should be covered exactly once across tiles
+    all_indices = np.concatenate(tiles)
+    assert np.array_equal(np.sort(all_indices), np.arange(n_points))
+    # Tiles should be reasonably balanced in population
+    pop = np.array([len(tile) for tile in tiles])
+    assert pop.min() > 0
+    assert np.std(pop) / np.mean(pop) < 0.5
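The other option raised in the comment, keeping the strict balance test but marking it skipped, could look like the sketch below. The test name, the skip reason, and the random input are illustrative assumptions; the import path is inferred from the file list in this PR, and the 0.02 threshold is taken from the commented-out test.

```python
import numpy as np
import pytest


@pytest.mark.skip(
    reason="Balanced tiling removed in GEOPY-2781; re-enable once a scalable algorithm exists."
)
def test_tile_locations_balanced():
    # Imported lazily so the module still loads where simpeg_drivers is absent.
    # Import path assumed from the file list in this PR.
    from simpeg_drivers.utils.nested import tile_locations

    rng = np.random.default_rng(0)
    locations = rng.standard_normal((1000, 2))  # hypothetical input
    tiles = tile_locations(locations, n_tiles=8)

    pop = np.array([len(tile) for tile in tiles])
    # Threshold from the commented-out test: near-equal tile populations.
    assert np.std(pop) / np.mean(pop) < 0.02
```

A skipped test still shows up in the pytest report with its reason, so the intended guarantee stays visible instead of sitting in a commented block.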

codecov · 2026-03-24T22:35:57Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.75%. Comparing base (a9a6e21) to head (b6c9c28).

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #365      +/-   ##
===========================================
- Coverage    90.78%   90.75%   -0.03%     
===========================================
  Files          112      112              
  Lines         6389     6383       -6     
  Branches       787      786       -1     
===========================================
- Hits          5800     5793       -7     
  Misses         405      405              
- Partials       184      185       +1

| Files with missing lines | Coverage Δ |
| --- | --- |
| simpeg_drivers/utils/nested.py | `95.28% <100.00%> (-0.13%)` ⬇️ |

... and 1 file with indirect coverage changes


domfournier added 3 commits March 23, 2026 16:25

Temporary commenting out of linear_sum_assignment

f0f921f

Remove commented code

3004630

Remove test with TODO

b6c9c28

Copilot AI review requested due to automatic review settings March 24, 2026 22:25

github-actions bot changed the title ~~GEOPY-2781~~ GEOPY-2781: Inversion stalls on tiling for large problems during redistribution of clusters Mar 24, 2026

Copilot started reviewing on behalf of domfournier March 24, 2026 22:26

Copilot AI reviewed Mar 24, 2026

Merge branch 'develop' into GEOPY-2781

40587dd

domfournier wants to merge 4 commits into develop from GEOPY-2781


3 participants
