Reducing complexity of implementation in order to be able to add Atlas text search token based pagination #1046
kbuma wants to merge 8 commits into materialsproject:main
Conversation
slice_size = num_params_min_chunk or 1
# If successful, continue with normal pagination
total_data = {"data": []}  # type: dict
total_data["data"].extend(data["data"])
should favor .append(...) w/ itertools.chain.from_iterable(...) at the end rather than repeated calls to .extend (especially since there is a loop later).
lines: 656, 701, 732, 806
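A minimal sketch of the suggested pattern; fetch_pages and the response shape are illustrative stand-ins, not the client's actual code:

from itertools import chain

def fetch_pages():
    # Hypothetical stand-in for the client's paginated requests.
    yield {"data": [1, 2]}
    yield {"data": [3]}

pages = []
for page in fetch_pages():
    pages.append(page["data"])  # collect one list per request

# Flatten once at the end instead of calling .extend on every iteration.
total_data = {"data": list(chain.from_iterable(pages))}
assert total_data == {"data": [1, 2, 3]}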
for i in range(0, len(split_values), batch_size):
    batch = split_values[i : i + batch_size]
Might be a ways off from being the minimum py version, but in 3.12 itertools introduced batched. I've used the approximate implementation from the docs before:

from itertools import islice

def batched(iterable, n, *, strict=False):
    # batched('ABCDEFG', 2) → AB CD EF G
    if n < 1:
        raise ValueError('n must be at least one')
    iterator = iter(iterable)
    while batch := tuple(islice(iterator, n)):
        if strict and len(batch) != n:
            raise ValueError('batched(): incomplete batch')
        yield batch
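With that helper, the slicing loop from the diff could be written as follows (split_values and batch_size are the diff's variables, given illustrative values here):

split_values = ["mp-1", "mp-2", "mp-3", "mp-4", "mp-5"]
batch_size = 2

for batch in batched(split_values, batch_size):
    print(batch)  # ('mp-1', 'mp-2'), then ('mp-3', 'mp-4'), then ('mp-5',)

Note that batched yields tuples rather than list slices, which shouldn't matter for joining values into a query string.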
tsmathis left a comment
Not really much to say on my end; I am curious, though, about the performance/execution time of this implementation vs. the parallel approach.
r -= 1
except MPRestError as e:
    # If we get 422 or 414 error, or 0 results for comma-separated params, split into batches
    if "422" in str(e) or "414" in str(e) or "Got 0 results" in str(e):
any(trace in str(e) for trace in ("422", "414", "Got 0 results"))
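A runnable illustration of the suggested condition; the exception class and message here are stand-ins, not mp-api's actual ones:

class MPRestError(Exception):
    pass  # stand-in for the client's MPRestError

try:
    raise MPRestError("fabricated 414 error for illustration")
except MPRestError as e:
    if any(trace in str(e) for trace in ("422", "414", "Got 0 results")):
        print("split comma-separated params into batches")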
]
# Batch the split values to reduce number of requests
# Use batches of up to 100 values to balance URL length and request count
batch_size = min(100, max(1, len(split_values) // 10))
Should the batch size be chosen according to the limits we (may) impose on a Query? Or alternatively, should there be a check on the length of a batch after fixing the batch size? That way excessively long queries get rejected (e.g., if I query for 1M task IDs, 100 batches would still give me an overly long list of task IDs).
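One way to realize that, sketched under an assumed MAX_VALUES_PER_QUERY limit (a hypothetical constant, not an actual mp-api setting):

MAX_VALUES_PER_QUERY = 1000  # assumed server-side limit, illustrative only

def safe_batch_size(split_values: list[str]) -> int:
    # Reject clearly unreasonable queries (e.g. 1M task IDs) up front
    # rather than silently issuing thousands of long requests.
    if len(split_values) > 100 * MAX_VALUES_PER_QUERY:
        raise ValueError(
            f"{len(split_values)} values is too many to batch; "
            "narrow the query criteria instead."
        )
    # Derive the batch size from the imposed limit rather than a fixed 100.
    return min(MAX_VALUES_PER_QUERY, max(1, len(split_values) // 10))

print(safe_batch_size([f"mp-{i}" for i in range(2500)]))  # 250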
Force-pushed from 77913b3 to 28e74e1
Summary
Major changes: