feat: make async in-flight task cap configurable#699
Conversation
Review: PR #699 — feat: make async task cap configurableSummaryPromotes the previously hard-coded async scheduler submission cap to a public FindingsCorrectness — looks right
Behavior change worth calling outThe default jumps from 256 → 1024 (4×). Users who never set this knob will see significantly more concurrent task leases. The PR notes this as intended (so model-heavy runs don't get throttled by the old floor), but consider:
Not blocking, but a release note / changelog entry beyond the PR body would help downstream users notice. Naming / minor consistency
Test coverage — solid
One gap: there's no test that exercises the floor behavior — i.e., a user setting Style / project conventions
Security / performance
VerdictApprove-with-nits. The wiring is clean, tests cover the happy path and validation, and the layering respects the config → engine direction. Suggested non-blocking follow-ups:
|
Greptile SummaryThis PR exposes the async scheduler's in-flight task cap as a public
|
| Filename | Overview |
|---|---|
| packages/data-designer-config/src/data_designer/config/run_config.py | Adds max_in_flight_tasks field (default 1024, ge=1) to RunConfig; straightforward pydantic field addition with validation. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/scheduling/task_admission.py | Introduces DEFAULT_IN_FLIGHT_TASK_CAPACITY = 1024 and updates the default submission_capacity accordingly; no logic changes to admission control. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/async_scheduler.py | Renames max_submitted_tasks to max_in_flight_tasks, removes the never-used local resource limit entry, and simplifies queue_guard to max_in_flight_tasks * 4; changes are coherent and behaviorally safe. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/dataset_builder.py | Wires run_config.max_in_flight_tasks into the scheduler; removes the aggregate-based max_model_task_admission calculation in favour of setting it equal to max_in_flight_tasks. |
| packages/data-designer-engine/tests/engine/dataset_builders/test_async_scheduler.py | Renames all max_submitted_tasks kwargs to max_in_flight_tasks throughout tests; adds new regression test for the simplified queue_guard formula. |
| packages/data-designer-engine/tests/engine/dataset_builders/test_async_builder_integration.py | Asserts that the model registry aggregate query is no longer called and that max_in_flight_tasks and max_model_task_admission are both set to the run-config value. |
| packages/data-designer/tests/interface/test_data_designer.py | Extends the persistence test to verify max_in_flight_tasks round-trips correctly through set_run_config. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[RunConfig\nmax_in_flight_tasks=1024] -->|read by| B[DatasetBuilder\n_prepare_async_run]
B -->|max_in_flight_tasks| C[AsyncTaskScheduler.__init__]
B -->|max_model_task_admission = max_in_flight_tasks| C
C -->|submission_capacity = max_in_flight_tasks| D[TaskAdmissionConfig]
C -->|resource_limits llm_wait = max_model_task_admission| D
C -->|model_group_limit_cap = max_model_task_admission| E[TaskSchedulingResolver]
C -->|queue_guard = max_in_flight_tasks x 4| F[_adaptive_row_group_block_reason]
D -->|controls| G[TaskAdmissionController\nleases up to submission_capacity tasks]
Reviews (4): Last reviewed commit: "feat: make async in-flight task cap conf..." | Re-trigger Greptile
1b75986 to
5981281
Compare
1bcb049 to
5ae93df
Compare
nabinchha
left a comment
There was a problem hiding this comment.
Thanks for putting this together, @eric-tramel — clean, well-scoped change.
Summary
Adds RunConfig.max_in_flight_tasks (default 1024, ge=1) so users can tune the async scheduler's task-lease capacity, raises the default from 256 to 1024, and wires the value end-to-end through DatasetBuilder into AsyncTaskScheduler (including the model-stage admission floor and capacity reporting). The implementation matches the stated intent in the PR description and keeps the engine/config layering intact — config grows a new field, engine consumes it.
Findings
Warnings — Worth addressing
packages/data-designer-engine/src/data_designer/engine/dataset_builders/dataset_builder.py:1064-1068 — max_model_task_admission floor can surprise users tuning down
- What: When a user explicitly lowers
max_in_flight_tasks(e.g.64), the floor still pinsmax_model_task_admissiontomax(DEFAULT_TASK_POOL_SIZE=1024, 64, 2 * aggregate)=1024. So thellm_waitresource keeps a 1024-slot ceiling even though the user asked for a much smaller cap. - Why: This is consistent with the pre-PR behavior (the floor was 256 before), so it isn't a regression — but combined with the new public knob, it could read as "I set the cap to 64, why is the LLM stage still letting through 1024?" The
localandsubmissionresources do drop with the user's value; onlyllm_waitkeeps the floor. The new docstring onmax_in_flight_tasksdoesn't hint at that asymmetry. - Suggestion: Either (a) drop the
DEFAULT_TASK_POOL_SIZEterm from themax(...)(usemax(max_in_flight_tasks, 2 * aggregate)) so the user's setting always governs, or (b) add a one-line note to themax_in_flight_tasksdocstring that the model-stage admission floor stays at the engine default. (a) is cleaner if there's no specific reason to keep the floor when the user has opted in to a smaller cap.
Suggestions — Take it or leave it
packages/data-designer-engine/src/data_designer/engine/dataset_builders/async_scheduler.py:80 — Two names for the same constant
- What:
DEFAULT_TASK_POOL_SIZEis now defined asDEFAULT_IN_FLIGHT_TASK_CAPACITY. Both names refer to the same1024value but live in two modules. - Why: It's preserved as an alias (likely for downstream callers and tests that import it), but two names for one concept costs a bit of reading time when navigating the scheduler.
- Suggestion: As a follow-up, consider collapsing to a single name (
DEFAULT_IN_FLIGHT_TASK_CAPACITYreads well in context) once you're comfortable touching the in-tree import sites. Not blocking — and worth doing separately so this PR stays focused.
packages/data-designer-engine/tests/engine/dataset_builders/test_async_builder_integration.py:194 — Test name vs. what it now asserts
- What:
test_prepare_async_run_enables_request_pressure_advisorynow also assertsmax_in_flight_tasks/max_model_task_admissionpropagation. - Why: Searching for "what tests cover the new knob" won't surface this test by name. Both assertions are valid checks of the same code path, but the test name only describes the first.
- Suggestion: Either split into a focused
test_prepare_async_run_propagates_max_in_flight_tasksor rename to something liketest_prepare_async_run_propagates_scheduler_kwargs. Tiny ergonomics nit.
packages/data-designer-config/src/data_designer/config/run_config.py:175-182 — Field(description=...) duplicates the Attributes: docstring
- What: The class docstring already has a full
max_in_flight_tasks:entry, andField(description=...)repeats a shorter version of the same text. - Why: Other simple numeric fields in
RunConfig(e.g.buffer_size,non_inference_max_parallel_workers) don't carry adescription=. The redundancy is harmless but slightly inconsistent with the surrounding pattern. - Suggestion: Either drop the
description=(keep justField(default=1024, ge=1)) or, if you'd rather keep schema-level descriptions, leave it as-is. Pure style call.
What Looks Good
- The new knob's docstring is the standout — it explicitly disambiguates
max_in_flight_tasksfrommax_parallel_requests, which is exactly the confusion this change could otherwise introduce. - Coverage is layered well: config-level validation (
test_run_config_*), interface persistence (test_run_config_setting_persists), and builder propagation (the integration test). Each layer asserts the right thing without overlap. - The capacity-source flip from
dataset_builder→run_configonsubmission_capacityis a small but accurate provenance change, and it's still type-safe underCapacityValueSource. - Renaming
max_submitted_tasks→max_in_flight_tasksis a real readability win — "submission" was just one phase, and the new name captures what the cap actually governs.
Verdict
Ship it (with nits) — the one Warning is non-blocking (a behavior nuance worth either tweaking or documenting), and the rest are pure style. Nothing critical here.
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
9c7d4d2 to
eccce53
Compare
📋 Summary
Adds a public
RunConfig.max_in_flight_taskssetting for the async scheduler's task-lease capacity and raises the default from 256 to 1024. The configured value is now the singular scheduler work cap: model-stage task admission and adaptive row-group queue backpressure follow it, while model API request concurrency remains controlled separately bymax_parallel_requestsand request admission.🔗 Related Issue
N/A
🔄 Changes
RunConfig.max_in_flight_taskswith a default of 1024 and validation that it is at least 1.AsyncTaskScheduler.max_in_flight_tasksdirectly, without the old default floor or aggregate request headroom.DEFAULT_IN_FLIGHT_TASK_CAPACITYfor this concept.max_in_flight_tasks, rather than an inflated model-stage admission value.run_config.🧪 Testing
make testpasses (not run; focused tests below)Focused checks run after rebasing onto latest
origin/main:uv run --group dev pytest packages/data-designer-config/tests/config/test_run_config.py packages/data-designer-engine/tests/engine/dataset_builders/test_async_scheduler.py packages/data-designer-engine/tests/engine/dataset_builders/test_async_builder_integration.py packages/data-designer-engine/tests/engine/dataset_builders/test_dataset_builder.py::test_run_config_default_non_inference_max_parallel_workers packages/data-designer/tests/interface/test_data_designer.py::test_run_config_setting_persists(109 passed)uv run ruff format --check ...on touched filesuv run ruff check --output-format=full ...on touched filesgit diff --check✅ Checklist