feat(sync[parallel]) Sync N repos in parallel via --jobs #546
Draft
why: `vcspull sync` is sequential today -- a 50-repo workspace of already-up-to-date repos still pays N x ~0.7s of subprocess + network overhead. The plan-build phase already runs concurrent status checks under `DEFAULT_PLAN_CONCURRENCY`; generalise the execution phase to match, so the batch's wallclock scales with the slowest few repos rather than the sum of all of them. ~5-10x wallclock speedup on real workspaces.

what:

- New `--jobs N` / `-j N` CLI flag + `VCSPULL_JOBS` env var. Default `min(8, CPU*2)` -- the same heuristic as `DEFAULT_PLAN_CONCURRENCY`, but capped at 8 to stay polite to per-IP rate limits. Pass `--jobs 1` to force the legacy serial UX.
- `_run_parallel_sync_loop_async`: `asyncio.Semaphore(jobs)` + `asyncio.as_completed` over per-task daemon threads bridging libvcs's synchronous `update_repo` into the loop. Daemon threads (instead of `asyncio.to_thread`) avoid the default `ThreadPoolExecutor` atexit-join footgun documented in `_sync_repo_with_watchdog`: a wedged libvcs subprocess at interpreter shutdown would otherwise hang the process.
- `SyncStatusIndicator` multi-slot mode: N spinner rows in a fixed active region; `release_slot(final_line=...)` queues the permanent line for the next render tick to scroll into scrollback above the active region (cargo / pueue trick: write above, so a `\n` from the viewport bottom scrolls one row out). `slots=1` keeps today's single-spinner UX bit-for-bit.
- The 3-line live-trail panel is disabled when `--jobs > 1` (a shared deque with N concurrent writers reads as noise); each slot's most-recent libvcs progress message becomes the per-row suffix instead.
- JSON / NDJSON events emit in completion order via `asyncio.as_completed` -- matches the streaming model, constant memory.
- `--exit-on-error` in parallel mode sets a stop event so queued tasks short-circuit before starting, but in-flight tasks are allowed to complete so their output is captured. Mirrors the serial promise that the user sees results for repos that had already started.
- Shared per-result emission (`_emit_repo_result`, `_emit_worktree_results`) factored out so serial and parallel paths agree on summary keys, event shape, and permanent-line formatting.
- `tests/conftest.py`: autouse fixture pins `VCSPULL_JOBS=1` in tests so pre-existing order-dependent `--exit-on-error` fixtures keep their serial ordering. New parallel-mode tests override the env var inside their own scope.
- Tests cover slot allocation, oversubscription, multi-row render, pending-permanents scroll-out, `_resolve_jobs` precedence, and a 10-repos x 4-jobs / 20-repos x 3-jobs dispatcher pass that asserts the semaphore caps the in-flight count.
Summary
- `vcspull sync` execution: default `min(8, CPU*2)` workers, opt out via `--jobs 1`.
- Fan-out (`asyncio.Semaphore` + `asyncio.as_completed`) over per-task daemon threads bridging libvcs's synchronous `update_repo`. Daemon threads avoid the default `ThreadPoolExecutor` atexit-join hang we documented for `_sync_repo_with_watchdog`.
- `✓ Synced ...` lines scroll into scrollback above as repos finish (cargo / pueue trick). `--jobs 1` keeps today's single-spinner UX bit-for-bit.
- Live-trail panel disabled when `--jobs > 1` (a shared deque with N concurrent writers reads as noise); each slot's most-recent libvcs progress chunk becomes its row suffix instead.
- `--exit-on-error` in parallel mode short-circuits queued tasks but lets in-flight tasks complete so their output is captured.

This is a follow-up to #fix-sync-hang-on-credential-prompts. Targeted at that branch so the panel + verbosity + watchdog work lands first.
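The daemon-thread bridge can be sketched like this. It is a minimal, self-contained illustration: `run_in_daemon_thread` and `sync_all` are made-up names, not the actual `_run_parallel_sync_loop_async` implementation, and the real code additionally drives the spinner slots and stop event.

```python
import asyncio
import threading


def run_in_daemon_thread(loop: asyncio.AbstractEventLoop, func, *args) -> asyncio.Future:
    """Run a blocking call in a daemon thread; resolve an asyncio Future.

    A daemon thread (unlike asyncio.to_thread's default ThreadPoolExecutor,
    which is joined at interpreter exit) cannot keep a wedged subprocess
    alive past shutdown.
    """
    fut = loop.create_future()

    def runner() -> None:
        try:
            result = func(*args)
        except BaseException as exc:  # propagate the error into the event loop
            loop.call_soon_threadsafe(fut.set_exception, exc)
        else:
            loop.call_soon_threadsafe(fut.set_result, result)

    threading.Thread(target=runner, daemon=True).start()
    return fut


async def sync_all(repos, sync_one, jobs: int = 4) -> list:
    """Fan sync_one out over repos with at most `jobs` in flight at once."""
    loop = asyncio.get_running_loop()
    sem = asyncio.Semaphore(jobs)

    async def one(repo):
        async with sem:  # cap concurrent daemon threads
            return await run_in_daemon_thread(loop, sync_one, repo)

    results = []
    # as_completed yields in completion order -> stream events as they land
    for coro in asyncio.as_completed([one(r) for r in repos]):
        results.append(await coro)
    return results
```

Because results are consumed via `asyncio.as_completed`, JSON/NDJSON emission happens per repo as it finishes rather than after the whole batch, which is what keeps memory constant.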
Test plan
- `uv run --no-sync ruff check . --fix --show-fixes`
- `uv run --no-sync ruff format .`
- `uv run --no-sync mypy` (no issues, 86 source files)
- `uv run --no-sync py.test --reruns 0` (1033 passed)
- `cd docs && uv run --no-sync sphinx-build -b dirhtml . _build/html` (build succeeded)
- `vcspull sync --workspace ~/study/otel/ --all` -- expect 4-8 spinner rows, permanents scroll above as repos finish, ~5x faster than `--jobs 1`.
- `vcspull sync --workspace ~/study/otel/ --all --json | head` -- events arrive as-completed.
- `SIG_DFL` re-raise (`echo $?` -> `130` in bash).
- `--exit-on-error` with mixed good/bad repos: in-flight repos complete, queued ones short-circuit, summary + non-zero exit.

Caveats (documented in code comments)
- `VCSPULL_JOBS=2` for big bursts.
- `asyncio.to_thread` deliberately not used -- its default `ThreadPoolExecutor` has the atexit-join footgun we already worked around in `_sync_repo_with_watchdog`.
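The dispatcher test described in the PR body (e.g. 20 repos x 3 jobs, asserting the semaphore caps the in-flight count) can be sketched as below. Names and the sleep-based stand-in for a repo sync are illustrative, not the actual test code.

```python
import asyncio


async def dispatch(n_repos: int, jobs: int) -> int:
    """Run n_repos dummy tasks under Semaphore(jobs); return peak in-flight count."""
    sem = asyncio.Semaphore(jobs)
    in_flight = 0
    peak = 0

    async def worker() -> None:
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for one repo sync
            in_flight -= 1

    await asyncio.gather(*(worker() for _ in range(n_repos)))
    return peak


# The semaphore guarantees peak never exceeds `jobs`, however many repos queue up.
peak = asyncio.run(dispatch(n_repos=20, jobs=3))
```

A real test would assert `peak <= jobs` (and, for oversubscribed batches, that the cap is actually reached) rather than an exact value, since scheduling order is not deterministic.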