Cap jixia concurrency to avoid OOM in load step #14

Merged
Gabrielebattimelli merged 1 commit into main from add-agent-skill
Jun 8, 2026


@Gabrielebattimelli

Member

Why

With the toolchain fix (#13), jixia indexes correctly, but the "Load jixia data into PostgreSQL" step was cancelled twice, both times at exactly 85 modules (~12 min in), while processing the most Mathlib-heavy modules (Relativity/Tensors, PauliMatrices).

Signature: runner lost, post-steps skipped, no error logged, 0 jixia subprocess failures. That pattern points to a silent OOM kill: jixia_py defaults to CPUs + 4 parallel workers (8 on the runner), each loading ~2-3 GB of Mathlib, so peak usage reaches roughly 16-24 GB, at or above the runner's 16 GB.
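The memory arithmetic behind this diagnosis can be sketched as follows. This is only a back-of-the-envelope estimate: `peak_memory_gb` is a hypothetical helper, the 4-CPU figure is inferred from "CPUs + 4 = 8 on the runner", and the estimate assumes every worker holds its full ~2-3 GB at the same time.

```python
from typing import Tuple


def peak_memory_gb(workers: int, per_worker_gb: Tuple[float, float]) -> Tuple[float, float]:
    """Return a (low, high) estimate of peak memory if all workers are resident at once."""
    low, high = per_worker_gb
    return workers * low, workers * high


RUNNER_GB = 16                # runner memory quoted in the description
PER_WORKER_GB = (2.0, 3.0)    # per-worker Mathlib footprint quoted in the description

# Assumption: 4 CPUs, inferred from "CPUs + 4" giving 8 workers on the runner.
default_workers = 4 + 4

print(peak_memory_gb(default_workers, PER_WORKER_GB))  # (16.0, 24.0): at or above RUNNER_GB
print(peak_memory_gb(2, PER_WORKER_GB))                # (4.0, 6.0): well under RUNNER_GB
```

With the default of 8 workers, even the low end of the estimate consumes the whole runner before PostgreSQL and the OS are counted, whereas 2 workers leave roughly 10 GB of headroom.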

What

  • Make jixia worker count configurable via JIXIA_MAX_WORKERS.
  • Set JIXIA_MAX_WORKERS: '2' in the workflow so peak memory stays well under 16 GB.

Trade-off: the load step runs longer (fewer parallel workers) but stays within the 6-hour job limit. Local runs are unaffected: when the env var is unset, jixia_py keeps its default full parallelism.

The 'Load jixia data into PostgreSQL' step was OOM-killed deterministically
~85 modules in (runner lost, post-steps skipped, no error logged), while
processing the most Mathlib-heavy modules. jixia_py defaults to CPUs+4
parallel workers; each loads ~2-3 GB of Mathlib, exceeding the 16 GB runner.

Make the worker count configurable via JIXIA_MAX_WORKERS and set it to 2
in the weekly index workflow.
Copilot AI review requested due to automatic review settings June 8, 2026 11:10
@Gabrielebattimelli Gabrielebattimelli merged commit d079277 into main Jun 8, 2026
2 checks passed

Copilot AI left a comment

Pull request overview

This PR aims to prevent CI out-of-memory kills during the “Load jixia data into PostgreSQL” step by making jixia’s parallelism configurable and capping it in the weekly indexing workflow.

Changes:

  • Read JIXIA_MAX_WORKERS in database/jixia_db.py and pass it to the jixia batch runner.
  • Set JIXIA_MAX_WORKERS: '2' in the weekly GitHub Actions workflow environment to reduce peak memory usage.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
database/jixia_db.py | Adds env-driven worker cap and forwards it into the jixia batch execution call.
.github/workflows/weekly-index.yml | Caps jixia worker concurrency in CI to avoid runner OOM during the load step.

Comment thread database/jixia_db.py
Comment on lines +182 to +183
max_workers_env = os.environ.get("JIXIA_MAX_WORKERS")
max_workers = int(max_workers_env) if max_workers_env else None
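As written, the two lines above raise a bare `ValueError` on a non-numeric value and accept `0` or negative numbers unchanged. A more defensive variant might look like the sketch below. `read_max_workers` is a hypothetical helper that is not part of this PR, and rejecting values below 1 is an assumption about what the batch runner expects.

```python
import os
from typing import Mapping, Optional


def read_max_workers(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Parse JIXIA_MAX_WORKERS; return None (library default) when unset or blank."""
    raw = env.get("JIXIA_MAX_WORKERS", "").strip()
    if not raw:
        return None
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(
            f"JIXIA_MAX_WORKERS must be a positive integer, got {raw!r}"
        ) from None
    # Assumption: a worker count below 1 is never meaningful for the batch runner.
    if value < 1:
        raise ValueError(f"JIXIA_MAX_WORKERS must be >= 1, got {value}")
    return value


print(read_max_workers({"JIXIA_MAX_WORKERS": "2"}))  # 2
print(read_max_workers({}))                          # None
```

Passing the environment as a parameter keeps the helper testable without mutating `os.environ`.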
Comment thread database/jixia_db.py
Comment on lines 186 to 191
results = project.batch_run_jixia(
    base_dir=d,
    prefixes=prefixes,
    plugins=["module", "declaration", "symbol"],
    max_workers=max_workers,
)
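To illustrate what a `max_workers` cap does to concurrency, here is a small stand-in using the standard library's `ThreadPoolExecutor`. How `batch_run_jixia` schedules its subprocesses internally is not shown in this PR, so the pool, the `observed_peak_concurrency` helper, and the sleeping task are all assumptions made for illustration. For reference, `ThreadPoolExecutor` with `max_workers=None` defaults to `min(32, os.cpu_count() + 4)` on Python 3.8+, which has the same "CPUs + 4" shape as the default described above.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def observed_peak_concurrency(task_count: int, max_workers: Optional[int]) -> int:
    """Run dummy tasks in a pool and return the highest number running at once."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def task(_: int) -> None:
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stand-in for one jixia run holding memory
        with lock:
            active -= 1

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(task, range(task_count)))  # drain results so all tasks finish
    return peak


# With a cap of 2, no more than 2 tasks are ever in flight at once.
print(observed_peak_concurrency(task_count=10, max_workers=2))
```

Because peak memory scales with the number of tasks in flight, capping the pool size bounds memory at the cost of a longer wall-clock run.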
2 participants