I'm opening this RFC to address a fundamental conflict in our dependency management, which leads to a poor user experience and maintenance challenges. There was some internal discussion in June but I forgot to follow up then.
## The Conflict: "Live at the Head" vs. User Reproducibility
We have two core, conflicting goals:
- "Live at the Head" (Maintainer Goal): The cookbook should act as CI for the lab's packages, ensuring recipes work with the latest versions of `chemiscope`, `metatensor`, etc.
- Reproducibility (User Goal): A user who downloads a recipe `.zip` must get a working environment. They should not get a broken environment just because a new, unrelated package broke a specific recipe.
Our current `noxfile.py` setup fails at both:
- It fails reproducibility: The `example` session installs dependencies (`sphinx-gallery`, `chemiscope`, etc.) after creating the environment from `environment.yml`. This "clobbers" any pins in the `environment.yml` (as Guillaume noted, we are resolving twice). The `environment.yml` in the user-facing `.zip` is therefore incomplete and misleading.
- It fails "at the head" testing: By silently clobbering pins, we don't get a clear signal when a recipe becomes truly incompatible with new packages; we just force an update. This leads to recipe rot (like "Update or remove the periodic-hamiltonian example" #180, which is pinned to old packages) and doesn't actually guarantee the recipe logic works with the newest versions.
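To make the double-resolution problem concrete, here is a toy model of the clobbering (illustrative only: the package names and versions are made up, and this is not our actual `noxfile.py` logic). Whatever is installed second silently wins:

```python
def effective_versions(env_yml_pins: dict, session_installs: dict) -> dict:
    """Model two sequential resolutions: the second install step
    silently overrides any pins from the first."""
    resolved = dict(env_yml_pins)      # first solve: environment.yml
    resolved.update(session_installs)  # second solve: nox session installs
    return resolved

# Hypothetical recipe pins vs. what the gallery install pulls in:
recipe_pins = {"chemiscope": "0.5.0", "numpy": "1.24"}
gallery_deps = {"chemiscope": "0.8.1", "sphinx-gallery": "0.15"}

print(effective_versions(recipe_pins, gallery_deps))
# The user-facing environment.yml still says chemiscope 0.5.0,
# but CI actually ran against 0.8.1 -- the pin was clobbered.
```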
## The Proposal: A Dual Workflow with conda-lock
I propose we use lockfiles to separate these two goals, committing a `conda-lock.yml` file for each recipe.
This creates two distinct workflows: one for the user (stability) and one for CI (liveness).
### 1. The User Workflow (Guaranteed Reproducibility)
- The `conda-lock.yml` file for each recipe will be added to the downloadable `.zip` file (via `post_process_gallery` in `noxfile.py`).
- The `INSTALLING.rst` file (also in the zip) will be updated to instruct users to create their environment from the lockfile: `conda env create -f conda-lock.yml`
- The `nox` `example` session itself will also be modified to install directly from the `conda-lock.yml`.
Benefit: The user always gets a 100% reproducible, last-known-good environment. No more "clobbering," no more "it works on CI but not for me."
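As a sketch of the zip change, the post-processing step only needs to append the lockfile to the existing archive. The helper name below is hypothetical; in practice this would live inside `post_process_gallery` in `noxfile.py`:

```python
import zipfile
from pathlib import Path


def add_lockfile_to_zip(zip_path: Path, lockfile: Path) -> None:
    """Append the recipe's conda-lock.yml to an existing example .zip,
    so the downloadable archive ships the exact pinned environment."""
    with zipfile.ZipFile(zip_path, "a") as archive:
        archive.write(lockfile, arcname="conda-lock.yml")
```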
### 2. The Maintainer Workflow (Automated "Live at the Head" Testing)
We should create a new, separate CI job (e.g., run weekly and on-demand) to handle "living at the head." This job will:
- Attempt to resolve "at head": For each recipe, it will delete the existing `conda-lock.yml`.
- Solve: It will then try to solve a complete environment from scratch, using the recipe's `environment.yml` plus the "gallery" dependencies (`sphinx-gallery`, `chemiscope`, etc.). This tests the recipe against the latest available packages.
- On success: If it solves successfully, it generates a new `conda-lock.yml` and opens a PR (or auto-commits) to update the lockfile. This "blesses" the new package versions as the "last-known-good" set.
- On failure: If the solve fails (as `periodic-hamiltonian` would; see "Update or remove the periodic-hamiltonian example" #180), the job fails explicitly. It does not update the lockfile.
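The job's core decision logic can be sketched independently of the solver. In this sketch, `solve` is a stand-in for invoking conda-lock on `environment.yml` plus the gallery dependencies, and the function name is hypothetical:

```python
from pathlib import Path
from typing import Callable


def refresh_lockfile(recipe_dir: Path, solve: Callable[[Path], str]) -> bool:
    """Try to re-solve the recipe 'at the head'.

    On success, write the new conda-lock.yml (to be committed via PR).
    On failure, leave the last-known-good lockfile untouched and
    report the failure so CI can flag the recipe explicitly.
    """
    lockfile = recipe_dir / "conda-lock.yml"
    try:
        # `solve` stands in for running the real solver; here it is
        # assumed to raise RuntimeError when no environment resolves.
        new_lock = solve(recipe_dir)
    except RuntimeError:
        return False          # solve failed: keep the old lockfile
    lockfile.write_text(new_lock)
    return True               # lockfile updated ("blessed")
```

The key property is that a failed solve never touches the committed lockfile, so users on `main` are unaffected by breakage at the head.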
## How This Solves Our Problems
This dual system gives us the best of both worlds:
- Users are protected: They always get the last-known-good lockfile from the `main` branch. A "live at the head" failure in our CI does not break the recipe for the user.
- Maintenance is explicit: When the "at head" CI job fails, we get an immediate, actionable signal. We can then:
  - Fix the recipe to make it compatible with the new packages.
  - Pin a dependency in the recipe's `environment.yml` if a new package is truly broken.
  - Formally mark the recipe as "deprecated" on the website if it's no longer maintainable (like "Update or remove the periodic-hamiltonian example" #180).
This stops recipe rot, makes maintenance a clear and explicit process, and delivers a reproducible, working example for our users.