Bundler performance optimizations: 2x faster installs #9316
base: master
Split the ParallelInstaller into two distinct phases:

Phase 1 (Download): Download ALL gems in parallel with a dedicated worker pool. Since .gem files are just archives, no dependency ordering is needed - all downloads can happen concurrently.

Phase 2 (Install): Install gems with dependency-aware ordering, but with all gems already cached locally. Pure Ruby gems (no native extensions) are installed immediately without waiting for dependencies, since they don't execute code during installation. Only gems with native extensions wait for their dependencies.

Additional changes in this commit:
- Add Source::Rubygems#download() as standalone download method
- Add has_native_extensions?() detection for install ordering
- Add global gem cache at $XDG_CACHE_HOME/bundler/gems/ with hardlink-to-local-cache strategy for cross-Ruby-version sharing
- Add early satisfaction check: skip entire pipeline when nothing changed (inspired by uv's SatisfiesResult::Fresh)
- Make Definition#install_needed? public for the early check
- Cache lockfile_exists? and use O(1) hash lookups in converge_specs
- Memoize cached_gem and installed? to avoid redundant stat calls
- Guard against unbounded growth of caches array
- Short-circuit lockfile write when nothing changed
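The two phases described in this commit can be sketched roughly as follows. This is an illustration only, not the PR's code: `GemTask`, `download_all`, and `install_order` are hypothetical names, and the "download" is a stub that just records the gem name.

```ruby
require "set"

# Hypothetical stand-in for a resolved gem; not a Bundler class.
GemTask = Struct.new(:name, :deps, :native, keyword_init: true)

# Phase 1: fetch every gem concurrently; archives have no ordering constraints.
def download_all(tasks, workers: 8)
  queue = Queue.new
  tasks.each { |t| queue << t }
  queue.close # pop returns nil once the queue is drained
  downloaded = Queue.new
  threads = Array.new(workers) do
    Thread.new do
      while (task = queue.pop)
        downloaded << task.name # a real worker would fetch the .gem file here
      end
    end
  end
  threads.each(&:join)
  Array.new(downloaded.size) { downloaded.pop }
end

# Phase 2: pure Ruby gems are ready at once; native gems wait for their deps.
def install_order(tasks)
  installed = Set.new
  order = []
  pending = tasks.dup
  until pending.empty?
    ready = pending.select { |t| !t.native || t.deps.all? { |d| installed.include?(d) } }
    raise "dependency cycle among native gems" if ready.empty?
    ready.each do |t|
      installed << t.name
      order << t.name
    end
    pending -= ready
  end
  order
end
```

In this sketch a pure Ruby gem is scheduled in the first round even when its dependencies are not yet installed, which mirrors the commit's reasoning that such gems run no code at install time.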
Use a hierarchical file copy strategy inspired by uv's linker:
1. clonefile (macOS APFS copy-on-write) via cp -cR - nearly instant
2. hardlink tree - shares inodes, no data copied
3. regular copy - fallback for cross-device or unsupported filesystems

Reduce unnecessary filesystem operations:
- Consolidate triple stat in strict_rm_rf to single lstat call
- Read compact index info files once (was reading twice: once for MD5 checksum, once for data)
- Skip mkdir_p for compact index cache dirs that already exist
- Reorder cached! to check flag before File.exist?

Add optional IO tracing via BUNDLER_IO_TRACE=1 environment variable for profiling filesystem operations during bundle install.
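A minimal sketch of the fallback chain for a single file, assuming only Ruby's standard library. `place_file` is a hypothetical helper, not part of Bundler, and the clonefile tier is left out because it is macOS-specific; only the hardlink and regular-copy tiers are shown.

```ruby
require "fileutils"

# Hypothetical helper: place src at dst using the cheapest strategy that
# works, and report which strategy succeeded.
def place_file(src, dst)
  FileUtils.mkdir_p(File.dirname(dst))
  File.link(src, dst) # hardlink: shares the inode, no data copied
  :hardlink
rescue SystemCallError, NotImplementedError
  # e.g. EXDEV (cross-device) or a filesystem without hardlink support
  FileUtils.cp(src, dst)
  :copy
end
```

Returning the strategy makes it easy to log how often the fast path is taken, which is the kind of information the IO tracing mentioned above would surface.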
Inspired by uv's version.rs, pack gem versions into 64-bit integers: [16-bit major][16-bit minor][16-bit patch][16-bit extra]. Integer comparison is O(1) with zero allocations, replacing Gem::Version#<=> which splits strings and allocates arrays on every comparison. ~90% of real-world versions (those with <= 4 numeric segments, each <= 65535, no prerelease tags) use the fast integer path. Prerelease and unusual versions transparently fall back to Gem::Version. The resolver performs thousands of version comparisons during dependency resolution, so this reduces both CPU time and GC pressure.
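The packing scheme can be sketched as below. `pack_version` and `compare_versions` are hypothetical names chosen for illustration and are not the PR's `CompactVersion` API; the layout follows the [16-bit major][16-bit minor][16-bit patch][16-bit extra] description above.

```ruby
require "rubygems"

# Hypothetical helper: pack up to four numeric segments into one Integer,
# or return nil to signal that the slow path is needed.
def pack_version(string)
  return nil unless string.match?(/\A\d+(\.\d+){0,3}\z/) # rejects prereleases
  segments = string.split(".").map(&:to_i)
  return nil if segments.any? { |s| s > 0xFFFF } # segment does not fit in 16 bits
  segments << 0 while segments.size < 4 # pad "1.2" to [1, 2, 0, 0]
  segments.reduce(0) { |acc, s| (acc << 16) | s }
end

def compare_versions(a, b)
  pa = pack_version(a)
  pb = pack_version(b)
  return pa <=> pb if pa && pb
  Gem::Version.new(a) <=> Gem::Version.new(b) # fallback for unusual versions
end
```

This sketch still parses strings on every call, so it shows the layout rather than the speedup; the gain would come from packing once per version and comparing the cached integers. On 64-bit CRuby, a major segment of 16384 or more produces a value too large for an immediate Integer, so it is heap-allocated, though comparisons stay correct.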
Before starting PubGrub resolution, eagerly populate the spec cache for all known dependency names from both Gemfile requirements and lockfile transitive dependencies. This triggers the compact index's parallel fetching to batch network requests upfront rather than fetching specs one-by-one as the resolver discovers them. Also add a gem info cache to CompactIndex that memoizes fetched gem info by name, preventing redundant network requests when the resolver retries or multiple sources overlap. Both patterns are inspired by uv's batch_prefetch.rs and OnceMap.
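One way to picture the memoizing cache plus prefetch, as a sketch under stated assumptions: `OnceCache` and `Slot` are hypothetical classes, the loader block stands in for a network fetch, and one thread per name is used only to keep the example short.

```ruby
# Hypothetical per-key slot: the block runs at most once, even when several
# threads ask for the same key at the same time.
class Slot
  def initialize
    @mutex = Mutex.new
    @done = false
  end

  def value
    @mutex.synchronize do
      unless @done
        @value = yield
        @done = true
      end
      @value
    end
  end
end

# Hypothetical memoizing cache keyed by gem name.
class OnceCache
  def initialize(&loader)
    @loader = loader
    @slots = {}
    @mutex = Mutex.new
  end

  def fetch(name)
    slot = @mutex.synchronize { @slots[name] ||= Slot.new }
    slot.value { @loader.call(name) } # different names load concurrently
  end

  # Warm the cache for every known name before resolution starts.
  def prefetch(names)
    names.uniq.map { |n| Thread.new { fetch(n) } }.each(&:join)
  end
end
```

The outer mutex guards only the slot table, so a slow fetch for one name does not block lookups for another.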
Optimize data structures in frequently-called code paths:
- LockfileParser: replace send() dispatch with case statement, add @specs_by_name hash for O(1) dependency-to-spec lookup
- CompactIndexClient::Parser: reduce allocations in versions parsing by using index()+slice instead of split()
- Index#empty?: direct @specs.empty? instead of Enumerable iteration
- SpecSet: O(1) name checks in validate_deps, O(1) find_by_name_and_platform via lookup hash, cached reverse dependency map in what_required, O(1) rake lookup in sorted
- LazySpecification: inline lock_name to avoid NameTuple allocation
- LockfileGenerator: hash-based dedup instead of array scan
- Gem::NameTuple: cache full_name, lock_name, and hash values
- Gem::Installer: skip write_cache_file when cache already exists
- Gem::Specification: memoize runtime_dependencies (called thousands of times during resolution, each creating a new filtered array)
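The lookup-hash pattern that several of these items share might look like the sketch below. `LockedSpec` and `SpecIndex` are hypothetical names used for illustration; only the method name `find_by_name_and_platform` is taken from the commit message.

```ruby
# Hypothetical spec record, not Bundler's LazySpecification.
LockedSpec = Struct.new(:name, :platform)

class SpecIndex
  def initialize(specs)
    @by_name = specs.group_by(&:name) # built once, O(n)
  end

  # A hash lookup narrows the candidates to one name before the platform
  # scan, instead of scanning the whole spec list on every call.
  def find_by_name_and_platform(name, platform)
    @by_name.fetch(name, []).find { |s| s.platform == platform }
  end
end
```

The trade-off is that the index must be rebuilt or updated whenever the underlying spec list changes.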
The pool was hardcoded at 5 but the download phase uses 8+ workers, creating a bottleneck. Now scales to max(jobs, 8). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Opt-in setting (BUNDLE_IGNORE_RUBY_UPPER_BOUNDS=true) that filters out upper-bound Ruby version requirements from gem metadata. Useful when gems haven't updated their metadata for newer Ruby versions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
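As a sketch of what filtering upper bounds could mean in terms of `Gem::Requirement`: `without_upper_bounds` is a hypothetical helper, not the PR's implementation. It drops only `<` and `<=` clauses and leaves `~>` untouched, even though `~>` also implies an upper bound.

```ruby
require "rubygems"

# Hypothetical helper: drop "<" and "<=" clauses from a version requirement.
def without_upper_bounds(requirement)
  kept = requirement.requirements.reject { |op, _| op == "<" || op == "<=" }
  return Gem::Requirement.default if kept.empty? # ">= 0" when nothing remains
  Gem::Requirement.new(kept.map { |op, version| "#{op} #{version}" })
end
```

For example, a gem declaring `>= 2.7, < 3.4` for its required Ruby version would be treated as `>= 2.7` under this sketch.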
Add CompactVersion.compare() and versions_equal?() class methods for direct Gem::Version comparison using 64-bit packed integers. Apply in resolver sort, group_by, and GemVersionPromoter filter_versions to avoid expensive Gem::Version#<=> in hot paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Blake2b256 is ~3x faster than MD5 for hashing. Use it for local-only operations (cache path generation, etag paths) via fast_hexdigest. Falls back to MD5 when OpenSSL doesn't support Blake2. Protocol-level compact index checksums still use MD5 as required by the server. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
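The fallback behaviour can be sketched as follows. This `fast_hexdigest` is a hypothetical stand-in for the helper named in the commit, and the digest name passed to OpenSSL is an assumption: whether a given name is accepted depends on the OpenSSL build, so the sketch falls back to MD5 on any failure.

```ruby
require "digest"
require "openssl"

# Hypothetical stand-in for the fast_hexdigest helper.
# The default digest name is an assumption, not taken from the PR's code.
def fast_hexdigest(data, preferred: "BLAKE2b256")
  OpenSSL::Digest.new(preferred).hexdigest(data)
rescue StandardError
  Digest::MD5.hexdigest(data) # fallback when the preferred digest is unavailable
end
```

Because the output differs between the two algorithms, a cache path derived this way changes if the available digest changes, which is acceptable only for local data that can be regenerated.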
Cache parsed compact index info arrays in Marshal format at info-binary/<name>.bin. On subsequent runs, load binary cache if the compact index checksum matches, skipping text parsing (string splitting, object allocation). Non-fatal on cache read/write failures. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
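A minimal sketch of a checksum-validated Marshal cache, assuming the checksum is stored alongside the parsed data in the same file. The helper names and file layout are hypothetical, not the PR's.

```ruby
require "fileutils"

# Hypothetical cache writer: failures are swallowed because the cache is
# only an optimization.
def write_binary_cache(path, checksum, parsed)
  FileUtils.mkdir_p(File.dirname(path))
  File.binwrite(path, Marshal.dump([checksum, parsed]))
rescue SystemCallError
  nil
end

# Hypothetical cache reader: returns the parsed data only when the stored
# checksum matches, and nil otherwise so the caller re-parses the text file.
def read_binary_cache(path, checksum)
  stored_checksum, parsed = Marshal.load(File.binread(path))
  stored_checksum == checksum ? parsed : nil
rescue StandardError
  nil # missing, truncated, or incompatible cache file
end
```

`Marshal.load` should only be used on data the process wrote itself, as here; it is not safe for untrusted input.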
Split installation into download -> install phases with inline extract+finalize per gem. No batch barrier between extraction and installation: each gem extracts and finalizes in one step.

After downloading, a quick scan reads .gem metadata to detect native extensions WITHOUT extracting. This lets the install phase prioritize native ext gems first so compilation starts ASAP and overlaps with pure Ruby gem installation.

Key changes:
- extract_to_temp/finalize_with[out]_extensions in RubyGemsGemInstaller
- extract_gem/finalize_gem in Source::Rubygems
- Streaming install in ParallelInstaller: native ext gems enqueued first
- scan_native_extensions: peek at .gem metadata to detect extensions early
- uv-inspired ProgressReporter with spinner, aligned counts, slow item display
- Git sources participate in parallel download phase
- Worker threads silenced to prevent UI corruption (thread-local fix)
- Native extension detection from real spec after extraction (not LazySpec)
- Global extension cache in XDG_CACHE_HOME keyed by Ruby ABI
- Global gem cache path aligned with rubygems#7249 convention
- Pre-filter installed_specs by lockfile gem names for targeted lookup
- Incremental Gem::Specification.add_spec instead of double reset

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
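The "native ext gems enqueued first" ordering amounts to a stable partition, sketched here with hypothetical names. `InstallSpec` is an illustrative struct whose `extensions` field mirrors the gemspec attribute of the same name.

```ruby
# Hypothetical spec record for illustration.
InstallSpec = Struct.new(:name, :extensions)

# Put gems that need compilation at the front of the work queue so their
# builds overlap with the cheaper pure Ruby installs.
def enqueue_order(specs)
  native, pure = specs.partition { |s| !s.extensions.empty? }
  native + pure # original order is preserved within each group
end
```

This only decides queue order; any dependency waiting for native gems would still have to be enforced by the installer.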
- Remove Bundler.ui.silence wrapper (per-worker silence suffices)
- Remove stock "Installing X" / "Fetching X" / "Using X" UI messages from source/rubygems.rb; progress reporter handles all display
- Remove scan_native_extensions pass that opened every .gem an extra time; native extensions detected inline during install from real spec
- Single pass: download → install (no intermediate scan)
- Fix progress reporter: guard write_header when no phase active, flush after finish_phase to keep summaries in scrollback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Could you split this large PR into smaller PRs?
This is not reviewable.
Key changes:
- Global extracted gem cache (~/.cache/gem/extracted/) avoids re-extracting .gem files across projects/installs. Cache hit loads marshaled spec + hardlinks files into GEM_HOME in ~0.01s.
- Single-pass .gem extraction reads tar once, piping data.tar.gz to native `tar` for zero-Ruby-allocation decompression.
- Hardlink-first file placement (pure Ruby, no subprocess) with clonefile and cp_r fallbacks.
- Hardlink .gem files into GEM_HOME/cache/ instead of copying.
- Shallow git clones (--depth 1) for git source gems.
- Hide cursor during parallel download/install phases.
- Show current gem name in progress reporter.
- Reset Gem::Specification only when actually compiling extensions.
Benchmark update: global extracted gem cache
Added a global extracted gem cache (
Benchmark results
Each iteration runs a paired cold→warm sequence: nuke all caches →
Geometric mean speedup vs stock cold:
What this means
The "rails" scenario uses the default
Hey, thanks for sending this through, @tobi. Some of these optimisations overlap with things we've already got in-flight or on our radar, which is encouraging. As Kou mentioned though, the scope makes it pretty tough to review as one PR. We're going to dig through these and try to pull out the most impactful changes and look to land them incrementally so we can validate properly and make sure we're not breaking compatibility. Thanks!
Summary
This branch implements ~35 performance optimizations to Bundler's installation pipeline, inspired by tenderlove's blog post and by studying uv's actual source code architecture. The result is a Bundler that is 1.5-2.5x faster across different workloads.
Benchmark Results (Sequential, 3 iterations each)
Cold Cache (fresh install, no gems cached)
Warm Cache (gems already downloaded, re-install)
Complete Optimization Table
Wave 1: Core Architecture (27 optimizations)
- Files: `parallel_installer.rb`, `source/rubygems.rb`
- Files: `parallel_installer.rb`
- Files: `rubygems_gem_installer.rb`
- `major.minor.patch.extra` into one u64 for O(1) comparison. Files: `compact_version.rb` (new), `resolver/candidate.rb`
- `$XDG_CACHE_HOME/gem/gems/` across Ruby versions. Files: `source/rubygems.rb`
- Files: `resolver.rb`
- Files: `parallel_installer.rb`
- Files: `installer.rb`, `definition.rb`
- Files: `fetcher/compact_index.rb`
- `case` instead of `send(@parse_method)`. Files: `lockfile_parser.rb`
- `@specs_by_name` hash instead of `find`. Files: `lockfile_parser.rb`
- `lockfile_exists?`: avoid repeated `File.exist?` calls. Files: `definition.rb`
- `converge_specs`: `deps_by_name` + `@gems_to_unlock` as hash. Files: `definition.rb`
- `index()` + slice instead of `split(" ", 3)`. Files: `compact_index_client/parser.rb`
- `Index#empty?` fast path: direct `@specs.empty?` instead of iteration. Files: `index.rb`
- `validate_deps`. Files: `spec_set.rb`
- `lookup[name]` narrows candidates. Files: `spec_set.rb`
- `@reverse_deps` hash in `what_required`. Files: `spec_set.rb`
- `@specs.find` in `sorted`. Files: `spec_set.rb`
- Files: `spec_set.rb`
- `lock_name`: avoid `Gem::NameTuple` allocation. Files: `lazy_specification.rb`
- `{}` instead of `[]` + `include?`. Files: `lockfile_generator.rb`
- `full_name`, `lock_name`, `hash`. Files: `rubygems/name_tuple.rb`
- `write_cache_file`. Files: `rubygems/installer.rb`
- `runtime_dependencies`: memoize instead of filtering on every call. Files: `rubygems/specification.rb`

Wave 2: Advanced Optimizations (10 optimizations)
- Files: `fetcher/gem_remote_fetcher.rb`
- `ignore_ruby_upper_bounds` setting: opt-in filter for `<` and `<=` Ruby constraints. Files: `match_metadata.rb`, `settings.rb`
- `compare()` and `versions_equal?()` class methods. Files: `compact_version.rb`, `resolver.rb`, `gem_version_promoter.rb`
- Files: `shared_helpers.rb`, `compact_index_client/cache.rb`
- Files: `compact_index_client/parser.rb`
- Files: `parallel_installer.rb`, `gem_installer.rb`, `rubygems_gem_installer.rb`, `source/rubygems.rb`
- Files: `parallel_installer.rb`
- `$XDG_CACHE_HOME/gem/extensions/` keyed by Ruby ABI. Files: `source/rubygems.rb`
- `stubs_for(name)` instead of full glob. Files: `rubygems_integration.rb`, `source/rubygems.rb`
- Files: `installer/progress_reporter.rb` (new), `parallel_installer.rb`

Additional Fixes
- `install_needed?` was private but called externally. Files: `definition.rb`
- `caches` array grew unbounded with duplicate entries. Files: `source/rubygems.rb`
- Files: `compact_index_client/cache.rb`
- Files: `parallel_installer.rb` (thread-local silence)
- `LazySpecification` lacks `extensions`: native ext detection failed. Files: `parallel_installer.rb` (detect from real spec)
- Files: `source/git.rb`
- `Gem::Specification.add_spec` instead of double reset. Files: `rubygems_gem_installer.rb`

Architecture
Test plan
- `bundle install` cold cache on synthetic workloads (small, chain, wide, medium)
- `bundle install` warm cache on synthetic workloads
- `bundle exec` works after install
- `ignore_ruby_upper_bounds` setting works when enabled

🤖 Generated with Claude Code