@tobi tobi commented Feb 6, 2026

Summary

This branch implements 37 performance optimizations to Bundler's installation pipeline, inspired by tenderlove's blog post and by studying uv's source code. In the benchmarks below, the patched Bundler is 1.14x to 2.47x faster than stock, with every scenario except the cold-cache chain workload at 1.5x or better.

Benchmark Results (Sequential, 3 iterations each)

Cold Cache (fresh install, no gems cached)

| Scenario | Stock (median) | Patched (median) | Speedup |
|---|---|---|---|
| small (10 gems) | 12.56s | 6.64s | 1.89x |
| chain (35 gems, deep deps) | 35.14s | 30.72s | 1.14x |
| wide (70 independent gems) | 17.20s | 7.61s | 2.26x |
| medium (50 gems, mixed) | 55.66s | 36.29s | 1.53x |

Warm Cache (gems already downloaded, re-install)

| Scenario | Stock (median) | Patched (median) | Speedup |
|---|---|---|---|
| small (10 gems) | 11.93s | 5.82s | 2.05x |
| chain (35 gems, deep deps) | 38.16s | 22.99s | 1.66x |
| wide (70 independent gems) | 18.27s | 7.39s | 2.47x |
| medium (50 gems, mixed) | 56.38s | 25.89s | 2.18x |

Complete Optimization Table

Wave 1: Core Architecture (27 optimizations)

| # | Optimization | Files | Impact |
|---|---|---|---|
| 1 | Decouple download from install: two-phase pipeline that downloads ALL gems in parallel, then installs | parallel_installer.rb, source/rubygems.rb | High: removes dependency blocking from downloads |
| 2 | Relaxed dependency ordering: pure Ruby gems install immediately without waiting for deps | parallel_installer.rb | High: only native ext gems wait for deps |
| 3 | macOS clonefile/hardlink install: hierarchical fallback, clonefile → hardlink → copy | rubygems_gem_installer.rb | Medium: near-instant file copies on APFS |
| 4 | CompactVersion 64-bit packed integers: pack major.minor.patch.extra into one u64 for O(1) comparison | compact_version.rb (new), resolver/candidate.rb | Medium: thousands of comparisons during resolution |
| 5 | Global gem cache: shared cache at $XDG_CACHE_HOME/gem/gems/ across Ruby versions | source/rubygems.rb | Medium: avoids re-downloading when switching Ruby |
| 6 | Batch prefetch in resolver: eagerly populate spec cache for all known dep names before PubGrub | resolver.rb | Medium: triggers compact index parallel fetching |
| 7 | Smart download ordering: native ext gems (larger) downloaded first to reduce tail latency | parallel_installer.rb | Low-Medium: hides large download latency |
| 8 | Early satisfaction check: skip entire pipeline when nothing changed | installer.rb, definition.rb | High: instant no-op when already satisfied |
| 9 | Gem info fetch dedup: memoize compact index fetches by name | fetcher/compact_index.rb | Low: prevents redundant network requests |
| 10 | Lockfile parser direct dispatch: case instead of send(@parse_method) | lockfile_parser.rb | Low |
| 11 | Lockfile parser O(1) spec lookup: @specs_by_name hash instead of find | lockfile_parser.rb | Low |
| 12 | Cache lockfile_exists?: avoid repeated File.exist? calls | definition.rb | Low |
| 13 | O(1) dep lookup in converge_specs: deps_by_name + @gems_to_unlock as hash | definition.rb | Low-Medium |
| 14 | Compact index parser allocation reduction: index() + slice instead of split(" ", 3) | compact_index_client/parser.rb | Low: reduces GC pressure |
| 15 | Index#empty? fast path: direct @specs.empty? instead of iteration | index.rb | Low |
| 16 | SpecSet O(1) name check: hash lookup in validate_deps | spec_set.rb | Low |
| 17 | SpecSet O(1) find by name: lookup[name] narrows candidates | spec_set.rb | Low |
| 18 | Cached reverse deps: @reverse_deps hash in what_required | spec_set.rb | Low |
| 19 | O(1) rake lookup: hash instead of @specs.find in sorted | spec_set.rb | Low |
| 20 | SpecSet cache invalidation: clear caches on add/delete | spec_set.rb | Correctness |
| 21 | Inlined lock_name: avoid Gem::NameTuple allocation | lazy_specification.rb | Low |
| 22 | Hash-based dedup in lockfile gen: {} instead of [] + include? | lockfile_generator.rb | Low |
| 23-25 | NameTuple caching: cache full_name, lock_name, hash | rubygems/name_tuple.rb | Low |
| 26 | Skip redundant cache writes: check existence before write_cache_file | rubygems/installer.rb | Low |
| 27 | Cache runtime_dependencies: memoize instead of filtering on every call | rubygems/specification.rb | Low-Medium |

Wave 2: Advanced Optimizations (10 optimizations)

| # | Optimization | Files | Impact |
|---|---|---|---|
| 28 | Dynamic HTTP pool size: match worker count (min 8) instead of hardcoded 5 | fetcher/gem_remote_fetcher.rb | Low: removes connection bottleneck |
| 29 | ignore_ruby_upper_bounds setting: opt-in filter for < and <= Ruby constraints | match_metadata.rb, settings.rb | Medium: reduces resolver backtracking |
| 30 | CompactVersion in all hot paths: compare() and versions_equal?() class methods | compact_version.rb, resolver.rb, gem_version_promoter.rb | Medium: fast path for resolution + promotion |
| 31 | Blake2b256 fast_digest: ~3x faster than MD5 for local cache path hashing | shared_helpers.rb, compact_index_client/cache.rb | Low: faster local hashing, MD5 fallback |
| 32 | Binary spec cache (Marshal): cache parsed compact index data as binary | compact_index_client/parser.rb | Medium: skip text parsing on warm cache |
| 33 | Streaming install pipeline: inline extract+finalize per gem, no batch barrier | parallel_installer.rb, gem_installer.rb, rubygems_gem_installer.rb, source/rubygems.rb | High: removes phase barrier that serialized work |
| 34 | Native ext priority in install queue: scan .gem metadata, enqueue native ext gems first | parallel_installer.rb | Medium: compilation starts ASAP |
| 35 | Global extension cache: $XDG_CACHE_HOME/gem/extensions/ keyed by Ruby ABI | source/rubygems.rb | Medium: avoids recompiling extensions |
| 36 | Pre-filter installed_specs: use stubs_for(name) instead of full glob | rubygems_integration.rb, source/rubygems.rb | Low-Medium: targeted glob vs scan-all |
| 37 | uv-inspired progress reporter: live spinner with phase summaries and slow-item tracking | installer/progress_reporter.rb (new), parallel_installer.rb | UX: clean, informative output |

Additional Fixes

| Fix | Files |
|---|---|
| install_needed? was private but called externally | definition.rb |
| caches array grew unbounded with duplicate entries | source/rubygems.rb |
| Compact index info files read twice (MD5 + data) | compact_index_client/cache.rb |
| Worker thread UI messages corrupted progress display | parallel_installer.rb (thread-local silence) |
| LazySpecification lacks extensions, so native ext detection failed | parallel_installer.rb (detect from real spec) |
| Git source checkout during download phase | source/git.rb |
| Incremental Gem::Specification.add_spec instead of double reset | rubygems_gem_installer.rb |

Architecture

STOCK BUNDLER:
  Resolver → for each spec (in dependency order):
    download gem → install gem → next spec

PATCHED BUNDLER:
  [Fast pre-check: already satisfied? → skip everything]
  Resolver (batch prefetch + CompactVersion integers)
    → Phase 1: Download ALL gems in parallel (native ext first)
    → Scan: peek at .gem metadata to detect native extensions
    → Phase 2: Install gems (streaming extract+finalize)
       - Native ext gems enqueued first → compilation starts ASAP
       - Pure Ruby: extract+finalize immediately (no dep wait)
       - Native ext: wait for deps → extract+finalize+compile
       - clonefile/hardlink for file operations
       - Global cache at XDG_CACHE_HOME
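
The install-phase rules in the diagram above can be sketched in a few lines of Ruby. This is an illustration only: `GemSpec`, `install_queue`, and `ready_to_install?` are hypothetical names, not Bundler's classes or methods, and the gem list is sample data.

```ruby
# Hypothetical spec record for illustration; not Bundler's real classes.
GemSpec = Struct.new(:name, :deps, :native_ext, keyword_init: true)

# Native extension gems are enqueued first so compilation starts early.
def install_queue(specs)
  native, pure = specs.partition(&:native_ext)
  native + pure
end

# Pure Ruby gems never wait; native extension gems wait until every
# dependency has been installed.
def ready_to_install?(spec, installed_names)
  return true unless spec.native_ext
  spec.deps.all? { |dep| installed_names.include?(dep) }
end

specs = [
  GemSpec.new(name: "rack", deps: [], native_ext: false),
  GemSpec.new(name: "nokogiri", deps: ["racc"], native_ext: true),
  GemSpec.new(name: "racc", deps: [], native_ext: true),
]
queue = install_queue(specs)
```

Here `queue` lists the two native extension gems ahead of `rack`, and `nokogiri` only becomes ready once `racc` appears in the installed set.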

Test plan

  • Verify bundle install cold cache on synthetic workloads (small, chain, wide, medium)
  • Verify bundle install warm cache on synthetic workloads
  • Verify native extension compilation works (nokogiri, puma, bootsnap)
  • Verify git source gems install correctly
  • Verify bundle exec works after install
  • Verify ignore_ruby_upper_bounds setting works when enabled
  • Test on large real-world Rails app (382 gems)
  • Run existing bundler test suite

🤖 Generated with Claude Code

tobi and others added 12 commits February 6, 2026 16:05
Split the ParallelInstaller into two distinct phases:

Phase 1 (Download): Download ALL gems in parallel with a dedicated
worker pool. Since .gem files are just archives, no dependency ordering
is needed - all downloads can happen concurrently.

Phase 2 (Install): Install gems with dependency-aware ordering, but
with all gems already cached locally. Pure Ruby gems (no native
extensions) are installed immediately without waiting for dependencies,
since they don't execute code during installation. Only gems with
native extensions wait for their dependencies.
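
A minimal sketch of the download phase described above, assuming a plain thread pool draining a queue. `download_all` and the `fetch` block are hypothetical stand-ins for the real network code, and the returned paths are made up for the example.

```ruby
# A fixed pool of worker threads drains a queue of gem names. No dependency
# ordering is involved because downloads are independent of each other.
def download_all(gem_names, workers: 8, &fetch)
  queue = Queue.new
  gem_names.each { |name| queue << name }
  queue.close # pop returns nil once the closed queue is empty

  results = {}
  lock = Mutex.new
  threads = Array.new(workers) do
    Thread.new do
      while (name = queue.pop)
        path = fetch.call(name)
        lock.synchronize { results[name] = path }
      end
    end
  end
  threads.each(&:join)
  results
end

# Simulated download: the block just returns where the file would land.
downloaded = download_all(%w[rack rake json]) { |name| "/tmp/cache/#{name}.gem" }
```

Only after `download_all` returns would the install phase start, so every install step can assume its `.gem` file is already on disk.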

Additional changes in this commit:
- Add Source::Rubygems#download() as standalone download method
- Add has_native_extensions?() detection for install ordering
- Add global gem cache at $XDG_CACHE_HOME/bundler/gems/ with
  hardlink-to-local-cache strategy for cross-Ruby-version sharing
- Add early satisfaction check: skip entire pipeline when nothing
  changed (inspired by uv's SatisfiesResult::Fresh)
- Make Definition#install_needed? public for the early check
- Cache lockfile_exists? and use O(1) hash lookups in converge_specs
- Memoize cached_gem and installed? to avoid redundant stat calls
- Guard against unbounded growth of caches array
- Short-circuit lockfile write when nothing changed

Use a hierarchical file copy strategy inspired by uv's linker:
1. clonefile (macOS APFS copy-on-write) via cp -cR - nearly instant
2. hardlink tree - shares inodes, no data copied
3. regular copy - fallback for cross-device or unsupported filesystems
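
The three-step fallback might look roughly like this for a single file. `place_file` is a hypothetical helper, not the code in rubygems_gem_installer.rb; it assumes macOS `cp -c` for the clone step and skips that step on other platforms.

```ruby
require "fileutils"
require "tmpdir"

# Place one file at `dest`, trying the cheapest strategy first.
# Returns the name of the strategy that succeeded.
def place_file(src, dest)
  FileUtils.mkdir_p(File.dirname(dest))
  # 1. clonefile: macOS `cp -c` clones the file copy-on-write on APFS.
  if RUBY_PLATFORM.include?("darwin") && system("cp", "-c", src, dest, err: File::NULL)
    return :clonefile
  end
  # 2. hardlink: shares the inode, no data copied.
  File.link(src, dest)
  :hardlink
rescue SystemCallError
  # 3. regular copy: cross-device links or filesystems without hardlinks.
  FileUtils.cp(src, dest)
  :copy
end

Dir.mktmpdir do |dir|
  src = File.join(dir, "lib.rb")
  File.write(src, "puts :hello\n")
  place_file(src, File.join(dir, "installed", "lib.rb"))
end
```

Ordering the strategies from cheapest to most expensive means the slow copy only runs when the filesystem rules out the other two.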

Reduce unnecessary filesystem operations:
- Consolidate triple stat in strict_rm_rf to single lstat call
- Read compact index info files once (was reading twice: once for
  MD5 checksum, once for data)
- Skip mkdir_p for compact index cache dirs that already exist
- Reorder cached! to check flag before File.exist?

Add optional IO tracing via BUNDLER_IO_TRACE=1 environment variable
for profiling filesystem operations during bundle install.

Inspired by uv's version.rs, pack gem versions into 64-bit integers:
[16-bit major][16-bit minor][16-bit patch][16-bit extra]. Integer
comparison is O(1) with zero allocations, replacing Gem::Version#<=>
which splits strings and allocates arrays on every comparison.

~90% of real-world versions (those with <= 4 numeric segments, each
<= 65535, no prerelease tags) use the fast integer path. Prerelease
and unusual versions transparently fall back to Gem::Version.
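
As a rough illustration of the packing scheme, the sketch below follows the layout given above and falls back to `Gem::Version` when a version does not fit. `pack_version` and `compare_versions` are hypothetical names; the real `CompactVersion` in this branch may be structured differently.

```ruby
require "rubygems" # Gem::Version

MAX_SEGMENT = 0xFFFF

# Pack up to four numeric segments into one Integer laid out as
# [16-bit major][16-bit minor][16-bit patch][16-bit extra].
# Returns nil when the version cannot use the fast path.
def pack_version(version)
  v = Gem::Version.new(version)
  return nil if v.prerelease?
  segments = v.segments
  return nil if segments.size > 4 || segments.any? { |s| s > MAX_SEGMENT }
  segments += [0] * (4 - segments.size) # pad missing segments with zero
  segments.reduce(0) { |acc, s| (acc << 16) | s }
end

# Integer comparison when both sides pack, Gem::Version#<=> otherwise.
def compare_versions(a, b)
  pa = pack_version(a)
  pb = pack_version(b)
  return pa <=> pb if pa && pb
  Gem::Version.new(a) <=> Gem::Version.new(b)
end
```

Because missing segments are padded with zeros, "1.2" and "1.2.0" pack to the same integer, which matches how `Gem::Version` treats them as equal. In practice the packed value would be computed once per version and cached, so the hot path is a single integer comparison.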

The resolver performs thousands of version comparisons during
dependency resolution, so this reduces both CPU time and GC pressure.

Before starting PubGrub resolution, eagerly populate the spec cache
for all known dependency names from both Gemfile requirements and
lockfile transitive dependencies. This triggers the compact index's
parallel fetching to batch network requests upfront rather than
fetching specs one-by-one as the resolver discovers them.

Also add a gem info cache to CompactIndex that memoizes fetched
gem info by name, preventing redundant network requests when the
resolver retries or multiple sources overlap.
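
A fetch-once cache of this kind can be sketched as follows. `GemInfoCache` is a hypothetical class for illustration, and it is deliberately simpler than a real once-map: it holds one lock for the duration of the fetch, so fetches for different names are serialized.

```ruby
# The first caller for a name runs the block; later callers for the same
# name get the stored result without running it again.
class GemInfoCache
  def initialize
    @lock = Mutex.new
    @entries = {}
  end

  def fetch(name)
    @lock.synchronize do
      return @entries[name] if @entries.key?(name)
      @entries[name] = yield(name)
    end
  end
end

calls = 0
cache = GemInfoCache.new
2.times do
  cache.fetch("rack") do |name|
    calls += 1 # stands in for one network request
    [:info_for, name]
  end
end
```

After both calls, `calls` is 1: the second lookup was served from memory.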

Both patterns are inspired by uv's batch_prefetch.rs and OnceMap.

Optimize data structures in frequently-called code paths:

- LockfileParser: replace send() dispatch with case statement, add
  @specs_by_name hash for O(1) dependency-to-spec lookup
- CompactIndexClient::Parser: reduce allocations in versions parsing
  by using index()+slice instead of split()
- Index#empty?: direct @specs.empty? instead of Enumerable iteration
- SpecSet: O(1) name checks in validate_deps, O(1)
  find_by_name_and_platform via lookup hash, cached reverse dependency
  map in what_required, O(1) rake lookup in sorted
- LazySpecification: inline lock_name to avoid NameTuple allocation
- LockfileGenerator: hash-based dedup instead of array scan
- Gem::NameTuple: cache full_name, lock_name, and hash values
- Gem::Installer: skip write_cache_file when cache already exists
- Gem::Specification: memoize runtime_dependencies (called thousands
  of times during resolution, each creating a new filtered array)
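
Most items in this list follow the same pattern: replace a repeated linear scan with an index built once. A small sketch with made-up data (`LockedSpec` and the sample names are hypothetical):

```ruby
LockedSpec = Struct.new(:name, :version)

specs = [
  LockedSpec.new("rack", "3.0.0"),
  LockedSpec.new("rake", "13.0.6"),
  LockedSpec.new("json", "2.7.1"),
]

# Before: a linear scan on every lookup, O(n) each time.
found_slow = specs.find { |s| s.name == "rake" }

# After: build the index once, then each lookup is a hash access.
specs_by_name = specs.to_h { |s| [s.name, s] }
found_fast = specs_by_name["rake"]

# Hash-based dedup: a Hash membership test replaces Array#include?.
seen = {}
unique = %w[rack rake rack json rake].select do |name|
  next false if seen[name]
  seen[name] = true
end
```

The index costs one pass to build, so it pays off whenever the same collection is searched more than a handful of times.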
The pool was hardcoded at 5 but the download phase uses 8+ workers,
creating a bottleneck. Now scales to max(jobs, 8).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Opt-in setting (BUNDLE_IGNORE_RUBY_UPPER_BOUNDS=true) that filters out
upper-bound Ruby version requirements from gem metadata. Useful when
gems haven't updated their metadata for newer Ruby versions.
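
To make the effect concrete, here is a sketch of stripping upper bounds from a `Gem::Requirement`. `without_upper_bounds` is a hypothetical helper, not the code in match_metadata.rb, and it only touches `<` and `<=`, so a pessimistic `~>` constraint is left as-is.

```ruby
require "rubygems" # Gem::Requirement, Gem::Version

UPPER_BOUND_OPS = ["<", "<="].freeze

# Drop "<" and "<=" clauses from a requirement and keep the rest.
def without_upper_bounds(requirement)
  kept = requirement.requirements.reject { |op, _| UPPER_BOUND_OPS.include?(op) }
  return Gem::Requirement.default if kept.empty?
  Gem::Requirement.new(*kept.map { |op, version| "#{op} #{version}" })
end

# Example metadata: a gem that declares support for Ruby >= 2.7 and < 3.4.
declared = Gem::Requirement.new(">= 2.7", "< 3.4")
relaxed = without_upper_bounds(declared)
newer_ruby = Gem::Version.new("3.4.1")

declared.satisfied_by?(newer_ruby) # false: blocked by "< 3.4"
relaxed.satisfied_by?(newer_ruby)  # true: only ">= 2.7" remains
```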

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add CompactVersion.compare() and versions_equal?() class methods for
direct Gem::Version comparison using 64-bit packed integers. Apply in
resolver sort, group_by, and GemVersionPromoter filter_versions to
avoid expensive Gem::Version#<=> in hot paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Blake2b256 is ~3x faster than MD5 for hashing. Use it for local-only
operations (cache path generation, etag paths) via fast_hexdigest.
Falls back to MD5 when OpenSSL doesn't support Blake2. Protocol-level
compact index checksums still use MD5 as required by the server.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cache parsed compact index info arrays in Marshal format at
info-binary/<name>.bin. On subsequent runs, load binary cache if the
compact index checksum matches, skipping text parsing (string splitting,
object allocation). Non-fatal on cache read/write failures.
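
The checksum-guarded cache could be sketched like this. `load_parsed`, the hash layout inside the file, and the sample data are assumptions for illustration; only the general idea (Marshal file, checksum match, non-fatal failures) comes from the description above.

```ruby
require "tmpdir"

# Load parsed data from a binary cache when the checksum matches; otherwise
# run the block to parse, then rewrite the cache. Failures are non-fatal.
def load_parsed(cache_path, checksum)
  if File.exist?(cache_path)
    begin
      cached = Marshal.load(File.binread(cache_path))
      return cached[:data] if cached.is_a?(Hash) && cached[:checksum] == checksum
    rescue StandardError
      # Corrupt or unreadable cache: fall through and re-parse.
    end
  end
  data = yield
  begin
    File.binwrite(cache_path, Marshal.dump({ checksum: checksum, data: data }))
  rescue SystemCallError
    # Cache writes are best-effort.
  end
  data
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "example.bin")
  first = load_parsed(path, "abc123") { [["1.0.0", []]] }  # parses and writes
  second = load_parsed(path, "abc123") { raise "not reached on a cache hit" }
  first == second
end
```

Since `Marshal.load` can instantiate arbitrary objects, a cache like this should only ever read files the tool itself wrote to a local directory.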

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split installation into download -> install phases with inline
extract+finalize per gem. No batch barrier between extraction and
installation — each gem extracts and finalizes in one step.

After downloading, a quick scan reads .gem metadata to detect native
extensions WITHOUT extracting. This lets the install phase prioritize
native ext gems first so compilation starts ASAP and overlaps with
pure Ruby gem installation.

Key changes:
- extract_to_temp/finalize_with[out]_extensions in RubyGemsGemInstaller
- extract_gem/finalize_gem in Source::Rubygems
- Streaming install in ParallelInstaller: native ext gems enqueued first
- scan_native_extensions: peek at .gem metadata to detect extensions early
- uv-inspired ProgressReporter with spinner, aligned counts, slow item display
- Git sources participate in parallel download phase
- Worker threads silenced to prevent UI corruption (thread-local fix)
- Native extension detection from real spec after extraction (not LazySpec)
- Global extension cache in XDG_CACHE_HOME keyed by Ruby ABI
- Global gem cache path aligned with rubygems#7249 convention
- Pre-filter installed_specs by lockfile gem names for targeted lookup
- Incremental Gem::Specification.add_spec instead of double reset

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove Bundler.ui.silence wrapper (per-worker silence suffices)
- Remove stock "Installing X" / "Fetching X" / "Using X" UI messages
  from source/rubygems.rb — progress reporter handles all display
- Remove scan_native_extensions pass that opened every .gem an extra
  time — native extensions detected inline during install from real spec
- Single pass: download → install (no intermediate scan)
- Fix progress reporter: guard write_header when no phase active,
  flush after finish_phase to keep summaries in scrollback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Member

kou commented Feb 7, 2026

Could you split this large PR to small PRs?

kou commented Feb 7, 2026

This is not reviewable.

Key changes:
- Global extracted gem cache (~/.cache/gem/extracted/) avoids
  re-extracting .gem files across projects/installs. Cache hit
  loads marshaled spec + hardlinks files into GEM_HOME in ~0.01s.
- Single-pass .gem extraction reads tar once, piping data.tar.gz
  to native `tar` for zero-Ruby-allocation decompression.
- Hardlink-first file placement (pure Ruby, no subprocess) with
  clonefile and cp_r fallbacks.
- Hardlink .gem files into GEM_HOME/cache/ instead of copying.
- Shallow git clones (--depth 1) for git source gems.
- Hide cursor during parallel download/install phases.
- Show current gem name in progress reporter.
- Reset Gem::Specification only when actually compiling extensions.

tobi commented Feb 7, 2026

Benchmark update — global extracted gem cache

Added a global extracted gem cache (~/.cache/gem/extracted/) that avoids re-extracting .gem files across installs. On cache hit, the marshaled spec is loaded and files are hardlinked into GEM_HOME in ~0.01s per gem.

Benchmark results

Each iteration runs a paired cold→warm sequence: nuke all caches → bundle install (cold), then keep caches and re-install (warm). Stock cold is the 1x baseline.

| Scenario | Stock Cold | Stock Warm | Patched Cold | Patched Warm |
|---|---|---|---|---|
| small | 13.28s (1x) | 10.74s (1.24x) | 7.88s (1.68x) | 1.22s (10.85x) |
| wide | 11.30s (1x) | 8.53s (1.32x) | 13.42s (0.84x) | 4.42s (2.56x) |
| chain | 24.29s (1x) | 20.21s (1.20x) | 31.21s (0.78x) | 6.36s (3.82x) |
| medium | 46.67s (1x) | 42.96s (1.09x) | 32.82s (1.42x) | 9.07s (5.14x) |
| rails | 17.27s (1x) | 13.66s (1.26x) | 22.80s (0.76x) | 6.33s (2.73x) |

Geometric mean speedup vs stock cold:

  • Stock warm: 1.22x
  • Patched cold: 1.04x
  • Patched warm: 4.31x
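
For reference, the geometric means can be reproduced from the per-scenario ratios in the table above. The sketch uses the rounded ratios as printed, so results may differ from the originals in the last digit.

```ruby
# Geometric mean: exponentiate the average of the logs.
def geometric_mean(ratios)
  Math.exp(ratios.sum { |r| Math.log(r) } / ratios.size)
end

# Speedups vs stock cold, in row order: small, wide, chain, medium, rails.
stock_warm   = [1.24, 1.32, 1.20, 1.09, 1.26]
patched_cold = [1.68, 0.84, 0.78, 1.42, 0.76]
patched_warm = [10.85, 2.56, 3.82, 5.14, 2.73]

summary = {
  stock_warm: geometric_mean(stock_warm).round(2),
  patched_cold: geometric_mean(patched_cold).round(2),
  patched_warm: geometric_mean(patched_warm).round(2),
}
```

A geometric mean suits ratios like these because a 2x speedup and a 0.5x slowdown cancel out to 1x, which an arithmetic mean would not show.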

What this means

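  • Cold (first ever install): Roughly even with stock on average (1.04x geometric mean), but uneven per scenario: small and medium are faster (1.68x, 1.42x) while wide, chain, and rails are slower (0.84x, 0.78x, 0.76x). The patched version has overhead populating the global cache on first run.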
  • Warm (any subsequent install): 4.31x faster. Every bundle install after the first benefits enormously — gems are hardlinked from the global cache instead of re-extracted from .gem files.

The "rails" scenario uses the default rails new Gemfile (Rails 8.1, ~66 lines). In that scenario, install time drops from about 17s (stock cold) to about 6s (patched warm).

@colby-swandale
Member

Hey, thanks for sending this through @tobi

Some of these optimisations overlap with things we've already got in-flight or on our radar, which is encouraging. As Kou mentioned though, the scope makes it pretty tough to review as one PR. We're going to dig through these and try to pull out the most impactful changes and look to land them incrementally so we can validate properly and make sure we're not breaking compatibility.

Thanks!
