Enable Paged Optimizer Support for XPU#1898

Open
jiqing-feng wants to merge 14 commits into bitsandbytes-foundation:main from jiqing-feng:bmg
Conversation


@jiqing-feng jiqing-feng commented Mar 12, 2026

Summary

Add paged optimizer support for Intel XPU devices using SYCL Unified Shared Memory (USM), enabling PagedAdamW, PagedAdam, and PagedLion on XPU. This brings feature parity with CUDA's paged optimizer implementation based on cudaMallocManaged.
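As a usage sketch (assuming a bitsandbytes build with the XPU backend; the fallback to `torch.optim.AdamW` is added here only so the snippet also runs on machines without bitsandbytes or an XPU device), the paged optimizers are drop-in replacements for their torch counterparts:

```python
import torch
import torch.nn as nn


def make_optimizer(params, lr=1e-4):
    """Prefer PagedAdamW when bitsandbytes and an XPU are available.

    Sketch only: optimizer state for the paged variants is allocated in
    SYCL USM shared memory and paged between host and device on demand
    (per this PR). Falls back to plain AdamW otherwise.
    """
    params = list(params)
    try:
        import bitsandbytes as bnb
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            return bnb.optim.PagedAdamW(params, lr=lr)
    except ImportError:
        pass
    return torch.optim.AdamW(params, lr=lr)


model = nn.Linear(16, 16)
optimizer = make_optimizer(model.parameters())
loss = model(torch.randn(4, 16)).sum()
loss.backward()
optimizer.step()
```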

Changes

C++ (csrc/pythonInterface.cpp)

  • Implement cget_managed_ptr, cprefetch, cfill_fp32, cfill_uint8 for XPU using SYCL USM APIs (sycl::malloc_shared, queue.prefetch, queue.fill)
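One reason these symbols need the Python-side wrapper described below: ctypes assumes every foreign function returns a C `int`, so a 64-bit pointer returned by a symbol like `cget_managed_ptr` would be silently truncated unless `.restype` is set. A stand-in demonstration using libc's `malloc` (libc here is only a portable substitute for the bitsandbytes native library):

```python
import ctypes
import ctypes.util

# Load the C runtime as a stand-in for the bitsandbytes native library.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Without this, ctypes would coerce the returned pointer to a 32-bit int,
# corrupting addresses above 4 GiB -- the same hazard cget_managed_ptr faces.
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.argtypes = [ctypes.c_void_p]

ptr = libc.malloc(64)
ok = isinstance(ptr, int) and ptr != 0  # full 64-bit address survived
libc.free(ptr)
```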

Python

  • bitsandbytes/cextension.py: Add XpuBNBNativeLibrary class to properly set ctypes return types for the new XPU symbols
  • bitsandbytes/functional.py: Make device synchronization device-agnostic (CUDA/XPU) and rename cuda_ptr → managed_ptr
  • bitsandbytes/backends/triton/ops.py: Fix device context in optimizer wrappers to use g.device instead of state1.device (paged state tensors appear as CPU tensors)
  • tests/test_optim.py: Remove XPU paged optimizer skip
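The device-context fix in the Triton wrappers can be illustrated with stand-in objects (`FakeTensor` and `pick_kernel_device` are hypothetical names for this sketch, not PR code):

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Stand-in for torch.Tensor; only the .device attribute matters here."""
    device: str


def pick_kernel_device(g: FakeTensor, state1: FakeTensor) -> str:
    # Paged optimizer state is a USM allocation that PyTorch reports as a
    # CPU tensor, so state1.device would wrongly select the CPU context.
    # The gradient always lives on the accelerator, so use g.device.
    return g.device


g = FakeTensor(device="xpu:0")        # gradient on the XPU
state1 = FakeTensor(device="cpu")     # paged state *looks like* a CPU tensor
assert pick_kernel_device(g, state1) == "xpu:0"
```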

Examples (examples/xpu/)

  • paged_xpu_training.py: End-to-end training example using LLaMA with the Alpaca dataset
  • benchmark_paged_memory.py: Memory benchmark showing ~65% GPU memory reduction with paged optimizers

Test Results

Paged optimizer reduces GPU memory by 65.9% (2524 MB → 861 MB) on a ~220M parameter LLaMA model by offloading optimizer states to USM shared memory.
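The headline figure is consistent with the peak-memory row of the table below (a small rounding difference aside), which can be sanity-checked:

```python
# Sanity-check the reported savings using the benchmark's peak-memory numbers.
baseline_mb = 2524.7   # Peak GPU memory, AdamW
paged_mb = 861.3       # Peak GPU memory, PagedAdamW

saved_mb = baseline_mb - paged_mb
saved_pct = 100.0 * saved_mb / baseline_mb

assert abs(saved_mb - 1663.4) < 0.2   # table reports 1663.5 MB (rounding)
assert 65.8 < saved_pct < 66.0        # matches the reported 65.9%
```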

=====================================================================================
  RESULTS
=====================================================================================
                                             AdamW         AdamW8bit        PagedAdamW    PagedAdamW8bit
  ------------------------------  ----------------  ----------------  ----------------  ----------------
  Peak GPU Memory                        2524.7 MB         1287.4 MB          861.3 MB          867.8 MB
  Optimizer State on GPU                 1658.2 MB          421.3 MB            0.2 MB            6.8 MB
  Optimizer State on CPU (USM)              0.0 MB            0.0 MB         1658.0 MB          414.5 MB
  ------------------------------  ----------------  ----------------  ----------------  ----------------
  GPU Memory Saved vs AdamW               baseline  1237.4 MB (49.0%)  1663.5 MB (65.9%)  1657.0 MB (65.6%)
=====================================================================================

How to Verify

# Build with XPU backend
cmake -DCOMPUTE_BACKEND=xpu -S . && make
pip install -e .

# Run existing paged optimizer tests (previously skipped on XPU)
pytest tests/test_optim.py -k "paged"


# Run memory benchmark
python examples/xpu/benchmark_paged_memory.py

# Run training example
python examples/xpu/paged_xpu_training.py --compare

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@jiqing-feng jiqing-feng marked this pull request as draft March 12, 2026 05:47
@jiqing-feng jiqing-feng marked this pull request as ready for review March 12, 2026 08:40
@jiqing-feng

Hi @matthewdouglas. I have enabled the paged optimizers for XPU. The XPU entry in the feature support table can be updated to full support after this PR is merged. Please review it. Thanks!

@matthewdouglas matthewdouglas added this to the v0.50.0 milestone Mar 12, 2026
@github-actions

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
