Enable Paged Optimizer Support for XPU#1898

Open
jiqing-feng wants to merge 14 commits into bitsandbytes-foundation:main from jiqing-feng:bmg
Conversation


@jiqing-feng jiqing-feng commented Mar 12, 2026

Summary

Add paged optimizer support for Intel XPU devices using SYCL Unified Shared Memory (USM), enabling PagedAdamW, PagedAdam, and PagedLion on XPU. This brings feature parity with CUDA's paged optimizer implementation based on cudaMallocManaged.
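As a usage sketch (assuming a bitsandbytes build with the XPU backend; the fallback to `torch.optim.AdamW` is added here only so the snippet also runs on machines without bitsandbytes or an XPU device), the paged optimizers are drop-in replacements for their torch counterparts:

```python
import torch
import torch.nn as nn


def make_optimizer(params, lr=1e-4):
    """Prefer PagedAdamW when bitsandbytes and an XPU are available.

    Sketch only: optimizer state for the paged variants is allocated in
    SYCL USM shared memory and paged between host and device on demand
    (per this PR). Falls back to plain AdamW otherwise.
    """
    params = list(params)
    try:
        import bitsandbytes as bnb
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            return bnb.optim.PagedAdamW(params, lr=lr)
    except ImportError:
        pass
    return torch.optim.AdamW(params, lr=lr)


model = nn.Linear(16, 16)
optimizer = make_optimizer(model.parameters())
loss = model(torch.randn(4, 16)).sum()
loss.backward()
optimizer.step()
```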

Changes

C++ (csrc/pythonInterface.cpp)

  • Implement cget_managed_ptr, cprefetch, cfill_fp32, cfill_uint8 for XPU using SYCL USM APIs (sycl::malloc_shared, queue.prefetch, queue.fill)
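One reason these symbols need the Python-side wrapper described below: ctypes assumes every foreign function returns a C `int`, so a 64-bit pointer returned by a symbol like `cget_managed_ptr` would be silently truncated unless `.restype` is set. A stand-in demonstration using libc's `malloc` (libc here is only a portable substitute for the bitsandbytes native library):

```python
import ctypes
import ctypes.util

# Load the C runtime as a stand-in for the bitsandbytes native library.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Without this, ctypes would coerce the returned pointer to a 32-bit int,
# corrupting addresses above 4 GiB -- the same hazard cget_managed_ptr faces.
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.argtypes = [ctypes.c_void_p]

ptr = libc.malloc(64)
ok = isinstance(ptr, int) and ptr != 0  # full 64-bit address survived
libc.free(ptr)
```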

Python

  • bitsandbytes/cextension.py: Add XpuBNBNativeLibrary class to properly set ctypes return types for the new XPU symbols
  • bitsandbytes/functional.py: Make device synchronization device-agnostic (CUDA/XPU) and rename cuda_ptr → managed_ptr
  • bitsandbytes/backends/triton/ops.py: Fix device context in optimizer wrappers to use g.device instead of state1.device (paged state tensors appear as CPU tensors)
  • tests/test_optim.py: Remove XPU paged optimizer skip
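The device-context fix in the Triton wrappers can be illustrated with stand-in objects (`FakeTensor` and `pick_kernel_device` are hypothetical names for this sketch, not PR code):

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Stand-in for torch.Tensor; only the .device attribute matters here."""
    device: str


def pick_kernel_device(g: FakeTensor, state1: FakeTensor) -> str:
    # Paged optimizer state is a USM allocation that PyTorch reports as a
    # CPU tensor, so state1.device would wrongly select the CPU context.
    # The gradient always lives on the accelerator, so use g.device.
    return g.device


g = FakeTensor(device="xpu:0")        # gradient on the XPU
state1 = FakeTensor(device="cpu")     # paged state *looks like* a CPU tensor
assert pick_kernel_device(g, state1) == "xpu:0"
```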

Examples (examples/xpu/)

  • paged_xpu_training.py: End-to-end training example using LLaMA with the Alpaca dataset
  • benchmark_paged_memory.py: Memory benchmark showing ~65% GPU memory reduction with paged optimizers

Test Results

Paged optimizer reduces GPU memory by 65.9% (2524 MB → 861 MB) on a ~220M parameter LLaMA model by offloading optimizer states to USM shared memory.
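The headline figure is consistent with the peak-memory row of the table below (a small rounding difference aside), which can be sanity-checked:

```python
# Sanity-check the reported savings using the benchmark's peak-memory numbers.
baseline_mb = 2524.7   # Peak GPU memory, AdamW
paged_mb = 861.3       # Peak GPU memory, PagedAdamW

saved_mb = baseline_mb - paged_mb
saved_pct = 100.0 * saved_mb / baseline_mb

assert abs(saved_mb - 1663.4) < 0.2   # table reports 1663.5 MB (rounding)
assert 65.8 < saved_pct < 66.0        # matches the reported 65.9%
```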

=====================================================================================
  RESULTS
=====================================================================================
                                             AdamW         AdamW8bit        PagedAdamW    PagedAdamW8bit
  ------------------------------  ----------------  ----------------  ----------------  ----------------
  Peak GPU Memory                        2524.7 MB         1287.4 MB          861.3 MB          867.8 MB
  Optimizer State on GPU                 1658.2 MB          421.3 MB            0.2 MB            6.8 MB
  Optimizer State on CPU (USM)              0.0 MB            0.0 MB         1658.0 MB          414.5 MB
  ------------------------------  ----------------  ----------------  ----------------  ----------------
  GPU Memory Saved vs AdamW               baseline  1237.4 MB (49.0%)  1663.5 MB (65.9%)  1657.0 MB (65.6%)
=====================================================================================

How to Verify

# Build with XPU backend
cmake -DCOMPUTE_BACKEND=xpu -S . && make
pip install -e .

# Run existing paged optimizer tests (previously skipped on XPU)
pytest tests/test_optim.py -k "paged"


# Run memory benchmark
python examples/xpu/benchmark_paged_memory.py

# Run training example
python examples/xpu/paged_xpu_training.py --compare

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@jiqing-feng jiqing-feng marked this pull request as draft March 12, 2026 05:47
@jiqing-feng jiqing-feng marked this pull request as ready for review March 12, 2026 08:40
@jiqing-feng

Hi @matthewdouglas. I have enabled the paged optimizers for XPU. The XPU entry in the feature support table can be updated to full support after this PR is merged. Please review it. Thanks!

@matthewdouglas matthewdouglas added this to the v0.50.0 milestone Mar 12, 2026
@github-actions

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
