Repository for the academic paper on Bayesian Optimization-driven auto-tuning of GPU kernels for performance portability between CUDA and SYCL.
This repository contains the implementation and experimental pipeline for a comparative analysis of CUDA and SYCL kernel performance, with automated tuning via Bayesian Optimization (BO) and Random Search (RS). The project evaluates:
- Performance Portability: Quantifying the cost of porting from vendor-specific CUDA to open-standard SYCL
- Auto-tuning Strategies: Comparing Bayesian Optimization (TPE algorithm) vs. Random Search
- Computational Kernels: General Matrix Multiplication (GEMM) and 2D Stencil operations
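One way to put a number on the porting cost mentioned above is the relative slowdown of the SYCL kernel against its CUDA counterpart. The sketch below is only an illustration: the function names and the choice of metric are assumptions, not definitions taken from the paper or from this repository's scripts.

```python
def porting_overhead(cuda_time_ms: float, sycl_time_ms: float) -> float:
    """Relative slowdown of SYCL versus CUDA (0.0 = parity, 0.25 = 25% slower).

    Hypothetical helper: the metric actually used in the paper may differ.
    """
    if cuda_time_ms <= 0 or sycl_time_ms <= 0:
        raise ValueError("execution times must be positive")
    return sycl_time_ms / cuda_time_ms - 1.0


def sycl_efficiency(cuda_time_ms: float, sycl_time_ms: float) -> float:
    """Fraction of CUDA performance retained by the SYCL port (1.0 = parity)."""
    if cuda_time_ms <= 0 or sycl_time_ms <= 0:
        raise ValueError("execution times must be positive")
    return cuda_time_ms / sycl_time_ms


if __name__ == "__main__":
    # Illustrative timings only, not measured results.
    print(porting_overhead(10.0, 12.5))
    print(sycl_efficiency(10.0, 12.5))
```

Both helpers take the best (tuned) execution time of each backend for the same kernel and problem size, so the comparison is between tuned configurations rather than defaults.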
Authors:
- Juan F. Rojas de la H.
- Diego A. Arévalo Q.
- Sergio A. Gélvez C.
- Luis A. Torres N.
- Carlos J. Barrios H.
Requirements:
- Python 3.8+
- CUDA Toolkit 11.8+
- AdaptiveCpp (SYCL toolchain)
- GCC 9.0+ with C++17 support
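The requirements above can be checked quickly before installing anything. The following is a minimal sketch, assuming the toolchains are exposed on `PATH` as `nvcc`, `acpp`, and `g++`; the helper `check_prerequisites` is hypothetical and not part of this repository. It only tests for presence, not for the minimum versions listed.

```python
import shutil
import sys


def check_prerequisites() -> dict:
    """Report whether each prerequisite appears to be available.

    Only presence on PATH is checked for the compilers; version
    requirements (CUDA 11.8+, GCC 9.0+) are not verified here.
    """
    return {
        "python>=3.8": sys.version_info >= (3, 8),
        "nvcc": shutil.which("nvcc") is not None,
        "acpp": shutil.which("acpp") is not None,
        "g++": shutil.which("g++") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'ok' if found else 'missing'}")
```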
- Clone the repository:

    git clone https://github.com/SC3UIS/Bayesian_Auto-Tuning_for_Performance_Portability.git
    cd Bayesian_Auto-Tuning_for_Performance_Portability

- Install Python dependencies:

    python3 -m pip install optuna numpy scipy matplotlib pandas

- Verify CUDA and SYCL toolchains:

    nvcc --version
    acpp --version

- Compile kernels (optional - auto-compilation during execution):

    cd src
    make clean
    make

    cd src
    python3 run_statistical_experiments.py

This runs:
- Problem sizes: 1024³, 2048³, 4096³ (GEMM) / 1024² × 512, 2048² × 512, 4096² × 512 (Stencil)
- Kernels: Both GEMM and Stencil
- Backends: Both CUDA and SYCL
- Tuning budget: 20 trials per scenario
- Statistical runs: 10 independent runs, each with a different random seed
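The interaction between the tuning budget and the repeated seeded runs can be sketched with a standard-library Random Search. Everything specific here is an assumption for illustration: the search space, the parameter names, and the synthetic cost function stand in for real kernel timings and do not come from this repository.

```python
import random
import statistics

# Hypothetical search space; the repository's tunable parameters may differ.
SEARCH_SPACE = {
    "tile_size": [8, 16, 32],
    "unroll": [1, 2, 4, 8],
}


def synthetic_cost_ms(config: dict) -> float:
    """Stand-in for a measured kernel time; minimal at tile_size=16, unroll=4."""
    return 1.0 + abs(config["tile_size"] - 16) / 16 + abs(config["unroll"] - 4) / 4


def random_search(seed: int, trials: int = 20) -> float:
    """Return the best cost found within `trials` random configurations."""
    rng = random.Random(seed)  # a private generator keeps each run reproducible
    best = float("inf")
    for _ in range(trials):
        config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        best = min(best, synthetic_cost_ms(config))
    return best


def run_statistical(num_runs: int = 10, trials: int = 20):
    """Repeat the search with seeds 0..num_runs-1 and summarize the best costs."""
    bests = [random_search(seed, trials) for seed in range(num_runs)]
    return bests, statistics.mean(bests), statistics.stdev(bests)


if __name__ == "__main__":
    bests, mean, std = run_statistical()
    print(f"best per run: {bests}")
    print(f"mean={mean:.3f} ms, stdev={std:.3f} ms")
```

Repeating the search under different seeds is what allows the spread of the tuned result to be reported, rather than a single possibly lucky outcome.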
Change problem sizes:

    python3 run_statistical_experiments.py --M 2048 --N 2048 --K 2048

Select specific kernels:

    python3 run_statistical_experiments.py --kernels matmul stencil

Select specific backends:

    python3 run_statistical_experiments.py --backends cuda sycl

Adjust tuning configuration:

    python3 run_statistical_experiments.py \
        --num-runs 10 \
        --trials 20 \
        --bench-runs 10 \
        --warmup-runs 5

Specify custom output directory:

    python3 run_statistical_experiments.py --output /path/to/results

Full customization example:

    python3 run_statistical_experiments.py \
        --kernels matmul \
        --backends cuda sycl \
        --M 4096 --N 4096 --K 4096 \
        --num-runs 5 \
        --trials 30 \
        --output ./custom_results

Generate statistical analysis:

    python3 statistical_analysis.py \
        --input /path/to/results \
        --output /path/to/analysis

Create visualizations:

    python3 analyze_results.py \
        --input /path/to/results \
        --plots convergence efficiency speedup

The data/ directory contains:
- Convergence Analysis: Tracking optimization algorithm progress across iterations
- Statistical Results: JSON files with execution times and efficiency metrics
- Performance Summaries: CSV exports of key performance indicators
- Speedup Tables: Comparative performance metrics (tuned vs. baseline)
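Two of the quantities listed above are simple to compute from raw trial times. The sketch below assumes a convergence curve means the best time seen so far after each trial and that speedup is baseline time divided by tuned time. The helper names and input values are illustrative and do not reflect the layout of the files in data/.

```python
from itertools import accumulate
from typing import List, Sequence


def best_so_far(trial_times_ms: Sequence[float]) -> List[float]:
    """Running minimum of trial times, i.e. one convergence curve."""
    return list(accumulate(trial_times_ms, min))


def speedup(baseline_ms: float, tuned_ms: float) -> float:
    """Speedup of the tuned configuration over the baseline (>1 means faster)."""
    if baseline_ms <= 0 or tuned_ms <= 0:
        raise ValueError("execution times must be positive")
    return baseline_ms / tuned_ms


if __name__ == "__main__":
    # Made-up trial times for illustration, not experimental data.
    times = [5.0, 6.0, 4.0, 4.5, 3.0]
    curve = best_so_far(times)
    print(curve)
    print(speedup(baseline_ms=times[0], tuned_ms=curve[-1]))
```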
All experimental data is organized by timestamp. To analyze the latest results:

    cd data/results_20260522_152743
    ls -la

Comprehensive documentation for each source file is available in the docs/ directory.
We gratefully acknowledge:
- CAGE Research Group for research support and guidance
- Universidad Industrial de Santander for computational resources via the GUANE cluster
- Universidad de Cartagena for access to the PACCA supercomputing infrastructure