SOL-ExecBench: Speed-of-Light GPU Kernel Benchmarking

⭐⭐⭐⭐⭐ | arXiv:2603.19173 | Submitted: 19 Mar 2026

Tags: GPU, Benchmark, NVIDIA, CUDA, AI

Summary

NVIDIA presents SOL-ExecBench, a benchmark of 235 CUDA kernel optimization problems extracted from 124 production and emerging AI models, targeting NVIDIA Blackwell GPUs. The benchmark measures performance against analytically derived Speed-of-Light (SOL) bounds rather than software baselines.
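To make "analytically derived" concrete, here is a minimal sketch of one common way to compute such a bound: a roofline-style estimate that takes the larger of the compute-limited and memory-limited times. This is an illustrative assumption, not necessarily the derivation used in the paper. The function name `sol_time_seconds` and the peak throughput figures in the example are hypothetical placeholders, not Blackwell specifications.

```python
def sol_time_seconds(flops, bytes_moved, peak_flops_per_s, peak_bytes_per_s):
    """Roofline-style lower bound on kernel runtime (an assumed model).

    flops            -- floating-point operations the kernel must perform
    bytes_moved      -- bytes it must read from and write to device memory
    peak_flops_per_s -- peak compute throughput of the target GPU
    peak_bytes_per_s -- peak memory bandwidth of the target GPU
    """
    if peak_flops_per_s <= 0 or peak_bytes_per_s <= 0:
        raise ValueError("peak rates must be positive")
    compute_time = flops / peak_flops_per_s
    memory_time = bytes_moved / peak_bytes_per_s
    # The kernel cannot finish faster than its slower resource allows.
    return max(compute_time, memory_time)


# Hypothetical device: 1e15 FLOP/s and 1e12 B/s (placeholder numbers only).
# Hypothetical kernel: 2e9 FLOPs over 4e9 bytes, so it is memory-bound.
bound = sol_time_seconds(2e9, 4e9, 1e15, 1e12)
print(f"SOL bound: {bound * 1e3:.3f} ms")
```

Because the bound depends only on the problem size and the hardware peaks, it does not move when a software baseline is improved, which is the property the benchmark relies on.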

Key Points

235 CUDA kernels from 124 production AI models (language, diffusion, vision, audio, video, hybrid)
Target: NVIDIA Blackwell GPUs with BF16, FP8, and NVFP4 support
SOL Score: Measures how much of the gap between the release baseline and the hardware SOL bound a kernel closes
Sandboxed evaluation: GPU clock locking, L2 cache clearing, isolated subprocess execution, and anti-reward-hacking checks
Authors include 30+ NVIDIA engineers (Tianqi Chen and Luis Ceze, among others)
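The SOL Score described above can be sketched as a normalized gap-closure ratio. The exact formula is not given in this summary, so the linear form below is an assumption, and the name `sol_score` and its parameters are hypothetical: 0 means the kernel matches the baseline, 1 means it reaches the SOL bound, and negative values mean it is slower than the baseline.

```python
def sol_score(t_kernel, t_baseline, t_sol):
    """Fraction of the baseline-to-SOL gap closed by a kernel (assumed form).

    t_kernel   -- measured runtime of the candidate kernel
    t_baseline -- measured runtime of the release baseline
    t_sol      -- analytically derived Speed-of-Light runtime bound
    All times must use the same unit.
    """
    if t_kernel <= 0:
        raise ValueError("t_kernel must be positive")
    if not (0 < t_sol < t_baseline):
        raise ValueError("expected 0 < t_sol < t_baseline")
    return (t_baseline - t_kernel) / (t_baseline - t_sol)


# Hypothetical timings in milliseconds: baseline 10, SOL bound 2.
for t in (10.0, 6.0, 2.0, 12.0):
    print(f"kernel {t:5.1f} ms -> score {sol_score(t, 10.0, 2.0):+.2f}")
```

Under this assumed form, a score above 1 would mean the kernel ran faster than the analytical bound. That is more likely a measurement problem or an invalid kernel than a real speedup, so an evaluator could flag it for inspection.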

Core Insight: "SOL-ExecBench reframes GPU kernel benchmarking from beating a mutable software baseline to closing the remaining gap to hardware Speed-of-Light."

Why This Matters

As AI systems become capable of generating and optimizing GPU kernels, existing benchmarks reward speedup over software baselines rather than proximity to hardware-efficient execution. SOL-ExecBench provides a fixed target for hardware-efficient optimization.
