SOL-ExecBench: Speed-of-Light GPU Kernel Benchmarking
GPU
Benchmark
NVIDIA
CUDA
AI
Summary
NVIDIA presents SOL-ExecBench, a benchmark of 235 CUDA kernel optimization problems extracted from 124 production and emerging AI models, targeting NVIDIA Blackwell GPUs. The benchmark measures performance against analytically derived Speed-of-Light (SOL) bounds rather than software baselines.
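This summary does not say how the SOL bounds are derived. As a rough illustration only, one common way to get an analytic lower bound on kernel runtime is a roofline-style estimate: the kernel cannot finish faster than its arithmetic takes at peak compute throughput, nor faster than its data movement takes at peak memory bandwidth. The function name, its parameters, and the hardware figures below are hypothetical placeholders. They are not Blackwell specifications and not necessarily the benchmark's method.

```python
def sol_time_seconds(flops, bytes_moved, peak_flops_per_s, peak_bytes_per_s):
    """Roofline-style lower bound on runtime (an assumed model, not the paper's).

    The bound is the slower of two idealized limits: all arithmetic at peak
    compute throughput, or all memory traffic at peak bandwidth.
    """
    if peak_flops_per_s <= 0 or peak_bytes_per_s <= 0:
        raise ValueError("peak rates must be positive")
    compute_time = flops / peak_flops_per_s
    memory_time = bytes_moved / peak_bytes_per_s
    return max(compute_time, memory_time)


# Placeholder workload and hardware numbers, chosen only for easy arithmetic.
bound = sol_time_seconds(
    flops=2e12,             # total floating-point operations in the kernel
    bytes_moved=4e9,        # total bytes read from and written to memory
    peak_flops_per_s=1e15,  # hypothetical peak compute rate
    peak_bytes_per_s=4e12,  # hypothetical peak memory bandwidth
)
print(f"lower bound: {bound * 1e3:.3f} ms")
```

With these numbers the compute limit (2 ms) exceeds the memory limit (1 ms), so the sketch treats the kernel as compute-bound.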
Key Points
- 235 CUDA kernel optimization problems from 124 production and emerging AI models (language, diffusion, vision, audio, video, hybrid)
- Target: NVIDIA Blackwell GPUs with BF16, FP8, and NVFP4 support
- SOL Score: measures how much of the gap between the release baseline and the hardware SOL bound a kernel closes
- Sandboxed evaluation: GPU clock locking, L2 cache clearing, isolated subprocess execution, and anti-reward-hacking checks
- Authors: 30+ NVIDIA engineers, including Tianqi Chen and Luis Ceze
Core Insight: "SOL-ExecBench reframes GPU kernel benchmarking from beating a mutable software baseline to closing the remaining gap to hardware Speed-of-Light."
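The exact SOL Score formula is not given in this summary. As a minimal sketch of the "gap closed" idea, assume the score is the fraction of the distance from the baseline runtime to the SOL bound that a candidate kernel has covered. The function name `gap_closed` and the linear normalization are assumptions for illustration; the benchmark's actual definition may differ.

```python
def gap_closed(t_kernel, t_baseline, t_sol):
    """Fraction of the baseline-to-SOL gap closed by a kernel (assumed form).

    0.0 means the kernel only matches the baseline, 1.0 means it reaches the
    SOL bound, and negative values mean it is slower than the baseline.
    """
    if t_kernel <= 0:
        raise ValueError("kernel time must be positive")
    if not (0 < t_sol < t_baseline):
        raise ValueError("expected 0 < t_sol < t_baseline")
    return (t_baseline - t_kernel) / (t_baseline - t_sol)


# Hypothetical timings in milliseconds: baseline 10, SOL bound 4.
halfway = gap_closed(t_kernel=7.0, t_baseline=10.0, t_sol=4.0)
at_baseline = gap_closed(t_kernel=10.0, t_baseline=10.0, t_sol=4.0)
at_sol = gap_closed(t_kernel=4.0, t_baseline=10.0, t_sol=4.0)
print(halfway, at_baseline, at_sol)
```

Under this form the target stays fixed as long as the hardware bound does. A value above 1 would mean the measured time beats the bound, which points to a measurement problem or a bound that is too loose.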
Why This Matters
As AI systems become capable of generating and optimizing GPU kernels, existing benchmarks reward speedup over software baselines rather than proximity to hardware-efficient execution. A software baseline is mutable, so a speedup measured against it says little about how much hardware headroom remains. SOL-ExecBench instead provides a fixed, hardware-derived target for optimization.