Rust threads on the GPU

Source: vectorware.com • Date: 2026-03-24 • Rating: ⭐⭐⭐⭐⭐

Tags: Rust, GPU, Concurrency, Warp, Parallelism

Overview

VectorWare announces the first implementation of Rust's std::thread on the GPU. This milestone makes it possible to write complex GPU programs with familiar Rust abstractions, and it opens the existing Rust ecosystem (rayon, tokio) to GPU computing.

World First: Standard Rust code using thread::spawn now runs unchanged on the GPU, with each warp behaving like a CPU thread.

The Problem: Mismatch Between Programming and Execution Models

CPU Model

Programs begin on a single thread and spawn additional threads as needed. Each thread runs independently.
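The CPU model described above can be sketched in ordinary Rust. This is a minimal illustration, not code from the article; the function name `parallel_sum` and the input data are hypothetical.

```rust
use std::thread;

/// Sums each chunk on its own thread, then combines the partial sums.
/// `parallel_sum` is a hypothetical name used only for this sketch.
fn parallel_sum(chunks: Vec<Vec<u64>>) -> u64 {
    // The program starts on one thread and explicitly spawns one more per chunk.
    let handles: Vec<thread::JoinHandle<u64>> = chunks
        .into_iter()
        .map(|chunk| thread::spawn(move || chunk.iter().sum::<u64>()))
        .collect();

    // Joining makes the end of each thread's work explicit as well.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .sum()
}

fn main() {
    let total = parallel_sum(vec![vec![1, 2, 3], vec![4, 5], vec![6]]);
    println!("total = {total}");
}
```

Every point where concurrency starts (`thread::spawn`) and ends (`join`) is visible in the source, which is the property the GPU model lacks.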

GPU Model

GPU programs consist of kernels launched with thousands of instances running in parallel. The entry point is a function that "looks like it runs once but runs thousands of times."

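// CUDA kernel - looks like a function, runs like thousands
__global__ void scale(float* data, int n) {
    // Global index of this thread across all blocks
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Guard: the launch grid may contain more threads than elements
    if (i < n) {
        data[i] *= 2.0f;
    }
}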

Why This Matters

The mismatch makes GPU programming hard because:

Concurrency is implicit, not explicit
Programmer must manually uphold invariants (indexing, avoiding race conditions)
Compiler cannot enforce GPU-specific invariants
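For contrast, here is a CPU-side sketch of the same doubling operation as the `scale` kernel, written with explicit threads. It assumes nothing about VectorWare's implementation; `scale_parallel` and the `workers` parameter are hypothetical names. The point is only that the "no two threads write the same element" invariant is checked by the compiler rather than upheld by hand.

```rust
use std::thread;

/// Doubles every element, splitting the slice across up to `workers` scoped threads.
/// `scale_parallel` is a hypothetical name used only for this sketch.
fn scale_parallel(data: &mut [f32], workers: usize) {
    if data.is_empty() {
        return;
    }
    // Ceiling division so every element lands in exactly one chunk.
    let workers = workers.max(1);
    let chunk_len = (data.len() + workers - 1) / workers;

    thread::scope(|s| {
        // `chunks_mut` hands out non-overlapping `&mut` slices, so the borrow
        // checker rules out two threads writing the same element.
        for chunk in data.chunks_mut(chunk_len) {
            s.spawn(move || {
                for x in chunk.iter_mut() {
                    *x *= 2.0;
                }
            });
        }
        // All spawned threads are joined automatically when the scope ends.
    });
}

fn main() {
    let mut data = vec![1.0_f32, 2.0, 3.0, 4.0, 5.0];
    scale_parallel(&mut data, 2);
    println!("{data:?}");
}
```

There is no manual index arithmetic and no bounds guard to forget: the slice iterator owns the indexing, and overlapping mutable access would fail to compile.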

Why Not Map std::thread to GPU Lanes?

GPU "threads" (lanes) within warps are not CPU threads:

GPU lane = SIMD lane, not independent execution context
Lanes within a warp advance together in lockstep
Mapping thread::spawn to lanes would cause divergence
Hardware would serialize different paths, negating concurrency benefit
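The cost of divergence can be illustrated with a deliberately simplified CPU-side model. This is not how any real GPU is implemented; it assumes a warp of 32 lanes (the size used by NVIDIA hardware) and counts one "pass" for each side of an if/else that at least one lane needs. `WARP_SIZE` and `passes_for_branch` are hypothetical names.

```rust
/// Lanes per simulated warp. 32 is an assumption matching NVIDIA hardware.
const WARP_SIZE: usize = 32;

/// Counts how many passes a lockstep warp needs to execute an if/else,
/// given which lanes take the `if` side.
fn passes_for_branch(takes_if: &[bool; WARP_SIZE]) -> usize {
    let any_if = takes_if.iter().any(|&t| t);
    let any_else = takes_if.iter().any(|&t| !t);
    // Each side that at least one lane needs costs the whole warp one pass;
    // lanes on the other side sit idle (masked off) during it.
    usize::from(any_if) + usize::from(any_else)
}

fn main() {
    let uniform = [true; WARP_SIZE];
    let mut divergent = [true; WARP_SIZE];
    divergent[0] = false; // a single lane takes the other path

    println!("uniform:   {} pass(es)", passes_for_branch(&uniform));
    println!("divergent: {} pass(es)", passes_for_branch(&divergent));
}
```

In this model a single lane taking a different path doubles the work for the whole warp. If each lane ran an unrelated thread::spawn closure, nearly every branch would look like the divergent case.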

Solution: Map to Warps, Not Lanes

Key insight: Warps can behave like CPU threads:

Each warp has its own program counter
Each warp has its own registers (a private slice of the multiprocessor's register file)
GPU scheduler switches between warps (like OS scheduler)
In this model, anything a CPU thread can do, a warp can do

What This Unlocks

Supporting std::thread opens up major portions of the Rust ecosystem to the GPU:

Rayon - widely used data-parallelism library built on a work-stealing thread pool
Tokio - async runtime (previously demonstrated async on GPU)
Many parallelism libraries that depend on std::thread
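To see why std::thread support carries libraries along with it, consider a minimal thread pool written using only the standard library. This is a hypothetical sketch (`TinyPool` is an invented name) and is far simpler than Rayon's or Tokio's actual designs, but it shows the general shape: the only threading primitive the pool itself needs from the platform is thread::spawn.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical minimal pool; not Rayon's or Tokio's actual design.
struct TinyPool {
    sender: Option<mpsc::Sender<Job>>,
    workers: Vec<thread::JoinHandle<()>>,
}

impl TinyPool {
    fn new(size: usize) -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..size.max(1))
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                // Everything else in the pool is built on top of this one call.
                thread::spawn(move || loop {
                    // Hold the lock only while receiving, not while running the job.
                    let job = receiver.lock().unwrap().recv();
                    match job {
                        Ok(job) => job(),
                        Err(_) => break, // channel closed: shut down
                    }
                })
            })
            .collect();
        TinyPool { sender: Some(sender), workers }
    }

    /// Queues a job for whichever worker becomes free first.
    fn execute(&self, job: impl FnOnce() + Send + 'static) {
        self.sender.as_ref().unwrap().send(Box::new(job)).unwrap();
    }

    /// Closes the queue and waits for all queued jobs to finish.
    fn join(mut self) {
        drop(self.sender.take());
        for worker in self.workers.drain(..) {
            worker.join().unwrap();
        }
    }
}

fn main() {
    let counter = Arc::new(AtomicUsize::new(0));
    let pool = TinyPool::new(4);
    for _ in 0..100 {
        let counter = Arc::clone(&counter);
        pool.execute(move || {
            counter.fetch_add(1, Ordering::Relaxed);
        });
    }
    pool.join();
    println!("jobs run: {}", counter.load(Ordering::Relaxed));
}
```

If thread::spawn, channels, and mutexes behave on the GPU as they do on the CPU, code shaped like this should not need a GPU-specific port, which is the sense in which the ecosystem is "unlocked".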

Significance

This is a significant step toward "GPU-native software" where GPU code looks like ordinary Rust. The goal is for the Rust compiler to reason about the same invariants in both CPU and GPU environments, with concurrency explicit rather than implicit.
