Rust threads on the GPU

Source: vectorware.com | Date: 2026-03-24 | Rating: ⭐⭐⭐⭐⭐
Tags: Rust, GPU, Concurrency, Warp, Parallelism

Overview

VectorWare announces what it describes as the first implementation of Rust's std::thread on the GPU. This milestone makes it possible to write complex GPU programs with familiar Rust abstractions, opening the existing Rust ecosystem (rayon, tokio) to GPU computing.

World First: Standard Rust code using thread::spawn now runs unchanged on GPU, with each warp behaving like a CPU thread.

The Problem: Mismatch Between Programming and Execution Models

CPU Model

Programs begin on a single thread and spawn additional threads as needed. Each thread runs independently.
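The CPU model described above can be sketched with std::thread directly. This is a minimal illustration written for ordinary CPU Rust; the helper name spawn_and_collect and the squaring workload are hypothetical and do not come from the VectorWare article.

```rust
use std::thread;

/// Hypothetical helper: spawns `workers` threads from the calling thread.
/// Each thread runs independently and returns the square of its index.
fn spawn_and_collect(workers: u32) -> Vec<u32> {
    // The program starts on one thread and creates more as needed.
    let handles: Vec<thread::JoinHandle<u32>> = (0..workers)
        .map(|id| thread::spawn(move || id * id))
        .collect();

    // Joining waits for each spawned thread and retrieves its result.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

fn main() {
    let results = spawn_and_collect(4);
    println!("{results:?}");
}
```

Concurrency here is explicit: every additional thread exists only because the code called thread::spawn, and the results are collected in spawn order through the join handles.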

GPU Model

GPU programs consist of kernels launched with thousands of instances running in parallel. The entry point is a function that "looks like it runs once but runs thousands of times."

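// CUDA kernel - written like one function call, executed by thousands of threads
__global__ void scale(float* data, int n) {
    // Global index of this thread across all blocks
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Guard: the grid may contain more threads than there are elements
    if (i < n) {
        data[i] *= 2.0f;
    }
}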
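For contrast, the same scaling operation can be written in the CPU style, where the parallelism is spelled out instead of implied by the launch. The sketch below is plain CPU Rust using scoped threads; the function signature, the workers parameter, and the chunking strategy are assumptions made for illustration, and whether this exact code runs on VectorWare's GPU backend is not something the article summary confirms.

```rust
use std::thread;

/// Doubles every element of `data`, splitting the slice into at most
/// `workers` contiguous chunks and processing each chunk on its own thread.
/// (Hypothetical helper, not taken from the article.)
fn scale(data: &mut [f32], workers: usize) {
    if data.is_empty() {
        return;
    }
    let workers = workers.max(1);
    // Ceiling division so that every element belongs to some chunk.
    let chunk_len = (data.len() + workers - 1) / workers;

    // Scoped threads may borrow `data`; all of them are joined
    // automatically before `thread::scope` returns.
    thread::scope(|s| {
        for chunk in data.chunks_mut(chunk_len) {
            s.spawn(move || {
                for x in chunk.iter_mut() {
                    *x *= 2.0;
                }
            });
        }
    });
}

fn main() {
    let mut data = vec![1.0_f32, 2.0, 3.0, 4.0, 5.0];
    scale(&mut data, 2);
    println!("{data:?}");
}
```

Unlike the kernel, nothing here runs "thousands of times" implicitly: the number of threads and the range each one owns are visible in the code, and the borrow checker verifies that the chunks do not overlap.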

Why This Matters

The mismatch makes GPU programming hard because:

Why Not Map std::thread to GPU Lanes?

GPU "threads" (lanes) within warps are not CPU threads:

Solution: Map to Warps, Not Lanes

Key insight: Warps can behave like CPU threads:

What This Unlocks

Supporting std::thread opens major portions of the Rust ecosystem to the GPU, including crates such as rayon and tokio named in the overview.

Significance

This is a significant step toward "GPU-native software" where GPU code looks like ordinary Rust. The goal is for the Rust compiler to reason about the same invariants in both CPU and GPU environments, with concurrency explicit rather than implicit.

