🔬 LLM Architecture Gallery

⭐ 5/5

By Sebastian Raschka, PhD | Last updated: March 15, 2026

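Source: sebastianraschka.com/llm-architecture-gallery
Tags: AI, LLM, Architecture, Deep Learning

📋 Overview

This is the LLM architecture gallery maintained by Sebastian Raschka. It collects architecture comparison figures and spec sheets for mainstream large language models. The content is drawn from two articles: The Big LLM Architecture Comparison and A Dream of Spring for Open-Weight LLMs.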

DeepSeek V3 / R1

671B total, 37B active | Sparse MoE | MLA

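Key: DeepSeek was an early mover in large-scale open-weight MoE models. It uses a dense prefix (the first few layers are dense rather than MoE) plus a shared expert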
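
With 37B of 671B parameters active, only about 5.5% of the weights run for any given token. A minimal sketch of how a shared expert combines with top-k routed experts follows. The function names, the softmax gate, and top_k=2 are assumptions made for illustration and are not taken from the gallery or from DeepSeek's code.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, router_logits, routed_experts, shared_expert, top_k=2):
    """Combine one always-on shared expert with the top-k routed experts.

    x: one token's hidden vector (list of floats)
    router_logits: one score per routed expert (hypothetical router output)
    routed_experts / shared_expert: callables mapping a vector to a vector
    """
    probs = softmax(router_logits)
    # Pick the k experts with the highest router probability.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Renormalize the gate weights over the chosen experts only.
    norm = sum(probs[i] for i in chosen)
    out = shared_expert(x)  # the shared expert processes every token
    for i in chosen:
        gate = probs[i] / norm
        expert_out = routed_experts[i](x)
        out = [o + gate * e for o, e in zip(out, expert_out)]
    return out, sorted(chosen)
```

Only the chosen experts are evaluated, which is why the active parameter count stays far below the total.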

Qwen3 series

235B total (22B active) / 32B dense / 8B dense / 4B dense

Key: QK-Norm + GQA, with dense and MoE variants at multiple sizes
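
The two techniques named above can be sketched in a few lines. GQA lets several query heads share one key/value head, and QK-Norm normalizes queries and keys before the dot product. The head counts used in any call and the omission of the learned RMSNorm scale are simplifying assumptions, not details from the gallery.

```python
import math

def rms_norm(vec, eps=1e-6):
    """RMSNorm without a learned scale (left out for brevity)."""
    rms = math.sqrt(sum(v * v for v in vec) / len(vec) + eps)
    return [v / rms for v in vec]

def kv_head_for(query_head, n_query_heads, n_kv_heads):
    """GQA: consecutive query heads share one key/value head."""
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("n_query_heads must be a multiple of n_kv_heads")
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size

def attention_score(q, k):
    """QK-Norm: normalize q and k per head before the scaled dot product."""
    q, k = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))
```

Because both vectors are normalized first, rescaling q or k leaves the score nearly unchanged, which is the stabilizing effect QK-Norm is used for.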

gpt-oss

MoE architecture, 20B/120B total

Key: alternating sliding-window and global attention; OpenAI's open-weight release
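
As a rough sketch of the alternating pattern described above: even-numbered layers use a sliding window and odd-numbered layers attend globally. Which kind comes first and the window size are assumptions made for illustration.

```python
def layer_kinds(n_layers):
    """Alternate local (sliding-window) and global layers, starting local."""
    return ["sliding" if i % 2 == 0 else "global" for i in range(n_layers)]

def causal_mask(seq_len, window=None):
    """mask[i][j] is True when query position i may attend to key position j.

    window=None gives full causal attention; otherwise each query sees only
    the most recent `window` positions, itself included.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            visible = j <= i
            if window is not None:
                visible = visible and (i - j) < window
            row.append(visible)
        mask.append(row)
    return mask
```

For example, causal_mask(4, window=2)[3] lets the last position see only positions 2 and 3, while causal_mask(4)[3] lets it see all four.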

NVIDIA Nemotron 3 Nano

30B total (3B active) | Hybrid MoE

Key: Mamba-2 + MoE hybrid architecture, an extreme blend of state-space-model and Transformer layers

Qwen3-Next

80B total (3B active) | Sparse hybrid

Key: DeltaNet attention + Gated Attention, 262k context
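
DeltaNet-style layers replace the growing KV cache with a fixed-size state matrix that is updated by the delta rule. Below is a minimal sketch of a single write and read, assuming unit-norm keys and a scalar write strength beta. A gated variant would also decay the old state before writing, which is left out here. The function names are hypothetical.

```python
def delta_rule_step(state, k, v, beta=1.0):
    """One delta-rule write: S <- S - beta * (S k - v) k^T.

    state: d_v x d_k matrix as a list of rows
    k: key vector (length d_k), assumed unit-norm
    v: value vector (length d_v)
    beta: write strength in [0, 1]
    """
    # What the memory currently returns for this key.
    predicted = [sum(s * kj for s, kj in zip(row, k)) for row in state]
    # Correct the prediction error so the stored value moves toward v.
    return [
        [s - beta * (p - vi) * kj for s, kj in zip(row, k)]
        for row, p, vi in zip(state, predicted, v)
    ]

def read(state, q):
    """Query the memory: o = S q."""
    return [sum(s * qj for s, qj in zip(row, q)) for row in state]
```

Writing a second value under the same key with beta=1 replaces the first value instead of adding to it, which is the main difference from plain additive linear attention.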

GLM-4.7 / GLM-5

355B→744B total, 32B→40B active

Key: adopts MLA + DeepSeek Sparse Attention
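
The main appeal of MLA is a smaller KV cache: instead of full per-head keys and values, each token stores one compressed latent plus a small positional key. The arithmetic below is a sketch only. The head count, head size, latent size, and RoPE size are illustrative assumptions and are not taken from the gallery.

```python
def kv_cache_elems_mha(n_heads, head_dim):
    """Per token, per layer: a full key and a full value for every head."""
    return 2 * n_heads * head_dim

def kv_cache_elems_mla(latent_dim, rope_dim):
    """Per token, per layer: one shared compressed latent plus a small
    positional (RoPE) key kept outside the compression."""
    return latent_dim + rope_dim

# Illustrative sizes (assumed for this example).
N_HEADS, HEAD_DIM = 128, 128
LATENT_DIM, ROPE_DIM = 512, 64

mha = kv_cache_elems_mha(N_HEADS, HEAD_DIM)     # 32768 elements
mla = kv_cache_elems_mla(LATENT_DIM, ROPE_DIM)  # 576 elements
ratio = mha / mla                               # between 56 and 57
```

Under these assumed sizes the cached state per token shrinks by a factor of roughly 57 relative to standard multi-head attention.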

OLMo 2 / OLMo 3

7B / 24B / 32B dense

Key: fully transparent open-source models; keep post-norm and use GQA + sliding window
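
"Post-norm" can refer to more than one layout, so the sketch below contrasts three placements of the normalization layer around a sublayer (attention or feed-forward). Which of the post-norm variants the gallery has in mind is an assumption here, and the learned RMSNorm scale is omitted for brevity.

```python
import math

def rms_norm(vec, eps=1e-6):
    rms = math.sqrt(sum(v * v for v in vec) / len(vec) + eps)
    return [v / rms for v in vec]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def pre_norm_block(x, sublayer):
    """Pre-norm: normalize the input, then add the sublayer output."""
    return add(x, sublayer(rms_norm(x)))

def post_norm_block(x, sublayer):
    """Original-Transformer post-norm: normalize after the residual add."""
    return rms_norm(add(x, sublayer(x)))

def post_norm_inside_residual_block(x, sublayer):
    """Normalize the sublayer output, then add it to the untouched residual."""
    return add(x, rms_norm(sublayer(x)))
```

In the third variant the residual stream itself is never normalized, so the skip path stays an identity while the sublayer's contribution is kept at a controlled scale.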

Kimi Linear / Kimi K2

48B total (3B active) / 1T total (32B active)

Key: hybrid linear-attention architecture, NoPE + MLA

🧬 Discovered via Lobsters | 2026-03-16 | Lobsters discussion