🚀 New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI

Agentic AI · MoE · Open Source ⭐⭐⭐⭐⭐

Source: blogs.nvidia.com | March 11, 2026

📌 Key Highlights

120B total parameters, with only 12B active per token at inference (mixture-of-experts, MoE, architecture)
1M-token context window, designed to prevent goal drift in multi-agent workflows
5x higher throughput than the previous Nemotron Super model
Hybrid Architecture: Mamba layers for efficiency (a claimed 4x gain) plus Transformer layers for reasoning
Latent MoE: Activates 4 expert specialists for the cost of 1
Multi-Token Prediction: 3x faster inference
NVFP4 precision on Blackwell: 4x faster than FP8 on Hopper
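The "120B total, 12B active" figure follows from how MoE layers work: a router scores every expert for each token, but only the top-scoring few actually run. The sketch below is a minimal, generic top-k routing example in plain Python. It is not Nemotron 3 Super's implementation, and it does not model NVIDIA's "latent MoE" variant. The helper names (`top_k_route`, `moe_forward`, `active_fraction`), the 8 toy experts, the router logits, and k=2 are all hypothetical values chosen for illustration.

```python
import math
from typing import Callable, List, Tuple


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their gate weights.

    Generic top-k gating (hypothetical helper); real MoE routers differ in detail.
    """
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(
    x: float,
    experts: List[Callable[[float], float]],
    router_logits: List[float],
    k: int,
) -> float:
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of the model's weights touched per token (billions / billions)."""
    return active_params_b / total_params_b


if __name__ == "__main__":
    # Hypothetical toy setup: expert i simply scales its input by (i + 1).
    experts = [lambda x, s=i + 1: s * x for i in range(8)]
    logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
    print(top_k_route(logits, k=2))           # only experts 4 and 1 are selected
    print(moe_forward(1.0, experts, logits, k=2))
    # Headline figures from the article: 12B active out of 120B total.
    print(f"{active_fraction(12, 120):.0%}")  # 10%
```

In this toy run only 2 of 8 experts execute per input, which is the same reason a 120B-parameter MoE model can run at roughly the per-token compute of a much smaller dense model.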

🎯 The Two Constraints It Targets

1. Context explosion: multi-agent workflows generate 15x more tokens than standard chat.

2. Thinking tax: running a large reasoning model on every subtask is too expensive.

🏢 Enterprise Adoption

AI-Native: Perplexity, CodeRabbit, Factory, Greptile, Edison Scientific
Enterprise: Amdocs, Palantir, Cadence, Dassault Systèmes, Siemens
Cloud: Google Cloud Vertex AI, Oracle Cloud, AWS (coming soon), Azure (coming soon)
Inference: Cloudflare, Fireworks AI, DeepInfra, Baseten, Together AI

📊 Performance

Top spot on Artificial Analysis for efficiency and openness
Powers the NVIDIA AI-Q research agent to the #1 position on both DeepResearch Bench and DeepResearch Bench II

🔓 Open Weights

Released under a permissive license, with the complete methodology published:

10+ trillion tokens of pre- and post-training data
15 training environments for RL
Evaluation recipes

Available on: build.nvidia.com, Perplexity, OpenRouter, Hugging Face

💡 Why It Matters

Nemotron 3 Super is designed for complex subtasks inside multi-agent systems:

Software development: load an entire codebase into context at once
Financial analysis: load thousands of pages of documents into context, so the agent does not have to re-reason over earlier material
Cybersecurity: High-accuracy tool calling for autonomous security orchestration
