🚀 New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI

Agentic AI | MoE | Open Source | ⭐⭐⭐⭐⭐

Source: blogs.nvidia.com | March 11, 2026

📌 Key Highlights

  • 120B parameters with only 12B active at inference (MoE architecture)
  • 1M-token context window: prevents goal drift in multi-agent workflows
  • 5x higher throughput than the previous Nemotron Super model
  • Hybrid Architecture: Mamba layers (4x efficiency) + Transformer layers (reasoning)
  • Latent MoE: Activates 4 expert specialists for the cost of 1
  • Multi-Token Prediction: 3x faster inference
  • NVFP4 precision on Blackwell: 4x faster than FP8 on Hopper
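
The "120B total, 12B active" figure follows from sparse expert routing: a router scores every expert for each token, and only the top-scoring few are actually run. The sketch below shows generic top-k routing in plain Python. It is not NVIDIA's Latent MoE implementation; the expert count, the `top_k` value, and all function names are illustrative assumptions.

```python
import math

def softmax(scores):
    """Convert raw router scores into probabilities."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, top_k):
    """Pick the top_k experts for one token and renormalize their weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    probs = softmax(router_scores)
    # Indices of the top_k highest-probability experts, best first.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept = sum(probs[i] for i in chosen)
    return [(i, probs[i] / kept) for i in chosen]

def moe_forward(x, experts, router_scores, top_k):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, top_k))

def active_fraction(active_params, total_params):
    """Share of the model's parameters used per token."""
    return active_params / total_params

# Hypothetical toy setup: 8 scalar "experts", 2 active per token.
experts = [lambda x, k=k: x * (k + 1) for k in range(8)]
scores = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]
selected = route_top_k(scores, top_k=2)
output = moe_forward(1.0, experts, scores, top_k=2)

# Figures from the highlights above: 12B active out of 120B total.
print(active_fraction(12e9, 120e9))
```

With the article's figures, roughly one tenth of the parameters participate in each token's forward pass, which is where the inference savings of an MoE design come from.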

🎯 The Two Constraints

  1. Context explosion: Multi-agent workflows generate 15x more tokens than standard chat
  2. Thinking tax: Using large models for every subtask is too expensive

🏢 Enterprise Adoption

  • AI-Native: Perplexity, CodeRabbit, Factory, Greptile, Edison Scientific
  • Enterprise: Amdocs, Palantir, Cadence, Dassault Systèmes, Siemens
  • Cloud: Google Cloud Vertex AI, Oracle Cloud, AWS (coming), Azure (coming)
  • Inference: Cloudflare, Fireworks AI, DeepInfra, Baseten, Together AI

📊 Performance

  • Top spot on Artificial Analysis for efficiency and openness
  • Powers NVIDIA AI-Q research agent to #1 on DeepResearch Bench
  • Also ranks #1 on DeepResearch Bench II

🔓 Open Weights

Released under a permissive license. The complete methodology is published:

  • 10+ trillion tokens of pre- and post-training data
  • 15 training environments for RL
  • Evaluation recipes

Available on build.nvidia.com, Perplexity, OpenRouter, and Hugging Face.

💡 Why It Matters

Nemotron 3 Super is designed for complex subtasks inside multi-agent systems:

  • Software development: Load an entire codebase into context at once
  • Financial analysis: Load thousands of pages of reports, eliminating the need to re-reason across long conversations
  • Cybersecurity: High-accuracy tool calling for autonomous security orchestration
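
Whether an "entire codebase" fits in a 1M-token window can be estimated before anything is sent to a model. The sketch below walks a directory and approximates the token count. The 4-characters-per-token ratio is a rough rule of thumb rather than the model's actual tokenizer, and the function names, file extensions, and reserved-output budget are illustrative assumptions.

```python
import os

CONTEXT_WINDOW = 1_000_000   # 1M-token window cited in the article
CHARS_PER_TOKEN = 4          # assumption: rough heuristic, not a real tokenizer

def estimate_tokens(text):
    """Approximate token count from character count, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)

def codebase_tokens(root, extensions=(".py", ".md")):
    """Sum estimated tokens over matching source files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                # Ignore undecodable bytes so one odd file does not abort the scan.
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    total += estimate_tokens(handle.read())
    return total

def fits_in_context(token_count, reserved_for_output=50_000):
    """True if the input still leaves the reserved room for the model's reply."""
    return token_count + reserved_for_output <= CONTEXT_WINDOW
```

A call such as `fits_in_context(codebase_tokens("."))` gives a quick yes/no answer for the current directory. For a real deployment, the heuristic would be swapped for the model's own tokenizer.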
