Source: Microsoft Research Blog | Date: March 2026

AgentRx: Systematic Debugging for AI Agents

⭐⭐⭐⭐⭐ 5/5 — Essential reading for AI agent developers

🎯 Core Problem: Debugging AI agent failures is hard because trajectories are long, stochastic, and often multi-agent, so the true root cause gets buried.

The Challenge

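Modern AI agents run long, stochastic, and often multi-agent trajectories, so the step that actually caused a failure is hard to locate.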

Introducing AgentRx

AgentRx (Agent Diagnosis) treats agent execution like a system trace that needs validation. Instead of relying on a single LLM to "guess" the error, AgentRx uses a structured, multi-stage pipeline:

Pipeline Stages:
  • Trajectory normalization: Convert heterogeneous logs into a common intermediate representation
  • Constraint synthesis: Generate executable constraints from tool schemas and domain policies
  • Guarded evaluation: Evaluate constraints step-by-step, producing auditable validation logs
  • LLM-based judging: Use an LLM judge to identify the Critical Failure Step
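
To make the four stages concrete, here is a minimal Python sketch of how such a pipeline could be wired together. It is an illustration under stated assumptions, not AgentRx's actual code: the `Step` and `Constraint` types, the log field names (`tool`, `function_name`, `args`, `arguments`), and the flight-booking tools are all hypothetical. The last stage is replaced by a trivial stand-in that returns the earliest violated step, where the real system would hand the validation log to an LLM judge.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Step:
    """One normalized trajectory step (hypothetical intermediate representation)."""
    index: int
    actor: str
    tool: Optional[str]
    args: dict[str, Any] = field(default_factory=dict)


@dataclass
class Constraint:
    """An executable check plus a guard that decides whether it applies to a step."""
    name: str
    guard: Callable[[Step], bool]
    check: Callable[[Step], bool]


def normalize(raw_log: list[dict[str, Any]]) -> list[Step]:
    """Stage 1: map heterogeneous log records onto the common Step shape."""
    steps = []
    for i, rec in enumerate(raw_log):
        # Assumption: frameworks name the same fields differently.
        tool = rec.get("tool") or rec.get("function_name")
        args = rec.get("args") or rec.get("arguments") or {}
        steps.append(Step(index=i, actor=rec.get("role", "agent"), tool=tool, args=args))
    return steps


def synthesize_constraints(tool_schemas: dict[str, list[str]]) -> list[Constraint]:
    """Stage 2: derive one 'required arguments present' constraint per tool schema."""
    constraints = []
    for tool, required in tool_schemas.items():
        constraints.append(Constraint(
            name=f"{tool}: required args present",
            # Default arguments bind the loop variables at definition time.
            guard=lambda s, t=tool: s.tool == t,
            check=lambda s, req=tuple(required): all(k in s.args for k in req),
        ))
    return constraints


def evaluate(steps: list[Step], constraints: list[Constraint]) -> list[dict[str, Any]]:
    """Stage 3: guarded, step-by-step evaluation that yields a validation log."""
    log = []
    for step in steps:
        for c in constraints:
            if c.guard(step):  # only evaluate constraints that apply to this step
                log.append({"step": step.index, "constraint": c.name, "passed": c.check(step)})
    return log


def first_violation(log: list[dict[str, Any]]) -> Optional[int]:
    """Stand-in for stage 4: a real pipeline would pass the log to an LLM judge."""
    for entry in log:
        if not entry["passed"]:
            return entry["step"]
    return None


# Hypothetical two-step trajectory logged in two different formats.
raw_log = [
    {"role": "agent", "tool": "search_flights", "args": {"origin": "SEA", "dest": "JFK"}},
    {"role": "agent", "function_name": "book_flight", "arguments": {"flight_id": "F1"}},
]
schemas = {"search_flights": ["origin", "dest"], "book_flight": ["flight_id", "passenger"]}

validation_log = evaluate(normalize(raw_log), synthesize_constraints(schemas))
critical_step = first_violation(validation_log)  # step 1: book_flight lacks "passenger"
```

The guard keeps each constraint from firing on steps it does not apply to, which is what keeps the validation log short enough to audit.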

Failure Taxonomy (9 Categories)

Category | Description
Plan Adherence Failure | Ignored required steps / did extra unplanned actions
Invention of New Information | Altered facts not grounded in trace/tool output (hallucination)
Invalid Invocation | Tool call malformed / missing args / schema-invalid
Misinterpretation of Tool Output | Read tool output incorrectly; acted on wrong assumptions
Intent–Plan Misalignment | Misread user goal/constraints and planned wrongly
Under-specified User Intent | Could not proceed because required info wasn't available
Intent Not Supported | No available tool can do what's being asked
Guardrails Triggered | Execution blocked by safety/access restrictions
System Failure | Connectivity/tool endpoint failures
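
One way to use the nine categories above in tooling is to encode them as an enumeration, so that a judge's free-text label can be validated before it is stored. The sketch below is an assumption-laden illustration: `FailureCategory`, `Diagnosis`, and `parse_category` are hypothetical names, not part of AgentRx.

```python
from dataclasses import dataclass
from enum import Enum


class FailureCategory(Enum):
    """The nine taxonomy categories, keyed by their display names."""
    PLAN_ADHERENCE_FAILURE = "Plan Adherence Failure"
    INVENTION_OF_NEW_INFORMATION = "Invention of New Information"
    INVALID_INVOCATION = "Invalid Invocation"
    MISINTERPRETATION_OF_TOOL_OUTPUT = "Misinterpretation of Tool Output"
    INTENT_PLAN_MISALIGNMENT = "Intent–Plan Misalignment"
    UNDER_SPECIFIED_USER_INTENT = "Under-specified User Intent"
    INTENT_NOT_SUPPORTED = "Intent Not Supported"
    GUARDRAILS_TRIGGERED = "Guardrails Triggered"
    SYSTEM_FAILURE = "System Failure"


@dataclass(frozen=True)
class Diagnosis:
    """Hypothetical record tying a critical failure step to a category."""
    critical_step: int
    category: FailureCategory
    evidence: str


def _canonical(label: str) -> str:
    # Treat en-dashes and hyphens alike and ignore case and outer whitespace.
    return label.strip().replace("–", "-").casefold()


def parse_category(label: str) -> FailureCategory:
    """Map a free-text label to a category, rejecting anything outside the taxonomy."""
    wanted = _canonical(label)
    for category in FailureCategory:
        if _canonical(category.value) == wanted:
            return category
    raise ValueError(f"unknown failure category: {label!r}")


diagnosis = Diagnosis(
    critical_step=1,
    category=parse_category("invalid invocation"),
    evidence="book_flight called without required argument 'passenger'",
)
```

Rejecting unknown labels with an exception, instead of silently mapping them to a default, keeps a judge's out-of-taxonomy answers visible.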

Key Results

Why It Matters

AgentRx lets developers move beyond trial-and-error prompting toward systematic agentic engineering. Its auditable log explains why a failure happened, which makes it a prerequisite for real-world agent deployment.

🔗 Original Article | GitHub Repo