Anthropic Research ⭐⭐⭐⭐⭐

Measuring AI Agent Autonomy in Practice

Anthropic • February 18, 2026

Executive Summary

Anthropic analyzed millions of human-agent interactions across Claude Code and the public API to answer one question: how much autonomy do people grant agents? The findings reveal a significant deployment overhang: models are capable of more autonomy than they exercise in practice.

Key Findings

Claude Code is working autonomously for longer.
Among the longest-running sessions (99.9th percentile), the time Claude Code works autonomously nearly doubled, from under 25 minutes to over 45 minutes. The increase is smooth across model releases, which suggests it is not driven purely by model capability.
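
To make the "99.9th percentile" figure concrete, here is a minimal sketch of a nearest-rank percentile over a list of durations. The function name and the synthetic durations are assumptions for illustration only; the source does not say which percentile method or data schema Anthropic used.

```python
import math
from fractions import Fraction

def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    # Fraction avoids floating-point error when pct is something like 99.9.
    rank = math.ceil(Fraction(str(pct)) / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# Synthetic durations in minutes (NOT Anthropic's data):
# 9,980 one-minute runs plus 20 long runs of 30..49 minutes.
durations = [1.0] * 9980 + [30.0 + i for i in range(20)]
p999 = nearest_rank_percentile(durations, 99.9)
print(f"99.9th percentile duration: {p999} minutes")
```

Because only 1 in 1,000 observations lies above this cutoff, the statistic describes the extreme tail of usage rather than a typical session.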

User Experience Patterns

  • New users: ~20% of sessions use full auto-approve
  • Experienced users (750+ sessions): >40% use full auto-approve
  • But experienced users also interrupt more often: they let the agent run and step in only when needed
  • Claude Code pauses for clarification more often than humans interrupt it, about 2x as often on complex tasks

Risk Profile

  • Most agent actions are low-risk and reversible
  • Software engineering accounts for nearly 50% of agentic activity
  • Emerging usage in healthcare, finance, and cybersecurity
  • Agents are used in risky domains, but not yet at scale

Methodology

  • Definition: An agent is "an AI system equipped with tools that allow it to take actions"
  • Studied both Claude Code (depth) and the public API (breadth)
  • Privacy-preserving analysis infrastructure (Clio)

Critical Insight: Deployment Overhang

Central Conclusion: The latitude granted to models in practice lags behind what they can handle.

METR estimates Claude Opus 4.5 can complete, with a 50% success rate, tasks that would take a human nearly 5 hours. But the 99.9th percentile turn duration in practice is ~42 minutes, far below that capability.
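
As a rough back-of-the-envelope check on the size of this gap, the two figures quoted above can be compared directly. This is only a sketch: "nearly 5 hours" is rounded up to 5, and the two numbers measure different things (human time to complete a task versus how long the agent's turn lasts), so the ratio is illustrative rather than a precise measure of the overhang.

```python
# Figures quoted in the text; variable names are illustrative.
metr_horizon_minutes = 5 * 60    # "nearly 5 hours" of human task time, rounded up
observed_p999_minutes = 42       # ~99.9th percentile turn duration in practice

ratio = metr_horizon_minutes / observed_p999_minutes
print(f"Task horizon is roughly {ratio:.1f}x the longest observed turns")
```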

Internal Results

  • August to December: success rate doubled on the most challenging tasks
  • Average human interventions per session: 5.4 → 3.3
  • Users achieved better outcomes while intervening less often
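
The drop in interventions can also be expressed as a relative change. The short calculation below uses only the two averages quoted above; the variable names are illustrative.

```python
# Average human interventions per session, as quoted in the text.
interventions_before = 5.4
interventions_after = 3.3

relative_drop = (interventions_before - interventions_after) / interventions_before
print(f"Interventions per session fell by about {relative_drop:.0%}")
```

That is a reduction of roughly two fifths in interventions per session over the same period in which success on the hardest tasks doubled.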

Recommendations

  • Model developers: build new forms of post-deployment monitoring
  • Product developers: design new human-AI interaction paradigms
  • Policymakers: understand how autonomy and risk are managed together

Classification

AI Agents, Claude Code, Human-AI Interaction, Autonomy, Deployment