Anthropic Research ⭐⭐⭐⭐⭐

Measuring AI Agent Autonomy in Practice

Anthropic • February 18, 2026

Executive Summary

Anthropic analyzed millions of human-agent interactions across Claude Code and the public API to understand how much autonomy people actually grant agents. The findings reveal a significant deployment overhang: models are capable of more autonomy than they exercise in practice.

Key Findings

Claude Code is working autonomously for longer.
In the longest-running sessions (99.9th percentile), the time Claude Code works autonomously nearly doubled, from under 25 minutes to over 45 minutes. The increase is smooth across model releases, which suggests it is not purely a matter of model capability.
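A tail statistic such as the 99.9th percentile can be computed with Python's standard library. The sketch below is illustrative only: the durations are synthetic, the helper name `p999` is hypothetical, and the interpolation method (the `statistics.quantiles` default) is an assumption, since the summary does not say how the percentile was calculated.

```python
import statistics

def p999(turn_durations_min):
    """Return the 99.9th percentile of a list of durations (minutes)."""
    # n=1000 yields 999 cut points; the last one is the 99.9th percentile.
    return statistics.quantiles(turn_durations_min, n=1000)[-1]

# Synthetic data, invented for illustration: most turns are short,
# and 20 out of 10,000 are long.
durations = [0.5] * 9980 + [40.0] * 20

median_min = statistics.median(durations)
tail_min = p999(durations)
print(f"median = {median_min} min, 99.9th percentile = {tail_min} min")
```

The median here stays at half a minute while the tail value reflects the rare long runs, which is why a high percentile is the natural lens for "longest-running" behavior.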

User Experience Patterns

New users: ~20% of sessions use full auto-approve
Experienced users (750+ sessions): >40% use full auto-approve
Experienced users also interrupt more often, though: rather than approving each action, they intervene only when needed
Claude Code pauses for clarification more often than humans interrupt it, about twice as often on complex tasks
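The auto-approve comparison above amounts to a per-group share. A minimal sketch of that aggregation follows, assuming hypothetical session records of the form `(prior_sessions, used_full_auto_approve)`. The 750-session threshold comes from the summary; the cutoff for "new" users is an assumption, because the summary does not define it, and the sample records are invented.

```python
from collections import defaultdict

EXPERIENCED_MIN_SESSIONS = 750  # threshold quoted in the summary
NEW_MAX_SESSIONS = 50           # assumption: "new" is not defined in the summary

def bucket(prior_sessions):
    """Assign a user to an experience bucket by prior session count."""
    if prior_sessions >= EXPERIENCED_MIN_SESSIONS:
        return "experienced"
    if prior_sessions < NEW_MAX_SESSIONS:
        return "new"
    return "intermediate"

def auto_approve_share(sessions):
    """Return the fraction of sessions using full auto-approve, per bucket."""
    totals = defaultdict(int)
    auto = defaultdict(int)
    for prior_sessions, used_auto in sessions:
        name = bucket(prior_sessions)
        totals[name] += 1
        auto[name] += int(used_auto)
    return {name: auto[name] / totals[name] for name in totals}

# Invented records, chosen so the shares mirror the ~20% and >40% pattern.
sample = [
    (3, True), (10, False), (20, False), (40, False), (45, False),
    (800, True), (900, True), (1200, False), (2000, False), (5000, False),
]
shares = auto_approve_share(sample)
print(shares)
```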

Risk Profile

Most agent actions are low-risk and reversible
Software engineering accounts for nearly 50% of agentic activity
Emerging usage in healthcare, finance, and cybersecurity
Agents are used in risky domains, but not yet at scale

Methodology

Definition: An agent is "an AI system equipped with tools that allow it to take actions"
Studied both Claude Code (depth) and the public API (breadth)
Privacy-preserving infrastructure (Clio)

Critical Insight: Deployment Overhang

Central Conclusion: The latitude granted to models in practice lags behind what they can handle.

METR estimates that Claude Opus 4.5 can complete, with a 50% success rate, tasks that would take a human nearly 5 hours. In practice, however, the 99.9th-percentile turn duration is ~42 minutes, far below that capability.
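The size of the gap can be put in rough numbers. The sketch below treats "nearly 5 hours" as 300 minutes and "~42 minutes" as 42, both roundings made here for illustration. The two figures measure different things (human task time versus observed agent turn time), so the ratio is only indicative. The function name `overhang_ratio` is hypothetical.

```python
def overhang_ratio(capability_min, observed_min):
    """Ratio of an estimated capability horizon to an observed duration."""
    if observed_min <= 0:
        raise ValueError("observed_min must be positive")
    return capability_min / observed_min

metr_horizon_min = 5 * 60   # "nearly 5 hours", rounded up to 300 minutes
observed_p999_min = 42      # "~42 minutes" from the summary

ratio = overhang_ratio(metr_horizon_min, observed_p999_min)
print(f"capability horizon is about {ratio:.1f}x the observed tail")  # about 7.1x
```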

Internal Results

From August to December: success rate doubled on the most challenging tasks
Average human interventions per session: 5.4 → 3.3
Users are achieving better outcomes while needing to intervene less often
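For scale, the drop in interventions can be expressed as a relative change. This is simple arithmetic on the two figures quoted above; the helper name `relative_change` is hypothetical.

```python
def relative_change(before, after):
    """Fractional change from `before` to `after` (negative means a decrease)."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before

change = relative_change(5.4, 3.3)
print(f"interventions per session: {change:+.1%}")  # -38.9%
```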

Recommendations

Model developers: need new forms of post-deployment monitoring
Product developers: need new human-AI interaction paradigms
Policymakers: need to understand how autonomy and risk are managed together

Classification

AI Agents, Claude Code, Human-AI Interaction, Autonomy, Deployment