
Autonomy Score

Agent Behavior

36 sessions with data in past 30d (100 total)

Count: 36
Average: 1.90 (↓ 0.20 vs prior 30d)
P10: 0.40 (↓ 0.11 vs prior 30d)
P50: 1.45 (↓ 0.10 vs prior 30d)
P90: 3.00 (↓ 1.26 vs prior 30d)
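For reference, summary values like these (count, average, P10/P50/P90) can be derived from a list of per-session scores with the standard library; the score list below is hypothetical, not the actual session data behind this page.

```python
# Sketch of deriving count/average/percentile summary stats from
# per-session autonomy scores. The input scores are made up.
from statistics import mean, quantiles

def summarize(scores):
    """Return count, average, and P10/P50/P90 for a list of scores."""
    q = quantiles(scores, n=10)  # nine cut points: P10 .. P90
    return {
        "count": len(scores),
        "average": round(mean(scores), 2),
        "p10": round(q[0], 2),
        "p50": round(q[4], 2),
        "p90": round(q[8], 2),
    }

stats = summarize([0.2, 0.3, 0.4, 1.4, 1.5, 2.1, 3.0, 4.8, 8.3, 9.7])
```

Note that `statistics.quantiles` interpolates between sample points (exclusive method by default), so percentile values need not coincide with any single session's score.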

Trend

Distribution

0.0–2.0: 26
2.0–4.0: 7
4.0–6.0: 1
6.0–8.0: 0
8.0–10.0: 2

Notable Sessions

Highest
feat/feature-flags: 9.70
feat/keyboard-shortcuts: 8.30
optimize/redis-pool: 4.80
Lowest
optimize/cold-start: 0.20
fix/ws-memory-leak: 0.30
refactor/test-fixtures: 0.40

About This Metric

Autonomy Score

What It Measures

The ratio of assistant messages to human messages across sessions correlated to a PR. This measures how much work the agent does independently for each human intervention.

Why It Matters

A core promise of agentic coding is that the developer provides high-level direction while the agent handles implementation details. Autonomy Score quantifies this: a score of 5.0 means the agent produces 5 messages for every 1 human message, indicating the agent is executing multi-step workflows with minimal hand-holding.

Low autonomy scores suggest the developer is micromanaging — issuing individual commands rather than letting the agent plan and execute. This could indicate trust issues, poor prompting habits, or tasks that genuinely require tight human oversight.

Tracking autonomy over time reveals whether developers are learning to delegate effectively to the agent.

How It's Calculated

autonomy_score = assistant_message_count / human_message_count

Message counts are summed across all sessions correlated to the PR before dividing (a ratio of sums, not an average of per-session ratios). Returns null if there are no human messages.

Unlike Iteration Depth (which counts human turns as a raw number), Autonomy Score normalizes against the agent's work output — a session with 3 human turns and 30 agent turns is very different from 3 human turns and 5 agent turns.
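The calculation above can be sketched in a few lines; the session records and their field names here are assumptions for illustration, not the actual data schema.

```python
# Minimal sketch of the Autonomy Score calculation: sum message
# counts across all sessions correlated to a PR, then divide.
# Field names ("human_messages", "assistant_messages") are assumed.

def autonomy_score(sessions):
    """Ratio of total assistant messages to total human messages.

    Returns None (the "null" case above) when there are no
    human messages across the correlated sessions.
    """
    assistant = sum(s["assistant_messages"] for s in sessions)
    human = sum(s["human_messages"] for s in sessions)
    if human == 0:
        return None
    return assistant / human

score = autonomy_score([
    {"human_messages": 3, "assistant_messages": 30},
    {"human_messages": 2, "assistant_messages": 10},
])  # 40 / 5 = 8.0
```

Summing before dividing weights each session by its size, so one long autonomous session is not washed out by several short back-and-forth ones.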

Data Sources Required

  • Claude Code session data — Human message count and assistant message count per session.