AI Coding Productivity: 10%, Not 10x
The Number Nobody Wants to Hear
A few weeks ago, I wrote about running my entire workday through an AI agent [1] — meetings, research, CRM, content creation. Eight hours of productive work, not a single line of code. The response was overwhelmingly positive. But one comment stuck with me: “If AI agents are this good, why isn’t my team shipping 10x more?”
The answer is now backed by data from multiple independent studies — and it’s not what the vendor pitches suggest.
The Evidence
The DX longitudinal study tracked 40 companies from November 2024 to February 2026 [2]. AI usage increased by an average of 65%. PR throughput? Up by 9.97%. Not 10x. Not 2x. Ten percent.
This isn’t an outlier. Every major study converges on the same range:
| Study | Finding |
|---|---|
| DX (40 companies, 2024–2026) | PR throughput +9.97% despite 65% more AI usage [2] |
| METR (experienced OSS developers) | Initially 19% slower with AI; later data shows modest speedup [3] |
| Google DORA (5,000 professionals) | Self-reported productivity up, but software delivery instability increased [4] |
| Multitudes (500+ developers) | 27% more PRs merged, but 19.6% more out-of-hours work [5] |
| Anthropic (internal study) | Engineers using AI scored 17% lower on knowledge quizzes [6] |
One developer in the DX study put it perfectly: “The easy tasks are a little easier. The tedious tasks are a little less annoying. A four-day task might take three. But that doesn’t mean I’m shipping 3x more PRs.”
Why 10% and Not 10x?

AI accelerates the pit stop. The race is the 57 laps in between.
The fundamental reason: writing code was never the bottleneck.
Software delivery is planning, alignment, scoping, architecture decisions, code review, testing, deployment, and handoffs. The human coordination parts. AI coding assistants accelerate the typing — which was already the fastest part of the process.
It’s like giving a Formula 1 team a faster pit stop crew. Helpful? Yes. Race-changing? Only if pit stops were the bottleneck. They’re not — it’s the 57 laps in between.
To be more precise: developers spend roughly 30-50% of their time writing code, and AI accelerates that portion significantly. But the other 50-70% (planning, review, coordination, deployment) remains largely untouched. Even a large speedup on less than half of the work yields only a modest speedup overall.
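This is Amdahl's law applied to the delivery pipeline. A quick sketch makes the arithmetic concrete (the 40% coding share and the 2x local speedup are illustrative assumptions, not figures from the studies):

```python
# Amdahl's law: overall speedup when only a fraction of the work is accelerated.
def overall_speedup(accelerated_fraction: float, local_speedup: float) -> float:
    """Whole-pipeline speedup when `accelerated_fraction` of the time
    runs `local_speedup` times faster and the rest is unchanged."""
    return 1 / ((1 - accelerated_fraction) + accelerated_fraction / local_speedup)

# Illustrative assumption: coding is 40% of delivery time, and AI doubles coding speed.
speedup = overall_speedup(0.40, 2.0)
print(f"{(speedup - 1) * 100:.0f}% faster overall")  # -> 25% faster overall
```

Even with an infinitely fast coding step, the ceiling is 1/0.6, about 1.67x. Modest pipeline-wide gains are the expected outcome, not a surprise.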
The Harness report (March 2026) found exactly this: AI is accelerating code production, but DevOps maturity isn’t keeping pace [7]. More code, same deployment pipeline, same review process. The bottleneck just moved.
The Hidden Costs
The productivity story gets worse when you look at second-order effects:
Working hours expand, not contract. The Multitudes study found a 19.6% rise in out-of-hours commits alongside the 27% increase in PRs. AI didn’t free up time — it raised expectations. The HBR study at a US tech company found employees began using AI during lunch, breaks, and meetings [8].
Skills atrophy. Anthropic’s own research found engineers using AI scored 17% lower on knowledge quizzes about the software they worked with — with the biggest gap in debugging questions [6]. You could argue this is rational delegation, like offloading arithmetic to calculators. But debugging isn’t memorization — when AI-generated code breaks, the developer needs to understand it to fix it. That comprehension gap is the real risk.
Instability increases. Google’s DORA report found that software delivery instability — rollbacks and patches after release — increased with AI use [4]. More code shipped faster, but not necessarily better code.
Executives overestimate gains. A March 2026 study found that 89% of executives say AI boosts productivity, yet the net time saved is just 16 minutes per week: executives believe they save 4.6 hours, but spend 4 hours and 20 minutes validating AI-generated outputs [9].
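The arithmetic checks out, and is worth seeing spelled out (a sketch using the study's own figures):

```python
# Net weekly time saved: perceived savings minus time spent validating AI outputs.
perceived_savings_min = 4.6 * 60   # 4.6 hours believed saved -> 276 minutes
validation_min = 4 * 60 + 20       # 4 hours 20 minutes validating -> 260 minutes
net_min = perceived_savings_min - validation_min
print(f"Net time saved: {net_min:.0f} minutes/week")  # -> Net time saved: 16 minutes/week
```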
The Variance Is the Story
The averages mask enormous variance. A 6x output gap exists between top-quartile AI users and everyone else. Low-performing teams benefit 4x more from AI than high-performing ones [10]. The difference isn’t the tool — it’s organizational enablement: training, process adaptation, and identifying where AI actually helps.
As Rob Bos commented on Sam Newman’s viral post about this data [11]: “Companies hand out licenses without training or process changes, then wonder about low ROI. AI just highlights the standard issues in your internal practices, just like Agile did, and DevOps.”
Where the Real Gains Are
Here’s the contrarian take: the biggest productivity gains from AI aren’t in coding at all. They’re in the non-coding work that surrounds it.
In my own experience [1], the highest-value agent use cases are: meeting preparation (pulling context from Slack, CRM, LinkedIn), research synthesis (turning 10 sources into a structured answer), content creation (drafting from notes), and administrative automation (expense reports, activity logging). These are tasks where AI replaces hours of context-gathering, not seconds of typing.
But even within coding, the 10% average hides a crucial distinction: vibe coding vs spec-driven development [13].

The human designs; the machine builds. Spec-driven development changes the equation.
Most teams hand an AI agent a vague prompt and hope for the best — “build me a login page.” That’s vibe coding, and it produces the mediocre results the studies measure. The teams seeing outsized gains do something different: they break work into precise specifications, define acceptance criteria upfront, and use the agent as an executor, not an architect. IBM calls this emerging discipline “Agentic Engineering” [14] — the human becomes the architect and orchestrator; the agent handles the boilerplate.
Vibe coding loops endlessly. Spec-driven development validates and ships.
This is where spec-driven development changes the equation. When you carefully prepare and decompose work for an agent — clear inputs, expected outputs, edge cases defined — the agent can execute reliably. The real bottleneck today isn’t the agent’s capability; it’s the human attention required to prepare, review, and steer. That preparation cost is real, but it’s a skill that improves with practice. The DX study’s 6x gap between top and bottom quartile users likely reflects this: the top quartile has learned how to work with agents effectively.
To be transparent: there’s no controlled study yet directly comparing spec-driven vs vibe coding productivity. The argument is logical and supported by practitioner reports, but the rigorous measurement is still missing. What we do know is that the variance between users is enormous — and the differentiator appears to be how they structure work for the agent, not which agent they use.
Beyond Efficiency: The Expansion Effect

The flywheel: more builders → more experiments → more learning → better tools → more builders.
But here’s what the productivity studies miss entirely: they only measure efficiency of existing work. They don’t measure the new work that wasn’t possible before.
When the effort to get from idea to first working prototype drops from weeks to hours, something fundamental changes. More experiments get run. More ideas get tested. People who couldn’t code before can now build functional tools. GitHub added 36 million new developers in 2025 alone — one every second [15]. 80% of low-code platform users come from non-IT backgrounds [16]. The barrier to creation is dropping, and the volume of software being built is expanding.
This isn’t just “more code” — it’s a flywheel.
The expansion flywheel — productivity studies don’t capture this.
More software built → more learning → better patterns → more ambitious projects → more software built. The value isn’t that existing developers ship 10% more PRs. It’s that the total surface area of what gets built expands dramatically — and from that expansion, genuinely great things emerge that wouldn’t have been attempted otherwise.
The expansion has a dark side too: GitGuardian reported an 81% surge in leaked secrets as less experienced developers build without security training [17]. More builders without guardrails means more risk. But the net effect — dramatically more people able to turn ideas into working software — is a structural shift that productivity studies don’t capture.
The Sia Partners analysis frames this as the “vibe coding productivity paradox” [12]: AI systems can generate code at unprecedented speed, but organizational productivity gains remain modest until companies redesign workflows, trust mechanisms, and governance to operate with autonomous AI systems.
Is 10% Actually an Underestimate?
The 10% finding deserves scrutiny from the other direction too. There are reasons to believe it understates the real impact:
The studies measure the wrong thing. PR throughput captures one output channel. If AI helps developers write better-scoped PRs, produce fewer bugs, or spend less time on code review, none of that shows up in PR count. Anthropic, a company that has optimized its workflows for AI-assisted development, reports 67% more PRs per engineer in its internal data [6]. That's not 10%.
The measurement window captures the learning curve, not the steady state. The METR study went from -19% (early 2025) to +18% (late 2025) — a 37-percentage-point swing in months. Agentic coding tools matured in late 2025. The DX study window (Nov 2024–Feb 2026) is measuring adoption, not proficiency.
Non-coding productivity is invisible. Every study focuses on coding output. But developers spend 50-70% of their time on non-coding work — and AI is increasingly effective there too. If an agent saves 2 hours/day on meeting prep, research, and documentation, that’s a significant productivity gain that no developer study captures.
The average masks the top quartile. The 6x gap between top and bottom quartile users means the best teams are seeing dramatically higher gains. The 10% average is dragged down by organizations that distributed licenses without changing workflows — the equivalent of buying a gym membership and never going.
So is it 10%? For the average team, measuring PR throughput, in early 2026 — yes. For teams that have adapted their workflows, measuring total delivery impact, with mature tools — likely much higher. The question is how fast the average catches up to the top quartile.
What This Means for Leaders
If you’re evaluating AI coding tools for your team:
- Expect 10%, plan for 10%. Budget and ROI calculations based on 2-10x gains will disappoint. 10% is real and valuable — but it’s an incremental improvement, not a transformation.
- Measure delivery, not output. More PRs ≠ more value. Track cycle time, deployment frequency, and change failure rate — not just lines of code or PR count.
- Invest in enablement, not just licenses. The variance between top and bottom quartile users is 6x. Training and process adaptation matter more than tool selection.
- Watch for skill atrophy. Especially for junior developers. The 17% knowledge gap is a leading indicator of future debugging and maintenance problems.
- Look beyond coding. The biggest ROI may be in the non-coding parts of the SDLC — and in knowledge work more broadly.
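"Measure delivery, not output" can be made concrete. The DORA-style metrics in the second bullet fall straight out of a deployment log (a sketch; the records and field layout are hypothetical):

```python
from datetime import datetime, timedelta

# Hypothetical deployment log: (deployed_at, work_started_at, caused_failure)
deploys = [
    (datetime(2026, 3, 2),  datetime(2026, 2, 27), False),
    (datetime(2026, 3, 4),  datetime(2026, 3, 1),  True),
    (datetime(2026, 3, 9),  datetime(2026, 3, 5),  False),
    (datetime(2026, 3, 11), datetime(2026, 3, 10), False),
]

# Cycle time: average time from work started to deployed.
cycle_time = sum(((d - s) for d, s, _ in deploys), timedelta()) / len(deploys)

# Deployment frequency: deploys per week over the observed window.
window_days = (deploys[-1][0] - deploys[0][0]).days or 1
deploys_per_week = len(deploys) / (window_days / 7)

# Change failure rate: share of deploys that triggered a rollback or patch.
failure_rate = sum(f for _, _, f in deploys) / len(deploys)

print(f"cycle time: {cycle_time.days} days | "
      f"{deploys_per_week:.1f} deploys/week | "
      f"change failure rate: {failure_rate:.0%}")
```

None of these three numbers moves just because more PRs were merged, which is exactly why they make a better dashboard for AI-tool ROI than raw output counts.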
💬 What’s your team’s actual experience with AI coding tools? Are you seeing 10% or something different?
Sources:
[1] My earlier post on running an SA workday through an AI agent — “The Coding Agent That Doesn’t Code”: schristoph.online
[2] DX — “AI productivity gains are 10%, not 10x” (March 2026): newsletter.getdx.com
[3] METR — “Developer Productivity Experiment Update” (February 2026): metr.org
[4] Google DORA Report 2025 — software delivery instability findings
[5] Multitudes — Developer productivity study (late 2025)
[6] Anthropic — “AI Assistance and Coding Skills” (January 2026): anthropic.com
[7] Harness — “AI Coding Accelerates Development, DevOps Maturity Isn’t Keeping Pace” (March 2026): prnewswire.com
[8] HBR — “AI Doesn’t Reduce Work, It Intensifies It” (February 2026): hbr.org
[9] Research on executive AI productivity perception (March 2026): prnewswire.com
[10] GlobeNewsWire — “AI Helps Low-Performing Engineering Teams 4x More” (March 2026): globenewswire.com
[11] Sam Newman LinkedIn post on DX study (March 2026): linkedin.com
[12] Sia Partners — “Fixing the Vibe Coding Productivity Paradox” (March 2026): sia-partners.com
[13] Turing Post — “From Vibe Coding to Spec-Driven Development” (March 2026): turingpost.substack.com
[14] IBM — “What is Agentic Engineering?” (March 2026): ibm.com
[15] GitHub / InfoQ — “36 million new developers joined GitHub in 2025” (March 2026): infoq.com
[16] SQ Magazine — “80% of low-code platform users come from non-IT backgrounds” (March 2026): sqmagazine.co.uk
[17] GitGuardian — “81% surge in AI-service leaked secrets” (March 2026): citybiz.co