Two developers on the same team get access to the same coding agent.
Same editor. Same model. Same repo access. Same company announcement about how this is going to change productivity.
Six months later, one of them is shipping work they could not have attempted alone. The other is producing more code and creating more review drag than before.
That difference is not explained by tool access. It is not explained by prompt tricks either.
AI agents amplify the developer's judgment loop. If you know how to frame the problem, load the right context, inspect the output, and test the failure modes, the agent gives you leverage. If you mostly accept plausible code, the agent gives you volume.
Volume and value are not the same thing.
The multiplier is not flat
There is enough evidence now to say AI can make developers faster in some settings. GitHub's Copilot research reported that developers using Copilot completed a controlled programming task 55% faster than developers without it.
That is real. It also does not settle the question that matters in production work.
Faster at what? Under whose judgment? With what verification loop? Inside what codebase? With what cost to the next person who has to maintain the result?
The honest read is not "AI makes every developer faster." It is that AI changes the shape of software work. Some tasks get compressed. Some review burden gets moved. Some mistakes become easier to generate at scale.
METR's 2025 study of experienced open-source developers found that, in that specific setting, early-2025 AI tools made participants take 19% longer on their assigned tasks. That does not prove AI makes experienced developers slower. It proves the impact is not a flat multiplier you can paste into a planning spreadsheet.
The leverage depends on the operator and the work.
The agent can write the route. It cannot know whether the route should exist.
Take a simple content API route.
An agent can generate a route that accepts a payload, writes a Markdown file, returns a 200, and passes a smoke test. That might be useful. It might also be dangerously incomplete.
A valuable developer does not stop at "it works." They ask the questions the demo will not answer:
- Is the write idempotent?
- What happens if the client retries after a timeout?
- Are draft, review, approval, and publish states protected?
- Is authorization mandatory on every mutation path?
- Are errors explicit enough for a caller to know whether the write happened?
- Does an update need to trigger downstream effects, cache invalidation, or search indexing?
- Can the route fail half-successfully and leave content in an ambiguous state?
The agent can help implement those decisions. It can also help test them. But it does not know which failures have already cost your team a weekend, a rollback, or a migration cleanup.
That is where the value gap opens.
A weaker workflow says: "Build me the route." Then it reviews whether the route exists.
A stronger workflow says: "Here is the state machine. Here are the invariants. Here are the failure modes I care about. Propose the route, then generate tests that try to break those invariants. Now explain which parts of this patch are risky."
Same agent. Different developer. Different result.
Prompting is the visible part. Judgment is the real part.
Prompting matters, but treating this as a prompt-engineering problem is too shallow.
The useful developer is not just better at phrasing requests. They know what context is missing. They know when the model is solving the wrong problem cleanly. They know when a clever abstraction is going to be a maintenance tax. They know when a passing test is only testing the happy path.
That knowledge shows up in the prompt, but it did not come from the prompt.
It came from building systems, breaking systems, debugging systems, reviewing other people's code, living with previous architectural decisions, and learning what "reasonable" code can hide.
Agents are excellent at reasonable answers. That is why judgment matters more, not less.
AI can raise the floor without raising everyone equally
I do think agents raise the floor.
A junior developer can generate examples, compare patterns, ask for explanations, and get unstuck faster. That is a real advantage. Used well, an agent is a curriculum that reacts to the work in front of you.
The danger is using the agent as an escape hatch from understanding.
A junior developer using AI well asks:
- Why did you choose this approach?
- What are the alternatives?
- What edge cases does this miss?
- What tests would fail if this assumption is wrong?
- Explain this patch like I have to maintain it next month.
A junior developer using AI poorly accepts a patch they cannot explain.
The issue is not junior status. The issue is whether the workflow builds judgment or skips it.
That distinction matters because the old learning path had friction built in. You got stuck. You read the docs. You traced the stack. You made a bad abstraction and paid for it later. You learned taste by colliding with consequences.
Agents remove a lot of that friction. That can accelerate learning, or it can remove the very pressure that would have built the next layer of skill.
The middle gets squeezed
The most uncomfortable group may not be absolute beginners. It may be competent mid-level developers who can ship features but have not yet developed strong engineering taste.
They know enough to move quickly. With agents, they move even faster. But speed can hide a plateau.
If the agent handles the hard parts before the developer understands them, the developer gets output without the learning event. They feel productive because they are productive. But the work that would have pushed them toward senior judgment gets outsourced too early.
That is the squeeze: more code, not necessarily more capability.
The senior developer's advantage is different. A strong senior does not merely ask the agent to produce code. They use it to explore alternatives, pressure-test assumptions, generate adversarial cases, and compress implementation after the direction is clear.
The agent saves time, but the senior's value is not the saved time. It is the quality of the decisions wrapped around that time.
Teams should evaluate the judgment loop, not just the output
This matters for engineering leads because AI-assisted work can make weak signals look strong.
A pull request can be larger. A demo can arrive faster. A backlog can move. None of that tells you whether the developer improved the system or increased future drag.
DORA's 2024 report is useful here because it separates local productivity from system outcomes. The report found AI adoption was associated with gains in individual productivity and flow, while also reporting tradeoffs around software delivery stability and throughput. That is the tension teams need to take seriously.
The question is not "Did AI help this developer produce more?"
The better questions are:
- Did they frame the problem correctly?
- Did they give the agent the right constraints?
- Did they inspect the generated code against the system's invariants?
- Did they catch the failure modes before review?
- Did they make the codebase easier or harder to reason about?
- Can they explain the tradeoffs in the patch without hiding behind the tool?
That is what AI-assisted developer performance should be measured against.
The valuable developer becomes more valuable
AI does not erase the developer skill gap. It makes the judgment gap easier to see.
The developers who compound with agents are not necessarily the ones using the most AI. They are the ones who redirect the saved time into higher-value work: understanding the system, sharpening constraints, testing assumptions, reviewing more carefully, and learning faster from each generated draft.
If you want more value from agents, improve the review loop before you chase a better prompt library.
They do not treat the agent as a replacement for taste. They treat it as a force multiplier for taste.
That is the core shift.
The market does not need developers who can generate the most code. It needs developers who can turn generated code into reliable systems.
Those were already different skills. Agents just made the difference harder to ignore.