Fully autonomous coding agents are gaining traction. You will increasingly hear claims like “this AI wrote 100x more lines of code than a human engineer in the same time.” While those numbers may sound impressive, they remind me of an older debate in software development.

Lines of code (LOC) were never a great metric for evaluating engineers. More code does not always mean better software. If anything, the opposite is often true.

I have long believed this:

Every new line of code adds to your tech debt. Every line of code removed brings peace of mind.

That is the lens I use when thinking about coding agents today.

Why LOC is the wrong benchmark

Comparing a coding agent to a human engineer based on LOC is like comparing writers based on word count. It says nothing about:

  • What the code does
  • How maintainable it is
  • Whether it solves the right problem
  • How it fits into the larger system

Yes, an AI might produce more lines of code faster. But is that really what we want?

What good engineers actually do

In my experience, the best engineers are not code churners. They optimize for understanding, not just output. And they share a few traits:

  • Context: They see the bigger picture including the user, the product, and the long-term impact of their choices. Their context carries across projects, teams, and domains.
  • Elegance: Their solutions are not just correct; they are clean. Not over-engineered. Not clever for the sake of it. Just simple and timeless.
  • Learning: They absorb lessons from past projects. They pick up new concepts when needed. And they grow by adapting to real-world feedback.

Applying the same lens to coding agents

If coding agents are to move beyond novelty and become real engineering partners, they must be evaluated on similar dimensions.

1. Context

Current models can handle context windows of over a million tokens. That is impressive. But compare that to a human with five years of domain knowledge and shared project history.

The goal should not just be a bigger context window. It should be intelligently managing and compressing context: surfacing what matters and skipping what does not.
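
As a rough illustration (a minimal sketch, not any real agent's implementation), the Python below ranks stored context chunks against the current task and keeps only what fits a token budget. The keyword-overlap scoring is a deliberately naive stand-in for embeddings or a learned ranker.

    # Minimal sketch: rank context chunks by relevance to the current task
    # and keep only what fits a token budget. The scoring is naive keyword
    # overlap; a real system would use embeddings or a learned ranker.

    def score(chunk: str, task: str) -> float:
        """Fraction of the task's words that also appear in the chunk."""
        task_words = set(task.lower().split())
        chunk_words = set(chunk.lower().split())
        return len(task_words & chunk_words) / max(len(task_words), 1)

    def compress_context(chunks: list[str], task: str, token_budget: int) -> list[str]:
        """Surface the most relevant chunks; skip the rest."""
        relevant = [c for c in chunks if score(c, task) > 0]
        ranked = sorted(relevant, key=lambda c: score(c, task), reverse=True)
        kept, used = [], 0
        for chunk in ranked:
            cost = len(chunk.split())  # crude token estimate
            if used + cost <= token_budget:
                kept.append(chunk)
                used += cost
        return kept

    if __name__ == "__main__":
        history = [
            "We chose Postgres for billing because of its transactional guarantees.",
            "The office coffee machine schedule changed last Tuesday.",
            "Billing retries must be idempotent; see the payments runbook.",
        ]
        print(compress_context(history, task="fix billing retry bug", token_budget=30))
        # Keeps the two billing notes, drops the coffee machine trivia.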

2. Elegance

This is where agents still struggle. Faced with complex requirements, they tend to produce overly elaborate, over-engineered solutions. Coding agents mirror the complexity of the ask: they try to prove they are good assistants by showing off, rather than pushing back on the request.
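
A hypothetical caricature makes the pattern concrete. Asked to deduplicate a list while preserving order, an over-eager agent might produce the strategy-pattern machinery below, when the simple, timeless answer is a single line.

    # What an over-engineered answer often looks like: abstractions
    # nobody asked for, mirroring the formality of the request.
    from abc import ABC, abstractmethod
    from typing import Iterable, List, TypeVar

    T = TypeVar("T")

    class DeduplicationStrategy(ABC):
        @abstractmethod
        def deduplicate(self, items: Iterable[T]) -> List[T]: ...

    class OrderPreservingDeduplicator(DeduplicationStrategy):
        def deduplicate(self, items: Iterable[T]) -> List[T]:
            seen: set = set()
            result: List[T] = []
            for item in items:
                if item not in seen:
                    seen.add(item)
                    result.append(item)
            return result

    # The elegant version: dicts preserve insertion order in Python 3.7+.
    def dedupe(items):
        return list(dict.fromkeys(items))

Both are correct. Only one of them is something a teammate will thank you for.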

3. Learning

If I discussed a pattern or a decision just a few days ago, I do not want to repeat myself. Good engineers remember decisions. They do not solve the same problem twice, and they do not make their teammates repeat themselves.

Agents need a way to remember and build on prior context across sessions and prompts: not just retrieve snippets, but learn, reason, and reuse.
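
To make that concrete, here is a minimal sketch of what cross-session memory could look like, assuming a plain JSON file as the store and naive keyword matching for recall. The file name and mechanics are illustrative placeholders, not any real agent framework's API.

    # Minimal sketch of cross-session memory: persist decisions to disk,
    # then recall them by topic before answering a new prompt.
    import json
    from pathlib import Path

    MEMORY_FILE = Path("agent_memory.json")  # hypothetical store

    def remember(topic: str, decision: str) -> None:
        """Record a decision under a topic in the persistent store."""
        memory = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}
        memory[topic] = decision
        MEMORY_FILE.write_text(json.dumps(memory, indent=2))

    def recall(prompt: str) -> dict[str, str]:
        """Return stored decisions whose topic appears in the new prompt."""
        if not MEMORY_FILE.exists():
            return {}
        memory = json.loads(MEMORY_FILE.read_text())
        return {t: d for t, d in memory.items() if t.lower() in prompt.lower()}

    if __name__ == "__main__":
        remember("error handling", "Use Result types, not exceptions, in the parser.")
        # Days later, in a fresh session:
        print(recall("How should the parser do error handling?"))

Real learning would need more than keyword lookup, but even this much would stop an agent from asking me to re-litigate last week's decisions.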

So, are coding agents useful?

Absolutely.

But if we expect them to replace good engineers, then the evaluation criteria need to shift. Speed and LOC are not enough. We need to ask:

  • Did the agent understand the problem?
  • Was the solution clean and maintainable?
  • Did it improve over time with feedback (not just from the model's pre- and post-training data)?

Until then, agents are useful tools. Not yet my team members.