Is it really "the end of software engineering"?

No, software engineering is not ending

A paper landed on my feed titled "The End of Software Engineering." I opened it ready to roll my eyes.

My answer to its question is no. Software engineering is not ending. It is moving, from writing the decision logic by hand to specifying, orchestrating, and verifying the systems that generate it.

Here is the gap that settles it. On isolated coding tasks, AI agents score above 80 percent. On the continuous, real-world evolution of a codebase, they fall to at most 38 percent.

IMPORTANT

That drop, from above 80 percent to at most 38 percent, is the whole story. Agents are real as augmentation. They are not yet real as autonomy.

The paper's own title change gives away the real story

The best evidence is the paper's revision history. The author got there before I did.

Version 1, arXiv:2606.05608v1 dated 4 June 2026, was "The End of Software Engineering: How AI Agents Are Fundamentally Restructuring the Software Paradigm."

Version 2, arXiv:2606.05608v2 dated 10 June 2026, became "Agentic Software: How AI Agents Are Restructuring the Software Paradigm." It dropped the two words doing all the dramatic work: "End" and "Fundamentally."

Same author. Same evidence. Same formal models. Softer claim.

The conclusion moved too. Version 1 ended "The old software engineering is ending; the new one has already begun." Version 2 ended "The old software engineering is not ending; it is growing into something larger."

NOTE

When a paper walks back its strongest framing within a week, the revision is not noise. It is a signal about how much the author trusted the headline once it was on the page.

The useful idea: capability outlasts code

Strip the drama and one genuinely useful idea remains. The durable asset is shifting from the code to the capability that writes it.

In traditional software, code carries the decisions. A human decides what the system should do, writes that logic down, and the system runs it. Every change means finding the right rule and editing it by hand.

An agent works the other way. The model is the reasoning engine, and code becomes scaffolding. The agent generates code to solve the task in front of it, runs it, and throws it away. What persists is the agent's capability, not the code it emitted along the way.

That is the shift worth keeping. Not "agents replace software," but "code stops being the thing you bank on."

If you want the formal version, the paper offers it. Traditional software is a triple:

$S = (C, D, E)$

$C$ is compute, $D$ is the fixed decision rules in the source code, and $E$ is the runtime. The load-bearing word is fixed: $D$ is set before any input arrives.
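To make "fixed" concrete, here is a minimal sketch of $D$ in the paper's $S = (C, D, E)$ sense. The shipping-fee function and its thresholds are hypothetical, invented for illustration. Every branch is written before any order arrives, and changing the behavior means finding the right branch and editing it by hand.

```python
def shipping_fee(order_total: float, country: str) -> float:
    """Hypothetical decision rules D: every branch is fixed in source code."""
    if country != "US":
        return 25.0  # flat international fee, chosen when the code was written
    if order_total >= 50.0:
        return 0.0   # free domestic shipping above a hard-coded threshold
    return 5.0       # default domestic fee

# The same input always takes the same path; new behavior needs a code change.
print(shipping_fee(60.0, "US"))  # 0.0
print(shipping_fee(20.0, "US"))  # 5.0
print(shipping_fee(60.0, "FR"))  # 25.0
```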

An agent is a different bundle:

$A = (M_{\text{LLM}},\ T,\ M_{\text{mem}},\ \Pi)$

The model $M_{\text{LLM}}$ is the reasoning engine, $T$ is a set of callable tools, $M_{\text{mem}}$ is a memory subsystem, and $\Pi$ is a planner. It runs as a loop: pick an action from the current state and memory, execute it, observe the new state, repeat.
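The loop is easier to see in code. This is a minimal sketch, not the paper's implementation. The four fields mirror $M_{\text{LLM}}$, $T$, $M_{\text{mem}}$, and $\Pi$, but the deterministic `stub_model` stands in for a real LLM call, and the names `Agent`, `increment`, and the step budget are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

Action = tuple[str, str]  # (tool name, argument); "finish" ends the loop

@dataclass
class Agent:
    model: Callable[[str, list[str]], Action]        # M_LLM: proposes the next action
    tools: dict[str, Callable[[str], str]]           # T: callable tools
    memory: list[str] = field(default_factory=list)  # M_mem: what happened so far
    max_steps: int = 10                              # hypothetical safety budget

    def plan(self, state: str) -> Action:
        # Pi: this planner just defers to the model; real planners may decompose goals.
        return self.model(state, self.memory)

    def run(self, state: str) -> str:
        for _ in range(self.max_steps):
            tool, arg = self.plan(state)   # pick an action from state and memory
            if tool == "finish":
                return arg
            state = self.tools[tool](arg)  # execute it and observe the new state
            self.memory.append(f"{tool}({arg}) -> {state}")
        return "stopped: step budget exhausted"

def stub_model(state: str, memory: list[str]) -> Action:
    """Stand-in for an LLM: keep incrementing until the state reaches 3."""
    return ("finish", f"done at {state}") if state == "3" else ("increment", state)

agent = Agent(model=stub_model, tools={"increment": lambda s: str(int(s) + 1)})
print(agent.run("0"))  # done at 3
print(agent.memory)    # ['increment(0) -> 1', 'increment(1) -> 2', 'increment(2) -> 3']
```

The step budget is there because nothing forces a model to emit `finish`; bounding the loop like this is a common safeguard.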

TIP

Keep one sentence and you have the paper's spine: the durable asset stops being the code and becomes the agent that writes it.

The evidence: strong in isolation, weak in sustained work

The argument is only as good as its numbers, and the paper brings two that matter.

The first is solid. On SWE-bench Verified, a benchmark of real GitHub issues, an open model, Lingma SWE-GPT 72B, resolves 30.20 percent of the issues, against 31.80 percent for GPT-4o. That is genuine progress on scoped, isolated tasks.

The second is where the paper is at its most honest, and it is the part the title buried. A benchmark called EvoClaw (Deng et al., arXiv:2603.13428) tests agents on continuous evolution: sustained work across a commit history, where errors pile up and every change has to preserve what already worked.

The result is a cliff.

xychart-beta
    title "EvoClaw: isolated tasks vs continuous evolution"
    x-axis ["Isolated tasks", "Continuous evolution"]
    y-axis "Success rate (%)" 0 --> 100
    bar [82, 38]


Why does continuous work break agents? The paper names four reasons, and every engineer will recognize them:

Context drift, as the codebase grows past what the agent can hold in view.
Error propagation, where one early mistake compounds downstream.
Technical-debt blindness, where the agent optimizes for finishing, not for living with the result.
Verification gaps, where the agent passes the tests while quietly shipping a semantic bug.
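Error propagation is the easiest of the four to put numbers on. Under a toy assumption that is mine, not the paper's (each change in a chain succeeds independently with the same probability, and a single failure spoils everything after it), the chance of getting through $n$ changes is $p^n$. This sketch is an intuition pump, not a model of EvoClaw; real errors are neither independent nor always fatal.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Toy model: probability that `steps` dependent changes all succeed,
    assuming each succeeds independently with probability `per_step`."""
    return per_step ** steps

for n in (1, 5, 10, 20):
    print(f"{n:>2} steps at 95% each -> {chain_success(0.95, n):.2f}")
# prints 0.95, 0.77, 0.60, 0.36
```

Even a step that succeeds 95 percent of the time leaves just over a third of 20-step chains intact, which shows how a strong isolated score can erode over sustained work.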

CAUTION

The paper also reports softer numbers, like a pilot with a "93 percent reduction in root-cause time." I take those as the paper reports them, not as settled fact. The hard benchmark, not the vendor anecdote, is what the cliff rests on.

Where the paper says this is heading

With that cliff in mind, turn to the four-stage roadmap the paper sketches, from today's assistants to fully autonomous ecosystems. It is a useful map, as long as you read the dates as forecasts rather than facts.

Stage	Era	Agent capability	Human role	Representative systems
I. Tool-augmented	2023-2025	Code completion, single-issue fixes	Author and reviewer	GitHub Copilot, Claude Code
II. Single-task autonomous	2025-2027	End-to-end feature building and debugging	Intent architect and auditor	Devin, OpenHands
III. Multi-agent teams	2026-2029	Coordinated swarms managing the full lifecycle	PM, architect, and auditor	LangChain orchestration, MetaGPT
IV. Self-evolving ecosystems	2028+	Autonomous discovery, learning, and adaptation	Goal setter and ethics governor	AGI assistants (prospective)

Stage I is where we actually live. Agents are strong assistants on well-scoped tasks, but the human still decomposes the problem, designs the architecture, and checks the result.

Stages II and III are plausible extrapolations, not finished products. Devin and OpenHands show agents can own a scoped task end to end, and multi-agent setups mirror how human teams split work across a product manager, an architect, and a reviewer.

WARNING

Notice where the most ambitious capability sits: stage IV, "self-evolving ecosystems," dated 2028 and beyond. That is the part doing the rhetorical heavy lifting, and it is exactly the part with no evidence behind it yet. The EvoClaw cliff is precisely why the jump from stage I to stage IV is not a straight line.

What this means for engineers and managers

If even half of the agentic framing holds, two things change for teams and one does not.

First, specification quality becomes a first-class skill. When the agent generates the logic, your leverage moves to how precisely you can state the goal, the constraints, and what "good" looks like. Vague tickets used to produce slow humans. Now they produce confident, wrong agents.
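One way to make "what good looks like" precise is to write it as executable checks rather than prose. The `TaskSpec` structure and the slug task are hypothetical, a minimal sketch rather than any tool's format. The point is that a prose-only ticket would let the lazy candidate through, while the executable acceptance checks reject it.

```python
from dataclasses import dataclass
from typing import Callable

Candidate = Callable[[str], str]

@dataclass
class TaskSpec:
    """Hypothetical ticket format: goal and constraints in prose, acceptance as code."""
    goal: str
    constraints: list[str]
    acceptance: list[Callable[[Candidate], bool]]

    def accepts(self, candidate: Candidate) -> bool:
        # A candidate is good only if every acceptance check passes.
        return all(check(candidate) for check in self.acceptance)

spec = TaskSpec(
    goal="Turn article titles into URL slugs",
    constraints=["lowercase only", "single hyphens between words"],
    acceptance=[
        lambda f: f("Hello World") == "hello-world",
        lambda f: f("  Trim me  ") == "trim-me",  # the edge case a vague ticket omits
    ],
)

def lazy_slug(title: str) -> str:
    return title.lower().replace(" ", "-")  # '  Trim me  ' -> '--trim-me--'

def careful_slug(title: str) -> str:
    return "-".join(title.lower().split())  # split() drops extra whitespace

print(spec.accepts(lazy_slug))     # False
print(spec.accepts(careful_slug))  # True
```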

Second, evaluation and observability stop being optional. You cannot supervise a system whose reasoning you cannot see. Tracing an agent's steps and catching the test-passing bug are now platform concerns, sitting where logging and monitoring landed a decade ago.
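Here is a small instance of the test-passing bug, using a hypothetical agent-written `median`. It passes the only test it was given, because that test uses an odd-length list. Comparing it against Python's `statistics.median` on a wider set of inputs exposes the semantic error. The functions are illustrative, not from the paper.

```python
import statistics

def median(xs: list[float]) -> float:
    """Hypothetical agent-written median: right for odd lengths, wrong for even ones."""
    ordered = sorted(xs)
    return ordered[len(ordered) // 2]  # bug: never averages the two middle values

def test_median_odd() -> None:
    # The only test in the suite, so the bug ships green.
    assert median([3, 1, 2]) == 2

def disagreements(cases: list[list[float]]) -> list[list[float]]:
    """Inputs where the candidate differs from a trusted reference implementation."""
    return [xs for xs in cases if median(xs) != statistics.median(xs)]

test_median_odd()
print(disagreements([[3, 1, 2], [5, 7], [1, 2, 3, 4]]))  # [[5, 7], [1, 2, 3, 4]]
```

A trusted reference is not always available, but the pattern carries over: check behavior against something independent of the code under test.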

What does not change yet is the need for humans on sustained, multi-commit work. The 80-to-38 gap says so plainly. Architecture, quality calibration, and governance stay ours.

That is still engineering. It just sits higher up the stack than typing the logic by hand.

How to read claims like this

This paper is a good reminder to separate three things that usually get blended.

There are verified facts, like the SWE-bench numbers, which trace to a primary source. There are forecasts, like roadmaps that put "self-evolving ecosystems" in 2028 and beyond, which are arguments about the future, not evidence about the present. And there are vendor-flavored claims, like a self-patching agent, which deserve a raised eyebrow until someone independent checks them.

It also helps to read the provenance. This is a single-author position paper from an author whose affiliation is an investment firm, not a software-research lab. None of that makes it wrong. It just tells you to weigh the framing as a sharp opinion, not as consensus.

The work is moving, not disappearing

So, is it really the end of software engineering? No. The author answered that himself, six days later, by changing "is ending" to "is not ending; it is growing into something larger."

The work is not disappearing. It is relocating. Less of it is writing every rule by hand. More of it is specifying intent, orchestrating the systems that generate the rules, and verifying what they produce.

The numbers say the same thing. Above 80 percent on isolated tasks, 38 percent on continuous evolution. Agents are real as augmentation, not yet as autonomy.

The paper made one genuinely honest move, and it was not a claim or a benchmark. It was the title. The strong version did not survive contact with its own author. That is usually the tell.