Opinion April 2026 · 5 min read

Code Is Easy Now. Intent Isn't.

Written by Anass R. · Sr. QA Automation Engineer · DoQALand

What happens when writing code is easy but knowing what to write is not?

[Image: an architect's drafting table viewed from above, with blueprints, a triangle ruler, a graphite pencil, and rolled scrolls framing an empty parchment sheet in the center.]

First, what changed?

For decades, the hard problem in software engineering was simple to describe: writing code correctly, at speed, at scale.

Senior engineers were expensive because they carried something rare — judgment. They knew which corner to cut and which to never touch. They knew “we don’t do it that way here” without being told.

Then AI coding agents arrived. Tools like GitHub Copilot, Claude Code, and OpenAI Codex can now generate thousands of lines of working code in minutes. The execution bottleneck largely dissolved.

But the complexity didn’t disappear. It moved.

Where did the complexity go?

Before: how do we write this correctly?
Now: how do we specify this precisely enough that an agent writes it correctly — and how do we know when it didn’t?

That second question is structurally harder. And most teams haven’t caught up to it yet.

When a human developer writes a bug, the bug is in the code. Visible. Traceable. Fixable. When an AI agent misunderstands your intent, something different happens. The agent executes perfectly — against the wrong target. The tests pass. Everything looks green.

Until it doesn’t.

A bug in code is visible. A bug in your specification is silent. That asymmetry is the core challenge of the AI era.
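
A small, hypothetical sketch makes the asymmetry concrete. The requirement, the function name discount, and the threshold are all invented for illustration. The written spec says "orders over 100 get 10% off"; the unwritten intent was "100 or more". The agent reads "over" as strictly greater than, writes tests from the same reading, and everything passes.

```python
# Hypothetical example: names and numbers are invented for illustration.
# Written spec:     "orders over 100 get 10% off".
# Unwritten intent: an order of exactly 100 should also qualify.


def discount(order_total: float) -> float:
    """Agent's implementation: reads "over 100" as strictly greater than 100."""
    if order_total > 100:
        return round(order_total * 0.10, 2)
    return 0.0


def agent_written_tests() -> bool:
    """Tests derived from the same reading of the spec, so they all pass."""
    return (
        discount(150) == 15.0
        and discount(100) == 0.0
        and discount(50) == 0.0
    )


def intent_check() -> bool:
    """What the product owner actually meant; no existing test encodes this."""
    return discount(100) == 10.0


print("agent tests pass:", agent_written_tests())
print("intent satisfied:", intent_check())
```

The code is not wrong relative to what it was told. The failure lives in the gap between the sentence that was written and the behavior that was meant, and no check in the suite looks there.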

What is a “harness”?

You hire a brilliant contractor. Fast, technically excellent. But they don’t know your codebase or what “good” means in your context. What saves you isn’t micromanaging them — it’s the environment you put them in: your onboarding docs, your review process, your team norms, your CI pipeline.

That environment is the harness.

In AI agent terms, the harness is everything that shapes the agent’s behavior except the model itself.¹ It has two sides:

Guides (before the agent acts) — rules, conventions, and context fed upfront. An AGENTS.md with your coding standards. Architecture descriptions. Instructions on how to write tests for your system. These increase the probability the agent gets it right the first time.¹

Sensors (after the agent acts) — automated checks that catch problems before they reach a human. Linters, type checkers, structural tests, AI review agents. These come in two flavors: computational (fast, deterministic — a test either passes or fails) and inferential (AI checking AI — slower but able to catch semantic issues no static tool can).¹

Together they form a feedback loop. Without guides, the agent repeats the same mistakes. Without sensors, nobody knows if the rules are working.¹
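
The loop can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any real tool implements it: the Harness class, the no_print_sensor check, and stand_in_agent (a fake agent that only follows rules it has been given) are all hypothetical names invented here.

```python
from dataclasses import dataclass
from typing import Callable, List

# A sensor inspects generated code and returns a list of problems (empty = pass).
Sensor = Callable[[str], List[str]]


def no_print_sensor(code: str) -> List[str]:
    """Computational sensor: fast, deterministic, pass or fail."""
    return ["use logging instead of print()"] if "print(" in code else []


@dataclass
class Harness:
    guides: List[str]      # context fed to the agent before it acts
    sensors: List[Sensor]  # checks run after it acts

    def context_for(self, task: str) -> str:
        return "\n".join(self.guides + [f"Task: {task}"])

    def check(self, code: str) -> List[str]:
        return [p for sensor in self.sensors for p in sensor(code)]

    def learn(self, problems: List[str]) -> None:
        # Feedback loop: a sensor finding becomes an upfront guide.
        for problem in problems:
            rule = f"Rule: {problem}"
            if rule not in self.guides:
                self.guides.append(rule)


def stand_in_agent(context: str) -> str:
    """Stand-in for a real coding agent; it only obeys rules it was given."""
    if "use logging" in context:
        return 'import logging\nlogging.info("done")'
    return 'print("done")'


harness = Harness(guides=[], sensors=[no_print_sensor])
history = []
for attempt in range(2):
    code = stand_in_agent(harness.context_for("report completion"))
    problems = harness.check(code)
    history.append(problems)
    harness.learn(problems)

print(history)
```

On the first attempt the sensor flags the print call; the finding is promoted to a guide, and the second attempt comes back clean. Remove the sensor and the mistake is never noticed; remove the learn step and it is flagged on every run.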

What sensors actually check for: code quality (easiest — existing tooling handles most of it), architecture fitness (harder, but enforceable with structural tests), and functional behavior (the unsolved one — most teams just run AI-generated tests on AI-generated code and hope for the best).¹
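
As an illustration of the middle category, a structural test can be as small as a check on import statements. The sketch below uses Python's standard ast module; the layering rule (code in a domain layer must not import from an infrastructure package) and all package names are assumptions made up for this example.

```python
import ast
from typing import Dict, Set

# Hypothetical layering rule: the "domain" layer must not import "infrastructure".
FORBIDDEN: Dict[str, Set[str]] = {"domain": {"infrastructure"}}


def imported_top_level_packages(source: str) -> Set[str]:
    """Collect the top-level package of every absolute import in the source."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found


def layering_violations(layer: str, source: str) -> Set[str]:
    """Return the forbidden packages that this layer's source imports."""
    return imported_top_level_packages(source) & FORBIDDEN.get(layer, set())


good = "from domain.models import Order\nimport decimal\n"
bad = "from infrastructure.db import session\n"

print(layering_violations("domain", good))
print(layering_violations("domain", bad))
```

A check like this is deterministic and cheap, which is why architecture fitness is enforceable. Nothing comparable exists for the third category: no parser can tell whether the behavior matches what was meant.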

OpenAI proved it works. And also proved how hard it is.

Five months. Three to seven engineers. One million lines of code. Zero written by humans.² Roughly 1,500 merged PRs. An estimated 10x productivity gain.²

But early progress was slower than expected — not because the AI was incapable, but because the environment was underspecified.² The agents didn’t have the context to act toward high-level goals.

Their conclusion: “Our most difficult challenges now center on designing environments, feedback loops, and control systems.”²

Not the model. The harness.

They also spent 20% of every week cleaning up AI-generated drift manually — until they automated that too with background cleanup agents. The problem didn’t go away. It just got managed.²

This is now an organizational problem

At the team level, harness engineering is hard. At the org level, it becomes a different problem.

Three things break at scale: standardization vs. autonomy (a top-down harness gets ignored; a bottom-up one fragments), ownership (when an agent follows all the rules and still ships broken behavior — who is responsible?), and drift (teams fork templates, fix locally, never contribute back — six months later you have fifteen variants of one standard).
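
Drift, at least, is measurable. One rough way to count variants, assuming each team keeps its own copy of a shared guide file, is to fingerprint the normalized content and group teams by fingerprint. The team names and file contents below are invented for illustration.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def fingerprint(text: str) -> str:
    """Hash the content after dropping blank lines and edge whitespace,
    so purely cosmetic edits do not count as drift."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def group_variants(copies: Dict[str, str]) -> List[List[str]]:
    """Group team names by identical (normalized) guide content."""
    groups = defaultdict(list)
    for team, text in copies.items():
        groups[fingerprint(text)].append(team)
    return sorted(sorted(teams) for teams in groups.values())


# Hypothetical team copies of one shared standard.
copies = {
    "payments": "Use pytest.\nNo print().\n",
    "search": "Use pytest.\n\nNo print().  \n",  # whitespace only: same variant
    "mobile": "Use pytest.\nNo print().\nSkip tests for UI code.\n",  # local fork
}

variants = group_variants(copies)
print(len(variants), "variants:", variants)
```

A count like this says nothing about which variant is right, only that the standard has split. That is still more than most organizations know today.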

This creates a role that doesn’t exist yet in most organizations: someone who treats the harness as shared infrastructure, measures its coverage, and translates organizational standards into things an agent can actually act on.

The uncomfortable truth: most of us are stuck in the middle

Most companies using AI coding tools today are paying the cost of both worlds simultaneously.

We’re generating code with agents. We’re testing it manually. We’re catching things in review that sensors should have caught. We’re writing AGENTS.md files on instinct. We’re moving faster — but without proportionally more confidence.

We no longer have the systems maturity of the old world, and we have not yet reached the new one. That’s the transition tax.

The productivity gain is real. But the confidence is borrowed. And most teams won’t invest in the harness until something breaks badly enough to force it.

Why this might not be easier than before

Specification is unbounded: the harness is not a configuration, it’s an ongoing practice. Failures are invisible: a harness that produces false confidence lets problems accumulate quietly until something breaks. We have almost no tooling: the industry has spent roughly 70 years building tools and practices for catching execution errors; for specification errors we have markdown files and instinct. And the skill doesn’t exist yet: there are no senior practitioners, no curriculum, no recognized expertise.

We are all, in a real sense, juniors at this.

The bottleneck shifted from execution to specification. From writing to describing. From code to context.

That is closer to systems thinking than programming. And it is where QA has always lived, at the intersection of intent and execution, asking the hardest question: does it actually do what we meant?

That intersection just got dramatically more important.

References

¹ Birgitta Böckeler, Harness Engineering for Coding Agent Users, martinfowler.com, April 2026. martinfowler.com/articles/harness-engineering.html
² OpenAI, Harness Engineering: Leveraging Codex in an Agent-First World, openai.com, February 2026. openai.com/index/harness-engineering