
When Code Generation Costs Zero

AI made code generation nearly free. But quality selection stayed expensive. The result: a structural crisis with only fragmented solutions.

Three data points from the past quarter tell the same story.

10.3% of applications built with Vibe Coding platforms have critical security vulnerabilities — databases accessible without authentication, API keys exposed, personal financial data unprotected. This came from a scan of 1,645 apps on Lovable; a follow-up audit by security firm Escape across 5,600+ apps found over 2,000 vulnerabilities and 400+ exposed secrets.

cURL — used by virtually every internet-connected device — shut down its vulnerability reporting program. Not for budget reasons. Because AI-generated false reports buried the real ones. Before AI tools, roughly one in six submissions was valid. By late 2025, it was one in thirty.

GitHub shipped new repository settings that let maintainers disable Pull Requests entirely. When the platform builds “close the gate” features, the problem is structural.


Production Cost → Zero, Selection Cost → Unchanged

These are symptoms of one root cause: the cost of producing code collapsed, but the cost of evaluating code stayed the same.

Vibe Coding — generating entire applications from natural language — turned weeks of development into hours. US App Store submissions grew 56% year-over-year. Skill ecosystems reached tens of thousands of entries. Code output is accelerating everywhere.

But security audits still require human expertise. Code review still takes experienced engineers. Architecture assessment still demands judgment that no prompt can shortcut.

When production becomes nearly free and evaluation stays expensive, you get what ecologists call an invasive species explosion — a population boom in the absence of natural predators.


The Ad-Hoc Selection Era

Every platform is inventing its own filter:

| Platform | Response | Mechanism |
|----------|----------|-----------|
| Apple | Removed Anything app, froze Replit/Vibecode | Gatekeeper curation |
| cURL | Closed bug bounty | Barrier to entry |
| GitHub | Added “disable PRs” setting | Kill switch |
| Ghostty | Introduced Vouch system | Trust-based reputation |

Each response is rational locally. None is sufficient globally. These are individual organisms evolving independent defenses against a shared environmental pressure — no coordination, no standardization, no ecosystem-level mechanism.


Biology Solved This

Biological reproduction is cheap. A single bacterium can give rise to billions of descendants within a day. Most mutations are neutral or harmful. Very few improve fitness.

Biology’s answer was not “review every mutation.” It was automated selection pressure: organisms that don’t survive don’t reproduce. Quality emerged from competition, not inspection.

The parallel to software is structural:

| Biology | Software |
|---------|----------|
| Cheap reproduction | AI code generation |
| Most mutations harmful | Most AI output mediocre or insecure |
| Manual review impossible at scale | Human code review doesn’t scale |
| Natural selection | ? |

The missing layer isn’t more reviewers. It’s automated selection.


What Automated Selection Requires

Selection needs three components:

A fitness measure. A function that scores a code unit on correctness, performance, resource efficiency, and security — empirically, by running it. Not “does this look right?” but “does this work, and how well?”

A competitive environment. Multiple implementations of the same capability evaluated against standardized inputs. Not benchmarks chosen by the developer. Scenarios defined independently.

Consequences. Low-fitness code gets displaced. High-fitness code gets propagated. The ecosystem improves without anyone manually reviewing every submission.

This is not “AI reviewing AI” — which inherits the generator’s blind spots. This is empirical measurement: run the code, observe the outcomes, compare the results. Selection based on what code does, not what it claims.
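The three components above can be sketched in a few lines. This is a minimal illustration, not a production design: the function names (`fitness`, `select`), the scoring scheme (correctness first, speed as tiebreaker), and the toy candidates are all assumptions made for the example. The point is only the shape — run every candidate against the same independently defined scenarios, score what the code actually does, and let the highest-fitness implementation win.

```python
import time

def fitness(impl, test_cases):
    """Fitness measure (illustrative): score an implementation
    empirically by running it, not by inspecting it."""
    passed = 0
    start = time.perf_counter()
    for args, expected in test_cases:
        try:
            if impl(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash simply earns no credit for that case
    elapsed = time.perf_counter() - start
    correctness = passed / len(test_cases)
    # Correctness dominates; speed only breaks ties among correct candidates.
    return (correctness, -elapsed)

def select(candidates, test_cases):
    """Competitive environment: every candidate faces the same
    standardized scenarios; the fittest is propagated."""
    return max(candidates, key=lambda impl: fitness(impl, test_cases))

# Two candidate implementations of the same capability.
def sort_a(xs):
    return sorted(xs)

def sort_b(xs):
    return xs  # low fitness: claims to sort, doesn't

tests = [(([3, 1, 2],), [1, 2, 3]), (([5, 4],), [4, 5])]
best = select([sort_a, sort_b], tests)
assert best is sort_a  # the empirically better implementation survives
```

Note that `select` never reads the source of either candidate. Displacement follows from measured behavior alone, which is what distinguishes this from "AI reviewing AI."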


The Efficiency Paradox, Explained

A controlled study by METR found that AI tools made experienced developers 19% slower on maintenance tasks in large codebases — while the developers believed they were 20% faster.

The paradox dissolves when you see where the time went: not generating code, but evaluating AI output. Reviewing suggestions. Checking correctness. Debugging hallucinated logic.

Generation was fast. Selection was slow.

If evaluation is the bottleneck, automating evaluation — not just generation — is the higher-leverage intervention.


What Remains Human

Automated selection handles quality within a defined domain: which implementation of a given capability performs best? But it does not handle the question of what to build. Product direction, ethical constraints, creative vision — these are not fitness-measurable and should not be.

Y Combinator’s CEO noted that even startups with 95% AI-generated code have technically deep founding teams. The AI replaced typing, not judgment.

True but imprecise. Judgment decomposes. “Does this code work correctly and securely?” is measurable and automatable. “Should we build this feature?” is not. Conflating them — treating all judgment as equally unautomatable — leads to “hire more reviewers” as the only solution.

Selection pressure is the layer between generation and judgment. It’s the part that can be built now.