Three data points from the past quarter tell the same story.
A scan of 1,645 apps built on the Vibe Coding platform Lovable found that 10.3% had critical security vulnerabilities — databases accessible without authentication, API keys exposed, personal financial data unprotected. A follow-up audit by security firm Escape across 5,600+ apps found over 2,000 vulnerabilities and 400+ exposed secrets.
cURL — used by virtually every internet-connected device — shut down its vulnerability reporting program. Not for budget reasons, but because AI-generated false reports buried the real ones. Before AI tools, roughly one in six submissions was valid. By late 2025, it was one in thirty.
GitHub shipped new repository settings that let maintainers disable Pull Requests entirely. When the platform builds “close the gate” features, the problem is structural.
Production Cost → Zero, Selection Cost → Unchanged
These are symptoms of one root cause: the cost of producing code collapsed, but the cost of evaluating code stayed the same.
Vibe Coding — generating entire applications from natural language — turned weeks of development into hours. US App Store submissions grew 56% year-over-year. Skill ecosystems reached tens of thousands of entries. Code output is accelerating everywhere.
But security audits still require human expertise. Code review still takes experienced engineers. Architecture assessment still demands judgment that no prompt can shortcut.
When production becomes nearly free and evaluation stays expensive, you get what ecologists call an invasive species explosion — a population boom in the absence of natural predators.
The Ad-Hoc Selection Era
Every platform is inventing its own filter:
| Platform | Response | Mechanism |
|---|---|---|
| Apple | Removed Anything app, froze Replit/Vibecode | Gatekeeper curation |
| cURL | Closed bug bounty | Barrier to entry |
| GitHub | Added “disable PRs” setting | Kill switch |
| Ghostty | Introduced Vouch system | Trust-based reputation |
Each response is rational locally. None is sufficient globally. These are individual organisms evolving independent defenses against a shared environmental pressure — no coordination, no standardization, no ecosystem-level mechanism.
Biology Solved This
Biological reproduction is cheap. A single bacterium produces billions of copies. Most mutations are neutral or harmful. Very few improve fitness.
Biology’s answer was not “review every mutation.” It was automated selection pressure: organisms that don’t survive don’t reproduce. Quality emerged from competition, not inspection.
The parallel to software is structural:
| Biology | Software |
|---|---|
| Cheap reproduction | AI code generation |
| Most mutations harmful | Most AI output mediocre or insecure |
| Manual review impossible at scale | Human code review doesn’t scale |
| Natural selection | ? |
The missing layer isn’t more reviewers. It’s automated selection.
What Automated Selection Requires
Selection needs three components:
A fitness measure. A function that scores a code unit on correctness, performance, resource efficiency, and security — empirically, by running it. Not “does this look right?” but “does this work, and how well?”
A competitive environment. Multiple implementations of the same capability evaluated against standardized inputs. Not benchmarks chosen by the developer. Scenarios defined independently.
Consequences. Low-fitness code gets displaced. High-fitness code gets propagated. The ecosystem improves without anyone manually reviewing every submission.
This is not “AI reviewing AI” — which inherits the generator’s blind spots. This is empirical measurement: run the code, observe the outcomes, compare the results. Selection based on what code does, not what it claims.
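The three components can be sketched in a few dozen lines. This is an illustrative toy, not a real selection system: the candidate implementations, the scenarios, and the scoring weights are all hypothetical, chosen only to show the shape of the loop — run every candidate against scenarios it didn't choose, score what it actually does, and propagate the winner.

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class FitnessReport:
    correct: float  # fraction of scenarios passed (empirical, not claimed)
    latency: float  # mean seconds per scenario
    score: float    # combined fitness

def measure_fitness(impl: Callable, scenarios: list[tuple]) -> FitnessReport:
    """Fitness measure: run the code and observe outcomes."""
    passed, elapsed = 0, 0.0
    for args, expected in scenarios:
        start = time.perf_counter()
        try:
            ok = impl(*args) == expected
        except Exception:
            ok = False  # crashing on valid input is a fitness fact, not a review note
        elapsed += time.perf_counter() - start
        passed += ok
    correct = passed / len(scenarios)
    latency = elapsed / len(scenarios)
    # Correctness dominates; speed breaks ties. The 0.01 weight is illustrative.
    return FitnessReport(correct, latency, correct - 0.01 * latency)

def select(candidates: dict[str, Callable], scenarios: list[tuple]) -> str:
    """Consequences: the highest-fitness implementation is the one propagated."""
    reports = {name: measure_fitness(f, scenarios) for name, f in candidates.items()}
    return max(reports, key=lambda name: reports[name].score)

# Competitive environment: two implementations of the same capability,
# evaluated against scenarios neither of them defined.
def sort_a(xs): return sorted(xs)
def sort_b(xs): return xs  # plausible-looking but wrong

scenarios = [(([3, 1, 2],), [1, 2, 3]), (([5, 4],), [4, 5])]
print(select({"a": sort_a, "b": sort_b}, scenarios))  # prints "a"
```

Note what the loop never does: it never reads the code. `sort_b` fails not because a reviewer spotted the bug but because it lost on measured behavior — which is the difference between empirical selection and "AI reviewing AI."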
The Efficiency Paradox, Explained
A controlled study by METR found that AI tools made experienced developers 19% slower on maintenance tasks in large codebases — while the developers believed they were 20% faster.
The paradox dissolves when you see where the time went: not generating code, but evaluating AI output. Reviewing suggestions. Checking correctness. Debugging hallucinated logic.
Generation was fast. Selection was slow.
If evaluation is the bottleneck, automating evaluation — not just generation — is the higher-leverage intervention.
What Remains Human
Automated selection handles quality within a defined domain: which implementation of a given capability performs best? But it does not handle the question of what to build. Product direction, ethical constraints, creative vision — these are not fitness-measurable and should not be.
Y Combinator’s CEO noted that even startups with 95% AI-generated code have technically deep founding teams. The AI replaced typing, not judgment.
True but imprecise. Judgment decomposes. “Does this code work correctly and securely?” is measurable and automatable. “Should we build this feature?” is not. Conflating them — treating all judgment as equally unautomatable — leads to “hire more reviewers” as the only solution.
Selection pressure is the layer between generation and judgment. It’s the part that can be built now.