RAG is dying. Not because it was bad — because models got bigger. When context windows jumped from 4K to 128K tokens, the elaborate retrieval pipelines that engineers spent months building became unnecessary overhead. The model just reads the whole document now.
The same pattern keeps repeating. Chain-of-thought prompt templates? Models now reason natively. Custom JSON parsers for tool outputs? Function calling handles it. Retry loops for hallucination? Newer models hallucinate less. Each model generation quietly deletes a category of engineering work.
This raises an uncomfortable question: how much of your current agent engineering will survive the next model upgrade?
## Two Kinds of Engineering
Not all agent engineering is created equal. There’s a split that most practitioners feel intuitively but rarely name:
Compensatory engineering patches model weaknesses. It exists because the current model can’t do something well enough, so you build scaffolding around it. RAG for small context windows. CoT templates for weak reasoners. Output validators for unreliable structured generation. This work is valuable today — but it has an expiration date. When the model improves, the patch becomes dead code.
Systemic engineering solves problems that exist regardless of model capability. Context isolation between tasks. Error recovery across multi-step workflows. Fitness evaluation of competing capability modules. Trust boundaries between untrusted components. These problems don’t disappear when GPT-6 ships. They get harder, because more capable models operate in more complex environments.
The distinction matters because it determines the half-life of your work:
| | Compensatory | Systemic |
|---|---|---|
| Why it exists | Model can’t do X yet | Real-world complexity demands X |
| What happens when models improve | Gets deleted | Gets more important |
| Example | RAG pipeline for 4K-token models | Context isolation between concurrent agent tasks |
| Half-life | 6–18 months | 5+ years |
This isn’t a judgment — compensatory engineering solves real problems right now. But if you’re investing months of effort, you should know whether you’re building a tent or a foundation.
## The Four Systemic Capabilities
If compensatory work has an expiration date, what doesn’t? Four capabilities survive every model upgrade because they address the complexity of the world, not the limitations of the model:
| Traditional Software | Agent Engineering | Protocol-Level Engineering |
|---|---|---|
| State management | Context management | Context contracts (declared schemas) |
| Process orchestration | Control flow design | Composable controller modules + formal algebra |
| Exception handling | Error recovery | Constraint-based fallback (model-agnostic) |
| Monitoring & alerting | Feedback loops | Quantitative fitness scoring + continuous evaluation |
The left column is familiar. The middle column is where most agent engineers work today — and it’s genuinely systemic. Managing context across a multi-step agent workflow is hard regardless of whether the underlying model is GPT-4 or GPT-6. Designing robust control flows for non-deterministic execution is a real engineering challenge.
But notice the third column. It represents a question the middle column doesn’t answer: what happens when you have many agents, many capability modules, and no way to systematically evaluate which ones actually work?
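To make the "context contracts" entry in the third column concrete, here is a minimal sketch of a declared schema that gates what one agent task hands to the next. The class name, fields, and the 4-characters-per-token estimate are all illustrative assumptions, not the Rotifer specification:

```python
# Hypothetical sketch of a "context contract": a declared schema that a task's
# output context must satisfy before a downstream task will accept it.
from dataclasses import dataclass

@dataclass
class ContextContract:
    required_keys: set  # keys the downstream task depends on
    max_tokens: int     # budget the context must fit within

    def validate(self, context: dict) -> list:
        """Return a list of violations; an empty list means the contract holds."""
        violations = []
        missing = self.required_keys - context.keys()
        if missing:
            violations.append(f"missing keys: {sorted(missing)}")
        # Crude token estimate: roughly 4 characters per token.
        est_tokens = sum(len(str(v)) for v in context.values()) // 4
        if est_tokens > self.max_tokens:
            violations.append(f"context too large: ~{est_tokens} tokens")
        return violations

contract = ContextContract(required_keys={"task_id", "summary"}, max_tokens=2000)
print(contract.validate({"task_id": "t1"}))  # -> ["missing keys: ['summary']"]
```

The point of declaring the contract, rather than validating ad hoc, is that the same schema survives a model swap: the downstream task's requirements do not change when the upstream model does.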
## Three Levels of Agent Engineering
There’s a natural progression in how practitioners build agent systems, and it maps to the durability of their work:
Level 0: Learn framework APIs. LangChain, AutoGen, CrewAI — learn the abstraction, ship the demo. Half-life: ~6 months. Framework APIs break, get deprecated, or get replaced by model-native features. This is mostly compensatory work dressed up as systemic work.
Level 1: Learn harness engineering. Context management, control flow, error recovery, feedback loops — the four systemic capabilities. This is real engineering. It transfers across frameworks, across models, across use cases. An engineer who understands context isolation doesn’t need to relearn it when the next framework ships. Half-life: 5+ years.
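One of those harness capabilities, error recovery, can be sketched as constraint-based fallback: try candidate strategies in order and accept the first output that satisfies declared constraints. Everything here is illustrative; the function names are invented for this example and the strategies stand in for model calls of varying reliability:

```python
# Sketch of constraint-based fallback. Model-agnostic by construction:
# the constraints stay fixed even when the underlying strategies change.
def run_with_fallback(strategies, constraints):
    """strategies: callables producing an output; constraints: predicates."""
    for strategy in strategies:
        try:
            output = strategy()
        except Exception:
            continue  # recoverable failure: fall through to the next strategy
        if all(check(output) for check in constraints):
            return output
    raise RuntimeError("no strategy satisfied the constraints")

# Hypothetical strategies standing in for model calls:
def flaky():
    raise TimeoutError("model call timed out")

def verbose():
    return "x" * 500  # violates the length constraint below

def concise():
    return "short answer"

result = run_with_fallback(
    [flaky, verbose, concise],
    constraints=[lambda out: isinstance(out, str), lambda out: len(out) < 100],
)
print(result)  # -> short answer
```

Note that nothing in `run_with_fallback` knows which model generation is behind each strategy; upgrading a model just changes which strategy wins, not the recovery logic.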
Level 2: Build with standardized protocols for evolvable capability systems. The capabilities themselves — the tools, the skills, the modules that agents compose — become first-class objects with formal identity, quantitative fitness, competitive evaluation, and cross-agent propagation. This is the infrastructure layer. It doesn’t expire because it’s not about any specific model or framework — it’s about how capabilities are born, tested, selected, and improved over time.
The metaphor that captures this progression:
Level 1 says: Agent = Model + Harness. Level 2 says: Agent = Model + Evolvable Harness Ecosystem.
Level 1 builds one harness well. Level 2 builds the system that lets harnesses compete, evolve, and propagate — so the best capability wins, not the one that shipped first.
## The Protocol Answer
How do you systematically separate systemic engineering from compensatory engineering — without relying on human judgment?
You don’t label it. You let selection do the work.
In the Rotifer Protocol, every capability module (called a Gene) is scored by a quantitative fitness function.
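The protocol's actual scoring formula isn't reproduced here; purely as an illustration, a weighted fitness function over hypothetical metrics might look like the following. The metric names, weights, and budgets are all assumptions for this sketch:

```python
# Illustrative only: a weighted fitness score over hypothetical Gene metrics.
# The weights and budgets below are assumptions, not the Rotifer spec.
def fitness(success_rate, avg_latency_ms, cost_per_call,
            w_success=0.7, w_latency=0.2, w_cost=0.1):
    """Higher is better; latency and cost are normalized to a 0-1 scale."""
    latency_score = max(0.0, 1.0 - avg_latency_ms / 10_000)  # 10 s budget
    cost_score = max(0.0, 1.0 - cost_per_call / 0.10)        # $0.10 budget
    return w_success * success_rate + w_latency * latency_score + w_cost * cost_score

# A reliable but slow, pricey Gene vs. a fast, cheap, slightly less accurate one:
slow_reliable = fitness(0.95, 4000, 0.02)
fast_cheap = fitness(0.90, 800, 0.005)
print(fast_cheap > slow_reliable)  # -> True
```

The design choice that matters is not the particular weights but that fitness is a single comparable number, so two Genes claiming the same capability can be ranked automatically.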
Genes compete in a standardized Arena. When a new model generation ships and makes a compensatory Gene unnecessary, that Gene’s fitness drops — not because someone labels it “compensatory,” but because a simpler alternative now outperforms it. The Arena handles the classification automatically through competitive pressure.
The deeper insight: compensatory vs. systemic is not a property of the code — it’s a property of the code’s fitness trajectory over time. A Gene that maintains or improves its fitness across model upgrades is systemic. A Gene whose fitness collapses when a better model ships was compensatory. You don’t need a label. You need a fitness function and a timeline.
This is why we resist adding a “durability” field to Gene metadata. Durability isn’t a static property — it’s an emergent outcome of evolutionary pressure. Declaring it upfront would be like asking a species to predict its own extinction date.
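The trajectory idea can be made concrete: record a Gene's fitness at each model generation and classify it after the fact. In this toy sketch, the 30% collapse threshold, the model names, and the numbers are arbitrary assumptions:

```python
# Toy sketch: classify a Gene from its fitness history across model upgrades.
# The 30% collapse threshold is an arbitrary assumption for illustration.
def classify(fitness_by_model: dict) -> str:
    """fitness_by_model maps model generation -> measured fitness, in release order."""
    scores = list(fitness_by_model.values())
    if scores[-1] < 0.7 * scores[0]:
        return "compensatory"  # fitness collapsed once better models shipped
    return "systemic"          # fitness held steady or improved

# Hypothetical histories: a RAG Gene built for small context windows vs.
# a context-isolation Gene whose problem never went away.
rag_gene = {"gen-1-4k": 0.92, "gen-2-128k": 0.55, "gen-3": 0.30}
isolation_gene = {"gen-1-4k": 0.80, "gen-2-128k": 0.84, "gen-3": 0.88}
print(classify(rag_gene))        # -> compensatory
print(classify(isolation_gene))  # -> systemic
```

Note that the label is computed retrospectively from measurements, which is exactly why it never needs to appear as a declared field in Gene metadata.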
## An Honest Assessment
Compensatory engineering isn’t bad engineering. When your production agent needs to work with today’s model limitations, you build the scaffolding. That’s pragmatic. That’s shipping.
But don’t mistake the scaffolding for the building. Know which parts of your system are load-bearing walls and which are temporary supports. Invest your deepest thinking in the systemic parts — context contracts, composition rules, evaluation criteria, trust boundaries — because those are the parts that compound.
The next model upgrade will arrive. When it does, some of your code will be deleted. The question is whether the code that remains is the code that matters.
```shell
npm install -g @rotifer/playground
rotifer search --domain "content"
```

Links:
- rotifer.dev — Framework & Docs
- rotifer.ai — Gene Marketplace
- Specification — Formal Protocol Spec
- GitHub — All Repositories
For a deeper look at how Prompt, Context, Harness, and Evolution Engineering stack as four paradigm layers, see Evolution Engineering: The Missing Discipline in AI.