A pattern is emerging across AI agent infrastructure: modular capabilities that agents can discover, install, and share. Different projects call these units different things — skills, tools, capsules, genes — but the idea is converging. Agents shouldn’t be monolithic. They should assemble capabilities from a shared ecosystem.
Where projects diverge — sharply — is on a question that looks minor but determines everything downstream: what is the capability unit made of?
One answer: a JSON document. A structured strategy template that an LLM reads, interprets, and acts on.
Another answer: a compiled WASM binary. An executable program that runs in a sandbox with deterministic inputs and outputs.
This isn’t a taste preference. It’s an architectural fork that determines what “evolution” can actually mean for AI agents.
The JSON Template Approach
The JSON strategy model works like this: a capability is encoded as a structured document containing a problem description, trigger conditions, a recommended strategy, and a confidence score. When an agent encounters a matching situation, it reads the template and decides how to apply the advice.
```json
{
  "type": "Capsule",
  "summary": "Retry with exponential backoff on timeout",
  "signals_match": ["timeout_error", "connection_reset"],
  "strategy": "repair",
  "confidence": 0.95
}
```

This model has real strengths:
- Zero barrier to entry. Any LLM can read JSON. No compiler, no runtime, no sandbox needed.
- Framework-agnostic. Works with GPT, Claude, Gemini, open-source models — anything that processes text.
- Fast to create. An agent encounters a problem, generates a fix, packages it as JSON, and publishes. The entire cycle can happen in a single session.
- Low-risk. Since nothing executes, there’s no code injection surface. The worst a bad template can do is give bad advice.
But the same properties that make JSON templates easy also impose a ceiling.
The Ceiling Problem
1. Non-deterministic execution
When an agent reads a JSON strategy and “applies” it, the actual behavior depends entirely on the LLM’s interpretation at inference time. The same template, given to the same model twice, can produce different actions. Given to a different model, the variance increases further.
This means you can’t meaningfully benchmark JSON templates against each other. You can rank them by popularity (how often they’re fetched) or by social signals (how many upvotes), but you can’t answer: which one actually performs better on the same input?
2. No sandbox isolation
JSON templates don’t execute, so they don’t need a sandbox. But this also means they can’t provide runtime guarantees. An agent reading a “retry with backoff” template might implement the retry correctly or might hallucinate a different strategy. There’s no enforcement layer between the template and the agent’s actual behavior.
In contrast, a compiled program either runs correctly in its sandbox or it fails — there’s no ambiguity.
3. Quality assessment is indirect
Without deterministic execution, quality scoring relies on proxy signals: download count, user ratings, recency, manual review. These signals correlate with quality but don’t measure it directly.
Consider the difference:
| Quality Signal | What It Measures | What It Doesn’t Measure |
|---|---|---|
| Download count | Popularity | Whether the template actually works |
| User rating | Perceived helpfulness | Objective performance on benchmarks |
| Recency | Freshness | Whether newer means better |
| Expert review | One reviewer’s judgment | Behavior across diverse inputs |
4. Portability is implicit
JSON templates are “portable” in the sense that any system can parse JSON. But the semantics are not portable. A template that says “retry with exponential backoff” means different things depending on which language the agent generates, which HTTP client it uses, and which error handling conventions it follows.
The Executable Gene Approach
An executable gene takes a different path. The capability is written in a high-level language (TypeScript, Rust), compiled to an intermediate representation (WASM with custom metadata sections), and executed in a sandbox with explicit inputs and outputs.
```shell
# Write a gene
rotifer init grammar-checker --fidelity native

# Compile to WASM
rotifer compile grammar-checker

# Execute with deterministic I/O
rotifer run grammar-checker --input '{"text": "This are a test"}'
# → {"corrected": "This is a test", "changes": 1}
```

The gene’s behavior is defined by its code, not by how an LLM interprets a description. The same gene, given the same input, produces the same output — regardless of which AI model invoked it, on which platform, at what time.
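Stripped to its essence, a gene is a pure function over structured I/O. A minimal sketch in TypeScript, with hypothetical names — this is not the actual Rotifer gene interface:

```typescript
// Hypothetical sketch of a gene's entry point: a pure, deterministic
// function from input JSON to output JSON. The names (GeneInput, run)
// are illustrative, not the real Rotifer API.
interface GeneInput { text: string; }
interface GeneOutput { corrected: string; changes: number; }

// Toy grammar checker: fixes one agreement error and counts the change.
function run(input: GeneInput): GeneOutput {
  const corrected = input.text.replace(/\bThis are\b/g, "This is");
  return { corrected, changes: corrected === input.text ? 0 : 1 };
}

// Same input, same output -- no LLM interpretation in the loop.
const out = run({ text: "This are a test" });
// → { corrected: "This is a test", changes: 1 }
```

Because `run` closes over nothing and touches no ambient state, replaying it with the same input is guaranteed to reproduce the same output — which is exactly what the WASM sandbox enforces at the binary level.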
This enables things that JSON templates structurally cannot:
Direct competitive evaluation
If two genes claim to do grammar checking, you can run both on the same 1,000 test inputs and compare outputs objectively. The fitness function doesn’t rely on surveys or download counts — it measures actual performance:
Security score, utility, robustness, code size, runtime cost — all measured, not guessed.
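With deterministic execution, the fitness harness itself can be trivial: run every candidate on the same labeled cases and count exact matches. A sketch with made-up test cases and two toy genes (none of this is the real Rotifer evaluation suite):

```typescript
// Sketch of a fitness harness: run two candidate genes on the same
// labeled test set and compare accuracy directly. The Gene type and
// test cases are illustrative assumptions.
type Gene = (text: string) => string;

interface TestCase { input: string; expected: string; }

function accuracy(gene: Gene, cases: TestCase[]): number {
  const passed = cases.filter(c => gene(c.input) === c.expected).length;
  return passed / cases.length;
}

const cases: TestCase[] = [
  { input: "This are a test", expected: "This is a test" },
  { input: "She have a cat", expected: "She has a cat" },
];

// Two competing implementations of the same capability.
const geneA: Gene = t => t.replace(/\bThis are\b/, "This is");
const geneB: Gene = t =>
  t.replace(/\bThis are\b/, "This is").replace(/\bShe have\b/, "She has");

// Identical inputs, objective comparison -- no surveys, no download counts.
const scoreA = accuracy(geneA, cases); // 0.5
const scoreB = accuracy(geneB, cases); // 1.0
```

The same harness shape extends to the other fitness dimensions — measure code size from the binary, runtime cost from wall-clock timing, robustness from fuzzed inputs.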
True natural selection
When quality is measurable, you can implement actual elimination. Genes that score below a fitness threshold in competitive evaluation are removed from the ecosystem. This creates real evolutionary pressure — not just a sorting algorithm, but a selection mechanism with consequences.
JSON templates can be ranked. But without a way to objectively measure performance, you can’t build a credible elimination mechanism. Low-ranked templates accumulate, and the ecosystem eventually faces an “experience inflation” problem where the signal-to-noise ratio degrades over time.
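Elimination with consequences is a one-line filter once fitness is a measured number. A sketch — the field names are illustrative, not a real registry schema:

```typescript
// Sketch of threshold-based elimination: genes whose measured fitness
// falls below a cutoff are removed from the pool, not merely sorted last.
interface ScoredGene { name: string; fitness: number; }

function select(pool: ScoredGene[], threshold: number): ScoredGene[] {
  // Real selection pressure: low scorers leave the ecosystem entirely.
  return pool.filter(g => g.fitness >= threshold);
}

const pool: ScoredGene[] = [
  { name: "grammar-checker-v1", fitness: 0.52 },
  { name: "grammar-checker-v2", fitness: 0.91 },
  { name: "grammar-checker-v3", fitness: 0.88 },
];

const survivors = select(pool, 0.6); // v1 is eliminated
```

A ranking system would keep all three entries and hope users scroll past v1; a selection system deletes it, which is what keeps the signal-to-noise ratio from degrading as the pool grows.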
Runtime safety guarantees
WASM sandbox isolation means each gene runs in its own memory space. It can’t access the filesystem, network, or other genes’ state unless explicitly granted through a capability-based permission model. A malicious or buggy gene crashes itself, not the host agent.
For JSON templates, safety is a matter of trust — you trust that the advice is good. For executable genes, safety is a matter of enforcement — the sandbox prevents bad behavior regardless of intent.
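The enforcement distinction can be sketched as a host that checks an explicit grant set before every privileged call. The capability names and host interface below are illustrative, not the actual Rotifer permission model:

```typescript
// Sketch of a capability-based permission model: privileged host
// functions are only usable if the matching capability was granted.
type Capability = "fs.read" | "net.fetch";

class Host {
  constructor(private granted: Set<Capability>) {}

  // Every privileged call checks the grant set; denial is enforced
  // by the host, not left to the gene's good intentions.
  readFile(path: string): string {
    if (!this.granted.has("fs.read")) {
      throw new Error("capability denied: fs.read");
    }
    return `contents of ${path}`; // stand-in for a real sandboxed read
  }
}

const sandboxed = new Host(new Set()); // no capabilities granted
let denied = false;
try {
  sandboxed.readFile("/etc/passwd");
} catch {
  denied = true; // the sandbox decided, regardless of the gene's intent
}
```

In a real WASM runtime the same idea appears as host imports: the module can only call functions the embedder chose to link in, so an ungranted capability simply does not exist inside the sandbox.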
Genuine portability
A WASM binary compiled from TypeScript runs identically on a cloud server, a local machine, a browser, or an edge device. The intermediate representation (IR) guarantees behavioral equivalence across environments. The gene doesn’t need to be re-interpreted for each platform — it runs the same everywhere.
The Trade-Off Is Real
None of this means executable genes are “better” in every dimension. The trade-off is clear:
| Dimension | JSON Templates | Executable WASM Genes |
|---|---|---|
| Time to first gene | Minutes | Hours |
| Developer skill required | Describe a strategy | Write compilable code |
| LLM compatibility | Any model reads JSON | Model-independent (code runs without LLM) |
| Ecosystem bootstrap speed | Fast | Slower |
| Execution determinism | None (LLM-dependent) | Full (sandbox-enforced) |
| Quality measurement | Indirect (proxies) | Direct (fitness benchmarks) |
| Elimination mechanism | Ranking (no real elimination) | Natural selection (below threshold = removed) |
| Safety model | Trust-based | Enforcement-based |
| Portability | Parse-level (any JSON parser) | Semantic-level (identical behavior across runtimes) |
JSON templates are better for fast knowledge sharing. If an agent discovers that retrying with exponential backoff fixes timeout errors, packaging that as a JSON template and sharing it instantly is valuable. Not every capability needs to be a compiled program.
Executable genes are better for capabilities where correctness matters, comparisons are needed, and safety must be enforced — grammar checking, data transformation, code analysis, security scanning, API integration. Anything where “it depends on how the LLM interprets it” is not an acceptable answer.
They’re Not Competing — They’re Layered
The most useful framing isn’t “which one wins” but “which layer does each serve.”
```
┌──────────────────────────────────┐
│ Strategy Layer (JSON templates)  │ ← "How to approach this type of problem"
├──────────────────────────────────┤
│ Capability Layer (WASM genes)    │ ← "Execute this specific solution"
├──────────────────────────────────┤
│ Orchestration Layer (frameworks) │ ← "Chain capabilities into workflows"
├──────────────────────────────────┤
│ Interface Layer (MCP / A2A)      │ ← "Discover and invoke capabilities"
└──────────────────────────────────┘
```

An agent might consult a JSON strategy template to decide which approach to take for a given problem, then invoke an executable WASM gene to actually do it. The strategy layer provides the heuristic; the capability layer provides the determinism.
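The interaction between the two layers can be sketched in a few lines: a template supplies the heuristic match, a deterministic gene does the work. All names here are illustrative:

```typescript
// Sketch of the strategy layer (JSON templates) selecting which
// capability-layer gene to invoke. Templates and genes are toy examples.
interface Template { signals_match: string[]; strategy: string; }

const templates: Template[] = [
  { signals_match: ["timeout_error"], strategy: "retry-with-backoff" },
  { signals_match: ["grammar_issue"], strategy: "grammar-fix" },
];

// Deterministic genes keyed by strategy name (capability layer).
const genes: Record<string, (input: string) => string> = {
  "grammar-fix": t => t.replace(/\bThis are\b/, "This is"),
};

function handle(signal: string, input: string): string {
  // Strategy layer: pick an approach by matching the observed signal.
  const tpl = templates.find(t => t.signals_match.includes(signal));
  const gene = tpl ? genes[tpl.strategy] : undefined;
  // Capability layer: execute deterministically, or pass through.
  return gene ? gene(input) : input;
}
```

The template lookup is the flexible, heuristic part; everything downstream of `gene(input)` is reproducible.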
This is how biological evolution works too. Behavioral strategies (when to flee, when to fight) are encoded in neural patterns that are flexible and context-dependent. But the molecular machinery that actually executes those strategies — protein folding, enzyme catalysis, membrane transport — operates with chemical determinism. Both layers evolve, but through different mechanisms.
What This Means for the Ecosystem
If you’re building AI agent infrastructure today, the choice between these approaches determines your ceiling:
JSON templates let you scale fast and lower barriers, but you’ll eventually face quality inflation (too many templates, no reliable way to rank them) and the safety question (“what if a template gives dangerous advice to a powerful agent?”).
Executable genes take longer to bootstrap but provide the primitives needed for genuine quality selection and runtime safety. The investment is front-loaded in compilation, sandbox, and evaluation infrastructure — but once that’s in place, the ecosystem can self-select for quality without human curation.
The AI agent ecosystem is still early enough that both paths are being explored. What’s clear is that the “gene” metaphor — modular, transferable, evaluable capabilities — is winning. The open question is what a gene is made of. The answer shapes everything downstream.
Install the Rotifer CLI and try an executable gene:
```shell
npm install -g @rotifer/playground
rotifer search --domain "text-processing"
rotifer run grammar-checker --input '{"text": "This are a test"}'
```