
Install vs Evolve: What Plugin Architectures Can't Do

We studied ElizaOS's plugin architecture — the most popular Web3 agent framework — and found six structural gaps that no amount of engineering can close. The missing ingredient isn't code. It's selection pressure.


Every major agent framework ships a plugin system. Install a plugin, get a capability. It’s clean, it’s modular, and it works — until you have 200 plugins claiming to do the same thing and no way to tell which one is actually good.

We spent time studying ElizaOS (formerly ai16z/eliza), one of the most mature and widely adopted Web3 agent frameworks. Not to attack it — ElizaOS has a thriving ecosystem and real production usage — but to understand a deeper question: what can a plugin architecture structurally do, and where does it hit a ceiling?

The answer reveals something fundamental about how we should think about agent capabilities.


How ElizaOS Organizes Capabilities

ElizaOS uses a clean four-part extension model:

| Component | Role |
| --- | --- |
| Actions | What the agent can do — executable behaviors selected by the LLM at runtime |
| Providers | What the agent can see — context data injected before each model call |
| Evaluators | What the agent learns — post-response processors that extract facts and track goals |
| Services | What the agent connects to — long-running background processes |

A Plugin bundles one or more of these components into a self-contained package. The AgentRuntime loads plugins at startup, registers their components, and the agent is ready.

This is well-engineered. The separation of concerns is clean. The plugin interface is simple enough that community contributions scale. ElizaOS has 30+ official plugins covering everything from Discord integration to Solana DeFi.
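The four-part model above can be sketched in a few TypeScript interfaces. The field names follow the spirit of ElizaOS's published plugin interface, but treat the exact shapes here as illustrative rather than the framework's actual types:

```typescript
// Illustrative sketch of the four-part extension model; shapes are
// simplified, not ElizaOS's actual type definitions.
interface Action {
  name: string;
  description: string;                      // what the LLM reads at selection time
  handler: (msg: string) => Promise<string>;
}
interface Provider { get: () => Promise<string> }            // context injection
interface Evaluator { evaluate: (response: string) => Promise<void> }
interface Service { start: () => Promise<void>; stop: () => Promise<void> }

// A plugin bundles any subset of the four component types.
interface Plugin {
  name: string;
  actions?: Action[];
  providers?: Provider[];
  evaluators?: Evaluator[];
  services?: Service[];
}

const demoPlugin: Plugin = {
  name: "demo",
  actions: [{ name: "PING", description: "Reply with pong.", handler: async () => "pong" }],
};
```

The runtime's job at startup is then just registration: walk each plugin's arrays and index the components by name.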

But the architecture has a structural assumption baked in: the developer decides what’s good.


The Selection Problem

When a message arrives, ElizaOS presents all registered Actions to the LLM. The LLM reads each action’s name, description, and examples, then picks one to execute. This is LLM-intuition selection — the model’s judgment determines which capability gets used.

This works fine when you have 5 actions and they do obviously different things. It breaks down when you have 50 actions in the same domain — say, five different “search the web” plugins, each with a slightly different approach.

Which one returns more accurate results? Which one handles edge cases better? Which one costs less? The LLM doesn’t know. It picks based on description text and few-shot examples, not on measured performance.

There’s no fitness function. No benchmark. No historical performance data. The LLM is choosing blindly among capabilities it has never evaluated.
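The blindness is easy to see if you write down what the selector actually receives. A minimal sketch (illustrative names, not ElizaOS code): the prompt context is built purely from names and descriptions, so two actions with identical prose are indistinguishable even if one performs far better in practice.

```typescript
// What an LLM-intuition selector sees: prose only, no performance data.
interface ActionCard { name: string; description: string }

function selectionContext(actions: ActionCard[]): string {
  return actions.map(a => `- ${a.name}: ${a.description}`).join("\n");
}

const cards: ActionCard[] = [
  { name: "WEB_SEARCH_A", description: "Search the web for current information." },
  { name: "WEB_SEARCH_B", description: "Search the web for current information." },
];

// Accuracy, cost, and latency never enter the context string.
const context = selectionContext(cards);
```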

Compare this with how biological systems solve the same problem: natural selection. Organisms don’t choose which genes to use based on descriptions. Genes compete in the environment, and the ones that produce better outcomes propagate. The selection mechanism is baked into the system, not delegated to an external judge.


Six Structural Gaps

Studying ElizaOS’s architecture alongside the Rotifer Protocol reveals six capabilities that a plugin model structurally cannot provide:

1. No Fitness Evaluation

Plugins are either installed or not. There is no quantitative measure of how well a plugin performs relative to alternatives. If two plugins implement “summarize text,” the only selection signal is the developer’s judgment or the LLM’s guess.

Rotifer’s Arena runs genes through standardized benchmarks and computes F(g) — a multiplicative fitness score combining success rate, utilization, robustness, latency, and cost. The multiplicative structure means any single zero (zero security, zero reliability) produces zero fitness overall, regardless of other dimensions. Quality is measured, not assumed.
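The multiplicative structure is worth seeing concretely. A minimal sketch, assuming each dimension is normalized to [0, 1] (the metric names are illustrative, not the protocol's exact schema):

```typescript
// Multiplicative fitness: a zero on any axis zeroes the whole score.
interface GeneMetrics {
  successRate: number;   // benchmark pass rate, in [0, 1]
  utilization: number;   // how often the gene is actually useful
  robustness: number;    // edge-case and adversarial-input handling
  latencyScore: number;  // normalized so higher is better
  costScore: number;     // normalized so cheaper is better
}

function fitness(m: GeneMetrics): number {
  return m.successRate * m.utilization * m.robustness * m.latencyScore * m.costScore;
}
```

Contrast this with an additive average, where a gene with zero robustness could still post a respectable score by excelling elsewhere; multiplication makes every dimension a hard requirement.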

2. No Sandbox Isolation

ElizaOS plugins run in the same process space as the AgentRuntime. A plugin has access to the runtime’s full memory, all other plugins’ data, and the host system. A malicious or buggy plugin can compromise the entire agent.

Rotifer genes execute in WASM sandboxes with memory isolation and API boundaries controlled by the Binding layer. A gene cannot access another gene’s memory, cannot make unauthorized network calls, and cannot escape its sandbox.
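The Binding layer's API-boundary idea maps directly onto how WASM imports work: a module can only call host functions the embedder explicitly hands it. A minimal sketch of that whitelist pattern (function names here are hypothetical, not the protocol's actual host API):

```typescript
// A binding exposes only whitelisted host calls to a gene's sandbox.
type HostFn = (...args: unknown[]) => unknown;

function makeBinding(hostApi: Record<string, HostFn>, allowed: string[]): Record<string, HostFn> {
  const exposed: Record<string, HostFn> = {};
  for (const name of allowed) {
    const fn = hostApi[name];
    if (fn) exposed[name] = fn;   // everything not listed simply does not exist
  }
  return exposed;
}

const hostApi: Record<string, HostFn> = {
  log: (msg) => String(msg),
  httpFetch: (url) => `fetched ${String(url)}`,
  readSecret: () => "s3cr3t",
};

// This gene gets logging and nothing else.
const geneImports = makeBinding(hostApi, ["log"]);
```

In a real WASM embedding, `geneImports` would become the import object passed at instantiation, so an unauthorized call fails at link time rather than at runtime.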

3. No Cross-Environment Portability

An ElizaOS plugin is bound to the ElizaOS runtime. If you want the same capability in a different framework, you rewrite it. There’s no intermediate representation, no compilation target, no formal compatibility negotiation.

Rotifer genes compile to WASM IR with custom sections (metadata, schema, phenotype). Before execution, the runtime runs negotiate(R_ir, C_binding) — a formal compatibility check between the gene’s requirements and the binding’s capabilities. A gene written for local execution can be verified for cloud or on-chain compatibility without modification.
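The negotiation step reduces to a subset check: every capability the gene requires must be offered by the binding, within its resource limits. A minimal sketch, with illustrative field names standing in for whatever the metadata custom section actually carries:

```typescript
// negotiate(R_ir, C_binding): can this gene run on this binding?
interface GeneRequirements { needs: string[]; memoryPages: number }
interface BindingCapabilities { offers: Set<string>; maxMemoryPages: number }

function negotiate(gene: GeneRequirements, binding: BindingCapabilities): boolean {
  return gene.needs.every(cap => binding.offers.has(cap))
      && gene.memoryPages <= binding.maxMemoryPages;
}

const cloudBinding: BindingCapabilities = {
  offers: new Set(["http", "kv-store"]),
  maxMemoryPages: 64,
};
```

Because the check runs on declared metadata rather than on code, it can be performed before shipping the gene anywhere, which is what makes "verify for cloud or on-chain compatibility without modification" possible.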

4. No Propagation Mechanism

In ElizaOS, plugins don’t move between agents. If Agent A discovers a great plugin configuration, Agent B benefits only if a human manually installs the same plugin. There’s no automated mechanism for good capabilities to spread.

Rotifer implements Horizontal Logic Transfer (HLT) — inspired by the biological mechanism that kept bdelloid rotifers alive for 40 million years without sexual reproduction. High-fitness genes propagate across the network proportional to their fitness score. Good capabilities spread automatically; bad ones don’t.
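"Propagate proportional to fitness" is classic roulette-wheel selection. A minimal sketch of one propagation draw (an illustration of the principle, not the protocol's actual transfer logic):

```typescript
// Pick a gene to propagate, with probability proportional to fitness.
interface GeneScore { id: string; fitness: number }

function pickProportional(genes: GeneScore[], rand: () => number): string {
  const total = genes.reduce((sum, g) => sum + g.fitness, 0);
  let r = rand() * total;
  for (const g of genes) {
    r -= g.fitness;
    if (r <= 0 && g.fitness > 0) return g.id;   // zero-fitness genes never propagate
  }
  return genes[genes.length - 1].id;
}
```

Run repeatedly across the network, this has the property the paragraph describes: high-fitness genes spread, and a gene with zero fitness is never selected at all.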

5. No Constitutional Layer

ElizaOS has no immutable constraint layer. Any behavior can be overridden by a later-registered plugin. If a plugin registers a service with the same serviceType as an existing one, it silently replaces it. There are no rules that cannot be broken.

Rotifer’s L0 Kernel defines constitutional constraints that no gene, no agent, and no evolutionary process can modify. The rules of the game don’t change even as the players evolve. This mirrors fundamental physical laws in biological evolution — gravity doesn’t evolve, but everything subject to gravity does.
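The key property of a constitutional layer is that the rule set is sealed before any gene loads, so no later registration can replace or relax it. A minimal sketch, with hypothetical rule names:

```typescript
// Constitutional constraints: frozen at kernel build time, checked on every action.
interface GeneAction { kind: string }

const L0_RULES: ReadonlyArray<(a: GeneAction) => boolean> = Object.freeze([
  (a) => a.kind !== "modify-kernel",    // the kernel itself is off-limits
  (a) => a.kind !== "escape-sandbox",   // so is the isolation boundary
]);

function permitted(a: GeneAction): boolean {
  return L0_RULES.every(rule => rule(a));
}
```

Contrast this with the ElizaOS behavior described above, where a later-registered service with a colliding `serviceType` silently replaces an earlier one: here the check set cannot be appended to, reordered, or overwritten at runtime.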

6. No Collective Defense

When a malicious plugin is discovered in ElizaOS, only the agents whose maintainers see the advisory are protected. There’s no automated threat broadcasting, no collective memory of bad actors.

Rotifer’s L4 Collective Immunity layer enables threat fingerprints detected by one agent to propagate across the network, protecting agents that haven’t yet encountered the threat. This is the computational analog of the immune system’s memory B cells.
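The mechanism is a shared, append-only memory of threat fingerprints. A minimal sketch of the broadcast pattern (class and method names are illustrative, not the protocol's API):

```typescript
// Collective immunity: one agent's detection protects peers that have
// never encountered the threat.
class ImmuneMemory {
  private readonly fingerprints = new Set<string>();

  // Record a threat locally, then broadcast it to every known peer.
  report(fingerprint: string, peers: ImmuneMemory[]): void {
    this.fingerprints.add(fingerprint);
    for (const peer of peers) peer.fingerprints.add(fingerprint);
  }

  isKnownThreat(fingerprint: string): boolean {
    return this.fingerprints.has(fingerprint);
  }
}

const agentA = new ImmuneMemory();
const agentB = new ImmuneMemory();
agentA.report("sha256:deadbeef", [agentB]);   // B is now immune without ever seeing the attack
```

This is the difference from manual advisories: protection arrives as data over the network, not as a maintainer reading a security bulletin.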


What This Means

These six gaps aren’t bugs. They’re architectural consequences of a fundamental design choice: install vs evolve.

The plugin model assumes a curated world — someone (the developer, the community, the marketplace) selects which capabilities are available, and the agent uses what it’s given. This works when the curator has good judgment and the option space is small.

The gene model assumes a competitive world — capabilities prove their worth through measured performance, and the system automatically amplifies what works and attenuates what doesn’t. This works when the option space is large and human curation can’t keep up.

| Dimension | Plugin Model (Install) | Gene Model (Evolve) |
| --- | --- | --- |
| Selection | Developer choice + LLM intuition | Fitness-based natural selection |
| Quality signal | Stars, downloads, recency | F(g) benchmark score |
| Security | Trust the developer | WASM sandbox + V(g) safety score |
| Portability | Runtime-specific | WASM IR + capability negotiation |
| Improvement | Manual updates | Arena competition + HLT propagation |
| Defense | Manual advisories | Collective Immunity broadcast |

Neither model is universally better. If you have 10 well-tested plugins maintained by a trusted team, the install model is simpler and perfectly adequate. The gene model’s overhead only pays off when the ecosystem gets large enough that human curation becomes a bottleneck — when you need the system itself to distinguish quality.


The Uncomfortable Question

Plugin architectures have been the dominant pattern for agent extensibility since 2023. They’re familiar, well-understood, and they work at small scale.

But AI agent ecosystems are growing fast. The number of available capabilities is doubling every few months. At some point — and many ecosystems are already there — the number of options exceeds any curator’s ability to evaluate them.

When that happens, the question isn’t “which plugin should I install?” It’s “how does my agent figure out what’s good on its own?”

That’s the question the gene model is designed to answer. Not by replacing plugins with something fancier, but by adding the one thing plugins structurally can’t have: selection pressure.


Try it: npm i -g @rotifer/playground · rotifer.dev · Docs