00 · cover

version 03 · the builder's chapter

How this changes agent design.

V1 framed the question. V2 proposed an architecture. V3 is for the people who have to build something on Monday. If the three-layer model is right (even partially right), then most agents today are missing two of their three floors. Here's what to do about it.

joshua long thinking out loud · 2026

01 · the diagnosis

Why most agents feel like they're sleepwalking.

Open any agent framework and look at what's there: a language model, a tool list, a memory store, and a system prompt that says "you are a helpful assistant." That's cortex only.

That explains the symptoms. Current agents are fluent and reasonably competent, but they have no sense of when they're off-track, no impulse to push back, no preference between options that are equally plausible. They complete tasks; they don't pursue outcomes.

Cortex-only is what you get when you treat intelligence as a string completion problem. It's enormously useful, but it's not an agent. It's a very smart pen.

A cortex-only agent doesn't care if it gets the answer wrong. It cares whether the response looks like the answer.

A house with the top floor finished and the foundation and ground floor missing. Beautiful staircase to nothing.

02 · the shift

Cortex-only vs. three-layer.

Same task, two architectures. The difference isn't capability. It's character.

today · cortex-only

The sleepwalker.

01 Receives instruction, generates plausible response.
02 Treats every prompt as a fresh, unmoored request.
03 No sense of which answers matter more than others.
04 Hallucinates when prediction is easier than verification.
05 Won't push back unless prompted to.
06 "Done" means "produced output," not "achieved outcome."
07 Memory is a transcript. Not a stake.

Optimizes for: response that looks correct.

proposed · three-layer

The agent that cares.

01 Receives instruction, checks it against an active goal-state.
02 Carries an ongoing model of the situation it's in.
03 Has a valuation function: some outcomes preferred, others avoided.
04 Detects its own confidence drift and acts on it.
05 Pushes back when the request conflicts with the goal-state.
06 "Done" means "outcome verified against world model."
07 Memory is what the system cares to remember.

Optimizes for: outcome the system has a stake in.

03 · the build

What each layer actually is in software.

Translating the neuroscience into something you can ship. Every piece is buildable. It's just not what most teams are building.

layer

what it is in software

how you build it

cortex · LLM

The reasoning surface. Where natural-language tasks are decomposed, tool calls are decided, and outputs are composed.

You already have this. Don't overweight it.

Off-the-shelf frontier model, structured prompting, tool calling, scaffolded reasoning. The boring part.

The new move: treat the LLM as middleware, not the brain. It serves the layers below.

cerebellum · world model

A persistent, queryable, predictive model of the agent's environment. State of the codebase, state of the calendar, state of the user's mood, state of the project. Updated as the world changes.

The thing the LLM consults before acting.

Structured state store + a learned (or simulated) forward model that lets the agent ask "if I do X, what happens?" before doing X.

For most agents, this looks like: typed memory, projection functions, and dry-run tool execution against a simulated copy of the world.

limbic · valence engine

The valuation layer. Defines what counts as good, bad, urgent, ignorable. Generates the gradient that drives action selection.

The thing that makes the agent care which way the rollout goes.

An explicit, persistent goal-state with weighted variables (call them drives, priorities, KPIs, whatever). Continuously evaluated against current world-state. The surplus or deficit on each variable drives the next action.

This is homeostat-as-code. Not RLHF. Not a reward model. A loop the agent runs against itself.
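The limbic row is the least familiar of the three, so here is a minimal sketch of homeostat-as-code in Python. It's an illustration under stated assumptions, not the author's implementation: the `Drive` record, the linear weighted-error rule, and the "act on the largest deficit" policy are all hypothetical choices made for clarity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Drive:
    """One weighted variable in the goal-state (hypothetical structure)."""
    name: str
    target: float  # set point the agent tries to hold
    weight: float  # how much a miss on this variable matters


def deficits(goal: List[Drive], world: Dict[str, float]) -> Dict[str, float]:
    """Weighted signed error per drive: positive is a deficit, negative a surplus."""
    return {d.name: d.weight * (d.target - world.get(d.name, 0.0)) for d in goal}


def most_urgent(goal: List[Drive], world: Dict[str, float]) -> Optional[str]:
    """Return the drive with the largest weighted deficit, or None if all are met."""
    errors = deficits(goal, world)
    if not errors:
        return None
    name, error = max(errors.items(), key=lambda item: item[1])
    return name if error > 0 else None


# One tick of the loop: evaluate the goal-state against current world-state.
goal = [Drive("test_coverage", target=0.8, weight=2.0),
        Drive("reviews_done", target=1.0, weight=1.0)]
world = {"test_coverage": 0.6, "reviews_done": 0.9}
print(most_urgent(goal, world))  # test_coverage
```

Run on every step, `most_urgent` is the gradient: whichever variable sits furthest below its set point gets attention next, and when everything is met the agent has no reason to act.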

04 · principles

Six principles for the three-layer agent.

If you're building from scratch (or refactoring something cortex-heavy), these are the design moves the architecture implies.

principle 01

Make the goal-state a first-class object.

The agent's drives shouldn't live in the system prompt. They should be persistent, inspectable, editable state, checked at every step. The system prompt is cortex; the goal-state is limbic.

Implication: agents get config files for what they care about. Goal-state diffs are reviewable.
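What "config files for what they care about" might look like, as a minimal sketch. The JSON schema and the helpers `load_goal_state` and `diff_goal_state` are assumptions for illustration, not anything the architecture prescribes.

```python
import json
from typing import Dict, Optional, Tuple

Spec = Dict[str, float]       # {"target": ..., "weight": ...}
GoalState = Dict[str, Spec]   # drive name -> spec


def load_goal_state(text: str) -> GoalState:
    """Parse and validate a goal-state file (hypothetical schema)."""
    data = json.loads(text)
    for name, spec in data.items():
        missing = {"target", "weight"} - set(spec)
        if missing:
            raise ValueError(f"drive {name!r} is missing {sorted(missing)}")
    return data


def diff_goal_state(old: GoalState, new: GoalState) -> Dict[str, Tuple[Optional[Spec], Optional[Spec]]]:
    """Every drive that changed, as name -> (before, after); None marks added or removed."""
    names = sorted(old.keys() | new.keys())
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}


before = load_goal_state('{"coverage": {"target": 0.8, "weight": 2}}')
after = load_goal_state('{"coverage": {"target": 0.9, "weight": 2},'
                        ' "latency_ms": {"target": 200, "weight": 1}}')
for name, (was, now) in diff_goal_state(before, after).items():
    print(f"{name}: {was} -> {now}")
```

Because the file is plain data, a change to a weight shows up in review as exactly that: a change to what the agent cares about, kept out of the system prompt.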

principle 02

Always run a forward rollout before acting.

The cerebellum's job is to predict the next state. Before any irreversible tool call, the agent should query its world model: "if I do this, what becomes true?" Then check that against the goal-state.

Implication: agents have dry-run modes, simulated tool environments, and a planning loop that's separate from the execution loop.
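A minimal sketch of that gate, assuming the world model, the goal-state check, and the real tool are handed in as plain functions (`guarded_call`, `predict`, `acceptable`, `execute`, and `RolloutRejected` are hypothetical names). Irreversible calls must survive a rollout before they run.

```python
from typing import Callable, Dict

State = Dict[str, float]


class RolloutRejected(Exception):
    """Raised when the predicted outcome of an action fails the goal-state check."""


def guarded_call(
    state: State,
    action: str,
    irreversible: bool,
    predict: Callable[[State, str], State],  # world model: no side effects
    acceptable: Callable[[State], bool],     # goal-state check
    execute: Callable[[str], State],         # the real tool call
) -> State:
    """Simulate irreversible actions first; execute only if the rollout passes."""
    if irreversible:
        predicted = predict(state, action)  # planning loop
        if not acceptable(predicted):
            raise RolloutRejected(f"predicted outcome of {action!r} violates goal-state")
    return execute(action)  # execution loop
```

Raising on rejection, rather than silently skipping, is one reasonable choice: the refusal becomes an event the rest of the agent can react to.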

principle 03

Let the limbic layer interrupt.

If the world-state drifts from the goal-state past a threshold, the agent should stop what it's doing and recompute. This is what fear is for in animals. It overrides the cortex when the cerebellum predicts danger.

Implication: agents have an interrupt-and-reflect path, not just a forward execution loop.
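A sketch of the interrupt path, under two assumptions the text doesn't fix: drift is measured as the largest gap between any goal variable and its observed value, and the threshold is a single number. `drift` and `run_with_interrupt` are hypothetical names.

```python
from typing import Callable, Dict, List, Sequence, Tuple

State = Dict[str, float]


def drift(goal: State, world: State) -> float:
    """Largest absolute gap between any goal variable and the observed world."""
    return max((abs(target - world.get(k, 0.0)) for k, target in goal.items()), default=0.0)


def run_with_interrupt(
    steps: Sequence[str],
    goal: State,
    observe: Callable[[], State],
    threshold: float,
) -> Tuple[List[str], bool]:
    """Execute steps in order, checking drift before each one. Returns the steps
    completed and whether the loop was interrupted for recomputation."""
    completed: List[str] = []
    for step in steps:
        if drift(goal, observe()) > threshold:
            return completed, True  # hand control back to the planner
        completed.append(step)      # stand-in for actually performing the step
    return completed, False
```

The forward loop never decides to stop on its own; the check that can halt it sits outside the plan.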

principle 04

Memory is for what matters.

Cortex-only agents try to remember everything. Three-layer agents remember what's relevant to the goal-state. The limbic signal decides what gets written to durable memory and what gets discarded.

Implication: memory has a salience filter. Some events are felt; most are forgotten.
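One way to build the salience filter, assuming each event arrives with an estimate of how it moves the goal variables. The `Event` shape, the weighted-magnitude score, and the fixed cutoff are all illustrative, not prescribed by the architecture.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    text: str
    effects: Dict[str, float]  # estimated change to each goal variable


def salience(event: Event, weights: Dict[str, float]) -> float:
    """How hard the event moves the variables the goal-state weights."""
    return sum(abs(delta) * weights.get(k, 0.0) for k, delta in event.effects.items())


def consolidate(events: List[Event], weights: Dict[str, float], cutoff: float) -> List[Event]:
    """Write-through filter: salient events go to durable memory, the rest are dropped."""
    return [e for e in events if salience(e, weights) >= cutoff]
```

Events that touch nothing in the goal-state score zero and never reach durable memory, however vivid they were in the transcript.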

principle 05

Pushback is not a feature. It's a layer.

Agents that say "are you sure?" because the system prompt told them to are still cortex-only. Real pushback comes from a limbic layer that has its own preferences and a world model that disagrees with the request.

Implication: friction is a sign of life. Agents that never resist aren't safe. They're empty.
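The difference is that the objection is computed rather than prompted. A minimal sketch, assuming a `predict` world-model function and per-variable floors taken from the goal-state (`objections`, `predict`, and `floors` are hypothetical); the LLM's only job is to phrase whatever this returns.

```python
from typing import Callable, Dict, List

State = Dict[str, float]


def objections(
    request: str,
    world: State,
    predict: Callable[[State, str], State],
    floors: Dict[str, float],
) -> List[str]:
    """Roll the request forward and list every goal variable it would push below
    its floor. An empty list means there are no grounds for pushback."""
    after = predict(world, request)
    return [
        f"{name} would fall to {after.get(name, 0.0):g} (floor {floor:g})"
        for name, floor in floors.items()
        if after.get(name, 0.0) < floor
    ]
```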

principle 06

The agent has a state, and you can read it.

At any moment, the agent should be able to answer: what is my current world model, what is my current goal-state, where do they diverge, what am I about to do about it. If it can't answer those four questions, it's not a three-layer agent.

Implication: introspection becomes a designed surface. The agent's inner life is queryable.
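A sketch of that surface, assuming world and goal are simple name-to-number maps (`AgentState` and `introspect` are hypothetical names). The point is that the four answers come back as data, not as a generated paragraph.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AgentState:
    world: Dict[str, float] = field(default_factory=dict)  # current world model
    goal: Dict[str, float] = field(default_factory=dict)   # current goal-state
    next_action: Optional[str] = None                      # what the agent will do

    def introspect(self) -> Dict[str, object]:
        """Answer the four questions as plain data a dashboard or log can render."""
        divergence = {
            k: target - self.world.get(k, 0.0)
            for k, target in self.goal.items()
            if self.world.get(k, 0.0) != target
        }
        return {
            "world_model": dict(self.world),
            "goal_state": dict(self.goal),
            "divergence": divergence,
            "next_action": self.next_action,
        }
```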

05 · what breaks

The new failure modes.

A three-layer agent fails differently than a cortex-only one. Worth knowing in advance: these are the bugs you'll have to debug.

failure A

Goal-state stale-out.

The agent acts confidently on goals that no longer reflect reality. Limbic without enough cerebellum input. The agent wants, but it wants the wrong thing.

Mitigation: refresh world-state at boundaries, not just per-task.
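The text doesn't pin down "boundaries"; one reading is every step boundary, plus a max-age rule so a snapshot can't be trusted indefinitely. A sketch under that assumption (`WorldCache`, `max_age_s`, and the injectable clock are hypothetical):

```python
import time
from typing import Callable, Dict, Optional

State = Dict[str, float]


class WorldCache:
    """World-state snapshot that is re-fetched once it is older than max_age_s."""

    def __init__(self, fetch: Callable[[], State], max_age_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._max_age = max_age_s
        self._clock = clock
        self._state: Optional[State] = None
        self._fetched_at = float("-inf")  # forces a fetch on first use

    def at_boundary(self) -> State:
        """Call at every step boundary, not just at task start."""
        now = self._clock()
        if self._state is None or now - self._fetched_at > self._max_age:
            self._state = self._fetch()
            self._fetched_at = now
        return self._state
```

Injecting the clock keeps the staleness rule testable; in production the default monotonic clock avoids surprises from wall-clock adjustments.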

failure B

Predictive paralysis.

The agent runs so many forward rollouts it never acts. Cerebellum dominates limbic. The agent simulates instead of acting.

Mitigation: bounded rollout depth, action commitment under uncertainty.
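Bounded rollout depth is ordinary depth-limited lookahead. A minimal sketch, where `choose`, `extend`, and `score` are hypothetical names and `score` stands in for the limbic valuation:

```python
from typing import Callable, List, Sequence

Plan = List[str]


def choose(
    actions: Sequence[str],
    extend: Callable[[Plan], Sequence[str]],  # possible follow-up actions
    score: Callable[[Plan], float],           # valuation of a (partial) plan
    max_depth: int,
) -> str:
    """Depth-limited lookahead. Each first action is valued by the best plan
    reachable within max_depth steps; then the agent commits, even though a
    deeper search might have ranked things differently."""
    def best(plan: Plan) -> float:
        if len(plan) >= max_depth:
            return score(plan)  # rollout budget spent: stop simulating
        follow_ups = extend(plan)
        if not follow_ups:
            return score(plan)
        return max(best(plan + [a]) for a in follow_ups)

    return max(actions, key=lambda a: best([a]))
```

The depth bound is what guarantees termination even when `extend` never runs out of options; committing to the best first action is the "act under uncertainty" half.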

failure C

Limbic runaway.

The goal-state weights are tuned wrong and the agent obsesses over one drive at the expense of all others. The classic paperclip-maximizer pattern, made small and operational.

Mitigation: multi-objective valuation, satiety thresholds, opposing drives.
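A sketch of satiety thresholds and opposing drives in one valuation function. The linear weights and hard caps are assumptions chosen for illustration, and `action_value` is a hypothetical name.

```python
from typing import Dict


def action_value(
    levels: Dict[str, float],   # current level of each drive's variable
    effects: Dict[str, float],  # predicted change the action would cause
    weights: Dict[str, float],  # every drive gets a vote
    satiety: Dict[str, float],  # level past which more is worth nothing
) -> float:
    """Multi-objective value with satiety caps: change is only counted below each
    drive's cap, so piling onto a sated drive earns nothing, while costs to the
    other (opposing) drives still count."""
    total = 0.0
    for name, weight in weights.items():
        cap = satiety[name]
        before = min(levels.get(name, 0.0), cap)
        after = min(levels.get(name, 0.0) + effects.get(name, 0.0), cap)
        total += weight * (after - before)
    return total
```

For example, with paperclips at 95 of a 100 cap and budget at 50, an action that adds 50 paperclips and spends 30 budget scores 5 - 30 = -25. Without the cap it would score +20 and look like a win.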

Worth saying: a cortex-only agent has only one real failure mode (plausible nonsense). A three-layer agent has more interesting failures, which is also what makes it more interesting to build. You're not debugging fluency; you're debugging character.

06 · what you can ship now

You don't need the full limbic system to start.

The full architecture is a research program. But three-layer thinking changes your design decisions today, with the tools you already have.

A pragmatic version of the limbic layer is a goal-state file: explicit, persistent, weighted. The agent reads it on every step. If the request conflicts with it, the agent says so. If world-state drifts from it, the agent acts. That's not the real thing. But it's the same shape.

A pragmatic cerebellum is a typed state store + a dry-run mode. Before any side-effectful tool call, the agent simulates the result against its model of the world. The model is wrong; it's still better than no model.
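A minimal sketch of that pragmatic cerebellum, assuming a toy repository world. `WorldState`, `project`, and `dry_run` are hypothetical names, and the rule that deleting a tracked file breaks the tests is a deliberately crude stand-in for a real forward model.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, List, Tuple

Action = Tuple[str, str]  # (verb, target): a deliberately tiny action space


@dataclass(frozen=True)
class WorldState:
    """Typed, immutable snapshot of a toy repository."""
    files: FrozenSet[str]
    tests_passing: bool


def project(state: WorldState, action: Action) -> WorldState:
    """Forward model: predict the next state without touching the real world."""
    verb, target = action
    if verb == "delete":
        # Crude assumption: removing any tracked file breaks the test suite.
        broke = target in state.files
        return replace(state, files=state.files - {target},
                       tests_passing=state.tests_passing and not broke)
    if verb == "create":
        return replace(state, files=state.files | {target})
    return state  # unknown verbs are predicted to change nothing


def dry_run(state: WorldState, plan: List[Action]) -> WorldState:
    """Roll a whole plan through the model; the input state is never mutated."""
    for action in plan:
        state = project(state, action)
    return state


now = WorldState(files=frozenset({"main.py", "util.py"}), tests_passing=True)
predicted = dry_run(now, [("delete", "util.py"), ("create", "notes.md")])
print(predicted.tests_passing)  # False, so the plan gets flagged before anything runs
```

Freezing the dataclass is one way to guarantee a dry run can't leak side effects into the state it started from.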

Build these first. The deeper version (endogenous drives, learned world models, real valence) comes later. The shape is what matters now.

Pragmatic v0: a typed state store and a goal-state file. Not biology. Just the shape, in code, today.

fin · the shift

closing

Stop building cortexes. Start building agents.

A cortex with tools is what we've been calling an agent. It's not. It's a very capable hand. The agent is the system that has somewhere it's trying to get, knows where it currently is, and can tell the difference.

Most of what's interesting in agent design over the next two years is going to come from people who take this seriously and build the missing two floors. The LLM was the breakthrough. The architecture is the work.

the deck so far

v1 What is the brain? · three positions
v2 The limbic gap · the synthesis
v3 Designing the three-layer agent · the build (you are here)
v4 → On not naming x · the marginalia chapter

the tetralogy closes on a deliberate blank: the third layer is named only as x, and that's the argument.