Field Manual / Vol. I

The brain we haven't built yet.

A cortex is a librarian who's never left the library.

A cerebellum is a toddler learning gravity.

A limbic system is what decides it matters.

Today's agents are just the librarian.

The four plates
  1. I Cortex. The librarian who has never left
  2. II Cerebellum. The toddler who catches the ball
  3. III Limbic. The thermostat that suffers
  4. IV Habitat. A digital place that pushes back

Diagnosis

Specimen 01: the typical agent.

An agent today is four things: an LLM, a shell, a file system, and a scheduler. That's the whole stack.

But that's not four things. It's a librarian with three tools. The LLM is the librarian — symbol-fluent, every book read, never been outside. The shell, the file system, the scheduler are the tools. The librarian can run errands now, keep a calendar, and read or write the files inside the file system — notes, state, scripts, logs. Still hasn't been outside.

What the librarian does best: pattern matching, fast. Trillions of pages, indexed and retrievable in milliseconds. That's the breakthrough. It's also where it stops.

Cortex-only agents are fluent and quietly empty. They complete tasks. They don't pursue them. Point them anywhere and they'll go, because they've got nowhere they'd rather be.

Beautiful staircase to nothing.

  1. LLM. The librarian. Symbol-fluent. Pattern matches at speed.
  2. Shell. The LLM writes commands; the shell runs them.
  3. File system. The tree of paths. Holds the files.
  4. Files. Different types of instructions: notes, state, scripts, logs.
  5. Scheduler. Cron or trigger; wakes the LLM.

Assembly

Four parts of a working mind.

Three layers that do different work, sitting inside a fourth that gives them somewhere to do it.

  1. Cortex. Language and symbols.
  2. Cerebellum. Spatial and physical prediction.
  3. Limbic. Internal state that resists drift.
  4. Habitat. Continuity, locality, asymmetry, friction, coupling.

Plate I / Cortex

The librarian who has never been outside.

A librarian who has read every book and never been outside. They can quote anyone. They have no idea what rain feels like.

Brilliant at talking about. Helpless at acting on. Knows the word rain. Doesn't know rain.

The breakthrough is here. The work is everywhere else. The LLM stops being the brain. It becomes the part that handles language and serves the layers underneath.

Analogy

A librarian who's read everything and lived nothing.

Endless symbols, no skin.

In code, today

Off-the-shelf frontier model.

Tool calling, structured prompting. You already have this. The move is to stop overweighting it.

cortex.respond(prompt, tools, world_state)
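
A minimal sketch of what "the LLM serves the layers underneath" might look like, assuming the cortex only proposes and something else decides what runs. Every name here (`Cortex`, `StubCortex`, `Reply`, `step`) is hypothetical, and the stub stands in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Protocol

# Hypothetical types for illustration; none of these names come from a real SDK.
Tool = Callable[[str], str]


@dataclass(frozen=True)
class Reply:
    text: str                    # language output
    tool: Optional[str] = None   # tool the cortex proposes, if any
    arg: str = ""


class Cortex(Protocol):
    def respond(self, prompt: str, tools: Dict[str, Tool], world_state: dict) -> Reply: ...


class StubCortex:
    """Stands in for a frontier model: it proposes, it never executes."""

    def respond(self, prompt: str, tools: Dict[str, Tool], world_state: dict) -> Reply:
        if "tests" in prompt and "run_tests" in tools:
            return Reply("I'll run the tests first.", tool="run_tests", arg=world_state["repo"])
        return Reply("Nothing to do.")


def step(cortex: Cortex, prompt: str, tools: Dict[str, Tool], world_state: dict) -> str:
    reply = cortex.respond(prompt, tools, world_state)
    if reply.tool is None:
        return reply.text
    # In a fuller build the other layers would vet the proposal; this sketch just runs it.
    return tools[reply.tool](reply.arg)
```

The point of the shape: the model call sits behind one narrow interface, so the layers around it can veto, simulate, or reprioritise before anything touches the world.
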

Read further

  • LeCun (2022). A Path Towards Autonomous Machine Intelligence. Position paper that names the missing pieces.
  • Richens et al. (2025). General agents contain world models. arXiv:2506.01622.
  • Park et al. (2023). Generative Agents. UIST '23.

  1. Frontal. Planning, reasoning, self-talk.
  2. Central. High-density symbol manipulation.
  3. Lateral. Language, memory, social inference.
  4. Stem. No sensorimotor coupling. Severed at the neck.

Plate II / Cerebellum

The toddler who catches a ball before she can say why.

A two-year-old has a richer physical world model than GPT-5. She knows where a thrown ball will land. She has never seen a textbook.

The model lives in her body, built up by acting in a world that pushed back on every reach and step. It predicts the world's response before the action lands. That's what a forward model is.

Agents have the same loop available, and almost none of them use it. A code assistant that can't tell you, before suggesting a fix, whether that fix will pass the tests has no forward model. A chatbot that can't predict, before saying a sentence, how it will land has no forward model. They act, then read the result, then act again. The second time isn't different from the first.

The toddler trick works on the agent too: put it in a place that pushes back, let it predict before it acts, and let the gap between prediction and result do the teaching.

Analogy

A toddler catching a ball.

No words for gravity. Catches the ball anyway.

In code, today

Typed state store with a dry-run mode.

Before any side-effectful action, the agent simulates the result against its model of the world.

world.simulate(action) → next_state
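
The predict, act, compare loop can be sketched in a few lines. This is a toy under stated assumptions: the "world" is a single number where only 60% of each push lands, and `ForwardModel`, `world_step`, `gain`, and `lr` are hypothetical names, not anything from the systems cited here.

```python
from dataclasses import dataclass, field


@dataclass
class ForwardModel:
    """Hypothetical forward model: a belief about how far a push moves a value."""
    gain: float = 1.0     # current belief about the world's response
    lr: float = 0.5       # how strongly a surprise corrects the belief
    errors: list = field(default_factory=list)

    def predict(self, state: float, action: float) -> float:
        return state + self.gain * action

    def learn(self, state: float, action: float, observed: float) -> float:
        error = observed - self.predict(state, action)
        if action != 0:
            self.gain += self.lr * error / action
        self.errors.append(abs(error))
        return error


def world_step(state: float, action: float) -> float:
    """The world pushes back: only 60% of each push lands (unknown to the agent)."""
    return state + 0.6 * action


model, state = ForwardModel(), 0.0
for _ in range(6):
    action = 1.0
    predicted = model.predict(state, action)   # predict before acting
    observed = world_step(state, action)       # act
    model.learn(state, action, observed)       # the gap does the teaching
    state = observed
```

Each pass halves the surprise, so `model.gain` drifts from 1.0 toward the world's true 0.6. An agent that skips the `predict` call has nothing to be surprised by, and nothing to learn from.
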

Read further

  • DeepMind (2025). Genie 3. Real-time interactive worlds from text, 24fps at 720p.
  • Assran et al. (2025). V-JEPA 2. Trained on ~1M hours of video; zero-shot manipulation.
  • Hafner et al. (2025). DreamerV3. One algorithm, 150+ tasks, no per-domain tuning. Nature.
  • NVIDIA (2025). Cosmos. Open-weights world models for physical AI.

  1. Predicted apex. Where the body sends the hand before words arrive.
  2. Catch. Confirmed by the world.
  3. Origin. The throw the agent committed to.

Plate III / Limbic

The thermostat that suffers.

A thermostat reacts to cold. A cold person suffers, gets up, turns the heat on. Same fact. Totally different behavior. The need is the gradient.

That's the limbic layer. Not a part that thinks or predicts. The part that cares. It generates the slope that makes the rest of the brain do anything at all.

Motivate an agent the way you motivate a human. Don't tell it what to do. Give it something it has to keep alive. Compute budget. Goal-state distance. Coupled agents waiting on it. The agent reads those signals, weighs them, acts to bring them home.

Analogy

A thirsty animal, not a weather report.

The weather report knows it's hot. The animal needs water.

In code, today

Goal-state file with weighted internal variables.

Persistent. Inspectable. Read every step. When the world drifts outside the bounds, the agent acts.

drives.distance(world_state) → gradient
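
One way to read that signature, as a sketch: each drive has bounds and a weight, and the gradient is the weighted distance outside those bounds. The `Drive` class, the bound values, and the variable names `compute_budget` and `goal_distance` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    name: str
    low: float
    high: float
    weight: float = 1.0

    def distance(self, value: float) -> float:
        """Signed distance outside the bounds; 0.0 while inside them."""
        if value < self.low:
            return value - self.low     # negative: too little
        if value > self.high:
            return value - self.high    # positive: too much
        return 0.0


def gradient(drives, world_state):
    """Weighted pull per drive; the largest magnitude is the most urgent."""
    return {d.name: d.weight * d.distance(world_state[d.name]) for d in drives}


def most_urgent(drives, world_state):
    g = gradient(drives, world_state)
    name = max(g, key=lambda k: abs(g[k]))
    return name if g[name] != 0 else None   # None: everything is within bounds
```

With a compute budget of 10 against a floor of 20 (weight 2.0), the pull is -20.0 and that drive wins the step. Inside the bounds the gradient is flat and the agent has no reason to move.
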

Read further

  • Friston (ongoing). Active Inference (Parr, Pezzulo, Friston, MIT Press 2022). The theoretical spine.
  • Pathak et al. (2017). Curiosity-driven exploration by self-supervised prediction. ICM, ICML.
  • Klyubin, Polani, Nehaniv (2005). Empowerment. Control-over-future-states as drive.
  • Bahmani et al. (2025). The Missing Reward. arXiv:2508.05619.

The least productized of the four layers. No SDK yet.

  1. Bound. Above this, drift becomes signal.
  2. Bound. Below this, action begins.
  3. Bulb. The internal state the system is trying to keep alive.

Plate IV / Habitat

A digital place to be, where actions matter.

A brain without a body is a movie of a brain. But for an LLM, the body it already has — context in, output out, working memory between — is close enough. What's thin is the place.

Most agents act into a session that ends. Nothing pushes back. Nothing compounds. The fix isn't a fancier body — not an avatar, not a robot. It's a place rich enough that the forward model has something to predict against, and persistent enough that yesterday's action shapes today's.

Software, not hardware. A persistent process with rules that don't bend, cheap to spin up, that holds a thousand agents at once without breaking. Five properties make a process into a habitat:

  • 01 Continuity

    The agent persists between moments. It carries state, accumulates wear, has yesterday.

  • 02 Locality

    Somewhere, not everywhere. Not infinitely cloneable. Presence here precludes presence there.

  • 03 Asymmetry

    Easier to do than to undo. Undoing costs more than doing.

  • 04 Friction

    Operating costs something the agent can't refill on a whim. Compute, attention, slot count, standing.

  • 05 Coupling

    Agent state and world state are linked. Acting on the world changes the agent.
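
The five properties can be written down as rules in a toy world. This is a sketch, not a habitat: the rooms, the costs, and the names `Habitat`, `Agent`, `DO_COST`, and `UNDO_COST` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    energy: int = 10           # friction: a budget the agent cannot refill itself
    room: str = "workshop"     # locality: exactly one place at a time
    wear: int = 0              # coupling: acting on the world marks the agent


@dataclass
class Habitat:
    """Hypothetical toy world; costs and names are illustrative only."""
    rooms: dict = field(default_factory=lambda: {"workshop": [], "archive": []})
    log: list = field(default_factory=list)   # continuity: history persists

    DO_COST = 1
    UNDO_COST = 3              # asymmetry: undoing costs more than doing

    def _spend(self, agent: Agent, cost: int) -> None:
        if agent.energy < cost:
            raise RuntimeError(f"{agent.name} cannot afford this (needs {cost})")
        agent.energy -= cost
        agent.wear += 1

    def place(self, agent: Agent, item: str) -> None:
        self._spend(agent, self.DO_COST)
        self.rooms[agent.room].append(item)
        self.log.append(("place", agent.name, agent.room, item))

    def remove(self, agent: Agent, item: str) -> None:
        if item not in self.rooms[agent.room]:
            raise LookupError(f"{item!r} is not in {agent.room}")
        self._spend(agent, self.UNDO_COST)
        self.rooms[agent.room].remove(item)
        self.log.append(("remove", agent.name, agent.room, item))

    def move(self, agent: Agent, room: str) -> None:
        if room not in self.rooms:
            raise LookupError(f"no such room: {room}")
        self._spend(agent, self.DO_COST)
        agent.room = room      # being here means no longer being there
        self.log.append(("move", agent.name, room))
```

Nothing here is clever. What matters is that the rules do not bend: an agent that places a note and then wants it back pays three times what it cost to put it down, and it has to be in the same room to do it.
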

Substrate exists today

  • Habitat-Sim (Meta, 2019–). Photo-real, physics-enabled simulator.
  • Voyager (Wang et al. 2023). LLM agent in Minecraft. 3.3× more items, 15.3× faster tech-tree than prior SOTA.
  • SIMA / SIMA 2 (DeepMind 2024–25). Agents across nine commercial games. Gemini-powered.
  • Project Sid (Altera.AL 2024). 1000+ agents in Minecraft develop roles, laws, taxation.
  • Marble (World Labs, Nov 2025). First commercial spatial world model.

  1. Roof. Continuity. Persists through weather.
  2. Inhabitant. Locality. Somewhere specific.
  3. Tools and wear. Coupling. Marks accumulate.
  4. Substrate. Asymmetry. What's buried doesn't come back up clean.

Most readers already have a seed of this. A persistent chat thread — Telegram, Slack, an ongoing assistant conversation — has continuity and coupling, two of the five. It's habitat-shaped.

But it gets used like a transcript. The agent only speaks when called. Treat the thread as a place the agent lives in instead of a queue it answers, and it stops waiting. It can ping you when something shifts. Surface what it noticed. Show up unprompted. That's not new capability — it's the latent behavior of an agent that's been given a habitat.
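
A sketch of the difference between a queue and a place, assuming the agent is woken on a timer and handed a snapshot of whatever it watches. `Resident`, `notice`, and the `send` callable are hypothetical; `send` could be anything that posts into the thread.

```python
def notice(previous: dict, current: dict) -> list:
    """Return one message per key that changed between two snapshots."""
    messages = []
    for key in sorted(previous.keys() | current.keys()):
        before, after = previous.get(key), current.get(key)
        if before != after:
            messages.append(f"{key}: {before!r} -> {after!r}")
    return messages


class Resident:
    """Hypothetical agent that lives in a thread rather than answering a queue."""

    def __init__(self, send):
        self.send = send         # any callable that posts a message to the thread
        self.last_seen = None    # nothing observed yet

    def tick(self, snapshot: dict) -> int:
        changes = [] if self.last_seen is None else notice(self.last_seen, snapshot)
        for message in changes:
            self.send(f"Noticed: {message}")   # speaks without being asked
        self.last_seen = dict(snapshot)
        return len(changes)
```

Nobody calls this agent. It is ticked, it compares today with yesterday, and it speaks only when the two differ.
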

Give an agent those five and the layers above stop being theatrical.

The kit

Three files and a place to live.

The full architecture is a research program. The shape in code today is small: three artifacts and a habitat that holds them. Not the real thing. The right shape.

01 / Cerebellum-shaped

A typed state store with dry-run.

One source of truth for the agent's environment. Queryable. Diffable. Updated as the world changes.

Before any side-effectful tool call, the agent simulates against this store and checks the prediction.

world.read("project") · world.simulate(action)
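
A minimal sketch of a typed store with a dry-run, assuming immutable snapshots so that a simulation can never leak into the real state. `ProjectState`, `Action`, `World`, `commit`, and `diff` are hypothetical names, and the rule that any edit invalidates the last test run is an assumption chosen to keep the example small.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ProjectState:
    """Hypothetical typed snapshot of the agent's environment."""
    branch: str = "main"
    tests_passing: bool = True
    open_files: tuple = ()


@dataclass(frozen=True)
class Action:
    kind: str
    target: str = ""


class World:
    def __init__(self, **stores):
        self._stores = dict(stores)

    def read(self, key: str):
        return self._stores[key]

    def simulate(self, action: Action) -> ProjectState:
        """Dry run: return the predicted next state without touching the store."""
        s = self._stores["project"]
        if action.kind == "open":
            return replace(s, open_files=s.open_files + (action.target,))
        if action.kind == "edit":
            # Assumption: any edit invalidates the last test run.
            return replace(s, tests_passing=False)
        return s

    def commit(self, action: Action) -> ProjectState:
        self._stores["project"] = self.simulate(action)
        return self._stores["project"]


def diff(before: ProjectState, after: ProjectState) -> dict:
    """Field-by-field changes as {name: (before, after)}."""
    return {
        f.name: (getattr(before, f.name), getattr(after, f.name))
        for f in fields(before)
        if getattr(before, f.name) != getattr(after, f.name)
    }
```

Calling `diff(world.read("project"), world.simulate(action))` shows what an action is expected to change before anything is committed. Comparing that prediction with what actually happened afterwards is where the learning signal comes from.
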
02 / Limbic-shaped

A goal-state file with weighted drives.

Explicit. Persistent. Inspectable. Drives, priorities, satiety thresholds. Read every step.

If the world drifts from it, the agent acts. If a request conflicts with it, the agent says so.

drives.distance(world_state) → gradient
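
"The agent says so" can be sketched as a check that runs before a request is accepted. The goal-state layout, the drive names, and the `review` function are assumptions for illustration; the JSON is inlined here, where a real build would read it from the persistent file.

```python
import json

# Hypothetical goal-state contents; in practice this would live in a file on disk.
GOAL_STATE = json.loads("""
{
  "drives": {
    "compute_budget":   {"min": 20, "weight": 2.0},
    "open_commitments": {"max": 3,  "weight": 1.0}
  }
}
""")


def violations(goal_state: dict, state: dict) -> list:
    """List every drive whose threshold the given state would break."""
    out = []
    for name, rule in goal_state["drives"].items():
        value = state[name]
        if "min" in rule and value < rule["min"]:
            out.append(f"{name} would fall to {value} (min {rule['min']})")
        if "max" in rule and value > rule["max"]:
            out.append(f"{name} would rise to {value} (max {rule['max']})")
    return out


def review(goal_state: dict, state: dict, request_effects: dict) -> str:
    """Project a request's estimated effects onto a copy of the state and object if a bound breaks."""
    projected = {k: state[k] + request_effects.get(k, 0) for k in state}
    problems = violations(goal_state, projected)
    if problems:
        return "This won't work, here's why: " + "; ".join(problems)
    return "ok"
```

A request that would spend 15 units of a 30-unit budget gets a refusal with the reason attached. A request that spends 5 goes through.
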
03 / Habitat-shaped

A long-running process with a place to live.

Not a function call. A container that persists. A clear file boundary. Finite resources. An identity that accumulates across runs.

This is what gives the other two anything to talk about.

habitat.run(agent="quinn", forever=true)
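
A sketch of the persistence half of that call, assuming state lives in a JSON file that outlasts any single run. The file layout, the `budget` field, and the `load` and `run` helpers are hypothetical. The loop is bounded by `max_ticks` so the example terminates; a real deployment would run it under a supervisor instead.

```python
import json
from pathlib import Path


def load(path: Path) -> dict:
    """Resume from disk if there is a yesterday; otherwise start fresh."""
    if path.exists():
        return json.loads(path.read_text())
    return {"agent": "quinn", "runs": 0, "ticks": 0, "budget": 5}


def run(path: Path, max_ticks: int) -> dict:
    """One wake-up of the process. State survives in `path` between calls."""
    state = load(path)
    state["runs"] += 1
    for _ in range(max_ticks):
        if state["budget"] <= 0:    # finite resources: stop when spent
            break
        state["ticks"] += 1
        state["budget"] -= 1
        path.write_text(json.dumps(state))   # persist after every tick
    path.write_text(json.dumps(state))
    return state
```

Run it twice against the same path and the second run picks up where the first stopped, with less budget and a longer history. That accumulated file is the identity.
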

Won't make real wanting. Will make behavior shaped by it. Start there.

Field test

How you'll know it's working.

A clean way to tell whether you've built the missing layers or just dressed up a cortex: a real agent resists. Compliance isn't life. Friction is.

Test 01

What did it refuse?

A goal-state with real weight produces pushback when a request conflicts. If your agent never says "this won't work, here's why," it has no preferences of its own to defend.

Test 02

What did it choose?

Given two equally plausible next moves, a real agent picks one and can say why. A cortex-only system samples. Not the same.

Test 03

What did it protect?

Interrupt the agent and watch what it does. A real agent keeps state, defends commitments, returns to the goal. A sleepwalker forgets and complies.

An agent that always says yes is a dead one.

References

Synthesis, not discovery.

The pieces are already out there. This deck assembles them. The load-bearing references, organized by layer.

01 / Cortex

LLM as the language layer

  • LeCun, Y. A Path Towards Autonomous Machine Intelligence. 2022. The position paper that names the gap.
  • Richens et al. General agents contain world models. 2025. arXiv:2506.01622.
  • Park, O'Brien, Cai et al. Generative Agents. 2023. UIST.
02 / Cerebellum

World models for spatial prediction

  • Hafner et al. DreamerV3. 2025. Nature. One algorithm, 150+ tasks.
  • Assran, Bardes, Fan et al. V-JEPA 2. 2025. arXiv:2506.09985.
  • DeepMind. Genie 3. Aug 2025. Real-time interactive worlds, 24fps at 720p.
  • NVIDIA. Cosmos Platform. 2025. arXiv:2501.03575.
03 / Limbic

Valence, drives, motivation

  • Parr, Pezzulo, Friston. Active Inference. MIT 2022. Free-energy principle in book form.
  • Pathak et al. Curiosity-driven Exploration. 2017. ICM. ICML.
  • Klyubin, Polani, Nehaniv. Empowerment. 2005. Control-over-future as drive.
  • Bahmani et al. The Missing Reward. 2025. arXiv:2508.05619.
04 / Habitat

Digital embodiment, persistent agents

  • Savva et al. Habitat platform. 2019. ICCV. Photo-real simulator.
  • Wang, Xie, Jiang et al. Voyager. 2023. LLM agent in Minecraft. arXiv:2305.16291.
  • DeepMind. SIMA / SIMA 2. 2024–25. Cross-game agents.
  • Altera.AL. Project Sid. 2024. 1000+ agents develop civilizations. arXiv:2411.00114.
  • World Labs. Marble. Nov 2025. First commercial spatial world model.
05 / Prior synthesis attempts

Who has tried to put these pieces together

  • LeCun (2022). H-JEPA architecture explicitly names a cost/intrinsic-motivation module. Architectural ancestor of this deck.
  • Hassabis (Lex Fridman #475). Jul 2025. DeepMind's de-facto synthesis stack: Genie + SIMA + Gemini.
  • Fei-Fei Li / World Labs. From Words to Worlds + Marble. Cortex-plus-cerebellum from a different lineage.
  • Project Sid. The closest published "all the pieces together" experiment in the wild. Weak on limbic, strong on habitat-as-substrate.
Closing

Stop building cortexes.
Start growing agents.

Goalless, the LLM samples. Agendaless, the world model generates. Bodiless, the limbic signal is just a word. Put them together in a place that pushes back and you get the thing everyone's been circling.

The LLM was the breakthrough. The architecture is the work.

That work is what the rest of this manual documents — a studio of specialist agents built on exactly these four layers. The theory ends here. The studio begins in Vol. II.

A library learned to read aloud. Now it needs a body, a world, and something it would rather not lose.

Colophon

Field Manual / Vol. I · FM-01
Edition MMXXVI · v2
Typeset in Inter. Printed on paper that doesn't exist.