00 · cover
a question of architecture

If the LLM isn't the brain, what is?

Three positions on world models, language models, and what we actually mean when we say "intelligence." A deck for thinking, not concluding.

[Figure: "brain?" with candidate labels "LLM", "WORLD MODEL", and "?"]
joshua long · thinking out loud · 2026
01 · the question

We've been calling the LLM "the brain."

But a brain doesn't just describe the world. It predicts it. It runs forward simulations in milliseconds: what happens if I drop this cup, what will she say next, where will the ball land.

Language models don't do that. They predict the next token in a stream of text. That's a kind of prediction, but it's not the kind that lets you catch a ball.

"The reason humans can plan novel actions is that we carry internal models of how the world works." Yann LeCun, paraphrased from his 2022 position paper
[Figure: LLM vs. world model]
  LLM · predicts language · next token ("the cup will ?") · linguistic continuation · dimension: 1D · substrate: text · time: sequential
  World model · predicts states · next state · physical simulation · dimension: 3D+T · substrate: pixels/physics · time: continuous

An LLM predicts the next word. A world model predicts the next state. Catching a ball is the latter problem.
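
To make the contrast concrete, here is a minimal sketch, not how any production system works. The bigram counter is a toy stand-in for a language model, and the projectile stepper is a toy stand-in for a world model. All names (`train_bigram`, `next_token`, `next_state`, `landing_x`) and the tiny corpus are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count which word follows which: a toy stand-in for a language model."""
    follows = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def next_token(follows, word):
    """Return the most frequent continuation seen after `word`, or None."""
    if word not in follows:
        return None
    return follows[word].most_common(1)[0][0]

def next_state(state, dt=0.01, g=9.81):
    """Advance a ball (x, y, vx, vy) by one time step under gravity."""
    x, y, vx, vy = state
    vy = vy - g * dt                      # gravity changes vertical velocity
    return (x + vx * dt, y + vy * dt, vx, vy)

def landing_x(state, dt=0.001):
    """Roll the state forward until the ball is back on the ground."""
    while state[1] > 0 or state[3] > 0:   # still in the air, or still rising
        state = next_state(state, dt)
    return state[0]

# Illustrative inputs only.
corpus = "the cup will fall the cup will break the cup will fall"
follows = train_bigram(corpus)
print(next_token(follows, "will"))              # most common word after "will"
print(landing_x((0.0, 0.0, 3.0, 4.905)))        # where a thrown ball comes down
```

The first call answers "what word usually comes next?" The second answers "where will the ball be?" With these launch values the closed-form answer is 3.0 m, and the simulation lands close to it. Nothing in the bigram table could produce that number.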

02 · three positions

Three ways to reassemble the brain.

Each position is a serious bet held by serious people. Each implies a totally different architecture, a different research program, a different kind of company.

position A · world model replaces LLM

The physics-first brain.

Imagine a child. Before she can speak, she already knows: objects exist when hidden, dropped things fall, faces have feelings. Language is built on top of that, not the other way around.

In this view, LLMs are a magic trick. They imitate intelligence by pattern-matching the linguistic exhaust of intelligent beings, but they have no model of what the words refer to. Scale won't fix that. The fix is architectural. You have to start over with systems that learn from video, action, and consequence.

  • 01 · The infant argument. A two-year-old has a richer world model than GPT-5, on a fraction of the data.
  • 02 · The grounding problem. Words point at things. Without a model of the things, the words are unmoored.
  • 03 · The plateau. LLM gains are decelerating. Reasoning benchmarks improve via scaffolding, not raw capability.
analogy

An LLM is a librarian who has read everything but never been outside. A world model is a toddler who has been outside but can barely speak. The bet is that the toddler grows up faster.

[Figure: Position A · stack, top to bottom: language (thin) / abstract reasoning / world model (physics · objects · agents; space · time · causation) / sensory experience · embodiment. Label: cognition flows up from the world, not from the words.]

Language sits on top of a deep physical substrate. Build the substrate first; language emerges as a thin compression layer.
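
One way to picture "learning from action and consequence" is a system that watches something fall and recovers the rule behind it. The sketch below is a toy under loud assumptions: one dimension, noise-free observations, and constant acceleration. The function names (`simulate_drop`, `fit_gravity`, `predict_next`) are hypothetical and stand in for what a real world model would learn from video.

```python
def simulate_drop(height, steps, dt=0.05, g=9.81):
    """Generate the 'observations': heights of a dropped object over time."""
    ys, y, v = [height], height, 0.0
    for _ in range(steps):
        v -= g * dt
        y += v * dt
        ys.append(y)
    return ys

def fit_gravity(ys, dt=0.05):
    """Estimate downward acceleration from second differences of position."""
    accs = [(ys[i + 1] - 2 * ys[i] + ys[i - 1]) / dt ** 2
            for i in range(1, len(ys) - 1)]
    return -sum(accs) / len(accs)

def predict_next(y_prev, y_curr, g_hat, dt=0.05):
    """Predict the next height from the last two, using the learned acceleration."""
    return 2 * y_curr - y_prev - g_hat * dt ** 2

observed = simulate_drop(height=10.0, steps=20)
g_hat = fit_gravity(observed[:-1])               # learn from all but the last frame
guess = predict_next(observed[-3], observed[-2], g_hat)
print(g_hat, guess, observed[-1])                # learned rule, prediction, held-out truth
```

No words are involved at any point. The model is learned from what happened, then used to say what happens next.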

position B · LLM is a module

The federated brain.

The human brain isn't one thing. It's the visual cortex, the motor cortex, the language areas, the hippocampus: specialists that gossip. So why would AI be one architecture?

In this view, the world model handles physics and spatial reasoning. The LLM handles symbolic manipulation, social reasoning, planning in language. Memory systems handle persistence. A controller routes between them. Intelligence is the orchestra, not any one instrument.

  • 01 · Neuroscience parallels. Brains are modular. Lesion studies show language can be damaged without losing spatial reasoning, and vice versa.
  • 02 · It's already happening. Agentic systems today are LLMs calling tools: vision models, code interpreters, search. The pattern works.
  • 03 · The orchestra wins. Most engineering progress comes from composing strong specialists, not building one giant generalist.
analogy

A film studio. The world model is the cinematographer (knows space, light, motion). The LLM is the screenwriter (knows story, dialogue, intent). Neither makes the movie alone.

[Figure: Position B · orchestra. A controller (router · planner) connected to four modules: world model (physics), language model (symbols), memory (episodic), perception (sensory). Label: intelligence is the routing, not the parts.]

Multiple specialists, a router, gossip between them. The brain as system, not single substrate.
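
The controller-plus-specialists shape is easy to sketch. The version below is purely illustrative. The modules are stub functions, the keyword rules stand in for whatever a real (probably learned) router would do, and every name (`MODULES`, `ROUTES`, `route`, `controller`) is invented for this example.

```python
from typing import Callable, Dict

# Hypothetical specialists: each is just a function from a query to an answer.
def world_model(query: str) -> str:
    return f"[world model] simulate: {query}"

def language_model(query: str) -> str:
    return f"[language model] reason in words: {query}"

def memory(query: str) -> str:
    return f"[memory] recall: {query}"

MODULES: Dict[str, Callable[[str], str]] = {
    "world_model": world_model,
    "language_model": language_model,
    "memory": memory,
}

# Keyword rules as a stand-in for a real routing policy (an assumption).
ROUTES = [
    (("fall", "land", "collide", "throw"), "world_model"),
    (("remember", "last time", "yesterday"), "memory"),
]

def route(query: str) -> str:
    """Pick a specialist by name; default to the language model."""
    lowered = query.lower()
    for keywords, name in ROUTES:
        if any(k in lowered for k in keywords):
            return name
    return "language_model"

def controller(query: str) -> str:
    """Send the query to whichever specialist the router picked."""
    return MODULES[route(query)](query)

print(controller("Where will the ball land?"))
print(controller("What did she say yesterday?"))
print(controller("Draft a reply to this email."))
```

In this picture the interesting design question is `route`, not any single module. That is Position B's claim in miniature.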

position C · both are partial

The missing primitive.

What if both camps are arguing about the wrong thing? What if intelligence isn't prediction at all? What if it's action under uncertainty, a system constantly minimizing its own surprise?

This is the Karl Friston view, the active-inference view, the embodied-cognition view. The brain isn't a model that gets called by a controller. The brain is the controller, and the model exists only to keep the organism alive. Without a body, without stakes, without a goal that matters, the model is just a movie.

  • 01 · No model without a goal. A weather model predicting nothing for nobody is just math. The model needs to do something.
  • 02 · Embodiment is load-bearing. Hands, fatigue, mortality, attention. Intelligence in animals evolved to survive, and that pressure shapes the architecture.
  • 03 · The frame problem. Knowing what to predict is harder than predicting. That requires desire, salience, point of view.
analogy

Asking whether the brain is an LLM or a world model is like asking whether a fire is the fuel or the oxygen. Neither. Fire is the reaction. Intelligence is the loop, not the parts.

[Figure: Position C · the loop. Agent (predicts · desires) and world (resists · surprises), linked by action (change world) and perception (update self); center: minimize surprise. Label: the brain is what closes the loop.]

Active inference: cognition isn't prediction stored somewhere. It's the ongoing loop of acting to make the world match the model, and updating the model when it doesn't.
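
The loop can be caricatured in a few lines. This is a toy in the spirit of active inference, not Friston's formalism. There is no variational free energy here, only squared error standing in for surprise. The thermostat-like setup, the gains, and the names (`run_loop`, `belief`, `preferred`) are all assumptions made for illustration.

```python
def run_loop(world=30.0, preferred=20.0, steps=200,
             perception_gain=0.5, action_gain=0.2):
    """Close the loop: perceive to fix the model, act to fix the world."""
    belief = preferred            # the agent starts out expecting what it wants
    surprise = []
    for _ in range(steps):
        # Surprise stand-in: how far the world is from what the agent prefers.
        surprise.append((world - preferred) ** 2)
        # Perception: update the model where it disagrees with the world.
        belief += perception_gain * (world - belief)
        # Action: change the world where it disagrees with the preference.
        world += action_gain * (preferred - belief)
    return world, belief, surprise

world, belief, surprise = run_loop()
print(world, belief)              # both settle near the preferred state
print(surprise[0], surprise[-1])  # surprise at the start vs. at the end
```

Remove `preferred` and the loop has nothing to act toward; the belief simply tracks whatever the world does. That is the sense in which, on this view, a model without a goal "is just a movie."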

03 · so what

The useful takeaway.

You don't have to pick. But the three positions imply totally different things about what to build, who to bet on, and what 2027 looks like.

if A is right

LLMs are a tarpit

Expect a hard ceiling on chat-style AI within 18 months. The action moves to robotics, simulation, and spatially-grounded media. Bet: LeCun, World Labs, NVIDIA Cosmos.

if B is right

Composition wins

The platform isn't a model, it's an orchestration layer. Anthropic, Google, and the agent-frameworks crowd are well-positioned. World models become a tool the LLM calls.

if C is right

We're early

Neither path produces AGI; both produce extremely useful systems. The brain-equivalent comes from somewhere weirder: embodied agents, biological hybrids, or an idea no one's funded yet.

The most honest position is probably: language models are the cortex, world models are the cerebellum, and we have not yet built the thing that wants anything.
fin · open questions
questions worth holding

What does an AI want?

If position C has anything to it, the missing piece isn't intelligence. It's stakes. An LLM with no goals doesn't reason; it samples. A world model with no agenda doesn't simulate anything in particular; it generates.

The interesting frontier might not be bigger models. It might be: what's the smallest thing that wants something?

continue → v2 · The limbic gap
A three-layer architecture: cortex / cerebellum / limbic. The synthesis answer to the question this deck left open.

things to read next

  • LeCun · A Path Towards Autonomous Machine Intelligence (2022)
  • Fei-Fei Li · From Words to Worlds (Substack, 2025)
  • Ha & Schmidhuber · World Models (2018)
  • Friston · The free-energy principle
  • DeepMind · Genie 3 technical report (2025)