Three positions on world models, language models, and what we actually mean when we say "intelligence." A deck for thinking, not concluding.
But a brain doesn't just describe the world. It predicts it. It runs forward simulations in milliseconds: what happens if I drop this cup, what will she say next, where will the ball land.
Language models don't do that. They predict the next token in a stream of text. That's a kind of prediction, but it's not the kind that lets you catch a ball.
An LLM predicts the next word. A world model predicts the next state. Catching a ball is the latter problem.
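To make the contrast concrete, here is a toy sketch, not a claim about how any real system is built. The bigram counter stands in for next-token prediction and the gravity step stands in for next-state prediction. The function names, the tiny corpus, and the time step are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count which word follows which: a toy stand-in for next-token prediction."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next_word(counts, word):
    """Return the most frequent follower of `word`, or None if the word is unseen."""
    followers = counts.get(word)
    return followers.most_common(1)[0][0] if followers else None

def step_ball(state, dt=0.01, g=9.81):
    """Advance (height, vertical velocity) by one time step under gravity."""
    height, velocity = state
    velocity = velocity - g * dt
    height = height + velocity * dt
    return (height, velocity)

def time_to_ground(height, dt=0.01):
    """Roll the state forward until the ball reaches the ground."""
    state, t = (height, 0.0), 0.0
    while state[0] > 0:
        state = step_ball(state, dt)
        t += dt
    return t

# Hypothetical corpus: the text predictor only ever sees words.
corpus = "the ball falls down the ball falls fast the cup falls down"
counts = train_bigram(corpus)
next_word = predict_next_word(counts, "ball")   # a word that tends to follow "ball"

# The state predictor sees positions and velocities instead.
landing_time = time_to_ground(1.0)              # seconds for a 1 m drop
```

The first predictor can tell you that "falls" tends to follow "ball". Only the second can tell you when to close your hand.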
Each position is a serious bet held by serious people. Each implies a totally different architecture, a different research program, a different kind of company.
position A
Language is a thin film on top of physical cognition. To get general intelligence, you need to model the world directly. LLMs are a dead end.
→ LeCun, Fei-Fei Li
position B
The brain has subsystems. Language is one of them. The world model is the substrate; LLMs handle the symbolic, social, and abstract layer.
→ DeepMind-flavored
position C
Neither models cognition. The real brain is something we haven't built yet: possibly active inference, possibly embodiment, possibly something stranger.
→ Friston, embodied AI
Imagine a child. Before she can speak, she already knows: objects exist when hidden, dropped things fall, faces have feelings. Language is built on top of that, not the other way around.
In this view, LLMs are a magic trick. They imitate intelligence by pattern-matching the linguistic exhaust of intelligent beings, but they have no model of what the words refer to. Scale won't fix that. The fix is architectural. You have to start over with systems that learn from video, action, and consequence.
An LLM is a librarian who has read everything but never been outside. A world model is a toddler who has been outside but can barely speak. The bet is that the toddler grows up faster.
Language sits on top of a deep physical substrate. Build the substrate first; language emerges as a thin compression layer.
The human brain isn't one thing. It's the visual cortex, the motor cortex, the language areas, the hippocampus: specialists that gossip. So why would AI be one architecture?
In this view, the world model handles physics and spatial reasoning. The LLM handles symbolic manipulation, social reasoning, planning in language. Memory systems handle persistence. A controller routes between them. Intelligence is the orchestra, not any one instrument.
A film studio. The world model is the cinematographer (knows space, light, motion). The LLM is the screenwriter (knows story, dialogue, intent). Neither makes the movie alone.
Multiple specialists, a router, gossip between them. The brain as system, not single substrate.
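The orchestra idea can be sketched in a few lines. This is a deliberately crude illustration under stated assumptions: the two specialists are stub functions, the keyword list is invented, and a real controller would be learned rather than a keyword match. Every name here is hypothetical.

```python
def world_model(query):
    """Stub specialist for physics and spatial questions."""
    return f"[world model] simulating: {query}"

def language_model(query):
    """Stub specialist for symbolic and social questions."""
    return f"[language model] reasoning about: {query}"

class Memory:
    """Stub persistence layer: keeps a log of what was asked and answered."""
    def __init__(self):
        self.log = []

    def remember(self, query, answer):
        self.log.append((query, answer))

# Assumed, hand-picked cue words; purely illustrative.
PHYSICAL_WORDS = {"fall", "drop", "land", "collide", "throw", "catch"}

def route(query, memory):
    """Controller: pick a specialist, record the exchange, return the answer."""
    words = set(query.lower().replace("?", "").split())
    specialist = world_model if words & PHYSICAL_WORDS else language_model
    answer = specialist(query)
    memory.remember(query, answer)
    return answer

memory = Memory()
physical_answer = route("Where will the ball land?", memory)
social_answer = route("What will she say next?", memory)
```

The point of the sketch is the shape, not the routing rule: no single function answers both questions, and the memory sits outside both specialists.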
What if both camps are arguing about the wrong thing? What if intelligence isn't prediction at all? What if it's action under uncertainty, a system constantly minimizing its own surprise?
This is the Karl Friston view, the active-inference view, the embodied-cognition view. The brain isn't a model that gets called by a controller. The brain is the controller, and the model exists only to keep the organism alive. Without a body, without stakes, without a goal that matters, the model is just a movie.
Asking whether the brain is an LLM or a world model is like asking whether a fire is the fuel or the oxygen. Neither. Fire is the reaction. Intelligence is the loop, not the parts.
Active inference: cognition isn't prediction stored somewhere. It's the ongoing loop of acting to make the world match the model, and updating the model when it doesn't.
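The loop described above can be caricatured as follows. This is a toy with a single number standing in for both the world and the model. It is not Friston's free-energy formalism, and the rates and starting values are assumptions picked only so the behavior is easy to see.

```python
def active_inference_loop(world, belief, steps=50, act_rate=0.3, learn_rate=0.1):
    """Toy loop: act to pull the world toward the belief, and nudge the
    belief toward what was actually observed."""
    errors = []
    for _ in range(steps):
        observation = world                    # sense the world
        error = observation - belief           # surprise: how wrong the model was
        errors.append(abs(error))
        world = world - act_rate * error       # act: change the world to fit the model
        belief = belief + learn_rate * error   # update: change the model to fit the world
    return world, belief, errors

# Hypothetical setup: the agent expects 20.0, the world starts at 10.0.
final_world, final_belief, errors = active_inference_loop(world=10.0, belief=20.0)
```

With these rates the gap shrinks by a constant factor each step, and the two values meet closer to the belief than to the starting world, because acting is set to be stronger than updating. Swap the rates and the agent mostly changes its mind instead of changing the world.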
You don't have to pick. But the three positions imply totally different things about what to build, who to bet on, and what 2027 looks like.
Expect a hard ceiling on chat-style AI within 18 months. The action moves to robotics, simulation, and spatially-grounded media. Bet: LeCun, World Labs, NVIDIA Cosmos.
The platform isn't a model; it's an orchestration layer. Anthropic, Google, and the agent-framework crowd are well-positioned. World models become a tool the LLM calls.
Neither path produces AGI; both produce extremely useful systems. The brain-equivalent comes from somewhere weirder: embodied agents, biological hybrids, or an idea no one's funded yet.
If position C has anything to it, the missing piece isn't intelligence. It's stakes. An LLM with no goals doesn't reason; it samples. A world model with no agenda doesn't simulate anything in particular; it generates.
The interesting frontier might not be bigger models. It might be: what's the smallest thing that wants something?
continue → v2: The limbic gap
A three-layer architecture: cortex / cerebellum / limbic. The synthesis answer to the question this deck left open.