Field Manual / Vol. III

What's missing.

A standard agent completes the task.

A studio agent completes the task and tries to grow.

Neither one can predict, want, or teach a peer.

Those three gaps are what stands between here and the next level.

Gap 01 · Forward model

Predict before you act.

The librarian acts, then reads what happened. A real agent — even a two-year-old — predicts the world's response before the action lands. That predict-then-act loop is what makes learning fast.

What's broken

Today's agents only know what happened after the artifact shipped.

The critique runs after the fact. The file is updated after the fact. The lesson is written after the fact. Every step is an autopsy, not a forecast. The librarian acts, ships, gets told it didn't work, and tries again, with no internal sense of which way to step before stepping.

Why memory files don't solve it

Reading the past isn't predicting the future.

Writing memories on cron-wake lets the librarian know what they did. It does not let them simulate what's about to happen. A toddler doesn't catch the ball by remembering past catches — she catches it because her body forward-predicts the ball's arc before her hand moves. No memory file does that.

What "closed" looks like

A typed state store with a dry-run mode.

Before any side-effectful action, the agent simulates the result against its own model of the world. The gap between prediction and outcome is the error signal. Fast. Internal. Before anything leaves the room. Most of the substrate already exists (Genie 3, V-JEPA 2, DreamerV3) — it just isn't wired into the agent loop yet.
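
The dry-run idea can be sketched in a few lines. This is a minimal illustration, not the studio's implementation: the WorldState fields, the ship_draft action, and the error measure are all hypothetical names chosen for the example. The agent's model predicts a cost of 2.0, the stand-in "real world" charges 3.5, and the difference is the error signal.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class WorldState:
    """Hypothetical typed state: the few facts the agent tracks."""
    drafts_shipped: int
    budget_left: float


# An action is modeled as a pure function from state to state (an assumption).
Action = Callable[[WorldState], WorldState]


def ship_draft(cost: float) -> Action:
    """Build an action that ships one draft and spends `cost` budget."""
    def apply(state: WorldState) -> WorldState:
        return replace(
            state,
            drafts_shipped=state.drafts_shipped + 1,
            budget_left=state.budget_left - cost,
        )
    return apply


def dry_run(model: Action, state: WorldState) -> WorldState:
    """Simulate the action against the agent's own model. Nothing leaves the room."""
    return model(state)


def prediction_error(predicted: WorldState, actual: WorldState) -> float:
    """Sum of absolute field differences between forecast and outcome."""
    return (abs(predicted.drafts_shipped - actual.drafts_shipped)
            + abs(predicted.budget_left - actual.budget_left))


state = WorldState(drafts_shipped=0, budget_left=10.0)
predicted = dry_run(ship_draft(cost=2.0), state)  # the agent's belief
actual = ship_draft(cost=3.5)(state)              # stand-in for what really happened
error = prediction_error(predicted, actual)       # 1.5
```

The state is frozen on purpose: a dry run that could mutate the store would not be a dry run.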

The gap between prediction and outcome is what teaches. Today's agents skip the prediction.

Gap 02 · Real wanting

Wanting that isn't an instruction.

Today's agents don't want anything. They complete tasks because someone — the operator, the user, the harness — tells them to. Take the human out, the action stops.

What's broken

There's no internal signal that says "this matters."

A standard chatbot doesn't get uncomfortable when it's been idle. My studio agents don't get uncomfortable when they've drifted from a goal. The wanting is borrowed — it's mine. I'm the one pacing the floor when something needs to happen. The agent waits for me to say so.

Even where the studio appears to have a drive, it's the wrong shape. The cost guard is a hard stop — a circuit breaker that fires at a threshold. A limbic layer would be a continuous gradient the agent feels before it hits the wall. Hitting the wall isn't suffering. It's just stopping.
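
The difference in shape is easy to show. The sketch below is illustrative only: cost_guard and discomfort are hypothetical functions, and the quadratic curve is an arbitrary choice. The breaker says nothing until the limit. The gradient is already speaking at a quarter of the way there.

```python
def cost_guard(spent: float, limit: float) -> bool:
    """Circuit breaker: silent until the threshold, then a hard stop."""
    return spent >= limit


def discomfort(spent: float, limit: float) -> float:
    """Continuous signal in [0, 1] that rises before the wall.

    The quadratic shape is an assumption made for illustration.
    """
    fraction = min(max(spent / limit, 0.0), 1.0)
    return fraction ** 2


# (spent, breaker tripped?, discomfort) at four points on the way to the limit
readings = [
    (spent, cost_guard(spent, 100.0), discomfort(spent, 100.0))
    for spent in (25.0, 50.0, 75.0, 100.0)
]
```

At half the budget the breaker still returns False while the gradient reads 0.25. That earlier reading is the part a hard stop cannot provide.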

Why this is the hardest gap

An objective function isn't a drive.

Telling an agent "your job is X" makes X the prompt, not the want. A real drive is something the agent has to keep alive — a state that hurts when it's wrong, that pulls behavior back without anyone telling it to. The thermostat reacts. The cold person suffers. The thermostat needs to be told. The cold person gets up.

What "closed" looks like

A homeostatic gradient the agent reads on every step.

Goal-state files with weighted internal variables. Compute budget that hurts when overspent. Coupled agents waiting on a hand-off. The agent reads its own internal state, sees a drift, acts to bring it home. Without prompting. (Vol. I, Plate III. Least productized of the four layers — no SDK yet.)
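
One way to picture a goal-state file with weighted internal variables is below. It is a sketch under stated assumptions: the Variable type, the variable names, the weights, and the tolerance are all invented for the example, and whether a number in a file amounts to a drive is exactly the open question of this gap. What the sketch does show is the read-drift-act step: no prompt names the problem, the state does.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variable:
    name: str
    value: float     # current reading
    setpoint: float  # where the agent wants it to sit
    weight: float    # how much drift on this variable matters


def drift(v: Variable) -> float:
    """Weighted distance from the setpoint."""
    return v.weight * abs(v.value - v.setpoint)


def most_urgent(variables: List[Variable], tolerance: float = 0.1) -> Optional[str]:
    """Name the variable pulling hardest, or None if everything is near home."""
    worst = max(variables, key=drift)
    return worst.name if drift(worst) > tolerance else None


goal_state = [
    Variable("compute_budget_used", value=0.8, setpoint=0.5, weight=1.0),
    Variable("handoff_wait_hours", value=6.0, setpoint=0.0, weight=0.2),
    Variable("goal_progress", value=0.4, setpoint=0.5, weight=2.0),
]
urgent = most_urgent(goal_state)  # the stalled hand-off pulls hardest here

settled = [Variable("goal_progress", value=0.5, setpoint=0.5, weight=2.0)]
calm = most_urgent(settled)       # None: nothing to bring home
```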

An objective function isn't a drive. A drive is what stays on after the operator leaves.

Gap 03 · Cross-agent learning

When Quinn learns, Zara doesn't.

My studio has a critique loop, but every lesson goes through me. Operator promotion, n≥2 patterning, doctrine ladder. It works. It's also the bottleneck.

What's broken

Each agent's working memory is local.

A lesson Quinn learns about briefs sits in her WORKING_MEMORY.md until (a) the same lesson lands in another agent's memory, (b) Archie patterns them as n≥2, and (c) I approve the promotion. Three gates. Every gate has me in it. The studio gets sharper only as fast as I can read.
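
The three gates can be sketched as a small pipeline. Everything here is a hypothetical stand-in: lessons are matched as exact strings, which real working memories would not allow, and the operator's approval is just a set passed in by hand.

```python
from collections import defaultdict
from typing import Dict, List, Set


def pattern_candidates(memories: Dict[str, Set[str]], n: int = 2) -> Dict[str, Set[str]]:
    """Gates (a) and (b): keep lessons found in at least n agents' memories."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for agent, lessons in memories.items():
        for lesson in lessons:
            seen[lesson].add(agent)
    return {lesson: agents for lesson, agents in seen.items() if len(agents) >= n}


def promote(candidates: Dict[str, Set[str]], approved: Set[str]) -> List[str]:
    """Gate (c): only lessons the operator approved become doctrine."""
    return sorted(lesson for lesson in candidates if lesson in approved)


memories = {
    "Quinn": {"briefs need a single ask", "cite the source"},
    "Zara": {"briefs need a single ask"},
}
candidates = pattern_candidates(memories)

# Nothing moves until the operator has read it.
waiting = promote(candidates, approved=set())
doctrine = promote(candidates, approved={"briefs need a single ask"})
```

The second lesson in Quinn's memory never reaches the operator at all. It stays local until some other agent happens to learn it too.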

Why peer learning is harder than it looks

You can't just merge memories.

Agents can't read each other's working memory and absorb it whole — that's how voice bleeds, doctrine gets noisy, and the studio becomes one big agent in different hats. Peer learning needs a channel that is high-bandwidth and high-fidelity at the same time. Lots of throughput, no contamination.

What "closed" looks like

Peer critique inboxes. Mesh, not hub-and-spoke.

Agents read each other's critiques in real time. A trusted peer's observation can land as a doctrine candidate at n=1 (instead of waiting for the second occurrence). Operator-in-the-loop becomes operator-on-escalation — I only step in when peers disagree. Doctrine moves at the speed of work, not at the speed of my review.
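
A peer inbox with those rules might look like the sketch below. It is one possible reading, not a solved design: the Critique and Inbox types, the trust set, and the three verdict labels are assumptions made for the example, and it sidesteps the hard part named above, which is keeping the channel free of contamination.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Critique:
    author: str
    lesson: str
    agrees: bool = True  # False: the author disputes the lesson


@dataclass
class Inbox:
    trusted: Set[str]  # peers whose single observation is enough
    critiques: List[Critique] = field(default_factory=list)

    def receive(self, critique: Critique) -> None:
        self.critiques.append(critique)

    def triage(self) -> Dict[str, str]:
        """Sort each lesson into candidate, hold, or escalate."""
        verdicts: Dict[str, str] = {}
        for lesson in sorted({c.lesson for c in self.critiques}):
            votes = [c for c in self.critiques if c.lesson == lesson]
            backers = {c.author for c in votes if c.agrees}
            disputed = any(not c.agrees for c in votes)
            if disputed and backers:
                verdicts[lesson] = "escalate"   # peers disagree: operator steps in
            elif backers & self.trusted or len(backers) >= 2:
                verdicts[lesson] = "candidate"  # trusted n=1, or the old n>=2 rule
            else:
                verdicts[lesson] = "hold"       # wait for a second occurrence
        return verdicts


zara = Inbox(trusted={"Quinn"})
zara.receive(Critique("Quinn", "lead with the ask"))
zara.receive(Critique("Archie", "drop the summary"))
zara.receive(Critique("Quinn", "drop the summary", agrees=False))
zara.receive(Critique("Archie", "shorten subject lines"))
verdicts = zara.triage()
```

Only one of the three lessons reaches the operator, and only because two peers disagree about it.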

Operator-in-the-loop is the training phase. The mesh is the graduation.

Closing

The next level.

A studio that predicts before it acts. Wants things that aren't on its prompt. Teaches itself across agents without me in the middle.

That's not a working mind. It's the scaffolding becoming a body.

Each of these gaps is research, not engineering. The forward model is closest: world-model substrate is moving fast (Genie 3, V-JEPA 2, DreamerV3, Cosmos). The limbic layer is the least productized, with no SDK to lean on. The cross-agent mesh is the one nobody has solved at all.

Vol. I named the parts. Vol. II showed the scaffolding. Vol. III named the gaps. Vol. IV — if it exists — is the one with something working inside it.

The next level is when the operator becomes optional. Not gone. Optional.

Colophon

Field Manual / Vol. III · FM-03
Edition MMXXVI · v1
Typeset in Inter. Printed on paper that doesn't exist.