Field Manual / Vol. VII

Predict,
then react.

The studio's agents began guessing outcomes before they happen — copy guessing its own verdict, the art director guessing the operator's reaction. Each guess is scored. That loop is the studio's north star.

Building it meant running the studio for real. The run refused — and so had every run for the previous ninety minutes. The studio had been silently down and looked completely normal.

The gate that should have caught it was green the whole time. It had only ever been looking at a third of the code.

Predict before you react. And never trust a gate that only watches part of the system.

The seven plates

I · Forward model, live · Predicting before reacting
II · Who predicts now · Three guessers, two deferred
III · The silent hour · An outage that looked like normal behavior
IV · The blind gate · A safety check that saw a third of the code
V · Band-aid vs. real fix · Two fixes, one root cause
VI · What it taught us · Four lessons
VII · Ledger · What shipped, what's deferred

Plate I / Forward model, live

The lab walks out the door.

A world model is the ability to predict the consequences of your own actions — Yann LeCun's definition. Predict-the-gate is the studio's version: an agent guesses an outcome before it happens, then gets scored on whether it was right. Guess, check, sharpen. Over time the guesses get better — and a guesser that gets the world right is an agent that has learned its world.

This is the studio's north star: agents that grow, and grow visibly. Most learning hides in files. A prediction that gets scored is learning you can watch on a scoreboard.

Lineage: Vol. VI, Plate IV, Loop 05 named the forward model and marked it "E-002 — designed for the live loop, not yet deployed." This dispatch is E-002 leaving the lab.

The first guesser: Felix has been forecasting Zara's SHIP / REVISE verdict on we-play pieces before she rules, record-only by default. The self-gate (acting on the guess) is earned at 75% accuracy over 20 calls, and is not yet switched on.

The question this session: who else should run a world model?

Plate II / Who predicts now

Three guessers.

A prediction has to earn the right to change behavior. Until it does, every tier is record-only — the agent guesses, the studio scores, nobody acts on the guess yet.

Felix → Zara · LIVE

Will the art director ship it?

The original guesser. Forecasts Zara's SHIP / REVISE verdict on a piece before she rules. Record-only; the self-gate is earned at 75% accuracy over 20 calls and is not yet active.

Declan → Zara · NEW

Will the writing clear the visual gate?

The copy director now guesses Zara's verdict from the writing's point of view. A second vantage on the same call — the word-maker predicting how the eye-maker will rule.

Zara → operator · NEW

How will the operator react?

The art director guesses the operator's reaction after she ships, scored against the rating he later gives. Her own rule: decide first, then guess. Guessing first would bend her judgment toward people-pleasing.

Deter · DEFERRED

Will this pass the rules?

A QA forward model. Designed this session, not yet built.

Quinn · DEFERRED

Orchestration-level guess.

Backtested offline in E-001 (Vol. IV). A live version is still ahead.

The scoreboard

Record → earn → gate.

Every guess is logged and scored. A tier only graduates from record-only to acting on its own prediction once it clears the accuracy floor. No agent self-gates today.
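The record → earn → gate rule can be sketched in a few lines. This is a minimal illustration, not the studio's code: the `Tier` class and its method names are hypothetical, and it assumes "75% over 20 calls" means the 20 most recent scored calls.

```python
from collections import deque
from dataclasses import dataclass, field

ACCURACY_FLOOR = 0.75  # from the text: 75% accuracy
WINDOW = 20            # from the text: 20 calls (assumed: the most recent 20)


@dataclass
class Tier:
    """One guesser predicting one gate (hypothetical structure)."""
    name: str
    # Rolling record of hits (True) and misses (False); older entries drop off.
    scores: deque = field(default_factory=lambda: deque(maxlen=WINDOW))

    def record(self, guess: str, outcome: str) -> bool:
        """Log one scored guess and return whether it was correct."""
        hit = guess == outcome
        self.scores.append(hit)
        return hit

    def accuracy(self) -> float:
        """Share of correct guesses in the current window."""
        return sum(self.scores) / len(self.scores) if self.scores else 0.0

    def may_self_gate(self) -> bool:
        """Record-only until a full window clears the accuracy floor."""
        return len(self.scores) == WINDOW and self.accuracy() >= ACCURACY_FLOOR


# Synthetic calls for illustration only; these are not studio results.
felix = Tier("Felix -> Zara")
for _ in range(14):
    felix.record("SHIP", "SHIP")      # correct guesses
for _ in range(5):
    felix.record("SHIP", "REVISE")    # misses
print(felix.may_self_gate())          # 19 calls logged: still record-only
felix.record("REVISE", "REVISE")      # 20th call, a hit: 15 of 20 correct
print(felix.accuracy(), felix.may_self_gate())
```

Requiring a full window before the floor applies keeps a tier from graduating on a lucky first few calls.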

Each tier is a different agent predicting a different gate. Same shape: decide, guess, get scored, sharpen.

Plate III / The silent hour

It had been down for ninety minutes.

To watch the new guessers actually build something, we ran a real piece on the production VM — unpublished, just to see the chain fire. It refused. So had every run for the previous ~1.5 hours.

The cause: a recent change had deleted a block of code but left one line still calling it. That line lived in the part of the studio that only runs live — so it slipped past every check and only broke when the studio actually ran.

Why nobody noticed: a refusal looks exactly like the studio choosing not to act. There was no crash, no red light — just a studio that declined, over and over, the way a careful studio sometimes does. It looked completely normal.

Silent failure is the dangerous kind. The studio failing looked identical to the studio deciding not to ship.
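One way to separate a studio that is down from a studio that is being careful is to alarm on the absence of any acted run, not only on errors. The sketch below is an assumption-laden illustration: `RunResult`, `absence_alarm`, the 30-minute window, and the minimum run count are all hypothetical, not the studio's monitoring.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class RunResult:
    at: datetime   # when the run finished
    acted: bool    # True if the studio acted; False if it declined or refused


def absence_alarm(
    runs: List[RunResult],
    now: datetime,
    max_quiet: timedelta = timedelta(minutes=30),  # hypothetical threshold
    min_runs: int = 3,                             # hypothetical threshold
) -> Optional[str]:
    """Return an alarm message if enough recent runs exist and none acted."""
    recent = [r for r in runs if now - r.at <= max_quiet]
    if len(recent) >= min_runs and not any(r.acted for r in recent):
        return f"{len(recent)} runs in the last {max_quiet} and none acted"
    return None


# Synthetic timestamps for illustration only.
now = datetime(2000, 1, 1, 12, 0)
refusals = [RunResult(now - timedelta(minutes=m), acted=False) for m in (5, 15, 25)]
print(absence_alarm(refusals, now))
```

A single refusal stays quiet, since a careful studio sometimes declines. A run of refusals with nothing acted in between is the pattern worth a red light.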

Plate IV / The blind gate

Green, because blind.

How does a dangling call to deleted code clear every gate? Because the gate wasn't looking.

The automated type-check — the safety net that exists to catch exactly this — only ever inspected about a third of the codebase. It checked 178 files and zero of the runtime files the studio actually executes. The broken line lived in the unchecked two-thirds, so it passed every gate at review time and only failed once it ran live.

type-check coverage (before) · 178 files inspected · 0 runtime files · ~33% of the tree

The gate had been green for as long as it had been blind. A passing check that doesn't look at the code that runs isn't safety — it's the appearance of it.
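A gate can be made to report its own blind spots by comparing what it inspects against what actually runs. The following is a rough sketch under stated assumptions: the function names and file paths are hypothetical, and it presumes both file lists can be exported from the checker and the runtime.

```python
from typing import Iterable, List


def uncovered(runtime_files: Iterable[str], checked_files: Iterable[str]) -> List[str]:
    """Return the runtime files the type-check never inspected, sorted."""
    checked = set(checked_files)
    return sorted(f for f in set(runtime_files) if f not in checked)


def coverage(all_files: Iterable[str], checked_files: Iterable[str]) -> float:
    """Fraction of the tree that the type-check inspects."""
    tree = set(all_files)
    if not tree:
        return 1.0
    return len(tree & set(checked_files)) / len(tree)


# Hypothetical paths for illustration only.
tree = ["lib/util", "runtime/loop", "runtime/gate"]
checked = ["lib/util"]
runtime = ["runtime/loop", "runtime/gate"]

blind_spots = uncovered(runtime, checked)
print(f"coverage: {coverage(tree, checked):.0%}")
print("unchecked runtime files:", blind_spots)
# A stricter gate would fail the build whenever blind_spots is non-empty.
```

Failing on any unchecked runtime file, instead of on a coverage percentage, targets the exact gap described above: a check can inspect a third of the tree and still see none of the code that executes.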

Plate V / Band-aid vs. real fix

Stop the bleeding ≠ fix it.

There were two fixes. The first stopped the bleeding but contradicted the system's design. The second cost more and was the right one.

01 · Band-aid

Restore the deleted code.

The studio came back up in minutes. But the hotfix had resurrected exactly the code the earlier change had intentionally deleted — undoing a deliberate decision instead of honoring it. Right result, wrong reason.

02 · Real fix

Honor the design, then close the root cause.

Rewrite the call the way the architecture intended. Then widen the type-check to cover the entire runtime, and clear the 46 latent issues that surfaced once it could finally see them. That gate would have caught the original bug at review time — and now catches the whole class of bug going forward.

The fast patch undid a deliberate deletion. The real fix required understanding why the code was deleted in the first place — and fixing the gate that let the break ship.

Plate VI / What it taught us

Four lessons.

A feature shipped and an outage was resolved on the same day. The instructive part is what each one taught.

01 · Coverage

A blind green gate is worse than a red one.

A check that watches only part of the system reads as safety while guaranteeing none. This gate was green for precisely as long as it was blind.

02 · Failure shape

Silent failure is the dangerous kind.

The studio failing looked identical to the studio choosing not to act. Build alarms for absence, not just for errors.

03 · Repair

Band-aid is not fix.

The fast patch undid a deliberate deletion. The real fix honored the design and removed the cause. Speed of recovery and correctness of recovery are different goals.

04 · Method

Ask the agents.

We asked each agent what scoreboard fit it rather than imposing one. Two of four pushed back and redefined the proposal — a small proof the agents hold real, differentiated points of view.

Plate VII / Ledger

Where it stands.

✓ Shipped
Root-cause hardening: type-check widened from ~33% → 100% of the tree; 46 latent issues cleared; 0 remaining. Merged and deployed. Studio healthy on the corrected code.

→ Ready
Declan + Zara forward-model tiers: integrated, 21 tests passing, type-check clean. PR #236, awaiting the operator's go.

✓ Resolved
Outage: ~1.5h of silent refusals → diagnosed, hotfixed, then properly fixed.

· Deferred
Deter and Quinn forward-model tiers. Designed; not yet built.

5 PRs
Five pull requests across the session.

Guess. Check.
Sharpen.

The forward model left the lab today. Three agents now guess an outcome before it happens and get scored on the guess. None of them acts on it yet — a prediction has to earn the right to gate. But the loop is running, and it's the loop the whole studio is being built around: agents that grow, visibly.

It shipped on the same day an outage showed how a system goes dark without anyone noticing — a deleted block, a line still calling it, and a safety gate that was green precisely because it was blind. We widened the gate until it could see the whole runtime, and cleared everything it found.

Two threads, one lesson. Prediction and verification are the same discipline pointed in opposite directions: one guesses what will happen, the other refuses to trust what it can't see. The studio needs both.

Predict before you react. Verify the whole system, not the part you can see.

Colophon

Field Manual / Vol. VII · FM-07 · field dispatch
One day in the studio. Sourced from runtime evidence, 2026-06-01.
Typeset in Inter. Printed on paper that doesn't exist.