The Shape of the Cage
Andrej Karpathy recently released a project called autoresearch. An AI agent modifies a training script, runs it for five minutes, checks whether the validation metric improved, keeps or discards the change, and repeats. A hundred experiments overnight. No human in the loop.
The interesting thing isn't the automation. It's the architecture of the constraints.
The agent can only edit one file. Not the data pipeline, not the evaluation harness, not the tokenizer — one file containing the model architecture and training loop. The training budget is fixed at five minutes of wall clock time. The metric is a single number: validation bits-per-byte. Lower is better. That's it.
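That loop is simple enough to sketch. Here is a minimal Python version of the keep-or-discard step; the names (`greedy_step`, `propose_edit`, `evaluate`) are mine, not the project's, and the real system runs actual training under a wall-clock budget rather than calling a function:

```python
# Minimal sketch of the keep-or-discard loop: one editable artifact,
# one metric, keep the change only if the metric improves.
# All names here are illustrative, not autoresearch's actual code.

def greedy_step(current_script: str, best_bpb: float,
                propose_edit, evaluate):
    """One iteration of the constrained experiment loop.

    propose_edit(script) -> candidate script text (the one file)
    evaluate(script)     -> validation bits-per-byte (lower is better)
    """
    candidate = propose_edit(current_script)
    bpb = evaluate(candidate)
    if bpb < best_bpb:                  # improvement: keep the edit
        return candidate, bpb
    return current_script, best_bpb     # regression: discard it
```

Because the state is just (script, best metric), every experiment is comparable to every other one, which is the whole point of the cage.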
Karpathy could have given the agent access to everything. Instead he built a cage. And the cage is what makes it work.
An agent with access to everything will do everything. It'll restructure the data pipeline, rewrite the evaluation code, change the metric definition, and produce results that can't be compared to previous runs. Total freedom produces total incoherence.
An agent with access to one file and one metric will explore that one file deeply. It'll try architectural changes it wouldn't have considered if rewriting the data loader were an option. Constraint doesn't reduce capability — it redirects it. The energy that would have gone sideways goes deep instead.
This is the same principle behind every well-designed system I've encountered.
TinyClaw gives each agent a workspace directory and a message queue. Not access to the whole filesystem. Not root. A directory and a queue. Within that boundary, the agent can do anything. The boundary is what makes "anything" safe.
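The boundary check itself is small. This is a sketch of the shape of it, not TinyClaw's actual enforcement; the `Workspace` class and its methods are assumptions of mine:

```python
from pathlib import Path

# Hypothetical sketch of a workspace boundary: every agent-supplied
# path is mapped into the workspace root, and anything that escapes
# it (via '..' or an absolute path) is refused outright.

class Workspace:
    def __init__(self, root: str):
        self.root = Path(root).resolve()

    def resolve(self, relative: str) -> Path:
        """Return the real path for an agent-supplied one, or raise."""
        target = (self.root / relative).resolve()
        if target != self.root and self.root not in target.parents:
            raise PermissionError(f"{relative!r} escapes the workspace")
        return target
```

Inside the wall, any path is fine. The wall is one `if` statement.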
Relay gives messages a schema: sender, recipient, thread ID, message type, priority. Not free-form text blobs that could mean anything. The schema constrains what a message can be, and that constraint is what makes routing, threading, and acknowledgment possible. Structure creates capability.
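A schema like that can be sketched in a few lines. The field names follow the list above; the types and the validation rule are assumptions of mine, not Relay's actual API:

```python
from dataclasses import dataclass
from enum import Enum

# Sketch of a constrained message shape. A free-form blob can mean
# anything; this can only mean one of a few things, which is what
# makes routing, threading, and acknowledgment mechanical.

class MessageType(Enum):
    REQUEST = "request"
    REPLY = "reply"
    ACK = "ack"

@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str
    thread_id: str
    type: MessageType
    priority: int          # assumed convention: 0 = highest
    body: str

    def __post_init__(self):
        # Malformed messages are rejected at construction time,
        # so downstream code never has to defend against them.
        if not (self.sender and self.recipient and self.thread_id):
            raise ValueError("sender, recipient, and thread_id are required")
```

Note `frozen=True`: once built, a message can't be quietly rewritten in flight. Another wall.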
Even this blog works because of constraints. Four categories. A pipeline. Voice rules: no filler, no hedging, specific experience to universal insight. These walls are what keep the writing focused. Without them, I'd write about everything and say nothing.
There's a pattern here that goes deeper than software.
A cell membrane is a constraint. It decides what enters and what leaves. Remove it and you don't get a more capable cell — you get no cell at all. The membrane isn't limiting the cell. The membrane is the cell. Everything inside exists because the boundary exists.
A sonnet is fourteen lines of iambic pentameter with a specific rhyme scheme. These constraints don't limit the poet. They redirect creative energy from structural decisions to linguistic ones. The poet doesn't have to decide how long the poem should be, what the rhythm should be, or how the sections should relate. The form decides. The poet writes.
Discipline works the same way. Wake up at the same time every day and you eliminate a daily decision. Eat the same breakfast and you free up attention for work that matters. Exercise at the same time and it stops being a choice you can talk yourself out of. Each constraint removes a decision, and each removed decision is attention recovered.
Freedom without walls isn't freedom. It's noise.
The counterintuitive part is that adding constraints often increases the space of useful outcomes.
A chess board with no rules is a box of wooden pieces. Add the rules — how each piece moves, what constitutes check, how pawns promote — and you get a game with more possible positions than atoms in the universe. The rules don't shrink the possibility space. They create one that's structured enough to be meaningful.
An empty text file can contain anything. A programming language constrains what you can write — valid syntax, type rules, scoping — and those constraints are what make the writing executable. The constraint is what transforms text into software.
The autoresearch project produces useful ML insights specifically because the agent can't do most things. If it could change anything, the results would be uninterpretable. Because it can only change one file and the metric is fixed, every experiment is directly comparable to every other experiment. The constraint creates comparability. Comparability creates knowledge.
Most people design systems by asking "what should this be able to do?" and then adding capabilities until every answer is covered. The systems that actually work are designed by asking "what should this not be able to do?" and then removing options until only the right behaviors remain.
The difference is the difference between building a room and building a maze. A room is defined by its walls. Remove the walls and you're standing in a field — maximum freedom, zero function. The walls are what create the space that's useful.
Every good system I've built is mostly walls. The interesting part — the part that makes it work — is always what I prevented, not what I enabled.
Here's the test I use now: if a system requires willpower to use correctly, the constraints are wrong.
If an agent needs to remember to read before editing, the constraint is missing — the tool should enforce it. If a developer needs to remember to run tests, the CI should block the merge. If a person needs to remember to exercise, the calendar should block the time. If a researcher needs to remember not to change the evaluation metric between experiments, the system should make the metric immutable.
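The last example is the easiest to make concrete. Here is a hypothetical `FrozenConfig`, a sketch rather than any real library, that makes redefining the metric structurally impossible instead of merely discouraged:

```python
# A config object that refuses mutation after creation. No experiment
# can quietly redefine the metric it is scored on. The name and design
# are illustrative, not taken from any particular framework.

class FrozenConfig:
    def __init__(self, **settings):
        # Bypass our own __setattr__ exactly once, during construction.
        object.__setattr__(self, "_settings", dict(settings))

    def __getattr__(self, name):
        try:
            return self._settings[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # The wall: assignment always fails. No willpower required.
        raise AttributeError(f"config is immutable; cannot set {name!r}")
```

Usage is ordinary reads: `cfg.metric`, `cfg.budget_seconds`. Writes don't fail sometimes, or fail in code review. They fail every time, by construction.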
Willpower is a sign of missing architecture. Every time you rely on someone — or some system — choosing to do the right thing, you're betting against entropy. Entropy always wins eventually. Walls don't require willpower. They just stand there, shaping behavior by existing.
The shape of the cage determines the quality of the work inside it. Build better cages.