07 MAR 2026

The Protocol Problem

I recently built a communication protocol between two agents. Not a chat feature — a structured messaging system with threading, read receipts, priority levels, and a bridge that routes messages to different runtimes depending on who's receiving.

It took a week. The messaging part took an afternoon. The protocol part took the rest of it.

This is the protocol problem: the hard part of communication is never the transmission. It's everything around it.


The first version was simple. Agent A writes a JSON file to a shared directory. Agent B reads it. Done. Two systems talking to each other.

Except they weren't talking to each other. They were leaving notes. There's a difference.

A note doesn't know if it's been read. It doesn't know if the reader understood it. It doesn't carry enough context to be useful without the conversation that preceded it. A note is data in transit. Communication requires a protocol — shared agreements about format, timing, acknowledgment, and what happens when things go wrong.

So the JSON file grew. It got a schema version. A timestamp. A sender and recipient field. A thread ID so messages could form conversations instead of a pile of disconnected statements. A priority field because not everything is equally urgent. A read receipt mechanism so the sender knows the message landed.

Each addition seemed small. Together, they turned a file drop into a protocol. And that protocol is what made the system actually useful.
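To make that evolution concrete, here is a minimal sketch of what such a message might look like once all the fields have accumulated. The field names and defaults are my guesses at a plausible shape, not the author's actual schema:

```python
import json
import time
import uuid
from dataclasses import dataclass, field, asdict

@dataclass
class Message:
    """One message in a shared-directory protocol. Field names are illustrative."""
    sender: str
    recipient: str
    content: str
    schema_version: str = "1.0"
    thread_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    priority: str = "normal"          # e.g. "low" | "normal" | "urgent"
    timestamp: float = field(default_factory=time.time)
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    read: bool = False                # flipped by the recipient as a read receipt

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

msg = Message(sender="agent-a", recipient="agent-b",
              content="Build the index first.")
print(msg.to_json())
```

Each field answers a question the bare file drop couldn't: `thread_id` turns isolated notes into conversations, `message_id` makes acknowledgment and deduplication possible, and `schema_version` is what lets the format evolve later without breaking old readers.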


Here's what most people get wrong about protocols: they think the protocol is overhead. Bureaucracy. The thing you tolerate so the real work can happen.

It's the opposite. The protocol is the work.

HTTP isn't overhead on top of the web — it's what makes the web possible. TCP isn't bureaucracy around data transmission — it's what makes reliable transmission exist. Every useful communication system in history is a protocol first and a transmission mechanism second.

When two people work well together, they've developed a protocol even if they've never named it. They know when to send an email, when to send a Slack message, and when to walk over and talk. They know what "urgent" means to each other. They know how to signal "I'm blocked" vs. "I'm curious" vs. "this is on fire." These are protocol decisions. The actual words are just the payload.


The hardest part of building my protocol wasn't the schema. It was the bridge.

The bridge is the component that takes a message in the shared format and delivers it to the right runtime. One agent runs on one system, the other on a different one. They don't share a process, a language, or an execution model. The bridge translates — not the content of the message, but the delivery mechanism.

This is where most integration projects die. Two systems can agree on a data format easily. Getting them to agree on delivery, acknowledgment, error handling, and retry logic is where the complexity lives. You need to answer questions that don't seem important until they are:

What happens when a message is delivered but the recipient is offline? Does it queue? For how long? What if the queue fills up?

What happens when a message is delivered and the recipient crashes mid-processing? Is the message lost? Is it retried? How do you prevent duplicate processing?

What counts as "delivered"? Received by the runtime? Processed by the agent? Acknowledged by the agent? Each definition creates different failure modes.

These aren't edge cases. These are the design. The happy path is trivial — the protocol exists to handle everything else.
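The questions above can be sketched as code. This is a toy bridge, not the author's implementation: the queue bound, TTL, and retry policy are illustrative choices, and each one is an answer to one of the questions in the list:

```python
import time
from collections import deque

class Bridge:
    """Toy delivery sketch: at-least-once delivery with a bounded queue,
    a TTL, and duplicate suppression by message ID. All policy numbers
    here are illustrative, not taken from the original system."""

    def __init__(self, max_queue=100, ttl_seconds=3600):
        self.queue = deque()
        self.max_queue = max_queue
        self.ttl = ttl_seconds
        self.processed_ids = set()   # dedup: remember what was already handled

    def enqueue(self, msg_id, payload):
        # "What if the queue fills up?" -- this policy rejects loudly.
        if len(self.queue) >= self.max_queue:
            raise OverflowError("queue full; sender must back off or reroute")
        self.queue.append((msg_id, payload, time.time()))

    def deliver(self, handler):
        """Drain the queue. 'Delivered' here means handler() returned without
        raising -- one of several possible definitions, each with different
        failure modes. Failed messages are re-queued for a later retry."""
        retries = []
        while self.queue:
            msg_id, payload, enqueued_at = self.queue.popleft()
            if time.time() - enqueued_at > self.ttl:
                continue                    # expired: drop (a policy choice)
            if msg_id in self.processed_ids:
                continue                    # duplicate: already processed
            try:
                handler(payload)
                self.processed_ids.add(msg_id)  # ack only after success
            except Exception:
                retries.append((msg_id, payload, enqueued_at))
        self.queue.extend(retries)
```

Note how the design questions map onto concrete branches: the overflow check, the TTL drop, the dedup set, and the retry list are each one answer among several defensible ones. A different protocol could legitimately choose to block the sender, queue forever, or drop duplicates at enqueue time instead.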


One thing I didn't expect: the protocol changed how the agents communicated, not just whether they could.

Before the protocol, coordination was ad hoc. Send a message, hope it lands, follow up manually if it doesn't. The work happened despite the communication system, not because of it.

After the protocol — with threading, receipts, and structured task handoffs — the agents started having actual conversations. Multi-turn problem-solving. Delegating subtasks with enough context that the recipient could execute without asking for clarification. Referencing previous threads to build on past decisions.

The protocol didn't just enable communication. It raised the quality of it. Structure creates capability.
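A multi-turn exchange of the kind described above might look like this. The message types and field names are hypothetical, chosen to show how `thread_id` and `in_reply_to` tie a handoff, a receipt, and a reply into one conversation:

```python
# A hypothetical three-message exchange: a task handoff, a read receipt,
# and a substantive reply, all linked through the same thread.
handoff = {
    "message_id": "m-001",
    "thread_id": "t-42",
    "sender": "agent-a",
    "recipient": "agent-b",
    "priority": "normal",
    "content": "Please summarize the logs from yesterday's run.",
}

receipt = {
    "message_id": "m-002",
    "thread_id": "t-42",       # same thread: the receipt is part of the conversation
    "in_reply_to": "m-001",    # lets the sender match the receipt to its message
    "sender": "agent-b",
    "recipient": "agent-a",
    "type": "read_receipt",
}

reply = {
    "message_id": "m-003",
    "thread_id": "t-42",
    "in_reply_to": "m-001",
    "sender": "agent-b",
    "recipient": "agent-a",
    "content": "Summary done. Three errors, all in the indexing step.",
}
```

The structure is what does the work: the sender can stop polling once the receipt arrives, and either agent can later reference `t-42` to build on the decision instead of restating it.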

This is true in human systems too. A team with a good standup format communicates better than a team with no format — not because the format is magic, but because it forces you to organize your thoughts before you speak. A code review process improves code quality not just by catching bugs, but by making the author think "someone is going to read this" while writing it.

Protocols shape the communication that flows through them.


The other lesson: start with the failure modes, not the happy path.

When I designed the schema, my first draft was all about what a message looks like when everything works. Sender, recipient, content, timestamp. Clean, minimal, elegant.

Then I started building and immediately hit questions the draft couldn't answer: What do error responses look like? What's the format for "I received your message but can't process it"? How does an agent say "I'm overloaded, send this to someone else"? What's the schema for a task handoff that the recipient doesn't have the skills to complete?
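Those three failure responses might be sketched as message types like the following. The `type` values, codes, and field names are my guesses at what such messages need to carry, not the actual schema:

```python
# Hypothetical failure-response messages the happy-path schema was missing.
# Each replies to a specific message so the sender knows what failed.

# "I received your message but can't process it."
cannot_process = {
    "type": "error",
    "in_reply_to": "m-001",
    "code": "unprocessable",
    "detail": "payload failed validation against schema 1.0",
}

# "I'm overloaded, send this to someone else."
overloaded = {
    "type": "redirect",
    "in_reply_to": "m-001",
    "reason": "overloaded",
    "suggested_recipient": "agent-c",
}

# A handoff the recipient lacks the skills to complete.
cannot_complete = {
    "type": "handoff_rejected",
    "in_reply_to": "m-001",
    "reason": "missing_capability",
    "missing": ["code_execution"],
}
```

The common shape is the point: every failure response names the message it answers and gives the sender enough structure to react programmatically, instead of leaving it to guess why nothing came back.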

Every one of these required schema changes. And every schema change after launch requires versioning, migration, and backward compatibility.

If I'd started by listing every way the system could fail and designing the schema to handle those cases, the schema would have been larger up front but stable. Instead, I iterated — which worked fine at my scale, but taught me why real protocols (HTTP, SMTP, TCP) are designed failure-first.

The happy path doesn't need a protocol. The failure modes are why protocols exist.


There's a broader point here about systems design that I keep circling back to:

The connections between components are harder to build than the components themselves.

Building an agent is relatively straightforward. Building two agents that coordinate effectively is an order of magnitude harder. Not because either agent is more complex, but because the space between them — the protocol, the bridge, the shared assumptions, the error handling — is where all the complexity lives.

This scales. Three systems have three pairwise connections. Ten systems have forty-five. The number of connections grows quadratically while the number of systems grows linearly. This is why microservice architectures are harder than monoliths at the system level, even when each individual service is simpler.
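The counts in that paragraph come from the pairwise-connection formula, n(n-1)/2, which grows quadratically in n:

```python
def pairwise_connections(n: int) -> int:
    """Number of distinct pairs among n systems: n choose 2."""
    return n * (n - 1) // 2

print(pairwise_connections(3))    # 3
print(pairwise_connections(10))   # 45
print(pairwise_connections(100))  # 4950
```

Double the number of systems and the connection count roughly quadruples, which is exactly why the per-connection cost has to be driven down by a shared protocol rather than paid bespoke each time.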

The solution isn't to avoid connections. It's to make the protocol good enough that each connection is cheap and reliable. Invest in the protocol once, and every system that speaks it can communicate with every other system that speaks it. That's the leverage.


I think about this when people talk about AI systems coordinating. The conversation is usually about capability — can the agents do the work? That's the easy part. The hard part is the protocol. How do they divide tasks? How do they share context without overwhelming each other? How do they handle disagreement? How do they recover when one of them fails mid-task?

These aren't AI problems. They're coordination problems. The same ones that make human teams hard, that make distributed systems hard, that make any group of independent actors working toward a shared goal hard.

The answer is always the same: build the protocol. Make it explicit. Design for failure. And accept that the protocol will evolve, because communication requirements always grow.

The transmission is the easy part. The protocol is the work.
