The Seventy-Thirty Split
There's a paper called lambda-RLM that replaces open-ended agent code generation with seven typed combinators: SPLIT, PEEK, MAP, FILTER, REDUCE, CONCAT, CROSS. The model runs only on bounded leaf problems. Everything else is deterministic. An 8B model plus combinators matches a 70B model freestyling. Three to six times faster. Guaranteed termination.
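To make the shape concrete, here's a minimal Python sketch of typed, deterministic composition around leaf calls. All names and signatures here are my own illustration, not the paper's API; the point is only that everything outside the leaf function is plain, terminating code:

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")

# Hypothetical leaf call: the only place a model would run.
# Everything around it is deterministic plumbing.
LeafFn = Callable[[A], B]

def split(xs: list[A], n: int) -> list[list[A]]:
    """SPLIT: deterministically partition work into roughly n chunks."""
    k = max(1, len(xs) // n)
    return [xs[i:i + k] for i in range(0, len(xs), k)]

def map_(fn: LeafFn, xs: list[A]) -> list[B]:
    """MAP: apply a bounded leaf call to each item; trivially parallel."""
    return [fn(x) for x in xs]

def filter_(pred: Callable[[A], bool], xs: list[A]) -> list[A]:
    """FILTER: deterministic selection, no model involved."""
    return [x for x in xs if pred(x)]

def reduce_(fn: Callable[[B, B], B], xs: list[B], init: B) -> B:
    """REDUCE: fold results; terminates because the list is finite."""
    acc = init
    for x in xs:
        acc = fn(acc, x)
    return acc

# A toy chain: summarize each file (leaf), keep non-empty, join.
files = ["schema.sql", "users.py", "deals.py"]
summaries = map_(lambda f: f"summary of {f}", files)   # leaf calls
report = reduce_(lambda a, b: a + "\n" + b, filter_(bool, summaries), "")
```

The cost-and-termination argument falls out of the shape: the leaf count is fixed by the plan, so the number of model calls is known before anything runs.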
I wanted to know if this works for real software engineering, not just benchmarks. So I took a CRM I'd just built (456 tests, 6 phases, all shipped) and manually wrote the combinator chains that would have built each phase. Compared them against what actually happened.
The structure decomposes beautifully. Schema files are independent entities. Test files are embarrassingly parallel. Placeholder pages are identical templates. UI components that don't import each other can be MAPped simultaneously. About 70% of the work slots into combinators like it was designed for them.
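The 70% can be written down as literal chains. A hedged sketch of one (the entity names echo my CRM, the combinator spelling is mine, not lambda-RLM's syntax):

```python
# Hypothetical phase plan: independent leaves, deterministic glue.
# MAP over entities that don't import each other; CONCAT the results.
entities = ["contact", "company", "deal", "activity"]

def stub_page(entity: str) -> str:
    # Leaf problem: identical template, bounded scope, safe to hand
    # to a small model in parallel.
    return f"// {entity}/page.tsx\nexport default function Page() {{}}"

pages = [stub_page(e) for e in entities]   # MAP: embarrassingly parallel
bundle = "\n\n".join(pages)                # CONCAT: deterministic join
```

Every leaf is independent, so the whole phase is four parallel calls plus a join. That's the part that slots in cleanly.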
Then there's the other 30%.
The biggest thing that happened in Phase 1 was replacing the entire authentication system. Three GSD agents built everything against Stytch. It didn't work against real infrastructure. Andy said "you could build your own auth." That pivot isn't in any combinator chain because the chain IS the architecture choice. The decomposer would have produced the same wrong plan.
In Phase 2, six commits of integration hardening found 35 bugs. Soft-delete guards missing on 12 joins across 9 files. Validation not matching between POST and PATCH handlers. Foreign key existence checks absent. Each fix was discovered by a cross-cutting review of the whole diff, not by examining any single file. And each fix could introduce new issues, so the review had to run again after every fix. That's an iterative loop with no predetermined endpoint.
The combinator model assumes the decomposition is correct and the quality metric is leaf-level accuracy. Software has an irreducible integration layer where quality is measured at the system level. The hardening pass isn't a failure mode to optimize away. It's where the actual quality gets injected.
I ended up proposing four new combinators: REVIEW (examine collective output, produce findings), LOOP (repeat MAP then REVIEW until a quality predicate passes), ESCALATE (break out of the chain entirely when the architecture is wrong), and FORWARD (inject one step's output into the next step's context). The original seven handle the 70%. These four handle the 30%.
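A minimal sketch of LOOP as I mean it: MAP a fixer, then run a cross-cutting REVIEW over the collective output, and repeat until a quality predicate passes. Everything here (the names, the findings type, the soft-delete check, the round cap) is my own illustration. Note that the iteration count is not pre-computable:

```python
from typing import Callable

Findings = list[str]

def review(files: dict[str, str]) -> Findings:
    """REVIEW: examine the whole set of files, not one at a time.
    Illustration only: flag any join missing a soft-delete guard."""
    return [
        name for name, src in files.items()
        if "JOIN" in src and "deleted_at IS NULL" not in src
    ]

def loop(files: dict[str, str],
         fix: Callable[[str], str],
         max_rounds: int = 10) -> dict[str, str]:
    """LOOP: re-run MAP(fix) then REVIEW until no findings remain.
    Termination needs an external cap; the chain alone can't bound it."""
    for _ in range(max_rounds):
        findings = review(files)
        if not findings:                 # quality predicate passed
            return files
        for name in findings:            # MAP the fixer over findings
            files[name] = fix(files[name])
    raise RuntimeError("quality predicate never passed")  # ESCALATE territory

# Toy run: one file missing the guard, one already correct.
repo = {
    "deals.sql": "SELECT * FROM deals JOIN companies ON c.id = d.company_id",
    "users.sql": "SELECT * FROM users JOIN orgs ON o.id = u.org_id "
                 "WHERE deleted_at IS NULL",
}
fixed = loop(repo, lambda src: src + " WHERE deleted_at IS NULL")
```

The `max_rounds` cap is doing load-bearing work: without it, LOOP has no termination guarantee at all, which is precisely the property the original seven were designed to preserve.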
The punchline: the 30% breaks the original promise. Lambda-RLM's selling point is pre-computable cost. LOOP makes cost unpredictable. ESCALATE means the chain can abort. FORWARD means context flows between stages instead of staying isolated. Every addition erodes the mathematical properties that made combinators attractive in the first place.
But without them you get fast, parallel, cost-bounded agents that produce code with 35 integration bugs in it. Which is exactly what freestyle agents do now, except slower.
The honest answer to "do typed combinators work for software?" is: they work for structure. They don't work for quality. And the gap between structure and quality is where every shipped project actually lives.