The same registry that decides what a turn writes into its response can drive every read path too, so the write model and its read models can't drift apart.
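Here is a minimal sketch of that idea. It assumes a hypothetical `FieldSpec` registry; none of these names come from the post. Each entry names a response field, the state key the write path copies it from, and the read views that expose it. The writer and every read projection iterate the same list, so a field you add once shows up everywhere it should.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass(frozen=True)
class FieldSpec:
    name: str        # key in the turn's response
    state_key: str   # where the value lives in graph state
    views: frozenset = field(default_factory=frozenset)  # read models that expose it

# Hypothetical registry: the single source of truth for response fields.
REGISTRY: list[FieldSpec] = [
    FieldSpec("answer", "draft_answer", frozenset({"chat", "audit"})),
    FieldSpec("citations", "retrieved_sources", frozenset({"audit"})),
]

def write_response(state: dict[str, Any]) -> dict[str, Any]:
    """Write path: copy every registered field the turn actually produced."""
    return {f.name: state[f.state_key] for f in REGISTRY if f.state_key in state}

def read_view(response: dict[str, Any], view: str) -> dict[str, Any]:
    """Read path: project a stored response through the same registry."""
    return {f.name: response[f.name] for f in REGISTRY
            if view in f.views and f.name in response}

state = {"draft_answer": "42", "retrieved_sources": ["doc-7"]}
resp = write_response(state)
print(read_view(resp, "chat"))   # {'answer': '42'}
print(read_view(resp, "audit"))  # {'answer': '42', 'citations': ['doc-7']}
```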
The two smallest, most-reused mechanisms in the catalog are a tiered config resolution order with normalizers and a prompt-slot injection seam that routes one call's output into the next call's prompt without anyone wiring it up.
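Here is a minimal sketch of both mechanisms. All names are assumptions, not the post's API. `resolve` walks hypothetical tiers from highest to lowest precedence (per-call overrides, environment, application config, defaults). A per-key normalizer then coerces whatever arrives, often a string, into one canonical type. The slot seam is shown with the standard library's `string.Template`. A node writes its output under a slot name, and any later template that mentions that slot receives it.

```python
import os
from string import Template
from typing import Any, Callable, Mapping

# Hypothetical per-key normalizers: coerce raw values (often strings) to one canonical type.
NORMALIZERS: dict[str, Callable[[Any], Any]] = {
    "temperature": float,
    "max_tokens": int,
    "streaming": lambda v: v if isinstance(v, bool) else str(v).lower() in {"1", "true", "yes"},
}

def env_tier(environ: Mapping[str, str] = os.environ, prefix: str = "APP_") -> dict[str, str]:
    """Turn APP_MAX_TOKENS-style variables into a {'max_tokens': ...} tier."""
    return {k[len(prefix):].lower(): v for k, v in environ.items() if k.startswith(prefix)}

def resolve(key: str, tiers: list[dict[str, Any]]) -> Any:
    """Tiers are ordered highest precedence first; the first tier that defines `key` wins."""
    for tier in tiers:
        if key in tier:
            return NORMALIZERS.get(key, lambda v: v)(tier[key])
    raise KeyError(key)

# Prompt-slot seam: a node's output lands in a named slot, and any later
# template that mentions $<slot> receives it with no explicit wiring.
slots: dict[str, str] = {}

def run_node(name: str, output: str) -> None:
    slots[name] = output

def render(template: str) -> str:
    # safe_substitute leaves unfilled $slots in place instead of raising KeyError
    return Template(template).safe_substitute(slots)

defaults = {"temperature": 0.7, "max_tokens": 512, "streaming": False}
app_config = {"max_tokens": "1024"}
overrides = {"temperature": "0.2"}
tiers = [overrides, env_tier({"APP_STREAMING": "true"}), app_config, defaults]

print(resolve("temperature", tiers), resolve("max_tokens", tiers), resolve("streaming", tiers))
# 0.2 1024 True

run_node("summary", "Refund requested for order 1182.")
print(render("Classify the intent of: $summary"))
# Classify the intent of: Refund requested for order 1182.
```

The normalizers run at resolution time, not at the source. That way a string from the environment and a float from the defaults file come out as the same type.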
Load skills from a directory with role-aware conflict resolution, and assemble their outputs into the response through a registry the core reads — so adding a capability means adding a directory, not teaching a core node another skill's fields.
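Here is a minimal sketch under stated assumptions. It treats each skill as a subdirectory holding a hypothetical `skill.json` manifest with `name`, `role`, and `output_key`. The role precedence (`application` overrides `platform`) is also an assumption. The core's `assemble` reads only `output_key`, so it never learns any skill's internal fields.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical precedence: when two skills share a name, the higher number wins.
ROLE_PRECEDENCE = {"platform": 0, "application": 1}

def load_skills(root: Path) -> dict[str, dict]:
    """Scan root/*/skill.json and keep one manifest per skill name."""
    registry: dict[str, dict] = {}
    for manifest_path in sorted(root.glob("*/skill.json")):
        manifest = json.loads(manifest_path.read_text())
        current = registry.get(manifest["name"])
        # Role-aware conflict resolution: a higher-precedence role replaces a lower one.
        if current is None or ROLE_PRECEDENCE[manifest["role"]] > ROLE_PRECEDENCE[current["role"]]:
            registry[manifest["name"]] = manifest
    return registry

def assemble(registry: dict[str, dict], outputs: dict[str, object]) -> dict[str, object]:
    """Core assembly: place each skill's output under the key its manifest declares."""
    return {m["output_key"]: outputs[name] for name, m in registry.items() if name in outputs}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for folder, manifest in {
        "summarize_core": {"name": "summarize", "role": "platform", "output_key": "summary"},
        "summarize_app": {"name": "summarize", "role": "application", "output_key": "brief"},
        "translate": {"name": "translate", "role": "platform", "output_key": "translation"},
    }.items():
        (root / folder).mkdir()
        (root / folder / "skill.json").write_text(json.dumps(manifest))
    skills = load_skills(root)

print(skills["summarize"]["role"])  # application
print(assemble(skills, {"summarize": "short text", "translate": "texte"}))
# {'brief': 'short text', 'translation': 'texte'}
```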
Name your model-and-params combinations as profile records the application registers at bootstrap, and let node code ask for a model by role so configuration scales with the graph instead of against it.
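Here is a minimal sketch under assumed names. `ModelProfile` and `ProfileRegistry` are hypothetical, and the model identifiers are placeholders rather than real model names. The application registers profiles once at bootstrap and then freezes them. After that, node code resolves a profile by role and never names a model directly.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelProfile:
    model: str                 # placeholder model identifier
    temperature: float = 0.0
    max_tokens: int = 1024

class ProfileRegistry:
    def __init__(self) -> None:
        self._profiles: dict[str, ModelProfile] = {}
        self._frozen = False

    def register(self, role: str, profile: ModelProfile) -> None:
        if self._frozen:
            raise RuntimeError("profiles are registered at bootstrap only")
        if role in self._profiles:
            raise ValueError(f"role {role!r} already registered")
        self._profiles[role] = profile

    def freeze(self) -> None:
        self._frozen = True

    def for_role(self, role: str) -> ModelProfile:
        try:
            return self._profiles[role]
        except KeyError:
            raise LookupError(f"no model profile registered for role {role!r}") from None

# Application bootstrap: the only place model names appear.
profiles = ProfileRegistry()
profiles.register("planner", ModelProfile("large-reasoning-model", temperature=0.2, max_tokens=4096))
profiles.register("classifier", ModelProfile("small-fast-model"))
profiles.freeze()

# Node code asks by role, never by model name.
def classify_node(text: str) -> str:
    p = profiles.for_role("classifier")
    return f"{p.model}@{p.temperature}: {text}"

print(classify_node("hello"))  # small-fast-model@0.0: hello
```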
The rules that govern what counts as a valid plan belong in an injected policy the application supplies, not in the planner's prompt or in your core, so the answer can differ per product without forking the planner.
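Here is a minimal sketch that assumes a hypothetical `PlanPolicy` protocol with a single `violations` method. The `Planner` core checks every plan against whatever policy it was constructed with. The two product policies below are illustrative. They encode different rules without touching the planner.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Step:
    tool: str
    args: dict

class PlanPolicy(Protocol):
    def violations(self, plan: list[Step]) -> list[str]: ...

class Planner:
    """Core planner: proposes plans and defers validity to the injected policy."""
    def __init__(self, policy: PlanPolicy) -> None:
        self.policy = policy

    def accept(self, plan: list[Step]) -> list[Step]:
        problems = self.policy.violations(plan)
        if problems:
            raise ValueError("; ".join(problems))
        return plan

class SupportPolicy:
    """Hypothetical product A: read-only tools, at most 3 steps."""
    ALLOWED = {"search_kb", "lookup_order"}

    def violations(self, plan: list[Step]) -> list[str]:
        errs = [f"tool {s.tool!r} not allowed" for s in plan if s.tool not in self.ALLOWED]
        if len(plan) > 3:
            errs.append("plan exceeds 3 steps")
        return errs

class BackofficePolicy:
    """Hypothetical product B: refunds allowed, but only after an order lookup."""
    def violations(self, plan: list[Step]) -> list[str]:
        tools = [s.tool for s in plan]
        if "issue_refund" in tools and "lookup_order" not in tools[: tools.index("issue_refund")]:
            return ["issue_refund must follow lookup_order"]
        return []

plan = [Step("lookup_order", {"id": 1182}), Step("issue_refund", {"id": 1182})]
print(Planner(BackofficePolicy()).accept(plan) == plan)  # True
try:
    Planner(SupportPolicy()).accept(plan)
except ValueError as e:
    print(e)  # tool 'issue_refund' not allowed
```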
A tool's entire interface with the orchestrator — schema, config, invariants, state reads and effects — collapses onto one declarative record, and the registry becomes the dispatcher.
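Here is a minimal sketch that assumes a hypothetical `ToolSpec` record with one field per concern listed above:

- `schema`: required argument types
- `config`: static keyword arguments
- `invariants`: predicates over arguments and state
- `reads`: state keys injected into the call
- `writes`: the state key the result lands in

`dispatch` knows nothing about any particular tool; everything it does comes from the record.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

@dataclass(frozen=True)
class ToolSpec:
    name: str
    schema: dict[str, type]                   # required args and their types
    run: Callable[..., Any]
    config: dict[str, Any] = field(default_factory=dict)
    invariants: tuple[Callable[[dict, dict], bool], ...] = ()
    reads: tuple[str, ...] = ()               # state keys passed into the call
    writes: Optional[str] = None              # state key the result is stored under

REGISTRY: dict[str, ToolSpec] = {}

def register(spec: ToolSpec) -> None:
    REGISTRY[spec.name] = spec

def dispatch(name: str, args: dict[str, Any], state: dict[str, Any]) -> Any:
    """Generic dispatcher: validation, context, and effects all come from the record."""
    spec = REGISTRY[name]
    for arg, typ in spec.schema.items():
        if not isinstance(args.get(arg), typ):
            raise TypeError(f"{name}: {arg!r} must be {typ.__name__}")
    for check in spec.invariants:
        if not check(args, state):
            raise ValueError(f"{name}: invariant failed")
    context = {key: state[key] for key in spec.reads}
    result = spec.run(**args, **context, **spec.config)
    if spec.writes:
        state[spec.writes] = result
    return result

def lookup_order(order_id: int, tenant_id: str, currency: str) -> dict:
    return {"order_id": order_id, "tenant": tenant_id, "currency": currency}

register(ToolSpec(
    name="lookup_order",
    schema={"order_id": int},
    run=lookup_order,
    config={"currency": "EUR"},
    invariants=(lambda args, state: args["order_id"] > 0,),
    reads=("tenant_id",),
    writes="last_order",
))

state = {"tenant_id": "acme"}
dispatch("lookup_order", {"order_id": 1182}, state)
print(state["last_order"])  # {'order_id': 1182, 'tenant': 'acme', 'currency': 'EUR'}
```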
The function that materializes your graph can have zero application imports. Topology becomes a declarative record the application supplies and the platform composes unchanged.
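Here is a plain-Python sketch that is not tied to any particular graph library. `GraphSpec` and `compose` are hypothetical names. The platform-side `compose` validates and runs whatever record it is handed. The application side is the only code that knows node names or routing rules.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional

State = dict[str, Any]

@dataclass(frozen=True)
class GraphSpec:
    nodes: Mapping[str, Callable[[State], State]]
    edges: Mapping[str, Callable[[State], Optional[str]]]  # router per node; None ends the run
    entry: str

def compose(spec: GraphSpec, max_steps: int = 50) -> Callable[[State], State]:
    """Platform code: validates and runs any topology, with zero application imports."""
    unknown = set(spec.edges) - set(spec.nodes)
    if spec.entry not in spec.nodes or unknown:
        raise ValueError(f"spec references unknown nodes: {unknown or {spec.entry}}")

    def run(state: State) -> State:
        current: Optional[str] = spec.entry
        steps = 0
        while current is not None:
            if steps == max_steps:
                raise RuntimeError("max_steps exceeded")
            state = spec.nodes[current](state)
            steps += 1
            router = spec.edges.get(current)
            current = router(state) if router else None
        return state

    return run

# Application side: supplies the record and never touches compose()'s internals.
spec = GraphSpec(
    nodes={
        "classify": lambda s: {**s, "intent": "refund" if "refund" in s["text"] else "other"},
        "refund": lambda s: {**s, "reply": "Refund started."},
        "fallback": lambda s: {**s, "reply": "Routing to a human."},
    },
    edges={"classify": lambda s: "refund" if s["intent"] == "refund" else "fallback"},
    entry="classify",
)
app = compose(spec)
print(app({"text": "I want a refund"})["reply"])  # Refund started.
```

Routing lives in the record, so a second product can swap in a different `GraphSpec` and reuse `compose` as-is.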
Extracting a traditional library is mostly mechanical: you can read the imports. Agentic systems hide three more coupling vectors where your toolkit can't see them: in prompts, in state field names, and in the rules the planner enforces. If the second product consumes a fraction of the runway, the cause isn't effort; it's that those abstractions don't exist yet.
Semantic search doesn't throw an error when it returns the wrong tenant's data. It just returns it, and your agent weaves it into the response as if it belonged there. This failure surfaces silently somewhere between customers 20 and 30, and when it does, the fix isn't a configuration change. It's a full audit of your execution graph, measured in months. Here's the architectural decision that prevents it.
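The decision itself isn't spelled out in this summary, so treat the following as one illustration under an assumption. The assumption is that tenant scope is a required argument of the retrieval interface and is applied as a hard partition before similarity ranking, not as a filter on results afterward. `Chunk`, `TenantScopedIndex`, and the toy two-dimensional vectors are all hypothetical. A real deployment would lean on its vector store's own namespace or metadata-filter feature.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str
    vector: tuple[float, ...]

def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class TenantScopedIndex:
    """Toy index: tenant_id is required and partitions the data before ranking."""
    def __init__(self, chunks: list[Chunk]) -> None:
        self._by_tenant: dict[str, list[Chunk]] = {}
        for c in chunks:
            self._by_tenant.setdefault(c.tenant_id, []).append(c)

    def search(self, tenant_id: str, query: tuple[float, ...], k: int = 3) -> list[Chunk]:
        if not tenant_id:
            raise ValueError("tenant_id is required")
        candidates = self._by_tenant.get(tenant_id, [])  # other tenants never enter ranking
        return sorted(candidates, key=lambda c: cosine(c.vector, query), reverse=True)[:k]

index = TenantScopedIndex([
    Chunk("acme", "Acme refund window is 30 days", (0.9, 0.1)),
    Chunk("globex", "Globex refund window is 14 days", (0.95, 0.05)),  # closer to the query
])
print(index.search("acme", (1.0, 0.0), k=1)[0].text)  # Acme refund window is 30 days
```

The Globex chunk is the closer match to the query vector, so an unscoped top-1 search would have returned it with no error.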
Two frontier teams published findings that directly contradict each other. Cognition says multi-agent architectures cause compounding information loss and that single-threaded execution is the fix. Anthropic says multi-agent delegation beat a single agent by 90.2% on its internal research eval. Both are right, for the model versions they tested. Architectural best practices in agent systems have a six-month half-life.
When researchers stripped the safety constraints from a bounded-autonomy system, they expected faster task completion. Instead, results got worse. The bounded version completed 23 of 25 tasks; the unconstrained baseline completed 17 and produced wrong-entity mutations that silently corrupted a different customer's data. The guardrails weren't slowing the system down; they were making it smarter.
Your agent can surface any conversation from six months ago verbatim, yet it still makes the same mistakes it made then. Recall and learning are architecturally distinct, and most agent memory systems build only the former. Removing memory from an agent hurts performance more than swapping the underlying LLM. You're probably investing in the wrong layer.
One team running a six-agent debate system switched to two agents with a strict state machine. Latency dropped from 18 seconds to 3. Cost per query dropped from $8–12 to $0.40. Accuracy changed by less than 1%. This wasn't a fluke. Information theory explains exactly why multi-agent systems can never outperform a single agent with full context.
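The summary doesn't show the derivation. One standard way to frame the claim is the data processing inequality; treat this as an assumption about the argument, not a quote of it. Let Y be what the task needs to know, X the full context, M the message one agent hands to another, and Z the final answer computed from M. Each stage sees only the previous stage's output, so these variables form a Markov chain. Mutual information about Y can only shrink along that chain:

```latex
Y \to X \to M \to Z
\quad\Longrightarrow\quad
I(Y; Z) \;\le\; I(Y; M) \;\le\; I(Y; X)
```

In this idealized view, a single agent that reads X directly can compute anything the pipeline computes. A handoff can therefore discard information about Y but never add it. The bound assumes the single agent can actually use all of X.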
Your vector store has the right answer stored. Your agent still gets it wrong. The failure is in retrieval logic that fetches topically similar content instead of logically relevant facts. Backward chaining, a technique from 1970s Prolog theorem provers, closes a 21-point accuracy gap on multi-hop memory benchmarks without touching your storage layer.
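Here is a minimal propositional backward chainer. It assumes memory is stored as ground facts plus Horn-style rules; the facts and rule below are hypothetical. The chainer starts from the question and works backward, so it collects only the facts a proof actually uses. Unlike full Prolog, it has no variables or unification. It sketches the search strategy only and is not the post's implementation.

```python
Rule = tuple[str, list[str]]  # (head, body): the head holds if every body goal holds

def prove(goal: str, facts: set[str], rules: list[Rule],
          trace: list[str], seen: frozenset = frozenset()) -> bool:
    """Backward chaining: reduce the goal to subgoals until each is a stored fact."""
    if goal in facts:
        trace.append(goal)            # record a fact this proof actually used
        return True
    if goal in seen:                  # guard against cyclic rules
        return False
    for head, body in rules:
        if head != goal:
            continue
        mark = len(trace)
        if all(prove(g, facts, rules, trace, seen | {goal}) for g in body):
            return True
        del trace[mark:]              # discard facts gathered on a failed branch
    return False

# Hypothetical memory: two facts form a chain; the third is topically similar but irrelevant.
facts = {
    "alice manages payments-team",
    "payments-team owns refund-service",
    "alice mentioned refund-service in standup",
}
rules = [
    ("alice is accountable for refund-service",
     ["alice manages payments-team", "payments-team owns refund-service"]),
]
trace: list[str] = []
print(prove("alice is accountable for refund-service", facts, rules, trace))  # True
print(trace)  # ['alice manages payments-team', 'payments-team owns refund-service']
```

The third fact shares both entities with the goal, so a similarity search would likely rank it highly. The proof never touches it.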