The Hidden Couplings in Agentic Systems

Why prompts, tools, policies, and state make reuse hard, and the patterns that make it easier.

The AI codegen era didn't just speed up how engineers write software. It reset what executives believe is possible to ship in a quarter. The patterns in this catalog are what made that survivable for our team.

I'll show you what changed, what we built in response, and the catalog of patterns that came out of it.

Why is this important?

Frameworks like LangGraph, Google's ADK, OpenAI's Agents SDK, and Anthropic's Claude SDK are all good at the same thing: giving you the primitives to build one agentic application. None of them tell you how to build the second one without forking the first. That's the missing layer — and it's exactly the layer the AI codegen era puts the most pressure on, because the era resets how fast executives expect that second, third, and fourth product to ship.

That layer is what I ended up building. I extracted the orchestration core into what became HUGO (Heterogeneous Unified Graph Orchestrator), an internal platform rather than an open-source project: a set of named seams and guardrails that let developers and coding agents extend a shared platform instead of forking it for every new vertical. Building it is what gave me and my team enough confidence to put fast, largely AI-generated work in front of real users — and to do it again for the next product without starting over. The patterns in this catalog are HUGO's building blocks.

The velocity that started it all

Shortly after forming, our team was given very little time to produce a working demo of a medical AI system. I had never built a production-quality agent application with that much customization on that kind of timeline, but the bar had moved. The same Cursor and Claude demos that had reset every engineer's expectations had also reset every executive's.

The product specs shifted almost daily. We made the calls we had to make to ship: we centralized what was convenient to centralize, scattered what was convenient to scatter, and encoded domain rules wherever the code happened to be open. Those are standard tradeoffs for a team building fast under fluid requirements. The code that came out the other side worked, served real users, and embedded our domain in nearly every layer of the system. It also contained genuinely hard-won design moves, the kind of patterns that come out of trial, revision, and the conversation that happens around a shared screen. The refactor question was: how do we preserve everything we'd figured out and make it repeatable?

After we completed the MVP, we learned about the broader roadmap: a substantial number of similar applications across the company's portfolio — with each one getting a fraction of the already short runway we'd had for the last one. The business had concluded that AI codegen made a portfolio of AI products affordable per quarter. Our newly formed AI engineering team had not yet figured out how.

This is the moment the HUGO framework exists to address. The question is not "can we reuse the code?" but "what is the smallest set of named, reusable patterns that lets each of the next applications snap into a shared core without forcing a fork every time?" I started that investigation, and the patterns I'll be writing about in this catalog are what came out of it.

This isn't an article about a refactor. It's about the abstractions we needed to keep shipping at the pace the business had come to expect of us. And it's about the patterns any team facing the same expectation will eventually need to name.

Why velocity is an abstractions problem in agentic systems

Every senior engineer reading this has either lived this already or is about to. An executive saw a demo that was built in a day. The demo set the new baseline for what "quickly" means. Now the team is being asked to deliver against the demo's pace on a roadmap the demo didn't account for.

The instinct, when this happens, is to push back on the timeline. That instinct is correct as far as it goes: the timeline is wrong, the expectation is possibly unrealistic, and the AI-codegen multiplier doesn't compose the way the slide deck implied. But pushing back isn't useful. The conversation lands you in a debate about effort, and you lose. The better move is to change the question: why would the second product on the roadmap take a fraction of the time the first one did? The answer to that question is abstractions. And once you've named the abstractions, the timeline argument resolves itself: either you have them and the roadmap is realistic, or you don't and it isn't. The abstractions are also the guardrails that let you predict what you're going to get from codegen agents. You can use progressive disclosure to improve those agents by building skills, policies, and documentation around the abstractions.

In a traditional CRUD-style service, the second consumer of your code is mostly an imports problem. Extract a library, version it, publish, done. Decades of tooling exist to make that mechanical. Agentic systems have four coupling vectors instead of one, and three of them are invisible to a senior engineer's normal toolkit:

1. Prompts encode domain identity. The orchestrator's system prompt opened with "You are a medical AI assistant." That string lived in a module-level constant in the prompt registry. A branding leak, sitting in a file the directory structure called core.

2. Tool sets are domain-specific by definition. The orchestrator imported its retrieval tools by name (from app.tools.hyperlink_retrieval import …) and bound them into a build_execution_tools function. The next application that needed a different tool set would have had to fork the orchestrator file to swap them.

3. Planning policies are domain rules disguised as prompt strings. Our planner's tool description had instructions pertaining to specific tools in the tool set. That isn't a hint to the LLM. It's a policy, encoded in English, smuggled inside a prompt. Try extracting that into a reusable core without forking the planner.

4. State schemas accumulate domain fields. follow_up_questions, related_articles, gray_area_analysis — each one a clinical-UI affordance that bled into the orchestrator's state class.

A traditional library extraction is mostly mechanical because coupling is mostly visible: you can read the imports. Agentic-system coupling is also encoded in prompts, in state field names, and in the rules the planner enforces against tool calls. The reason the second product can't yet ship in a fraction of the first product's runway is not effort. It's that the abstractions for those three invisible coupling vectors don't exist yet. They have to be named, built, and ratified before the velocity question has an honest answer.
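To make the state-schema vector concrete, here is a minimal sketch of one way to keep domain fields off the core state. The class names CoreState and ClinicalState, the two core fields, and the helper domain_fields are hypothetical; only the three clinical field names come from the list above.

```python
from typing import TypedDict, get_type_hints


class CoreState(TypedDict, total=False):
    """Fields the platform itself reads and writes (hypothetical names)."""
    messages: list[dict]
    plan: list[str]


class ClinicalState(CoreState, total=False):
    """Application-owned extension: the clinical-UI fields live here, not in core."""
    follow_up_questions: list[str]
    related_articles: list[str]
    gray_area_analysis: str


def domain_fields(app_state: type, core_state: type) -> set[str]:
    """Return the field names an application adds on top of the core schema."""
    return set(get_type_hints(app_state)) - set(get_type_hints(core_state))


leaked = domain_fields(ClinicalState, CoreState)
print(sorted(leaked))
```

Run against a real state class, a helper like this turns "which fields are ours and which are the platform's" into a set difference rather than a reading exercise.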

The diagnosis framework

This is the part you can use on your own codebase tomorrow.

When I started the investigation, I needed a way to convert "this codebase will be expensive to extend to the next product" into a concrete, prioritized list of things to change. In its current form, very little of the codebase was reusable. I sat down with an initial selection of files under app/core/ and app/graphs/ and read them with one question in mind: if I were the second consumer of this code, what would force me to fork instead of extend? Every fork-forcing surface became a finding, in a form I could convert into an agent skill for rapid analysis. Each finding got the same six fields:

Violation:                  <one-line description>
Core module:                <file path>
Application concept:        <what's leaking in>
Missing abstraction:        <what should be there instead>
Concrete failure for reuse: <what breaks for the second consumer>
P-level:                    <P0 blocks reuse / P1 forces fork / P2 quality of life>

The template is the technique. It forces you, for each leak, to articulate what would actually break if the next product on the roadmap tried to use this code. That's the question that converts "this feels coupled" into "this is the missing abstraction" — and it's the question that makes the velocity gap quantifiable instead of intuitive. The audit produced a backlog. The backlog produced commits. The commits became the catalog.
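If you want the template to be machine-checkable rather than a text convention, a sketch along these lines may help. The Finding and PLevel names, the validation rule, and the sample P2 row are assumptions for illustration; the P0 row paraphrases the prompt-registry example from this article, and the module locations are deliberately left unspecific.

```python
from dataclasses import dataclass
from enum import IntEnum


class PLevel(IntEnum):
    P0 = 0  # blocks reuse
    P1 = 1  # forces fork
    P2 = 2  # quality of life


@dataclass(frozen=True)
class Finding:
    violation: str
    core_module: str
    application_concept: str
    missing_abstraction: str
    concrete_failure_for_reuse: str
    p_level: PLevel

    def __post_init__(self) -> None:
        # The template only works if every text field is actually filled in.
        text_fields = (
            "violation",
            "core_module",
            "application_concept",
            "missing_abstraction",
            "concrete_failure_for_reuse",
        )
        for name in text_fields:
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")


def backlog(findings: list[Finding]) -> list[Finding]:
    """Order findings so the reuse-blocking ones come first."""
    return sorted(findings, key=lambda f: f.p_level)


findings = [
    Finding(  # hypothetical P2 row, only here to show the ordering
        violation="Log messages mention the clinical product by name",
        core_module="app/core/ (logging helpers)",
        application_concept="product branding",
        missing_abstraction="application-supplied log context",
        concrete_failure_for_reuse="second product's logs carry the wrong name",
        p_level=PLevel.P2,
    ),
    Finding(
        violation="Prompt registry hardcodes domain identity",
        core_module="app/core/ (prompt registry)",
        application_concept="medical assistant preamble",
        missing_abstraction="role configs passed into the registry constructor",
        concrete_failure_for_reuse="second product must edit core to change identity",
        p_level=PLevel.P0,
    ),
]

for item in backlog(findings):
    print(item.p_level.name, item.violation)
```

Rejecting empty fields at construction time is the code-level version of forcing the "concrete failure for reuse" answer.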

Three example findings from that backlog:

The orchestrator imported its tools by name. Our orchestrator node had top-of-file imports for the concrete medical-domain retrieval tools, then bound them directly into a build_execution_tools function. The missing abstraction: a tool registry the orchestrator queries at runtime, with executable bindings registered by the application at startup. That registry became the Tool Contract pattern.
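A minimal sketch of that registry shape, under stated assumptions: the ToolContract here carries only a name, a description, and a handler, which is a reduced stand-in for the fuller contract described later in the catalog. ToolRegistry, its methods, and the sample tool body are hypothetical; build_execution_tools is the function name from the text, reshaped to take the registry as an argument.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolContract:
    name: str
    description: str
    handler: Callable[..., object]


class ToolRegistry:
    def __init__(self) -> None:
        self._contracts: dict[str, ToolContract] = {}

    def register(self, contract: ToolContract) -> None:
        # Fail loudly on duplicates instead of silently overwriting a tool.
        if contract.name in self._contracts:
            raise ValueError(f"tool already registered: {contract.name}")
        self._contracts[contract.name] = contract

    def all(self) -> list[ToolContract]:
        return list(self._contracts.values())


def build_execution_tools(registry: ToolRegistry) -> dict[str, Callable[..., object]]:
    """Core code: binds whatever the application registered, imports no tool by name."""
    return {contract.name: contract.handler for contract in registry.all()}


# Application bootstrap (lives outside core). The tool body is a placeholder.
def hyperlink_retrieval(query: str) -> list[str]:
    return [f"https://example.invalid/search?q={query}"]


registry = ToolRegistry()
registry.register(
    ToolContract("hyperlink_retrieval", "Retrieve links for a query", hyperlink_retrieval)
)
tools = build_execution_tools(registry)
print(list(tools))
```

The point of the shape is the direction of the dependency: the application imports the registry, and the orchestrator never imports the application.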

The prompt registry hardcoded domain identity. A module-level constant in our prompt registry declared the orchestrator's preamble as "You are a medical AI assistant." — branding leaked into a file the directory structure called core. The missing abstraction: role configs (preamble, slots, base prompt path) passed into the registry's constructor by the application, not declared inside the registry itself. The application owns its identity; the registry owns the assembly.
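As a rough illustration of "the application owns its identity; the registry owns the assembly", assuming a registry that receives role configs and a text loader through its constructor. RoleConfig, PromptRegistry, assemble, the load_text parameter, and the template path are all hypothetical names, not the actual internal API.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class RoleConfig:
    preamble: str
    base_prompt_path: str
    slots: tuple[str, ...] = ()


class PromptRegistry:
    """Core code: knows how to assemble a prompt, not what the product is."""

    def __init__(
        self,
        roles: Mapping[str, RoleConfig],
        load_text: Callable[[str], str],
    ) -> None:
        self._roles = dict(roles)
        self._load_text = load_text

    def assemble(self, role: str, **slot_values: str) -> str:
        config = self._roles[role]
        missing = [slot for slot in config.slots if slot not in slot_values]
        if missing:
            raise KeyError(f"missing slot values for role {role!r}: {missing}")
        template = self._load_text(config.base_prompt_path)
        body = template.format(**{slot: slot_values[slot] for slot in config.slots})
        return f"{config.preamble}\n\n{body}"


# Application bootstrap: identity is declared here and passed in.
# A dict stands in for the filesystem so the sketch runs on its own.
templates = {"prompts/orchestrator.txt": "Answer the question: {question}"}
prompt_registry = PromptRegistry(
    roles={
        "orchestrator": RoleConfig(
            preamble="You are a medical AI assistant.",
            base_prompt_path="prompts/orchestrator.txt",
            slots=("question",),
        )
    },
    load_text=templates.__getitem__,
)
prompt = prompt_registry.assemble("orchestrator", question="What is a seam?")
print(prompt)
```

A second product would construct the same registry with a different RoleConfig and never open the core file.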

The Settings class mixed infrastructure with domain flags. Code throughout core called get_settings() to read those domain flags, which meant a fork couldn't simply override values: the call sites presupposed those keys existed on the settings object. The missing abstraction: split into a CoreSettings (infrastructure only) and a per-app settings class, with domain feature flags moved off settings entirely and onto the policy objects that consume them. That last move, flags onto policies, became the Planning Policy pattern.
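One possible shape for that split, sketched under assumptions: CoreSettings and PlanningPolicy are names from the text, but the fields on CoreSettings, the validate_plan method, the RequiredToolsPolicy class, and the tool names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class CoreSettings:
    """Infrastructure only; nothing here knows the business domain."""
    request_timeout_s: float = 30.0
    max_tool_calls: int = 8


class PlanningPolicy(Protocol):
    def validate_plan(self, tool_names: list[str]) -> list[str]:
        """Return violations; an empty list means the plan is acceptable."""
        ...


@dataclass(frozen=True)
class RequiredToolsPolicy:
    """App-supplied policy. The flag that used to sit on Settings lives here."""
    enforce: bool
    required: frozenset[str]

    def validate_plan(self, tool_names: list[str]) -> list[str]:
        if not self.enforce:
            return []
        missing = sorted(self.required - set(tool_names))
        return [f"plan must include tool: {name}" for name in missing]


def check_plan(
    settings: CoreSettings, policy: PlanningPolicy, tool_names: list[str]
) -> list[str]:
    """Core code: applies infrastructure limits plus whatever policy it was handed."""
    violations = list(policy.validate_plan(tool_names))
    if len(tool_names) > settings.max_tool_calls:
        violations.append("plan exceeds max_tool_calls")
    return violations


policy = RequiredToolsPolicy(enforce=True, required=frozenset({"tool_x", "tool_y"}))
print(check_plan(CoreSettings(), policy, ["tool_x"]))
```

Because the flag travels with the policy object, core never has to presuppose that a domain key exists on a settings object.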

All three were P0 in my audit. The P-level matters because it changes what you do about it. P0 violations block the extraction itself — you can't ship a reusable core without fixing them. P1 violations let the extraction ship, but the second consumer has to fork files instead of extending classes. P2 violations are quality-of-life: ugly but tolerable.

The six-field template isn't novel. The discipline of applying it row by row, with the "concrete failure for reuse" field forced, is what turns vague coupling concerns into a concrete work-item backlog. It's also what makes the velocity conversation tractable. Once each leak has a named missing abstraction, the question "can we ship the next product in half the time?" stops being a hand-wave and becomes a list with line items.

Three observations about code written under velocity pressure

None of what follows is a surprise. The team made every one of these calls on purpose, under the constraints of the moment. The observations below are what I noticed after the fact, once I started looking for patterns. They're the kind of thing every reader's codebase has if it shipped fast enough. Recognizable, not novel. Naming them is what made the catalog possible.

The following observations are about the shape coupling takes under velocity pressure, not about anyone's judgment.

Observation 1 — The framework was almost never the constraint.

The instinct, when an engineer starts an extraction like this, is to blame the underlying framework. LangGraph forced this. MessagesState was the wrong primitive. The blog-post genre practically writes itself.

After spending time reading our own code carefully, I had the opposite reaction. Every primitive LangGraph gave us (StateGraph, MessagesState, subgraphs, conditional edges) was the right shape for what we'd built on top of it. The coupling we'd accumulated wasn't a framework problem. It was a we-didn't-have-time-to-name-the-abstractions problem, layered on top of the fact that the system was only intended to be a one-off pilot.

The reflex is seductive for a specific reason: the framework is the one part of the system you didn't write, which makes it the one part you can blame without implicating your own decisions. But the coupling vectors from the last section — prompts, tool imports, planning policies, state fields — are all things you put there. The framework didn't force them.

Observation 2 — The `core/` directory filled up with the most-coupled code, because the directory name granted permission.

The files under app/core/ were where application-specific concepts had leaked deepest. A tool utils file declared ORCHESTRATOR_NODE_PARAMS — a list of ToolConfigParam instances that named app-specific feature flags, sitting in a file the directory structure called core.

This isn't a code review finding. It's a sociological one. Under velocity pressure, you don't audit each file's location against its content. You drop the file where it imports cleanly, and you move on. The directory name does not stop you, because nothing in the toolchain is checking that the name reflects the contents.

The lesson: directory names are aspirational, not load-bearing. Putting something in core/ does not make it core. It just makes future-you, and the second-product team, trust the lie.

The corollary I'd give my past self: when you create a directory called core/, gate every file that lands there with one question — does this know what business we're in? If yes, it doesn't belong, no matter how convenient the import path. That single question, applied at code-review time, would have prevented most of the leaks the audit turned up.

It would not, however, have meaningfully changed the demo timeline. That's the honest answer to "why didn't we do this from day one": we knew what good looked like. Holding up pull requests for arguably nit-picky details was not an option. We chose to ship instead.

Observation 3 — Implicit registration cost the most to unwind.

The tool registry had a function called _load_contracts() that imported and registered the application's tool contracts directly. Coupled — but at least findable. A senior reader could grep _load_contracts and see the problem inside thirty seconds.

The skill output registry had a quieter version of the same pattern, and it was the one that cost. A function called _register_default_effects() was called eagerly inside SkillOutputRegistry.get_instance(), registering three app-specific output effects the first time anything in the system touched the registry. A fork importing that module would have inherited those defaults silently. No error. No log line. Just three named domain affordances quietly appearing in a registry the new team hadn't asked for.

The substantive cleanup was one deleted line, plus a docstring explaining why the registry now starts empty:

 @classmethod
 def get_instance(cls) -> SkillOutputRegistry:
-    """Get or create singleton instance."""
+    """Get or create singleton instance.
+
+    Effects are registered by the application bootstrap.
+    This method does NOT register any defaults — calling it before
+    bootstrap returns an empty registry, which is the correct
+    state for a freshly-constructed application."""
     if cls._instance is None:
         cls._instance = cls()
-        _register_default_effects(cls._instance)
     return cls._instance

Same architectural shape as _load_contracts, two extension models, one of them invisible. The implicit one had to be unwound first, because every downstream product would inherit defaults its team didn't know existed. The lesson is broader than this one diff: code written under velocity pressure accumulates two kinds of coupling — explicit and implicit. The explicit kind shows up the moment a reviewer reads the file. The implicit kind (eager registration, module-import side effects, defaults set inside singleton constructors) only shows up when the second consumer is already broken.
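For contrast with the eager version, here is a minimal sketch of explicit registration at bootstrap. Only SkillOutputRegistry and get_instance come from the diff; the register, names, and reset methods, the bootstrap function, and the three effect names are assumptions made so the example runs on its own.

```python
from typing import Callable, ClassVar, Optional

Effect = Callable[[dict], dict]


class SkillOutputRegistry:
    _instance: ClassVar[Optional["SkillOutputRegistry"]] = None

    def __init__(self) -> None:
        self._effects: dict[str, Effect] = {}

    @classmethod
    def get_instance(cls) -> "SkillOutputRegistry":
        # No defaults are registered here; a fresh registry is empty.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    @classmethod
    def reset(cls) -> None:
        """Test helper (hypothetical): drop the singleton so bootstrap can re-run."""
        cls._instance = None

    def register(self, name: str, effect: Effect) -> None:
        if name in self._effects:
            raise ValueError(f"effect already registered: {name}")
        self._effects[name] = effect

    def names(self) -> list[str]:
        return sorted(self._effects)


def passthrough(field: str) -> Effect:
    """Build an effect that copies one field from a skill output into the response."""
    return lambda output: {field: output.get(field)}


def bootstrap_clinical_app(registry: SkillOutputRegistry) -> None:
    """Application-owned: the only place domain effects get registered."""
    for field in ("follow_up_questions", "related_articles", "gray_area_analysis"):
        registry.register(field, passthrough(field))


SkillOutputRegistry.reset()
empty_names = SkillOutputRegistry.get_instance().names()
bootstrap_clinical_app(SkillOutputRegistry.get_instance())
registered_names = SkillOutputRegistry.get_instance().names()
print(empty_names, registered_names)
```

A fork that imports the module and never calls the clinical bootstrap gets an empty registry, which is the visible failure mode you want.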

If you take one practical thing from this entire article, take this: when you audit your own codebase for fork-forcing surfaces, start with the implicit ones. The explicit ones will still be there next week, easy to find, easy to fix. The implicit ones bite first.

And once you've unwound an implicit coupling, the discipline doesn't hold itself — so we made the machine hold it. Two tests now stand guard. One walks every module in the platform with an AST parser and fails the build on any import of the application package, anywhere — including the kind nested inside a function body that a grep would miss. The other re-runs the application's bootstrap from a clean slate and asserts that every seam the platform reserves is actually filled, because we'd been bitten twice by registering a new seam at some of its sites but not all, and getting a silent fallback instead of an error. These aren't features; they're fitness functions — executable statements of an architectural rule, run on every commit. The implicit coupling bit first, and it bit twice. The point of the tooling is that it can't bite a third time.
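The first of those tests can be approximated with the standard library alone. This is a sketch of the idea, not the test we run: the function names, the package name clinical_app, and the sample source are hypothetical, and relative imports are skipped rather than resolved.

```python
import ast
from pathlib import Path


def forbidden_imports(source: str, forbidden_root: str) -> list[tuple[int, str]]:
    """Return (line, module) for every absolute import of forbidden_root, at any depth."""
    hits = []
    # ast.walk visits nested nodes too, so imports inside function bodies are found.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules = [node.module]
        else:
            continue  # not an import, or a relative import this sketch does not resolve
        for module in modules:
            if module == forbidden_root or module.startswith(forbidden_root + "."):
                hits.append((node.lineno, module))
    return sorted(hits)


def scan_tree(root: Path, forbidden_root: str) -> dict[str, list[tuple[int, str]]]:
    """Apply forbidden_imports to every .py file under root; empty dict means clean."""
    report = {}
    for path in sorted(root.rglob("*.py")):
        hits = forbidden_imports(path.read_text(encoding="utf-8"), forbidden_root)
        if hits:
            report[str(path)] = hits
    return report


SAMPLE = '''
import json

def build_tools():
    from clinical_app.tools import retrieval
    return [retrieval]
'''

violations = forbidden_imports(SAMPLE, "clinical_app")
print(violations)
```

In a test suite, the assertion would be that scan_tree over the platform package returns an empty dict, so any new import of the application package fails the build.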

The patterns that came out of it

Once you have the inventory, you don't tackle it all at once — and at the velocity we were operating at, you can't. What I found is that most of the coupling collapsed onto a small number of seams: places where application choices and platform machinery needed to be cleanly separated. Each seam, once named, has a shape that lets the next product on the roadmap snap into the same core. The rest of this catalog is one article per seam — what the pattern is, what it solves, how it looks in code, and where it doesn't work.

Eight patterns. None of them is novel in the sense of being new computer science — they're all recognizable to anyone who's worked on plugin architectures or dependency injection. What's interesting is what combination of them turns an agentic system from "one product's codebase" into "a platform that can host more than one product." The first six govern how application content flows into the graph and the response; the last two turn around and govern the read side — how those same fields come back out through history and the live stream. Here's what's coming.

1. The Graph Blueprint Pattern — Topology as Data. The graph builder stops importing subgraphs by name. Instead, the application supplies a GraphBlueprint (a declarative list of nodes and edges) and a compose_graph function materializes it into a StateGraph without importing a single domain module. The blueprint is the spine the rest of the patterns plug into.

2. The Tool Contract Pattern — One Declaration Point per Tool. A tool's entire interface with the orchestrator — schema, config params, invariants, state effects — collapses onto a single ToolContract record. The orchestrator stops importing tools by name; the registry becomes the dispatcher. Adding a tool means defining one piece of data.

3. The Planning Policy Pattern — Application Behavior Without Core Pollution. The "medical questions must include both X and Y" rule moves out of the planner's prompt string and onto a PlanningPolicy protocol implementation the application supplies at startup. Contracts are about tools; policies are about planners.

4. The LLM Profile Pattern — Centralized Model Configuration That Scales. Every node in a large agent graph eventually wants a different model. The pattern: name the model+params combinations as LlmProfile records the application registers at bootstrap, then assign profiles to node roles via an allowlist policy. Node code asks for a profile by role, not by model name. The deep treatment covers the seam itself: how a profile registry, role policies, and a runtime adapter let the same node code resolve a different model per deployment.

5. The Skill Loader + Skill Output Effect Patterns. Skills get loaded dynamically with conflict resolution and per-deployment config; their outputs get assembled into responses via a declarative SkillOutputRegistry instead of hand-wired field plumbing. Two patterns that travel together — what the planner can decide to use, and how the result flows back into the response.

6. The Tiered Config Cascade + Prompt Slot Injection Patterns. Two foundational utilities that several other patterns rely on: a three-tier config resolution order (configurable → state_config → declared default) with normalizers, and a prompt-template-variable injection mechanism that lets tool outputs flow back into the next LLM call without anyone wiring them up explicitly. The smallest patterns in the catalog and the most-used.

7. The Registry-Driven Field Projection Pattern. The first six patterns are all write-side. This one turns around: the same SkillOutputRegistry that decides what a turn writes into its response also drives what every read path projects — conversation history reconstruction and the live SSE stream both ask the registry which fields are active instead of keeping their own hand-written list. One write model, several read models, and no read model ever enumerating a field the write model didn't hand it. It's read/write skew solved by construction — and the reason a domain field can no longer be silently dropped from history.

8. The Parse-Boundary Ownership Patterns. A pair that governs who owns the field names and meanings at the boundary where the framework reads the model's structured output: an allow-mode carrier with an app-supplied mapper (the framework owns the wire grammar, the application owns the vocabulary) and its mirror, a forbid-mode schema provider (the field set is the contract the LLM must fill exactly). Two sides of one question — taught together because neither half conveys the distinction alone.
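As a taste of the smallest pattern in the list, here is a sketch of the three-tier resolution order from pattern 6 (configurable, then state_config, then declared default). The function name resolve, the choice to treat None as unset, and the to_bool normalizer are assumptions, not the catalog's actual implementation.

```python
from typing import Any, Callable, Mapping

_MISSING = object()


def resolve(
    key: str,
    *,
    configurable: Mapping[str, Any],
    state_config: Mapping[str, Any],
    default: Any,
    normalize: Callable[[Any], Any] = lambda value: value,
) -> Any:
    """Return the first set value in tier order, passed through the normalizer."""
    for tier in (configurable, state_config):
        value = tier.get(key, _MISSING)
        # Assumption: an explicit None means "not set here, keep looking".
        if value is not _MISSING and value is not None:
            return normalize(value)
    return normalize(default)


def to_bool(value: Any) -> bool:
    """Normalize common string spellings of a flag into a real bool."""
    if isinstance(value, str):
        return value.strip().lower() in {"1", "true", "yes", "on"}
    return bool(value)


from_request = resolve(
    "include_related",
    configurable={"include_related": "true"},
    state_config={"include_related": False},
    default=False,
    normalize=to_bool,
)
from_state = resolve(
    "include_related",
    configurable={},
    state_config={"include_related": "no"},
    default=True,
    normalize=to_bool,
)
from_default = resolve(
    "include_related",
    configurable={"include_related": None},
    state_config={},
    default="on",
    normalize=to_bool,
)
print(from_request, from_state, from_default)
```

Running the default through the same normalizer keeps every tier on one type, which is what lets call sites stop caring where a value came from.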

The honest close: name the seams, build behind them on demand

The bar for platform work isn't "design for every future consumer." It's "design for the consumers you have, with seams named for the ones you don't yet, so the next product on the roadmap can extend without forcing a fork." Naming the seams is what made the rest of the portfolio survivable. Building the cleanup behind each seam, in priority order as the next vertical demanded it, is what turned the names into a working platform.

The rest of this catalog goes pattern by pattern. Pattern #1 — the Graph Blueprint — comes next. It's the spine. Every other pattern plugs into it.