Published on

The Tiered Config Cascade and Prompt Slot Injection Patterns: Two Foundational Utilities the Rest of the Catalog Stands On

Authors

Pattern #6 in the HUGO ADK Catalog, and the last of the write-side patterns. See the Skill Loader article for what this builds on, and the introduction for the catalog framing.


This is the article about the bricks. Patterns #1 through #5 were buildings: the Graph Blueprint, the Tool Contract, the Planning Policy, the LLM Profile, the Skill Loader. Each one is a coherent abstraction you can hold in your head and point at. This one is about the two pieces of clay everything else got shaped from.

I pushed this toward the end of the catalog on purpose. If I'd opened the catalog with them, you'd have had no idea why they mattered. They're not, individually, exciting. But you've now seen five patterns that quietly lean on them. Every time a tool "resolved its config params," something resolved those params. Every time a policy's output "appeared in the next system prompt," something put it there. Those two somethings are the subject of this article. You've actually met them five times already without being introduced. This is the introduction.


The "hand-wavey" parts

Go back and reread the earlier articles with a skeptical eye and you'll find two places I quickly moved through that are relevant here.

The first is config resolution. The Tool Contract article said a tool "declares the config keys it needs and receives them resolved." The Planning Policy article said the policy's flags (web_search_policy, sot_enabled) were "resolved per invocation." The LLM Profile article had a node reading resolved_config to decide whether to pin a web-search model. In all three I said resolved like it was a verb that explained itself. It isn't. Something has to decide, for web_search_policy, whether the value comes from this run's configurable dict, or from per-thread state, or from the policy's declared default. And it has to decide the same way every time, or you get the bug where a value is uppercase in one node and lowercase in another and a comparison silently fails.

The second is prompt assembly. The Skill article ended by promising "the prompt slot injection that lets a policy's output flow into the next LLM call." Several patterns produce something (retrieved evidence, a hyperlink result, a loaded-skills manifest) that has to show up in a later system prompt. I kept describing that as if the value teleported. It doesn't. Something reads the value out of state, serializes it, and drops it into the prompt at the right spot, without the tool that produced it and the prompt that consumes it ever knowing about each other.

Two utilities. Both small enough that they don't earn standalone buildings. Both load-bearing enough that five buildings rest on them. Here they are.


Pattern A: The Tiered Config Cascade

Name: Tiered Config Cascade.

Tagline: Resolve every runtime config value through one predictable lookup order, configurable → state → declared default, with normalizers applied at resolution time, so every consumer sees the same value in the same shape.

Intent: Make per-invocation configuration predictable. Anywhere config gets read (a tool's params, a policy's flags, a node's behavior toggles) should follow the same resolution order, with the same fallback, normalized at the same moment. The goal is to delete a whole category of question from your codebase: "where did this value come from, and what shape is it in?" The answer should always be the same: "the cascade resolved it, and it's normalized."

Structure

Two small dataclasses are the entire abstraction.

@dataclass
class ToolConfigParam:
    """Declares a config key a tool needs, resolved via 2-tier cascade.

    Resolution order: configurable -> state_config -> declared default.
    If a normalizer is provided, it is applied to the resolved value
    (including the default) before storing.
    """

    key: str
    default: Any = None
    normalizer: Callable[[Any], Any] | None = None


@dataclass
class ResolvedToolConfig:
    """Resolved config values for a specific tool invocation."""

    _values: dict[str, Any] = field(default_factory=dict)

    def get(self, key: str, default: Any = None) -> Any:
        return self._values.get(key, default)

    def __getitem__(self, key: str) -> Any:
        return self._values[key]

    @classmethod
    def resolve(
        cls,
        params: list[ToolConfigParam],
        configurable: dict | None = None,
        state_config: dict | None = None,
    ) -> ResolvedToolConfig:
        ...

ToolConfigParam is the declaration: a key, a default, and an optional normalizer. ResolvedToolConfig is the snapshot: the resolved values for one invocation, read-only, handed to whoever needs them.

The resolution algorithm fits in a paragraph, and the code is barely longer than the paragraph:

@classmethod
def resolve(cls, params, configurable=None, state_config=None):
    configurable = configurable or {}
    state_config = state_config or {}
    values: dict[str, Any] = {}

    for param in params:
        val = configurable.get(param.key)
        if val is not None:
            values[param.key] = val
            continue
        val = state_config.get(param.key)
        if val is not None:
            values[param.key] = val
            continue
        values[param.key] = param.default

    # Normalizers run after resolution, on whatever tier won (default included).
    for param in params:
        if param.normalizer is not None and param.key in values:
            values[param.key] = param.normalizer(values[param.key])

    return cls(_values=values)

For each declared param: try configurable[key]; if that's None, try state_config[key]; if that's None too, fall back to the declared default. Then, in a second pass after every value has landed, apply each param's normalizer. Return an immutable snapshot.

Three things in there are deliberate, and each one is a decision I'd defend.

The cascade is named. Compare the alternative, which is what every codebase does first: each consumer hand-rolls value = config.get("x") or state.get("x") or DEFAULT_X inline. That's the same algorithm copy-pasted across twenty call sites, and every copy is one chance to forget a tier, flip the order, or skip the normalizer. Naming the cascade and resolving through one function means every consumer gets identical behavior for free. The bug where two nodes disagree about precedence can't exist, because there's only one precedence.

Normalizers run at resolution time, not at use time. This is a small decision with an outsized payoff. A caller can trust that resolved["web_search_policy"] is already lowercase, because the param declared normalizer=str.lower and the cascade applied it before anyone read the value. No call site has to remember "oh, this one I should lowercase before comparing." The shape of the value is part of the declaration, not a chore distributed to every consumer. The is not None check at each tier (falling through on None, never on falsy) is the other quiet correctness decision: a legitimate False or 0 or "" survives the cascade instead of getting skipped as if it were absent.

The cascade is two tiers, and it used to be three. An earlier version had a third tier: if neither configurable nor state had a value, fall back to get_settings().key, the global settings object. That tier was convenient and it was a trap. It let domain feature flags leak globally: web_search_policy could be set in the app's settings and silently picked up by the platform core, which is exactly the kind of coupling this catalog spends eight articles trying to delete. Removing the global-settings tier forced those flags somewhere better: onto the Planning Policy's constructor, where the LLM Profile and Planning Policy articles found them. The cascade got simpler because the other patterns absorbed what used to leak through it. That's not a coincidence; that's the catalog working.

Deleting the tier was the headline; the cleanup it forced is the detail worth keeping. With nothing falling back to global settings anymore, every domain default that had accreted into the platform's settings object had to be evicted too. The core was carrying a domain-flavored auth issuer and a domain-specific cache namespace as its own baked-in defaults, which is the same leak wearing a different hat. The fix moved those values down onto the application's settings and left the platform's defaults neutral. One of them, the cache namespace, lives on a nested settings model, and nested-model defaults can't be overridden by a plain subclass attribute. So the app's settings restore it in a post-init guard, and only if it's still neutral:

def model_post_init(self, __context) -> None:
    # The hugo platform leaves this neutral ("hugo"). Stamp the domain
    # value only if nothing upstream already set it; an explicit
    # override (env var, config) must win over our restore.
    if self.redis.namespace == "hugo":
        self.redis.namespace = "ai_answer_engine"

The shape of that guard is the whole lesson in miniature: the platform default is neutral, the application restores its own value, and the restore defers to any explicit override by checking that the value is still the neutral one before touching it. The cascade and the settings split are the same move at two scales: the platform reserves a neutral slot, the application fills it, and neither leaks into the other.

A real example

The cleanest instance lives in the medical engine's planning policy. Here's how it declares the params the cascade will resolve:

@dataclass
class MedicalDecompositionPolicy:
    sot_enabled: bool = False
    web_search_policy: str = "orchestrator"
    web_search_query_mode: str = "all_subqueries"
    benchmark_prompt_prefix: str = ""

    def config_params(self) -> list[ToolConfigParam]:
        return [
            ToolConfigParam(
                "web_search_policy",
                default=self.web_search_policy,
                normalizer=_normalize_str_lower,   # → "orchestrator", not "Orchestrator"
            ),
            ToolConfigParam(
                "web_search_query_mode",
                default=self.web_search_query_mode,
                normalizer=_normalize_str_lower,
            ),
            ToolConfigParam("sot_enabled", default=self.sot_enabled),
            # ...
        ]

Look at where the defaults come from: default=self.web_search_policy. The policy's constructor argument is the cascade's bottom tier. That's the design move from the Planning Policy article paying off here: the feature flags that used to live on global settings are now constructor args on a per-app policy, and the cascade reads them as the default tier. No global lookup, no leak.

And on the consuming side, a node resolves the whole set in one call through a thin helper:

def resolve_node_config(state, config, params):
    configurable = (config.get("configurable") or {}) if config else {}
    state_config = (state.get("config") or {}) if state else {}
    return ResolvedToolConfig.resolve(
        params, configurable=configurable, state_config=state_config
    )
cfg = resolve_node_config(state, config, policy.config_params())
if cfg["web_search_policy"] == "orchestrator":   # guaranteed lowercase
    ...

The node body never touches configurable or state directly. It asks the cascade for the resolved value and reads it, confident the value is present (default guarantees that) and normalized (the normalizer guarantees that). One declaration on the policy, one resolution at the node, no precedence logic anywhere in between.

Consequences

  • Every config lookup in the system follows one resolution order. "Which tier wins?" has exactly one answer, everywhere.
  • Normalizers delete the "sometimes this is uppercase" class of bug. The value's shape is declared once, applied once, trusted everywhere.
  • Adding a config param is a single declaration: one ToolConfigParam in a list, and every consumer of that list gets it.
  • Removing the global-settings tier forced upstream design improvements. The cascade can stay two tiers because feature flags moved onto policy constructors. The simplicity is downstream of the other patterns.

When not to use it

For a simple agent with a single config source (just a LangGraph configurable and nothing per-thread, for example), this is overkill. The cascade earns its keep the moment you have two legitimate sources of per-invocation config (the run's configurable and per-thread state) and you want consumers to be agnostic about which one a given value came from. That's most agents with memory. It arrives earlier than you'd expect.


Pattern B: Prompt Slot Injection

Name: Prompt Slot Injection.

Tagline: The declaration that a tool writes a value to state and the declaration that the value appears in a prompt are the same record. One field, prompt_template_var, sits on the tool's state effect so the producer and the prompt can never drift apart.

Intent: Let me concede the mechanism first, because it's not the point. "Tool output shows up in a later prompt via a named template variable" is something every agent framework does. LangGraph threads it through a node that reads state and formats a string. Semantic Kernel has {{$var}} pulled from kernel context. LlamaIndex fills a PromptTemplate. None of that is novel and I'm not claiming it is.

The part worth naming is where the slot is declared. In those frameworks, two facts live in two places: the node that writes the tool's result to state, and the node or template that reads it back into a prompt somewhere else. Two declarations, and nothing binds them. Rename the state key and the prompt still references the old slot; the drift compiles fine and fails silently. This pattern fuses the two facts into one record. The tool's contract says "I write state[key]" and "that value's prompt home is {{VAR}}" in the same ToolStateEffect. There is no second place to update, because there is no second declaration — and so the tool still never learns how prompts are built, and the prompt author still never learns which tool fills the slot.

Structure

The mechanism rides on ToolStateEffect, the same type the Tool Contract article introduced for declaring what a tool writes to state. There's one field on it we didn't dwell on then:

@dataclass
class ToolStateEffect:
    key: str
    type_hint: type = dict
    merge: str = "replace"  # "replace" | "append" | "merge_dict"
    preserve_across_turns: bool = False
    prompt_template_var: str | None = None   # ← this one

Set prompt_template_var, and you've declared a slot. The contract is now saying two things at once: "this tool writes to state[key]" and "that value should appear in system prompts at {{prompt_template_var}}." One declaration, two consequences.

Here's the test for whether that co-location is doing real work or is just tidy packaging: take prompt_template_var off the effect and put it in a separate prompt-wiring config instead. The mechanism still runs — the loop below still substitutes exactly the same way. But now you have the two-places problem back: a state effect over here, a slot registration over there, free to disagree the moment someone renames the key. The co-location is the pattern; the substitution loop is just plumbing. That's why the field lives on ToolStateEffect and not in a prompt-config table of its own.

The injection itself is a single generic loop, living on the registry, run once after a node builds its system prompt:

def apply_prompt_injections(self, prompt: str, state: dict) -> str:
    """Inject tool state values into prompt template variables."""
    import json as _json

    for effect in self.get_all_state_effects():
        if effect.prompt_template_var is None:
            continue
        val = state.get(effect.key, effect.empty_default())
        serialized = _json.dumps(val or effect.empty_default(), ensure_ascii=True)
        prompt = prompt.replace("{{" + effect.prompt_template_var + "}}", serialized)
    return prompt

Read what it does and, more importantly, what it doesn't do. It walks every registered state effect. It skips the ones with no prompt_template_var, since those are state writes that aren't meant for prompts. For the rest, it reads the current value from state, JSON-serializes it, and string-replaces {{VAR}} in the prompt. That's the whole mechanism. It knows nothing about any specific tool and contains no conditional branching logic. The empty-default fallback is the quiet robustness move: a slot whose tool didn't run this turn resolves to {} or [], not a stray {{VAR}} left sitting in the prompt for the model to puzzle over.

The flow, end to end:

  1. The hyperlink-retrieval tool runs and writes its result into state["hyperlink_retrieval_result"]. Its contract declares that write and the slot it feeds.
  2. The next node builds its system prompt from a template that contains {{HYPERLINK_RETRIEVAL_RESULT}} somewhere.
  3. The node calls apply_prompt_injections(prompt, state) once. The loop finds the effect, reads the value, serializes it, substitutes it.
  4. The model sees the retrieval result inline in its prompt. Nobody wired the producer to the consumer.

A real example

This is a real contract from the medical engine: hyperlink_retrieval, lightly trimmed:

HYPERLINK_RETRIEVAL_CONTRACT = ToolContract(
    name="hyperlink_retrieval",
    description=(
        "Retrieve hyperlinks from cached embedding "
        "indexes. Maps a free-text medical query to relevant reference pages."
    ),
    state_reads=[
        ToolStateRead("enhanced_query_text"),
        ToolStateRead("skills_loaded"),
    ],
    state_effects=[
        ToolStateEffect(
            key="hyperlink_retrieval_result",
            type_hint=dict,
            merge="replace",
            prompt_template_var="HYPERLINK_RETRIEVAL_RESULT",   # ← the slot
        ),
        ToolStateEffect(
            key="skills_loaded",
            type_hint=list,
            merge="append",
        ),
    ],
)

Two state effects, and only one of them feeds a prompt. The first writes the retrieval result and declares prompt_template_var="HYPERLINK_RETRIEVAL_RESULT", so a downstream prompt template that mentions {{HYPERLINK_RETRIEVAL_RESULT}} gets the result inlined automatically. The second effect, skills_loaded, has no prompt_template_var at all; it's a state write that other machinery reads, not something destined for a prompt. The same type expresses both; the presence or absence of one field is the entire difference between "this goes in prompts" and "this stays in state."

The thing I want you to notice is what the tool author didn't write. There's no reference to which prompt, which node, or which template. The tool declares a destination name and stops. Whether any template actually references {{HYPERLINK_RETRIEVAL_RESULT}} is the prompt author's concern, made independently. If a template references it, the value lands. If no template references it, the value sits unused in state, harmless. The two authors never have to meet, and the contract is the only contract between them.

Consequences

  • Tool outputs flow into the next prompt without explicit wiring. Declare the slot on the contract; the injection follows.
  • Producer and consumer stay decoupled. Tool authors don't think about prompt construction; prompt authors don't think about which tool fills a slot.
  • Prompt construction becomes data-driven. The prompt is the template, the values are the registered state effects, the wiring is the registry's generic loop. Templates can be edited without touching tool code, and tools can be added without touching templates.
  • An unfilled slot degrades gracefully. A tool that didn't run leaves its slot at an empty default, never a raw {{VAR}}.
  • The slot is also a trust boundary. This is the consequence the ergonomics hide. The loop does prompt.replace("{{VAR}}", json.dumps(value)), and some of those values are tool outputs derived from retrieved web content, like a hyperlink-retrieval result or a search snippet. The moment untrusted text flows into the next model's system prompt, slot injection stops being purely a convenience and becomes an indirect prompt-injection surface: a retrieved document carrying instruction-shaped text, or a stray }}, is now sitting inside the assembled prompt. The mechanism happens to dull the sharpest edge. The value is json.dumps'd, so it lands as a quoted JSON string rather than as bare prose the model reads as its own instructions, and it's substituted into a known template region rather than concatenated freeform. That's not a guarantee, it's a mitigation. The rule worth stating out loud is that a slot-injected value is data, not instruction, and the serialization is part of why. Treat anything that originated outside your system as content to be quoted, never as prompt to be obeyed, and keep the model's actual instructions in the static region of the template the cascade and the slots never write to.

When not to use it

For agents where state is small and a single one-shot prompt does the whole job, this is more mechanism than the problem needs; interpolate the value directly. Slot injection becomes worth it when state accumulates across nodes and multiple LLM calls in one turn need access to the same accumulated values. The moment you have two prompts that both want the same tool's output, you either build this or you wire it by hand twice and forget to update one of them.


How the two fit together

The cascade resolves config values that shape prompts. The slot injection populates state values that appear in prompts. Different sources, same destination: both end up as text in the system prompt that goes to the model on the next call.

The orchestrator's planning prompt has four distinct kinds of value living in one template, each arriving by a different road:

  • Static role content: "You are an agentic planner. Output a plan as JSON." Hardcoded in the template. It doesn't come from anywhere; it is the template.
  • Cascade-resolved config: {{web_search_policy}}, {{sot_enabled}}. These come from the Tiered Config Cascade, normalized to lowercase strings before the template ever sees them.
  • Slot-injected tool output: {{HYPERLINK_RETRIEVAL_RESULT}}. This comes from state, written by a tool, routed in by Prompt Slot Injection via prompt_template_var.
  • Application policy mandate: the Planning Policy's planning_mandate(), which returns the "medical questions MUST include both a retrieval_llm step and a hyperlink_retrieval step" sentence. Supplied by the application, not the core.

Four sources. One prompt. None of them know about each other, and all of them obey the same protocol: produce a value into a known location, and let the assembler substitute it. The static content sits in the template. The cascade produces normalized config. The slot injection routes state. The policy supplies its mandate. The prompt builder substitutes all of them without knowing which is which.

That's the meta-pattern, and it's the same one every article in this catalog has been circling from a different angle: prompt construction is itself an instance of declarative-records-assembled-at-runtime. The prompt isn't built by code that knows every value it needs. It's assembled from independent declarations (a template, a config param, a state effect, a policy method) that were registered separately and don't reference each other. Same shape as the blueprint assembling topology, the registry assembling tools, the response node assembling its envelope. One more layer, same move.


The honest close: not exotic, and that's the point

Neither of these is exotic, and I won't pretend otherwise. The cascade is "named precedence with normalizers." The slot injection is "template variables resolved from state." Both have decades of prior art well outside agentic systems. Every web framework has a config-precedence story; every templating engine has variable substitution. There's nothing here you couldn't have invented over a lunch break.

What's worth naming isn't the cleverness of either utility. It's how much of the catalog rests on them. The Tool Contract pattern doesn't resolve its params without the cascade. The Planning Policy pattern doesn't get its flags to a node without the cascade. The orchestrator's prompt doesn't carry a tool's output without slot injection. These two boring utilities are the reason the five interesting patterns stay small: without the cascade, every pattern would carry its own ad-hoc config resolution; without slot injection, prompt assembly would be a tangle of hand-threaded values. The catalog reads as clean because two of its members do the unglamorous work the other five would otherwise each reinvent.

That's the lesson I'd leave you with about foundational patterns: they're the boring ones, and they're the multipliers. The exciting abstraction usually gets the article. The utility it quietly stands on usually doesn't, and that's exactly the one worth finding in your own system, because naming it once removes the same five lines of ad-hoc resolution from twenty places at once.

That closes the write-side patterns. Six of them: Graph Blueprint, Tool Contract, Planning Policy, LLM Profile, Skill Loader, and now the Cascade and the Slot Injection. And one idea underneath all of them: the application declares, the platform consumes, and everything an application supplies is a named record registered at bootstrap rather than a fact the core had to import. If there's a single sentence to carry out of these articles, that's it.

Everything so far has been write-side: how application content flows into the graph and the response. The catalog turns around next. The two remaining patterns govern the read side, where those same fields come back out. Registry-Driven Field Projection makes one registry the single authority for which fields a turn writes and which fields every read path projects, so history reconstruction and the live stream can never drift from the response. Parse-Boundary Ownership governs who owns the field names and meanings at the seam where the framework reads, or constrains, the model's structured output. Both are about the same fields you've watched flow inward through these six patterns, observed on their way back out.

The door stays open beyond those, too. There are patterns I know are real and haven't written: intent routing expressed as data instead of branches, the run-executor-and-event-broker seam that shows up when an agent system has to scale horizontally, the identity primitive for threading anonymous-versus-authenticated state through tool calls. They may show up as the platform grows and the next agentic products put pressure on parts of the framework that the current ones don't.

Use them. Push back on them. Find the one I got wrong and tell me. And if you build the next one before I do, I'd like to read about it.