The LLM Profile Pattern: Centralized Model Configuration That Scales

Pattern #4 in the Agentic Platform Patterns catalog. See the Planning Policy article for what this builds on, and the introduction for the catalog framing.

Individual nodes in an agent system need the ability to declare custom model configurations. The query normalization node wants something small and fast. The intent classifier wants something smaller still, pinned to temperature=0 so its answers don't wobble. The orchestrator wants the strongest model you can afford, because it does the planning, and planning is where everything else lives or dies. The retrieval consolidator asks for high reasoning effort, since synthesis benefits from thinking. The post-generation pass needs yet another model, set to low verbosity, because its job is to polish rather than invent.

None of those wants is wrong. Each node really does have a different job, and different jobs really do want different models. The problem isn't that the wants exist. The problem is where they end up living.

Every node wants a different model

In a small agent (three or four nodes), every node uses the same model and you never think about it. You set a model name in one place and move on. That arrangement lasts only as long as the agent stays a proof of concept. Then the wants start arriving, one node at a time, and you reach for the first solution that comes to hand. There are three of them, and I've watched teams pick each one.

Hardcode the model name in each node file. This looks clean for a week. Then you decide to move the orchestrator to a stronger model. You open a pull request that touches eight files, and the reviewer asks the question that should have warned you the first time: "why are all of these changing together?" Search-and-replace across your codebase is not a configuration strategy. It's the absence of one.

Build one giant model-selector helper. A single file takes a node name and returns a model plus a params dict. This is genuinely better than the first option, because the changes are local now, all in one place. But the file grows into a Christmas tree of if/elif/else, and adding a new node means reading the whole tree to find where your branch belongs, and the reviewer's question mutates into "why is this branch in the orchestrator section and not the retrieval section?" You've centralized the sprawl without organizing it.

Pass the model in through graph state. This one doesn't actually work, and the reason is instructive. A model is a runtime resource, not a state value. Threading it through graph state means every node has to know about every other node's model selection, which couples nodes that should never have met.

Each of these fails the way the planning rules failed in the last article. The configuration ends up owned by the wrong thing, scattered across the wrong places, with no single point where you can answer "what model does this node use, and why?" So we asked what model selection would look like if it were its own layer: owned by no single node, configurable per deployment, reviewable in one file.

Three things wearing one coat

Before the code, there's a conceptual move to make. The whole pattern falls out of noticing that "LLM configuration" is three different things we'd been collapsing into one.

The first is what a vetted configuration is. A model name, a params dict, and a few capability flags: does this model support tool binding? Native web search? Structured output? Streaming? Bundle those together, name the bundle, freeze it, and you have a profile. retrieval_default is "the fast Cerebras model with empty params." intent_classification_default is "the nano model pinned to temperature=0." A profile is a noun. It describes a thing that exists, vetted and named, ready to be reused.

The second is which profiles a node may use. The model_retrieval role's default is retrieval_default, and it's also allowed to use retrieval_web_search when a step needs web search. Nothing else. That allowlist-plus-default is the role policy, a different kind of statement from a profile. A profile says "here is a configuration that exists." A policy says "here is what this role is permitted to reach for." It's a fence, not a thing.

The third is per-deployment difference. Production runs retrieval_default on one provider; a staging load test wants to point that same profile at a cheaper model without anyone touching the registry. That's what overrides are for: a JSON patch loaded from an environment variable at startup, applied to the registry's values before the runtime ever reads them.

Most teams collapse these three into one configuration concept and pay for it forever, because the three have different shapes, different owners, and different rates of change. Profiles change when you vet a new model. Role policies change when a node's job changes. Overrides change per deployment and never touch code at all. Keep them separate and each one stays small. When a reviewer asks "why is this changing?", the answer is one block: because one profile in one registry function changed, and here it is.

The pattern

Name: LLM Profile-Per-Node.

Tagline: Centralized, role-scoped, per-deployment-overridable model configuration for a multi-node agent, where adding a model variant is one line and swapping a model for a role is one diff.

Intent: Make LLM configuration scale with the number of nodes in your graph instead of against it. Adding a new model variant should be one entry in a registry. Swapping a model for one role should be a single-line change a reviewer can take in at a glance. Per-deployment behavior should ship in an environment variable, not a code edit.

Structure

Two small frozen Pydantic models are the entire abstraction.

from typing import Any

from pydantic import BaseModel, ConfigDict, Field


class LlmProfile(BaseModel):
    """Immutable profile describing a vetted model configuration."""

    model_config = ConfigDict(frozen=True)

    profile_id: str
    model: str
    params: dict[str, Any] = Field(default_factory=dict)
    supports_tools: bool = True
    supports_web_search: bool = False
    supports_structured_output: bool = True
    supports_streaming: bool = True


class LlmRolePolicy(BaseModel):
    """Allowlist of profiles the runtime may use for a given role."""

    model_config = ConfigDict(frozen=True)

    default_profile_id: str
    allowed_profile_ids: tuple[str, ...]

LlmProfile is frozen=True on purpose. Once a profile is in the registry, nothing mutates it at runtime. The only way its contents change is through the override mechanism, which runs once at startup and produces a new frozen instance. The four capability flags are the quiet load-bearing part. They let the runtime reason about a profile's affordances ("can this one bind tools? can it do native web search?") without parsing model-name strings or keeping a separate capability lookup table that drifts out of sync. A flag on the profile is a fact that travels with the profile.
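
You can check directly what frozen=True buys and where it stops. The following is a minimal sketch that assumes Pydantic v2, and the profile values are illustrative. Reassigning a field raises a ValidationError. The params dict, however, is still an ordinary mutable dict. So "nothing mutates it at runtime" holds only as long as code treats params as read-only.

```python
from typing import Any

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class LlmProfile(BaseModel):
    """Immutable profile describing a vetted model configuration."""

    model_config = ConfigDict(frozen=True)

    profile_id: str
    model: str
    params: dict[str, Any] = Field(default_factory=dict)
    supports_tools: bool = True
    supports_web_search: bool = False
    supports_structured_output: bool = True
    supports_streaming: bool = True


profile = LlmProfile(
    profile_id="intent_classification_default",
    model="gpt-nano",
    params={"temperature": 0},
)

# Field reassignment on a frozen model is rejected at runtime.
try:
    profile.model = "gpt-large"
    reassignment_blocked = False
except ValidationError:
    reassignment_blocked = True

# frozen=True does not deep-freeze containers: the params dict can still change.
scratch = LlmProfile(profile_id="scratch", model="gpt-nano", params={"temperature": 0})
scratch.params["temperature"] = 1
params_still_mutable = scratch.params["temperature"] == 1
```

One way to close that gap is to copy params at the point where they're handed to a client, for example with dict(profile.params). That keeps downstream code from editing the shared, cached profile.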

LlmRolePolicy is the safety layer. Each role declares one default profile and a tuple of profiles it's allowed to use. When the runtime resolves a model for a role, it must land on a profile inside that allowlist. Anything outside it is an error. The runtime raises it loudly and never papers over it with a silent fallback.
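
As written, the policy trusts its author to put the default inside the allowlist. Below is a minimal sketch of closing that gap at construction time, assuming Pydantic v2. The validator is a hypothetical addition, not part of the model shown above. With it, a policy whose default sits outside its own fence fails the moment it's built.

```python
from pydantic import BaseModel, ConfigDict, ValidationError, model_validator


class LlmRolePolicy(BaseModel):
    """Allowlist of profiles the runtime may use for a given role."""

    model_config = ConfigDict(frozen=True)

    default_profile_id: str
    allowed_profile_ids: tuple[str, ...]

    @model_validator(mode="after")
    def check_default_is_allowed(self) -> "LlmRolePolicy":
        # A default outside the allowlist would make every unpinned call fail later.
        if self.default_profile_id not in self.allowed_profile_ids:
            raise ValueError(
                f"default_profile_id {self.default_profile_id!r} "
                "is not in allowed_profile_ids"
            )
        return self


retrieval = LlmRolePolicy(
    default_profile_id="retrieval_default",
    allowed_profile_ids=("retrieval_default", "retrieval_web_search"),
)

# A misconfigured policy is rejected when it is constructed, not when it is used.
try:
    LlmRolePolicy(
        default_profile_id="retrieval_web_search",
        allowed_profile_ids=("retrieval_default",),
    )
    misconfigured_rejected = False
except ValidationError:
    misconfigured_rejected = True
```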

The registry

Profiles are built in one function, and the shape of that function is the point:

def build_llm_profile_registry(settings: Any) -> dict[str, LlmProfile]:
    base_profile = LlmProfile(profile_id="default", model=settings.llm_model, params={})

    registry: dict[str, LlmProfile] = {
        "default": base_profile,
        "orchestrator_default": base_profile.model_copy(
            update={"profile_id": "orchestrator_default"}
        ),
        "orchestrator_reasoning": base_profile.model_copy(
            update={
                "profile_id": "orchestrator_reasoning",
                "params": {**base_profile.params, "reasoning_effort": "medium"},
            }
        ),
        "retrieval_default": base_profile.model_copy(
            update={
                "profile_id": "retrieval_default",
                "model": "gpt-fast-routing",
                "params": {},
            }
        ),
        "retrieval_web_search": base_profile.model_copy(
            update={
                "profile_id": "retrieval_web_search",
                "model": "gpt-mini-web",
                "params": {},
                "supports_web_search": True,
            }
        ),
        "retrieval_consolidator_default": base_profile.model_copy(
            update={
                "profile_id": "retrieval_consolidator_default",
                "model": "gpt-fast-routing",
                "params": {"reasoning_effort": "high"},
            }
        ),
        "post_generation_default": base_profile.model_copy(
            update={
                "profile_id": "post_generation_default",
                "model": "gpt-mini-web",
                "params": {"reasoning_effort": "low", "verbosity": "low"},
            }
        ),
        "intent_classification_default": base_profile.model_copy(
            update={
                "profile_id": "intent_classification_default",
                "model": "gpt-nano",
                "params": {"temperature": 0},
            }
        ),
        # ... the rest of the roster
    }

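    # Per-deployment overrides patch values only (model, params, supports_web_search).
    # _parse_overrides and _apply_profile_overrides are helpers not shown here.
    overrides_raw = getattr(settings, "llm_profile_overrides", "") or ""
    if overrides_raw.strip():
        registry = _apply_profile_overrides(registry, _parse_overrides(overrides_raw))

    # Extension seam: application profiles registered via extend_profile_registry().
    registry.update(_extra_profiles)
    return registry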

Three things in that function are worth pulling out.

base_profile.model_copy(update={...}) is doing most of the work. Every profile is a small delta from one base profile. The registry reads as "the default, but with these tweaks": intent_classification_default is the default with a nano model and temperature=0, and retrieval_consolidator_default is the default with high reasoning effort. You're not maintaining a dozen full profile definitions with the same boilerplate repeated. You're maintaining a dozen diffs. That's what makes a single-line model swap a single-line diff.
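
That workhorse has one caveat. In Pydantic v2, model_copy(update=...) does not validate the update, so a wrong-typed value lands in the copy silently. Below is a minimal sketch of a validating alternative. derive_profile is a hypothetical helper, and extra="forbid" is an assumption added here so that misspelled field names raise.

```python
from typing import Any

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class LlmProfile(BaseModel):
    # extra="forbid" is this sketch's addition: unknown keys become errors.
    model_config = ConfigDict(frozen=True, extra="forbid")

    profile_id: str
    model: str
    params: dict[str, Any] = Field(default_factory=dict)
    supports_tools: bool = True
    supports_web_search: bool = False
    supports_structured_output: bool = True
    supports_streaming: bool = True


def derive_profile(base: LlmProfile, **delta: Any) -> LlmProfile:
    """Hypothetical helper: base fields plus a delta, run through full validation."""
    return LlmProfile.model_validate({**base.model_dump(), **delta})


base = LlmProfile(profile_id="default", model="gpt-large")

nano = derive_profile(
    base,
    profile_id="intent_classification_default",
    model="gpt-nano",
    params={"temperature": 0},
)

# model_copy skips validation: the string "yes" is stored as-is in a bool field.
unchecked = base.model_copy(update={"supports_web_search": "yes"})

# derive_profile rejects a misspelled field name instead of ignoring it.
try:
    derive_profile(base, profile_id="typo", modle="gpt-nano")
    typo_rejected = False
except ValidationError:
    typo_rejected = True
```

The registry keeps its "default plus a delta" shape. The only difference is that each delta goes through validation before it becomes a profile.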

The override step is one block near the bottom. The override value is a JSON object, read from settings.llm_profile_overrides at startup. Its patches produce new frozen copies of the affected profiles rather than editing them in place. It honors exactly three things (the model, a shallow merge into params, and the supports_web_search flag), and it refuses unknown profile IDs with an error instead of silently doing nothing. That deliberate narrowness matters. An override can repoint a profile or nudge its params for one deployment, but it can't smuggle in a whole new profile or restructure the registry. The registry's design stays in code; only its values move to the environment.

The last line is an extension seam, and it's the catalog's whole thesis again. registry.update(_extra_profiles) merges in profiles the application registered, not the platform. The platform ships a roster of generic roles, and the application adds its own at startup through extend_profile_registry(). It's the same move the blueprint made for topology and the contract made for tools, made here for models: the platform declares a base, the application extends it, and the platform never has to import the application by name.

A real example

"model_retrieval": LlmRolePolicy(
    default_profile_id="retrieval_default",
    allowed_profile_ids=("retrieval_default", "retrieval_web_search"),
),

Two profiles, one default. retrieval_default is the fast model that handles most retrieval. retrieval_web_search is a different model whose profile carries supports_web_search=True. The allowlist says this role may use either, and only these two.
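
Policies name profiles by string ID. Without a check, a typo in an allowlist would surface only when a request first hits it. Below is a minimal sketch of a startup cross-check between policies and the registry. check_role_policies is hypothetical, and the article doesn't say whether its runtime does this.

```python
from pydantic import BaseModel, ConfigDict


class LlmRolePolicy(BaseModel):
    model_config = ConfigDict(frozen=True)

    default_profile_id: str
    allowed_profile_ids: tuple[str, ...]


def check_role_policies(
    policies: dict[str, LlmRolePolicy], profile_ids: set[str]
) -> list[str]:
    """Hypothetical startup check: report every policy reference to a missing profile."""
    problems: list[str] = []
    for role, policy in policies.items():
        # dict.fromkeys de-duplicates while keeping order (the default is usually also allowed).
        for profile_id in dict.fromkeys((policy.default_profile_id, *policy.allowed_profile_ids)):
            if profile_id not in profile_ids:
                problems.append(f"{role}: unknown profile {profile_id!r}")
    return problems


policies = {
    "model_retrieval": LlmRolePolicy(
        default_profile_id="retrieval_default",
        allowed_profile_ids=("retrieval_default", "retrieval_web_search"),
    ),
}
# retrieval_web_search is deliberately missing from this registry.
problems = check_role_policies(policies, {"default", "retrieval_default"})
```

The function returns the whole list rather than raising on the first problem. That way a single failed startup reports every broken reference at once.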

A word on the piece that ties it together, since it's been doing work in the background. A small runtime object owns the profile registry and the role policies, and it translates a request for a role into a concrete chat model. Nodes never name a model or instantiate a client. They call get_model_for_role("model_retrieval", ...), and the runtime does the resolution, the allowlist check, and the model instantiation behind one method. It caches the instantiated models for the life of the process, which is safe precisely because profiles are frozen: an immutable profile can't drift out from under a cached client.
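
Below is a minimal sketch of that runtime, focused on the caching. The class name LlmRuntime and the injected model_factory are hypothetical; a real factory would construct a chat client. The state and config parameters mirror the call shape above but are ignored here. This sketch always resolves to the role's default profile.

```python
from typing import Any, Callable, Optional

from pydantic import BaseModel, ConfigDict, Field


class LlmProfile(BaseModel):
    model_config = ConfigDict(frozen=True)

    profile_id: str
    model: str
    params: dict[str, Any] = Field(default_factory=dict)


class LlmRolePolicy(BaseModel):
    model_config = ConfigDict(frozen=True)

    default_profile_id: str
    allowed_profile_ids: tuple[str, ...]


class LlmRuntime:
    """Owns profiles and role policies; nodes ask it for a ready-to-use model by role."""

    def __init__(
        self,
        profiles: dict[str, LlmProfile],
        policies: dict[str, LlmRolePolicy],
        model_factory: Callable[[LlmProfile], Any],
    ) -> None:
        self._profiles = profiles
        self._policies = policies
        self._model_factory = model_factory  # builds a client from a profile
        self._cache: dict[str, Any] = {}  # profile_id -> instantiated model

    def get_model_for_role(
        self, role: str, state: Optional[dict] = None, config: Optional[dict] = None
    ) -> Any:
        # Pins are ignored in this sketch: always the role's default profile.
        profile = self._profiles[self._policies[role].default_profile_id]
        # Caching per profile_id relies on profiles being frozen.
        if profile.profile_id not in self._cache:
            self._cache[profile.profile_id] = self._model_factory(profile)
        return self._cache[profile.profile_id]


built: list[str] = []


def fake_factory(profile: LlmProfile) -> dict[str, Any]:
    """Stand-in for a real client constructor; records each build."""
    built.append(profile.profile_id)
    return {"model": profile.model, **profile.params}


runtime = LlmRuntime(
    profiles={"retrieval_default": LlmProfile(profile_id="retrieval_default", model="gpt-fast-routing")},
    policies={
        "model_retrieval": LlmRolePolicy(
            default_profile_id="retrieval_default",
            allowed_profile_ids=("retrieval_default",),
        )
    },
    model_factory=fake_factory,
)
first = runtime.get_model_for_role("model_retrieval")
second = runtime.get_model_for_role("model_retrieval")
```

Both calls return the same cached object, and the factory runs once per profile, not once per call.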

Watch what happens when a retrieval step actually needs web search. The tool decides, honoring the planner's explicit web_search flag or falling back to the deployment's web_search_policy, and then it pins the web-search profile explicitly:

if _should_use_web_search(call, resolved_config):
    # Pin retrieval_web_search; resolve_profile defaults to retrieval_default otherwise.
    return runtime.get_model_for_role(
        "model_retrieval",
        state={"config": {"llm": {"selected_profiles": {"model_retrieval": "retrieval_web_search"}}}},
        config=config,
    )

return runtime.get_model_for_role("model_retrieval", state=state, config=config)

The pin is a request: "use retrieval_web_search for this role, this time." The runtime's resolve_profile takes that request and checks it against the role's allowlist before honoring it. If someone pinned a profile the model_retrieval role isn't allowed to use, the runtime raises rather than quietly serving the wrong model. The allowlist isn't documentation that hopes to be obeyed. It's a gate the resolver runs through on every call.
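
Below is a minimal sketch of that gate, written as a standalone function rather than the runtime's own resolve_profile method. The pin is read from the same nesting the web-search example uses. ProfileNotAllowedError and the pin helper are hypothetical names.

```python
from typing import Any, Mapping, Optional

from pydantic import BaseModel, ConfigDict


class LlmRolePolicy(BaseModel):
    model_config = ConfigDict(frozen=True)

    default_profile_id: str
    allowed_profile_ids: tuple[str, ...]


class ProfileNotAllowedError(ValueError):
    """Raised when a pin names a profile outside the role's allowlist."""


def resolve_profile(
    role: str, policy: LlmRolePolicy, state: Optional[Mapping[str, Any]] = None
) -> str:
    """Return the profile id for `role`, honoring a pin only if the allowlist permits it."""
    # Same nesting as the pin above: state["config"]["llm"]["selected_profiles"][role].
    pins = (state or {}).get("config", {}).get("llm", {}).get("selected_profiles", {})
    requested = pins.get(role, policy.default_profile_id)
    if requested not in policy.allowed_profile_ids:
        # Fail loudly; never fall back to the default behind the caller's back.
        raise ProfileNotAllowedError(
            f"role {role!r} may not use profile {requested!r} "
            f"(allowed: {', '.join(policy.allowed_profile_ids)})"
        )
    return requested


def pin(role: str, profile_id: str) -> dict[str, Any]:
    """Build the minimal state shape that pins one role to one profile."""
    return {"config": {"llm": {"selected_profiles": {role: profile_id}}}}


retrieval_policy = LlmRolePolicy(
    default_profile_id="retrieval_default",
    allowed_profile_ids=("retrieval_default", "retrieval_web_search"),
)
```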

And the capability flag isn't decoration either. When retrieval_web_search resolves, its supports_web_search=True flows straight into the model adapter:

return ChatLiteLLM(
    model=profile.model,
    use_responses_api=profile.supports_web_search,
    **model_kwargs,
)

The flag switches the adapter onto the provider's Responses API, where native web search lives. It's also the flag the runtime checks before binding native web search: the runtime refuses to bind it to a profile that doesn't claim the capability. The profile says "I can do web search," the runtime trusts that to enforce a precondition, and the adapter trusts it to configure the model. One declared fact, two consumers, and no string-matching on model names anywhere.
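
The two consumers can be sketched side by side. Both function names are hypothetical. Treating params as the source of the adapter's extra keyword arguments is also an assumption, since the excerpt passes an unspecified model_kwargs. The point is only that both consumers read the same field.

```python
from typing import Any

from pydantic import BaseModel, ConfigDict, Field


class LlmProfile(BaseModel):
    # Trimmed to the fields this sketch reads.
    model_config = ConfigDict(frozen=True)

    profile_id: str
    model: str
    params: dict[str, Any] = Field(default_factory=dict)
    supports_web_search: bool = False


def require_native_web_search(profile: LlmProfile) -> None:
    """Runtime guard (consumer 1): refuse native web search on a profile that doesn't claim it."""
    if not profile.supports_web_search:
        raise ValueError(
            f"profile {profile.profile_id!r} does not declare supports_web_search"
        )


def adapter_kwargs(profile: LlmProfile) -> dict[str, Any]:
    """Adapter input (consumer 2): the same flag selects the Responses API."""
    return {
        **profile.params,  # assumption: extra model kwargs come from params
        "model": profile.model,
        "use_responses_api": profile.supports_web_search,
    }


web = LlmProfile(profile_id="retrieval_web_search", model="gpt-mini-web", supports_web_search=True)
plain = LlmProfile(profile_id="retrieval_default", model="gpt-fast-routing")
```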

Consequences

What this pattern enables:

Adding a model variant is one entry in build_llm_profile_registry, a single model_copy with the delta you want.
Swapping a model for a role is one diff. Change one profile's model field. The pull request fits on a phone screen.
Per-deployment behavior ships in an environment variable. No code change, and the override can't reach past model, params, and supports_web_search.
Runtime selection within a role is supported and safe. A caller can pin or narrow profiles per request, and the allowlist validates every pin.
Capability flags become behavior. The runtime enforces them and the adapter configures from them, so model affordances stop living in someone's memory.

What it costs:

Indirection. A node's model is now node-role → role policy → default (or pinned) profile → model. That's three hops a new developer has to learn before "what model does this node use?" has an obvious answer.
Profile proliferation. The roster grows past a dozen quickly, and not every profile stays heavily used. Without occasional pruning, the registry accumulates a graveyard of profiles that no role allowlists anymore.
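
The pruning itself can be mechanical. Below is a minimal sketch that lists profiles no role policy can reach. orphaned_profiles and its keep parameter are hypothetical. The keep set exists because some profiles, such as the base "default", may be intentionally unreferenced.

```python
from collections.abc import Iterable

from pydantic import BaseModel, ConfigDict


class LlmRolePolicy(BaseModel):
    model_config = ConfigDict(frozen=True)

    default_profile_id: str
    allowed_profile_ids: tuple[str, ...]


def orphaned_profiles(
    profile_ids: Iterable[str],
    policies: dict[str, LlmRolePolicy],
    keep: frozenset[str] = frozenset(),
) -> list[str]:
    """Hypothetical pruning aid: profiles that no role policy references."""
    referenced: set[str] = set(keep)
    for policy in policies.values():
        referenced.add(policy.default_profile_id)
        referenced.update(policy.allowed_profile_ids)
    return sorted(set(profile_ids) - referenced)


policies = {
    "model_retrieval": LlmRolePolicy(
        default_profile_id="retrieval_default",
        allowed_profile_ids=("retrieval_default", "retrieval_web_search"),
    ),
}
registry_ids = ["default", "retrieval_default", "retrieval_web_search", "orchestrator_reasoning"]
# "default" is kept on purpose: it is the base every other profile derives from.
orphans = orphaned_profiles(registry_ids, policies, keep=frozenset({"default"}))
```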

When not to use it

A two- or three-node agent where every node uses the same model does not need this, and building it there is ceremony without a load to bear. The pattern starts paying for itself at around five nodes with at least two genuinely different model configurations. In my experience, that is the bare minimum for most real agent systems.

The surprising payoff: two unanticipated consequences

Two things fell out of this pattern.

Reviewing model changes became pleasant. Before the profile registry, "move the orchestrator to a stronger reasoning model" was a meeting before it was a pull request. Someone had to find every file that named the model, change them in lockstep, and convince a reviewer that the scattered diff was actually one decision. After the registry, the same change is a one-line edit to the orchestrator_default profile's model field. The reviewer sees the entire change without opening a second file. The conversation moved from "what do I need to read to understand this?" to the only question that was ever worth having: "is this the right model?"

One declared capability flag ended up driving more than one consumer. supports_web_search was a dispatch flag from the start, because it had to be. The adapter reads it to decide which API surface to construct: a profile that claims web search is built against the provider's Responses API, and one that doesn't gets the ordinary chat-completions path. Those two speak different response formats, so the flag was never inert documentation; it was load-bearing the day it was added. What we didn't plan was the second consumer. Once the fact was sitting on the profile, the runtime started leaning on it too: it refuses to bind native web search to a profile that doesn't claim the capability. The same declared bit became both a construction switch and a guard rail, because once a fact lives in a structured place, the rest of the system finds reasons to trust it.

That's the general lesson of declarative configuration, and it's worth saying plainly: write data structures that describe what something is, not just what it's called, and the system will find more uses for those descriptions than the one you wrote them for. A flag added to dispatch the right API became a precondition the runtime enforces, because it was available, structured, and trustworthy. Identities tell you which thing you have; affordances tell the system what it can do with the thing, and affordances compound.

The honest close: it records the model choice, it doesn't make it

Here's the scope this pattern doesn't cover, stated honestly. It doesn't choose models for you. Which model to prefer for a job is a judgment the registry records but doesn't make, and that judgment is the genuinely hard problem one layer up. It doesn't solve multi-tenant model selection for SaaS deployments where different customers get different models; that's a real need this pattern would have to grow to meet. And the override mechanism is deliberately narrow. It patches values, not structure, so you can't reshape the registry from an environment variable, only nudge it.

There's one more extension this pattern is shaped for but doesn't yet reach: per-step profile selection driven by the planner. So far in the catalog, the orchestrator selects tools and planning behavior, but models are still resolved per call from each role's policy, with any pin supplied by the calling node. A node can pin its own role's profile for a single call (the web-search example above does exactly that), but the orchestrator can't reach down and set a downstream node's profile as part of a plan. The mechanism for it already exists: selected_profiles is a pin the resolver honors, and the allowlist is a gate that already validates every pin. What's missing is a profile field on the plan step and an orchestrator that writes selected_profiles as a planning output instead of only reading it. We haven't yet hit a scenario that needs that level of LLM-selection autonomy, but the pieces to add it are already in place. With it, model choice would become a planning decision rather than a node-local default.
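
Below is a minimal sketch of that missing piece, and every name in it is hypothetical. It has three parts: a PlanStep model with an optional profile field, a function that turns plan-level choices into a selected_profiles map, and a helper that writes the map into state without discarding the rest of it. The tool and role names other than model_retrieval are invented. The resolver's allowlist check would still gate every pin.

```python
from typing import Any, Optional

from pydantic import BaseModel


class PlanStep(BaseModel):
    """Hypothetical plan step; `profile` is the proposed new field."""

    tool: str
    role: str
    profile: Optional[str] = None


def pins_from_plan(steps: list[PlanStep]) -> dict[str, str]:
    """Collect plan-level profile choices into a role -> profile_id pin map."""
    pins: dict[str, str] = {}
    for step in steps:
        if step.profile is None:
            continue
        previous = pins.setdefault(step.role, step.profile)
        if previous != step.profile:
            # selected_profiles is keyed by role, so one plan can't pin a role two ways.
            raise ValueError(f"plan pins role {step.role!r} to both {previous!r} and {step.profile!r}")
    return pins


def state_with_pins(state: dict[str, Any], pins: dict[str, str]) -> dict[str, Any]:
    """Return a new state with pins merged into config.llm.selected_profiles."""
    config = state.get("config", {})
    llm = dict(config.get("llm", {}))
    llm["selected_profiles"] = {**llm.get("selected_profiles", {}), **pins}
    return {**state, "config": {**config, "llm": llm}}


steps = [
    PlanStep(tool="search_docs", role="model_retrieval", profile="retrieval_web_search"),
    PlanStep(tool="polish_answer", role="post_generation"),
]
state = state_with_pins({"messages": []}, pins_from_plan(steps))
```

Because selected_profiles is keyed by role rather than by step, this sketch raises when one plan pins the same role to two different profiles. Truly per-step selection would need a finer-grained key.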

What it does, durably, is make model selection legible. The profiles are one function. The role policies are one function. The per-deployment overrides are one environment variable. The application's additions come through one extension seam. Four places, four responsibilities, no overlap, and a model change you can review in a single block.

This completes the LLM-runtime layer of the catalog. The orchestrator now selects its tools through a registry, its planning behavior through an injected policy, and its models through a profile registry, and in every case the selection is a value the application supplies rather than a fact the platform's core had to learn. By now the shape should be familiar enough to name once and for all: the application declares, the platform consumes. It's the same idea at every seam where the two used to be tangled.

There's one layer left where they're still tangled in most agents I've seen: capabilities. The response node knows every skill by name, and the prompt-assembly code knows exactly how each skill's output is supposed to flow into the answer. That's the last place the core knows too much, and it's the next pattern.

Pattern #5 in the catalog: the Skill Loader + Skill Output Effect Patterns, pluggable capabilities loaded by configuration instead of imported by name, and a declarative description of how their outputs assemble into the response. If your response node has a growing pile of if skill == "..." branches, that's the one to read.