The Planning Policy Pattern: Where Cross-Tool Domain Rules Actually Belong

Pattern #3 in the Agentic Platform Patterns catalog. See the Tool Contract article for what this builds on, and the introduction for the catalog framing.

The last pattern landed on the ToolContract as the single declaration point for a tool's interface with the orchestrator. The orchestrator stopped importing tools by name; the registry became the dispatcher; adding a tool stopped meaning "edit five files." That worked. But it left one question unresolved. Where do the rules between tools go? That question turned out to be one of the more interesting design decisions in the whole extraction.

If you've read the standard pattern catalogs, this is not the Planning pattern they describe. Plan-and-execute tells you to decompose a task into steps and run them, with replanning on failure: a control-flow loop. This article is about something that loop never addresses. Where do the domain rules that constrain a valid plan live, so they don't leak into your planner's prompt or harden into your core? The Planning pattern governs how a plan gets executed. The Planning Policy pattern governs what counts as an acceptable plan in the first place, and, crucially, lets that answer differ for each product that is built on top of the same platform.

The almost-right idea

I'll be honest about the dead end I went down, because the dead end is where the pattern comes from.

When I sat down to pull domain behavior out of our orchestrator, the obvious move was to encode every rule as a ToolInvariant on whichever contract was closest. The flagship rule was the one our planner enforced through prompt text: "medical and clinical questions must include at least one retrieval step and one source step." It was sitting inside the planner's tool description, a sentence shipped to the LLM. I had just built a beautiful home for tool rules, so I figured: why not move this one onto the retrieval tool's contract?

The check function was easy to imagine, but three things broke in sequence.

The check needed access to a different tool. The rule wasn't about whether a retrieval step exists. It was about whether a retrieval step and a hyperlink_retrieval step both exist. Half the rule was about a tool the retrieval tool's contract had no business knowing about. So which contract owns a rule that spans two tools? None of the answers was defensible. Put it on the retrieval tool and you've taught that contract about source acquisition. Put it on source acquisition and you've done the reverse. Put it on both and the rule lives in two places, drifting apart the first time someone edits one.
The check needed the whole plan, not one call. Tool contract invariants run per-call. The check function takes a single (call, state, resolved_config) tuple and answers a yes/no about that call. But "the plan must contain both X and Y" isn't a fact about any single call. It's a fact about the shape of the finished plan. The contract's per-call view couldn't see the thing the rule was actually about.
The check shouldn't even fire for the next product. This was the one that settled it. Contract invariants are part of the tool, and they travel with it. If the consumer-health product reused the medical retrieval tool (and it would; clinical retrieval is clinical retrieval), the medical-decomposition rule would ride along inside the contract, and that team would have to remember to surgically remove an invariant they never asked for. The rule wasn't a property of the tool. It was a property of our product's idea of a good plan.
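
The second failure is the easiest to see in code. The sketch below is illustrative only: `per_call_check` and `plan_wide_check` are hypothetical names, not functions from our codebase, and the step dicts assume a `tool` key like the one used later in this article.

```python
# Hypothetical sketch: a per-call check versus a plan-scope check.

def per_call_check(call: dict) -> bool:
    """Contract-style check: sees one call, so it can only describe that call."""
    # It can tell whether THIS call is a retrieval step, but it has no way
    # to ask whether a hyperlink_retrieval step exists elsewhere in the plan.
    return call.get("tool") == "retrieval"


def plan_wide_check(plan: list[dict]) -> list[str]:
    """Plan-scope check: sees every step, so it can require both tools."""
    tools_in_plan = {step.get("tool") for step in plan}
    return [
        f"plan is missing a {required} step"
        for required in ("retrieval", "hyperlink_retrieval")
        if required not in tools_in_plan
    ]


plan = [{"tool": "retrieval", "args": {"query": "statin interactions"}}]
missing = plan_wide_check(plan)
```

The per-call check happily approves the only step in this plan, while the plan-scope check reports the missing `hyperlink_retrieval` step. The rule needs the second kind of view.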

A tool contract is owned by the tool's author and describes what the tool enforces about itself. This rule was owned by the application and described what the planner should enforce about plans. Different owner, scope, and lifecycle. I'd been trying to file an application rule in a tool's drawer, and it didn't fit because it was never a tool rule.

Once I saw that, the contract abstraction stopped being a candidate home for this kind of logic. I needed another seam.

Contracts vs. policies

Here's the distinction the protocol is built on, because the whole pattern is just this distinction made into code.

Dimension	`ToolContract`	`PlanningPolicy`
Owned by	The tool's author	The application
Scope	One tool, one call	Cross-tool, plan-wide
Travels on reuse?	Yes, comes with the tool	No, each application brings its own
Validates	A single call in isolation	A plan step in plan-wide context
Knows about other tools?	No	Yes, that's the point
Contributes prompt text?	Yes (the tool's `description`)	Yes (the cross-tool "mandate")
Lifecycle	Changes when the tool changes	Changes when the product changes

Two of those rows carry the whole idea.

Ownership. A tool contract is part of the tool the way an interface is part of a class. Whoever owns medical retrieval owns its invariants, and nobody downstream should be editing them. A planning policy is part of the application's configuration of the platform. The medical product owns its decomposition policy, the next product owns whatever planning rules it needs, and so on. They are different artifacts with different authors, even when they happen to drive the same orchestrator over the same tools.

Lifecycle on reuse. When you extract a platform and a second product adopts it, the tool contracts come with the tools. That's the reuse win from the last article. The planning policies don't come along. Each application supplies its own. That's not an accident of the implementation; it's the entire reason the policy is a separate seam. If policies traveled with tools, you couldn't reuse a tool across two products with different planning rules.
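
As a rough picture of what "each application supplies its own" means, here is a minimal sketch. Everything in it is hypothetical: `App`, `MedicalPolicy`, `ConsumerHealthPolicy`, and the `SHARED_TOOLS` tuple are stand-ins invented for illustration, and only the mandate method is shown.

```python
from dataclasses import dataclass

# Hypothetical shared toolset; the contracts would travel with these tools.
SHARED_TOOLS = ("retrieval", "hyperlink_retrieval")


class MedicalPolicy:
    """Stand-in for the first product's planning rules."""

    def planning_mandate(self) -> str:
        return (
            "Medical/clinical questions MUST include at least one retrieval step "
            "and one source acquisition step."
        )


class ConsumerHealthPolicy:
    """Hypothetical second product: same tools, no decomposition rule."""

    def planning_mandate(self) -> str:
        return ""


@dataclass(frozen=True)
class App:
    """Hypothetical application wiring: shared tools plus its own policy."""
    tools: tuple
    policy: object


medical = App(SHARED_TOOLS, MedicalPolicy())
consumer = App(SHARED_TOOLS, ConsumerHealthPolicy())
```

Both applications point at the same tools, and nothing had to be removed from a contract to give the second product different planning rules.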

The one-line version, which is also where the last article ended: contracts are how a tool says what it is; policies are how an application says what it wants the planner to enforce about its tools collectively. Once that's clear, the protocol almost writes itself.

The pattern

Name: Planning Policy.

Tagline: Inject application-specific, cross-tool planning rules into a shared orchestrator through a protocol the application supplies at startup, without the core ever learning what business you're in.

Intent: Give cross-tool, plan-wide, application-owned rules a home that is not the tool contract and not a prompt string buried in the core. Make the orchestrator's planning behavior a value the application supplies, the same way the blueprint made topology a value and the contract made the toolset a value.

Structure

The seam is a Protocol: four methods, no inheritance required. An application's policy is any object that implements them.

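from typing import Protocol, runtime_checkable

# ToolConfigParam and ResolvedToolConfig are the config types from the Tool Contract pattern.

@runtime_checkable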
class PlanningPolicy(Protocol):
    """Per-application planning rules injected into the orchestrator."""

    def config_params(self) -> list[ToolConfigParam]:
        """Config keys the policy needs resolved before its hooks fire."""
        ...

    def planning_mandate(self) -> str:
        """One or more sentences appended to the create_plan tool description."""
        ...

    def prompt_slots(
        self,
        state: dict,
        resolved_config: ResolvedToolConfig,
    ) -> dict[str, str]:
        """Slot values for the orchestrator system prompt template."""
        ...

    async def validate_plan_step(
        self,
        call: dict,
        state: dict,
        resolved_config: ResolvedToolConfig,
    ) -> tuple[dict, list[str]]:
        """Validate one plan step. Returns (possibly_corrected_call, errors)."""
        ...
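
Because the protocol is decorated with `@runtime_checkable`, an orchestrator could verify at startup that whatever object it was handed has the four methods, with no base class involved. The sketch below is a minimal illustration, not the platform's code: the protocol is re-declared with plain `dict` and `list` types standing in for `ToolConfigParam` and `ResolvedToolConfig`, and `NoOpPolicy` and `NotAPolicy` are hypothetical. Keep in mind that `isinstance` against a runtime-checkable protocol only checks that the methods exist, not their signatures.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class PlanningPolicy(Protocol):
    """Simplified re-declaration; plain types stand in for the platform's."""

    def config_params(self) -> list: ...
    def planning_mandate(self) -> str: ...
    def prompt_slots(self, state: dict, resolved_config: dict) -> dict[str, str]: ...
    async def validate_plan_step(
        self, call: dict, state: dict, resolved_config: dict
    ) -> tuple[dict, list[str]]: ...


class NoOpPolicy:
    """Hypothetical policy that adds no rules. Note: no inheritance."""

    def config_params(self) -> list:
        return []

    def planning_mandate(self) -> str:
        return ""

    def prompt_slots(self, state: dict, resolved_config: dict) -> dict[str, str]:
        return {}

    async def validate_plan_step(self, call, state, resolved_config):
        return call, []


class NotAPolicy:
    """Has only one of the four methods, so the structural check fails."""

    def planning_mandate(self) -> str:
        return "only a mandate"
```

With these definitions, `isinstance(NoOpPolicy(), PlanningPolicy)` is true and `isinstance(NotAPolicy(), PlanningPolicy)` is false, which is enough for a cheap fail-fast guard when the application wires in its policy.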

Each method earns its place by living closer to the application than to any tool.

config_params() is the policy declaring what it needs to read at runtime. The orchestrator resolves these through the same cascade tool contracts use, configurable → state_config → declared default, so there's no new resolution mechanism to learn. Why does this belong on the policy? Because the policy is what consumes domain feature flags. The flags belong with their consumer, not in a global settings singleton that any code anywhere can reach into.
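
To make the cascade concrete, here is a small sketch of the lookup order described above. It is an assumption-laden illustration: `ConfigParam` is a hypothetical stand-in for `ToolConfigParam`, `resolve` is not the platform's resolver, and treating "key absent" as "unset" is my simplification.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ConfigParam:
    """Hypothetical stand-in for ToolConfigParam: key, default, normalizer."""
    key: str
    default: Any = None
    normalizer: Optional[Callable[[Any], Any]] = None


def resolve(params: list[ConfigParam], configurable: dict, state_config: dict) -> dict:
    """Resolve each key: configurable first, then state_config, then the default."""
    resolved = {}
    for param in params:
        if param.key in configurable:
            value = configurable[param.key]
        elif param.key in state_config:
            value = state_config[param.key]
        else:
            value = param.default
        # Normalize whichever value won the cascade.
        resolved[param.key] = param.normalizer(value) if param.normalizer else value
    return resolved


params = [
    ConfigParam("web_search_policy", default="orchestrator", normalizer=str.lower),
    ConfigParam("sot_enabled", default=False),
]
resolved = resolve(
    params,
    configurable={"web_search_policy": "OFF"},
    state_config={"sot_enabled": True},
)
```

Here `web_search_policy` comes from `configurable` and is lowercased, `sot_enabled` comes from `state_config`, and with both dicts empty every key falls back to its declared default.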

planning_mandate() returns the sentence (or sentences) appended to the planner's create_plan tool description, the prompt text the LLM actually sees. This is where "medical questions must include both X and Y" lives now. Why on the policy? Because that sentence is cross-tool, application-owned, and prompt-visible. All three properties point away from any single tool and toward the application.
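
On the orchestrator side, consuming the mandate can be as small as a string join. This is a sketch under assumptions: `build_create_plan_description` and the base description text are hypothetical, and the real orchestrator may format the description differently.

```python
# Hypothetical base description for the create_plan tool.
BASE_CREATE_PLAN_DESCRIPTION = (
    "Create an ordered plan of tool calls that answers the user's question."
)


def build_create_plan_description(base: str, mandate: str) -> str:
    """Append the policy's mandate to the tool description, if there is one."""
    mandate = mandate.strip()
    return f"{base} {mandate}" if mandate else base


description = build_create_plan_description(
    BASE_CREATE_PLAN_DESCRIPTION,
    "Medical/clinical questions MUST include at least one retrieval step "
    "and one source acquisition step.",
)
```

A policy that returns an empty mandate leaves the base description untouched, so the core never needs a special case for applications without cross-tool rules.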

prompt_slots() returns a dict of values merged into the orchestrator's system prompt template. Some prompt-time data depends on the policy's resolved config: whether the Skeleton of Thoughts (SoT) retrieval method is enabled, which web search policy is active, etc. The policy owns the slot keys derived from its config; the orchestrator owns the keys derived from graph state. Clean division, each side fills the slots it's responsible for.
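
A sketch of that division of labor, with loudly hypothetical details: the template text, the slot names `sot_mode` and `session_id`, and the helper functions are all invented here, and the standard library's `string.Template` stands in for whatever templating the orchestrator actually uses.

```python
from string import Template

# Hypothetical system prompt template with three slots.
SYSTEM_PROMPT = Template(
    "Session: $session_id. SoT retrieval: $sot_mode. Web search: $web_search_policy."
)


def policy_slots(resolved_config: dict) -> dict[str, str]:
    """Slots the policy owns: derived from its resolved config."""
    return {
        "sot_mode": "enabled" if resolved_config.get("sot_enabled") else "disabled",
        "web_search_policy": str(resolved_config.get("web_search_policy", "orchestrator")),
    }


def orchestrator_slots(state: dict) -> dict[str, str]:
    """Slots the orchestrator owns: derived from graph state."""
    return {"session_id": str(state.get("session_id", "unknown"))}


def render_system_prompt(state: dict, resolved_config: dict) -> str:
    """Merge both sides' slots and fill the template."""
    slots = {**orchestrator_slots(state), **policy_slots(resolved_config)}
    return SYSTEM_PROMPT.substitute(slots)


prompt = render_system_prompt(
    {"session_id": "abc"},
    {"sot_enabled": True, "web_search_policy": "off"},
)
```

Using `substitute` rather than `safe_substitute` means a missing slot raises a `KeyError` instead of silently shipping a half-filled prompt.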

validate_plan_step() is the load-bearing method. It validates a single step in the context of the whole plan so far and returns either the original call or a corrected one, plus a list of error messages. Why here and not on a contract? Because the decision may depend on the plan-wide state (what other steps already exist) and on application feature flags (the resolved config), both of which are policy concerns the contract deliberately can't see. Note the signature returns a tuple: a possibly-corrected call and errors. A policy can fix a step, reject it, or pass it through — the same correct-or-reject menu the contract pattern established, now applied at plan scope.
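
One way an orchestrator might drive this hook is a simple loop over the plan, threading the same `state` dict through every step. The sketch below assumes exactly that; `validate_plan` and `OrderingPolicy` are hypothetical, and the ordering rule is invented purely to show a rejection that depends on earlier steps.

```python
import asyncio


class OrderingPolicy:
    """Hypothetical cross-tool rule: hyperlink_retrieval must follow a retrieval step."""

    async def validate_plan_step(self, call, state, resolved_config):
        # Plan-wide memory: the tools seen so far, kept in shared state.
        seen = state.setdefault("_tools_seen", [])
        tool = call.get("tool")
        errors = []
        if tool == "hyperlink_retrieval" and "retrieval" not in seen:
            errors.append("hyperlink_retrieval step has no earlier retrieval step")
        seen.append(tool)
        return call, errors


async def validate_plan(policy, plan, state, resolved_config):
    """Run every step through the policy; collect corrected steps and errors."""
    corrected, errors = [], []
    for call in plan:
        new_call, step_errors = await policy.validate_plan_step(call, state, resolved_config)
        corrected.append(new_call)
        errors.extend(step_errors)
    return corrected, errors


bad_plan = [{"tool": "hyperlink_retrieval"}, {"tool": "retrieval"}]
good_plan = [{"tool": "retrieval"}, {"tool": "hyperlink_retrieval"}]

_, bad_errors = asyncio.run(validate_plan(OrderingPolicy(), bad_plan, {}, {}))
_, good_errors = asyncio.run(validate_plan(OrderingPolicy(), good_plan, {}, {}))
```

The same two steps pass or fail depending on their order, which is a judgment no per-call check could make.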

A real example

Here's a trimmed-down example of a planning policy. The shape of the dataclass captures the secondary point regarding feature flags in one screen of code.

from dataclasses import dataclass

@dataclass
class DecompositionPolicy:
    """Domain-specific planning policy: decomposition + web-search rules."""

    sot_enabled: bool = False
    web_search_policy: str = "orchestrator"
    web_search_query_mode: str = "all_subqueries"
    benchmark_prompt_prefix: str = ""

    def config_params(self) -> list[ToolConfigParam]:
        return [
            ToolConfigParam("web_search_policy", default=self.web_search_policy,
                            normalizer=_normalize_str_lower),
            ToolConfigParam("web_search_query_mode", default=self.web_search_query_mode,
                            normalizer=_normalize_str_lower),
            ToolConfigParam("sot_enabled", default=self.sot_enabled),
            ToolConfigParam("benchmark_prompt_prefix", default=self.benchmark_prompt_prefix,
                            normalizer=_normalize_str),
        ]

    def planning_mandate(self) -> str:
        return (
            "Medical/clinical questions MUST include at least one retrieval step "
            "and one source acquisition step."
        )

    # prompt_slots left out for brevity; it reads the four flags out of resolved_config

Three things are visible in those few lines, and each one is a design decision the rest of the article has been building toward.

The four feature flags are constructor arguments, not global settings. Every one of sot_enabled, web_search_policy, web_search_query_mode, and benchmark_prompt_prefix used to live on the global Settings class, readable from anywhere via get_settings(). Now they're fields on a dataclass. Constructing DecompositionPolicy(sot_enabled=True, web_search_policy="off") configures the policy directly. No global state, no ambient lookup, no magic.

The mandate is a method you can read, not a string you have to hunt for. When you want to know exactly what cross-tool rule gets sent to the LLM, you read planning_mandate() and you see it. When the next product reuses the orchestrator, it supplies a different policy with a different mandate, and the planner's behavior changes. Nobody touches the orchestrator.

The policy speaks the same ToolConfigParam dialect as tools do. config_params() returns the identical record type the Tool Contract pattern uses, normalizers and all. The policy plays by the cascade's rules. There's no second config system for policies; the seam the contract pattern opened gets reused exactly.

Now the method the dataclass left out for brevity, validate_plan_step, because it's where the cross-tool, plan-wide claim stops being abstract:

# Method of DecompositionPolicy, shown dedented for readability.
async def validate_plan_step(self, call, state, resolved_config):
    if call.get("tool") != "retrieval_llm":
        return call, []

    policy = str(resolved_config.get("web_search_policy",
                                     self.web_search_policy)).lower()
    if policy != "orchestrator":
        return call, []

    args = dict(call.get("args") or {})
    if "web_search" in args or args.get("subquestion_ids"):
        return call, []

    # Alternate the inferred web_search flag across the plan's retrieval_llm steps.
    history = state.get("_retrieval_llm_inferred_web_search", [])
    inferred = not history[-1] if history else False
    args["web_search"] = inferred
    history.append(inferred)
    state["_retrieval_llm_inferred_web_search"] = history

    return {**call, "args": args}, []

Read what this does, because it's three policy properties in one function. It only acts when web_search_policy resolves to "orchestrator", a per-deployment flag the policy owns. It carries plan-wide state through a private _retrieval_llm_inferred_web_search key, so each retrieval_llm step's inferred flag depends on the steps before it (the policy alternates the value to spread web-search load across the plan). And it returns the corrected call with an empty error list, because this rule infers a value rather than rejecting, so the LLM never has to know it left an argument out. None of those three things is expressible on a tool contract: not the flag, not the plan-wide memory, not the inference. All three are natural on a policy.
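
To watch the alternation happen, the method can be run against a small plan. In this sketch, `WebSearchPolicy` is a trimmed stand-in that holds only the one flag the method reads, a plain dict stands in for `ResolvedToolConfig`, and `run_plan`, the `summarize` tool, and the queries are hypothetical. The method body itself is the one shown above.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class WebSearchPolicy:
    """Trimmed stand-in: only the flag validate_plan_step reads."""
    web_search_policy: str = "orchestrator"

    async def validate_plan_step(self, call, state, resolved_config):
        if call.get("tool") != "retrieval_llm":
            return call, []
        policy = str(resolved_config.get("web_search_policy",
                                         self.web_search_policy)).lower()
        if policy != "orchestrator":
            return call, []
        args = dict(call.get("args") or {})
        if "web_search" in args or args.get("subquestion_ids"):
            return call, []
        # Alternate the inferred flag across the plan's retrieval_llm steps.
        history = state.get("_retrieval_llm_inferred_web_search", [])
        inferred = not history[-1] if history else False
        args["web_search"] = inferred
        history.append(inferred)
        state["_retrieval_llm_inferred_web_search"] = history
        return {**call, "args": args}, []


async def run_plan(plan, resolved_config):
    """Hypothetical driver: one shared state dict for the whole plan."""
    policy, state, corrected = WebSearchPolicy(), {}, []
    for call in plan:
        new_call, _errors = await policy.validate_plan_step(call, state, resolved_config)
        corrected.append(new_call)
    return corrected


plan = [
    {"tool": "retrieval_llm", "args": {"query": "a"}},
    {"tool": "retrieval_llm", "args": {"query": "b", "web_search": True}},  # explicit
    {"tool": "retrieval_llm", "args": {"query": "c"}},
    {"tool": "summarize", "args": {}},  # hypothetical other tool
    {"tool": "retrieval_llm", "args": {"query": "d"}},
]
result = asyncio.run(run_plan(plan, {"web_search_policy": "orchestrator"}))
flags = [step["args"].get("web_search") for step in result]
# flags == [False, True, True, None, False]
```

The first inferred value is `False`, the explicit `True` on the second step is left alone and does not enter the history, the third step flips to `True`, the non-retrieval step is untouched, and the last step flips back to `False`.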

Consequences

What this pattern enables:

Cross-tool rules get an honest home. "Plans must contain both X and Y" lives on the artifact that owns the concept of a plan, not crammed onto a tool that only knows about itself.
Reuse stays clean at the planning layer. A second product supplies its own policy. The orchestrator runs unchanged; the tools run unchanged; only the planning rules differ, exactly as they should.
Domain feature flags leave the global settings class. They become constructor arguments on the policy that consumes them. (This deserves its own section, below.)
One config dialect, two consumers. Policies and tools both speak ToolConfigParam and resolve through the same cascade. No new mechanism.

What it costs:

A second seam to understand. A reader now has to know that some rules live on contracts and others live on policies, and which is which. The dividing line is clean (per-tool vs. cross-tool), but it's a line you have to internalize.
The policy can grow into a junk drawer. Because it's the home for "application planning behavior," there's a pull to dump any orchestrator customization here. Resist it. The policy is for cross-tool planning rules and the config they need, not for everything the application wishes the orchestrator did differently.

When not to use it

If your agent has one product, one set of planning rules, and no second consumer on the horizon, the mandate sentence is fine where it is, in the prompt, and a policy protocol is ceremony you don't need yet. The policy earns its keep at the moment a rule spans more than one tool, or the moment a second agentic product is needed. Before either of those, it's a seam without a load to bear.

Where feature flags go to die

There's a side effect of this pattern that I didn't design and wouldn't have predicted: it became the place domain feature flags go to die.

Before the extraction, our Settings class held two completely different kinds of thing: infrastructure config, and domain behavior such as feature flags and intent-confidence thresholds. Mixing them felt natural at the time. Pydantic settings are convenient, .env toggles are comfortable, and when you're shipping fast you put the flag where the flag is easy to read.

What that convenience cost was implicit global access. Any code anywhere could call get_settings().web_search_policy and read a domain flag out of thin air. The flag wasn't an input to the thing that used it; it was part of the orchestrator's ambient environment. You couldn't look at a node and know which flags it depended on, because the dependency was a function call buried somewhere in the body.

The policy pattern fixes this almost without trying. Once a domain flag is a constructor argument on the policy, the only code that can read it is code that was handed the policy instance. No global lookup. No get_settings() inside a node body pulling a flag from nowhere. The dependency is explicit because it had to be passed in.

The corollary is the part I find most striking: after the extraction, almost nothing in the orchestrator reads get_settings() for anything but infrastructure. The domain flags moved onto the policies that consume them, and the Settings class got smaller. Tests got easier too. You construct a policy with explicit values instead of monkey-patching a global singleton, which means a test reads like a statement of the scenario instead of a setup ritual.
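
Here is roughly what such a test looks like. The sketch uses a hypothetical, simplified stand-in (`DecompositionPolicyStub`, which infers a fixed `False` instead of alternating) so the example stays self-contained, and the test names are mine. The point is only that the scenario is stated in the constructor call.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class DecompositionPolicyStub:
    """Hypothetical, trimmed stand-in: only the flag these tests exercise."""
    web_search_policy: str = "orchestrator"

    async def validate_plan_step(self, call, state, resolved_config):
        policy = str(resolved_config.get("web_search_policy",
                                         self.web_search_policy)).lower()
        if call.get("tool") != "retrieval_llm" or policy != "orchestrator":
            return call, []
        args = dict(call.get("args") or {})
        args.setdefault("web_search", False)  # simplified: no alternation
        return {**call, "args": args}, []


def test_policy_off_leaves_call_untouched():
    # The scenario is the constructor call; nothing global is patched.
    policy = DecompositionPolicyStub(web_search_policy="off")
    call = {"tool": "retrieval_llm", "args": {"query": "q"}}
    result, errors = asyncio.run(policy.validate_plan_step(call, {}, {}))
    assert result == call
    assert errors == []


def test_policy_orchestrator_infers_flag():
    policy = DecompositionPolicyStub(web_search_policy="orchestrator")
    call = {"tool": "retrieval_llm", "args": {"query": "q"}}
    result, errors = asyncio.run(policy.validate_plan_step(call, {}, {}))
    assert result["args"]["web_search"] is False
    assert errors == []
```

Neither test touches a settings singleton, so they can run in any order and in parallel without leaking state into each other.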

I want to be precise about the claim. The pattern wasn't designed to fix feature-flag management. It just did, because the underlying decoupling was real. When you move a rule to the artifact that genuinely owns it, the rule's inputs come with it. Feature flags were always inputs to policy decisions. They just hadn't been living next to the decisions.

The honest close: the orchestrator-behavior layer, finished

The Planning Policy pattern finishes the orchestrator-behavior layer of the catalog. The orchestrator now selects its tools through a registry (Pattern #2) and its planning behavior through an injected policy (this one). Both are values the application supplies; neither requires the core to know what business you're in.

What the orchestrator still doesn't select cleanly, at least not through anything we've covered so far, is its model. Every node in a real agent graph eventually wants a different model with different params: a cheap fast one for intent classification, a frontier one for the actual answer, something in between for tagging. This typically exists as a scatter of model names and temperature settings across files, and moving one role to a stronger model is a PR that touches eight of them. That's the next pattern.

Pattern #4 in the catalog is about LLM Profiles: centralized model configuration that scales with graph size. Adding a model variant becomes one line, swapping a model for a role becomes a config change, and per-deployment overrides ship as environment variables instead of code edits.

What's worth carrying out of this article, before the handoff: contracts are about tools, policies are about planners, and almost every domain feature flag in a production agent system can leave the global settings class and become an explicit constructor argument on the policy or runtime object that consumes it. Three small habits, large compounding consequences — and the line to keep on the sticky note:

Contracts are about tools. Policies are about planners. Don't mix them.