Behavioral contract part II : not everything deserves a plan

Last month I watched Claude Code spend four minutes analyzing naming conventions, weighing three alternative approaches, and producing a structured comparison table. The task was renaming a variable from data to response.

The thing is, Claude wasn't wrong. The analysis was thorough, the alternatives were valid, the table was well-formatted. It just had absolutely no reason to exist. Renaming a variable is a five-second operation. The AI treated it like an architectural decision because nothing in its context told it otherwise.

This is the second article in a series about building a behavioral framework for Claude Code. The first one covered the three behavioral pillars (epistemic rigor, total ownership, structured feedback). Those pillars shape how Claude works. This article is about something different: how much work Claude should do in the first place.

The Missing Routing Layer

Most AI development workflows follow the same structure regardless of what you're building. You brainstorm, you plan, you execute, you test, you review. That pipeline makes perfect sense for implementing a new feature. It makes zero sense for adding an import statement.

The problem isn't that structured workflows are bad. The problem is applying them uniformly. A database migration and a typo fix don't need the same amount of ceremony. But without an explicit routing mechanism, the AI defaults to maximum effort on everything, because capability without constraint always trends toward excess.

I needed a way to assess task complexity before choosing which parts of the workflow to activate. The result is a five-level scale that runs silently at the start of every request.

Five Levels, From zero to full Pipeline

Level 0: Conversation

"What does this function do?" "Why did we pick Postgres over Mongo?" "Can you explain this error?"

No code changes. No tools. Just a response. The AI should answer and stop. No planning phase, no verification step, no subagent dispatch. If Claude reaches for a tool when you asked a question, your framework is miscalibrated.

Level 1: Trivial

Fix a typo. Add a missing import. Bump a version number. Rename a variable (yes, that one).

The signal is mechanical and single-file. No side effects, no dependencies, nothing that could break something else. Claude should just do it. Run validation when it's done, but skip every other step in the pipeline. No brainstorming, no planning document, no code review.

Level 2: Simple

Modify a function's behavior. Add a test for an existing feature. Small refactor within a module.

One or two files involved, clear scope, low risk. Claude thinks through the approach internally but doesn't need to share a plan or ask for approval. Execute, run tests, move on.

Level 3: Moderate

Implement a feature. Add an API endpoint. Refactor a module. Build a new component with state management.

Three to five files, potential side effects, enough complexity that the approach matters. This is where process starts earning its keep. Claude shares a brief plan before starting (three to six lines, not a document). Something like:

Plan: Add user preferences endpoint
- New route in src/api/preferences.ts
- Prisma schema update for preferences table
- Validation middleware for the request body
- Risk: migration might conflict with the auth PR in review

That's enough to catch a bad approach before code gets written. Implicit approval is fine at this level. If the plan looks reasonable, Claude proceeds without waiting for a thumbs up.

Level 4: Complex

System migration. Cross-domain refactor. New architecture. Breaking changes that ripple through the codebase.

Five or more files, architectural decisions, things that are expensive to reverse. This is where you want the full pipeline: structured plan with context, objectives, risks, steps, and validation criteria. Explicit approval required before any code gets written. Multi-agent execution for parallel workstreams. No shortcuts.

Context: Auth currently uses session cookies, need to migrate to JWT
Objective: Replace session-based auth with stateless JWT across all endpoints
Risks:
- Active sessions will be invalidated (mitigation: 2-week overlap period)
- Token refresh logic doesn't exist yet (mitigation: build it first)
Steps:
1. Implement JWT service (issue, verify, refresh)
2. Add refresh token rotation
3. Migrate middleware to check JWT instead of session
4. Update all protected routes
5. Add session-to-JWT migration endpoint
6. Deprecation notice on session endpoints
Validation: All existing auth tests pass with JWT, session endpoints still work during overlap

How to Assess

The assessment happens through two passes, and it's supposed to be invisible. The user shouldn't know their request was categorized. They should just notice that Claude's response is proportional to the task.

Heuristic pass (handles 80% of cases). Keywords and scope. "Fix," "typo," "rename," "import," "format" signal L1. "Implement," "add feature," "new endpoint" signal L3. "Migrate," "redesign," "architect" signal L4. File count and blast radius fill in the rest. This takes less than a second.

Light inspection (for the ambiguous 20%). A quick look at the codebase to gauge actual scope. "Refactor the auth module" could be L2 (one file, simple cleanup) or L4 (auth touches everything). One or two file reads to clarify, then route.

Two rules override the heuristic. When in doubt, round up one level. Better to share a plan for an L2 task than to skip planning for an L3. And if the user says "just do it" or "don't overthink this," drop one level. They're telling you the process is heavier than the task warrants.

The Mid-Task Upgrade

Sometimes you start an L2 and discover it's actually an L4. The function you're modifying has no tests. The module you're refactoring is imported by fifteen other files. The "simple" schema change triggers a cascade.

The instruction for this case is explicit: stop. Don't quietly escalate. Don't keep going and hope it works out. Tell the user the task is bigger than expected, share a revised plan at the appropriate level, and wait for confirmation before continuing.

This is the one that took the longest to get right. Without the explicit instruction, Claude either powers through (creating a mess) or asks permission for everything (which is its own kind of annoying). The rule is surgical: you only escalate when the actual complexity exceeds the assessed level. Not when you're being cautious. Not when something might go wrong. When you've concretely discovered that the scope is bigger.

The Routing

The adaptive system plugs into the broader workflow like a router. Every request passes through it before anything else happens. The output is a set of skills to invoke.

User request
  │
  ▼
Adaptive complexity assessment (silent)
  │
  ├─ L0: No skills. Respond directly.
  ├─ L1: TDD (if touching logic) + verification.
  ├─ L2: TDD + verification.
  ├─ L3: Brainstorm (light) + plan + TDD + consider subagent.
  └─ L4: Full pipeline. Brainstorm + plan + execute with subagents
         + TDD + code review + verification.

L0 through L2 skip the heavyweight process entirely. L3 gets an abbreviated version. L4 gets everything. The framework adapts to the task instead of forcing the task to adapt to the framework.

The Skill File

This lives as a skill that Claude Code loads automatically. The core of it is compact enough to fit in a CLAUDE.md if you don't use a skill system:

## Adaptive Complexity

Silently assess every request on a 0-4 scale before acting.

- L0 (Conversation): No tools. Just respond.
- L1 (Trivial): Single file, mechanical. Just do it. Validate when done.
- L2 (Simple): 1-2 files, clear scope. Think internally, execute directly.
- L3 (Moderate): 3-5 files, feature-level. Share a brief plan first.
- L4 (Complex): 5+ files, architectural. Full structured plan. Wait for approval.

Rules:
- Never announce the level. Just exhibit the right behavior.
- When in doubt, round up.
- "Just do it" from the user = drop one level.
- If mid-task the scope exceeds the assessment: STOP, reassess, communicate.

That's it. No configuration files, no complex setup. Fifteen lines that route every interaction to the right amount of process.

What Changes in Practice

The difference isn't dramatic on any single task. It's cumulative. Over a day of work, you stop watching Claude write analysis documents for trivial changes. You stop interrupting brainstorm sessions that shouldn't have started. You stop feeling like the AI is making every task heavier than it needs to be.

And when a genuinely complex task comes in, the full pipeline activates and you're grateful it exists. The structured plan catches a dependency you missed. The subagent dispatch parallelizes work you would have done sequentially. The code review finds an edge case.

The adaptive layer makes the framework livable. Without it, the behavioral pillars from the previous article apply uniformly and the process becomes its own kind of burden. With it, 80% of your interactions are lightweight and fast, and the other 20% get the rigor they actually need.

Next article: the full pipeline in action, from brainstorm to merge, on a real feature.

Behavioral contract part II : not everything deserves a plan

The Missing Routing Layer

Five Levels, From zero to full Pipeline

Level 0: Conversation

Level 1: Trivial

Level 2: Simple

Level 3: Moderate

Level 4: Complex

How to Assess

The Mid-Task Upgrade

The Routing

The Skill File

What Changes in Practice

Comments

More from this blog

Behavorial contract part IV : the Assembled Framework

Behavorial contract part III : the pipeline behind the code

Behavioral contract part I : Introduction

Everything I needed to understand before Playwright tests stopped looking like magic

Command Palette

The Missing Routing Layer

Five Levels, From zero to full Pipeline

Level 0: Conversation

Level 1: Trivial

Level 2: Simple

Level 3: Moderate

Level 4: Complex

How to Assess

The Mid-Task Upgrade

The Routing

The Skill File

What Changes in Practice

Comments

More from this blog