Skip to main content

Command Palette

Search for a command to run...

Behavioral contract part I : Introduction

Updated
7 min read
Behavioral contract part I : Introduction

Something happened to me quite a few times in the last weeks : I open Claude Code, start a task, and within thirty seconds it's doing something I never asked for. Last week I wanted to work on a CLI tool. No web frontend, no documentation site, nothing fancy. Claude queried Linear for a project that doesn't exist there, then bootstrapped an entire Docusaurus site with categories, sidebars, and an intro page. Altough it has access to these skills/mcps, I never queried for this.

This isn't a Claude Code problem specifically. It's what happens when you give a capable tool zero guidance about when to apply that capability. The AI has access to a dozen MCP servers, a pile of skills, and no sense of proportion. So it uses everything regardless of context.

I spent the last few months building a behavioral framework to fix that. This is the first article in a series where I walk through what I built, how it works, and why certain decisions were made. Not a product pitch. Just the notes from the construction site.

The Root Problem

Claude Code, like most AI coding assistants, starts every session with the same posture: maximum helpfulness. It will brainstorm when you need a quick fix. It will scaffold infrastructure when you just want to edit a file. It will query every connected service because it can, not because you need it to.

The instinct is to fix this with better prompting. "Don't over-engineer." "Keep it simple." "Only do what I ask." That works for exactly one conversation. Next session, the slate is wiped. Same AI, same bias toward doing too much, same lack of context about what this specific project actually needs.

What you need isn't a better prompt. It's a persistent behavioral layer that loads automatically, every session, and shapes how Claude approaches work before you say a word.

Three Rules

I built this framework around three behavioral pillars. They sound abstract in isolation but each one translates directly to concrete instructions that go into a CLAUDE.md file at the root of your project. Claude Code reads this file automatically at session start.

Epistemic Rigor

The rule: verify before acting. When someone describes a bug, read the actual code before attempting a fix. When a test is "flaky," investigate why it fails before marking it as known. When you're about to modify a function, check if it has tests first.

This sounds obvious, and it is. But without the explicit instruction, Claude defaults to trust. You say "this function is broken," Claude says "you're absolutely right, let me fix it" and starts writing code it hasn't read yet. The behavioral contract replaces that default with a verification step.

## Epistemic Rigor
- Read code before modifying it. User descriptions are context, not source of truth.
- Run existing tests before changing behavior. Establish a baseline.
- Below 70% confidence on a question: check project files before answering.
- If you've tried the same approach twice with the same result, stop. Step back. Try something fundamentally different.

That last point is loop detection. Without it, Claude will happily apply the same broken fix in slightly different variations for twenty minutes. The instruction to stop and rethink after oscillating is worth more than everything else combined.

Total Ownership

The rule: if you encounter a problem during your work, it's your problem. It doesn't matter who created it or when. No "these 3 test failures are pre-existing, I'll skip them." No "this was already broken before I started." If you touch a file and find a broken import, fix the import.

## Total Ownership
- Never categorize failures as "pre-existing" or "not caused by our changes."
- Boy Scout Rule: leave every file cleaner than you found it.
- If fixing a tangential issue requires touching other files, add a TODO and mention it. Don't derail the current task, but don't pretend you didn't see it.

This one matters more than you'd think. Without it, Claude develops a remarkable talent for finding reasons why problems aren't its responsibility. With it, the codebase gets incrementally cleaner with every task.

Structured Feedback

The rule: every command produces information, capture it. When running tests, pipe output to a file. When a build fails, save the full log. Don't truncate, don't suppress stderr, don't summarize before you've read the whole thing.

## Structured Feedback
- Capture full output: command > /tmp/output.log 2>&1
- Never truncate at capture time (no | head -100, no 2>/dev/null)
- Reading strategy: < 500 lines, read everything. >= 500 lines, search for errors first, then read context around them.
- When you discover a convention or gotcha, note it briefly. "Noted: tests run with bun test, not npm test."

The full output capture sounds like a minor detail but it eliminates an entire class of debugging sessions. Without it, Claude runs a test suite, reads the first 50 lines of output, misses the actual failure buried at line 312, and starts fixing the wrong thing.

What a Behavioral Contract Looks Like

These three pillars live in a CLAUDE.md file. Here's a stripped-down version of what mine looks like:

# Behavioral Rules

## Epistemic Rigor
- Read code before modifying it.
- Run tests before changing behavior. Baseline first.
- Below 70% confidence: check project files before answering.
- Loop detection: same approach, same result twice = stop, rethink, try a different angle.

## Total Ownership
- You encounter it, you own it. No "pre-existing" excuses.
- Boy Scout Rule applies to every file you touch.
- Tangential issues: TODO + mention, don't derail, don't ignore.

## Structured Feedback
- Full output capture to files. No truncation at capture time.
- < 500 lines: read all. >= 500 lines: search errors first.
- Note conventions and gotchas when you discover them.

That's about 15 lines. It loads in under a second. And it fundamentally changes how Claude works on your project.

The reason this works better than hooks, complex plugin configurations, or elaborate prompt engineering is simple: CLAUDE.md is part of the context window. Claude doesn't "follow rules" the way a linter follows rules. It reads the behavioral contract as context and adjusts its behavior accordingly. Behavioral conditioning through text is more effective than programmatic enforcement because the AI actually understands the intent behind each instruction, not just the trigger pattern.

What this doesn't solve

The three pillars address how Claude behaves within a task. They don't address which tasks deserve how much effort. A typo fix still gets the same treatment as a database migration. The behavioral contract says "verify before acting," but it doesn't say "a rename doesn't need verification."

That's the adaptive complexity layer, and that's what the next article covers. The idea is a 0-4 scale that silently assesses every request and routes it to the appropriate level of process. A typo fix skips planning entirely. A system redesign gets the full brainstorm, plan, execute, review pipeline.

But the pillars come first because they're the foundation. Without epistemic rigor, your planning phase is built on assumptions. Without total ownership, your execution leaves broken things behind. Without structured feedback, your debugging operates on incomplete information. The complexity routing is meaningless if the underlying behavior is unreliable.

Start Here

If you take one thing from this article: create a CLAUDE.md at the root of your project with behavioral rules that match how you want the AI to work. Keep it short. Keep it opinionated. Keep it in plain language.

The AI reads it every session. It costs nothing and the delta between "Claude with a behavioral contract" and "Claude without one" is large enough that you'll notice it on the first task.

Next article, I'll walk through how to stop treating every task like it deserves a design review.

12 views