Behavioral contract part IV : the Assembled Framework

The previous three articles covered the pieces in isolation. Behavioral pillars that shape how Claude works. Adaptive complexity that decides how much process a task gets. A pipeline that enforces brainstorm, plan, execute, test, review, and push. This article is about what happens when those pieces run together as a system.

The Skill Layer

The framework is organized as a set of skills that Claude Code loads automatically. Each skill handles one responsibility and hands off to the next when its job is done. No skill tries to do everything, and no skill assumes the others exist. If you remove one, the rest keep working.

~/.claude/skills/
├── adaptive-complexity/   # Routes tasks to the right process level
├── launchpad/             # Project kickoff and situation reports
├── linear-sync/           # Bidirectional sync with Linear
├── deploy-check/          # Vercel deployment diagnostics
├── design-doc/            # Prototype-to-spec from Stitch
├── document/              # Docusaurus documentation generation
└── retrospective/         # Captures learnings to project memory

On top of these sit the Superpowers skills (brainstorming, writing-plans, executing-plans, TDD, code review, git workflow) and the Aegis behavioral rules (epistemic rigor, total ownership, structured feedback). The custom skills handle orchestration and integration. Superpowers handles the development workflow. Aegis handles discipline.

The Kickoff: Launchpad

Every project session starts with a question, not a query. The Launchpad skill asks what services this project uses before touching any external API.

What does this project use?
- [ ] Linear (project management)
- [ ] Vercel (hosting/deploy)
- [ ] Supabase (database/auth/storage)
- [ ] Stitch (UI prototyping)
- [ ] Docusaurus (documentation site)
- [ ] None of the above

Selected services get queried. Unselected services are completely ignored for the session. A CLI tool with no web frontend, no database, and no project management board gets a sitrep built entirely from local context: git status, recent commits, file structure, detected stack. Nothing more.

The sitrep itself only includes sections for what's relevant:

## Sitrep: Beacon

### Local
- Branch: main (clean)
- Stack: TypeScript, Next.js 15, Prisma, Vitest

### Linear
- Phase: Sprint 3
- In Progress: LIN-89 "Add workspace invite flow"
- Backlog: 6 issues (2 high priority)

### Vercel
- Last deploy: Ready (2 days ago)
- Domain: beacon.app

### Recommended Next Actions
1. Continue LIN-89 (in progress)
2. LIN-91 "Rate limiting on public API" (high priority)

No Supabase section because the project doesn't use Supabase. No Docusaurus section because nobody selected it. This is the fix for the problem described in the first article: Claude querying services that have nothing to do with the project. The gate is a single question, and it eliminates an entire category of noise.
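The gate can be sketched in a few lines of Python. This is only an illustration of the idea, not the Launchpad skill itself: the fetcher functions and `build_sitrep` are hypothetical names, and each fetcher stands in for whatever API call a real integration would make.

```python
from typing import Callable

# Hypothetical fetchers: each stands in for a real API call to that service.
def fetch_linear() -> list[str]:
    return ["Phase: Sprint 3"]

def fetch_vercel() -> list[str]:
    return ["Last deploy: Ready (2 days ago)"]

def fetch_supabase() -> list[str]:
    return ["(Supabase details would go here)"]

FETCHERS: dict[str, Callable[[], list[str]]] = {
    "Linear": fetch_linear,
    "Vercel": fetch_vercel,
    "Supabase": fetch_supabase,
}

def build_sitrep(project: str, local: list[str], selected: set[str]) -> str:
    """Render a Local section plus one section per selected service."""
    sections: dict[str, list[str]] = {"Local": local}
    for name, fetch in FETCHERS.items():
        if name in selected:  # unselected services are never called
            sections[name] = fetch()
    lines = [f"## Sitrep: {project}"]
    for title, items in sections.items():
        lines.append("")
        lines.append(f"### {title}")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)
```

Calling `build_sitrep("Beacon", ["Branch: main (clean)"], {"Linear", "Vercel"})` produces Local, Linear, and Vercel sections and never touches the Supabase fetcher. The filter sits before the call, not after it, so an unselected service costs nothing.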

The Routing: Adaptive Complexity in Context

Once work begins, adaptive complexity sits between every request and the skill layer. It determines which skills activate and which ones stay dormant.

A concrete example of what this looks like across different tasks in the same session:

"Fix the typo in the settings page" gets assessed as L1. Claude fixes the string, runs the build, commits. No brainstorm, no plan, no subagents, no code review. Fifteen seconds.

"Add email notifications to the invite flow" gets assessed as L3. The brainstorming skill asks three clarifying questions. The planning skill produces a seven-task breakdown with file paths. Subagents execute in parallel batches. TDD writes tests against the plan contract. The review checklist runs before push. Twelve minutes.

"Add rate limiting across all API routes" starts as an apparent L2 but gets upgraded to L4 when Claude discovers the API has three entry points with different requirements. It stops, communicates the scope change, presents a structured plan with risks and validation criteria, and waits for explicit approval before proceeding.

Same session, same framework, three completely different levels of process. The developer doesn't configure this per-task. The routing is automatic and silent. The only visible sign is that Claude's behavior is proportional to the task.
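One way to picture the routing is as a lookup from level to active skills, plus a rule for mid-task upgrades. The sketch below is an assumption-heavy illustration: the L1, L3, and L4 rows follow the three examples above, the L2 row is a guess (this article never describes it), and the "never downgrade mid-task" rule is mine, not something the framework states.

```python
from dataclasses import dataclass

FULL_PIPELINE = ("brainstorming", "writing-plans", "executing-plans", "tdd", "code-review")

# Assumed mapping from complexity level to the skills that activate.
SKILLS_BY_LEVEL: dict[int, tuple[str, ...]] = {
    1: (),                    # fix, build, commit
    2: ("writing-plans",),    # assumption: not described in this article
    3: FULL_PIPELINE,
    4: FULL_PIPELINE,         # same skills, but gated on explicit approval
}

@dataclass(frozen=True)
class Route:
    level: int
    skills: tuple[str, ...]
    needs_approval: bool

def route(level: int) -> Route:
    """Pick the skills for a complexity level; only L4 waits for approval."""
    if level not in SKILLS_BY_LEVEL:
        raise ValueError(f"unknown complexity level: {level}")
    return Route(level, SKILLS_BY_LEVEL[level], needs_approval=(level == 4))

def upgrade(current: Route, discovered_level: int) -> Route:
    """Re-route when the work reveals more scope; never downgrade mid-task."""
    return route(max(current.level, discovered_level))
```

The rate limiting example corresponds to `upgrade(route(2), 4)`: the task starts with a light process and ends up with the full pipeline and an approval gate.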

The Sync: Linear Integration

The Linear sync skill operates on a simple rule: creating things requires confirmation, updating things doesn't.

After the planning skill produces a task list, Linear sync proposes turning those tasks into issues:

I'd create these issues in Beacon:

1. "Implement notification service" (estimate: M)
2. "Add invite email template" (estimate: S)
3. "Wire notification into invite API" (estimate: S)
4. "Add notification tests" (estimate: M)

Create them?

If the plan has phases, it also proposes milestones. If there's a design document, it proposes pushing a summary to Linear as a project document. Anything that creates something new requires a yes.

Status updates happen silently. When a branch name matches a Linear issue identifier, progress flows automatically:

Event	Linear action
Branch created	Issue moves to "In Progress"
Push to branch	Issue moves to "In Review"
Work verified complete	Issue moves to "Done"

The developer never opens Linear to drag a card. The board reflects reality because the framework updates it as a side effect of normal git operations.
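The matching step is simple enough to sketch. The following is a minimal illustration, assuming the team key is `LIN` (as in the issue numbers above) and that the three events arrive as the hypothetical strings `branch_created`, `push`, and `verified_complete`. The actual call to Linear is left out.

```python
import re
from typing import Optional

# Event names are hypothetical; the statuses come from the table above.
STATUS_FOR_EVENT = {
    "branch_created": "In Progress",
    "push": "In Review",
    "verified_complete": "Done",
}

def issue_from_branch(branch: str, team_key: str = "LIN") -> Optional[str]:
    """Extract an issue identifier such as LIN-91 from a branch name."""
    pattern = re.compile(rf"\b{re.escape(team_key)}-(\d+)\b", re.IGNORECASE)
    match = pattern.search(branch)
    return f"{team_key.upper()}-{match.group(1)}" if match else None

def linear_update(event: str, branch: str) -> Optional[tuple[str, str]]:
    """Return (issue, new status), or None when there is nothing to sync."""
    issue = issue_from_branch(branch)
    status = STATUS_FOR_EVENT.get(event)
    if issue is None or status is None:
        return None
    return issue, status
```

A push on `feat/lin-91-rate-limiting` yields `("LIN-91", "In Review")`. A push on `main` yields `None`, so branches without an issue identifier never touch the board.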

The Gates: Pre-Push Quality Checks

When Claude runs git push, a hook intercepts and runs a stack-aware check sequence before the push goes through.

For TypeScript projects:

1. Secrets scan: regex patterns for API keys, tokens, private keys, and hardcoded credentials. Blocks on match and reports the exact file and line.
2. Type checking: tsc --noEmit. Blocks on errors.
3. Linting: Biome or ESLint. Blocks on errors, reports warnings.
4. Tests: Vitest or Jest. Blocks on failures.
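As a rough idea of what the secrets scan does, here is a small Python sketch. The three patterns are illustrative assumptions, not the hook's actual rule set; a real scanner needs a broader, tuned list and some handling of false positives.

```python
import re

# Illustrative patterns only; not the actual rules used by the hook.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "hardcoded credential": re.compile(
        r"(?i)(?:api[_-]?key|secret|token|password)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}

def scan_text(path: str, text: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, pattern label) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((path, lineno, label))
    return findings
```

A push would be blocked whenever `scan_text` returns a non-empty list for any staged file. Each finding already carries the file and line needed for the report.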

The sequence adapts to the stack. A Python project gets mypy, ruff, and pytest. A C# project gets dotnet build and dotnet test. Detection is automatic based on config files in the project root.
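Stack detection could look something like the sketch below. The marker files (`tsconfig.json`, `biome.json`, `vitest.config.*`, `pyproject.toml`, `.csproj`/`.sln`) and the exact command lines are my assumptions; the article only says that detection is based on config files in the project root. The secrets scan, which runs on every stack, is left out here.

```python
import shlex
import subprocess

def detect_checks(root_files: set[str]) -> list[str]:
    """Map the file names found in the project root to an ordered check sequence."""
    if "tsconfig.json" in root_files:
        lint = "npx biome check ." if "biome.json" in root_files else "npx eslint ."
        has_vitest = any(name.startswith("vitest.config.") for name in root_files)
        test = "npx vitest run" if has_vitest else "npx jest"
        return ["npx tsc --noEmit", lint, test]
    if "pyproject.toml" in root_files:
        return ["mypy .", "ruff check .", "pytest"]
    if any(name.endswith((".csproj", ".sln")) for name in root_files):
        return ["dotnet build", "dotnet test"]
    return []

def run_checks(commands: list[str], cwd: str) -> bool:
    """Run checks in order and stop at the first failure, which blocks the push."""
    for command in commands:
        if subprocess.run(shlex.split(command), cwd=cwd).returncode != 0:
            return False
    return True
```

Taking a set of file names instead of reading the disk keeps `detect_checks` easy to test. A hook would pass in something like `{p.name for p in Path(".").iterdir()}`.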

There's a complementary guard for MCP pushes. Claude Code can push files directly through the GitHub MCP server, bypassing git entirely. The MCP push guard intercepts these calls and blocks them, redirecting Claude to use git push so the quality gates actually run. Without it, the gate system has a backdoor that Claude will occasionally walk through.
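A guard like this can be written as a pre-tool-use hook script. The sketch below assumes the hook receives the pending tool call as JSON on stdin with a `tool_name` field, and that exiting with code 2 blocks the call and returns stderr to Claude. The two blocked tool names are also assumptions: they depend on how the GitHub MCP server is named in your configuration.

```python
import json
import sys

# Assumed tool names, in the form mcp__<server>__<tool>; adjust to your setup.
BLOCKED_TOOLS = {
    "mcp__github__push_files",
    "mcp__github__create_or_update_file",
}

def decide(event: dict) -> tuple[int, str]:
    """Return (exit code, message): 2 blocks the tool call, 0 lets it through."""
    tool = event.get("tool_name", "")
    if tool in BLOCKED_TOOLS:
        return 2, (
            f"{tool} is blocked: commit locally and use git push "
            "so the pre-push gates run."
        )
    return 0, ""

def main() -> int:
    """Hook entry point: read the event from stdin, explain any block on stderr."""
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code
```

The hook script would end with `sys.exit(main())`. Keeping the decision in a pure function makes the guard testable without a running Claude Code session.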

The Diagnostics: Deploy Check

After a push, Vercel deploys automatically, but failures are silent in the terminal. The deploy-check skill surfaces what happened without requiring a context switch to the Vercel dashboard.

## Deploy Status: Beacon (feat/lin-91-rate-limiting)

- Status: Error
- Triggered: 3 minutes ago

## Build Error

Error: Cannot find module 'ioredis'
  at src/lib/rate-limiter.ts:2:1

Suggested fix:
- ioredis is in devDependencies but needs to be in dependencies
- Vercel production builds only install production dependencies

Five relevant lines from a 200-line build log, plus a concrete fix. The skill reads the logs, identifies the error pattern, and presents the minimum information needed to resolve it. If the deploy succeeded, it reports the URL and stops.
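The pattern matching behind that report can be illustrated with a single rule. This sketch covers only the missing-module case from the example above; `diagnose` is a hypothetical helper, and it takes the log text and the parsed package.json as plain arguments instead of fetching them from Vercel.

```python
import re

MISSING_MODULE = re.compile(r"Cannot find module '([^']+)'")

def diagnose(build_log: str, package_json: dict) -> list[str]:
    """Return a short report for a missing-module failure, or [] if none is found."""
    match = MISSING_MODULE.search(build_log)
    if match is None:
        return []
    module = match.group(1)
    report = [f"Error: Cannot find module '{module}'"]
    if module.startswith("."):
        return report  # relative import: package.json has nothing to say
    if module in package_json.get("devDependencies", {}):
        report.append(f"{module} is in devDependencies but needs to be in dependencies")
    elif module not in package_json.get("dependencies", {}):
        report.append(f"{module} is not listed in package.json")
    return report
```

A real skill would hold a list of such rules and fall back to showing the raw error lines when none of them match. Subpath imports such as `pkg/sub` would need the package name extracted first, which this sketch does not attempt.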

The Memory: Retrospective

At the end of a work session (or after any significant task), the retrospective skill proposes capturing what was learned. It scans the conversation for gotchas, conventions, and decisions, then presents them for approval:

Gotcha: ioredis in devDependencies passes locally but fails Vercel build.
Always put runtime packages in dependencies.
→ Save as: project gotcha

Convention: Rate limiting is per-route-group, not global.
Public: 100/min. Authenticated: 500/min/user. Webhook: 50/min/IP with allowlist.
→ Save as: architecture decision

Each approved learning gets saved to project memory. Next session, when Claude loads the project context, those gotchas are already there. It won't rediscover the devDependencies issue by hitting the same Vercel build failure. It won't redesign the rate limiting strategy because the decision and its rationale are on record.

The filtering matters as much as the capturing. The skill explicitly avoids saving things that are derivable from the code, present in git history, or project-specific config that belongs in CLAUDE.md. Only surprising, non-obvious, or decision-contextual information gets persisted. Two sharp learnings are worth more than ten vague ones.
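The filter itself is mostly judgment, but its shape can be sketched as data. In this illustration the `Learning` class and its three flags are hypothetical; in practice Claude would set those flags by reasoning about the conversation, not by calling a function.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Learning:
    kind: str    # "gotcha", "convention", or "decision"
    text: str
    derivable_from_code: bool = False
    in_git_history: bool = False
    belongs_in_claude_md: bool = False

def worth_saving(item: Learning) -> bool:
    """Keep only learnings that cannot be recovered from somewhere else."""
    return not (
        item.derivable_from_code
        or item.in_git_history
        or item.belongs_in_claude_md
    )

def propose(candidates: list[Learning]) -> list[Learning]:
    """Filter candidates down to the ones presented to the developer for approval."""
    return [item for item in candidates if worth_saving(item)]
```

Anything that survives `propose` still needs the developer's approval before it reaches project memory, so the filter narrows the proposal list without deciding what gets saved.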

What Stays Out of the Way

The framework's most important property is what it doesn't do. Across a full session with the assembled system:

No service gets queried without the developer selecting it. No documentation site gets created without an explicit request. No trivial task gets routed through a planning phase. No push bypasses quality gates. No Linear card requires manual updating. No deploy failure requires opening a dashboard.

Each of these is a specific skill, rule, or gate doing its job. Individually they're small interventions. Together they eliminate enough friction that the developer's attention stays on the work instead of the tooling around the work.

The Starting Point

If you want to build something like this, the pragmatic path is to start small and add layers as you hit real problems.

Week 1: Create a CLAUDE.md with the behavioral rules from the first article. Epistemic rigor, total ownership, structured feedback. Twenty lines. Immediate impact.

Week 2: Add the adaptive complexity rules from the second article. Another fifteen lines. Claude stops over-engineering trivial tasks.

Week 3: If you use a pipeline framework like Superpowers, integrate the complexity routing so it gates which skills activate. If you don't, the behavioral rules and complexity levels already cover the majority of the value.

After that: Add integration skills as you need them. Linear sync if you use Linear. Deploy check if you use Vercel. Retrospective if you want persistent memory across sessions. Each one solves a specific problem. If you don't have the problem, you don't need the skill.

The framework I've described took months to assemble. Most of that time was spent discovering what was actually needed versus what sounded good in theory. The nine specialized agents from an earlier version got dropped because they never activated in practice. The complex hook system got replaced by behavioral rules in a text file because, for shaping behavior, text conditioning worked better than programmatic enforcement.

Build for the problems you have. The framework will tell you what it's missing.
