Superpowers: Open-Source Skills for Spec-First AI Coding

Daniel Marsh · Spec-first engineering notes

AI coding agents are productive. They are also undisciplined. Left to their own judgment, they skip design, write code before understanding the problem, and claim success without verification. Superpowers is an open-source framework that solves this by enforcing the same spec-first pipeline that works for human teams — except it makes the pipeline mandatory and non-negotiable. The result is a proof of concept that spec-first development is not just a human best practice. It is the governing architecture for reliable AI-assisted engineering.

Published 2026-04-02 · Updated 2026-05-06

The problem with unsupervised AI coding

Give an AI coding agent a feature request and watch what happens. Within seconds it is writing code. Not thinking about the problem. Not asking clarifying questions. Not documenting assumptions. Just writing code. And the code might work. It often does. But "working code" and "correct code" are not the same thing, and the gap between them is exactly what specifications exist to close.

The failure modes are predictable. The agent adds fields nobody asked for. It renames functions to match its own conventions instead of the project's. It expands scope because the expanded version is more "complete." It skips edge cases because they were not explicitly mentioned. And when it is done, it reports success based on whether the code compiles, not whether it meets the actual requirements. These are not hypothetical problems. They are a routine experience for teams that use AI coding tools without guardrails.

Human developers have the same tendencies, which is why we invented spec-first development in the first place. The difference is that humans respond to social pressure, code review feedback, and career incentives. AI agents respond to instructions. If the instructions do not enforce specification discipline, the agent will not practice it. Superpowers is a framework that encodes the right instructions.

What Superpowers is

Superpowers is an open-source agentic skills framework created by Jesse Vincent. It provides a set of composable "skills": markdown files that AI coding agents load and follow as mandatory workflow instructions. The framework works with Claude Code, Cursor, GitHub Copilot CLI, Gemini CLI, Codex, and other AI coding tools. As of March 2026 it has over 130,000 stars on GitHub, which makes it one of the most visible AI development methodology projects in the ecosystem.

The core idea is simple: instead of letting the AI agent decide how to approach a task, Superpowers prescribes a pipeline. The pipeline mirrors what effective human engineering teams already do, but with enforcement mechanisms that remove the agent's ability to skip steps. The framework describes itself as an "agentic skills framework and software development methodology," and both halves of that description matter. It is a methodology first and a framework second.

The skills system consists of 14 composable skills organized into a dependency chain. Process skills (brainstorming, debugging) execute before implementation skills (TDD, code review). Rigid skills like TDD must be followed exactly. Flexible skills like pattern application adapt to context. The priority system — user instructions override skills, skills override default agent behavior — ensures that the framework augments human judgment rather than replacing it.
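
The priority rule can be pictured as a small resolution function. This is a minimal sketch, not how Superpowers implements it (the skills are markdown instructions, not code). The `Source`, `Directive`, and `resolve` names and the idea of grouping instructions by `topic` are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Source(IntEnum):
    """Where an instruction comes from; a higher value wins."""
    AGENT_DEFAULT = 0
    SKILL = 1
    USER = 2


@dataclass(frozen=True)
class Directive:
    source: Source
    topic: str        # hypothetical grouping key, e.g. "testing"
    instruction: str


def resolve(directives: list[Directive]) -> dict[str, Directive]:
    """Keep, for each topic, the directive from the highest-priority source."""
    winners: dict[str, Directive] = {}
    for d in directives:
        current = winners.get(d.topic)
        if current is None or d.source > current.source:
            winners[d.topic] = d
    return winners
```

With a skill and an agent default both speaking to "testing", the skill's instruction wins. If the user also speaks to the same topic, the user's instruction wins over both.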

The Superpowers pipeline: spec-first by design

The Superpowers workflow maps directly onto the spec-first delivery model that we have covered extensively on this site. Here is the pipeline, and why each stage exists.

Stage 1: Brainstorming. Before any code or planning, the brainstorming skill forces the agent into a structured requirements exploration. It asks questions one at a time. It proposes two or three approaches with explicit tradeoffs. It presents the design in sectioned chunks for user approval. And then — critically — it writes a formal specification document to a persistent location (docs/superpowers/specs/). The spec is not optional. No spec, no planning. No planning, no code.

This maps directly to the spec-first principle that decisions should be made explicit before implementation starts. The brainstorming skill even dispatches a separate spec-document-reviewer subagent that validates the spec across five dimensions: completeness, consistency, clarity, scope, and YAGNI compliance. The spec must pass review before the pipeline advances.
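
The gate described above ("no spec, no planning") can be sketched as a function that decides which stage is allowed to run next. The five review dimensions come from the text. Everything else here (`SpecReview`, `next_stage`, the stage labels, and the example file name) is a hypothetical illustration, not the framework's own code.

```python
from dataclasses import dataclass, field
from typing import Optional

# The five dimensions named in the article.
REVIEW_DIMENSIONS = ("completeness", "consistency", "clarity", "scope", "yagni")


@dataclass
class SpecReview:
    # Maps a dimension name to the problems the reviewer found there.
    findings: dict[str, list[str]] = field(default_factory=dict)

    def passed(self) -> bool:
        """A review passes only when no dimension has open findings."""
        return all(not self.findings.get(dim) for dim in REVIEW_DIMENSIONS)


def next_stage(spec_path: Optional[str], review: Optional[SpecReview]) -> str:
    """Return the only stage that may run given the current artifacts."""
    if spec_path is None:
        return "brainstorming"      # no spec yet: keep exploring requirements
    if review is None or not review.passed():
        return "spec-review"        # spec exists but has not passed review
    return "planning"               # reviewed spec: planning may start


# Hypothetical file name under the spec directory mentioned above.
print(next_stage("docs/superpowers/specs/example.md", SpecReview()))
```

The design choice worth copying is that the function has no path from "no spec" to "planning". The only way forward is to produce the missing artifact.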

Stage 2: Planning. The writing-plans skill takes the approved spec and breaks it into granular tasks, each scoped to two to five minutes of work. Each task specifies exact files to modify, complete code blocks (no placeholders), exact test commands with expected output, and git commit instructions. Plans are saved as persistent documents and self-reviewed for spec coverage. If the plan does not cover the full spec, it is rejected.

This is the implementation plan phase that closes the gap between "what we decided" and "what we are going to build." Human teams often skip this step, jumping from spec to code. Superpowers makes it impossible to skip because the execution skills require a plan as input.
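
A plan with this shape lends itself to a mechanical self-review. The sketch below assumes each spec requirement has an identifier such as "R1" and that each task lists the requirements it covers. Neither convention is stated in the Superpowers documentation, and the `Task` fields simply mirror the items listed above.

```python
from dataclasses import dataclass


@dataclass
class Task:
    title: str
    files: list[str]          # exact files to modify
    code: str                 # complete code block, no placeholders
    test_command: str         # exact command to run
    expected_output: str      # what the command should print
    commit_message: str
    covers: set[str]          # hypothetical requirement IDs from the spec


# Assumed markers of an unfinished code block.
PLACEHOLDERS = ("TODO", "TBD", "...")


def plan_problems(requirement_ids: set[str], tasks: list[Task]) -> list[str]:
    """List the reasons a plan should be rejected; empty means it passes."""
    problems = []
    covered = set().union(*(t.covers for t in tasks))
    for rid in sorted(requirement_ids - covered):
        problems.append(f"requirement {rid} has no task")
    for t in tasks:
        if any(marker in t.code for marker in PLACEHOLDERS):
            problems.append(f"task {t.title!r} contains a placeholder")
    return problems
```

A plan that returns an empty list covers every requirement and contains no placeholder code. Any other result is a reason to send the plan back.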

Stage 3: Test-driven development. The TDD skill enforces strict red-green-refactor cycles. The framework states: "No production code without a failing test first." Code written before tests must be deleted and rewritten test-first. This is a rigid skill — there is no flexibility in how it is applied. The framework even includes a list of 12 common rationalizations that agents use to skip TDD, with counter-arguments for each one.
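
The "failing test first" rule can be made concrete with a small harness that accepts an implementation only if the same test was red before the change and green after it. This is an illustrative sketch. The `red_then_green` helper and the `slugify` example are hypothetical and not part of Superpowers.

```python
from typing import Callable

Impl = Callable[[str], str]


def passes(test: Callable[[Impl], None], impl: Impl) -> bool:
    """Run a test against an implementation and report whether it passed."""
    try:
        test(impl)
    except AssertionError:
        return False
    return True


def red_then_green(test: Callable[[Impl], None], before: Impl, after: Impl) -> bool:
    """True only if the test failed before the change and passes after it."""
    return not passes(test, before) and passes(test, after)


def check_slugify(slugify: Impl) -> None:
    # The requirement, written down before any production code exists.
    assert slugify("Hello World") == "hello-world"


def stub(text: str) -> str:
    # State of the code before the change: no real behavior yet.
    return text


def slugify(text: str) -> str:
    # Minimal production code written to make the failing test pass.
    return "-".join(text.lower().split())


print(red_then_green(check_slugify, stub, slugify))
```

If the test already passes against the old code, the harness returns False. That is the signal that the test was never observed failing and so proves nothing about the new code.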

Stage 4: Verification. The verification-before-completion skill requires fresh evidence before any success claim. The agent must identify the verification command, execute it in real time (not recall a previous result), read the full output, confirm it matches expectations, and only then report completion. The skill lists specific red-flag language patterns — "should work," "probably passes," "seems to" — that trigger automatic rejection.
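
The verification rule has two parts that are easy to sketch: reject hedged language, and require a command that is actually run now. The three phrases below are the ones quoted above. The function names and the rule that the expected text must appear in standard output are assumptions for illustration.

```python
import subprocess
import sys

# Red-flag phrases quoted in the article.
RED_FLAGS = ("should work", "probably passes", "seems to")


def red_flags_in(claim: str) -> list[str]:
    """Return the hedging phrases found in a completion claim."""
    lowered = claim.lower()
    return [phrase for phrase in RED_FLAGS if phrase in lowered]


def fresh_evidence(command: list[str], expected: str) -> bool:
    """Run the verification command now and check its real output."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0 and expected in result.stdout


def may_claim_done(claim: str, command: list[str], expected: str) -> bool:
    """Allow a success claim only with clean wording and fresh evidence."""
    return not red_flags_in(claim) and fresh_evidence(command, expected)


# Stand-in for a real test command; it just prints a summary line.
demo_command = [sys.executable, "-c", "print('3 passed')"]
print(may_claim_done("All tests pass: 3 passed", demo_command, "3 passed"))
```

In a real project `demo_command` would be the test runner invocation from the plan. The point of the sketch is that the result comes from a process started at claim time, not from memory of an earlier run.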

Why spec-first matters more for AI agents than for humans

A human developer who skips the spec might still build the right thing. They have context from standup meetings, Slack threads, and hallway conversations. They have domain knowledge accumulated over months of working on the codebase. They have the social awareness to ask a product manager "did you actually mean X or Y?" when the requirement is ambiguous. An AI agent has none of this.

An AI agent's context is exactly what you give it: the prompt, the files it reads, and the conversation history. If the specification is not written down, the agent literally cannot follow it. Every ambiguity in the requirement becomes an assumption the agent makes silently. And unlike a human developer whose assumptions might be corrected in code review, the agent's assumptions are baked into the code and the tests. The tests pass because the agent wrote the tests to match its own interpretation, not the actual requirement.

This is why Superpowers' mandatory brainstorming phase matters so much. By forcing the agent to surface assumptions as questions, and to document decisions as spec artifacts, the framework creates the same shared understanding that human teams build through conversation. The spec is not just a planning document. It is the agent's source of truth — the only source of truth — for what "correct" means.

The connection between specs and test harnesses becomes even more critical in this context. When the agent writes tests derived from a reviewed spec, the tests validate against requirements. When the agent writes tests without a spec, the tests validate against its own assumptions. The first catches bugs. The second hides them.

Subagent architecture and spec compliance review

One of Superpowers' most interesting architectural decisions is its use of subagents — fresh AI agent instances dispatched to handle individual tasks. Each subagent gets a clean context window, the relevant spec section, and the specific task from the plan. When the subagent completes its work, the result goes through a two-stage review: first for spec compliance, then for code quality.

The spec compliance review happens before the code quality review. This ordering is deliberate. Code that is well-written but does not match the spec is worse than code that is rough but correct. A beautifully architected feature that solves the wrong problem is still the wrong feature. By checking spec compliance first, Superpowers ensures that correctness is the primary gate and code quality is secondary.
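
The ordering can be expressed as a short-circuiting review function. This is a sketch under the assumption that each review stage can be modeled as a callable returning a list of issues. The names are hypothetical.

```python
from typing import Callable

# A check takes a diff and returns the issues it found (empty means pass).
Check = Callable[[str], list[str]]


def two_stage_review(
    diff: str, spec_check: Check, quality_check: Check
) -> tuple[str, list[str]]:
    """Gate on spec compliance first; only then look at code quality."""
    spec_issues = spec_check(diff)
    if spec_issues:
        # Quality is not even evaluated for work that misses the spec.
        return ("rejected: spec compliance", spec_issues)
    quality_issues = quality_check(diff)
    if quality_issues:
        return ("rejected: code quality", quality_issues)
    return ("approved", [])
```

Because the function returns before calling `quality_check`, no reviewer time is spent polishing code that solves the wrong problem.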

This mirrors a principle that human teams often get backwards. Code reviews tend to focus on style, naming, and structure while giving less attention to whether the implementation actually matches the requirement. Superpowers codifies the correct priority: requirements first, craftsmanship second.

The subagent model also solves a practical problem with AI context windows. A single agent working on a large feature accumulates context that degrades output quality over time. By dispatching fresh subagents per task, each with a focused context, Superpowers maintains output quality across the entire implementation. The spec and plan documents serve as the coordination mechanism between subagents — exactly the role that specifications play for human teams distributed across time zones.
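
The focused-context idea can be sketched as a function that builds a subagent's brief from exactly two inputs: one spec section and one task. The `PlanTask` fields, the `spec_section` lookup key, and the brief layout are assumptions. The actual dispatch mechanism depends on the host tool and is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanTask:
    title: str
    spec_section: str    # hypothetical key into the spec document
    instructions: str


def build_brief(spec_sections: dict[str, str], task: PlanTask) -> str:
    """Assemble the only context a fresh subagent receives for this task."""
    section_text = spec_sections[task.spec_section]
    return "\n\n".join(
        [
            f"Spec section: {task.spec_section}",
            section_text,
            f"Task: {task.title}",
            task.instructions,
        ]
    )


spec = {
    "Export": "Users can export reports as CSV.",
    "Auth": "Sessions expire after 30 minutes.",
}
task = PlanTask("Add CSV writer", "Export", "Create the writer and its test.")
print(build_brief(spec, task))
```

Nothing from other tasks or earlier conversation enters the brief, which is what keeps each subagent's context small.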

What human teams can learn from Superpowers

Superpowers was built to govern AI agents, but its design validates several principles that apply to human engineering teams. The framework's constraints are not arbitrary. They are the product of observing what goes wrong when agents (artificial or human) have too much freedom to skip steps.

Make the pipeline non-negotiable for high-risk work. Superpowers does not ask the agent whether it wants to write a spec. It does not offer a "skip spec" option for small changes. For work that goes through the pipeline, every stage is mandatory. Human teams can adopt this for work above a certain complexity threshold. The decision framework for when to spec still applies — but when the answer is "yes, write a spec," the full pipeline should be non-optional.

Verify against the spec, not just against the code. Superpowers' two-stage review — spec compliance then code quality — is a pattern that every team should adopt. In practice, this means the first question in code review is "does this match the spec?" and the second question is "is this well-written?" Most teams ask these questions in reverse order, or only ask the second one.

Write specs that are machine-readable. Superpowers specs are consumed by AI agents, which means they must be unambiguous. No "TBD" sections. No "as appropriate" qualifiers. No references to verbal agreements. This level of precision benefits human readers too. A spec that an AI agent can follow without clarification is a spec that a junior developer can follow without interrupting a senior engineer.
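
A spec can be checked for these ambiguity markers before anyone, human or agent, starts work. The sketch below is a hypothetical linter, not a Superpowers feature, and the patterns are a small assumed sample that a team would extend.

```python
import re

# Assumed sample of ambiguity markers; extend for your own specs.
AMBIGUITY_PATTERNS = {
    "placeholder": re.compile(r"\b(TBD|TODO)\b", re.IGNORECASE),
    "vague qualifier": re.compile(r"\bas (appropriate|needed)\b", re.IGNORECASE),
    "verbal agreement": re.compile(r"\bas (discussed|agreed)\b", re.IGNORECASE),
}


def lint_spec(text: str) -> list[tuple[int, str]]:
    """Return (line number, problem label) for each ambiguous line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in AMBIGUITY_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


draft = "\n".join(
    [
        "Retry failed uploads.",
        "Retry count: TBD",
        "Log errors as appropriate.",
        "Timeout as discussed with ops.",
    ]
)
for lineno, label in lint_spec(draft):
    print(f"line {lineno}: {label}")
```

A spec that produces no findings is not guaranteed to be clear, but one that produces findings is guaranteed to need another pass.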

Use systematic debugging, not pattern matching. The systematic-debugging skill enforces a four-phase process: investigate, analyze patterns, form hypotheses, test hypotheses. If three or more fixes fail, question the architecture. This is the opposite of the common pattern where developers apply quick fixes based on intuition and hope for the best. The structured approach works better for both AI agents and human developers.
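
The escalation rule ("if three or more fixes fail, question the architecture") can be sketched as a loop with a failure budget. Only that rule is modeled here. The investigation and pattern-analysis phases are assumed to have already produced the list of hypotheses, and the function and return labels are hypothetical.

```python
from typing import Callable, Iterable, Optional


def debug(
    hypotheses: Iterable[str],
    try_fix: Callable[[str], bool],
    max_failed_fixes: int = 3,
) -> tuple[str, Optional[str]]:
    """Test hypotheses one at a time and escalate after repeated failures."""
    failed = 0
    for hypothesis in hypotheses:
        if try_fix(hypothesis):
            return ("fixed", hypothesis)
        failed += 1
        if failed >= max_failed_fixes:
            # Repeated failures suggest the problem is structural.
            return ("question the architecture", None)
    # Ran out of hypotheses before hitting the budget.
    return ("investigate further", None)
```

The budget is what separates this from trial and error: the loop cannot keep guessing indefinitely, so repeated failure forces a step back rather than a fourth quick fix.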

Limitations and tradeoffs

Superpowers is not without costs. The mandatory pipeline adds overhead to every task. A one-line config change that should take thirty seconds now goes through brainstorming, spec writing, planning, and verification. The framework is aware of this tradeoff — the documentation notes that the pipeline is for work that justifies the investment — but the enforcement mechanism does not automatically distinguish between a database migration and a typo fix.

The framework also assumes that the human user is available to approve specs and plans. For fully autonomous agent operation, the approval gates become bottlenecks. This is arguably a feature rather than a bug — the framework intentionally prevents unsupervised autonomy for high-risk work — but it limits the use case to collaborative human-AI development rather than fully autonomous coding.

Finally, the 14-skill system has a learning curve. Teams adopting Superpowers need to understand the skill priority system, the rigid versus flexible skill distinction, and the subagent architecture. This is more complex than simply telling an AI agent "write the code" and reviewing the output. The complexity is justified by the quality improvement, but it is real.

The broader signal for spec-first development

Superpowers matters beyond its immediate utility because it validates a hypothesis: that specification-driven development is not just a methodology preference. It is the correct architecture for coordinating work between agents — whether those agents are human developers, AI models, or a mix of both.

When Jesse Vincent's team needed to make AI coding agents reliable, they did not invent a new methodology. They took the spec-first pipeline (brainstorm, specify, plan, implement with TDD, verify against the spec) and encoded it as enforceable constraints. The fact that this pipeline works for AI agents as well as it works for human teams is not a coincidence. It suggests that the pipeline's value comes from its structure, from making decisions explicit before code exists and checking results against them, rather than from anything specific to the type of implementer.

For teams that are still debating whether spec-first development is worth the overhead, Superpowers provides a compelling data point. Its repository has over 130,000 GitHub stars. Stars measure interest rather than confirmed use, but interest on that scale in a framework whose entire premise is that specifications must come before code, plans must derive from specifications, and implementations must be verified against specifications suggests that the engineering community is moving toward specification discipline as the default, not the exception.

The age of AI-assisted engineering does not make specifications less important. It makes them essential. The spec is no longer just a communication tool between humans. It is the interface between human intent and machine execution. The clearer the spec, the better the output, whether the reader is a junior developer or a 200-billion-parameter language model. Superpowers is a working demonstration of that.

AI Review Packet to Copy

Use this before an AI-generated diff reaches code review. It turns the prompt, the allowed scope, and the required proof into one reviewable artifact.

AI coding review packet: Superpowers: Open-Source Skills for Spec-First AI Coding

Decision to make:
- Whether AI-generated changes in this project must go through a spec-first pipeline such as Superpowers before they reach code review.

Owner check:
- Product owner:
- Engineering owner:
- QA or operations reviewer:

Scope boundary:
- In scope:
- Out of scope:
- Assumption that still needs approval:

Acceptance evidence:
- Test or fixture:
- Log, metric, or screenshot:
- Manual review step:

AI boundary: generated changes must stay inside the written scope and attach evidence for each acceptance criterion.

Reviewer prompt:
- What would still be ambiguous to someone who missed the planning meeting?
- What evidence would make this safe enough to ship?

Editorial Review Note

Reviewed Apr 28, 2026. This update added a reusable artifact, checked the article against the related topic hub, and tightened the next-step links so the page works as a practical reference rather than a standalone essay.

Editorial note

This article covers Superpowers, an open-source AI agent skills framework, and its relationship to spec-first software delivery. The analysis is based on publicly available source code and documentation. Examples are illustrative engineering scenarios, not legal, tax, or investment advice.