Spec Skills and Spec-First Delivery: What Actually Fits Together

Most AI coding tools are optimized to get something working in the next ten minutes. Spec Skills is optimized for something different: making sure what gets generated is what you actually asked for. This is an overview of how it fits into a spec-first workflow, and an honest account of where it helps and where it won't.

Published on 2026-04-29 · Updated 2026-05-31 · 7 min read · Author: Spec Coding Editorial Team · Review policy: Editorial Policy

Use This Page When

Use this page when the question is how Spec Skills fits into a spec-first workflow: what the model can draft, what a human must approve, and where the boundary sits before code generation. If you need a concrete before/after story, read Spec Skills Case Study: From Ticket to Spec. If you need the actual generator, open AI Coding Spec Packet.

The Problem With "Just Make It Work" AI

I have watched enough teams adopt Cursor, Copilot, and chat-based coding that I can predict the first three weeks. Velocity goes up. Bugs shift from "forgot a null check" to "implemented the wrong thing, correctly". The code looks clean. The diff is small. The feature is not what product asked for.

This is not because the tools are bad. It is because they are optimized for a goal their users rarely state out loud: get something running now, figure out the spec later. When the spec is vague, the model improvises. When acceptance criteria are implicit, it invents its own. The output is plausible, which is the worst failure mode, because nothing trips your review reflex.

The Spec Skills Position

Spec Skills starts from a different premise. The prompt is not a request for code. The prompt is a contract. It carries the spec, the boundary, and the acceptance criteria, and it carries them before any generation happens. If any of those three pieces is missing, Spec Skills will not generate; it will ask for the missing section first.

That is the whole idea in one sentence: generation is gated on a complete, declared spec. Everything else — the hooks, the CI gates, the PR templates — is scaffolding around that single constraint.
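
To make the gate concrete, here is a minimal sketch of what "will not generate until the spec is complete" could look like. The `Spec` shape, the field names, and the message wording are assumptions made for illustration, not Spec Skills' actual format or API.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """Hypothetical spec shape: one field per piece the contract must carry."""
    description: str = ""
    allowed_files: list[str] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)


def missing_sections(spec: Spec) -> list[str]:
    """Return the names of the sections that are still empty."""
    missing = []
    if not spec.description.strip():
        missing.append("spec")
    if not spec.allowed_files:
        missing.append("boundary")
    if not spec.acceptance_criteria:
        missing.append("acceptance criteria")
    return missing


def gate_generation(spec: Spec) -> str:
    """Ask for whatever is missing instead of generating; otherwise report ready."""
    missing = missing_sections(spec)
    if missing:
        return "Cannot generate yet. Please provide: " + ", ".join(missing)
    return "ready"
```

The point of the sketch is the shape of the decision: the check runs before any model call, and its only two outcomes are "ask for the missing section" and "proceed".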

The Constrained-Prompt Loop

The loop looks like this: spec, then boundaries, then prompt, then AI output, then verification against the spec, then merge or reject. The "boundaries" step is where Spec Skills differs most. You declare which files the change is allowed to touch and which public contracts may change. If the model returns a diff that edits a file outside the declared set, Spec Skills flags it and the output does not auto-merge. You can override, but the override is a recorded decision, not a default.
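
The boundary step can be sketched in a few lines. This is an illustration under assumptions: the `Override` record, the status strings, and the file paths in the usage note are hypothetical, and the real tool may store overrides differently.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Override:
    """A recorded decision to accept edits outside the declared file set."""
    approver: str
    reason: str
    files: tuple[str, ...]
    recorded_at: str


def out_of_boundary(changed_files, allowed_files):
    """Return changed paths that are not in the declared file set, sorted."""
    return sorted(set(changed_files) - set(allowed_files))


def review_diff(changed_files, allowed_files, approver=None, reason=None):
    """Return (status, violations, override_record) for a returned diff."""
    violations = out_of_boundary(changed_files, allowed_files)
    if not violations:
        return "clean", [], None
    if approver and reason:
        # An override needs a named approver and a reason, and it is kept.
        record = Override(approver, reason, tuple(violations),
                          datetime.now(timezone.utc).isoformat())
        return "overridden", violations, record
    # No override given: the diff is flagged and must not auto-merge.
    return "flagged", violations, None
```

Calling `review_diff(["payments/handler.py", "billing/invoice.py"], ["payments/handler.py"])` returns `"flagged"` with `billing/invoice.py` listed as the violation. Passing an approver and a reason turns that into `"overridden"` plus a record that can be stored with the PR.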

Verification is also explicit. Acceptance criteria are parsed as Given/When/Then blocks and mapped to tests or runtime checks. If the code passes lint but no criterion has a matching check, Spec Skills will say so. That turns review from "does this look right" into "does this match what we asked for".
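
A rough sketch of the "no criterion has a matching check" report follows. The criterion ids, check names, and the mapping structure are placeholders invented for the example; how Spec Skills links criteria to tests internally is not described here.

```python
def unmatched_criteria(criteria, checks):
    """Return ids of criteria that no check claims to cover.

    criteria: mapping of criterion id -> short summary
    checks:   mapping of check name -> set of criterion ids it covers
    """
    covered = set()
    for criterion_ids in checks.values():
        covered |= set(criterion_ids)
    return [cid for cid in criteria if cid not in covered]


# Placeholder data: three criteria, two checks, one gap.
criteria = {
    "AC-1": "happy path returns a created resource",
    "AC-2": "repeated request returns the original resource",
    "AC-3": "out-of-policy request is rejected",
}
checks = {
    "test_happy_path": {"AC-1"},
    "test_out_of_policy": {"AC-3"},
}

for cid in unmatched_criteria(criteria, checks):
    print(f"{cid} has no matching check: {criteria[cid]}")
# prints: AC-2 has no matching check: repeated request returns the original resource
```

Lint and a green test run say nothing about AC-2 here. The report exists to make that gap visible before review starts.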

A Concrete Example: The Refund Endpoint

A team I worked with used Spec Skills to ship a refund endpoint. The spec was short: refund a captured charge, idempotent on the client-supplied key, reject refunds older than 90 days, emit a refund.created event on success. Declared file set: the payments handler, the refund service, and one test file. Public contract: one new POST route.

Acceptance criteria, in the shape Spec Skills expects:

Given a captured charge from within 90 days
  And an idempotency key not seen before
When POST /refunds is called
Then a refund is issued
  And refund.created is emitted exactly once
  And the response status is 201

Given the same idempotency key is replayed
When POST /refunds is called
Then the original refund is returned
  And no new event is emitted
  And the response status is 200

Given a charge older than 90 days
When POST /refunds is called
Then the response status is 422
  And no refund is created
  And no event is emitted
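
Criteria in this shape are regular enough to parse mechanically. The parser below is a minimal sketch, assuming blank-line-separated blocks and that every line starts with Given, When, Then, or And. It is not the parser Spec Skills uses, and it ignores the rest of Gherkin.

```python
import re


def parse_criteria(text):
    """Parse blank-line-separated Given/When/Then blocks into dicts of clauses."""
    scenarios = []
    for block in re.split(r"\n\s*\n", text.strip()):
        scenario = {"given": [], "when": [], "then": []}
        current = None
        for raw in block.splitlines():
            line = raw.strip()
            if not line:
                continue
            keyword, _, rest = line.partition(" ")
            key = keyword.lower()
            if key in scenario:
                current = key
            elif key != "and" or current is None:
                raise ValueError(f"unexpected line: {line!r}")
            # "And" lines attach to whichever keyword came before them.
            scenario[current].append(rest)
        scenarios.append(scenario)
    return scenarios


example = """
Given a charge older than 90 days
When POST /refunds is called
Then the response status is 422
  And no refund is created
  And no event is emitted
"""
parsed = parse_criteria(example)
print(parsed[0]["then"])
# prints: ['the response status is 422', 'no refund is created', 'no event is emitted']
```

Each entry under `then` is one thing a test or runtime check has to assert, which is what makes the later mapping step possible.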

The first generation passed two of three criteria. The replay case emitted a duplicate event because the model stored idempotency state after emitting. Spec Skills flagged the mismatch, the engineer corrected the ordering, and the PR landed. The value was not speed; it was that nobody had to discover the duplicate-event bug in staging.
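
The ordering fix is easier to see in code. What follows is an in-memory sketch written for this article, not the team's code: the class, the method signature, and the dict-backed storage are all assumptions. One plausible reading of the bug is that a replay arriving between "emit" and "store" found no stored key and emitted again, so the sketch records the key first.

```python
from datetime import datetime, timedelta, timezone

REFUND_WINDOW = timedelta(days=90)


class RefundService:
    """In-memory stand-in for a refund service (hypothetical)."""

    def __init__(self):
        self.refunds_by_key = {}  # idempotency key -> refund record
        self.events = []          # emitted events, in order

    def refund(self, charge_id, captured_at, idempotency_key, now=None):
        now = now or datetime.now(timezone.utc)

        # Replay: return the original refund and emit nothing.
        existing = self.refunds_by_key.get(idempotency_key)
        if existing is not None:
            return 200, existing

        # Reject charges older than the refund window.
        if now - captured_at > REFUND_WINDOW:
            return 422, None

        refund = {"charge_id": charge_id, "key": idempotency_key}
        # Store the idempotency state first, then emit the event.
        self.refunds_by_key[idempotency_key] = refund
        self.events.append(("refund.created", refund))
        return 201, refund
```

A production version would need the key write to be atomic, for example a unique constraint in the datastore, because a dict lookup followed by a write is not safe under concurrent requests. The sketch only shows the order of the two steps.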

What Spec Skills Won't Do

It will not rescue a bad spec. If your acceptance criteria say "the refund should work correctly", Spec Skills will generate something that appears to work correctly. Garbage spec, garbage output — but now the garbage is notarized.

It will not fix a pattern-hungry engineer who wants the AI to invent elegant abstractions. Boundary enforcement specifically discourages invention; if your team rewards cleverness over fit, people will route around the tool.

It will not repair a rubber-stamp review culture. Verification surfaces mismatches; a reviewer still has to read them. A team that approves PRs in under a minute will approve Spec Skills PRs in under a minute, and structure will not save them.

Who Should Adopt It, and Who Shouldn't

If your team already writes technical specs — even short ones, even informal ones — Spec Skills will sharpen what you already do. The friction is low because the habits exist. You will mostly notice that fewer PRs come back for "that is not what I meant".

If your team does not write specs and does not want to start, do not adopt this. The tool will feel like bureaucracy and slow you down. Adopting spec discipline because a tool demands it is a losing sequence; adopting a tool because the discipline finally needs support is the right one.

Integration Points

In practice, Spec Skills lives in three places. A pre-commit SCM hook that refuses commits whose generated diffs fall outside the declared file set. A CI gate that re-runs the acceptance-criteria mapping and blocks merges where criteria are unmatched. A PR template that renders the spec, the boundary, and the Given/When/Then alongside the diff. None of these are novel individually. What is novel is that they all reference the same spec document, so drift becomes visible as a diff rather than a vague feeling.
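
For the pre-commit piece, a hook along these lines would do the job. It is a sketch under assumptions: the boundary file format (one path per line) and the function names are invented here, and only `git diff --cached --name-only` is a real git command.

```python
import subprocess
import sys


def staged_files():
    """List paths staged for commit, as reported by git's diff of the index."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def load_allowed(path):
    """Read one declared path per line from a hypothetical boundary file."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def hook_exit_code(staged, allowed):
    """Return 0 if every staged path is declared, 1 otherwise."""
    outside = sorted(set(staged) - set(allowed))
    for path in outside:
        print(f"outside declared file set: {path}", file=sys.stderr)
    return 1 if outside else 0
```

Wired into a hook script, the last line would be something like `sys.exit(hook_exit_code(staged_files(), load_allowed(boundary_path)))`, where `boundary_path` points at wherever your spec keeps its file set. Git aborts the commit when a pre-commit hook exits nonzero.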

The Trust Model, and Why It's Not "Just Write a Good Prompt"

The spec is trusted. The AI output is not. That asymmetry is the whole point. A good free-form prompt can produce a good result, once, for the engineer who wrote it, in the session where they wrote it. The next engineer writes a different prompt, gets a different result, and the team's behavior drifts across weeks without anyone noticing.

Spec Skills' contribution is durability. The prompt structure persists across sessions, across engineers, and across repos, because it is encoded in the spec format and enforced by the hooks. "Write a good prompt" is a personal skill. "The spec carries the prompt" is a team property. Those are different things, and only the second one survives turnover.
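
One way to picture "the spec carries the prompt" is a renderer that builds the prompt from the spec in a fixed order, so two engineers working from the same spec send the same text. The section labels and dict keys below are assumptions for illustration, not the tool's real template.

```python
def render_prompt(spec):
    """Build prompt text from a spec dict in a fixed, repeatable section order."""
    lines = ["SPEC", spec["description"].strip(), "", "ALLOWED FILES"]
    # Sort the file set so declaration order cannot change the prompt.
    lines += [f"- {path}" for path in sorted(spec["allowed_files"])]
    lines += ["", "ACCEPTANCE CRITERIA"]
    lines += [criterion.strip() for criterion in spec["acceptance_criteria"]]
    return "\n".join(lines)
```

Because the output depends only on the spec, the prompt can be regenerated and diffed like any other artifact, which a hand-typed chat message cannot.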

What This Means in Practice

Adopting Spec Skills is a process decision that ships with tooling. If you already believe specs are the unit of truth and AI output is the unit of review, the tool will feel obvious. If not, it will feel like overhead, and the honest answer is to wait until your failure modes tell you otherwise. There is no shortage of teams shipping fast with free-form AI; there is a shortage of teams whose output six months in still matches what they said they were building.

Workflow Artifact to Copy

Use this when the article becomes part of a Spec Skills run. It keeps the model output tied to a bounded source packet and gives reviewers something concrete to mark up.

Spec Skills workflow packet: Spec Skills and Spec-First Delivery: What Actually Fits Together

Decision to make:
- Decide whether and how Spec Skills fits your spec-first delivery: constrained prompts, spec injection, boundary enforcement, and reviewable AI output.

Owner check:
- Product owner:
- Engineering owner:
- QA or operations reviewer:

Scope boundary:
- In scope:
- Out of scope:
- Assumption that still needs approval:

Acceptance evidence:
- Test or fixture:
- Log, metric, or screenshot:
- Manual review step:

Tool boundary: the model may draft structure and open questions, but owner approval is still required for scope, contract behavior, and release risk.

Reviewer prompt:
- What would still be ambiguous to someone who missed the planning meeting?
- What evidence would make this safe enough to ship?

Keywords: Spec Skills · spec-first AI coding · constrained prompt · boundary enforcement · acceptance criteria · Given/When/Then · AI code review