Spec Skills for Acceptance Criteria Drafting

Spec Coding Editorial Team · Spec-first engineering notes

Drafting testable acceptance criteria with Spec Skills: the structured prompt, the failure-mode pass, and the human review that catches what the AI misses.

Published on 2026-03-10 · Updated 2026-05-06 · 8 min read · Author: Spec Coding Editorial Team · Review policy: Editorial Policy

Why AC Drafting Is the Hardest Spec Section

I have written specs for fifteen years, and the acceptance criteria block is still the part I dread. Goals are easy. Non-goals are cathartic. The data model writes itself. But ACs are where handwaving starts: "the user should see a success message", "errors are handled gracefully", "performance is acceptable". Every one of those is a bug report waiting to be filed, because the contract was never defined.

AC drafting is hard because it forces you to describe behavior exhaustively without drifting into implementation. You have to name failure modes you do not want to think about, decide whether an empty input is a validation error or a no-op, pick a side on whether concurrent edits overwrite or conflict. Most teams skip this and discover it in production. I use Spec Skills for AC drafting precisely because it refuses to let me skip.

How the Spec Skills AC Prompt Differs From Raw LLM Output

When I paste a feature into a raw chat model and ask for acceptance criteria, I get four bullets that read like marketing copy: "user can delete accounts", "deletion is fast", "errors are shown clearly". The Spec Skills AC prompt is structured. It forces every output into Given/When/Then, requires each criterion to reference a specific section of the feature spec, and flags vague predicates like "fast" or "clear" before returning the draft. If my input says "the delete endpoint should be fast", Spec Skills rejects it and asks for a numeric latency budget.
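The vague-predicate check can be approximated with a few lines of Python. This is a minimal sketch, not how Spec Skills implements it: the word list is taken from the examples in this article, and the function names and the "number plus unit" heuristic are assumptions made for illustration.

```python
import re

# Word list drawn from the article's own examples; a real list would be longer.
VAGUE_PREDICATES = ("fast", "clear", "clearly", "gracefully", "acceptable", "responsive")


def find_vague_predicates(criterion: str) -> list[str]:
    """Return the vague predicates found in a criterion, in order of appearance."""
    words = re.findall(r"[a-z]+", criterion.lower())
    return [word for word in words if word in VAGUE_PREDICATES]


def has_numeric_budget(criterion: str) -> bool:
    """True if the criterion contains a number followed by a unit (ms, s, seconds, %)."""
    pattern = r"\d+(\.\d+)?\s*(ms|s|seconds|%|percent)\b"
    return re.search(pattern, criterion.lower()) is not None


def needs_rewrite(criterion: str) -> bool:
    """Flag a criterion that uses a vague predicate without any numeric budget."""
    return bool(find_vague_predicates(criterion)) and not has_numeric_budget(criterion)


print(needs_rewrite("the delete endpoint should be fast"))
print(needs_rewrite("The endpoint returns within 500ms at p95 under 100 concurrent requests"))
```

A keyword match like this only catches the obvious offenders; it is a lint, not a substitute for the review gate described later.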

The other thing Spec Skills does that a raw prompt will not: it asks for five standard cases, every time. Happy path. Validation failure. Auth or permission failure. Concurrency edge case. Rollback or undo. If the feature genuinely does not have one, I have to write "N/A — read-only" rather than omit it silently. That one rule has caught more missing scope than any other review tactic I use.
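The five-case rule is easy to express as a coverage check. The sketch below assumes a draft is a plain dictionary keyed by case name; the key names and the `missing_cases` helper are hypothetical, chosen only to mirror the five cases listed above.

```python
# Hypothetical keys for the five standard cases named in the article.
REQUIRED_CASES = (
    "happy_path",
    "validation_failure",
    "auth_failure",
    "concurrency",
    "rollback",
)


def missing_cases(draft: dict[str, str]) -> list[str]:
    """Return required cases that are absent, blank, or marked N/A without a reason."""
    missing = []
    for case in REQUIRED_CASES:
        entry = draft.get(case, "").strip()
        if not entry:
            missing.append(case)
            continue
        if entry.upper().startswith("N/A"):
            # An explicit N/A is fine, but only when a reason follows it.
            reason = entry[3:].strip(" :-")
            if not reason:
                missing.append(case)
    return missing


draft = {
    "happy_path": "Given an admin ... When ... Then ...",
    "validation_failure": "Given an empty array ... When ... Then ...",
    "auth_failure": "Given a non-admin token ... When ... Then ...",
    "rollback": "N/A: read-only",
}
print(missing_cases(draft))  # ['concurrency']
```

The point of the design is that silence is never accepted: a case is either written out or explicitly waived with a reason someone can challenge.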

A Concrete Example: Bulk Delete Users

A real endpoint I drafted ACs for last month: POST /admin/users/bulk-delete, taking an array of user IDs. My first pass, written by hand, was three bullets: "accepts an array of IDs", "deletes the matching users", "returns a count". The Spec Skills structured pass expanded that to eleven Given/When/Then statements, including this one I would have missed:

Given an admin submits a bulk-delete request containing 500 user IDs
When 3 of those IDs belong to users with active billing subscriptions
Then the endpoint returns 409 Conflict with a body listing the 3 blocked IDs
And no users in the batch are deleted (all-or-nothing semantics)
And an audit log entry records the attempt and the blocking reason

The "all-or-nothing" decision was not in my feature spec. Spec Skills surfaced the ambiguity by asking whether partial success was acceptable, and I went back to product to decide. That conversation happened before a line of code was written, which is exactly where I want it.
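To make the all-or-nothing criterion concrete, here is a minimal in-memory sketch of the behavior it describes. Everything in it is an assumption for illustration: the `bulk_delete` function, the `BulkDeleteResult` type, the dictionary-backed user store, and the 200 status on success (the article only specifies the 409 case).

```python
from dataclasses import dataclass, field


@dataclass
class BulkDeleteResult:
    status: int
    deleted: list[str] = field(default_factory=list)
    blocked: list[str] = field(default_factory=list)


def bulk_delete(
    user_ids: list[str],
    users: dict[str, dict],
    active_subscriptions: set[str],
    audit_log: list[dict],
) -> BulkDeleteResult:
    """Delete every requested user, or none if any has an active subscription."""
    blocked = [uid for uid in user_ids if uid in active_subscriptions]
    if blocked:
        # Record the attempt and the blocking reason, then refuse the whole batch.
        audit_log.append(
            {
                "action": "bulk_delete",
                "outcome": "blocked",
                "reason": "active_subscription",
                "blocked_ids": blocked,
            }
        )
        return BulkDeleteResult(status=409, blocked=blocked)

    for uid in user_ids:
        users.pop(uid, None)
    return BulkDeleteResult(status=200, deleted=list(user_ids))  # 200 is assumed


users = {str(i): {"id": str(i)} for i in range(500)}
audit_log: list[dict] = []
result = bulk_delete(list(users), users, {"3", "7", "11"}, audit_log)
print(result.status, result.blocked, len(users))  # 409 ['3', '7', '11'] 500
```

The check runs over the whole batch before any deletion happens, which is what makes the "no users in the batch are deleted" clause testable.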

The Failure-Mode Pass

After the happy-path ACs are drafted, Spec Skills runs a second pass it calls "what breaks". The prompt is explicit: for each criterion, list the ways this behavior could fail under adversarial input, network partition, race conditions, or operator error. For bulk-delete, that pass generated: what if the array contains duplicate IDs? What if the same admin submits the request twice in two seconds? What if the database connection drops after deleting 200 of 500 users? A raw LLM will happily draft ten happy-path ACs and call it done. Spec Skills makes the second pass mandatory and refuses to return a clean result until the failure cases are explicitly addressed or deferred with a written reason.
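The "addressed or deferred with a written reason" rule can be modeled as a small gate. This is a sketch under assumptions: the `FailureCase` type, the three resolution labels, and the AC identifiers in the example are hypothetical; only the three questions come from the bulk-delete pass above.

```python
from dataclasses import dataclass


@dataclass
class FailureCase:
    question: str
    resolution: str = "open"  # "addressed", "deferred", or "open"
    note: str = ""  # AC reference when addressed, written reason when deferred


def unresolved(cases: list[FailureCase]) -> list[FailureCase]:
    """Return the failure cases that should block a clean result."""
    blocking = []
    for case in cases:
        if case.resolution == "addressed":
            continue
        if case.resolution == "deferred" and case.note.strip():
            continue
        # Open cases and deferrals without a reason both block.
        blocking.append(case)
    return blocking


cases = [
    FailureCase("Array contains duplicate IDs?", "addressed", "AC-4 (hypothetical id)"),
    FailureCase("Same admin submits twice in two seconds?", "deferred", ""),
    FailureCase("Connection drops after deleting 200 of 500 users?"),
]
for case in unresolved(cases):
    print(case.question)
```

Here the second and third questions block: one was deferred without a reason, the other was never answered.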

The Human Review Step: Testability or Reject

Spec Skills drafts. It does not decide. Every AC goes through a human review gate I run personally, and the rule is binary: if I cannot picture a deterministic test case that would pass or fail this criterion, I reject it and rewrite. "The user experience should feel responsive" is a wish. "The endpoint returns within 500ms at p95 under 100 concurrent requests" is an AC, because I can write a load test for it.
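The latency criterion is testable because the pass/fail decision reduces to arithmetic on measured samples. A minimal sketch, assuming the load test has already produced a list of response times in milliseconds and using the nearest-rank definition of a percentile (other definitions interpolate and can give slightly different values):

```python
def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of latency samples in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Integer form of ceil(0.95 * n), which avoids floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_latency_budget(samples_ms: list[float], budget_ms: float = 500) -> bool:
    """True if the p95 of the samples is within the budget."""
    return p95(samples_ms) <= budget_ms


# 95 fast responses and 5 slow ones: the p95 sample is still a fast one.
print(meets_latency_budget([100] * 95 + [900] * 5))  # True
# One more slow response pushes the p95 sample over the budget.
print(meets_latency_budget([100] * 94 + [900] * 6))  # False
```

The wish about responsiveness offers no equivalent function to write, which is the practical meaning of the reject rule.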

I reject roughly 15 to 20 percent of first-pass drafts. Usually the AC is not wrong; it restates implementation ("the service calls the deletion job") rather than observable behavior ("the user record is absent from subsequent GET requests within 2 seconds"). That distinction is where the tool still needs a human.

LLM Failure Modes Spec Skills Mitigates

Three patterns I see constantly in raw LLM AC output:

The AC Library: Reusable Patterns

Over time I have built a small library of AC patterns Spec Skills pulls from when it detects a matching feature shape: pagination (cursor vs offset, boundary conditions), auth failures (401 vs 403, token expiry, scope mismatch), error envelopes (consistent shape, required fields, localization), and idempotency (retry-safe keys, replay windows, conflict semantics). For a new paginated endpoint, Spec Skills suggests the pagination pattern as a starting point and I adapt specifics. This saves maybe 30 minutes per feature and keeps behavior consistent across the API surface.
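One plausible shape for such a library is a lookup table from feature shape to the checks that pattern should cover. The sketch below is an assumption about structure only: the `AC_PATTERNS` name, the keys, and the `suggest_patterns` helper are hypothetical, while the checklist items are the ones listed in the paragraph above.

```python
# Checklist items per feature shape, taken from the patterns named in the article.
AC_PATTERNS: dict[str, list[str]] = {
    "pagination": ["cursor vs offset", "boundary conditions"],
    "auth_failures": ["401 vs 403", "token expiry", "scope mismatch"],
    "error_envelopes": ["consistent shape", "required fields", "localization"],
    "idempotency": ["retry-safe keys", "replay windows", "conflict semantics"],
}


def suggest_patterns(feature_shapes: list[str]) -> dict[str, list[str]]:
    """Return the checklist for each recognized feature shape, skipping unknown ones."""
    return {
        shape: AC_PATTERNS[shape]
        for shape in feature_shapes
        if shape in AC_PATTERNS
    }


# A new paginated, authenticated endpoint starts from two patterns.
for shape, checks in suggest_patterns(["pagination", "auth_failures"]).items():
    print(shape, "->", ", ".join(checks))
```

The returned checklist is a starting point to adapt, not a finished set of criteria.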

Flowing ACs Into Test Scaffolding

Spec Skills emits test stubs from accepted ACs — skeleton cases with Given/When/Then mapped to describe/it blocks, assertion bodies left as TODO. For bulk-delete, the eleven ACs became eleven pending integration tests before I wrote production code. That gave me a concrete definition of done: the feature ships when all eleven pass. No scope creep, no forgotten edge cases, no "we will add tests later".

When Human Intuition Still Beats the Tool

Spec Skills misses three kinds of behavior consistently. Domain knowledge: if a user deletion has regulatory implications (GDPR erasure timelines, SOC 2 audit requirements, HIPAA retention rules), the tool does not know unless I feed the regulations in. Timing-sensitive behavior: cron interactions, webhook retry windows, cache invalidation. The tool drafts the AC if I prompt it, but it does not surface the timing concern on its own. Cross-feature interactions: how does bulk delete interact with the hourly export job? Only a human who knows the system answers that.

Metrics That Tell Me It Is Working

I track two numbers across my last six features. First-pass acceptance rate: around 82 percent. If it were 100, the tool would be drafting things I would have written anyway; if it were below 60, I would be fighting it. Second, missed edge cases found in QA after the spec was locked: down from roughly 4 per feature (pre-Spec Skills) to 1. That remaining edge case is almost always in the domain-knowledge category above, and I am fine with that — it is a human problem, not a prompting problem.

Workflow Artifact to Copy

Use this when the article becomes part of a Spec Skills run. It keeps the model output tied to a bounded source packet and gives reviewers something concrete to mark up.

Spec Skills workflow packet: Spec Skills for Acceptance Criteria Drafting

Decision to make:
- Drafting testable acceptance criteria with Spec Skills: the structured prompt, the failure-mode pass, and the human review that catches what the AI misses.

Owner check:
- Product owner:
- Engineering owner:
- QA or operations reviewer:

Scope boundary:
- In scope:
- Out of scope:
- Assumption that still needs approval:

Acceptance evidence:
- Test or fixture:
- Log, metric, or screenshot:
- Manual review step:

Tool boundary: the model may draft structure and open questions, but owner approval is still required for scope, contract behavior, and release risk.

Reviewer prompt:
- What would still be ambiguous to someone who missed the planning meeting?
- What evidence would make this safe enough to ship?

Editorial Review Note

Reviewed Apr 28, 2026. This update added a reusable artifact, checked the article against the related topic hub, and tightened the next-step links so the page works as a practical reference rather than a standalone essay.

Keywords: acceptance criteria drafting · Spec Skills · Given/When/Then · testable specifications · failure-mode analysis · spec-first AI

