Testable AI tasks

AI Coding Acceptance Criteria

AI coding goes off course when success is described as a vibe. Acceptance criteria turn the prompt into observable behavior that tests, reviewers, and agents can all use.

Last updated: May 25, 2026

What Good Criteria Cover

Happy path

The expected behavior, state transition, response, UI update, or side effect.

Failure path

Validation errors, provider failures, timeouts, conflicts, and rollback behavior.

Boundary path

Permissions, empty states, duplicate requests, concurrency, and non-goals.

Acceptance Criteria Are The Agent Contract

A prompt can be persuasive without being testable. Acceptance criteria give the agent a target and give the reviewer concrete grounds to reject a diff that adds behavior nobody asked for.

For AI work, write criteria that are concrete enough to map to tests or screenshots. If a criterion cannot be proven, it is probably still a requirement draft rather than an implementation contract.

Criteria Quality Check

  • Each criterion has a Given, When, and Then or an equivalent state, trigger, and result.
  • At least one criterion covers failure behavior.
  • At least one criterion covers permission, duplicate, empty, or boundary behavior.
  • The criterion can be proven without asking the product owner what they meant.
  • Review evidence maps back to criterion IDs.
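Parts of this checklist can be checked mechanically before a human reads the criteria. The sketch below is one possible linter, assuming criteria are written in the `AC-n` plus Given/When/Then/Evidence layout used on this page. The function names and the keyword lists are hypothetical, and the title keyword matching is a rough heuristic, not a substitute for review.

```python
import re

FIELDS = ("Given", "When", "Then", "Evidence")
# Hypothetical keyword lists; tune them to your own criterion titles.
FAILURE_HINTS = ("fail", "error", "reject", "timeout", "invalid")
BOUNDARY_HINTS = ("permission", "duplicate", "empty", "boundary")


def parse_criteria(text):
    """Parse 'AC-n Title' headers and '- Given/When/Then/Evidence' lines."""
    criteria = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"(AC-\d+)\s+(.+)", line)
        if header:
            current = {"title": header.group(2), "fields": {}}
            criteria[header.group(1)] = current
            continue
        field = re.match(r"-\s*(Given|When|Then|Evidence)\b:?\s*(.*)", line)
        if field and current is not None:
            current["fields"][field.group(1)] = field.group(2).strip()
    return criteria


def lint_criteria(text):
    """Return a list of human-readable problems; an empty list means no findings."""
    criteria = parse_criteria(text)
    problems = []
    for cid, item in criteria.items():
        for name in FIELDS:
            # A blank value or a leftover "..." placeholder counts as missing.
            if item["fields"].get(name, "") in ("", "..."):
                problems.append(f"{cid}: missing {name}")
    titles = " ".join(item["title"].lower() for item in criteria.values())
    if not any(hint in titles for hint in FAILURE_HINTS):
        problems.append("no criterion covers failure behavior")
    if not any(hint in titles for hint in BOUNDARY_HINTS):
        problems.append(
            "no criterion covers permission, duplicate, empty, or boundary behavior"
        )
    return problems


draft = """AC-1 Valid coupon applies discount
- Given cart total is $120 and coupon SAVE20 is active
- When the user applies SAVE20
- Then order subtotal is reduced by $20
"""
for problem in lint_criteria(draft):
    print(problem)
```

Run on the one-criterion draft above, the linter reports the missing Evidence line and the absent failure and boundary cases. It cannot judge the fourth checklist item: whether a criterion is unambiguous still needs a human reader.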

Copy-Ready Criteria Template

Use this block inside any spec.md or AI coding packet. Number each criterion so review evidence can map back to it.

acceptance-criteria.md
## Acceptance Criteria

AC-1 Happy path
- Given ...
- When ...
- Then ...
- Evidence:

AC-2 Failure path
- Given ...
- When ...
- Then ...
- Evidence:

AC-3 Permission or boundary path
- Given ...
- When ...
- Then ...
- Evidence:

AC-4 Non-goal guard
- Given the implementation is complete
- When the diff is reviewed
- Then no out-of-scope file, schema, API, dependency, or UI behavior has changed.
- Evidence:
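
The file part of the AC-4 non-goal guard can be checked with a few lines of code. This is a minimal sketch, assuming the task's scope is expressed as a list of glob patterns and the changed paths come from the diff (for example, the output of `git diff --name-only`). The function name, patterns, and paths are hypothetical, and schema, API, or UI behavior changes still need review by other means.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_paths, allowed_patterns):
    """Return the changed paths that match none of the allowed glob patterns."""
    return [
        path
        for path in changed_paths
        if not any(fnmatchcase(path, pattern) for pattern in allowed_patterns)
    ]


# Hypothetical scope for a coupon task; in fnmatch, "*" also matches "/".
allowed = ["src/checkout/*", "tests/checkout/*"]
changed = [
    "src/checkout/coupon.py",
    "tests/checkout/test_coupon.py",
    "package.json",
]
for path in out_of_scope(changed, allowed):
    print(f"AC-4 violation: {path}")
```

Here `package.json` is flagged because a dependency change was never in scope, which is the kind of finding the guard exists to surface.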

Filled Example

A coupon checkout request becomes useful only after it says what happens to totals, errors, and duplicate submissions.

filled-example.md
AC-1 Valid coupon applies discount
- Given cart total is $120 and coupon SAVE20 ($20 off) is active
- When the user applies SAVE20
- Then the cart total is reduced by $20 to $100 and the discount line shows "SAVE20"
- Evidence: checkout coupon test and UI screenshot

AC-2 Expired coupon is rejected
- Given coupon SPRING10 expired yesterday
- When the user applies SPRING10
- Then the API returns coupon_expired and cart total is unchanged
- Evidence: API integration test

AC-3 Duplicate apply is idempotent
- Given SAVE20 is already applied
- When the user clicks Apply again
- Then no second discount line is added
- Evidence: duplicate apply test
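
Each criterion above can map to one test whose name carries the criterion ID, so review evidence points straight back to it. The sketch below uses a tiny in-memory cart as a stand-in for the real checkout. The `Cart`, `Coupon`, and `CouponExpired` names are hypothetical, SAVE20 is assumed to be a flat $20 discount, and the `coupon_expired` code is modeled as an exception attribute rather than a real API response.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


class CouponExpired(Exception):
    code = "coupon_expired"


@dataclass
class Coupon:
    code: str
    amount_off: int  # whole dollars, assumed flat discount
    expires_on: date


@dataclass
class Cart:
    subtotal: int
    discounts: dict = field(default_factory=dict)

    def apply(self, coupon, today):
        if coupon.expires_on < today:
            raise CouponExpired(coupon.code)
        # Keyed by coupon code, so applying the same coupon twice is a no-op.
        self.discounts[coupon.code] = coupon.amount_off

    @property
    def total(self):
        return self.subtotal - sum(self.discounts.values())


TODAY = date(2026, 5, 25)
SAVE20 = Coupon("SAVE20", 20, TODAY + timedelta(days=30))
SPRING10 = Coupon("SPRING10", 10, TODAY - timedelta(days=1))


def test_ac1_valid_coupon_applies_discount():
    cart = Cart(subtotal=120)
    cart.apply(SAVE20, TODAY)
    assert cart.total == 100
    assert list(cart.discounts) == ["SAVE20"]


def test_ac2_expired_coupon_is_rejected():
    cart = Cart(subtotal=120)
    try:
        cart.apply(SPRING10, TODAY)
    except CouponExpired as err:
        assert err.code == "coupon_expired"
    else:
        raise AssertionError("expected coupon_expired")
    assert cart.total == 120  # cart total is unchanged


def test_ac3_duplicate_apply_is_idempotent():
    cart = Cart(subtotal=120)
    cart.apply(SAVE20, TODAY)
    cart.apply(SAVE20, TODAY)
    assert len(cart.discounts) == 1
    assert cart.total == 100
```

The functions follow the `test_` naming convention, so a runner such as pytest would collect them, and a reviewer can search for `ac2` to find the evidence for AC-2.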

Weak Criteria Patterns

Adjective criteria

Words like "fast", "smooth", "secure", and "intuitive" are not enough unless a threshold or observable behavior is named.

Implementation criteria

"Use a helper function" is a task, not an outcome. Criteria should describe behavior.

Missing evidence

Each criterion needs a proof path: test case, screenshot, log, metric, or manual check.
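
The adjective pattern is the easiest of these to catch automatically. The sketch below is a rough heuristic: it flags the four adjectives named above when the criterion contains no number that could serve as a threshold. The word list and the "any digit counts as a threshold" rule are assumptions for illustration, and a criterion can pass this check while still being vague.

```python
import re

VAGUE_WORDS = ("fast", "smooth", "secure", "intuitive")


def vague_terms(criterion):
    """Return vague adjectives found in a criterion that names no number."""
    if re.search(r"\d", criterion):
        # A digit is treated as a stated threshold (crude, by assumption).
        return []
    words = set(re.findall(r"[a-z]+", criterion.lower()))
    return [word for word in VAGUE_WORDS if word in words]


print(vague_terms("Then the search page is fast and intuitive"))
print(vague_terms("Then search results render within 300 ms"))
```

The first criterion is flagged for both adjectives; the second passes because it names a measurable threshold.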

Related Resources

Use the criteria as the spine for the rest of the AI coding workflow.

Given-When-Then Guide

Learn the structure behind pass/fail acceptance criteria.

AI PR Review Checklist

Map criteria to test evidence before accepting generated code.

Vibe Coding vs Spec Coding

See how vague requests become criteria-driven AI tasks.

AI Acceptance Criteria FAQ

How many criteria does an AI coding task need?

Most small tasks need three to seven criteria: a happy path, a failure path, and at least one boundary or permission case.

Should criteria mention implementation files?

Usually no. Put files in scope sections. Criteria should describe behavior and evidence.

Can AI write the criteria?

It can draft them, but a human should approve the behavior, risk, and evidence before code generation starts.

Before asking an agent to code, write the criteria as if a reviewer must test the feature without your help.
