Testable AI tasks

AI Coding Acceptance Criteria

AI coding goes off course when success is described as a vibe. Acceptance criteria turn the prompt into observable behavior that tests, reviewers, and agents can all use.

Last updated: May 25, 2026

acceptance-criteria.md

Acceptance criteria:
- [ ] Given ...
      When ...
      Then ...
- [ ] Failure path:
- [ ] Permission path:
- [ ] Empty or duplicate state:

Evidence:
- Test:
- Manual check:

What Good Criteria Cover

Happy path

The expected behavior, state transition, response, UI update, or side effect.

Failure path

Validation errors, provider failures, timeouts, conflicts, and rollback behavior.

Boundary path

Permissions, empty states, duplicate requests, concurrency, and non-goals.

Acceptance Criteria Are The Agent Contract

A prompt can be persuasive without being testable. Acceptance criteria give the agent a target and give the reviewer a refusal mechanism when the diff adds behavior nobody asked for.

For AI work, write criteria that are concrete enough to map to tests or screenshots. If a criterion cannot be proven, it is probably still a requirement draft rather than an implementation contract.

Criteria Quality Check

Each criterion has a Given, When, and Then or an equivalent state, trigger, and result.
At least one criterion covers failure behavior.
At least one criterion covers permission, duplicate, empty, or boundary behavior.
The criterion can be proven without asking the product owner what they meant.
Review evidence maps back to criterion IDs.
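The checklist above can be partly automated. The following is a minimal sketch, not a standard tool: it assumes criteria are written in the `AC-n Title` plus `- Given / When / Then / Evidence` layout used on this page, and it guesses the failure and boundary categories from words in the title. The names `parse_criteria`, `check_quality`, and both keyword lists are hypothetical.

```python
import re
from dataclasses import dataclass, field

HEADER = re.compile(r"^(AC-\d+)\s+(.+)$")
# Assumption: the category is recognizable from the criterion title.
FAILURE_WORDS = ("failure", "error", "rejected", "timeout")
BOUNDARY_WORDS = ("permission", "duplicate", "empty", "boundary")


@dataclass
class Criterion:
    id: str
    title: str
    fields: dict[str, str] = field(default_factory=dict)


def parse_criteria(text: str) -> list[Criterion]:
    """Split a criteria document into numbered criteria and their fields."""
    criteria: list[Criterion] = []
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            criteria.append(Criterion(header.group(1), header.group(2)))
        elif line.startswith("- ") and criteria:
            keyword, _, rest = line[2:].partition(" ")
            criteria[-1].fields[keyword.rstrip(":").lower()] = rest.strip()
    return criteria


def check_quality(criteria: list[Criterion]) -> list[str]:
    """Return human-readable problems; an empty list means the draft passes."""
    problems = []
    for c in criteria:
        for keyword in ("given", "when", "then", "evidence"):
            if not c.fields.get(keyword):
                problems.append(f"{c.id}: missing {keyword}")
    titles = [c.title.lower() for c in criteria]
    if not any(word in t for t in titles for word in FAILURE_WORDS):
        problems.append("no failure-path criterion")
    if not any(word in t for t in titles for word in BOUNDARY_WORDS):
        problems.append("no permission, duplicate, empty, or boundary criterion")
    return problems
```

A check like this only catches missing structure. Whether a criterion can be proven without asking the product owner still needs a human reader.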

Copy-Ready Criteria Template

Use this block inside any spec.md or AI coding packet. Number each criterion so review evidence can map back to it.

acceptance-criteria.md

## Acceptance Criteria

AC-1 Happy path
- Given ...
- When ...
- Then ...
- Evidence:

AC-2 Failure path
- Given ...
- When ...
- Then ...
- Evidence:

AC-3 Permission or boundary path
- Given ...
- When ...
- Then ...
- Evidence:

AC-4 Non-goal guard
- Given the implementation is complete
- When the diff is reviewed
- Then no out-of-scope file, schema, API, dependency, or UI behavior has changed.
- Evidence:

Filled Example

A coupon checkout request becomes useful only after it says what happens to totals, errors, and duplicate submissions.

filled-example.md

AC-1 Valid coupon applies discount
- Given cart total is $120 and coupon SAVE20 ($20 off) is active
- When the user applies SAVE20
- Then cart total is reduced by $20 to $100 and the discount line shows "SAVE20"
- Evidence: checkout coupon test and UI screenshot

AC-2 Expired coupon is rejected
- Given coupon SPRING10 expired yesterday
- When the user applies SPRING10
- Then the API returns coupon_expired and cart total is unchanged
- Evidence: API integration test

AC-3 Duplicate apply is idempotent
- Given SAVE20 is already applied
- When the user clicks Apply again
- Then no second discount line is added
- Evidence: duplicate apply test
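Each criterion in the filled example can map to one named test. The sketch below is illustrative only: `Cart`, `Coupon`, and `CouponError` are hypothetical in-memory stand-ins for the real checkout code, SAVE20 is treated as a fixed $20 discount, and the reference date is an arbitrary choice.

```python
from dataclasses import dataclass, field
from datetime import date


class CouponError(Exception):
    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


@dataclass(frozen=True)
class Coupon:
    code: str
    amount_off: int  # whole dollars; assumes a fixed-amount coupon
    expires: date    # last day the coupon is valid


@dataclass
class Cart:
    subtotal: int
    discounts: dict[str, int] = field(default_factory=dict)

    def apply(self, coupon: Coupon, today: date) -> None:
        if coupon.expires < today:
            raise CouponError("coupon_expired")
        # Keyed by code, so applying the same coupon twice keeps one line.
        self.discounts[coupon.code] = coupon.amount_off

    @property
    def total(self) -> int:
        return self.subtotal - sum(self.discounts.values())


TODAY = date(2026, 5, 25)
SAVE20 = Coupon("SAVE20", 20, date(2026, 12, 31))
SPRING10 = Coupon("SPRING10", 10, date(2026, 5, 24))  # expired yesterday


def test_ac1_valid_coupon_applies_discount():
    cart = Cart(subtotal=120)
    cart.apply(SAVE20, TODAY)
    assert cart.total == 100
    assert list(cart.discounts) == ["SAVE20"]


def test_ac2_expired_coupon_is_rejected():
    cart = Cart(subtotal=120)
    try:
        cart.apply(SPRING10, TODAY)
    except CouponError as err:
        assert err.code == "coupon_expired"
    else:
        raise AssertionError("expected coupon_expired")
    assert cart.total == 120


def test_ac3_duplicate_apply_is_idempotent():
    cart = Cart(subtotal=120)
    cart.apply(SAVE20, TODAY)
    cart.apply(SAVE20, TODAY)
    assert list(cart.discounts) == ["SAVE20"]
    assert cart.total == 100
```

Putting the AC number in the test name lets a reviewer match a failing test to the criterion it proves without opening the spec.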

Real scenario: refund action cannot run twice

A support team adds a refund button to an internal order page. The risky behavior is not the button itself; it is what happens when the first request times out, the support user clicks again, or another support user opens the same order.

Request shape

Write ACs for one captured payment, one refund action, one pending provider state, and one replayed request with the same idempotency key.

Failure behavior

A timeout does not mean a retry is safe. The criteria should say whether the order remains pending, whether the button is disabled, and which error the support user sees.

Evidence to keep

Attach a replay test, a permission test for non-support users, and a screenshot of the pending refund state before allowing generated code to merge.
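One way to picture the behavior these criteria describe is the sketch below. It is a simplified in-memory model under stated assumptions, not a payment integration: `RefundService`, the `provider` callable, the `"support"` role string, and the choice to leave a timed-out refund in a pending state are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class RefundStatus(Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"


class ProviderTimeout(Exception):
    """The provider did not answer; the refund outcome is unknown."""


class PermissionDenied(Exception):
    pass


@dataclass
class Refund:
    order_id: str
    idempotency_key: str
    status: RefundStatus


class RefundService:
    def __init__(self, provider: Callable[[str, str], None]):
        self._provider = provider  # may raise ProviderTimeout
        self._by_key: dict[str, Refund] = {}
        self._by_order: dict[str, Refund] = {}

    def refund(self, role: str, order_id: str, key: str) -> Refund:
        if role != "support":
            raise PermissionDenied(role)
        # A replayed key, or a second user on the same order, gets the
        # existing refund back and never triggers a second provider call.
        existing = self._by_key.get(key)
        if existing is None:
            existing = self._by_order.get(order_id)
        if existing is not None:
            return existing
        refund = Refund(order_id, key, RefundStatus.PENDING)
        self._by_key[key] = refund
        self._by_order[order_id] = refund
        try:
            self._provider(order_id, key)
        except ProviderTimeout:
            return refund  # stays pending until reconciled elsewhere
        refund.status = RefundStatus.SUCCEEDED
        return refund
```

A replay test then asserts that two calls with the same key return the same refund and reach the provider once, and a permission test asserts that a non-support role raises `PermissionDenied`.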

How To Turn Criteria Into Review Evidence

Acceptance criteria are useful only when they survive the whole path from prompt to pull request. Write them so the coding agent can implement against them, the test author can prove them, and the reviewer can reject any diff that does not map back to a numbered criterion.

Name the observable result

A good criterion does not say the checkout is improved or the API is robust. It names the visible state, response field, database effect, event, permission result, or error message that must exist after the action. That makes generated code easier to test and easier to refuse.

Attach evidence early

Write the expected proof beside the criterion before implementation starts. The evidence can be a unit test, integration test, screenshot, log query, metric, migration output, or manual QA note. If no evidence path exists, the criterion is probably still too vague.

Use criteria as a scope filter

During review, every changed behavior should point to an AC number. If the agent added a helpful extra state, renamed a contract, or touched a nearby abstraction without a criterion, treat it as drift and move it into a follow-up spec instead of merging it silently.
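The scope filter can be sketched as a small script. It assumes a hypothetical convention where the pull request lists each changed behavior on its own line as `- [AC-n] description`, with `[none]` for changes that have no criterion; neither the convention nor `find_drift` comes from an existing tool.

```python
import re

# Hypothetical change-log line format: "- [AC-2] description" or "- [none] description"
CHANGE_LINE = re.compile(r"^- \[(AC-\d+|none)\]\s+(.+)$")


def find_drift(change_log: str, known_ids: set[str]) -> list[str]:
    """Return descriptions of changes that do not point to a known criterion."""
    drift = []
    for raw in change_log.splitlines():
        match = CHANGE_LINE.match(raw.strip())
        if not match:
            continue  # ignore headings and free text
        reference, description = match.groups()
        if reference not in known_ids:
            drift.append(description)
    return drift


if __name__ == "__main__":
    log = """
    - [AC-1] discount line shows the coupon code
    - [none] renamed CartTotals to CartSummary
    """
    for item in find_drift(log, {"AC-1", "AC-2", "AC-3"}):
        print("move to follow-up spec:", item)
```

Anything the script returns is a candidate for a follow-up spec rather than a silent merge.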

Review Questions Before Adoption

Acceptance criteria for AI coding are not just a prompt for an AI tool. They should help people decide whether the task is ready for implementation, who owns unresolved questions, and which evidence will remain with the pull request. Use these questions in team conventions, PR templates, or pre-implementation review.

Who owns the decision?

Before adopting acceptance criteria for AI coding work, name the person or role allowed to approve scope changes. Open questions without an owner become an invitation for the agent to fill gaps, and they make reviewers discover product, data, or permission decisions only after the code exists.

What blocks implementation?

Separate questions that must be answered from questions that can move forward as accepted risk. Blocking items usually include public APIs, data migrations, permission boundaries, payment behavior, rollback paths, and user-facing copy. If these are unclear, do not ask an agent to generate production code yet.

What evidence stays with the PR?

The final pull request should link acceptance-criteria.md, list changed files, name skipped checks, and map tests, screenshots, logs, or metrics back to acceptance criteria. Without that record, future readers have to reverse-engineer the decision from the diff.
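A short sketch of that record follows, assuming evidence is collected as a mapping from criterion ID to a list of proof items. The function names and the checklist format are illustrative, not a required PR template.

```python
def unproven(criterion_ids: list[str], evidence: dict[str, list[str]]) -> list[str]:
    """Criteria that have no evidence attached yet."""
    return [ac_id for ac_id in criterion_ids if not evidence.get(ac_id)]


def evidence_report(criterion_ids: list[str], evidence: dict[str, list[str]]) -> str:
    """Render one checklist line per criterion for the PR description."""
    lines = []
    for ac_id in criterion_ids:
        items = evidence.get(ac_id, [])
        mark = "x" if items else " "
        detail = ", ".join(items) if items else "MISSING"
        lines.append(f"- [{mark}] {ac_id}: {detail}")
    return "\n".join(lines)


if __name__ == "__main__":
    ids = ["AC-1", "AC-2", "AC-3"]
    proof = {
        "AC-1": ["checkout coupon test", "UI screenshot"],
        "AC-2": ["API integration test"],
    }
    print(evidence_report(ids, proof))
    print("blocked on:", unproven(ids, proof))
```

Pasting the report into the pull request keeps the mapping next to the diff, so a later reader does not have to reconstruct it.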

Weak Criteria Patterns

Adjective criteria

"Fast," "smooth," "secure," and "intuitive" are not enough unless the threshold or observable behavior is named.

Implementation criteria

"Use a helper function" is a task, not an outcome. Criteria should describe behavior.

Missing evidence

Each criterion needs a proof path: test case, screenshot, log, metric, or manual check.
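Adjective criteria are the easiest of these patterns to flag mechanically. The sketch below is a rough heuristic under loud assumptions: the word list is a hypothetical starting set, and any digit in the sentence is taken as a sign that a threshold was named, which will miss some cases and wrongly pass others.

```python
import re

# Hypothetical starting list; extend it with your team's own weasel words.
VAGUE_WORDS = {"fast", "smooth", "secure", "intuitive", "robust", "improved"}


def vague_terms(criterion: str) -> list[str]:
    """Return vague adjectives used without any numeric threshold."""
    if re.search(r"\d", criterion):
        return []  # assumption: a number means a threshold is stated
    words = set(re.findall(r"[a-z]+", criterion.lower()))
    return sorted(VAGUE_WORDS & words)


if __name__ == "__main__":
    print(vague_terms("Then the checkout is fast and intuitive"))
    print(vague_terms("Then the search responds in under 300 ms"))
```

A flagged criterion is a prompt to rewrite, for example by replacing "fast" with a latency bound or "secure" with the specific permission check that must fail.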

Related Resources

Use the criteria as the spine for the rest of the AI coding workflow.

Given-When-Then Guide

Learn the structure behind pass/fail acceptance criteria.

AI PR Review Checklist

Map criteria to test evidence before accepting generated code.

Vibe Coding vs Spec Coding

See how vague requests become criteria-driven AI tasks.

AI Acceptance Criteria FAQ

How many criteria does an AI coding task need?

Most small tasks need three to seven criteria: a happy path, a failure path, and at least one boundary or permission case.

Should criteria mention implementation files?

Usually no. Put files in scope sections. Criteria should describe behavior and evidence.

Can AI write the criteria?

It can draft them, but a human should approve the behavior, risk, and evidence before code generation starts.

Before asking an agent to code, write the criteria as if a reviewer must test the feature without your help.
