How to Write AI Coding Prompts That Follow Your Spec

Daniel Marsh · Spec-first engineering notes

Why does an AI coding tool add validation logic you never asked for, rename fields that have downstream consumers, and generate helper functions outside the scope of the task? Because you gave it a problem to solve, not a spec to follow. The model optimizes for completeness — and completeness without constraints means scope drift on every prompt. The fix isn't switching models. It's writing prompts grounded in a spec that communicates boundaries as clearly as it communicates the task.

Published on 2026-03-17 · Updated 2026-05-11 · 12 min read · Author: Daniel Marsh · Review policy: Editorial Policy

Field note: the prompt line that prevents scope drift

The most useful prompt sentence is usually the least glamorous one: list the files the model may edit and the behaviors it must not change. Without that line, the assistant treats helpful cleanup as permission.

Prompt boundary:
You may edit only:
- src/billing/refunds.ts
- tests/billing/refunds.spec.ts

Do not change:
- public API response fields
- database schema
- authorization rules
- retry timing

Why AI coding tools drift

When you give a model a feature request without a spec, you're giving it a problem to solve from scratch. It doesn't know what you already have, what you deliberately chose not to include, or where this task ends. Its training biases it toward completeness — adding things that weren't asked for because they appear in most codebases doing similar work.

The drift is subtle. A casual review misses it. The extra parameter gets merged. The renamed field breaks the consumer downstream. By the time you trace it back, the damage is already in production and the model is three conversations away from remembering what it added.

The fix is a better prompt: one grounded in a spec that states constraints as clearly as it states the task. Once you see the difference in output quality, you won't send a vague prompt again.

Prompt Without Spec

"Build a contact deduplication feature for our CRM."

Prompt With Spec

"Implement contact deduplication per this spec:
Goal: merge exact-match duplicates by email
Non-goals: fuzzy matching, manual merge UI
AC: Given two contacts with same email,
    when dedup runs, then keep the one with
    the most recent activity date."

The spec is your most valuable prompt input

A software spec — goal, non-goals, acceptance criteria, data model constraints, edge cases — contains exactly what the model needs to stay in scope. Most prompts omit all of it. They describe what to build but say nothing about what not to build, which fields must not be renamed, which behaviors are out of scope for this particular task.

Before writing any AI prompt for implementation work, your spec should have at minimum:

- A goal stated in one or two sentences
- Explicit non-goals: what this task deliberately does not include
- Acceptance criteria in testable Given/When/Then form
- Data model constraints: existing field names and types that must not change
- Edge cases, including null and error behavior

When these exist, you can quote them directly in the prompt. Quoting the spec is not the same as paraphrasing it. Paraphrasing introduces interpretation — yours, not the spec's. The model needs the actual text to stay anchored to it.

Prompt structure: system role + spec + boundary

A prompt that reliably produces spec-compliant code has three parts: a system role that sets the constraint posture, the spec content that defines what to build, and a task boundary that closes the scope.

System role tells the model its operating mode. Something like: "You are a software engineer implementing a feature from a written spec. Your only job is to produce code that satisfies the spec exactly. Do not add features. Do not refactor adjacent code. Do not rename existing identifiers." This matters because the default mode of most models is to be helpful — which means they add things. You're redirecting that impulse.

Spec content is your actual spec, pasted in. Not summarized. Not paraphrased. The relevant sections — goal, non-goals, acceptance criteria, naming constraints — copied directly. If the spec is long, cut to the sections directly relevant to this task. Don't condense them.

Task boundary says what you want from this specific prompt: one function, one endpoint, one migration. Keeping each prompt focused on a single bounded deliverable makes the output reviewable against the spec, and makes drift easier to spot. "Implement the full feature" is an invitation for the model to invent scope. "Write a single PostgreSQL function named X that implements criterion 2 and criterion 3" is not.

A full prompt example

Here's a complete prompt for implementing a contact deduplication function. Notice what it includes — and what it explicitly forbids:

SYSTEM:
You are a software engineer implementing a feature from a written spec.
Your job is to produce code that satisfies the spec exactly.
Do not add logic not described in the spec.
Do not rename existing fields or functions.
Do not refactor code outside the scope of this task.
If something is unclear, say so — do not guess and implement.

SPEC — Contact Deduplication (v1):

Goal:
Identify duplicate contact records where two or more rows share the same
normalized email address. Mark newer records as duplicates of the oldest match.

Non-goals (do not implement):
- Do not merge contact records
- Do not delete any records
- Do not deduplicate on name, phone, or any field other than email
- Do not add a UI for reviewing duplicates
- Do not send notifications when duplicates are found

Data model (existing — do not modify field names or types):
- contacts.id (uuid, primary key)
- contacts.email (varchar, nullable)
- contacts.created_at (timestamptz)
- contacts.duplicate_of (uuid, nullable, foreign key → contacts.id)

Acceptance criteria:
- Given two contacts share the same normalized email (lowercase, trimmed)
  When the deduplication job runs
  Then the contact with the later created_at has duplicate_of set to the id
       of the contact with the earliest created_at
- Given a contact has a null email
  When the deduplication job runs
  Then that contact is not modified
- Given three contacts share the same email
  When the deduplication job runs
  Then both later contacts have duplicate_of pointing to the earliest one
- Given the job runs twice on the same data
  When no new contacts have been added
  Then no rows are modified on the second run

Edge cases:
- Email normalization: strip whitespace, lowercase only
- Do not treat "user+tag@example.com" as a duplicate of "user@example.com"
- A contact where duplicate_of is already set should not be used as the
  canonical record when new duplicates are found

TASK:
Write a PostgreSQL function named find_and_mark_duplicates() that implements
the deduplication logic above. Return the count of rows updated.
Do not create any additional functions, triggers, or tables.

This prompt leaves no room for helpfulness to go wrong. Every field name is locked. Every out-of-scope feature is named explicitly. Every acceptance criterion is directly testable. The output can be reviewed line by line against the spec.
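
With the spec this tight, a compliant implementation is nearly determined. The deliverable in the prompt is a PostgreSQL function, but a small Python sketch of the same logic is useful as a review oracle: run it against the fixtures from the acceptance criteria and compare results. Everything here is an illustrative assumption, not the prompt's actual output; contact rows are modeled as plain dicts.

```python
def normalize(email):
    # Spec normalization: strip whitespace, lowercase only. None stays None.
    return email.strip().lower() if email is not None else None

def find_and_mark_duplicates(contacts):
    """Mark newer contacts as duplicates of the oldest contact sharing a
    normalized email. Mutates `contacts` in place; returns rows updated.

    Each contact is a dict with keys: id, email, created_at, duplicate_of.
    """
    groups = {}
    for c in contacts:
        key = normalize(c["email"])
        if key is None:  # AC: null-email contacts are never modified
            continue
        groups.setdefault(key, []).append(c)

    updated = 0
    for members in groups.values():
        # Edge case: a row whose duplicate_of is already set must not be
        # chosen as the canonical record.
        candidates = [c for c in members if c["duplicate_of"] is None]
        if len(members) < 2 or not candidates:
            continue
        canonical = min(candidates, key=lambda c: c["created_at"])
        for c in members:
            if c is canonical:
                continue
            # AC: a second run on the same data modifies nothing.
            if c["duplicate_of"] != canonical["id"]:
                c["duplicate_of"] = canonical["id"]
                updated += 1
    return updated
```

Note that "user+tag@example.com" forms its own single-member group and is left alone, because the spec limits normalization to trimming and lowercasing.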

The constraint sentences that actually work

"Do not add features" is easy for a model to rationalize around. Vague constraints get interpreted charitably. These more specific forms are harder to ignore:

- "Do not rename any existing field, function, or identifier."
- "Do not add parameters that are not in the spec's function signature."
- "Do not add logging, metrics, retries, or error handling beyond what the spec describes."
- "Do not refactor or reformat code outside the files listed in this task."
- "If a behavior seems necessary but is not in the spec, stop and ask before implementing."

Each targets a specific drift category. Renaming is one of the most common — the model sees duplicate_of and decides canonical_id is cleaner. Logging is another — it looks like good engineering practice, but it's an out-of-spec change that makes the diff harder to review and the rollback harder to reason about.

Add these constraints to a shared prompt template once, and you stop having to think about them for every task. They become the default operating posture for AI-assisted implementation in your project.
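
One possible skeleton for that shared template, assembled from the three-part structure above (the wording is illustrative, not a fixed format):

```text
SYSTEM:
You are a software engineer implementing a feature from a written spec.
Produce code that satisfies the spec exactly.
Do not add features, fields, parameters, logging, or retries not in the spec.
Do not rename existing identifiers.
Do not refactor code outside the files listed below.
If the spec does not cover a case, stop and ask instead of guessing.

SPEC:
<paste the relevant spec sections verbatim>

TASK:
<one bounded deliverable>
You may edit only: <file list>
```

The angle-bracket slots are the only parts that change per task; the constraint posture stays constant across the project.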

Edge cases in prompts

Edge cases are where AI output diverges from spec intent most reliably. When edge case behavior isn't in the prompt, the model fills the gap with whatever pattern appeared most in its training data. That pattern is often reasonable in the abstract and wrong for your specific context.

Include your spec's edge case section verbatim. Don't summarize. The exact wording of the edge case is often the important part — especially for boundary conditions where the difference between "exclude null emails" and "skip normalization for null emails" changes what the implementation does.

For cases the spec doesn't cover: instruct the model to stop rather than guess. "If you encounter a case not covered by the acceptance criteria or edge case section, output a comment noting the uncovered case and leave the implementation decision to the engineer." This produces a visible artifact — a comment in the code — instead of a silent assumption buried in logic. A comment that says "spec does not define behavior for null email on a contact with duplicate_of already set" is easy to catch. An implicit decision embedded in an if-branch is not.

Re-anchoring after drift

Even well-structured prompts drift over a multi-turn session. By turn three or four, the model is treating its own previous output as ground truth. If it added an extra parameter in turn one, by turn three that parameter is part of the assumed interface and further changes build on top of it.

The correction pattern: re-anchor explicitly by citing the spec, not just your preference. Don't just say "remove the dry_run parameter." Say "the spec states the function signature must be find_and_mark_duplicates() with no parameters. Your previous output added a dry_run parameter. Remove it and do not add parameters not in the spec." The reference to the spec matters. Within the session, it trains the model to treat the written spec as authoritative rather than its own prior output.

For long implementation tasks where drift has accumulated significantly, consider starting a fresh session with the full prompt rather than continuing to correct. The cost of re-anchoring a heavily drifted session is usually higher than starting clean and getting consistent output from the beginning.

The spec checklist before prompting

A spec written after implementation is not useful as a prompt input. It needs to exist before you start prompting, and it needs to contain the decisions that would otherwise get made by inference. Run through this before writing the first AI prompt for any implementation task:

- Is the goal stated in one or two sentences?
- Are non-goals listed explicitly, not just implied?
- Is every acceptance criterion written in testable Given/When/Then form?
- Are the existing field, function, and endpoint names that must not change written down?
- Is null and error behavior defined for each input?
- Is the scope boundary clear: which files may be edited, which behaviors must not change?

Any item missing from the spec before prompting is a decision the AI will make on your behalf. That's sometimes acceptable for low-stakes details. For field naming, error behavior, and scope boundaries, the AI's default answer is unlikely to match what the team agreed on. Write those down first.

Reviewing the output against the spec

The final step is structured verification — comparing the output against the acceptance criteria, criterion by criterion. Not a general code review. A specific check: does this implementation satisfy each criterion, and only each criterion?

A useful approach is to write test cases from the acceptance criteria before reviewing the implementation. Because the criteria are in Given/When/Then format, each one maps directly to a test: the Given sets up the fixture, the When calls the function, the Then is the assertion. Writing these tests first makes it much harder for an over-implemented output to pass review unnoticed.
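
The mapping is mechanical. A pytest-style sketch for the first criterion, using an in-memory stand-in for the dedup job (the names and the stub are hypothetical, not the article's deliverable):

```python
def run_dedup(contacts):
    # Minimal stand-in for the deduplication job, just enough to make the
    # Given/When/Then mapping concrete. The real target would be the
    # spec's find_and_mark_duplicates().
    seen = {}
    for c in sorted(contacts, key=lambda c: c["created_at"]):
        key = c["email"].strip().lower() if c["email"] is not None else None
        if key is None:
            continue
        if key in seen:
            c["duplicate_of"] = seen[key]["id"]
        else:
            seen[key] = c

def test_later_contact_points_at_earliest():
    # Given: two contacts share the same normalized email
    older = {"id": "a", "email": "User@Example.com", "created_at": 1, "duplicate_of": None}
    newer = {"id": "b", "email": " user@example.com ", "created_at": 2, "duplicate_of": None}
    # When: the deduplication job runs
    run_dedup([older, newer])
    # Then: the later contact's duplicate_of is the earliest contact's id
    assert newer["duplicate_of"] == "a"
    assert older["duplicate_of"] is None
```

One test per criterion, fixture from the Given, call from the When, assertion from the Then; an output that adds behavior beyond the criteria then stands out as code no test demands.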

When the output passes all acceptance criteria and introduces no behavior outside the spec, the task is done. When it adds behavior — even correct-looking behavior — that behavior needs to either be added to the spec as a deliberate decision or removed from the implementation. The spec is the reference. The implementation matches it, or you update the spec first. Not the other way around.

Prompt review example: constrain the assistant before generation

The difference between useful AI coding and drift is often two paragraphs of constraints. I now include both the spec excerpt and a "do not invent" clause before asking for code.

Prompt addition:
Use only the behavior in the spec below.
Do not add fields, routes, background jobs, retries, or UI states that are not named.
For every code change, include a checklist item mapping it to:
- Goal
- Non-goal
- Acceptance criterion
- Edge case
If a behavior seems necessary but is not in the spec, ask before implementing.

This turns the spec into a boundary. The assistant may still make mistakes, but the review has something concrete to compare against.

AI Review Packet to Copy

Use this before an AI-generated diff reaches code review. It turns the prompt, the allowed scope, and the required proof into one reviewable artifact.

AI coding review packet: How to Write AI Coding Prompts That Follow Your Spec

Decision to make:
- AI coding tools drift without constraints — adding fields, renaming functions, expanding scope.

Owner check:
- Product owner:
- Engineering owner:
- QA or operations reviewer:

Scope boundary:
- In scope:
- Out of scope:
- Assumption that still needs approval:

Acceptance evidence:
- Test or fixture:
- Log, metric, or screenshot:
- Manual review step:

AI boundary: generated changes must stay inside the written scope and attach evidence for each acceptance criterion.

Reviewer prompt:
- What would still be ambiguous to someone who missed the planning meeting?
- What evidence would make this safe enough to ship?

Flagship Use Path

This is one of the primary Spec Coding references for Spec-constrained AI prompts. Use it with a real ticket, pull request, or release review instead of treating it as background reading.

Flagship review path:
- Open this page during planning or review.
- Copy the relevant artifact into the work item.
- Replace example values with your system, owner, and failure mode.
- Block implementation if the evidence line is still blank.

Second-pass reviewer note: prompts need non-goals as much as goals

I checked that this article does not treat prompt quality as wording magic. The real control is narrower: the prompt must carry scope, non-goals, evidence, and a reviewable output format.

Prompt review:
- Goal tells the model what to implement.
- Non-goals tell it where to stop.
- Evidence tells the reviewer what to inspect.
- Output format keeps the answer useful after the first generation.

Editorial Review Note

Reviewed Apr 29, 2026. This update added a reusable artifact, checked the article against the related topic hub, and tightened the next-step links so the page works as a practical reference rather than a standalone essay.




Consolidated Coverage

This canonical guide now covers several related notes that used to live as separate pages. Keeping them together makes "How to Write AI Coding Prompts That Follow Your Spec" easier to review, link, and use as the main reference.

  • Spec Skills Prompt Library for Product Teams
  • Spec Skills Prompt Patterns for Spec Workflows