Same Feature, Two Approaches: Vibe Coding vs Spec Coding


Vibe coding is intoxicating. You describe what you want in plain language, the AI writes the code, and ten minutes later you have a working endpoint. I was sold — until I shipped a refund feature that way and spent the next two weeks patching bugs that a 90-minute spec would have prevented entirely. This article walks through both paths side by side, using the exact same requirement, so you can see where the gap opens up.

Tags: AI Coding · Case Studies · Process

Published on 2026-04-11 · Updated 2026-05-06 · 10 min read · Author: Spec Coding Editorial Team · Review policy: Editorial Policy

The requirement

An e-commerce platform needs an order refund feature. The product manager's brief is straightforward:

- Support full and partial refunds
- Call the payment gateway (Stripe-style) to reverse the charge
- Track refund status: pending, processing, succeeded, failed
- Customer support agents trigger refunds through an internal tool

Simple enough, right? Both approaches start here. What happens next is where they diverge — dramatically.

Figure: Same starting point, two very different journeys. (Vibe coding goes straight to prompting the AI; spec coding starts by writing a specification document.)

Path A: Vibe coding

The vibe coding approach starts with a prompt:

The Prompt

"Build me an order refund API in Node.js. Support full and partial
refunds. Call a payment gateway to reverse the charge. Track refund
status. Use Express and Postgres."

Sixty seconds later, the AI produces a clean RefundController with createRefund and getRefundStatus endpoints. It validates that the order exists, checks the refund amount against the order total, calls a paymentGateway.refund() method, and saves the result. The code looks professional. The happy path works.

Ship it.

Bug #1: The double refund

A support agent clicks the refund button, the page hangs for a second, they click again. Two refunds go through for the same order. The original code has no idempotency check.

Fix prompt: "Add a check to prevent duplicate refunds for the same order."

The AI adds a database query: if an identical refund (same order, same amount) already exists, reject the request. This works, until it doesn't.

Bug #2: Partial refund overflow

A customer bought a $200 order. Support issues a $50 partial refund, then another $80, then another $100. Total refunded: $230 — more than the order. The duplicate check only looks for exact duplicates, not cumulative amounts.

Fix prompt: "Track cumulative refund amounts and reject refunds that would exceed the order total."

The AI adds a SUM(amount) query. But it's not wrapped in a transaction with the insert, so two concurrent partial refunds can both pass the check.
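To make that race concrete, here is a small deterministic simulation. It is only a sketch: an in-memory array stands in for the refunds table, the two requests are interleaved by hand, and every name in it (`RefundRow`, `sumRefundedCents`, `passesCheck`) is hypothetical rather than taken from the AI's output.

```typescript
// Toy model of the check-then-insert race. An in-memory array stands in
// for the refunds table; all names here are hypothetical.
interface RefundRow {
  orderId: string;
  amountCents: number;
}

const ORDER_TOTAL_CENTS = 20000; // the $200 order

function sumRefundedCents(rows: RefundRow[], orderId: string): number {
  let total = 0;
  for (const row of rows) {
    if (row.orderId === orderId) total += row.amountCents;
  }
  return total;
}

// The unguarded check: read the sum and compare. The insert happens later.
function passesCheck(rows: RefundRow[], orderId: string, amountCents: number): boolean {
  return sumRefundedCents(rows, orderId) + amountCents <= ORDER_TOTAL_CENTS;
}

// $50 + $80 have already been refunded.
const refundsTable: RefundRow[] = [
  { orderId: "order-1", amountCents: 5000 },
  { orderId: "order-1", amountCents: 8000 },
];

// Two concurrent $50 requests both run the check before either inserts.
const requestAOk = passesCheck(refundsTable, "order-1", 5000);
const requestBOk = passesCheck(refundsTable, "order-1", 5000);

if (requestAOk) refundsTable.push({ orderId: "order-1", amountCents: 5000 });
if (requestBOk) refundsTable.push({ orderId: "order-1", amountCents: 5000 });

const totalRefundedCents = sumRefundedCents(refundsTable, "order-1");
// totalRefundedCents is 23000, which exceeds ORDER_TOTAL_CENTS (20000).
```

Each request was individually valid against the data it read. The over-refund only appears because the read and the write are not one atomic step.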

Bug #3: Gateway timeout

The payment gateway times out. The refund record is created in the database with status "processing," but the gateway response never arrives. The refund is stuck. Support can see it's "processing" but can't retry it — the duplicate check blocks them. Did the money actually leave? Nobody knows.

Fix prompt: "Add retry logic for gateway timeouts and a way to check refund status with the gateway."

The AI adds a retry loop with no exponential backoff, no idempotency key on the gateway call, and no timeout on the retry itself. The retry can now create a duplicate refund on the gateway side.
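Why the missing idempotency key matters is easier to see with a toy gateway. The sketch below is an in-memory stand-in, not any real provider's API; `FakeGateway`, its `refund` method, and the key format are all assumptions made for illustration.

```typescript
// Toy in-memory gateway, NOT a real provider API. It only illustrates why
// a retry needs an idempotency key. All names are hypothetical.
interface GatewayRefund {
  gatewayRefundId: string;
  amountCents: number;
}

class FakeGateway {
  private byKey = new Map<string, GatewayRefund>();
  public issued: GatewayRefund[] = [];

  refund(amountCents: number, idempotencyKey?: string): GatewayRefund {
    // A repeated key returns the original result instead of moving money again.
    if (idempotencyKey !== undefined) {
      const previous = this.byKey.get(idempotencyKey);
      if (previous !== undefined) return previous;
    }
    const created: GatewayRefund = {
      gatewayRefundId: "gw_" + (this.issued.length + 1),
      amountCents,
    };
    this.issued.push(created);
    if (idempotencyKey !== undefined) this.byKey.set(idempotencyKey, created);
    return created;
  }
}

// Retry without a key: the gateway cannot tell it is the same refund.
const unkeyed = new FakeGateway();
unkeyed.refund(5000);
unkeyed.refund(5000); // "retry" after a timeout
// unkeyed.issued.length is 2

// Retry with the same key: the second call is a safe replay.
const keyed = new FakeGateway();
const first = keyed.refund(5000, "refund-123");
const replay = keyed.refund(5000, "refund-123");
// keyed.issued.length is 1, and replay is the same record as first
```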

Bug #4: Race condition

Two support agents process refunds for the same order at the same time. Both pass the cumulative amount check (the first refund hasn't been committed yet), both hit the gateway, both succeed. The customer gets refunded twice.

Fix prompt: "Add locking to prevent concurrent refund processing for the same order."

At this point, the code has been patched four times. Each patch was a reasonable fix in isolation, but the overall architecture is a patchwork. There's no clear state machine, no documented invariants, no test coverage for the interaction between patches.

Figure: The patch spiral: every fix opens new holes. (Each bug fix introduces new complexity and new edge cases, creating a tangle of reactive patches.)

The real cost

The first version took 10 minutes. The four patches took two weeks — including investigation time, testing, customer support escalations, and one manual reconciliation of gateway records. The "fast" approach wasn't fast. It just front-loaded the dopamine and back-loaded the pain.

Path B: Spec coding

Same requirement. Same AI. Different starting point: we write the spec first.

The spec

Refund Feature — Minimum Viable Spec

# Feature: Order Refund Processing

## Goal
Process refunds for e-commerce orders safely, ensuring no
over-refund, no duplicate processing, and correct gateway
reconciliation.

## Non-Goals
- Customer self-service refund portal (future phase)
- Refund reason analytics and reporting
- Automated refund approval rules

## State Machine
  pending → processing → succeeded
  pending → processing → failed → pending (retry)

Only one refund may be in "processing" state per order
at any time.

## Acceptance Criteria

Given an order with total $200 and $0 previously refunded
When a support agent requests a $50 refund
Then a refund record is created with status "pending"
  And the gateway is called with an idempotency key
  And on gateway success, status moves to "succeeded"
  And the refundable balance is now $150.

Given an order with total $200 and $150 already refunded
When a support agent requests a $75 refund
Then the request is rejected with "exceeds refundable balance"
  And no gateway call is made.

Given a refund in "processing" state
When another refund request arrives for the same order
Then the request is rejected with "refund already in progress"
  And no gateway call is made.

Given a refund in "processing" state
When the gateway times out
Then the refund status remains "processing"
  And a background job retries with exponential backoff
  And the retry uses the same idempotency key
  And after 3 failures, status moves to "failed"
  And an alert is sent to the payments team.

## Edge Cases
- Concurrent requests: Use SELECT FOR UPDATE on the order row
  before checking refundable balance
- Idempotency: Each refund attempt gets a UUID; gateway calls
  include this as the idempotency key
- Partial refund precision: All amounts in cents (integer),
  no floating-point
- Gateway reconciliation: Nightly job compares local refund
  records against gateway settlement report

## Rollback Plan
- Feature flag: refund_processing_v2
- Rollback disables new refund creation; in-flight refunds
  continue processing via the background job
- No database migration rollback needed (additive schema only)

This spec took 90 minutes to write and 30 minutes to review with the team. Every edge case that bit us in Path A? It's answered here — before a single line of code was written.
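The first two acceptance criteria translate almost directly into a pure function, which is part of what makes them reviewable. The following is a minimal sketch under stated assumptions: the function name `decideRefund` and the extra "invalid amount" rejection are not in the spec and are added here only for illustration.

```typescript
// Refundable-balance check in integer cents. Names are illustrative, and
// the "invalid amount" branch is an assumption that is not in the spec.
type RefundDecision =
  | { ok: true; remainingCents: number }
  | { ok: false; reason: "exceeds refundable balance" | "invalid amount" };

function decideRefund(
  orderTotalCents: number,
  alreadyRefundedCents: number,
  requestedCents: number
): RefundDecision {
  // Amounts must be positive whole cents; no floating-point dollars.
  if (!Number.isInteger(requestedCents) || requestedCents <= 0) {
    return { ok: false, reason: "invalid amount" };
  }
  const refundableCents = orderTotalCents - alreadyRefundedCents;
  if (requestedCents > refundableCents) {
    return { ok: false, reason: "exceeds refundable balance" };
  }
  return { ok: true, remainingCents: refundableCents - requestedCents };
}

// Criterion 1: $200 order, $0 refunded, $50 requested. Accepted, $150 left.
const firstCase = decideRefund(20000, 0, 5000);
// Criterion 2: $200 order, $150 refunded, $75 requested. Rejected.
const secondCase = decideRefund(20000, 15000, 7500);
```

On its own this function does not prevent the race. It is only safe when the `alreadyRefundedCents` value is read while the order row is locked, as the Edge Cases section requires.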

Figure: The state machine makes valid transitions explicit and invalid ones impossible. (States: pending, processing, succeeded, failed, with clear transitions and constraints.)
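One way to carry the spec's state machine into code is a transition table that every status change must pass through. This is a sketch, not the generated implementation; `TRANSITIONS`, `canTransition`, and `transition` are illustrative names.

```typescript
type RefundStatus = "pending" | "processing" | "succeeded" | "failed";

// Allowed transitions, copied from the spec's State Machine section.
const TRANSITIONS: Record<RefundStatus, readonly RefundStatus[]> = {
  pending: ["processing"],
  processing: ["succeeded", "failed"],
  succeeded: [], // terminal
  failed: ["pending"], // retry
};

function canTransition(from: RefundStatus, to: RefundStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// Returns the new status, or throws if the spec does not allow the move.
function transition(from: RefundStatus, to: RefundStatus): RefundStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid refund transition: ${from} -> ${to}`);
  }
  return to;
}
```

With this in place, a shortcut such as `transition("pending", "succeeded")` throws instead of silently skipping the processing state.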

The implementation

Now we prompt the same AI — but with the spec as context:

The Prompt (with spec)

"Implement the refund feature described in this spec. Follow the
state machine exactly. Use SELECT FOR UPDATE for concurrency
control. Include the idempotency key in all gateway calls.
All amounts in cents."

[paste spec]

The output is structurally different. The AI generates:

- A processRefund function wrapped in a database transaction with SELECT FOR UPDATE
- Cumulative refund balance check inside the transaction
- An idempotency key (UUID) generated at refund creation and passed to every gateway call
- A background retry job with exponential backoff, capped at 3 attempts
- Proper state transitions that match the spec's state machine
- Input validation rejecting amounts that would exceed the refundable balance
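The first two items in that list could look roughly like the sketch below. It is not the AI's actual output. The `SqlClient` interface is a minimal assumption modeled on the `query(text, values)` shape of node-postgres, and the table and column names (`orders.total_cents`, `refunds.amount_cents`, `refunds.status`) are hypothetical.

```typescript
// Minimal client shape, modeled on node-postgres's client.query(text, values).
// Table and column names are assumptions for illustration.
interface SqlClient {
  query(text: string, values?: unknown[]): Promise<{ rows: any[] }>;
}

class RefundRejected extends Error {}

async function processRefund(
  db: SqlClient,
  orderId: string,
  amountCents: number,
  refundId: string // UUID from the caller; doubles as the idempotency key
): Promise<void> {
  await db.query("BEGIN");
  try {
    // Lock the order row so concurrent refunds for this order queue up here.
    const order = await db.query(
      "SELECT total_cents FROM orders WHERE id = $1 FOR UPDATE",
      [orderId]
    );
    if (order.rows.length === 0) throw new RefundRejected("order not found");

    // Sum inside the same transaction, after the lock is held.
    const refunded = await db.query(
      "SELECT COALESCE(SUM(amount_cents), 0) AS refunded_cents FROM refunds " +
        "WHERE order_id = $1 AND status <> 'failed'",
      [orderId]
    );
    const remainingCents =
      Number(order.rows[0].total_cents) - Number(refunded.rows[0].refunded_cents);
    if (amountCents > remainingCents) {
      throw new RefundRejected("exceeds refundable balance");
    }

    await db.query(
      "INSERT INTO refunds (id, order_id, amount_cents, status) " +
        "VALUES ($1, $2, $3, 'pending')",
      [refundId, orderId, amountCents]
    );
    await db.query("COMMIT");
  } catch (err) {
    await db.query("ROLLBACK");
    throw err;
  }
}
```

The gateway call is deliberately left out of this sketch. One option, assumed here rather than stated in the spec, is for a worker to pick up the pending row after the commit so the row lock is not held across a network call.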

Same AI. Same capability. Dramatically different output — because the input was dramatically different. The AI didn't get smarter; it got better constraints.

The same scenarios, pre-handled

Double refund? The SELECT FOR UPDATE lock prevents concurrent processing. The idempotency key prevents duplicate refunds at the gateway. Both are in the first version, not patched in later.

Partial refund overflow? The cumulative balance check runs inside the same transaction as the insert. No race window.

Gateway timeout? The background job retries with the same idempotency key. The gateway treats it as a safe retry, not a new refund. After 3 failures, the team gets alerted.

Race condition? SELECT FOR UPDATE serializes all refund operations for a given order. The second request waits for the first to complete, then sees the updated balance.
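The gateway-timeout handling above can be sketched as a small, pure retry policy. The base delay and the delay cap are assumed values, since the spec only fixes "exponential backoff" and the 3-failure limit. The names `RetryState`, `backoffDelayMs`, and `onGatewayTimeout` are hypothetical.

```typescript
// Retry policy sketch for gateway timeouts. BASE_DELAY_MS and MAX_DELAY_MS
// are assumed values; the spec only fixes the backoff shape and 3 failures.
const MAX_ATTEMPTS = 3;
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 60000;

interface RetryState {
  idempotencyKey: string; // fixed when the refund is created, never regenerated
  failedAttempts: number;
}

type RetryDecision =
  | { action: "retry"; delayMs: number; idempotencyKey: string }
  | { action: "fail_and_alert" };

// Delay doubles with each failure (1s, 2s, 4s, ...) up to the cap.
function backoffDelayMs(failedAttempts: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** (failedAttempts - 1), MAX_DELAY_MS);
}

// Called each time a gateway attempt times out.
function onGatewayTimeout(state: RetryState): { next: RetryState; decision: RetryDecision } {
  const next: RetryState = { ...state, failedAttempts: state.failedAttempts + 1 };
  if (next.failedAttempts >= MAX_ATTEMPTS) {
    // Third failure: status moves to "failed" and the payments team is alerted.
    return { next, decision: { action: "fail_and_alert" } };
  }
  return {
    next,
    decision: {
      action: "retry",
      delayMs: backoffDelayMs(next.failedAttempts),
      idempotencyKey: next.idempotencyKey, // same key on every attempt
    },
  };
}
```

Keeping the policy free of timers and I/O means the background job can schedule the delay however it likes, and the 3-failure rule can be tested without waiting.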

Figure: Spec coding front-loads decisions, not code. (Workflow: Spec → Review → Code → Test, each step building on verified decisions from the previous step.)

Side by side

Dimension	Vibe Coding	Spec Coding
Time to first working endpoint	10 minutes	3 hours (incl. spec)
Time to production-ready	2+ weeks	4 hours
Bugs found in production	4 critical	0
Customer-facing incidents	2 (double refund, over-refund)	0
Code architecture	Patchwork of reactive fixes	Coherent, matches spec
AI output quality	Happy path only	Covers all specified edge cases
Onboarding a new developer	Read code + Slack threads + incident reports	Read the spec
Confidence in shipping	Low — "what else will break?"	High — acceptance criteria verified

The vibe coding path was "faster" for exactly 10 minutes. After that, it was slower in every dimension that matters.

The lesson isn't "don't use AI"

Both paths used the same AI. The difference was the input, not the tool. Vibe coding gives the AI freedom to make decisions you haven't made yet. Spec coding makes those decisions explicit first, then hands the AI a constrained problem with a clear definition of correct.

AI coding tools are force multipliers. But a force multiplier applied to an unclear direction multiplies the confusion. Applied to a clear spec, it multiplies the precision.

Vibe coding has its place — prototyping, exploration, throwaway scripts, hackathons. These are contexts where edge cases don't matter because the code won't see production traffic. The moment you're building something that handles real money, real users, or real data, you need the spec.

The 90 minutes I spent writing the refund spec saved two weeks of incident response. That's not a productivity trick. That's a fundamentally different way of working.

If you're ready to make the switch, the 30-day adoption plan is a good starting point. And if you want to see how specs improve AI-generated code specifically, the AI prompts guide goes deeper on prompt structure.

What I would actually ship first

For the refund feature, I would not start by building every refund branch. I would ship the smallest behavior that proves the contract: one full refund path, idempotency, and the support-visible state for pending provider confirmation.

First shippable slice:
- Full refund only
- Idempotency key required
- Pending provider confirmation is visible to support
- Duplicate click returns original refund_id
- Rollback disables refund creation but keeps status lookup

Deferred:
- Partial refunds
- Multi-currency adjustments
- Bulk refund actions
- Automated refund reason classification

That slice is smaller than the product dream but bigger than a UI demo. It proves the risky decisions before adding breadth.
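As a rough illustration of how small that slice is, here is an in-memory sketch of its core contract: full refund only, a required idempotency key, and a duplicate click that returns the original refund. The store, the `rf_` id format, and the status string are all hypothetical; a real version would persist to the database.

```typescript
// First-slice sketch. The in-memory store, id format, and status name are
// hypothetical; only the behavior follows the slice described above.
interface Refund {
  refundId: string;
  orderId: string;
  amountCents: number;
  status: "pending_provider_confirmation";
}

class RefundStore {
  private byKey = new Map<string, Refund>();
  private nextId = 1;

  createFullRefund(idempotencyKey: string, orderId: string, orderTotalCents: number): Refund {
    if (idempotencyKey.trim() === "") {
      throw new Error("idempotency key is required");
    }
    // Duplicate click: hand back the original refund instead of a new one.
    const existing = this.byKey.get(idempotencyKey);
    if (existing !== undefined) return existing;

    const refund: Refund = {
      refundId: `rf_${this.nextId++}`,
      orderId,
      amountCents: orderTotalCents, // full refund only in this slice
      status: "pending_provider_confirmation", // visible to support
    };
    this.byKey.set(idempotencyKey, refund);
    return refund;
  }
}

const store = new RefundStore();
const firstClick = store.createFullRefund("click-1", "order-1", 20000);
const secondClick = store.createFullRefund("click-1", "order-1", 20000);
// secondClick.refundId equals firstClick.refundId
```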

Starter Review Block to Copy

Use this as the smallest practical artifact when a team is trying spec-first on a real change. It is deliberately short so it can live inside a ticket or PR.

Spec-first starter block: Same Feature, Two Approaches: Vibe Coding vs Spec Coding

Decision to make:
- Same order refund feature built two ways — vibe coding ships fast but drowns in edge-case bugs,…

Owner check:
- Product owner:
- Engineering owner:
- QA or operations reviewer:

Scope boundary:
- In scope:
- Out of scope:
- Assumption that still needs approval:

Acceptance evidence:
- Test or fixture:
- Log, metric, or screenshot:
- Manual review step:

Rejection rule: the reviewer must be able to reject unclear goals, missing non-goals, and criteria with no evidence.

Reviewer prompt:
- What would still be ambiguous to someone who missed the planning meeting?
- What evidence would make this safe enough to ship?

Keywords: vibe coding · spec coding · spec-first development · AI coding · order refund · idempotency · state machine · edge cases
