Contract Testing Plan: From OpenAPI to CI
My rule: if an OpenAPI spec is not wired into CI as an executable contract, it is decoration. A spec that no test asserts against will drift within two sprints. This is how I take a YAML file sitting in a repo and turn it into a gate that blocks pull requests when the provider stops matching the promise it made to its consumers.
The gap unit and integration tests keep missing
Unit tests pass. Integration tests pass. Then a mobile client crashes in production because the status field your service returns as "PAID" became "paid" after an enum refactor changed the serialized value. No test caught it because no test was looking at the wire-level shape. That is the gap contract tests fill.
What I want contract tests to catch that the other layers cannot: field drift (a rename, a type change, a nullability flip), response shape regression (a consumer depended on items[] always being present, now it is omitted when empty), error code surface (a 409 became a 422 because a library upgrade changed default behavior), and header contracts (idempotency keys, pagination cursors, content types). None of those usually show up in a green unit suite.
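That gap can be made concrete with a tiny wire-level check. This is a minimal, dependency-free sketch; in practice a JSON Schema validator or Schemathesis does this work, and the status enum and payloads here are hypothetical:

```python
# Hand-rolled wire-shape check; a stand-in for a real JSON Schema validator.
# STATUS_ENUM and the payload fields are hypothetical examples.
STATUS_ENUM = {"PAID", "PENDING", "REFUNDED"}

def check_wire_shape(body: dict) -> list[str]:
    """Return contract violations for an order payload; empty list means the shape holds."""
    violations = []
    status = body.get("status")
    if not isinstance(status, str) or status not in STATUS_ENUM:
        # Catches the "PAID" -> "paid" drift a unit test never sees.
        violations.append(f"status drifted: {status!r} not in {sorted(STATUS_ENUM)}")
    if not isinstance(body.get("items"), list):
        # Catches shape regression: items[] must be present even when empty.
        violations.append("items[] missing or not an array")
    return violations
```

A green unit suite would pass both payloads below; only the wire-level check distinguishes them.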
Consumer-driven vs provider-driven: pick based on who calls you
Two schools, and I have shipped both. Provider-driven tools like Schemathesis and Dredd take your OpenAPI spec and pound the running server with generated requests. Consumer-driven tools like Pact flip it: each consumer writes a test against a mock of the provider, publishes the interactions to a broker, and the provider verifies those interactions against its real server on every build.
My heuristic: if you own all the consumers (one mobile app, one web app, one internal worker), go consumer-driven with Pact. You get tests that prove exactly the subset of the contract your consumers actually use, which is nearly always a small slice of the full OpenAPI surface. If you run a public API with unknown consumers, go provider-driven with Schemathesis. You have no choice but to validate the full declared contract because anyone could be depending on any field. For most internal platform teams I have worked with, the answer is "both" — Pact for known consumers, Schemathesis fuzzing against the spec for everything else.
Generating tests from the spec, not writing them by hand
Hand-written contract tests rot. Generated ones stay honest because the spec is the source. The pipeline I run:
- Schemathesis reads `openapi.yaml` and generates property-based test cases: every endpoint, every status code branch, every schema boundary. One command, thousands of cases.
- Prism stands up a mock server from the same spec so consumer tests can run without a real backend. Critically, Prism also validates the consumer's requests against the spec, so if the frontend sends a malformed body, the mock rejects it.
- Dredd handles examples-based checks when the spec includes explicit `examples:` blocks. Lower coverage than Schemathesis, but useful for pinning documented happy paths.
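As a rough sketch, the pipeline above could be wired up as a CI job like this (GitHub Actions style). The service script, port, and seed file are hypothetical, and flag names should be checked against your installed tool versions:

```yaml
# Sketch of a dedicated "contract" CI job. Paths and the server script are
# placeholders; verify CLI flags against your Schemathesis and Dredd versions.
contract:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Boot service with deterministic seeds
      run: ./scripts/start-test-server.sh --seed fixtures/seed.sql &
    - name: Property-based checks generated from the spec
      run: schemathesis run openapi.yaml --base-url http://localhost:8000
    - name: Examples-based checks of documented happy paths
      run: dredd openapi.yaml http://localhost:8000
```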
I keep the generated suite in its own CI job named contract, separate from unit and integration, so a failure here is immediately legible as "the wire shape moved."
Flaky fixtures are the real enemy
The fastest way to get contract tests deleted by a frustrated team is to let them flake. Every flake I have debugged traced back to one of four things, and all four have boring fixes.
- Non-deterministic IDs. Freeze them. Seed the test database with known UUIDs. Never assert equality against a generated `id`; assert it matches `^[0-9a-f-]{36}$` via a schema matcher.
- Time. Freeze the clock with libfaketime or the language's equivalent. Record pacts against a frozen instant. Every `created_at` assertion should be a type check, not a value check.
- Ordering. Collections in responses need a stable sort from the provider. If your endpoint returns items in whatever order the DB felt like today, contract tests will catch it — but in the bad way where they flake intermittently.
- External calls. Contract tests verify your contract, not Stripe's. Stub every third-party call. If a contract test reaches out to the internet, it is misdesigned.
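The first three fixes can be sketched as matcher-style assertions. Everything here is illustrative: the regexes mirror the rules above, and the order payload is a hypothetical seeded fixture:

```python
# Shape matchers instead of brittle value assertions.
# The order payload fields (id, created_at, items, position) are hypothetical.
import re

UUID_RE = re.compile(r"^[0-9a-f-]{36}$")        # matcher, not an equality check
ISO_8601_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")

def assert_stable_shape(order: dict) -> None:
    # Type/shape checks for generated fields, never value checks:
    assert UUID_RE.match(order["id"]), "id must look like a UUID"
    assert ISO_8601_RE.match(order["created_at"]), "created_at must be a timestamp"
    # Ordering check: the provider must return a stable sort, not DB order.
    positions = [item["position"] for item in order["items"]]
    assert positions == sorted(positions), "items must be sorted by position"
```

If the provider loses its stable sort, the last assertion fails deterministically instead of flaking.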
The CI gate: fail the PR if the provider diverges
Here is the pattern that actually changes behavior. The OpenAPI spec is checked into the provider repo as openapi.yaml. On every PR, the contract job:
- Boots the service against a test database with deterministic seeds.
- Runs Schemathesis against the running server using the committed spec.
- Pulls the latest pacts for this provider from the Pact Broker and verifies them.
- Fails the build if any verification fails, with a comment on the PR naming the consumer and the broken interaction.
This is the gate. A developer changing the API sees, inside their PR, that the mobile team's checkout flow expects total_cents and their refactor renamed it to amount. They fix it before merge. They do not find out next Tuesday from a Slack message.
A concrete example of the loop working
Last quarter, the web team wanted to show a loyalty_tier field on the order confirmation page. They added it to their Pact test as an expected field in the response, ran pact publish, and pushed their PR. The web PR could not merge yet because their feature flag was off, but the pact was already in the broker.
Next day, a backend engineer opened an unrelated PR on the provider. The contract job ran the broker's pacts and failed: the web consumer expected loyalty_tier, the provider did not return it. The backend engineer saw a clear message — "web-app@<version> expects loyalty_tier on GET /orders/{id}, not present in response" — and either implemented the field or coordinated with the web team to gate the pact behind a version tag. Nothing shipped broken. The consumer announced its need; the provider had to respond. That is the whole value proposition.
Acceptance criteria in Given/When/Then
I write contract acceptance criteria in the same shape as BDD scenarios. It forces me to name the consumer, the trigger, and the observable.
```text
Given the web-app consumer has published a pact expecting loyalty_tier
  on GET /orders/{id}
When the provider CI job verifies the pact against the running server
Then the response body must include loyalty_tier as a string
  And the field must be one of: "bronze", "silver", "gold", "platinum"
  And the verification must fail the PR if the field is absent or malformed
```
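The Then-clauses above can be sketched as an executable check. The response dicts are hypothetical stand-ins for what the provider verification step would actually fetch:

```python
# Executable form of the Then-clauses; LOYALTY_TIERS mirrors the scenario's enum.
LOYALTY_TIERS = {"bronze", "silver", "gold", "platinum"}

def verify_loyalty_tier(response_body: dict) -> bool:
    """True only if loyalty_tier is present, a string, and a known tier."""
    tier = response_body.get("loyalty_tier")
    return isinstance(tier, str) and tier in LOYALTY_TIERS
```

An absent field and a malformed value both fail, exactly as the criteria demand; the CI job would turn a False here into a blocking PR failure.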
Mocks from the spec vs real server in staging: run both
There is a tension worth naming. A Prism mock generated from the spec will happily return whatever the spec says, including shapes the real server has never produced. A real staging server returns truth but is slow and stateful. I run both in sequence: spec-mock tests in the fast feedback loop (every PR, under a minute), real-server verification in a nightly job against staging. When they disagree, the spec or the implementation is wrong, and that disagreement is itself the signal.
Versioning pacts and what developers see when things fail
One pact file per consumer per consumer-version, tagged in the broker with the consumer's git SHA and environment (dev, prod). The provider verifies against the prod-tagged pacts in its own main branch CI, and against all pacts (including unreleased consumer branches) on PR builds. This lets consumers experiment without breaking provider main, while still surfacing future incompatibilities early.
When a contract test fails in CI, the developer sees three things in the PR comment: which consumer and version broke, the exact interaction (method, path, expected vs actual body diff), and a link to the broker showing the full pact. No log diving. The failure is self-explanatory or it is not worth having.
What I would not skip
If you do nothing else: commit the OpenAPI spec, run Schemathesis against it in CI on every PR, and treat any schema diff between spec and implementation as a blocking failure. That alone catches 70% of the drift. Add Pact when you have more than one consumer and the coordination cost of "tell me before you change the API" starts eating sprint capacity. The point is not coverage for its own sake — it is making the spec load-bearing so that it stays true.
Field note: the contract failure message that matters
A contract test is only useful if the developer can act on the failure quickly. I prefer PR comments that name the consumer, endpoint, expected field, actual response, and the spec line that drifted.
```text
Contract failure:
Consumer: web-app@<version>
Endpoint: GET /orders/{id}
Expected: response.body.refund_status enum includes "pending"
Actual: field missing
Spec: api/orders.yaml line 87
Action: add field, version the consumer expectation, or remove the pact before merge
```
This avoids the worst contract-testing failure mode: a red CI job that sends engineers hunting through logs instead of fixing the contract.
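One way to render such a message is a small formatter. The field names mirror the example above; the input dict is a hypothetical verification-failure record, not any real Pact API object:

```python
# Render a verification failure into the PR-comment shape shown above.
# The keys of the failure dict are hypothetical; adapt to your verifier's output.
def format_contract_failure(f: dict) -> str:
    return "\n".join([
        "Contract failure:",
        f"Consumer: {f['consumer']}@{f['consumer_version']}",
        f"Endpoint: {f['method']} {f['path']}",
        f"Expected: {f['expected']}",
        f"Actual: {f['actual']}",
        f"Spec: {f['spec_location']}",
        f"Action: {f['action']}",
    ])
```

Posting this string as the PR comment keeps the failure self-explanatory: consumer, interaction, and next action, with no log diving.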
Contract Review Packet to Copy
Use this when the work touches API behavior, schema, events, retries, or consumer expectations. The packet makes compatibility and release evidence explicit.
API contract review packet: Contract Testing Plan: From OpenAPI to CI

Decision to make:
- Add contract testing from OpenAPI to CI with generated tests, provider checks, consumer expectations, and reliable fixtures.

Owner check:
- Product owner:
- Engineering owner:
- QA or operations reviewer:

Scope boundary:
- In scope:
- Out of scope:
- Assumption that still needs approval:

Acceptance evidence:
- Test or fixture:
- Log, metric, or screenshot:
- Manual review step:

Contract boundary: no release without compatibility classification, consumer impact, retry behavior, and rollback notes.

Reviewer prompt:
- What would still be ambiguous to someone who missed the planning meeting?
- What evidence would make this safe enough to ship?
Flagship Use Path
This is one of the primary Spec Coding references for Contract testing rollout. Use it with a real ticket, pull request, or release review instead of treating it as background reading.
- Start here when: a team wants schema changes to fail in CI before they fail in integration.
- Copy this: the OpenAPI-to-CI rollout plan.
- Evidence to attach: provider checks, consumer fixtures, failing-example output, and owner for fixing breaks.
- Pair it with: API Contracts Hub and API Contract Checklist.
Flagship review path:
- Open this page during planning or review.
- Copy the relevant artifact into the work item.
- Replace example values with your system, owner, and failure mode.
- Block implementation if the evidence line is still blank.
Second-pass reviewer note: CI should teach the developer what broke
I reviewed this article for practical specificity. A contract test is only worth the CI time if its failure message points to the consumer, interaction, and next action.
CI failure must include:
- Consumer name and version
- Provider endpoint and method
- Expected response or request fragment
- Actual fragment
- Spec or pact location
- Suggested owner for the fix
Editorial Review Note
Reviewed Apr 29, 2026. This update added a reusable artifact, checked the article against the related topic hub, and tightened the next-step links so the page works as a practical reference rather than a standalone essay.
Topic Path
This article belongs to the API Contracts track. Start with the hub, then use the checklist, template, or tool below on a real project.
- Author details: Spec Coding Editorial Team
Consolidated Coverage
This canonical guide now covers several related notes that used to live as separate pages. Keeping them together makes Contract Testing Plan: From OpenAPI to CI easier to review, link, and use as the main reference.
- Postmortem: A Missing Contract Test That Broke Production
- Spec-Driven Frontend-Backend Alignment