Contract-First SDK Generation with Human Review
How to generate SDKs from your API contract without shipping a robot-authored client that feels cold: what to generate, what to hand-author, and the review gate that matters.
Review Note
Reviewed May 6, 2026. This focused reference is now promoted as a search-indexable companion to the API Contracts Hub. It includes concrete review artifacts, failure modes, and next-step links for readers applying the topic in practice.
The SDKs I Actually Want to Use Are Never Fully Generated
I have shipped SDKs that were fully generated from an OpenAPI spec, and I have shipped SDKs that were fully hand-authored. Both were mistakes. The generated one worked, technically, but read like a translated manual and every integration engineer rebuilt the same convenience layer in their own codebase. The hand-authored one was lovely until the API added fifteen endpoints and the SDK lagged by two minor versions.
The shape I keep returning to is two layers: a generated low-level client that tracks the contract exactly, plus a hand-authored high-level surface that makes the thing feel native to its host language. The review gate sits between them.
What Generators Are Genuinely Good At
When I feed an OpenAPI 3.1 document into a generator, I want four things and nothing more: types that match the schema exactly (including nullable unions and discriminated polymorphism), method signatures that mirror operationId, request and response serialization (including the annoying bits like date-time coercion and base64 blobs), and pagination primitives that return a single page plus a cursor token.
Generators do those things faster and more correctly than I ever will. When the spec changes, I regenerate and the type errors show me every call site that moved. That alone justifies the toolchain.
What Generators Are Bad At, and Why I Stopped Asking
Error ergonomics. Retry policy. Sensible defaults. Idiomatic naming. Anything that requires taste.
A generator will happily produce a Python method called list_invoices_v2 because the operationId said listInvoicesV2. It will return an HTTPValidationError object with a detail field that is a list of ValidationErrorDetail. It will not retry on 503, because the spec did not say to. It will not refresh your OAuth token, because the spec did not say to. All of that is correct behavior for a generator, and all of it is wrong behavior for an SDK.
The Two-Layer Split I Ship
The generated package lives at acme._generated. It is not part of the public import surface. Nothing in our docs references it. If a user imports from it, they are on their own and they know it.
The public surface lives at acme. Every public method is hand-authored. It calls into the generated layer, catches generated exceptions, rewraps them in acme.errors, and returns Python-native shapes. This split lets the generated code be dumb and fast-moving while the hand layer stays slow and tasteful.
A Concrete Example: Invoices, Pagination, and Pain
Say the API exposes GET /v2/invoices with a page_token query parameter and a next_page_token in the response. The generator produces something like this:
response = client.invoices.list_invoices_v2(page_token=None, limit=50)
# response.data: list[Invoice]
# response.next_page_token: str | None
Usable? Sure. Idiomatic Python? No. Nobody wants to hand-roll a while-loop around a cursor token in application code. So the hand layer adds:
for invoice in client.invoices.list_all(status="open"):
    process(invoice)
list_all is twenty lines of hand-authored code. It calls the generated method in a loop, yields each item, handles the cursor, and raises acme.errors.RateLimitError if the generated layer surfaces a 429. The user never sees next_page_token or list_invoices_v2. They get an iterator, which is what Python people want.
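A minimal sketch of what a list_all like this could look like. Everything other than list_invoices_v2, page_token, limit, data, and next_page_token is an assumption made for illustration: the GeneratedApiError class with a status attribute, the status filter being passed through to the generated call, and the in-memory FakeGeneratedInvoices that stands in for the real generated client.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


class RateLimitError(Exception):
    """Stand-in for acme.errors.RateLimitError."""


class GeneratedApiError(Exception):
    """Hypothetical exception type raised by the generated layer."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


@dataclass
class Invoice:
    id: str
    status: str


@dataclass
class InvoicePage:
    data: list[Invoice]
    next_page_token: Optional[str]


class FakeGeneratedInvoices:
    """In-memory stand-in for the generated client: two pages of data."""

    _pages = {
        None: InvoicePage([Invoice("inv_1", "open"), Invoice("inv_2", "open")], "t2"),
        "t2": InvoicePage([Invoice("inv_3", "open")], None),
    }

    def list_invoices_v2(self, page_token=None, limit=50, status=None) -> InvoicePage:
        return self._pages[page_token]


class Invoices:
    """Hand-authored surface that hides the cursor behind an iterator."""

    def __init__(self, generated) -> None:
        self._generated = generated

    def list_all(
        self, status: Optional[str] = None, page_size: int = 50
    ) -> Iterator[Invoice]:
        token: Optional[str] = None
        while True:
            try:
                page = self._generated.list_invoices_v2(
                    page_token=token, limit=page_size, status=status
                )
            except GeneratedApiError as exc:
                # Only 429 is rewrapped here; other failures propagate unchanged.
                if exc.status == 429:
                    raise RateLimitError("rate limited while paginating") from exc
                raise
            yield from page.data
            token = page.next_page_token
            if token is None:
                return
```

Because list_all is a generator, no request is made until the caller starts iterating, and a caller who breaks out of the loop early never fetches the remaining pages.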
Errors: From Parsed Payloads to Language-Native Exceptions
The generated layer returns an error object. My hand layer converts it into something the host language actually uses. In Python, a real exception hierarchy: AcmeError at the root, then APIError, AuthError, RateLimitError, ValidationError, NotFoundError. In Rust or Go, a Result type with a typed error enum. In TypeScript, a discriminated union the caller can narrow on.
Users should never catch an HTTPValidationError. They should catch acme.ValidationError, inspect err.field_errors, and move on. That translation is the single most valuable thing the hand layer does.
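One way the Python hierarchy and the translation step could be laid out, as a sketch rather than the actual acme.errors module. Several details are assumptions: that the specific errors subclass APIError, that validation failures arrive as HTTP 422, that each detail item carries loc and msg keys, and the translate_error helper itself.

```python
from typing import Any, Optional


class AcmeError(Exception):
    """Root of the public hierarchy; catching this catches every SDK error."""


class APIError(AcmeError):
    """Any error reported by the API, carrying the HTTP status if known."""

    def __init__(self, message: str, status: Optional[int] = None) -> None:
        super().__init__(message)
        self.status = status


class AuthError(APIError):
    pass


class RateLimitError(APIError):
    pass


class NotFoundError(APIError):
    pass


class ValidationError(APIError):
    """Validation failure with per-field messages on field_errors."""

    def __init__(
        self,
        message: str,
        status: Optional[int] = None,
        field_errors: Optional[dict[str, list[str]]] = None,
    ) -> None:
        super().__init__(message, status)
        self.field_errors = field_errors or {}


# Assumed mapping; a real SDK would derive this from the API's documented errors.
_STATUS_TO_ERROR = {
    401: AuthError,
    403: AuthError,
    404: NotFoundError,
    429: RateLimitError,
}


def translate_error(status: int, payload: dict[str, Any]) -> APIError:
    """Turn a generated-layer error payload into a public exception."""
    if status == 422:
        field_errors: dict[str, list[str]] = {}
        for item in payload.get("detail", []):
            # Assumed item shape: {"loc": ["body", "amount"], "msg": "..."}
            field = ".".join(str(part) for part in item.get("loc", [])) or "<root>"
            field_errors.setdefault(field, []).append(item.get("msg", ""))
        return ValidationError("request failed validation", status, field_errors)
    error_cls = _STATUS_TO_ERROR.get(status, APIError)
    return error_cls(f"request failed with HTTP {status}", status)
```

With this shape, application code can write except ValidationError as err and read err.field_errors, or catch AcmeError once at the boundary, without importing anything from the generated package.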
Retries, Auth, and Naming Live in the Hand Layer
I do not let generated code retry. Ever. The generated client fires one request and reports what happened. Retry policy is a product decision: which status codes retry, which methods are safe to retry, how backoff behaves, whether we emit a metric on each attempt. All of that lives in hand-authored middleware wrapping the generated calls.
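The retry decisions listed above could be expressed as a small wrapper around a single generated call. This is a sketch under stated assumptions: the set of retryable status codes, the set of methods treated as safe to retry, the backoff parameters, and the GeneratedApiError type are all illustrative choices, not the policy of any particular SDK.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class GeneratedApiError(Exception):
    """Hypothetical exception type raised by the generated layer."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


# Assumed policy values; these are product decisions, not generator output.
RETRYABLE_STATUSES = frozenset({429, 502, 503, 504})
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "PUT", "DELETE"})


def call_with_retries(
    send: Callable[[], T],
    method: str,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    on_attempt: Callable[[int], None] = lambda attempt: None,
) -> T:
    """Run one generated call, retrying only idempotent methods on retryable statuses."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        on_attempt(attempt)  # hook for emitting a metric per attempt
        try:
            return send()
        except GeneratedApiError as exc:
            retryable = (
                method.upper() in IDEMPOTENT_METHODS
                and exc.status in RETRYABLE_STATUSES
            )
            if not retryable or attempt == max_attempts:
                raise
            # Exponential backoff with full jitter.
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")
```

Passing sleep and on_attempt as parameters keeps the policy testable without real delays, and gives the hand layer one place to attach metrics.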
Auth is the same. The generator exposes a hook to inject a header. The hand layer owns token refresh, credential storage, and rotation. When a token expires mid-request, the hand layer catches the 401, refreshes, and retries once. The generated layer has no idea any of that happened.
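A sketch of the refresh-once behavior, assuming the generated layer reports a 401 as an exception with a status attribute and accepts injected headers. TokenAuth, fetch_token, and GeneratedApiError are hypothetical names for illustration; credential storage and rotation are left out.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class GeneratedApiError(Exception):
    """Hypothetical exception type raised by the generated layer."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


class AuthError(Exception):
    """Stand-in for acme.errors.AuthError."""


class TokenAuth:
    """Owns the access token; the generated layer only ever sees a header."""

    def __init__(self, fetch_token: Callable[[], str]) -> None:
        self._fetch_token = fetch_token
        self._token: Optional[str] = None

    def token(self) -> str:
        # Fetch lazily and cache until something invalidates the token.
        if self._token is None:
            self._token = self._fetch_token()
        return self._token

    def invalidate(self) -> None:
        self._token = None

    def call(self, send: Callable[[dict[str, str]], T]) -> T:
        """Run send(headers); on a 401, refresh the token and retry exactly once."""
        for is_retry in (False, True):
            headers = {"Authorization": f"Bearer {self.token()}"}
            try:
                return send(headers)
            except GeneratedApiError as exc:
                if exc.status != 401:
                    raise
                if is_retry:
                    raise AuthError(
                        "still unauthorized after refreshing the token"
                    ) from exc
                self.invalidate()
        raise AssertionError("unreachable")
```

Capping the loop at one retry matters: if the fresh token is also rejected, the problem is the credentials, and the caller should see an AuthError rather than an endless refresh cycle.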
Naming follows the same rule. Operation IDs are written to identify operations uniquely within the spec, not for what a developer types at 2am. listInvoicesV2 becomes invoices.list. A generated name escaping into the public surface is a bug.
Versioning the SDK Separately from the API
The API version and the SDK version diverge almost immediately. The API is on v2.17 and the Python SDK is on 4.3.1 because the SDK shipped a breaking change to the pagination iterator last quarter that had nothing to do with the API. That is fine. The SDK documents which API version it was generated against via an __api_version__ constant, and the hand layer is allowed to evolve on its own cadence.
The Review Gate Before Publishing
Regeneration is automated. Publishing is not. Before anything hits PyPI or npm, a human reviews the generated diff. Not the hand layer diff (that is a normal PR), the generated diff.
I am looking for surprises. Did an endpoint rename a field? Did a response type quietly become optional? Did a new required parameter appear? Did an enum lose a variant? Most regenerations are boring, but once a quarter something shows up that would have broken users silently, and catching it here is the difference between a clean release and a support fire.
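Some of these checks can be scripted so the human reviewer starts from a list of suspects rather than a raw diff. The sketch below compares two already-parsed OpenAPI documents (plain dicts) and reports new required parameters, fields that dropped out of a schema's required list, removed properties, and removed enum values. It is deliberately narrow: find_breaking_changes is a hypothetical helper, it does not resolve $ref, and it only looks at components.schemas, so it supplements the review instead of replacing it.

```python
from typing import Any

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def _required_params(operation: dict[str, Any]) -> set[str]:
    # Inline parameters only; $ref entries have no "name" and are skipped.
    return {
        p["name"]
        for p in operation.get("parameters", [])
        if "name" in p and p.get("required", False)
    }


def find_breaking_changes(old: dict[str, Any], new: dict[str, Any]) -> list[str]:
    findings: list[str] = []

    # New required parameters on operations that exist in both specs.
    new_paths = new.get("paths", {})
    for path, old_item in old.get("paths", {}).items():
        new_item = new_paths.get(path, {})
        for method, old_op in old_item.items():
            if method not in HTTP_METHODS or method not in new_item:
                continue
            added = _required_params(new_item[method]) - _required_params(old_op)
            for name in sorted(added):
                findings.append(
                    f"{method.upper()} {path}: new required parameter '{name}'"
                )

    # Schema-level changes: optionality, removed fields, removed enum values.
    old_schemas = old.get("components", {}).get("schemas", {})
    new_schemas = new.get("components", {}).get("schemas", {})
    for name, old_schema in old_schemas.items():
        new_schema = new_schemas.get(name)
        if new_schema is None:
            continue
        old_props = set(old_schema.get("properties", {}))
        new_props = set(new_schema.get("properties", {}))
        for field in sorted(old_props - new_props):
            findings.append(f"{name}.{field}: field removed or renamed")
        old_req = set(old_schema.get("required", []))
        new_req = set(new_schema.get("required", []))
        for field in sorted((old_req - new_req) & new_props):
            findings.append(f"{name}.{field}: no longer required")
        old_enum = set(old_schema.get("enum", []))
        new_enum = set(new_schema.get("enum", []))
        for value in sorted(old_enum - new_enum, key=str):
            findings.append(f"{name}: enum value {value!r} removed")

    return findings
```

A CI job could run this against the previously published spec and the new one, then attach the findings to the generated-diff review so a non-empty list blocks publishing until someone signs off.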
Acceptance Criteria for the SDK, in Given/When/Then
- Given an OpenAPI spec with a new required query parameter
  When the generator runs and the diff is reviewed
  Then the review gate flags the breaking change and blocks publish
- Given an access token that expires mid-request
  When the generated layer surfaces a 401
  Then the hand layer refreshes once and retries transparently
- Given a paginated endpoint
  When a user writes `for item in client.invoices.list_all()`
  Then the iterator yields every item across pages without exposing cursor tokens
What I Would Tell a Team Starting Today
Do not ship the generated output as your SDK. Ship it as the engine inside your SDK. Spend the time on the hand layer, even if it means fewer endpoints at launch, because a small idiomatic SDK beats a complete robot-authored one. Put a human at the review gate before every publish. Version the layers independently. And when someone says "can we just expose the generated client directly, it is faster," say no, and mean it.
Contract Review Packet to Copy
Use this when the work touches API behavior, schema, events, retries, or consumer expectations. The packet makes compatibility and release evidence explicit.
API contract review packet: Contract-First SDK Generation with Human Review
Decision to make:
- How to generate SDKs from your API contract without shipping a robot-authored client that feels cold: what to generate, what to hand-author, and the review gate that matters.
Owner check:
- Product owner:
- Engineering owner:
- QA or operations reviewer:
Scope boundary:
- In scope:
- Out of scope:
- Assumption that still needs approval:
Acceptance evidence:
- Test or fixture:
- Log, metric, or screenshot:
- Manual review step:
Contract boundary: no release without compatibility classification, consumer impact, retry behavior, and rollback notes.
Reviewer prompt:
- What would still be ambiguous to someone who missed the planning meeting?
- What evidence would make this safe enough to ship?
Editorial Review Note
Reviewed Apr 28, 2026. This update added a reusable artifact, checked the article against the related topic hub, and tightened the next-step links so the page works as a practical reference rather than a standalone essay.
Editorial Note
Last reviewed May 6, 2026: topic paths, examples, internal links, and reusable review blocks were checked for practical specificity.
- Author details: Spec Coding Editorial Team