API error taxonomy: from cleanup request to contract spec

This case shows how a vague "improve API errors" request becomes a stable contract that SDKs, AI-generated clients, and reviewers can depend on.

Risk: Client compatibility and retries
Boundary: Envelope only, no endpoint behavior change
Evidence: Fixtures, SDK sample, OpenAPI examples

The request before the spec

Weak ticket

Improve API errors.
Make them easier for clients to handle.

Spec-first rewrite

Feature: Stable API error taxonomy
Owner: API Platform
Status: Draft for review

Goal:
- Standardize the error envelope fields: code, category, message, trace_id, and retryable.
- Keep existing HTTP status behavior.
- Help generated clients branch without string parsing.

Non-goals:
- No endpoint business-logic changes.
- No removal of legacy fields during the first release.
- No new auth policy.

The contract that stops hidden breakage

error-envelope.md

{
  "error": {
    "code": "ORDER_ALREADY_EXISTS",
    "category": "conflict",
    "message": "An order already exists for this idempotency key.",
    "trace_id": "req_01H...",
    "retryable": false,
    "details": {}
  }
}
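A minimal sketch of how a client might read this envelope, assuming Python on the client side. The ApiError class and parse_error_envelope function are hypothetical names used for illustration, not part of the spec; treating details as optional is also an assumption.

```python
import json
from dataclasses import dataclass, field
from typing import Any

# Fields the spec's goal lists as standardized.
REQUIRED_FIELDS = ("code", "category", "message", "trace_id", "retryable")


@dataclass(frozen=True)
class ApiError:
    """Hypothetical client-side view of the error envelope."""
    code: str
    category: str
    message: str
    trace_id: str
    retryable: bool
    details: dict[str, Any] = field(default_factory=dict)


def parse_error_envelope(body: str) -> ApiError:
    """Parse a JSON response body and fail loudly if the envelope is incomplete."""
    payload = json.loads(body)
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        raise ValueError("response has no 'error' object")
    missing = [name for name in REQUIRED_FIELDS if name not in error]
    if missing:
        raise ValueError(f"error envelope is missing fields: {missing}")
    if not isinstance(error["retryable"], bool):
        raise ValueError("'retryable' must be a boolean")
    return ApiError(
        code=error["code"],
        category=error["category"],
        message=error["message"],
        trace_id=error["trace_id"],
        retryable=error["retryable"],
        # Assumption: 'details' may be absent or null; treat both as empty.
        details=error.get("details") or {},
    )
```

Rejecting an incomplete envelope at parse time keeps the client from silently falling back to string matching on the message.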

acceptance-criteria.md

- Given validation fails
  When the API returns 422
  Then error.category is "validation" and error.details.field_errors exists.

- Given a duplicate idempotency key
  When the API returns 409
  Then error.retryable is false and error.code is stable (the same value across releases).
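The two criteria above could be turned into executable checks along these lines. This is a sketch under stated assumptions: the function names are hypothetical, and "stable" is interpreted here as "equal to a code pinned in the fixture", which the spec does not spell out.

```python
from typing import Any


def check_validation_response(status: int, body: dict[str, Any]) -> list[str]:
    """Return violations of the 422 criterion; an empty list means it passes."""
    problems = []
    error = body.get("error") or {}
    if status != 422:
        problems.append(f"expected status 422, got {status}")
    if error.get("category") != "validation":
        problems.append("error.category must be 'validation'")
    if "field_errors" not in (error.get("details") or {}):
        problems.append("error.details.field_errors must exist")
    return problems


def check_duplicate_key_response(
    status: int, body: dict[str, Any], pinned_code: str
) -> list[str]:
    """Return violations of the 409 criterion for a duplicate idempotency key."""
    problems = []
    error = body.get("error") or {}
    if status != 409:
        problems.append(f"expected status 409, got {status}")
    if error.get("retryable") is not False:
        problems.append("error.retryable must be false")
    # Assumption: stability is checked by comparing against a pinned fixture value.
    if error.get("code") != pinned_code:
        problems.append(f"error.code must stay {pinned_code!r}")
    return problems
```

Returning a list of violations rather than raising on the first one lets a fixture run report every broken criterion at once.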

compatibility.md

- Keep legacy top-level message for one release.
- Add the new envelope fields as an additive response change.
- Publish the SDK fixture before turning examples into docs.
- Log unknown categories during migration.
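During the migration window, a client could read both response shapes roughly as follows. This is an illustrative sketch: read_error, the logger name, and the returned dictionary shape are hypothetical, and KNOWN_CATEGORIES contains only the two categories that appear in this packet, so a real list would likely be longer.

```python
import logging
from typing import Any

logger = logging.getLogger("error_migration")  # hypothetical logger name

# Assumption: only "conflict" and "validation" are named in the packet.
KNOWN_CATEGORIES = frozenset({"conflict", "validation"})


def read_error(body: dict[str, Any]) -> dict[str, Any]:
    """Prefer the new envelope; fall back to the legacy top-level message."""
    error = body.get("error")
    if isinstance(error, dict):
        category = error.get("category")
        if category not in KNOWN_CATEGORIES:
            # Log instead of failing so unexpected categories surface during migration.
            logger.warning(
                "unknown error category %r (trace_id=%s)",
                category,
                error.get("trace_id"),
            )
        return {
            "message": error.get("message"),
            "category": category,
            "source": "envelope",
        }
    # Legacy shape: the message is still at the top level for one release.
    return {"message": body.get("message"), "category": None, "source": "legacy"}
```

Because the change is additive, a response may carry both shapes at once; this sketch reads the envelope first so new clients never depend on the field scheduled for removal.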

test-evidence.md

Automated:
- 400/401/409/422 contract fixtures
- SDK parser compatibility test
- OpenAPI example snapshot

Manual:
- release note reviewed by client support
- logs show category and trace_id

Why this belongs in a spec packet

It separates shape from behavior

The spec limits the change to the error envelope so implementation cannot drift into endpoint semantics.

It protects generated clients

Stable categories and retryable flags give AI-generated SDKs a machine-readable branch point.
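As a rough illustration of that branch point, a client could decide its next step from the structured fields alone. The next_action function, its action names, and the attempt limit are hypothetical and not defined by the spec.

```python
from typing import Any


def next_action(error: dict[str, Any], attempt: int, max_attempts: int = 3) -> str:
    """Pick a follow-up action without parsing the human-readable message."""
    # Retry only when the server says it is safe and attempts remain.
    if error.get("retryable") is True and attempt < max_attempts:
        return "retry"
    category = error.get("category")
    if category == "validation":
        return "fix_request"  # hypothetical action: caller must correct its input
    if category == "conflict":
        return "use_existing"  # hypothetical action: reuse the existing resource
    # Unknown or missing category: surface the error instead of guessing.
    return "surface_error"
```

For the duplicate-order envelope shown earlier (category "conflict", retryable false), this sketch returns "use_existing" on the first attempt, and no part of the decision reads the message text.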

It creates migration evidence

Fixtures, examples, and logs make compatibility visible before rollout, not after support tickets arrive.

Use this pattern before changing API responses

Generate a packet first, then pair it with the API contract checklist so every response example has evidence.

Editorial note

This case focuses on API contract safety: error formats are treated as public behavior, not as internal cleanup.