Designing API Specs for Agentic Clients

Spec Coding Editorial Team · Spec-first engineering notes

Most API specs were written for human developers who read docs carefully, infer intent from context, and pause when something feels dangerous. Agentic clients do none of that. If your spec assumes a thoughtful human reader, an LLM-driven agent will find the ambiguity and run straight at it, 400 times in a loop, with zero hesitation.

Published on 2026-03-10 · 6 min read · Updated May 6, 2026 · Author: Spec Coding Editorial Team · Review policy: Editorial Policy

Review Note

Reviewed May 6, 2026. This focused reference is now promoted as a search-indexable companion to the AI Coding Governance Hub. It includes concrete review artifacts, failure modes, and next-step links for readers applying the topic in practice.

The Caller Is No Longer a Person

I spent a decade writing API docs for humans. A field called id was fine because any developer reading it would trace one curl example and figure out the format. Put the same field in front of an agent and you get three different hallucinated shapes in the first ten calls: UUIDs, integers, email addresses the model inferred from the resource name. The caller is a statistical pattern matcher with no shame about guessing. That single change invalidates most of the affordances we rely on.

I now assume every parameter will be read by something that will confidently invent a value if the description is thin. That assumption reshapes what I put on the page.

Descriptions Carry the Whole Contract

The biggest single improvement I have shipped is writing one full sentence of semantic description for every field, parameter, and operation. customer_id: string is useless. customer_id: Stripe customer ID, format cus_XXXX, case-sensitive, 14 to 18 chars after the prefix; obtain via /customers.list, never fabricate is usable. The "never fabricate" part is not theater. Agents read those instructions and comply more often than you would expect, especially when the description names the upstream source.
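The contrast can be made mechanical. Below is a minimal sketch, as a Python dict standing in for an OpenAPI schema fragment, of the bare field versus the usable one, plus a hypothetical spec-lint rule (the word-count threshold is an assumption, not a standard) that flags thin descriptions:

```python
# OpenAPI-style schema fragments as Python dicts: a bare field versus one
# whose description carries the full contract. Formats are illustrative.

bare = {"customer_id": {"type": "string"}}

usable = {
    "customer_id": {
        "type": "string",
        "description": (
            "Stripe customer ID, format cus_XXXX, case-sensitive, "
            "14 to 18 chars after the prefix; obtain via /customers.list, "
            "never fabricate."
        ),
        "pattern": "^cus_[A-Za-z0-9]{14,18}$",
    }
}

def has_full_sentence(field: dict) -> bool:
    """Hypothetical lint rule: a rough proxy for 'one full sentence'."""
    return len(field.get("description", "").split()) >= 8

assert not has_full_sentence(bare["customer_id"])
assert has_full_sentence(usable["customer_id"])
```

A rule like this can run in CI against the whole spec, so a thin description fails the build instead of reaching an agent.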

The same principle applies to enums. Do not list ["active", "pending", "cancelled"] and stop. Describe what each state means, which transitions are legal, and which one the agent is most likely to need. The model will route to the right value if you say so out loud.
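One way to sketch that: store the meaning and the legal transitions alongside each enum value. The state names and transition rules below are hypothetical, chosen only to illustrate the shape:

```python
# Hypothetical enum documentation: each state carries its meaning, its
# legal transitions, and a hint about which value agents usually need.
STATUS = {
    "active":    {"means": "Billing normally; the state most reads expect.",
                  "transitions": ["cancelled"]},
    "pending":   {"means": "Awaiting first payment; new records start here.",
                  "transitions": ["active", "cancelled"]},
    "cancelled": {"means": "Terminal; no further billing or transitions.",
                  "transitions": []},
}

def legal_transition(src: str, dst: str) -> bool:
    """An agent (or the server) can check a transition before attempting it."""
    return dst in STATUS.get(src, {}).get("transitions", [])
```

The same table can be rendered into the enum's description field, so the model and the server validate against one source of truth.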

Idempotency Is Not Optional Anymore

Human clients retry once, maybe twice, then escalate to a person. Agents retry aggressively, often on ambiguous responses they should have treated as success. I now require an Idempotency-Key header on every non-GET endpoint and reject requests without one. The server stores the first response for 24 hours and replays it on duplicates. This is not an agent feature; it is self-defense. Without it, a single misread 502 becomes three duplicate charges and a refund ticket.
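A minimal in-memory sketch of that server behavior, assuming an in-process dict where a real deployment would use Redis or a database:

```python
import time

class IdempotencyStore:
    """Stores the first response per key and replays it for 24 hours.
    Sketch only: single-process, no locking, no eviction."""
    TTL = 24 * 3600

    def __init__(self):
        self._seen = {}  # key -> (timestamp, response)

    def handle(self, key, execute):
        if key is None:
            # Reject requests without an Idempotency-Key header.
            return {"status": 400, "error": "Idempotency-Key header required"}
        now = time.time()
        cached = self._seen.get(key)
        if cached and now - cached[0] < self.TTL:
            return cached[1]          # replay: no side effect runs twice
        response = execute()          # first time: run the real operation
        self._seen[key] = (now, response)
        return response
```

The duplicate-charge scenario becomes: the agent misreads a 502, retries with the same key, and gets the stored response back instead of a second charge.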

Dry-Run By Default On Anything Destructive

Every destructive operation in my current API accepts ?dry_run=true. The response is identical in shape to the real call but with "dry_run": true and no side effect. Agents are trained, via system prompts and tool descriptions, to dry-run first on unfamiliar calls. This catches the "I think this deletes a row but let me find out" class of mistake before it becomes an incident.

The spec has to say this is free and safe. If the agent thinks dry-run costs a billable request or counts against a rate limit, it will skip it under pressure.
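A handler sketch of the dry-run contract, with a stand-in `store` set where a real service would hit a database; the one property worth testing is that the preview and the real call return the same shape:

```python
def archive_project(project_id: str, store: set, dry_run: bool = True) -> dict:
    """Hypothetical destructive endpoint. The response shape is identical
    for dry-run and real calls; only the side effect differs. Defaulting
    dry_run to True makes the safe path the lazy path."""
    response = {"project_id": project_id, "action": "archive",
                "dry_run": dry_run}
    if not dry_run:
        store.discard(project_id)  # the only line with a side effect
    return response
```

Because the dry-run branch skips only the mutation, the agent's preview is a faithful preview, not a separate code path that can drift.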

Two-Phase Confirmation For Irreversible Actions

Last quarter an agent integration hit our DELETE /projects/{id} endpoint 400 times in a runaway loop. The loop was triggered by a planner that misread "archived" as "deletable" and kept calling until the rate limiter stopped it. We lost no data because the endpoint was already behind a two-phase confirmation pattern, but the incident review was still ugly.

The pattern: the first request to the endpoint returns HTTP 202 with a confirmation_token, a summary field describing exactly what will be destroyed, and an expires_in_seconds value (I use 30). The agent must resubmit the same call with the token to actually execute. The token is single-use and scoped to the exact resource. An agent in a tight loop burns its tokens on the first call of each pair and never reaches execution. That one pattern converted a near-disaster into a log entry.

Given an agent calling DELETE /projects/prj_123
  When the first request arrives without a confirmation_token
  Then return 202 with { confirmation_token, summary, expires_in_seconds: 30 }
  And record no destructive side effect

Given a second request arrives with a valid, unexpired token
  When the token matches the exact resource and agent identity
  Then execute the deletion and invalidate the token

Given a second request arrives with an expired or mismatched token
  When the server validates the token
  Then return 409 with a human-readable explanation and require a fresh confirmation

Capability Negotiation And Discoverability

Agents do not browse your documentation site. They load whatever the system prompt points them at and work from there. I expose a /_capabilities endpoint that returns the list of operations, the current rate limits for the calling identity, the cost per call in tokens or cents, and the feature flags in effect. Agents that check this first make dramatically fewer wrong calls.
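A sketch of what that payload might look like; every field name here is illustrative rather than a standard, and the operation names are the hypothetical ones used elsewhere in this article:

```python
def capabilities(agent_id: str, limits: dict, flags: dict) -> dict:
    """Hypothetical /_capabilities payload: the operations, the caller's
    own rate limits, per-call cost, and active feature flags, so an agent
    can plan before its first real call."""
    return {
        "operations": ["list_projects", "archive_project", "delete_project"],
        "rate_limit": limits.get(agent_id, {"per_minute": 60}),
        "cost_per_call_cents": {"list_projects": 0, "delete_project": 0},
        "feature_flags": flags,
    }
```

The point of keying limits by identity is that the agent sees its own numbers, not a generic tier table it has to interpret.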

In parallel, my OpenAPI spec uses explicit operationId values written in verb_noun form (list_projects, archive_project, not projectsGet2). The operationId is what the agent sees in its tool list, so it is the real name of the endpoint as far as the model is concerned.
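The naming rule is easy to enforce mechanically. A hypothetical lint check, assuming snake_case verb_noun as the house convention:

```python
import re

# Lint rule: operationIds must be snake_case verb_noun forms, because the
# operationId is the tool name the model actually sees in its tool list.
VERB_NOUN = re.compile(r"^[a-z]+(_[a-z]+)+$")

def good_operation_id(op_id: str) -> bool:
    return bool(VERB_NOUN.match(op_id))
```

Run against every operationId in the spec at build time, this turns a bad tool name into a failed check instead of a confused model.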

Cost And Rate Visibility In Every Response

Humans learn rate limits by getting throttled once. Agents need the number in the response. Every response includes X-RateLimit-Remaining, X-RateLimit-Reset, and for paid endpoints X-Cost-Units. The spec documents these as part of the contract, not as incidental headers. When the agent can see it is at 4 remaining calls with a 60-second reset, it backs off. When it cannot, it does not.
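On the client side, the backoff logic is a few lines once the headers are guaranteed. A sketch of a hypothetical agent-side loop; `send` is a stand-in for whatever transport the agent uses, and the floor of 5 remaining calls is an assumed threshold:

```python
import time

def call_until_limited(send, max_calls=10, floor=5):
    """Read the documented rate headers on every response and back off
    before hitting the limit, instead of after being throttled."""
    results = []
    for _ in range(max_calls):
        resp = send()
        results.append(resp["body"])
        remaining = int(resp["headers"]["X-RateLimit-Remaining"])
        reset = int(resp["headers"]["X-RateLimit-Reset"])
        if remaining <= floor:
            # Wait out the window rather than burn a 429.
            time.sleep(min(reset, 60))
    return results
```

This is exactly the behavior the article describes: the agent that can see "4 remaining, 60-second reset" sleeps; the agent that cannot see it keeps calling.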

Error Messages That Suggest The Fix

I rewrote every 4xx response body to follow a fixed shape: { error_code, message, suggested_fix, docs_url }. The suggested_fix is a sentence the agent can act on. "Invalid customer_id format" becomes "Invalid customer_id; expected format cus_XXXX, received [email protected]; call /customers.search with the email to resolve it." Agents recover from errors like that in one turn instead of five, and they never leak internal stack traces back to end users because there are none to leak.
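The fixed shape as a small builder, using the article's own example; the docs_url value is a placeholder, not a real documentation link:

```python
def error_response(error_code: str, message: str,
                   suggested_fix: str, docs_url: str) -> dict:
    """Fixed 4xx body shape: suggested_fix is one sentence the agent
    can act on without a human in the loop."""
    return {"error_code": error_code, "message": message,
            "suggested_fix": suggested_fix, "docs_url": docs_url}

bad_id = error_response(
    "invalid_customer_id",
    "Invalid customer_id; expected format cus_XXXX, "
    "received [email protected]",
    "Call /customers.search with the email to resolve the customer ID.",
    "https://api.example.com/docs/errors#invalid_customer_id",  # placeholder
)
```

Routing every 4xx through one builder also guarantees the shape never drifts between endpoints, which matters when the consumer pattern-matches on structure.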

Observability For Agent Traffic

I tag every request with the agent identity (via a required X-Agent-ID header) and log the dry-run ratio, confirmation-token burn rate, and retry count per identity. When a new integration goes live I watch those three numbers. A healthy agent integration runs at 30 to 60 percent dry-run ratio during exploration, drops to under 10 percent in steady state, and almost never fails the second phase of confirmation. Anything outside those bands is a spec problem, not an agent problem, and it tells me exactly which description to rewrite.
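Those bands can live in code as an alerting rule. A sketch using the article's numbers; the 1 percent cutoff for "almost never fails confirmation" is an assumed threshold:

```python
def integration_healthy(dry_run_ratio: float, steady_state: bool,
                        confirm_failure_rate: float) -> bool:
    """Health bands: 30-60% dry-run during exploration, under 10% in
    steady state, and near-zero second-phase confirmation failures."""
    if steady_state:
        dry_run_ok = dry_run_ratio < 0.10
    else:
        dry_run_ok = 0.30 <= dry_run_ratio <= 0.60
    return dry_run_ok and confirm_failure_rate < 0.01
```

Keyed by X-Agent-ID, a check like this points the alert at a specific integration, which in turn points at the specific description that needs rewriting.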

What I Would Tell Myself A Year Ago

Specs for agentic clients are not "specs plus examples." They have different defaults: idempotency required, dry-run free, destructive ops two-phase, every field in a full sentence, capabilities discoverable, costs visible, errors actionable, identity tagged. Once I accepted that the API is now the prompt, every other decision got simpler.

Contract Review Packet to Copy

Use this when the work touches API behavior, schema, events, retries, or consumer expectations. The packet makes compatibility and release evidence explicit.

API contract review packet: Designing API Specs for Agentic Clients

Decision to make:
- How API specs need to change for LLM-powered agentic clients: discoverability, idempotency by default, dry-run support, semantic descriptions, and the 'are you sure?' pattern for destructive ops.

Owner check:
- Product owner:
- Engineering owner:
- QA or operations reviewer:

Scope boundary:
- In scope:
- Out of scope:
- Assumption that still needs approval:

Acceptance evidence:
- Test or fixture:
- Log, metric, or screenshot:
- Manual review step:

Contract boundary: no release without compatibility classification, consumer impact, retry behavior, and rollback notes.

Reviewer prompt:
- What would still be ambiguous to someone who missed the planning meeting?
- What evidence would make this safe enough to ship?


