AI Coding Risk Register Before Merge

Spec Coding Editorial Team · Spec-first engineering notes

A pre-merge risk register specifically for AI-generated code: the seven categories of risk the LLM won't flag, the one-line risk note per PR, and what blocks merge vs raises a warning.

Published on 2026-03-10 · Updated 2026-05-11 · Author: Spec Coding Editorial Team

Why I Started Keeping a Risk Register Just for AI Code

I did not want another template. I wanted to stop getting surprised in production. After the third incident where an AI-generated PR shipped something I would never have shipped by hand, I accepted that the assistant and I were not weighing risk the same way. The assistant optimizes for plausible code that compiles. I am supposed to optimize for whether I can still sleep at night after merging it.

So I started writing a one-line risk note on every AI-assisted PR. Not a document. A single sentence that says what the change could break and what I did about it. That sentence is the entire register. It lives in the PR description, it is required by CI, and it is the thing I read first during review.

The Seven Categories the Model Will Not Flag

Seven buckets. Every PR gets scanned against all of them before a risk note is written:

What the One-Line Risk Note Actually Looks Like

Here is the exact format I use. It is boring on purpose. Boring is readable in a hurry.

Risk: adds GET endpoint with user-supplied ORDER BY — SQL injection class. Mitigation: whitelist columns.

That is the whole thing. A category cue, the specific thing that could go wrong, and what was done about it. If I cannot write that sentence, the PR is not ready. If the mitigation is "none," the PR is not ready either — that is a signal, not a shortcut.
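
A minimal sketch of how that sentence could be checked mechanically, assuming the note always follows the "Risk: ... Mitigation: ..." shape shown above. The names `RiskNote`, `parse_risk_note`, and `is_ready` are hypothetical and not part of any existing tool.

```python
import re
from typing import NamedTuple, Optional


class RiskNote(NamedTuple):
    risk: str
    mitigation: str


# Assumed shape: "Risk: <what could break>. Mitigation: <what was done>."
NOTE_PATTERN = re.compile(
    r"^Risk:\s*(?P<risk>.+?)\s*Mitigation:\s*(?P<mitigation>.+?)\s*$"
)


def parse_risk_note(line: str) -> Optional[RiskNote]:
    """Return the parsed note, or None when the line is not in the assumed shape."""
    match = NOTE_PATTERN.match(line.strip())
    if match is None:
        return None
    # Trim trailing periods and spaces so "none." and "none" compare equal.
    return RiskNote(match["risk"].rstrip(". "), match["mitigation"].rstrip(". "))


def is_ready(note: Optional[RiskNote]) -> bool:
    """Not ready when the note is missing or the mitigation is empty or 'none'."""
    if note is None:
        return False
    return note.mitigation.lower() not in {"", "none"}
```

For example, `is_ready(parse_risk_note("Risk: new endpoint. Mitigation: none."))` evaluates to `False`, which mirrors the rule that a "none" mitigation means the PR is not ready. Whether the mitigation is real is still a human call.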

Green, Yellow, Red: The Gate Matrix

Every PR gets a color from the seven categories. The color decides what happens next:

The Author Writes the Note. The Reviewer Verifies It.

Non-negotiable: the person who prompted the AI writes the risk note. Not the reviewer, not the model, not a bot. The author has the context — what they asked for, what came back, what they accepted on faith. The reviewer's job shifts from "find the bugs" to "challenge the risk note." Did the author miss a category? Is the mitigation real or aspirational? Is yellow actually red?

This flip matters. Reviewers who start from the risk note catch more than reviewers who start from the diff, because the note exposes the author's reasoning. If the note is lazy, the code usually is too.

Prompting the AI to Surface Its Own Risks

Before I ask for the implementation, I ask for the risks. The prompt: before writing any code, list the three highest risks in this change and how you would mitigate each. The answers are uneven, but the exercise forces the model to inspect the change instead of just emitting it. I keep those three lines in a scratch comment while I write the real risk note.
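
If the risk question is asked often, it can be kept as a constant and prepended to each task. This is only a sketch: the wording of the preamble comes from the paragraph above, while the function name and the "Change requested:" label are assumptions.

```python
RISK_FIRST_PREAMBLE = (
    "Before writing any code, list the three highest risks in this change "
    "and how you would mitigate each."
)


def risk_first_prompt(task: str) -> str:
    """Put the risk question ahead of the task so it gets answered first."""
    return f"{RISK_FIRST_PREAMBLE}\n\nChange requested:\n{task.strip()}"
```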

There is a trap here, and it is the whole reason the register exists. AI-generated code tends to look low-risk. Calm variable names, polite comments, short functions. A human reader pattern-matches on tone. The register is a forcing function against that bias: the note gets written regardless of how friendly the code looks.

A Real Example: The Unscoped User-Search Endpoint

A recent PR added a GET /api/users/search?q= endpoint. The code was clean, tests passed, the assistant had written a plausible parameterized LIKE query. The author added a light wrapper and opened the PR.

Writing the risk note caught it. Security tripped, then performance, then regulatory — and the author sat with that last one before realizing the query was not scoped by tenant. Anyone signed into any account could search every user in the system. The code worked exactly as written. It was also a cross-tenant data leak waiting to happen. The risk note made it red. The fix was two lines. The incident that did not happen is the one I care about.
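
A sketch of what a tenant-scoped version of such a search could look like, paired with a two-tenant fixture. The `users` table, its columns, and the function names are hypothetical; the PR's actual code is not shown in this article, so this illustrates the class of fix rather than the fix itself.

```python
import sqlite3


def search_users(conn: sqlite3.Connection, tenant_id: int, q: str) -> list[str]:
    """Parameterized LIKE search restricted to the caller's tenant."""
    rows = conn.execute(
        "SELECT name FROM users "
        "WHERE tenant_id = ? AND name LIKE ? "
        "ORDER BY name",
        (tenant_id, f"%{q}%"),
    ).fetchall()
    return [name for (name,) in rows]


def make_fixture() -> sqlite3.Connection:
    """In-memory database with users in two tenants whose names overlap."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, tenant_id INTEGER, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (tenant_id, name) VALUES (?, ?)",
        [(1, "alice"), (1, "alan"), (2, "alina")],
    )
    return conn
```

With this fixture, a search for "al" as tenant 1 should never return "alina". A test that asserts exactly that is the kind of evidence a reviewer can check against the risk note.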

Given/When/Then Acceptance Criteria for the Register Itself

The register only works if it is a CI check, not a polite suggestion. Here is how I specify it:

Given an AI-assisted pull request is opened
When the PR description does not contain a line matching "Risk:"
Then CI fails with a message pointing to the risk register policy

Given a PR is marked red by the risk matrix
When the author attempts to merge without a linked redesign issue
Then the merge is blocked until a reviewer with the risk-gate role approves

Given a PR passes all seven category checks with no trips
When the author writes "Risk: none"
Then the PR may proceed through normal review at yellow-equivalent speed

The check itself is dumb on purpose. It looks for the literal string Risk: in the PR body. Judgment is still human. The gate just refuses to let you skip the judgment.
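
The literal-string check can be sketched in a few lines. This assumes the CI job exposes the PR description through an environment variable; `PR_BODY` is a hypothetical name, and the real variable depends on the CI system in use.

```python
import os
import sys

POLICY_HINT = (
    "Missing 'Risk:' line in the PR description. See the risk register policy."
)


def has_risk_line(pr_body: str) -> bool:
    """True when the literal string 'Risk:' appears anywhere in the PR body."""
    return "Risk:" in pr_body


def main() -> int:
    """Return a process exit code: 0 when the note is present, 1 otherwise."""
    # PR_BODY is an assumed variable name that the CI job would populate.
    body = os.environ.get("PR_BODY", "")
    if has_risk_line(body):
        return 0
    print(POLICY_HINT, file=sys.stderr)
    return 1
```

A CI step could run this with `sys.exit(main())`. The check deliberately does not judge the note's content, so "Risk: none" passes it; deciding whether that is acceptable stays with the reviewer.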

The Weekly Sweep: Reading the Register as a Dataset

Every Friday I scan the week's risk notes — not the PRs, the notes. Patterns show up fast. If half the yellow notes say "N+1" we have a query-layer problem, not a review problem. If "new dependency" keeps appearing, someone needs a package-vetting policy. If regulatory trips never show up from a team, they are either lucky or not looking. The register stops being paperwork and starts being a weak-signal sensor. That is when it earns its keep — not on any single PR, but in the aggregate. My rule for the register is the rule I have for the code it guards: if it is not helping me decide something, it should not exist.
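
The sweep can start as a simple tally. The sketch below assumes the week's notes have already been collected into a list of strings; the cue phrases are the two quoted above, and the function name and sample notes are hypothetical.

```python
from collections import Counter

# Cue phrases to look for; these two come from the paragraph above.
CUES = ("N+1", "new dependency")


def tally_cues(notes: list[str], cues: tuple[str, ...] = CUES) -> Counter:
    """Count how many notes mention each cue (case-insensitive, once per note)."""
    counts: Counter = Counter()
    for note in notes:
        lowered = note.lower()
        for cue in cues:
            if cue.lower() in lowered:
                counts[cue] += 1
    return counts


week = [
    "Risk: N+1 on order list. Mitigation: eager load.",
    "Risk: new dependency for CSV parsing. Mitigation: pinned version.",
    "Risk: N+1 in report export. Mitigation: batch query.",
]
print(tally_cues(week).most_common())  # [('N+1', 2), ('new dependency', 1)]
```

A substring tally is crude, but it is enough to show which cue dominates a week before anyone invests in better tooling.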

AI Review Packet to Copy

Use this before an AI-generated diff reaches code review. It turns the prompt, the allowed scope, and the required proof into one reviewable artifact.

AI coding review packet: AI Coding Risk Register Before Merge

Decision to make:
- A pre-merge risk register specifically for AI-generated code: the seven categories of risk the LLM won't flag, the one-line risk note per PR, and what blocks merge vs raises a warning.

Owner check:
- Product owner:
- Engineering owner:
- QA or operations reviewer:

Scope boundary:
- In scope:
- Out of scope:
- Assumption that still needs approval:

Acceptance evidence:
- Test or fixture:
- Log, metric, or screenshot:
- Manual review step:

AI boundary: generated changes must stay inside the written scope and attach evidence for each acceptance criterion.

Reviewer prompt:
- What would still be ambiguous to someone who missed the planning meeting?
- What evidence would make this safe enough to ship?

Editorial Review Note

Reviewed Apr 28, 2026. This update added a reusable artifact, checked the article against the related topic hub, and tightened the next-step links so the page works as a practical reference rather than a standalone essay.

Risk register example from an AI-generated change

For AI code, I want the risk note to name the exact behavior the model might have invented. “Looks okay” is not a review artifact.

Risk | Why AI may miss it | Merge gate
Search endpoint returns users across tenants. | The prompt said “admin search” but did not name tenant boundary. | Permission test using two tenant fixtures.
Migration backfills null role as admin. | The model inferred a default from sample data. | Migration dry-run plus row-count diff.
Retry loop duplicates notification sends. | Generated code retries the side effect, not the idempotent operation. | Replay test proves one notification per event_id.
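
The retry row can be made concrete with a small sketch of a sender keyed by event_id, plus the replay check named as its merge gate. The class and its in-memory set are hypothetical stand-ins; a real system would need a durable store and would have to handle a crash between delivery and recording.

```python
from typing import Callable


class NotificationSender:
    """Delivers at most one notification per event_id within this process."""

    def __init__(self, deliver: Callable[[str, str], None]) -> None:
        self._deliver = deliver        # the real side effect (email, push, ...)
        self._sent: set[str] = set()   # in-memory stand-in for a durable store

    def send(self, event_id: str, message: str) -> bool:
        """Return True if delivered now, False if this event_id was already sent."""
        if event_id in self._sent:
            return False
        self._deliver(event_id, message)
        # Record only after delivery succeeds, so a failed attempt can be retried.
        self._sent.add(event_id)
        return True


# Replay check: three attempts for the same event produce one delivery.
delivered: list[str] = []
sender = NotificationSender(lambda event_id, message: delivered.append(event_id))
for _ in range(3):
    sender.send("evt-1", "hello")
assert delivered == ["evt-1"]
```

The point of the design is that the retry wraps `send`, which is safe to repeat, rather than wrapping the raw delivery call.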

Keywords: AI code review · pre-merge risk register · AI-generated code safety · PR risk note · merge gate

Editorial Note

Last reviewed May 6, 2026: topic paths, examples, internal links, and reusable review blocks were checked for practical specificity.