AI Coding Risk Register Before Merge

A pre-merge risk register specifically for AI-generated code: the seven categories of risk the LLM won't flag, the one-line risk note per PR, and what blocks merge vs raises a warning.


Published on 2026-03-10 · Updated 2026-06-02 · 7 min read · Author: Spec Coding Editorial Team · Review policy: Editorial Policy

Why I Started Keeping a Risk Register Just for AI Code

I did not want another template. I wanted to stop getting surprised in production. After the third incident where an AI-generated PR shipped something I would never have shipped by hand, I accepted that the assistant and I were not weighing risk the same way. The assistant optimizes for plausible code that compiles. I am supposed to optimize for whether I can still sleep at night after merging it.

So I started writing a one-line risk note on every AI-assisted PR. Not a document. A single sentence that says what the change could break and what I did about it. That sentence is the entire register. It lives in the PR description, it is required by CI, and it is the thing I read first during review.

The Seven Categories the Model Will Not Flag

Seven buckets. Every PR gets scanned against all of them before a risk note is written:

Security. Leaked secrets, unsafe eval, untrusted input reaching a shell, auth bypass hidden behind a helper. The model reaches for exec when a parser would do.
Data loss. DB writes, destructive migrations, DELETE without WHERE, file overwrites, schema drops. Data is a one-way door.
Performance. N+1 queries, unbounded loops, sync calls in a hot path, missing pagination. Fast on a demo dataset, catastrophic on the real one.
Correctness. Silent wrong answers. Off-by-one, wrong sign, wrong rounding, swallowed exceptions. The code runs; the result is subtly wrong.
Dependency. New packages, unfamiliar packages, packages with ten weekly downloads. The assistant imports almost anything.
Regulatory. PII, GDPR, HIPAA, PCI, anything touching payments. The model does not know your jurisdiction.
Operational. New queue, new cron, new webhook: anything that now needs monitoring but that nothing is watching yet.

What the One-Line Risk Note Actually Looks Like

Here is the exact format I use. It is boring on purpose. Boring is readable in a hurry.

Risk: adds GET endpoint with user-supplied ORDER BY — SQL injection class. Mitigation: whitelist columns.

That is the whole thing. A category cue, the specific thing that could go wrong, and what was done about it. If I cannot write that sentence, the PR is not ready. If the mitigation is "none," the PR is not ready either — that is a signal, not a shortcut.
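
Because the format is this regular, it can be parsed. The following is a minimal sketch in Python, assuming the note is a single line that starts with Risk: and optionally continues with Mitigation:. The names RiskNote and parse_risk_note are hypothetical and do not come from any existing tool.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed shape: "Risk: <what could break>. Mitigation: <what was done>."
# "Risk: none" is accepted without a mitigation part.
_NOTE = re.compile(
    r"^\s*Risk:\s*(?P<risk>.+?)(?:\s*Mitigation:\s*(?P<mitigation>.+?))?\s*$"
)


@dataclass
class RiskNote:
    risk: str
    mitigation: Optional[str]

    def is_ready(self) -> bool:
        """Ready means: the risk is 'none', or a mitigation other than 'none' is named."""
        if self.risk.rstrip(".").lower() == "none":
            return True
        if self.mitigation is None:
            return False
        return self.mitigation.rstrip(".").lower() != "none"


def parse_risk_note(line: str) -> Optional[RiskNote]:
    """Return a RiskNote for a well-formed line, or None if the line does not match."""
    match = _NOTE.match(line)
    if match is None:
        return None
    mitigation = match.group("mitigation")
    return RiskNote(
        risk=match.group("risk").strip(),
        mitigation=mitigation.strip() if mitigation else None,
    )
```

A parser like this only checks that the sentence has the expected shape. Whether the mitigation is real still has to be judged by a person.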

Green, Yellow, Red: The Gate Matrix

Every PR gets a color from the seven categories. The color decides what happens next:

Green. No category trips. Pure refactor in a tested module, log tweak, doc change. Merge normally. A Risk: none line is still required, because CI checks for it.
Yellow. One or two categories trip with a clear mitigation. Write the note, proceed through normal review. Most real PRs.
Red. Any destructive data op, any new auth surface, any regulatory touch, or three-plus categories at once. Block merge. Redesign, split, or escalate. Red never gets waived inline.
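
The matrix above can be sketched as a small function. This is an illustration under assumptions, not the tooling behind the article: the Category enum, gate_color, and the two boolean flags are hypothetical names, and whether a mitigation counts as "clear" stays a human judgment that the code does not model.

```python
from enum import Enum
from typing import Set


class Category(Enum):
    SECURITY = "security"
    DATA_LOSS = "data loss"
    PERFORMANCE = "performance"
    CORRECTNESS = "correctness"
    DEPENDENCY = "dependency"
    REGULATORY = "regulatory"
    OPERATIONAL = "operational"


def gate_color(
    tripped: Set[Category],
    destructive_data_op: bool = False,  # assumption: set by the author or reviewer
    new_auth_surface: bool = False,     # assumption: set by the author or reviewer
) -> str:
    """Map tripped categories to 'green', 'yellow', or 'red'."""
    if (
        destructive_data_op
        or new_auth_surface
        or Category.REGULATORY in tripped
        or len(tripped) >= 3
    ):
        return "red"     # block merge: redesign, split, or escalate
    if tripped:
        return "yellow"  # one or two trips: write the note, normal review
    return "green"       # nothing tripped


# Usage: a paginated list endpoint that also pulls in a new package.
print(gate_color({Category.PERFORMANCE, Category.DEPENDENCY}))
```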

The Author Writes the Note. The Reviewer Verifies It.

Non-negotiable: the person who prompted the AI writes the risk note. Not the reviewer, not the model, not a bot. The author has the context — what they asked for, what came back, what they accepted on faith. The reviewer's job shifts from "find the bugs" to "challenge the risk note." Did the author miss a category? Is the mitigation real or aspirational? Is yellow actually red?

This flip matters. Reviewers who start from the risk note catch more than reviewers who start from the diff, because the note exposes the author's reasoning. If the note is lazy, the code usually is too.

Prompting the AI to Surface Its Own Risks

Before I ask for the implementation, I ask for the risks. The prompt: before writing any code, list the three highest risks in this change and how you would mitigate each. The answers are uneven, but the exercise forces the model to inspect the change instead of just emitting it. I keep those three lines in a scratch comment while I write the real risk note.

There is a trap here, and it is the whole reason the register exists. AI-generated code tends to look low-risk. Calm variable names, polite comments, short functions. A human reader pattern-matches on tone. The register is a forcing function against that bias: the note gets written regardless of how friendly the code looks.

A Real Example: The Unscoped User-Search Endpoint

A recent PR added a GET /api/users/search?q= endpoint. The code was clean, tests passed, the assistant had written a plausible parameterized LIKE query. The author added a light wrapper and opened the PR.

Writing the risk note caught it. Security tripped, then performance, then regulatory — and the author sat with that last one before realizing the query was not scoped by tenant. Anyone signed into any account could search every user in the system. The code worked exactly as written. It was also a cross-tenant data leak waiting to happen. The risk note made it red. The fix was two lines. The incident that did not happen is the one I care about.
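
The article does not show the endpoint's code, so the following is only a sketch of what a tenant-scoped version of such a query might look like. It uses Python's built-in sqlite3 module, and the table layout, column names, and function names are all assumptions made for illustration.

```python
import sqlite3
from typing import List, Tuple


def search_users(conn: sqlite3.Connection, tenant_id: int, q: str) -> List[Tuple[int, str]]:
    """Search users by email, restricted to the caller's tenant."""
    pattern = f"%{q}%"  # note: % and _ inside q still act as wildcards in this sketch
    return conn.execute(
        "SELECT id, email FROM users "
        "WHERE email LIKE ? "
        "AND tenant_id = ?",  # the scoping predicate an unscoped query would lack
        (pattern, tenant_id),
    ).fetchall()


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with users from two hypothetical tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, tenant_id INTEGER, email TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (tenant_id, email) VALUES (?, ?)",
        [(1, "ana@acme.test"), (1, "bo@acme.test"), (2, "ana@globex.test")],
    )
    return conn


# Usage: a caller in tenant 2 must not see tenant 1's users.
conn = make_demo_db()
print(search_users(conn, tenant_id=2, q="bo"))
```

Both versions of such a query are parameterized, which is why an injection-focused review would pass either one. Only the tenant predicate separates a working search from a cross-tenant leak.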

Given/When/Then Acceptance Criteria for the Register Itself

The register only works if it is a CI check, not a polite suggestion. Here is how I specify it:

Given an AI-assisted pull request is opened
When the PR description does not contain a line matching "Risk:"
Then CI fails with a message pointing to the risk register policy

Given a PR is marked red by the risk matrix
When the author attempts to merge without a linked redesign issue
Then the merge is blocked until a reviewer with the risk-gate role approves

Given a PR passes all seven category checks with no trips
When the author writes "Risk: none"
Then the PR is treated as green and proceeds through normal review

The check itself is dumb on purpose. It looks for the literal string Risk: in the PR body. Judgment is still human. The gate just refuses to let you skip the judgment.
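
A check this simple fits in a few lines. The sketch below assumes the CI job can hand the PR description to the script as a string. How it obtains that text depends on the CI system and is left out. The function names and the failure message are hypothetical.

```python
import sys

RISK_MARKER = "Risk:"  # the literal, case-sensitive string the check looks for

# Hypothetical wording; point it at wherever the policy actually lives.
FAILURE_MESSAGE = "PR description has no 'Risk:' line. See the risk register policy."


def has_risk_note(pr_body: str) -> bool:
    """True if any line of the PR description contains the literal marker."""
    return any(RISK_MARKER in line for line in pr_body.splitlines())


def check_pr_body(pr_body: str) -> int:
    """Return a process exit code: 0 passes the check, 1 fails it."""
    if has_risk_note(pr_body):
        return 0
    print(FAILURE_MESSAGE, file=sys.stderr)
    return 1
```

In a CI step this could be wired up as sys.exit(check_pr_body(sys.stdin.read())), with the PR description piped in on standard input. The check deliberately does not validate what follows the marker.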

The Weekly Sweep: Reading the Register as a Dataset

Every Friday I scan the week's risk notes — not the PRs, the notes. Patterns show up fast. If half the yellow notes say "N+1" we have a query-layer problem, not a review problem. If "new dependency" keeps appearing, someone needs a package-vetting policy. If regulatory trips never show up from a team, they are either lucky or not looking. The register stops being paperwork and starts being a weak-signal sensor. That is when it earns its keep — not on any single PR, but in the aggregate. My rule for the register is the rule I have for the code it guards: if it is not helping me decide something, it should not exist.
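
The sweep can be approximated with a keyword count over the week's notes. This is a rough sketch, assuming the notes have already been collected as plain strings. The watch list in SIGNALS and the function name sweep are hypothetical, and a simple substring match will miss notes that phrase the same risk differently.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

# Hypothetical watch list; adjust to the phrases a team actually writes.
SIGNALS = ("n+1", "new dependency", "pii", "migration")


def sweep(notes: Iterable[str], signals: Sequence[str] = SIGNALS) -> List[Tuple[str, int]]:
    """Count how many notes mention each signal, most frequent first."""
    counts: Counter = Counter()
    for note in notes:
        lowered = note.lower()
        for signal in signals:
            if signal in lowered:
                counts[signal] += 1  # one hit per note, however often it repeats
    return counts.most_common()


# Usage with three made-up notes.
week = [
    "Risk: N+1 on orders list. Mitigation: eager load.",
    "Risk: new dependency for CSV parsing. Mitigation: pinned version.",
    "Risk: N+1 in weekly report. Mitigation: batch query.",
]
for signal, count in sweep(week):
    print(f"{signal}: {count}")
```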

AI Review Packet to Copy

Use this when the assistant produced a plausible diff and the team needs to decide what could still go wrong. The risk note belongs in the PR before approval, not after a reviewer gets nervous.

AI coding review packet: AI Coding Risk Register Before Merge

Decision to make:
- A pre-merge risk register specifically for AI-generated code: the seven categories of risk the LLM won't flag, the one-line risk note per PR, and what blocks merge vs raises a warning.

Owner check:
- Product owner:
- Engineering owner:
- QA or operations reviewer:

Scope boundary:
- In scope:
- Out of scope:
- Assumption that still needs approval:

Acceptance evidence:
- Test or fixture:
- Log, metric, or screenshot:
- Manual review step:

AI boundary: generated changes must stay inside the written scope and attach evidence for each acceptance criterion.

Reviewer prompt:
- What would still be ambiguous to someone who missed the planning meeting?
- What evidence would make this safe enough to ship?

Risk register example from an AI-generated change

For AI code, I want the risk note to name the exact behavior the model might have invented. “Looks okay” is not a review artifact.

Risk	Why AI may miss it	Merge gate
Search endpoint returns users across tenants.	The prompt said “admin search” but did not name tenant boundary.	Permission test using two tenant fixtures.
Migration backfills null role as admin.	The model inferred a default from sample data.	Migration dry-run plus row-count diff.
Retry loop duplicates notification sends.	Generated code retries the side effect, not the idempotent operation.	Replay test proves one notification per event_id.
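
The third row's merge gate, one notification per event_id under replay, can be illustrated with a small sketch. Everything here is an assumption made for the example: the class name, the deliver callback, and the in-memory set, which a real system would replace with a durable store.

```python
from typing import Callable, Set


class NotificationSender:
    """Sends at most one notification per event_id, even when an event is replayed."""

    def __init__(self, deliver: Callable[[str, str], None]) -> None:
        self._deliver = deliver
        self._sent: Set[str] = set()  # in-memory only; a sketch, not durable storage

    def send(self, event_id: str, message: str, max_attempts: int = 3) -> bool:
        """Return True if delivered now, False if this event_id was already delivered."""
        if event_id in self._sent:
            return False  # replayed event: skip the side effect
        last_error = None
        for _ in range(max_attempts):
            try:
                self._deliver(event_id, message)
            except ConnectionError as err:
                last_error = err  # transient failure: try again
                continue
            self._sent.add(event_id)  # record success so replays become no-ops
            return True
        raise RuntimeError(f"delivery failed for {event_id}") from last_error


# Replay check: the same event processed twice yields one delivery.
delivered = []
sender = NotificationSender(lambda event_id, message: delivered.append(event_id))
sender.send("evt-1", "Your export is ready")
sender.send("evt-1", "Your export is ready")
print(delivered)
```

The sketch still retries the delivery call itself, so it only prevents duplicates across replays of the same event. A delivery that succeeds remotely but raises locally could still be sent twice, which is the kind of gap the replay test is meant to expose.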

Case study: the AI-generated tenant leak

The generated code passed every happy-path test. The risk register caught the missing tenant boundary because reviewers had to name what data could cross accounts.

Risk note before merge:
- Change: add user search endpoint for support console
- AI-added query: SELECT * FROM users WHERE email ILIKE $1
- Missing boundary: tenant_id and support role scope
- Blast radius: any support user can search across all tenants
- Required evidence:
  - cross-tenant denial test
  - audit log includes actor_tenant_id
  - query plan shows tenant_id predicate
  - reviewer signs off on allowed support roles

Keywords: AI code review · pre-merge risk register · AI-generated code safety · PR risk note · merge gate