Large report export: from CSV button to async spec packet

This case turns a vague export request into a bounded background job workflow with permissions, job states, file expiry, and evidence reviewers can inspect.

Risk: Timeouts, data leaks, queue pressure
Boundary: CSV export only, no report redesign
Evidence: Job tests, timing, permission checks

The request before the spec

Weak ticket

Add CSV export to the orders report.
It should work for large accounts and email the file when ready.

Spec-first rewrite

Feature: Async orders report CSV export
Owner: Reporting
Status: Draft for review

Goal:
- Let account admins export the current orders report as CSV.
- Run exports as background jobs for datasets over 10,000 rows.
- Notify the requesting user when the file is ready.

Non-goals:
- No report filter redesign.
- No new analytics columns.
- No cross-account export.
- No real-time streaming download.

The packet reviewers actually need

spec.md

Constraints:
- Export uses the same filters visible on screen.
- Requesting user must belong to the account.
- File expires after 7 days.
- CSV columns match the current report table.

Risks:
- long-running query pressure
- stale filter snapshot
- download link shared outside account

tasks.md

- [ ] Capture report filter snapshot.
- [ ] Create export_jobs table and state machine.
- [ ] Add worker for CSV generation.
- [ ] Add signed download URL with expiry.
- [ ] Add admin UI states: queued, running, ready, failed.
- [ ] Add queue and failure metrics.

acceptance-criteria.md

- Given an admin filters orders
  When they request export
  Then the job uses the same filter snapshot.

- Given a non-admin user, or a user from another account
  When they request or download an export
  Then access is denied.

- Given a 500k-row report
  When the export runs
  Then the web request returns without waiting for CSV generation and the worker completes asynchronously.

evidence.md

Automated:
- export permission test
- filter snapshot test
- job state transition test
- expired download URL test

Operational:
- 500k row timing result
- queue depth dashboard
- failed job alert link

Async workflow the spec should protect

| State | Allowed transition | Reviewer evidence |
| --- | --- | --- |
| queued | Created after permission and filter snapshot validation. | Test proves the job stores account id, user id, and filters. |
| running | Worker claims one job and writes progress metadata. | Test proves two workers cannot process the same job. |
| ready | CSV is stored and a signed URL is generated. | Evidence proves URL expires and account permissions are checked. |
| failed | Worker records a safe error and user can retry. | Alert and failed-job metric are linked in the PR. |
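The "running" row asks for proof that two workers cannot process the same job. One common way to get that property is a conditional update that only succeeds while the job is still queued. The sketch below assumes a SQL-backed `export_jobs` table (the table name comes from tasks.md; the `state` and `worker_id` columns, and the helper names, are hypothetical) and uses an in-memory SQLite database purely for illustration.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical minimal schema; a real table would also store
    # account id, user id, and the filter snapshot.
    conn.execute(
        "CREATE TABLE export_jobs ("
        " id INTEGER PRIMARY KEY,"
        " state TEXT NOT NULL,"
        " worker_id TEXT)"
    )


def claim_job(conn: sqlite3.Connection, job_id: int, worker_id: str) -> bool:
    """Try to move a job from queued to running for exactly one worker."""
    cur = conn.execute(
        "UPDATE export_jobs SET state = 'running', worker_id = ? "
        "WHERE id = ? AND state = 'queued'",
        (worker_id, job_id),
    )
    conn.commit()
    # rowcount is 1 only for the worker whose UPDATE matched the queued row.
    return cur.rowcount == 1
```

A test for the reviewer evidence can insert one queued job, call `claim_job` twice with different worker ids, and assert that only the first call returns `True`.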

Reviewer walkthrough

The weak request says "add CSV export", but the actual risk is not the button. The risk is what happens after the button: a long query blocks the web request, filters change while the job runs, a file link outlives the user's access, or a background failure silently disappears.

The spec packet makes the export a state machine instead of a UI flourish. That matters because background jobs fail in different places from normal requests. Reviewers need to see permission checks, filter snapshot behavior, file expiry, queue monitoring, and a failed-job path before they can trust the feature.
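To make "state machine" concrete, the sketch below encodes the four job states and rejects any move that is not explicitly allowed. The state names come from the spec; the `failed -> queued` edge is an assumption about how "user can retry" would be modeled, and `transition` and `InvalidTransition` are hypothetical names.

```python
from enum import Enum


class JobState(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    READY = "ready"
    FAILED = "failed"


# Allowed moves between states. The failed -> queued edge models a retry
# and is an assumption, not something the spec spells out.
ALLOWED = {
    JobState.QUEUED: {JobState.RUNNING},
    JobState.RUNNING: {JobState.READY, JobState.FAILED},
    JobState.READY: set(),
    JobState.FAILED: {JobState.QUEUED},
}


class InvalidTransition(Exception):
    pass


def transition(current: JobState, target: JobState) -> JobState:
    """Return the new state, or raise if the move is not in ALLOWED."""
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value}")
    return target
```

Keeping the allowed edges in one table gives the job state transition test a single thing to assert against.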

Permission check

The request and the download must both verify the user belongs to the account. Checking only at request time is not enough.
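A minimal sketch of that double check, assuming hypothetical `User` and `ExportJob` records and an `AccessDenied` exception: the same rule runs when the job is created and again when the file is downloaded, using the user's current membership and role rather than what was true at request time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: int
    account_id: int
    is_admin: bool


@dataclass(frozen=True)
class ExportJob:
    id: int
    account_id: int
    requested_by: int


class AccessDenied(Exception):
    pass


def require_account_admin(user: User, account_id: int) -> None:
    # Single rule shared by both entry points.
    if user.account_id != account_id or not user.is_admin:
        raise AccessDenied("user may not export for this account")


def request_export(user: User, account_id: int, next_id: int) -> ExportJob:
    require_account_admin(user, account_id)
    return ExportJob(id=next_id, account_id=account_id, requested_by=user.id)


def authorize_download(user: User, job: ExportJob) -> None:
    # Re-check with the user's current record: access may have been
    # revoked between the request and the download.
    require_account_admin(user, job.account_id)
```

A permission test can create a job as an admin, then call `authorize_download` with a demoted or removed copy of the same user and expect `AccessDenied`.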

Snapshot check

The exported CSV should match the filters visible when the user clicked export, not filters edited later.
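One way to get this behavior is to serialize the filters at request time and have the worker read only the stored copy. The sketch below assumes filters are a JSON-serializable dict; the function names and the example filter keys are hypothetical.

```python
import json


def snapshot_filters(filters: dict) -> str:
    """Freeze the filters as canonical JSON to store on the job row."""
    return json.dumps(filters, sort_keys=True, separators=(",", ":"))


def load_snapshot(snapshot: str) -> dict:
    """What the worker reads; it never looks at the live filter state."""
    return json.loads(snapshot)


# Example: the user edits the on-screen filters after clicking export.
live_filters = {"status": "paid", "from": "2024-01-01"}
stored = snapshot_filters(live_filters)
live_filters["status"] = "refunded"  # later edit, must not affect the job
worker_filters = load_snapshot(stored)
```

Here `worker_filters` still has `status` set to `"paid"`, which is the property the filter snapshot test should assert.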

Operations check

Queue depth, failed jobs, retry rate, and worker timing should be visible before rollout.

How to adapt this case

Use this pattern for exports, imports, bulk actions, reconciliation jobs, billing runs, data backfills, and any workflow where a user starts work that finishes later. The reusable part is the state machine and evidence list. The domain can change, but the review questions remain stable: who is allowed to start it, what data snapshot is used, how failures are surfaced, and what operational signal proves it is safe.

For products with smaller datasets, the async threshold may be lower than 10,000 rows, or you may skip email notification entirely. That is fine. What should not disappear is the explicit boundary between web request and worker, the permission check at download time, and the evidence that large accounts do not degrade the main application.

When an AI coding assistant implements this work, do not let it invent a queue library, redesign the report, or add extra columns just because it sees adjacent code. Those choices need separate decisions. The spec packet keeps the first version focused on safe export behavior.

Rollout observation plan

A report export is not done when the CSV downloads once in staging. The first production release should have an observation plan because failures often appear only with real account sizes, real filters, and real retry behavior. The spec should name what the team watches for the first day and what action happens if a signal moves.

For this example, the useful rollout signals are queue depth, worker duration, failed job count, download permission failures, and support tickets mentioning missing columns or stale filters. The rollback decision should also be concrete: disable the export button, pause workers, or keep completed files available while blocking new requests.
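To show what "concrete" can look like, the sketch below maps a few of those signals to one of the rollback actions. Every threshold is a placeholder to be replaced with numbers the team agrees on, and the mapping from signal to action is an assumption for illustration, not something the spec prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    queue_depth: int
    failed_jobs: int
    p95_worker_seconds: float


# Placeholder limits only; real values belong in the spec's rollout section.
LIMITS = Signals(queue_depth=100, failed_jobs=5, p95_worker_seconds=600.0)


def rollout_action(s: Signals, limits: Signals = LIMITS) -> str:
    """Pick the rollback action for the current signals (assumed mapping)."""
    if s.failed_jobs > limits.failed_jobs:
        # Stop new requests; completed files stay available.
        return "disable_export_button"
    if (s.queue_depth > limits.queue_depth
            or s.p95_worker_seconds > limits.p95_worker_seconds):
        return "pause_workers"
    return "continue"
```

Writing the decision down as a function, even informally, forces the team to state which signal triggers which action before the first production day.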

Queue signal

Queue depth and worker duration show whether the feature is starving other background work.

Access signal

Permission failures and expired-link events prove download security is being exercised.

Support signal

Missing column or stale filter reports show whether the exported file matches user expectations.

Anti-patterns to reject

Exporting in the web request

Large accounts will hit timeouts or database pressure. Make the long work asynchronous.
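A minimal sketch of the boundary, assuming an in-process `deque` as a stand-in for whatever queue the team already runs and a hypothetical `handle_export_request` handler. The 10,000-row threshold comes from the spec's Goal section; the response shapes are assumptions.

```python
from collections import deque
from typing import Callable

ASYNC_ROW_THRESHOLD = 10_000  # from the spec: background jobs above this size

# Stand-in for the real job queue; hypothetical.
queue: deque = deque()


def handle_export_request(
    job_id: int,
    estimated_rows: int,
    run_inline: Callable[[int], str],
) -> dict:
    """Return immediately for large exports instead of building the CSV."""
    if estimated_rows > ASYNC_ROW_THRESHOLD:
        queue.append(job_id)  # a worker picks this up later
        return {"status": 202, "job_id": job_id, "state": "queued"}
    # Small exports may still be served in the request.
    return {"status": 200, "body": run_inline(job_id)}
```

The point for reviewers is that the large-export branch never calls the CSV generator, so the web request's duration does not depend on row count.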

Download link without permission

A signed URL alone only proves the link was issued by the application. It still needs account-aware access rules, or a short expiry with a safe way to regenerate the link.
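The sketch below combines both protections using only Python's standard `hmac` and `hashlib` modules: the token carries an expiry and the account id, and verification checks the signature, the expiry, and the caller's account. The secret, the token layout, and the function names are hypothetical; the 7-day lifetime comes from the spec's constraints.

```python
import hashlib
import hmac

SECRET = b"replace-with-a-real-secret"  # hypothetical; load from config
EXPIRY_SECONDS = 7 * 24 * 60 * 60       # spec: file expires after 7 days


def _sign(job_id: int, account_id: int, expires_at: int) -> str:
    msg = f"{job_id}:{account_id}:{expires_at}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def make_token(job_id: int, account_id: int, now: int) -> dict:
    expires_at = now + EXPIRY_SECONDS
    return {
        "job_id": job_id,
        "account_id": account_id,
        "expires_at": expires_at,
        "sig": _sign(job_id, account_id, expires_at),
    }


def verify(token: dict, user_account_id: int, now: int) -> bool:
    expected = _sign(token["job_id"], token["account_id"], token["expires_at"])
    if not hmac.compare_digest(expected, token["sig"]):
        return False  # tampered or forged
    if now >= token["expires_at"]:
        return False  # expired link
    # Account-aware rule: a valid link is not enough on its own.
    return user_account_id == token["account_id"]
```

Passing `now` explicitly keeps the expired download URL test deterministic, since the test can move the clock forward without sleeping.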

No failed-job path

Users and support need to know whether the export failed, can be retried, or requires engineering attention.
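A minimal sketch of such a path, assuming a hypothetical in-memory `Job` record and an arbitrary attempt cap: the full exception goes to the logs for engineering, the user sees only a safe message, and retry is allowed until the cap is reached.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("exports")

MAX_ATTEMPTS = 3  # hypothetical cap; pick a value that fits your queue


@dataclass
class Job:
    id: int
    state: str = "running"
    attempts: int = 1
    user_message: Optional[str] = None


def mark_failed(job: Job, exc: Exception) -> None:
    # Full detail goes to logs; the stored message stays free of internals.
    log.error("export job %s failed: %r", job.id, exc)
    job.state = "failed"
    if job.attempts < MAX_ATTEMPTS:
        job.user_message = "The export could not be completed. You can retry."
    else:
        job.user_message = "The export could not be completed. Please contact support."


def retry(job: Job) -> bool:
    """Requeue a failed job unless it has used all of its attempts."""
    if job.state != "failed" or job.attempts >= MAX_ATTEMPTS:
        return False
    job.attempts += 1
    job.state = "queued"
    job.user_message = None
    return True
```

The three outcomes in the text map to the stored state: failed with a retry message, requeued after `retry`, or failed with the support message once attempts run out.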

Use this pattern before adding a background export

Generate a spec packet, then make the pull request prove job states, permissions, and operational evidence before merge.

Editorial note

This case focuses on asynchronous work: the spec is useful because it makes background states, permissions, and operational evidence visible before implementation.