How to Spec Database Schemas Before Writing Migrations
A migration file is permanent. Once it runs against production, you cannot take it back without writing another migration. Yet most teams write migrations the way they write application code: open an editor, start typing columns, and figure out constraints as they go. The result is schemas that accumulate implicit decisions nobody documented, default values nobody questioned, and nullability choices that haunt the codebase for years. Specifying your database schema before writing the migration forces those decisions into the open where they can be reviewed, challenged, and agreed upon.
Why database schemas need specs
Application code gets rewritten. Frameworks get swapped. But the database schema outlives almost everything else in the stack. A column added in 2019 is still there in 2026 unless someone explicitly removed it, and removing columns from a production database is one of the riskiest operations a team can perform. That longevity makes schema decisions high-leverage: a bad column type or a missing index costs more over its lifetime than a poorly named variable ever will.
The problem is that most schema decisions are made implicitly. An engineer opens a migration file, types status VARCHAR(255), and moves on. Nobody asks why it is a varchar instead of a smallint. Nobody asks whether 255 is the right length or just the ORM default. Nobody asks what values are valid. The migration passes code review because reviewers focus on application logic, not on whether placed_at should be TIMESTAMP or TIMESTAMPTZ.
Those implicit decisions compound. A year later the team discovers that status has 14 different string values in production, three of which are typos. They discover that placed_at stored times without timezone information, and the billing reconciliation job has been silently miscalculating for months. Each of these bugs traces back to a schema decision that was never explicitly made, just defaulted into.
A schema spec forces each decision to be stated, reviewed, and justified before the migration is written. It takes thirty minutes to write one. It saves weeks of debugging later.
The schema spec template
A schema spec is not a design document. It is a concrete description of the table you are about to create or modify, written in enough detail that a reviewer can evaluate every column, constraint, and index before a single line of SQL is written.
## Schema Spec: orders
### Table: orders
Description: Stores customer purchase orders. One row per checkout.
| Column | Type | Nullable | Default | Constraints |
|-------------|---------------|----------|----------------|------------------------|
| id | uuid | NO | gen_random_uuid() | PRIMARY KEY |
| user_id | uuid | NO | — | FK -> users(id) |
| total_cents | bigint | NO | — | CHECK (total_cents >= 0) |
| currency | char(3) | NO | 'USD' | CHECK (currency IN ('USD','EUR','GBP')) |
| status | smallint | NO | 0 | CHECK (status IN (0,1,2,3)) |
| placed_at | timestamptz | NO | now() | — |
| canceled_at | timestamptz | YES | NULL | — |
| notes | text | YES | NULL | — |
### Status mapping
0 = pending, 1 = confirmed, 2 = shipped, 3 = canceled
### Indexes
- idx_orders_user_id ON (user_id) — supports lookup by customer
- idx_orders_placed_at ON (placed_at DESC) — supports dashboard sorting
- idx_orders_status ON (status) WHERE status IN (0, 1) — partial index for active orders
### Foreign keys
- user_id -> users(id) ON DELETE RESTRICT
### Migration order
Must run after: 001_create_users
Migration file: 002_create_orders
This format does three things. First, it makes every column-level decision visible. Reviewers can see that total_cents is a bigint with a non-negative check constraint, not a decimal or a float. Second, it surfaces the indexes before the migration is written, which means the team can discuss query patterns during review rather than discovering missing indexes in production. Third, it documents the migration ordering, so the engineer writing the migration knows which tables must exist first.
Specifying column-level constraints
Every column in a table represents a decision. The spec is where you make each decision explicit instead of letting the ORM decide for you.
Nullability. The default in most databases is nullable. That means if you do not say NOT NULL, the column accepts nulls. This is almost always wrong for new columns. A null email column means you have a user without an email. Is that a valid state? If not, make it NOT NULL. The spec forces you to answer this question for every column. Write "YES" or "NO" in the nullable column. Do not leave it blank.
Defaults. A default value is not just a convenience. It defines what happens to existing rows when you add a new column to a table that already has data. If you add currency CHAR(3) NOT NULL without a default, the migration will fail on any table that has existing rows. The spec should state the default and, in a comment, explain why that default is correct. 'USD' as a default for currency is fine if all existing customers are US-based. It is wrong if you have customers in Europe.
Check constraints. A check constraint is a contract between the database and the application. CHECK (total_cents >= 0) means the database will reject negative order totals regardless of what the application code does. This is your last line of defense against bad data. The spec should list every check constraint, because each one is a business rule. If status can only be 0, 1, 2, or 3, that belongs in the spec. If the application later needs a status 4, the team will know it requires a schema change, not just a code change.
Foreign keys. A foreign key is a relationship with consequences. ON DELETE CASCADE means deleting a user deletes all their orders. ON DELETE RESTRICT means you cannot delete a user who has orders. ON DELETE SET NULL means the order stays but loses its user reference. Each option has different failure modes, and the spec is where the team decides which failure mode is acceptable. Do not leave this to whoever writes the migration file. Specify the behavior in the spec and get it reviewed.
Migration ordering and dependencies
When Table B has a foreign key referencing Table A, Table A must exist before Table B is created. This is obvious in a two-table example. It stops being obvious when you have a feature that touches five tables, three of which reference each other.
The schema spec should include a dependency section that lists which migrations must run first. This is not about the migration framework handling ordering automatically. Most frameworks do handle it. The point is that the dependency chain is a design decision that reviewers need to see.
Consider a feature that adds a subscriptions table, a subscription_items table, and a billing_events table. The dependency graph looks like this:
subscriptionsreferencesusers(must exist)subscription_itemsreferencessubscriptionsandproducts(both must exist)billing_eventsreferencessubscriptions(must exist)
If products does not exist yet and is being created in the same feature branch, the migration ordering becomes: users (already exists), products, subscriptions, subscription_items, billing_events. The spec makes this dependency chain visible. Without the spec, the engineer discovers the ordering problem when the migration fails in staging, fixes it with trial and error, and nobody learns anything from the experience.
A second ordering concern is rollback safety. If migration 003 depends on migration 002, rolling back 002 while 003 is still applied will break the foreign key constraint. The spec should note which migrations can be rolled back independently and which require rolling back in reverse order. This matters most during incident response, when the team is moving fast and does not have time to trace dependency chains manually.
Backward compatibility during migrations
In any system that cannot tolerate downtime, the migration must be backward-compatible with the currently deployed application code. This means you cannot rename a column, change a column type, or drop a column in a single migration. You need the expand-and-contract pattern.
Expand. Add the new column alongside the old one. Deploy application code that writes to both columns. Backfill existing rows. This is migration 1.
Contract. Once all rows have the new column populated and all application code reads from the new column, drop the old column. This is migration 2, deployed days or weeks after migration 1.
The schema spec should document both phases explicitly. If you are renaming placed_at to ordered_at, the spec should show:
- Phase 1: Add
ordered_at TIMESTAMPTZ, default toplaced_atvalue, deploy dual-write code - Phase 2: Drop
placed_at, remove dual-write code
Documenting both phases in the spec prevents a common failure mode: the engineer writes migration 1, ships it, and then the team forgets about migration 2. The old column stays in the schema forever, confusing every engineer who looks at the table for the next three years. The spec acts as a reminder that the work is not done until both phases are complete.
For teams practicing API versioning, the expand-and-contract pattern has a direct analog: you do not remove a field from an API response until all consumers have migrated to the new field. Schema changes and API changes often need to be coordinated, and the spec is the right place to document that coordination.
Edge cases: nullability, defaults, and data backfills
Adding a NOT NULL column to a table with existing rows is one of the most common sources of migration failures. The database will reject the migration unless every existing row has a value for the new column. You have three options, and the spec should state which one you are using and why.
Option 1: Add with a default. ALTER TABLE orders ADD COLUMN currency CHAR(3) NOT NULL DEFAULT 'USD'. This works if the default is correct for all existing rows. If it is not correct, you have just written bad data into production. The spec should justify the default: "All existing orders are from US customers. Default 'USD' is accurate for the current dataset."
Option 2: Add as nullable, backfill, then set NOT NULL. This is the safe path when the correct value varies per row. Add the column as nullable, run a backfill script that populates the correct value for each row, then alter the column to NOT NULL. The spec should describe the backfill logic: "Backfill currency from users.default_currency. For users without a default, set 'USD'." This is three separate operations and the spec should list all three.
Option 3: Add as nullable and leave it. Sometimes a column is genuinely optional. canceled_at is null for orders that have not been canceled. That is a valid state, not missing data. The spec should explain why null is acceptable: "NULL means the order has not been canceled. Only populated when status transitions to canceled."
Backfill performance is another edge case the spec should address. Backfilling 50 million rows in a single transaction will lock the table and bring the application to a halt. The spec should state the backfill strategy: batch size, whether it runs during off-peak hours, and whether it uses a background job or a one-time script. These decisions matter for QA testing too, because the test environment needs to simulate the backfill on a representative dataset. Your edge case checklist should include a row for "migration on populated table."
Schema specs and API contract alignment
A schema change almost always ripples to the API. If you add a currency column to the orders table, the GET /orders/:id response needs to include it. If you rename placed_at to ordered_at, consumers of that endpoint will break unless you maintain backward compatibility. Schema changes and API changes are two sides of the same coin, and specifying them separately leads to misalignment.
The fix is to spec them together. When you write the schema spec for a new table or a column change, include a section that maps schema columns to API response fields. This does not need to be elaborate. A simple mapping is enough:
orders.total_cents->GET /orders/:idresponse fieldtotal(converted to dollars, two decimal places)orders.currency->GET /orders/:idresponse fieldcurrency(ISO 4217 code)orders.status->GET /orders/:idresponse fieldstatus(mapped from integer to string: 0->"pending", 1->"confirmed")
This mapping catches mismatches early. If the database stores total_cents as a bigint but the API returns it as total in dollars, there is a conversion step. Where does that conversion happen? What about rounding? These questions surface during spec review, not during debugging a billing discrepancy three months after launch.
For teams that maintain versioned API contracts, the schema spec should note which API version introduces the new field. If currency is added in API v3, consumers on v2 should not see it. The spec makes this explicit. Without it, an engineer adds currency to the database, another engineer adds it to the API serializer, and nobody checks whether it should be gated behind a version flag.
The same principle applies to error handling. If a column has a check constraint, the API needs to return a meaningful error when the constraint is violated. CHECK (total_cents >= 0) should produce a 422 Unprocessable Entity with a message like "total must be non-negative," not a raw database error. The schema spec is where you connect the database constraint to the API error contract, so the engineer implementing the endpoint knows what to handle.
Before/after: a safer migration note
Database specs often fail because the migration plan is written as one step. The safer version names the compatibility window and the proof that each stage is complete.
Before: - Add status to subscriptions and backfill. After: - Add nullable status. - Deploy readers that fall back to billing_state when status is null. - Backfill 5,000 rows per batch; pause if lock wait exceeds 2s. - After 24h of zero null reads, add NOT NULL constraint. - Rollback before constraint: disable writer and drop column. - Rollback after constraint: stop writer, keep column, restore fallback.
The second version admits that rollback changes after the compatibility window closes. That is exactly the kind of decision a database spec should surface.
Topic Path
This article belongs to the API Contracts track. Start with the hub, then use the checklist, template, or tool below on a real project.
Keep reading
Fill a form, get a complete feature spec in Markdown — free, no signup.
Editorial note
- Author details: Daniel Marsh
- Editorial policy: How we review and update articles
- Corrections: Contact the editor