Data Sync Spec Between Services

Spec Coding Editorial Team · Spec-first engineering notes

Every cross-service sync spec I've seen fail had the same shape: it documented the happy path and called itself done. The failures come from the parts the author didn't think to write down — ordering, late arrivals, partial failures, and what happens on day two when you need to backfill.

Published on 2026-03-01 · 6 min read · Author: Spec Coding Editorial Team · Review policy: Editorial Policy

Review Note

Reviewed May 3, 2026. This article is maintained as a focused companion to the API Contracts Hub. It has been expanded with review drills, acceptance criteria, and operator evidence for teams designing cross-service sync.

The first decision: push, pull, or log

The spec has to pick a sync model before anything else, because almost every other answer depends on it. I force authors to commit to one of three:

- Event push: the source emits change events and consumers subscribe.
- Target pull: the target polls a source endpoint on a schedule and diffs what it sees.
- Shared log: both sides agree on a change-data-capture (CDC) stream or append-only log as the transport.

Mixing these accidentally is where grief starts. "We emit events AND expose a GET endpoint AND have a CDC stream" often means three partial implementations that agree on nothing. Pick one primary path in the spec and label any secondary path as fallback-only.

Ordering: the part that everyone skips

"Events arrive in order" is a promise almost nobody actually keeps. The spec needs to be specific about the ordering guarantee:

- Global ordering across all entities: rare, expensive, and almost never what you actually need.
- Per-entity ordering: events for the same key arrive in order; events for different keys may interleave.
- No ordering guarantee: every event carries a monotonic version or sequence number so the target can detect staleness itself.

Whatever the spec picks, it must also say what the target does when it receives an event with a version older than what it has. My default answer: log and drop. The spec should make that explicit so reviewers don't assume it "overwrites with the latest" (a bug factory).
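That log-and-drop rule is small enough to sketch in code. A minimal version in Python, assuming each event carries a monotonically increasing per-entity version; `apply_event` and `current_versions` are hypothetical names, not part of any real library:

```python
import logging

logger = logging.getLogger("sync")

# In-memory stand-in for the target's stored version per entity.
current_versions: dict[str, int] = {}

def apply_event(entity_id: str, version: int, payload: dict) -> bool:
    """Apply an event only if it is newer than what the target has.

    Returns True if applied, False if dropped as stale.
    """
    known = current_versions.get(entity_id, -1)
    if version <= known:
        # Stale or duplicate: log and drop. Never overwrite with older data.
        logger.info("dropped stale event entity=%s version=%d known=%d",
                    entity_id, version, known)
        return False
    current_versions[entity_id] = version
    # ... write payload to the target store here ...
    return True
```

The explicit `return False` path is the part reviewers should look for: silence here is how "overwrites with the latest" bugs slip in.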

Event payload: thin vs. fat

Two valid patterns. The spec needs to pick one and say why:

- Thin events: the event carries the entity ID, change type, and version; the consumer calls back to the source for the current state.
- Fat events: the event carries the full payload the target needs, so no callback to the source is required.

I lean toward fat events for anything with strict latency SLAs or where the source is a legacy system that falls over. I lean toward thin events when PII or data minimization matters. Whichever you pick, write the rationale in the spec — the next person to change this will appreciate knowing why.
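The two shapes are easier to compare side by side. Illustrative payloads only; the field names are assumptions, not a schema from any real system:

```python
# Thin event: identifier plus change type. The consumer must fetch
# current state from the source (e.g. GET /customers/cust_123).
thin_event = {
    "type": "customer.updated",
    "entity_id": "cust_123",
    "version": 42,
}

# Fat event: everything the target needs, no callback to the source.
fat_event = {
    "type": "customer.updated",
    "entity_id": "cust_123",
    "version": 42,
    "payload": {"email": "alice@example.com", "status": "active"},
}
```

Note the fat event still carries the version: payload size changes, but the staleness check from the ordering section does not.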

Conflict resolution when both sides can write

If the target is read-only, skip this section. If both sides can write the same field, the spec has to answer: who wins?

- Per-field ownership: exactly one system writes each field; everyone else treats it as read-only.
- Last-writer-wins: the newer source timestamp or version wins, with an explicit tie-break rule.
- Escalation: conflicting writes are parked in a queue for rule-based or manual resolution.

The test for the section: ask the reviewer to describe what happens when Alice updates the customer's email in Service A at 10:00:00 and Bob updates it in Service B at 10:00:01 and the events cross in flight. If you can't answer from the spec, it isn't done.
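One way to make the Alice-and-Bob question answerable from the spec is to write the resolution rule down as code. A sketch assuming last-writer-wins on the source timestamp with the service name as tie-break; what matters is that the rule is deterministic, so both services converge no matter which event lands first:

```python
from datetime import datetime, timezone

def resolve(a: dict, b: dict) -> dict:
    """Last-writer-wins on the source timestamp; tie-break on service
    name so both sides pick the same winner regardless of arrival order."""
    def key(event):
        return (event["updated_at"], event["service"])
    return max(a, b, key=key)

alice = {"service": "service-a",
         "updated_at": datetime(2026, 3, 1, 10, 0, 0, tzinfo=timezone.utc),
         "email": "alice-new@example.com"}
bob = {"service": "service-b",
       "updated_at": datetime(2026, 3, 1, 10, 0, 1, tzinfo=timezone.utc),
       "email": "bob-new@example.com"}
```

Bob's 10:00:01 write wins in both directions; `resolve(alice, bob)` and `resolve(bob, alice)` agree, which is exactly the property crossing events require.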

The backfill plan is part of the contract

Day one goes live. Day seven, someone notices that 50,000 records from before the sync started are missing in the target. The spec should have already answered: how do we catch up?

- A backfill entrypoint that can replay a bounded time range or ID range.
- Idempotent writes, so the backfill can safely overlap live sync traffic.
- A named replay owner and a concrete definition of "caught up."
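The catch-up loop itself is worth sketching, because it shows why idempotent writes matter. A minimal paginated backfill; `source_page` and `upsert` are hypothetical callables standing in for the real source reader and target writer:

```python
def backfill(source_page, upsert, page_size: int = 500) -> int:
    """Replay historical records page by page.

    upsert must be idempotent: applying the same row twice is a no-op,
    so the backfill can safely overlap live sync traffic.
    Returns the number of rows copied.
    """
    cursor = None
    copied = 0
    while True:
        rows, cursor = source_page(cursor, page_size)
        for row in rows:
            upsert(row)  # idempotent write into the target
            copied += 1
        if cursor is None:
            return copied
```

If pages overlap (the same row appears twice across page boundaries), the idempotent upsert makes that harmless instead of a duplicate-row incident.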

Reconciliation: the unsexy critical section

Every long-running sync drifts. Always. The spec must define a reconciliation job:

- How often it runs and over what window it compares.
- How it compares source and target: row counts, checksums, or field-level diffs.
- The mismatch threshold that pages someone, and whether diffs are auto-repaired or only surfaced as alerts.
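A reconciliation pass over two snapshots fits in a few lines. Assumptions: both sides can be read as key-to-row maps, and the 0.01% threshold matches the acceptance criteria; `reconcile` is an illustrative name, not a real job:

```python
def reconcile(source_rows: dict, target_rows: dict,
              max_diff_ratio: float = 0.0001):
    """Compare source and target snapshots.

    Returns (diffs, healthy): diffs lists keys that are missing on either
    side or mismatched; healthy is False when the diff ratio crosses the
    alert threshold (0.01% by default).
    """
    keys = source_rows.keys() | target_rows.keys()
    diffs = sorted(k for k in keys
                   if source_rows.get(k) != target_rows.get(k))
    healthy = len(diffs) <= max_diff_ratio * max(len(keys), 1)
    return diffs, healthy
```

The real job would page through both stores rather than load full snapshots, but the contract is the same: a deterministic diff list plus a pass/fail verdict an operator can alert on.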

Acceptance criteria that catch real failures

- Given a source emits events for entity X
  When the target is offline for 30 minutes
  Then on recovery the target catches up within 5 minutes
  And no events in that window are permanently lost

- Given two events for the same entity arrive out of order
  When the older event is processed after the newer one
  Then the target state reflects the newer event
  And the older event is logged as stale

- Given the sync has been running for 24 hours
  When the reconciliation job runs
  Then fewer than 0.01% of rows show a diff
  And all diffs are auto-repaired or surfaced as alerts
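Criteria like these translate directly into executable tests. A self-contained sketch of the out-of-order case; the `Target` class is a stand-in for the real consumer, not code from any actual service:

```python
class Target:
    """Minimal consumer: keeps only the newest version per entity
    and records stale events instead of applying them."""

    def __init__(self):
        self.state = {}      # entity_id -> (version, payload)
        self.stale_log = []  # (entity_id, version) of dropped events

    def handle(self, entity_id, version, payload):
        stored_version = self.state.get(entity_id, (-1, None))[0]
        if version <= stored_version:
            self.stale_log.append((entity_id, version))
            return
        self.state[entity_id] = (version, payload)
```

Driving it with the newer event first, then the late older one, checks both Then-clauses of the second criterion: the newer state survives and the stale event is logged.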

The signal I look for in review

The quality signal I use: does the spec describe what the operator sees on day 30 when something is off? If it only describes the happy path on day one, it isn't a sync spec — it's a handoff note that will turn into a 3am page.

Review drill

Review a sync spec by following one record from the source service to every consumer. The weak spots are usually ownership, retries, and what happens when two systems temporarily disagree.

Put the sync contract, replay procedure, and reconciliation owner in the spec. Without those, the first incident becomes the real design document.

Example: for an account email update, the spec should say whether CRM, billing, or identity owns the value, how duplicates are ignored, and which reconciliation job fixes a missed event.

Worked Review Example

For a customer status sync, write the whole lifecycle. The billing system emits status.changed with an idempotency key, CRM stores the latest sequence number, and support tools show a stale-data badge until reconciliation completes. Deletes need the same treatment: soft delete, tombstone event, retention window, and the job that removes orphaned records after consumers acknowledge the change.
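That lifecycle is concrete enough to sketch the consumer side. A toy CRM consumer that ignores duplicates by idempotency key and drops stale sequence numbers; class and method names are illustrative, not from any real CRM:

```python
class CrmConsumer:
    """Stores the latest sequence number per account, mirroring the
    billing -> CRM status sync described above."""

    def __init__(self):
        self.seen_keys = set()   # idempotency keys already processed
        self.latest_seq = {}     # account_id -> last applied sequence
        self.status = {}         # account_id -> current status

    def on_status_changed(self, account_id, seq, idem_key, new_status):
        if idem_key in self.seen_keys:
            return "duplicate-ignored"
        self.seen_keys.add(idem_key)
        if seq <= self.latest_seq.get(account_id, -1):
            return "stale-dropped"
        self.latest_seq[account_id] = seq
        self.status[account_id] = new_status
        return "applied"
```

The return values make the three outcomes observable, which is what the support tools' stale-data badge and the reconciliation job both need.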

Copy This Sync Contract Block

When a spec feels abstract, I ask the author to fill in this block. It forces the decisions that usually stay hidden until the first reconciliation failure.

Sync contract

Source of truth:
- System:
- Fields owned:
- Fields derived by consumers:

Delivery model:
- Primary path: event push / target pull / shared log
- Ordering guarantee:
- Duplicate handling:
- Stale event behavior:

Recovery:
- Backfill entrypoint:
- Replay owner:
- Reconciliation frequency:
- Mismatch threshold:
- Operator alert:

Consumer impact:
- What users see during delay:
- What support sees during drift:
- What pauses the rollout:

The block is short enough to fit in a pull request description, but it changes the review. Instead of debating architecture labels, reviewers can point at a blank line and ask for the missing behavior.

One practical review move: run the block against a deleted record, not only an updated record. Deletes expose the weakest assumptions in sync systems. If the source emits a tombstone, the target needs to know whether to hide the record, mark it archived, keep it for compliance, or purge it after a retention window. If the spec cannot answer that, the sync is not finished.
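The delete path can be sketched the same way. A toy soft-delete-plus-purge pair, assuming a 30-day retention window; the retention length and the store shape are assumptions for illustration, not requirements:

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # assumed retention window

def handle_tombstone(store: dict, entity_id: str, deleted_at: datetime):
    """Soft-delete on tombstone: hide the record but keep it for the
    retention window. A later purge job removes it for good."""
    record = store.get(entity_id)
    if record is None:
        return
    record["deleted_at"] = deleted_at
    record["visible"] = False

def purge_expired(store: dict, now: datetime) -> list:
    """Remove records whose retention window has elapsed."""
    expired = [k for k, r in store.items()
               if r.get("deleted_at") and now - r["deleted_at"] > RETENTION]
    for k in expired:
        del store[k]
    return expired
```

Running the review block against this pair answers the delete questions directly: the record is hidden immediately, kept for compliance, and purged only after the window closes.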
