ALL-1875 PR-1: DuplicateCandidate table + detection service (backend)
Status: completed
Agent: will-engineer
Priority: 3
Branch: wilbo/all-1875-duplicate-candidate-model
PR: #12365
Linear: ALL-1875
## ALL-1875: Contact merge UI + auto duplicate detection — PR 1 of 5
### Goal
Create the database persistence layer and detection service for duplicate contact candidates.
### Context
- Linear issue: ALL-1875
- Branch: `wilbo/all-1875-duplicate-candidate-model`
- Worktree: `mono-all-1875-duplicate-candidate-model`
- Codebase: `/home/agent/agents/wilbo/mono` (identity domain)
### What to build
1. **Prisma migration** in `domains/identity/prisma/migrations/` — new `DuplicateCandidate` table:
- `id` TEXT PK (nanoid)
- `workspaceId` TEXT NOT NULL (workspace-scoped)
- `contactAId` TEXT NOT NULL → FK `CanonicalContact(id)`
- `contactBId` TEXT NOT NULL → FK `CanonicalContact(id)`
- `confidenceScore` FLOAT NOT NULL (0.0–1.0)
- `matchSignals` JSONB NOT NULL (array of signal names: `"EMAIL_MATCH"`, `"PHONE_MATCH"`, `"NAME_ADDRESS_FUZZY"`, `"ACCOUNT_NUMBER_MATCH"`)
- `detectedAt` TIMESTAMP NOT NULL DEFAULT now()
- `dismissedAt` TIMESTAMP nullable
- `dismissedBy` TEXT nullable
- unique constraint on `(contactAId, contactBId)`, plus a CHECK constraint enforcing `contactAId < contactBId` so each unordered pair of contacts is stored exactly once
- composite index on `(workspaceId, dismissedAt)`
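Because of the `contactAId < contactBId` CHECK, every writer has to put the two IDs in a fixed order before inserting or upserting, otherwise the insert is rejected or the same pair could be looked up under the wrong key. A minimal sketch of that ordering step, where `canonicalPair` and `ContactPair` are hypothetical names rather than existing helpers in the identity domain. It assumes the CHECK compares IDs byte-wise (for example, written with `COLLATE "C"`). JavaScript's `<` compares UTF-16 code units, while PostgreSQL's `<` on TEXT follows the column collation. Under a locale such as `en_US.UTF-8`, the two can order mixed-case nanoids differently.

```typescript
// Hypothetical shape of the two FK columns on DuplicateCandidate.
interface ContactPair {
  contactAId: string;
  contactBId: string;
}

// Order two contact IDs so that contactAId < contactBId, matching the CHECK
// constraint and the unique (contactAId, contactBId) key.
// Assumes the database compares IDs byte-wise (e.g. COLLATE "C"), which for
// ASCII nanoids agrees with JavaScript's code-unit string comparison.
function canonicalPair(idX: string, idY: string): ContactPair {
  if (idX === idY) {
    // A contact cannot duplicate itself; the strict CHECK would reject it anyway.
    throw new Error(`Cannot pair contact ${idX} with itself`);
  }
  return idX < idY
    ? { contactAId: idX, contactBId: idY }
    : { contactAId: idY, contactBId: idX };
}
```

Doing the ordering in one helper means the detection service and any later manual-merge path cannot disagree about which ID goes in which column.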
2. **Detection service** at `domains/identity/src/services/canonicalContact/detectDuplicates.ts`:
- Match on indexed fields only: normalized email (`CanonicalContactEmail`), normalized phone (`CanonicalContactPhone`), and account number (`ThirdParty.accountNumber`)
- Run fuzzy name+address matching only for pairs of contacts already linked to the same site, to avoid an expensive cross-product over every contact in the workspace
- Write candidates to `DuplicateCandidate` via upsert; on re-detection, update `confidenceScore` and `matchSignals` on the existing row
- Score: EMAIL_MATCH=0.9, PHONE_MATCH=0.8, ACCOUNT_NUMBER_MATCH=0.85, NAME_ADDRESS_FUZZY=0.6; `confidenceScore` is the maximum weight among the signals that fire (e.g. EMAIL_MATCH + PHONE_MATCH gives 0.9)
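The exact-match and scoring rules for the detection service can be sketched as a pure, in-memory function. It buckets contacts by each normalized key, emits every pair that shares a bucket, and scores each pair as the maximum weight among its fired signals. This is an illustration under assumptions, not the service itself. `ContactKeys`, `detectExactDuplicates`, and the three normalizers are hypothetical stand-ins; the real normalization rules live in `normalization.ts`. The real service would query `CanonicalContactEmail`, `CanonicalContactPhone`, and `ThirdParty.accountNumber` through Prisma. The same-site `NAME_ADDRESS_FUZZY` pass is omitted.

```typescript
type MatchSignal =
  | "EMAIL_MATCH"
  | "PHONE_MATCH"
  | "NAME_ADDRESS_FUZZY"
  | "ACCOUNT_NUMBER_MATCH";

// Weights from the spec; a pair's confidenceScore is the max over its fired signals.
const SIGNAL_WEIGHTS: Record<MatchSignal, number> = {
  EMAIL_MATCH: 0.9,
  PHONE_MATCH: 0.8,
  ACCOUNT_NUMBER_MATCH: 0.85,
  NAME_ADDRESS_FUZZY: 0.6,
};

// Hypothetical input shape: the raw match keys already loaded for each contact.
interface ContactKeys {
  id: string;
  emails: string[];
  phones: string[];
  accountNumbers: string[];
}

interface Candidate {
  contactAId: string; // always sorts before contactBId
  contactBId: string;
  matchSignals: MatchSignal[];
  confidenceScore: number;
}

// Placeholder normalizers (assumptions); normalization.ts holds the real rules.
const normalizeEmail = (email: string): string => email.trim().toLowerCase();
const normalizePhone = (phone: string): string => phone.replace(/\D/g, "");
const normalizeAccount = (account: string): string => account.trim();

function scoreSignals(signals: Iterable<MatchSignal>): number {
  let score = 0;
  for (const signal of signals) score = Math.max(score, SIGNAL_WEIGHTS[signal]);
  return score;
}

function detectExactDuplicates(contacts: ContactKeys[]): Candidate[] {
  // One entry per ordered pair, accumulating every signal that fired for it.
  const pairs = new Map<string, { a: string; b: string; signals: Set<MatchSignal> }>();

  const matchOn = (signal: MatchSignal, keysOf: (c: ContactKeys) => string[]): void => {
    // Bucket contact IDs by normalized key (the in-memory analogue of an index lookup).
    const buckets = new Map<string, Set<string>>();
    for (const contact of contacts) {
      for (const key of keysOf(contact)) {
        if (key === "") continue; // ignore values that normalize to nothing
        const ids = buckets.get(key) ?? new Set<string>();
        ids.add(contact.id);
        buckets.set(key, ids);
      }
    }
    // Every pair of contacts sharing a bucket gets this signal.
    for (const ids of buckets.values()) {
      const sorted = Array.from(ids).sort(); // code-unit order, same as `<`
      for (let i = 0; i < sorted.length; i++) {
        for (let j = i + 1; j < sorted.length; j++) {
          const a = sorted[i];
          const b = sorted[j];
          const pairKey = JSON.stringify([a, b]);
          const entry = pairs.get(pairKey) ?? { a, b, signals: new Set<MatchSignal>() };
          entry.signals.add(signal);
          pairs.set(pairKey, entry);
        }
      }
    }
  };

  matchOn("EMAIL_MATCH", (c) => c.emails.map(normalizeEmail));
  matchOn("PHONE_MATCH", (c) => c.phones.map(normalizePhone));
  matchOn("ACCOUNT_NUMBER_MATCH", (c) => c.accountNumbers.map(normalizeAccount));

  return Array.from(pairs.values(), ({ a, b, signals }) => ({
    contactAId: a,
    contactBId: b,
    matchSignals: Array.from(signals),
    confidenceScore: scoreSignals(signals),
  }));
}
```

Pairs are generated only inside shared buckets, so the cost tracks bucket sizes rather than the full workspace cross-product. A very common value (say, a shared office phone) still yields a quadratic bucket, so a bucket-size cap may be worth considering.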
3. **Repository** at `domains/identity/src/repositories/duplicateCandidate/index.ts`:
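- `findByWorkspace(workspaceId, { includeDismissed?: boolean })` → paginated list of the workspace's candidates, excluding dismissed ones unless `includeDismissed` is true
- `dismiss(id, dismissedBy)` → sets `dismissedAt` to the current time and records `dismissedBy`
- `upsertCandidate(data)` → insert-or-update used by the detection service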
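One possible shape for the repository contract is shown below, with an in-memory implementation that could back unit tests of the detection service. Everything here is an assumption. The spec only says "paginated", so the `offset`/`limit` options, the `Page` type, and the confidence-descending sort are illustrative. `InMemoryDuplicateCandidateRepository` is a hypothetical name. A Prisma-backed version would more likely map `upsertCandidate` onto `prisma.duplicateCandidate.upsert` keyed by the `(contactAId, contactBId)` unique constraint, and map pagination onto `findMany` with `skip`/`take` or a cursor.

```typescript
type MatchSignal =
  | "EMAIL_MATCH"
  | "PHONE_MATCH"
  | "NAME_ADDRESS_FUZZY"
  | "ACCOUNT_NUMBER_MATCH";

// Mirrors the columns of the DuplicateCandidate table.
interface DuplicateCandidate {
  id: string;
  workspaceId: string;
  contactAId: string;
  contactBId: string;
  confidenceScore: number;
  matchSignals: MatchSignal[];
  detectedAt: Date;
  dismissedAt: Date | null;
  dismissedBy: string | null;
}

type UpsertCandidateInput = Pick<
  DuplicateCandidate,
  "workspaceId" | "contactAId" | "contactBId" | "confidenceScore" | "matchSignals"
>;

// Hypothetical pagination options and result shape.
interface FindOptions {
  includeDismissed?: boolean;
  offset?: number;
  limit?: number;
}

interface Page<T> {
  items: T[];
  nextOffset: number | null; // null when there are no more rows
}

interface DuplicateCandidateRepository {
  findByWorkspace(workspaceId: string, opts?: FindOptions): Promise<Page<DuplicateCandidate>>;
  dismiss(id: string, dismissedBy: string): Promise<DuplicateCandidate>;
  upsertCandidate(data: UpsertCandidateInput): Promise<DuplicateCandidate>;
}

// In-memory fake for tests; rows are keyed by the (contactAId, contactBId) pair.
class InMemoryDuplicateCandidateRepository implements DuplicateCandidateRepository {
  private readonly rows = new Map<string, DuplicateCandidate>();
  private nextId = 1;

  async findByWorkspace(
    workspaceId: string,
    { includeDismissed = false, offset = 0, limit = 50 }: FindOptions = {},
  ): Promise<Page<DuplicateCandidate>> {
    const matching = Array.from(this.rows.values())
      .filter((r) => r.workspaceId === workspaceId && (includeDismissed || r.dismissedAt === null))
      .sort((x, y) => y.confidenceScore - x.confidenceScore);
    const end = offset + limit;
    return { items: matching.slice(offset, end), nextOffset: end < matching.length ? end : null };
  }

  async dismiss(id: string, dismissedBy: string): Promise<DuplicateCandidate> {
    for (const row of this.rows.values()) {
      if (row.id === id) {
        row.dismissedAt = new Date();
        row.dismissedBy = dismissedBy;
        return row;
      }
    }
    throw new Error(`DuplicateCandidate ${id} not found`);
  }

  async upsertCandidate(data: UpsertCandidateInput): Promise<DuplicateCandidate> {
    // Same rule as the CHECK constraint.
    if (!(data.contactAId < data.contactBId)) {
      throw new Error("contactAId must sort before contactBId");
    }
    const key = JSON.stringify([data.contactAId, data.contactBId]);
    const existing = this.rows.get(key);
    if (existing) {
      // Re-detection refreshes score and signals only; dismissal state is kept.
      existing.confidenceScore = data.confidenceScore;
      existing.matchSignals = data.matchSignals;
      return existing;
    }
    const row: DuplicateCandidate = {
      ...data,
      id: `dc_${this.nextId++}`,
      detectedAt: new Date(),
      dismissedAt: null,
      dismissedBy: null,
    };
    this.rows.set(key, row);
    return row;
  }
}
```

Leaving `dismissedAt`/`dismissedBy` untouched on upsert keeps a dismissed pair dismissed when the detector finds it again. The spec does not say which behavior is wanted, so this is a choice to confirm.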
### Key existing files
- `domains/identity/src/services/canonicalContact/normalization.ts` — email/phone normalization patterns
- `domains/identity/src/commands/backfill/deduplicate-third-parties.ts` — existing dedup logic to reference
- `domains/identity/prisma/migrations/20260420000000_add_account_number_to_third_party/` — accountNumber exists on ThirdParty
- `domains/identity/src/jobs/index.ts` — BullMQ job registration pattern
### Acceptance test
- Migration runs cleanly with `yarn prisma migrate dev` in the identity domain
- Unit test: `detectDuplicates()` groups two contacts with the same normalized email into one candidate with the `EMAIL_MATCH` signal and a score ≥ 0.9
- No regression on existing CanonicalContact tests
### Depends on
- Nothing (this is PR-1)
### Out of scope
- Scheduling the job (PR-2)
- GraphQL exposure (PR-3)
- Any frontend work
Event Timeline
- created
- status_change: queued → in_progress
- failed: lease expired, re-queued for retry (in_progress → queued)
- status_change: queued → completed