Test Data Strategies and Fixtures

Your test suite uses a 500-line seed file copied from production three years ago. Half the tests depend on “John Doe” existing with email “john@test.com” and user ID 1. When someone adds a unique constraint on email, forty tests break. When the schema changes, the seed file requires a manual update that takes half a day. Test data is the foundation of your testing infrastructure, yet most teams treat it as an afterthought.

What You’ll Walk Away With

Factory pattern implementations that generate consistent, realistic test data on demand
Fixture management strategies for different test layers (unit, integration, E2E)
Privacy-compliant data generation that mimics production patterns without real PII
Database seeding workflows for integration and E2E tests
Prompts for generating domain-specific test data factories

The Factory Pattern

Factories replace hardcoded test data with configurable generators that produce fresh, unique data for every test.
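Here is a minimal, dependency-free sketch of the pattern. It rests on a few assumptions: the `User` shape, the `defineFactory` helper, and the trait names are hypothetical, and sequence-based placeholders stand in for the @faker-js/faker calls a real factory would make. A per-factory counter guarantees uniqueness. Traits are applied before explicit overrides, and `buildList()` simply calls `build()` repeatedly.

```typescript
// Hypothetical entity shape; a real project would derive it from the schema.
interface User {
  id: number;
  name: string;
  email: string;
  role: "user" | "admin";
  status: "active" | "suspended";
}

type BuildArg<T> = string | Partial<T>;

// defineFactory is a hypothetical helper, not a library API.
function defineFactory<T>(defaults: (seq: number) => T, traits: Record<string, Partial<T>> = {}) {
  let seq = 0; // shared by every build() call, so generated values never collide

  function build(...args: BuildArg<T>[]): T {
    seq += 1;
    const traitNames = args.filter((a): a is string => typeof a === "string");
    const overrides = args.filter((a): a is Partial<T> => typeof a !== "string");
    let result = defaults(seq);
    // Traits first, then explicit overrides, regardless of argument order.
    for (const name of traitNames) {
      const trait = traits[name];
      if (!trait) throw new Error(`Unknown trait: ${name}`);
      result = { ...result, ...trait };
    }
    for (const override of overrides) result = { ...result, ...override };
    return result;
  }

  function buildList(count: number, ...args: BuildArg<T>[]): T[] {
    return Array.from({ length: count }, () => build(...args));
  }

  return { build, buildList };
}

const UserFactory = defineFactory<User>(
  (seq) => ({
    id: seq,
    name: `Test User ${seq}`,
    email: `user${seq}@example.test`, // placeholder for a faker-generated email
    role: "user",
    status: "active",
  }),
  { admin: { role: "admin" }, suspended: { status: "suspended" } }
);
```

Usage: `UserFactory.build('admin', { email: 'specific@test.com' })` returns an admin with that exact email. Because traits are applied before overrides whatever the argument order, a test can always pin a field, even one that a trait sets.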

@src/lib/db/schema.ts
Generate test data factories for every entity in our database schema:

For each table, create a factory at /tests/factories/{entity}.factory.ts that:
1. Uses @faker-js/faker for realistic data generation
2. Has build() - returns a plain object (for unit tests, no DB)
3. Has create(db) - inserts into the database and returns the record
4. Has buildList(count) - generates an array of objects
5. Has createList(db, count) - inserts multiple records
6. Supports overrides: build({ email: 'specific@test.com' })
7. Supports traits: build('admin'), build('suspended')
8. Handles foreign keys: OrderFactory.create() auto-creates a User if needed
9. Generates UNIQUE values (no collisions even with 1000 records)

Create a factory index at /tests/factories/index.ts that exports all factories.
Follow our Drizzle ORM patterns for database insertions.

claude "Read our database schema at /src/lib/db/schema.ts and generate
test data factories for every entity.

Save to /tests/factories/ with one file per entity.
Each factory must:
- Use faker.js for data generation
- Support build() and create() patterns
- Handle foreign key relationships automatically
- Generate unique values per invocation
- Support overrides and traits

After creating factories, verify they work by:
1. Building 100 of each entity (no collisions)
2. Creating 10 of each in the test database (relationships resolve)
3. Running existing tests to verify compatibility"

Generate test data factories for all database entities:
1. Read the database schema
2. Create factory files with build/create patterns
3. Handle entity relationships and foreign keys
4. Verify factories produce valid data
5. Create a PR with the factory infrastructure
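Requirement 8 is usually the hardest part of the generated code to get right. Below is one way to structure it, sketched with a hypothetical `Db` interface and an in-memory implementation so the code runs on its own. `create()` builds a plain object, and the order factory creates a parent user only when the test did not pass a `userId`. With Drizzle ORM on PostgreSQL, `insert` would wrap something along the lines of `db.insert(table).values(row).returning()`.

```typescript
interface UserRow { id: number; email: string }
interface OrderRow { id: number; userId: number; totalCents: number }

// Hypothetical minimal database interface standing in for the ORM.
interface Db {
  insert<T extends { id: number }>(table: string, row: Omit<T, "id">): Promise<T>;
}

// In-memory implementation so the sketch runs without a database.
class InMemoryDb implements Db {
  tables = new Map<string, Array<{ id: number }>>();
  private nextId = 1;

  async insert<T extends { id: number }>(table: string, row: Omit<T, "id">): Promise<T> {
    const record = { ...row, id: this.nextId++ } as unknown as T;
    const rows = this.tables.get(table) ?? [];
    rows.push(record);
    this.tables.set(table, rows);
    return record;
  }
}

let userSeq = 0;

function buildUser(overrides: Partial<Omit<UserRow, "id">> = {}): Omit<UserRow, "id"> {
  userSeq += 1;
  return { email: `user${userSeq}@example.test`, ...overrides };
}

async function createUser(db: Db, overrides: Partial<Omit<UserRow, "id">> = {}): Promise<UserRow> {
  return db.insert<UserRow>("users", buildUser(overrides));
}

function buildOrder(overrides: Partial<Omit<OrderRow, "id">> = {}): Omit<OrderRow, "id"> {
  // userId 0 is a placeholder: build() output is for unit tests and never reaches the DB.
  return { userId: 0, totalCents: 1999, ...overrides };
}

async function createOrder(db: Db, overrides: Partial<Omit<OrderRow, "id">> = {}): Promise<OrderRow> {
  // Create the parent user only when the test did not supply a userId.
  const userId = overrides.userId ?? (await createUser(db)).id;
  return db.insert<OrderRow>("orders", buildOrder({ ...overrides, userId }));
}

const UserFactory = { build: buildUser, create: createUser };
const OrderFactory = { build: buildOrder, create: createOrder };
```

Keeping `build()` free of I/O means unit tests can use the same factories without a database, while `create()` layers persistence and relationship resolution on top.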

Copy-paste prompt for domain-specific factories:

Create test data factories for our e-commerce domain:

UserFactory:
- Traits: 'admin', 'premium', 'suspended', 'new' (created today)
- Generates realistic names, emails, and addresses
- hashedPassword defaults to bcrypt hash of "Test1234!"

ProductFactory:
- Traits: 'outOfStock' (stock=0), 'onSale' (discount > 0), 'digital' (no shipping)
- Price between $1 and $999 (realistic distribution, not uniformly random)
- SKU format: "PRD-XXXXX" (unique)
- Category from our actual category list

OrderFactory:
- Auto-creates a User and 1-5 Products when needed
- Traits: 'pending', 'shipped', 'delivered', 'cancelled', 'refunded'
- Total calculated from line items (not random)
- Timestamps are chronologically consistent (created < shipped < delivered)

PaymentFactory:
- Linked to an Order (auto-creates if needed)
- Traits: 'succeeded', 'failed', 'pending', 'refunded'
- Amount matches the associated order total

All factories should use faker.js and support overrides.
Save to /tests/factories/
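The trickier rules in that prompt are the derived ones. The dependency-free sketch below shows an order builder that computes the total from its line items and derives every timestamp as an offset from `createdAt`. Totals are kept in integer cents to avoid floating-point drift. The sketch also produces SKUs in the `PRD-XXXXX` format. Field names, the fixed prices, and which statuses carry shipping timestamps (here only `shipped` and `delivered`) are assumptions. A real factory would vary quantities and prices with faker.

```typescript
interface LineItem { sku: string; unitPriceCents: number; quantity: number }
type OrderStatus = "pending" | "shipped" | "delivered" | "cancelled" | "refunded";
interface Order {
  status: OrderStatus;
  items: LineItem[];
  totalCents: number;
  createdAt: Date;
  shippedAt: Date | null;
  deliveredAt: Date | null;
}

let skuSeq = 0;
// "PRD-" plus a zero-padded counter: unique within a run (five digits up to 99,999).
function nextSku(): string {
  skuSeq += 1;
  return `PRD-${String(skuSeq).padStart(5, "0")}`;
}

const DAY_MS = 24 * 60 * 60 * 1000;

function buildOrder(status: OrderStatus = "pending", items?: LineItem[]): Order {
  const lineItems = items ?? [
    { sku: nextSku(), unitPriceCents: 2500, quantity: 2 },
    { sku: nextSku(), unitPriceCents: 999, quantity: 1 },
  ];
  // The total is derived from the line items, never generated separately.
  const totalCents = lineItems.reduce((sum, li) => sum + li.unitPriceCents * li.quantity, 0);

  // Timestamps are offsets from createdAt, so created < shipped < delivered by construction.
  const createdAt = new Date(Date.now() - 10 * DAY_MS);
  const hasShipped = status === "shipped" || status === "delivered";
  const shippedAt = hasShipped ? new Date(createdAt.getTime() + 2 * DAY_MS) : null;
  const deliveredAt = status === "delivered" ? new Date(createdAt.getTime() + 5 * DAY_MS) : null;

  return { status, items: lineItems, totalCents, createdAt, shippedAt, deliveredAt };
}
```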

Fixture Strategies by Test Layer

Unit Tests: Build, Don’t Create

Unit tests should never touch the database. Use build() to create plain objects.

// Good: plain objects, no database
const user = UserFactory.build({ role: 'admin' });
const result = authService.checkPermission(user, 'delete_users');
expect(result).toBe(true);

Integration Tests: Create with Isolation

Integration tests need database records but must not leak state between tests.

Copy-paste prompt for integration test data setup:

Create a test data seeding utility for our integration tests:

Requirements:
1. A seed() function that creates a consistent baseline dataset:
   - 3 users (admin, regular, suspended)
   - 5 products (mix of categories, some out of stock)
   - 10 orders (various statuses across users)
   - Associated payments for completed orders

2. Runs inside a database transaction that rolls back after each test

3. Returns references to all created entities:
   const { users, products, orders, payments } = await seed(db);
   // users.admin, users.regular, users.suspended
   // products[0], products[1], etc.

4. Relationships are consistent:
   - Orders belong to specific users
   - Payments match order totals
   - Stock levels reflect order history

Save to /tests/helpers/seed.ts
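The heart of requirement 2 is a wrapper that begins a transaction, runs the test, and rolls back however the test ends. The sketch below targets a hypothetical `TxDb` interface, with a snapshot-based fake standing in for the database. Its `seed` creates only the three users from requirement 1. How a real ORM exposes manual rollback varies, so treat the interface as an assumption.

```typescript
type Row = Record<string, unknown>;
type Stored = Row & { id: number };

// Hypothetical transactional interface; not a real ORM API.
interface TxDb {
  begin(): Promise<void>;
  rollback(): Promise<void>;
  insert(table: string, row: Row): Promise<Stored>;
  count(table: string): number;
}

// Fake database: begin() snapshots every table, rollback() restores the snapshot.
class FakeDb implements TxDb {
  private data = new Map<string, Stored[]>();
  private snapshot: Map<string, Stored[]> | null = null;
  private nextId = 1;

  async begin(): Promise<void> {
    const copy = new Map<string, Stored[]>();
    this.data.forEach((rows, table) => copy.set(table, rows.slice()));
    this.snapshot = copy;
  }

  async rollback(): Promise<void> {
    if (this.snapshot) this.data = this.snapshot;
    this.snapshot = null;
  }

  async insert(table: string, row: Row): Promise<Stored> {
    // ids are not reset on rollback, much like PostgreSQL sequences.
    const record: Stored = { ...row, id: this.nextId++ };
    this.data.set(table, [...(this.data.get(table) ?? []), record]);
    return record;
  }

  count(table: string): number {
    return (this.data.get(table) ?? []).length;
  }
}

// Runs fn inside a transaction and always rolls back, on success or failure.
// This fake passes the same handle through; a real driver would pass a transaction-scoped one.
async function withRollback<T>(db: TxDb, fn: (tx: TxDb) => Promise<T>): Promise<T> {
  await db.begin();
  try {
    return await fn(db);
  } finally {
    await db.rollback();
  }
}

// Partial, hypothetical seed(): only the baseline users, returned by role.
async function seed(db: TxDb) {
  const admin = await db.insert("users", { role: "admin" });
  const regular = await db.insert("users", { role: "user" });
  const suspended = await db.insert("users", { role: "user", status: "suspended" });
  return { users: { admin, regular, suspended } };
}
```

Because the rollback sits in `finally`, a failing assertion cannot leave rows behind for the next test.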

E2E Tests: Stable Reference Data

E2E tests need predictable data that persists across the test run.

Copy-paste prompt for E2E test data:

Create an E2E test data management system:

1. A global setup script that:
   - Resets the test database to a clean state
   - Seeds reference data (categories, settings, feature flags)
   - Creates test users with known credentials:
     - admin@test.com / TestAdmin123! (admin role)
     - user@test.com / TestUser123! (regular user)
     - premium@test.com / TestPremium123! (premium subscriber)

2. A per-test data helper that:
   - Creates test-specific data (e.g., orders for "test order cancellation" flow)
   - Tags data with the test name for cleanup
   - Cleans up after the test completes

3. A cleanup script that removes all test-tagged data without touching reference data

Save to /tests/e2e/data/
Include usage examples in each E2E test.
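Requirements 2 and 3 rest on one idea: every row a test creates carries a tag, and cleanup deletes by tag, so reference data from global setup is never touched. The sketch below uses an in-memory store standing in for the test database. The `testTag` column, the `e2e:` prefix, and the helper names are hypothetical.

```typescript
interface TaggedRow {
  id: number;
  table: string;
  testTag: string | null; // null marks reference data seeded by global setup
  data: Record<string, unknown>;
}

class TestDataStore {
  rows: TaggedRow[] = [];
  private nextId = 1;

  insert(table: string, data: Record<string, unknown>, testTag: string | null): TaggedRow {
    const row = { id: this.nextId++, table, testTag, data };
    this.rows.push(row);
    return row;
  }

  // Per-test cleanup: removes only rows created by that test.
  cleanupTest(testTag: string): number {
    const before = this.rows.length;
    this.rows = this.rows.filter((r) => r.testTag !== testTag);
    return before - this.rows.length;
  }

  // Cleanup-script variant: removes every test-tagged row, keeps reference data.
  cleanupAllTestData(): number {
    const before = this.rows.length;
    this.rows = this.rows.filter((r) => r.testTag === null);
    return before - this.rows.length;
  }
}

// Per-test helper: binds the tag once so a test cannot forget to apply it.
function dataForTest(store: TestDataStore, testName: string) {
  const tag = `e2e:${testName}`;
  return {
    create: (table: string, data: Record<string, unknown>) => store.insert(table, data, tag),
    cleanup: () => store.cleanupTest(tag),
  };
}
```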

Privacy-Compliant Data Generation

Production-Like Without Real PII

Copy-paste prompt for production-mirror data generation:

Create a data generation script that produces a production-realistic dataset
WITHOUT using any real customer data:

Requirements:
1. User distribution matches production:
   - 70% free tier, 20% premium, 10% enterprise
   - 60% US, 15% EU, 10% APAC, 15% other
   - Age distribution: normal with mean 32 and standard deviation 10
   - 30% have profile pictures, 70% do not

2. Order patterns match production:
   - Average 2.3 orders per user
   - Order values: log-normal distribution, mean $47, median $32
   - 15% of orders have been refunded
   - Seasonal pattern: 40% more orders in Q4

3. Generate 10,000 users with associated data
4. All PII is synthetic (faker.js) - zero real data
5. Save as a SQL dump for seeding staging environments

Output the generation script to /scripts/generate-staging-data.ts
Include a README explaining the data distribution assumptions.
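The distribution requirements map onto a few standard samplers. A log-normal is parameterised by the mean μ and standard deviation σ of its logarithm, with median = e^μ and mean = e^(μ + σ²/2). The stated median of $32 and mean of $47 therefore give μ = ln 32 ≈ 3.466 and σ = √(2 ln(47/32)) ≈ 0.877. The sketch below makes three assumptions the prompt does not specify: a seeded mulberry32 generator (so a staging dataset can be regenerated exactly), the Box-Muller transform for normals, and a Poisson model for orders per user. Function names are hypothetical, and the PII fields would still come from faker.

```typescript
// mulberry32: small seeded PRNG so the same seed reproduces the same dataset.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // in [0, 1)
  };
}

// Categorical sampling: weights need not sum to 1 (70/20/10 works as-is).
function weightedPick<K extends string>(rand: () => number, weights: Record<K, number>): K {
  const entries = Object.entries(weights) as Array<[K, number]>;
  const total = entries.reduce((sum, [, w]) => sum + w, 0);
  let r = rand() * total;
  for (const [key, w] of entries) {
    r -= w;
    if (r < 0) return key;
  }
  return entries[entries.length - 1][0]; // guards against floating-point leftovers
}

// Box-Muller transform: two uniforms -> one normal sample with the given mean and sd.
function normal(rand: () => number, mean: number, sd: number): number {
  const u1 = 1 - rand(); // (0, 1], so Math.log never sees 0
  const u2 = rand();
  return mean + sd * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Log-normal with the given mean and median: mu = ln(median), sigma = sqrt(2 ln(mean / median)).
function logNormal(rand: () => number, mean: number, median: number): number {
  const mu = Math.log(median);
  const sigma = Math.sqrt(2 * Math.log(mean / median)); // requires mean > median
  return Math.exp(normal(rand, mu, sigma));
}

// Knuth's Poisson sampler; modelling orders per user as Poisson is an assumption.
function poisson(rand: () => number, lambda: number): number {
  const limit = Math.exp(-lambda);
  let k = 0;
  let p = rand();
  while (p > limit) {
    k += 1;
    p *= rand();
  }
  return k;
}

// Hypothetical per-user sampler combining the distributions from the prompt.
function sampleUser(rand: () => number) {
  return {
    tier: weightedPick(rand, { free: 70, premium: 20, enterprise: 10 }),
    region: weightedPick(rand, { US: 60, EU: 15, APAC: 10, other: 15 }),
    age: Math.round(normal(rand, 32, 10)),
    hasProfilePicture: rand() < 0.3,
    orders: Array.from({ length: poisson(rand, 2.3) }, () => ({
      valueCents: Math.round(logNormal(rand, 47, 32) * 100),
      refunded: rand() < 0.15,
      // Q4 weighted 1.4x relative to each other quarter: one reading of "40% more".
      quarter: weightedPick(rand, { Q1: 1, Q2: 1, Q3: 1, Q4: 1.4 }),
    })),
  };
}
```

A real script would probably also clamp ages to a plausible range, since about 8% of draws from this normal fall below 18.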

When This Breaks

“Tests fail because factories generate data that violates business rules.” Your factories need domain knowledge. Add validation to the factory: if a product is “outOfStock”, stock must be 0. If an order is “delivered”, it must have a shipped_at date before delivered_at. Encode business rules in the factory, not just random data.
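Here is a sketch of encoding those rules. Each trait declares both the fields it sets and the invariant it promises, and the builder re-checks the invariants after overrides, so a contradictory combination fails at build time rather than deep inside a test. The shapes and names are hypothetical.

```typescript
interface Product { sku: string; stock: number; discountPct: number }
type ProductTrait = "outOfStock" | "onSale";

// Fields each trait sets...
const traitFields: Record<ProductTrait, Partial<Product>> = {
  outOfStock: { stock: 0 },
  onSale: { discountPct: 20 },
};

// ...and the business rule each trait promises, re-checked after overrides.
const traitRules: Record<ProductTrait, (p: Product) => boolean> = {
  outOfStock: (p) => p.stock === 0,
  onSale: (p) => p.discountPct > 0,
};

let productSeq = 0;

function buildProduct(traits: ProductTrait[] = [], overrides: Partial<Product> = {}): Product {
  productSeq += 1;
  let product: Product = { sku: `PRD-${String(productSeq).padStart(5, "0")}`, stock: 25, discountPct: 0 };
  for (const t of traits) product = { ...product, ...traitFields[t] };
  product = { ...product, ...overrides };
  for (const t of traits) {
    if (!traitRules[t](product)) throw new Error(`${product.sku} violates the "${t}" rule`);
  }
  return product;
}

interface ShippableOrder { status: string; createdAt: Date; shippedAt: Date | null; deliveredAt: Date | null }

// Chronology rules for orders: created < shipped < delivered.
function assertValidOrder(o: ShippableOrder): ShippableOrder {
  if (o.status === "delivered" && (!o.shippedAt || !o.deliveredAt)) {
    throw new Error("a delivered order needs shippedAt and deliveredAt");
  }
  if (o.shippedAt && o.shippedAt.getTime() <= o.createdAt.getTime()) {
    throw new Error("shippedAt must be after createdAt");
  }
  if (o.shippedAt && o.deliveredAt && o.deliveredAt.getTime() <= o.shippedAt.getTime()) {
    throw new Error("deliveredAt must be after shippedAt");
  }
  return o;
}
```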

“Creating test data is slow because of foreign key chains.” Batch insertions instead of creating one record at a time. Pre-create shared reference data (categories, roles) once in a beforeAll hook. Only create entity-specific data per test.
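Here is a sketch of the batching half of that advice. Build every row in memory first, then insert in fixed-size chunks, so 250 rows cost three round trips instead of 250. `insertMany` is a hypothetical stand-in; in Drizzle, passing an array to `.values()` inserts multiple rows in one statement. The default batch size of 100 is an assumption. Databases cap the number of bind parameters per statement, so very large batches need splitting regardless.

```typescript
// Splits items into consecutive slices of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("chunk size must be positive");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Hypothetical bulk-insert function: one call is one round trip.
type InsertMany<T> = (rows: T[]) => Promise<void>;

// Inserts rows sequentially in batches; returns the number of round trips made.
async function insertInBatches<T>(rows: T[], insertMany: InsertMany<T>, batchSize = 100): Promise<number> {
  let calls = 0;
  for (const batch of chunk(rows, batchSize)) {
    await insertMany(batch);
    calls += 1;
  }
  return calls;
}
```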

“The seed file keeps getting out of sync with schema changes.” Generate factories from the schema, not by hand. When the schema changes, regenerate: “Read the updated schema and update all factories to match. Add default values for new required columns.”
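Regeneration is easier to trigger when the compiler flags drift. If each factory's defaults are typed against the schema's insert type, a new required column becomes a type error in the factory instead of a runtime failure across many tests. Drizzle exposes that type as `typeof users.$inferInsert`; the hand-written `NewUser` below stands in for it so the sketch compiles on its own.

```typescript
// Hand-written stand-in for Drizzle's `typeof users.$inferInsert`:
// columns with database defaults are optional, required columns are not.
type NewUser = {
  email: string;
  name: string;
  role?: "user" | "admin";
  createdAt?: Date;
};

// The return annotation makes the compiler check completeness: if the schema
// gains a required column (say `tenantId: string`), this function stops
// compiling until the factory supplies a default for it.
function userDefaults(seq: number, overrides: Partial<NewUser> = {}): NewUser {
  return {
    email: `user${seq}@example.test`,
    name: `Test User ${seq}`,
    ...overrides,
  };
}
```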

“We need production data for debugging but cannot use it because of PII.” Use the privacy-compliant generation approach. Create synthetic data that matches production distributions. For specific bug reproduction, anonymize the production record manually: replace the email with a faker email, the name with a faker name, but keep the structural data that reproduces the bug.
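Here is a sketch of that manual anonymization step. PII fields get synthetic replacements, and the same original value always maps to the same replacement, so related records still join up. Structural fields pass through untouched. The record shape and the choice of PII fields are assumptions, and the placeholder values would come from faker in a real script. Nothing in a replacement is derived from the original value.

```typescript
interface CustomerRecord {
  id: number;
  email: string;
  name: string;
  phone: string | null;
  plan: string;
  orderIds: number[];
  balanceCents: number;
}

// Maps each original PII value to a stable synthetic one for the lifetime of the instance.
class Pseudonymizer {
  private seen = new Map<string, string>();
  private counter = 0;

  replace(field: string, original: string): string {
    const key = `${field}:${original}`;
    let fake = this.seen.get(key);
    if (fake === undefined) {
      this.counter += 1;
      // Placeholders; a real script would generate these with faker.
      fake = field === "email" ? `customer${this.counter}@example.test` : `${field}-${this.counter}`;
      this.seen.set(key, fake);
    }
    return fake;
  }
}

function anonymize(record: CustomerRecord, p: Pseudonymizer): CustomerRecord {
  return {
    ...record, // structural fields (id, plan, orderIds, balanceCents) are kept as-is
    email: p.replace("email", record.email),
    name: p.replace("name", record.name),
    phone: record.phone === null ? null : p.replace("phone", record.phone),
  };
}
```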

What’s Next

Unit Testing: Use factories in unit tests for clean, isolated testing.

Integration Testing: Database-backed tests with proper data isolation.

Privacy and Data Handling: Enterprise data governance for test environments.