Factoreally

Generate realistic test data from your actual production data patterns

Factoreally analyzes your real data and builds factories that generate statistically accurate test data. Instead of hand-crafting test fixtures or using random data that doesn't reflect reality, use the patterns in your production datasets to write better tests, shorten development cycles, and build more reliable systems.

Why Factoreally?

🎯 Production-Grade Realism

Data-driven accuracy: Learns from your actual data patterns instead of generating random noise
Statistical fidelity: Preserves field distributions, value frequencies, and structural relationships
Pattern recognition: Automatically detects UUIDs, timestamps, email formats, and custom patterns
Relationship preservation: Maintains correlations between nested fields and optional data
Pydantic model support: Uses dictionary type annotations to identify dynamic object fields
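To make "statistical fidelity" concrete, here is a minimal standalone sketch of frequency-preserving sampling. It does not use Factoreally and is not its implementation; the helper names `learn_frequencies` and `sample_like` are hypothetical and exist only for illustration.

```python
import random
from collections import Counter


def learn_frequencies(values):
    """Count how often each value occurs in the observed data."""
    return Counter(values)


def sample_like(freqs, n, rng):
    """Draw n values with probability proportional to observed frequency."""
    population = list(freqs.keys())
    weights = list(freqs.values())
    return rng.choices(population, weights=weights, k=n)


# Observed data: mostly regular users, a few admins, one auditor.
observed_roles = ["user"] * 90 + ["admin"] * 9 + ["auditor"]
freqs = learn_frequencies(observed_roles)

rng = random.Random(0)  # seeded so the sketch is repeatable
generated = sample_like(freqs, 1000, rng)
```

Generated values come only from the observed set, and common values stay common, which is the property uniform random data lacks.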

⚡ Accelerated Development

Zero configuration: Point at your data, get a working factory instantly
Type-safe integration: Works seamlessly with Pydantic models and dataclasses
Flexible generation: Create single objects, batches, or infinite streams
Override capabilities: Customize specific fields while preserving realistic defaults

🔧 Testing Excellence

Comprehensive coverage: Generate edge cases and realistic data distributions automatically
Consistent reproducibility: Deterministic generation for reliable test suites
Performance at scale: Generate thousands of realistic records efficiently
Integration ready: Drop-in replacement for manual fixture creation
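The text above does not say how Factoreally exposes deterministic generation, so the following is only a generic sketch of the underlying idea: a privately seeded `random.Random` instance yields the same records on every run. The function `make_records` and its fields are hypothetical.

```python
import random


def make_records(seed, n):
    """Build n records from a private, seeded generator."""
    rng = random.Random(seed)  # independent of the global random state
    return [
        {"age": rng.randint(18, 90), "tier": rng.choice(["free", "premium"])}
        for _ in range(n)
    ]


first = make_records(42, 5)
second = make_records(42, 5)  # same seed, so the same five records
```

Keeping the generator private to the function means other code that calls `random` cannot change what a test sees.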

📊 Business Impact

Risk reduction: Catch data-related bugs before they reach production
Time savings: Eliminate hours of manual test data creation and maintenance
Quality assurance: Test against realistic data scenarios, not toy examples
Compliance confidence: Generate test data that mirrors production complexity without exposing sensitive information

Quick Start

1. Generate a factory specification from your data

# Basic spec generation
factoreally create --in real_user_payloads.json --out user.spec.json

# With Pydantic model for dynamic field detection
factoreally create \
  --in user_payloads.json \
  --out user.spec.json \
  --model myapp.models.UserModel

2. Create intelligent factories

from factoreally import Factory

# From saved specification file
user_factory = Factory("user.spec.json")

# Or directly from spec dictionary
user_factory = Factory(spec)

# Or with overrides built-in
admin_factory = Factory(spec, role="admin", permissions__level="high")

3. Generate realistic test data

# Single realistic user
user_data = user_factory.build()

# Batch generation for performance tests
users = user_factory[:1000]

# Infinite stream for stress testing
for user in user_factory:
    process_user(user)
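Because iterating a factory never ends on its own, a stress test usually needs a bound. A minimal sketch using the standard library's `itertools.islice`, with a hypothetical `CountingFactory` class standing in for a real factory so the snippet runs without Factoreally:

```python
import itertools


class CountingFactory:
    """Hypothetical stand-in for an infinitely iterable factory."""

    def __iter__(self):
        for i in itertools.count():
            yield {"id": i}


stream = iter(CountingFactory())

# Take a bounded slice of the infinite stream without materializing it.
first_three = list(itertools.islice(stream, 3))

# The same iterator continues where the previous slice stopped.
next_two = list(itertools.islice(stream, 2))
```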

4. Integrate with your models

# Works with Pydantic, dataclasses, or any validation library (Pydantic v2's model_validate shown)
user = UserModel.model_validate(user_factory.build())
users = [UserModel.model_validate(u) for u in user_factory[:100]]

5. Customize while preserving realism

# Override specific fields
admin_factory = user_factory.copy(role="admin", permissions__level="high")

# Per-generation overrides
user = admin_factory.build(email="specific@example.com")

# Nested field overrides
user = user_factory.build(address__country="US", profile__verified=True)

# Array field overrides (applies to every element of the array)
user = user_factory.build(actions__data__patient__id="fixed-id")

# Array element overrides (target specific array indices)
user = user_factory.build(items__0__name="first_item", items__1__value=999)
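The double-underscore paths above can be read as: split on `__`, descend through dicts, treat a numeric segment as a list index, and fan out over every element when a list is reached with a non-numeric segment. The sketch below illustrates that reading on plain dicts. It is an assumption about the semantics, not Factoreally's code, and `apply_override` is a hypothetical helper.

```python
import copy


def apply_override(obj, path, value):
    """Return a copy of obj with value set at a double-underscore path."""
    result = copy.deepcopy(obj)
    _set(result, path.split("__"), value)
    return result


def _set(node, parts, value):
    head, rest = parts[0], parts[1:]
    if isinstance(node, list):
        if head.isdigit():
            index = int(head)
            if index >= len(node):
                return  # index not present: leave the list unchanged
            if rest:
                _set(node[index], rest, value)
            else:
                node[index] = value
        else:
            # Non-numeric segment on a list: apply to every element.
            for item in node:
                _set(item, parts, value)
        return
    if not isinstance(node, dict):
        return  # e.g. a None element inside an array: skip it
    if rest:
        if head in node:
            _set(node[head], rest, value)
    else:
        node[head] = value


user = {
    "address": {"country": "DE"},
    "items": [{"name": "a"}, {"name": "b"}],
    "actions": [{"data": {"id": 1}}, None, {"data": {"id": 2}}],
}

nested = apply_override(user, "address__country", "US")
indexed = apply_override(user, "items__0__name", "first_item")
fanned = apply_override(user, "actions__data__id", "fixed-id")
```

The original `user` dict is left untouched because the helper works on a deep copy.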

Dynamic Overrides with Callables

Override values can be callables for dynamic, context-aware customization:

import uuid
from datetime import datetime

# No arguments - produce a fresh value, ignoring the generated one
user = user_factory.build(id=lambda: str(uuid.uuid4()))

# One argument (factoreally's generated field value) - transform existing value
user = user_factory.build(name=lambda value: value.upper())

# Two arguments (field value, entire object) - complex logic
user = user_factory.build(
    display_name=lambda value, obj: f"{value} ({obj['role']})"
)

# Keyword-only arguments for clarity
user = user_factory.build(
    full_name=lambda *, obj: f"{obj['first_name']} {obj['last_name']}"
)

# Mixed callable and static overrides
user = user_factory.build(
    name=lambda value: value.title(),  # Callable
    email="admin@example.com",         # Static
    created_at=lambda: datetime.now()  # Callable
)
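One plausible way to support all of these call shapes is to inspect the callable's signature and pass only what it asks for. The sketch below shows that idea with the standard `inspect` module. It is an illustration under that assumption, not Factoreally's implementation, and `resolve_override` is a hypothetical name.

```python
import inspect


def resolve_override(override, value, obj):
    """Resolve a static or callable override for one field.

    value is the generated field value, obj the whole generated object.
    """
    if not callable(override):
        return override  # static override: use as-is

    params = list(inspect.signature(override).parameters.values())
    positional = [
        p for p in params
        if p.kind in (inspect.Parameter.POSITIONAL_ONLY,
                      inspect.Parameter.POSITIONAL_OR_KEYWORD)
    ]
    keyword_only = {p.name for p in params
                    if p.kind is inspect.Parameter.KEYWORD_ONLY}

    # Zero, one, or two positional parameters receive (value, obj) in order.
    args = (value, obj)[:len(positional)]
    kwargs = {}
    if "value" in keyword_only:
        kwargs["value"] = value
    if "obj" in keyword_only:
        kwargs["obj"] = obj
    return override(*args, **kwargs)


generated = {"first_name": "Ann", "last_name": "Lee", "role": "admin"}

static = resolve_override("admin@example.com", "x@example.com", generated)
no_args = resolve_override(lambda: "fresh", "old", generated)
one_arg = resolve_override(lambda value: value.upper(), "ann", generated)
two_args = resolve_override(
    lambda value, obj: f"{value} ({obj['role']})", "Ann", generated
)
kw_only = resolve_override(
    lambda *, obj: f"{obj['first_name']} {obj['last_name']}", None, generated
)
```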

Pydantic Model Integration

Factoreally's Pydantic integration helps identify dynamic dictionary fields in your data structure. By providing your Pydantic model, Factoreally can distinguish between static nested objects and dynamic dictionaries.

from pydantic import BaseModel
from datetime import datetime
from typing import Dict, List, Optional

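class UserProfile(BaseModel):
    display_name: str  # placeholder field; the original does not define UserProfile

class UserEvent(BaseModel):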
    user_id: str
    event_type: str
    metadata: Dict[str, str]  # Factoreally knows this is a dynamic dictionary
    profiles: List[UserProfile]  # Array of objects (no special handling)
    created_at: datetime
    settings: Optional[Dict[str, int]] = None  # Dynamic dict (Optional ignored)

# Without model: Factoreally guesses from data patterns
spec_basic = create_spec(sample_events)

# With model: Factoreally knows which fields are dynamic dictionaries
spec_enhanced = create_spec(sample_events, model=UserEvent)

Note: Key generation and pattern detection work the same way whether or not you provide a model.
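To show how `Dict[K, V]` annotations can mark a field as a dynamic dictionary, here is a small sketch built on the standard `typing` helpers. It uses a dataclass instead of a Pydantic model so it runs with no third-party packages. The function `dynamic_dict_fields` is hypothetical and is not part of Factoreally.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union, get_args, get_origin, get_type_hints


def dynamic_dict_fields(model):
    """Return names of fields annotated as Dict[K, V] or Optional[Dict[K, V]]."""
    found = []
    for name, annotation in get_type_hints(model).items():
        # Unwrap Optional[X], which is Union[X, None].
        if get_origin(annotation) is Union:
            non_none = [a for a in get_args(annotation) if a is not type(None)]
            if len(non_none) == 1:
                annotation = non_none[0]
        if get_origin(annotation) is dict:
            found.append(name)
    return found


@dataclass
class Event:
    user_id: str
    metadata: Dict[str, str]
    tags: List[str]
    settings: Optional[Dict[str, int]] = None


dynamic = dynamic_dict_fields(Event)
```

Here `dynamic` contains `metadata` and `settings`; `tags` is a list and `user_id` a plain string, so neither is reported. A bare `dict` annotation without type parameters would not be detected by this sketch.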

⚡ Command Line Usage

# Basic analysis
factoreally create --in api_logs.json --out api.spec.json

# With Pydantic model for dynamic field detection
factoreally create \
  --in api_logs.json \
  --out api.spec.json \
  --model myproject.schemas.APILogEvent

# Works with any importable Pydantic model
factoreally create \
  --in user_data.json \
  --out user.spec.json \
  --model backend.models.user.UserAccount

Real-World Example: E-commerce Events

from pydantic import BaseModel
from datetime import datetime
from typing import Dict, List, Optional
from decimal import Decimal

class ProductMetrics(BaseModel):
    views: int
    clicks: int
    conversions: float

class UserProfile(BaseModel):
    # Fields inferred from the sample data below
    tier: str
    region: str

class EcommerceEvent(BaseModel):
    event_id: str  # UUID pattern
    user_id: str   # UUID pattern
    session_id: str  # Different ID pattern

    # Dynamic object - product IDs as keys
    product_metrics: Dict[str, ProductMetrics]

    # Dynamic object - feature flags
    experiments: Dict[str, bool]

    # Static nested structure
    user_profile: UserProfile

    # Optional dynamic fields
    custom_attributes: Optional[Dict[str, str]] = None
    ab_tests: Optional[Dict[str, str]] = None

    timestamp: datetime
    revenue: Optional[Decimal] = None

# Sample data
sample_data = [
    {
        "event_id": "550e8400-e29b-41d4-a716-446655440000",
        "user_id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
        "session_id": "sess_abc123xyz789",
        "product_metrics": {
            "prod_1": {"views": 45, "clicks": 12, "conversions": 0.27},
            "prod_2": {"views": 33, "clicks": 8, "conversions": 0.24}
        },
        "experiments": {
            "new_checkout": True,
            "recommended_products": False
        },
        "user_profile": {"tier": "premium", "region": "US"},
        "custom_attributes": {"source": "mobile", "campaign": "summer_sale"},
        "timestamp": "2024-01-15T10:30:00Z",
        "revenue": "125.99"
    }
]

# Create spec with model intelligence
spec = create_spec(sample_data, model=EcommerceEvent)
factory = Factory(spec)

# Generate realistic test events
event = factory.build()
# ✅ Realistic UUIDs for event_id, user_id
# ✅ Session ID following pattern sess_*
# ✅ Dynamic product_metrics with varied product IDs
# ✅ Realistic experiment flag combinations
# ✅ Consistent user_profile structure
# ✅ Optional fields respect original probabilities
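The properties in that checklist can be asserted in a test. The sketch below checks a plain event dict with the standard library only, so it runs without Factoreally. `event_problems` is a hypothetical helper, and it assumes generated events keep the sample's JSON-style shape (string IDs and an ISO 8601 timestamp string).

```python
import uuid
from datetime import datetime


def event_problems(event):
    """Return a list of human-readable problems found in one event dict."""
    problems = []
    for field in ("event_id", "user_id"):
        try:
            uuid.UUID(event[field])
        except (KeyError, ValueError):
            problems.append(f"{field} is not a UUID")
    if not str(event.get("session_id", "")).startswith("sess_"):
        problems.append("session_id lacks the sess_ prefix")
    for product_id, metrics in event.get("product_metrics", {}).items():
        if set(metrics) != {"views", "clicks", "conversions"}:
            problems.append(f"unexpected metric keys for {product_id}")
    try:
        # Older Python versions do not parse a trailing "Z", so normalize it.
        datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
    except (KeyError, ValueError):
        problems.append("timestamp is not ISO 8601")
    return problems


sample_event = {
    "event_id": "550e8400-e29b-41d4-a716-446655440000",
    "user_id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
    "session_id": "sess_abc123xyz789",
    "product_metrics": {
        "prod_1": {"views": 45, "clicks": 12, "conversions": 0.27},
    },
    "timestamp": "2024-01-15T10:30:00Z",
}

ok = event_problems(sample_event)
bad = event_problems(dict(sample_event, session_id="abc"))
```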

Advanced Features

Pattern Recognition

Factoreally automatically detects and generates:

Temporal patterns: ISO timestamps, durations, date formats
Structured identifiers: UUIDs, Auth0 IDs, MAC addresses
Semantic patterns: Email formats, phone numbers, custom schemas
Numerical distributions: Maintains statistical properties of your data
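As a rough illustration of what detecting these patterns involves, the sketch below classifies string values with two regular expressions and `datetime.fromisoformat`. The patterns are deliberately simple, and the `classify` function is hypothetical; Factoreally's own detectors are not shown in this document and may work differently.

```python
import re
from datetime import datetime

UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)
# Intentionally loose: something@something.something, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def classify(value):
    """Return a coarse pattern label for a string value."""
    if UUID_RE.match(value):
        return "uuid"
    if EMAIL_RE.match(value):
        return "email"
    try:
        # Normalize a trailing "Z" so older Python versions can parse it.
        datetime.fromisoformat(value.replace("Z", "+00:00"))
        return "iso_timestamp"
    except ValueError:
        return "text"


labels = {
    value: classify(value)
    for value in [
        "550e8400-e29b-41d4-a716-446655440000",
        "2024-01-15T10:30:00Z",
        "specific@example.com",
        "sess_abc123xyz789",
    ]
}
```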

Dynamic Object Support

Automatic detection: Recognizes dynamic dict fields vs static nested objects
Key pattern analysis: Learns from UUID keys, date keys, or custom patterns
Flexible generation: Creates realistic dynamic keys while preserving value patterns
Pydantic integration: Use Dict[K, V] type annotations to identify dynamic fields
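Without a model, telling a dynamic dict from a static nested object has to rely on the data itself. One simple heuristic, shown here only as an assumption about how such detection could work, is that a static object repeats the same keys in every record while a dynamic dict rarely does. The `looks_dynamic` function and its threshold are hypothetical.

```python
def looks_dynamic(samples, min_samples=2):
    """Guess whether the dicts in samples are dynamic (keys vary per record)."""
    dicts = [s for s in samples if isinstance(s, dict)]
    if len(dicts) < min_samples:
        return False  # too little evidence to decide
    key_sets = [frozenset(d) for d in dicts]
    shared = frozenset.intersection(*key_sets)
    union = frozenset.union(*key_sets)
    # Dynamic if fewer than half of all seen keys appear in every record.
    return len(shared) < len(union) / 2


product_metrics_samples = [
    {"prod_1": {}, "prod_2": {}},
    {"prod_7": {}},
    {"prod_2": {}, "prod_9": {}},
]
user_profile_samples = [
    {"tier": "premium", "region": "US"},
    {"tier": "free", "region": "DE"},
    {"tier": "free", "region": "US"},
]

metrics_dynamic = looks_dynamic(product_metrics_samples)
profile_dynamic = looks_dynamic(user_profile_samples)
```

A heuristic like this can misjudge static objects with many optional keys, which is one reason an explicit `Dict[K, V]` annotation is the more reliable signal.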

Smart Null Handling

Preserves optional field probabilities from your data
Maintains conditional presence (nested objects appear when parents exist)
Respects field interdependencies
Gracefully handles None values in array overrides
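Preserving optional field probabilities amounts to measuring how often each field is present and non-null, then reproducing that rate at generation time. A minimal sketch of that idea follows; `presence_rates` and `sample_presence` are hypothetical names, and this is not Factoreally's implementation.

```python
import random


def presence_rates(records):
    """Fraction of records in which each field is present and not None."""
    counts = {}
    for record in records:
        for key, value in record.items():
            if value is not None:
                counts[key] = counts.get(key, 0) + 1
    return {key: n / len(records) for key, n in counts.items()}


def sample_presence(rates, rng):
    """Choose which fields to include in one generated record."""
    return {key for key, rate in rates.items() if rng.random() < rate}


records = [
    {"id": 1, "revenue": "9.99"},
    {"id": 2, "revenue": None},
    {"id": 3},
    {"id": 4},
]
rates = presence_rates(records)

rng = random.Random(7)  # seeded so the sketch is repeatable
included = [sample_presence(rates, rng) for _ in range(200)]
```

In this data `id` is always present and `revenue` is non-null in one record out of four, so generated records always include `id` and include `revenue` only some of the time.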

Performance Optimized

Lazy evaluation for memory efficiency
Batch generation capabilities
Streaming interfaces for large datasets
Minimal-overhead factory creation

Real-World Applications

API Testing with Pydantic Models: Generate test payloads with realistic data patterns. Use Pydantic model validation to ensure generated data matches your schemas.

Load Testing: Generate millions of realistic user profiles that stress-test your system with production-like data patterns.

Integration Testing: Create test data that matches your data patterns automatically. Generated data can be validated against your Pydantic models.

Development Environments: Populate local databases with realistic data for feature development without production data access.

Compliance Testing: Generate test datasets that mirror production complexity while maintaining data privacy.

Performance Benchmarking: Benchmark against realistic data distributions rather than uniform random data that doesn't reflect actual usage patterns, and pass your domain models so dynamic fields are identified correctly.

Microservices Testing: Generate consistent, realistic test data for distributed system testing using shared data patterns.

Project details

Release history

0.6.1

Oct 23, 2025

0.6.0

Oct 21, 2025

0.5.1

Oct 2, 2025

0.5.0

Oct 2, 2025

0.4.0

Oct 2, 2025

0.3.0

Sep 30, 2025

This version

0.1.0

Sep 30, 2025

Download files

Source Distribution

factoreally-0.1.0.tar.gz (89.7 kB view details)

Uploaded Sep 30, 2025 Source

Built Distribution

factoreally-0.1.0-py3-none-any.whl (48.2 kB view details)

Uploaded Sep 30, 2025 Python 3

File details

Details for the file factoreally-0.1.0.tar.gz.

File metadata

Download URL: factoreally-0.1.0.tar.gz
Upload date: Sep 30, 2025
Size: 89.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.6.5

File hashes

Hashes for factoreally-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`6e0bdc440f3eec6e268d404ea1909b9453a4b473e3f88425a34af25428f5076f`
MD5	`b4ef878ad96fd78bf3d0e15febd159dc`
BLAKE2b-256	`ad3e78eadd3048ac49a901dbb18c358d29b3b67ab634b57bbc8f81d2aac65a4c`

File details

Details for the file factoreally-0.1.0-py3-none-any.whl.

File metadata

Download URL: factoreally-0.1.0-py3-none-any.whl
Upload date: Sep 30, 2025
Size: 48.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.6.5

File hashes

Hashes for factoreally-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`745293f6312c5fc7c2f5f75482f993059d9ff50e60c5d4c5ea3523fc8d9f0014`
MD5	`07e84d6b6e8d53dd861ce1a1acf3fd36`
BLAKE2b-256	`8368d37d35695dc13208a4726a0949159afed73411aa71984351445e19956650`
