DATAMIMIC — Deterministic Synthetic Test Data That Makes Sense

Generate realistic, interconnected, and reproducible test data for finance, healthcare, and beyond.

Faker gives you random data. DATAMIMIC gives you consistent, explainable datasets that respect business logic and domain constraints.

  • 🧬 Patient medical histories that match age and demographics
  • 💳 Bank transactions that obey balance constraints
  • 🛡 Insurance policies aligned with real risk profiles

License: MIT


🧠 What Problem DATAMIMIC Solves

Typical data generators (like Faker) produce isolated random values. That's fine for unit tests, but it breaks down for system, analytics, or compliance testing, where records must relate to each other.

Example:

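# Faker – broken relationships
from faker import Faker

fake = Faker()
patient_name = fake.name()
patient_age = fake.random_int(1, 99)   # age is drawn independently of everything else
conditions = [fake.word()]             # a random dictionary word, not a diagnosis
# Nothing links age to conditions: "25-year-old with Alzheimer's" – nonsense data.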

DATAMIMIC – contextual realism

from datamimic_ce.domains.healthcare.services import PatientService

patient = PatientService().generate()
print(f"{patient.full_name}, {patient.age}, {patient.conditions}")
# "Shirley Thompson, 72, ['Diabetes', 'Hypertension']"
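The difference comes down to conditioning: a contextual generator draws related attributes from each other instead of independently. The sketch below illustrates that idea with the standard library only. The condition list and minimum-age thresholds are made-up illustrative values (not clinical data), and `generate_patient` is a hypothetical helper, not DATAMIMIC's internal logic.

```python
import random

# Illustrative thresholds only (hypothetical, not clinical prevalence data):
# a condition becomes eligible once the patient reaches its minimum age.
CONDITION_MIN_AGE = {
    "Asthma": 1,
    "Hypertension": 30,
    "Diabetes": 30,
    "Osteoarthritis": 50,
    "Alzheimer's": 65,
}

def generate_patient(rng: random.Random) -> dict:
    age = rng.randint(1, 99)
    # Conditions are drawn only from those compatible with the sampled age.
    eligible = [c for c, min_age in CONDITION_MIN_AGE.items() if age >= min_age]
    k = rng.randint(0, min(2, len(eligible)))
    conditions = rng.sample(eligible, k)
    return {"age": age, "conditions": sorted(conditions)}

rng = random.Random(42)
for _ in range(3):
    print(generate_patient(rng))
```

Because age is sampled first and conditions are filtered by it, an implausible pairing such as a 25-year-old with Alzheimer's cannot occur in this sketch.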

⚙️ Quickstart (Community Edition)

Install from PyPI:

pip install datamimic-ce

Deterministic Data Generation

DATAMIMIC lets you generate the same data every time, across machines, environments, and CI pipelines. Seeds, frozen clocks, and UUIDv5 namespaces keep your synthetic datasets reproducible and traceable, no matter where or when they're generated.

from datamimic_ce.domains.facade import generate_domain

request = {
    "domain": "person",
    "version": "v1",
    "count": 1,
    "seed": "docs-demo",                # identical seed → identical output
    "locale": "en_US",
    "clock": "2025-01-01T00:00:00Z"     # fixed clock = stable time context
}

response = generate_domain(request)
print(response["items"][0]["id"])

Result: Same input → same output.

Behind the scenes, every deterministic request combines:

  • A stable seed (for repeatable randomness),
  • A frozen clock (for time-dependent values), and
  • A UUIDv5 namespace (for globally consistent identifiers).

Together, they form a reproducibility contract, which makes deterministic requests well suited to CI/CD pipelines, agentic pipelines, and analytics verification.

Agents can safely re-invoke the same generation call and receive byte-for-byte identical data.
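The three ingredients can be illustrated with the Python standard library alone. This is a minimal sketch of the general technique, not DATAMIMIC's implementation: the `generate_people` function, the namespace URL, and the record fields are hypothetical.

```python
import random
import uuid
from datetime import datetime

# Hypothetical namespace for this sketch, derived once from a fixed URL.
DEMO_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/datamimic-demo")

def generate_people(seed: str, clock: str, count: int) -> list[dict]:
    # random.Random hashes a str seed deterministically, so the sequence
    # does not depend on PYTHONHASHSEED or on the machine.
    rng = random.Random(seed)
    # Frozen clock: time-dependent values are computed from this instant,
    # never from datetime.now().
    now = datetime.fromisoformat(clock.replace("Z", "+00:00"))
    people = []
    for i in range(count):
        age = rng.randint(18, 90)
        # UUIDv5 hashes namespace + name, so the same name always
        # yields the same identifier.
        person_id = uuid.uuid5(DEMO_NAMESPACE, f"{seed}:person:{i}")
        people.append({"id": str(person_id), "age": age, "birth_year": now.year - age})
    return people

run_a = generate_people("docs-demo", "2025-01-01T00:00:00Z", 3)
run_b = generate_people("docs-demo", "2025-01-01T00:00:00Z", 3)
print(run_a == run_b)  # True
```

Changing any one input (seed, clock, or count) changes the output; keeping all three fixed reproduces it exactly.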


🧩 Domains & Examples

🏥 Healthcare

from datamimic_ce.domains.healthcare.services import PatientService
patient = PatientService().generate()
print(patient.full_name, patient.conditions)

  • PatientService – Demographically realistic patients
  • DoctorService – Specialties match conditions
  • HospitalService – Realistic bed capacities and types
  • MedicalRecordService – Longitudinal health records

💰 Finance

from datamimic_ce.domains.finance.services import BankAccountService
account = BankAccountService().generate()
print(account.account_number, account.balance)

  • Balances respect transactions
  • Card/IBAN formats per locale
  • Distributions tuned for fraud analytics and reconciliation
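One way to keep balances consistent with transactions is to derive the balance from the transaction stream and refuse any debit that would overdraw the account. The sketch below shows that invariant under stated assumptions: the `Account` class, the opening balance, the no-overdraft rule, and the amount range are all hypothetical and are not DATAMIMIC's finance model.

```python
import random
from decimal import Decimal

class Account:
    def __init__(self, opening_balance: Decimal):
        self.opening_balance = opening_balance
        self.transactions: list[Decimal] = []  # signed amounts: + credit, - debit

    @property
    def balance(self) -> Decimal:
        # The balance is always derived from the ledger, never stored separately.
        return self.opening_balance + sum(self.transactions, Decimal("0"))

    def post(self, amount: Decimal) -> bool:
        # Reject any debit that would push the balance below zero.
        if self.balance + amount < 0:
            return False
        self.transactions.append(amount)
        return True

rng = random.Random(7)
account = Account(Decimal("500.00"))
for _ in range(20):
    cents = rng.randint(-30000, 20000)  # amounts in cents, skewed toward debits
    account.post(Decimal(cents) / 100)
print(account.balance, len(account.transactions))
```

Using `Decimal` instead of `float` avoids rounding drift, so the derived balance always equals the opening balance plus the exact sum of accepted transactions.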

👤 Demographics

  • PersonService – Culturally consistent names, addresses, phone patterns
  • Locale packs for DE / US / VN, versioned and auditable
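Cultural consistency means every field of a record comes from the same locale pack, rather than mixing a German name with a US phone number. A minimal sketch of that idea follows. `LOCALE_PACKS` and `generate_person` are hypothetical; the pack contents are tiny illustrative samples (not DATAMIMIC's datasets), and subscriber numbers are simplified to seven random digits.

```python
import random

# Hypothetical, heavily simplified locale packs for illustration only.
# "name" encodes the local name order (Vietnamese puts the family name first);
# each city is paired with its own dialing prefix.
LOCALE_PACKS = {
    "DE": {"first": ["Lukas", "Anna"], "last": ["Müller", "Schmidt"], "name": "{first} {last}",
           "cities": [("Berlin", "+49 30"), ("Hamburg", "+49 40")]},
    "US": {"first": ["James", "Mary"], "last": ["Smith", "Johnson"], "name": "{first} {last}",
           "cities": [("Chicago", "+1 312"), ("Denver", "+1 303")]},
    "VN": {"first": ["Anh", "Linh"], "last": ["Nguyễn", "Trần"], "name": "{last} {first}",
           "cities": [("Hà Nội", "+84 24"), ("Đà Nẵng", "+84 236")]},
}

def generate_person(locale: str, rng: random.Random) -> dict:
    pack = LOCALE_PACKS[locale]  # every field below comes from this one pack
    city, dial_prefix = rng.choice(pack["cities"])  # phone prefix follows the city
    digits = "".join(str(rng.randint(0, 9)) for _ in range(7))  # simplified subscriber number
    name = pack["name"].format(first=rng.choice(pack["first"]), last=rng.choice(pack["last"]))
    return {"name": name, "city": city, "phone": f"{dial_prefix} {digits}", "locale": locale}

rng = random.Random(3)
for locale in ("DE", "US", "VN"):
    print(generate_person(locale, rng))
```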

🔒 Deterministic by Design

  • Frozen clocks and canonical hashing → reproducible IDs
  • Seeded random generators → identical outputs across runs
  • Schema validation (XSD, JSONSchema) → structural integrity
  • Provenance hashing → audit-friendly lineage
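Canonical hashing means serializing a record in one fixed form (sorted keys, fixed separators, UTF-8) before hashing, so logically equal records always produce the same digest. Below is a minimal standard-library sketch; the `canonical_hash` helper is hypothetical and does not describe DATAMIMIC's actual provenance format.

```python
import hashlib
import json

def canonical_hash(record: dict) -> str:
    # Sorted keys and compact separators remove formatting differences;
    # ensure_ascii=False plus explicit UTF-8 encoding keeps non-ASCII text stable.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# A generation request used as a provenance record.
request = {"domain": "person", "version": "v1", "count": 1, "seed": "docs-demo",
           "locale": "en_US", "clock": "2025-01-01T00:00:00Z"}
reordered = dict(reversed(list(request.items())))  # same content, different key order

print(canonical_hash(request) == canonical_hash(reordered))  # True
```

Storing this digest next to a generated dataset lets an auditor confirm which exact request produced it.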

📘 See Developer Guide


🧮 XML / Python Model Workflow

Python-based generation:

from random import Random
from datamimic_ce.domains.common.models.demographic_config import DemographicConfig
from datamimic_ce.domains.healthcare.services import PatientService

cfg = DemographicConfig(age_min=70, age_max=75)
svc = PatientService(dataset="US", demographic_config=cfg, rng=Random(1337))
print(svc.generate().to_dict())

Equivalent XML model (here generating three seeded senior patients and writing them to CSV):

<setup>
  <generate name="seeded_seniors" count="3" target="CSV">
    <variable name="patient" entity="Patient" dataset="US" ageMin="70" ageMax="75" rngSeed="1337" />
    <key name="full_name" script="patient.full_name" />
    <key name="age" script="patient.age" />
    <array name="conditions" script="patient.conditions" />
  </generate>
</setup>

⚖️ CE vs EE Comparison

Feature Community (CE) Enterprise (EE)
Deterministic domain generation
XML + Python pipelines
Healthcare & Finance domains
Multi-user collaboration
Governance & lineage dashboards
ML engines (Mostly AI, Synthcity, ... )
RBAC & audit logging (HIPAA/GDPR/PCI)
Managed EDIFACT / SWIFT adapters

👉 Compare editions | Book a strategy call


🧰 CLI & Automation

# Run instant healthcare demo
datamimic demo create healthcare-example
datamimic run ./healthcare-example/datamimic.xml

# Verify version
datamimic version
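To check the reproducibility contract in CI, one option is to run the same model twice and compare a digest of every output file. The helpers below are a hypothetical sketch that works on any two directories; where `datamimic run` writes its output is not covered here, so the paths in the usage comment are placeholders.

```python
import hashlib
from pathlib import Path

def digest_tree(root: Path) -> dict[str, str]:
    # Map each file's path (relative to root) to the SHA-256 of its bytes.
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }

def diff_runs(run_a: Path, run_b: Path) -> list[str]:
    # Relative paths whose content differs or that exist in only one run.
    a, b = digest_tree(run_a), digest_tree(run_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))

# Usage (placeholder paths): fail the CI job if two runs diverge.
# mismatches = diff_runs(Path("output_run1"), Path("output_run2"))
# assert not mismatches, f"Non-deterministic files: {mismatches}"
```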

🧭 Architecture Snapshot

  • Core pipeline: Determinism kit + domain services + schema validators
  • Governance layer: Group tables, linkage audits, provenance hashing
  • Execution layer: CLI, API, and XML runners

🌍 Industry Blueprints

Finance

  • Simulate SWIFT / ISO 20022 flows
  • Replay hashed PCI transaction histories
  • Validate fraud and reconciliation pipelines
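Replaying hashed transaction histories relies on deterministic pseudonymization: the same card number must always map to the same token, while the token reveals nothing to anyone without a secret key. A minimal sketch using keyed HMAC-SHA256 from the standard library follows. It is a general technique, not DATAMIMIC's documented method; the key, the `tok_` prefix, the token length, and the record layout are hypothetical, and the card numbers are well-known public test numbers.

```python
import hashlib
import hmac

# Hypothetical key for this sketch; in practice it would come from a secret store.
TOKEN_KEY = b"demo-tokenization-key"

def tokenize_pan(pan: str) -> str:
    digits = pan.replace(" ", "")  # normalize formatting before hashing
    # Keyed hash: deterministic for a given key, but unlike a plain SHA-256 of the
    # PAN it cannot be brute-forced over the card-number space without the key.
    mac = hmac.new(TOKEN_KEY, digits.encode("ascii"), hashlib.sha256).hexdigest()
    return f"tok_{mac[:24]}"

history = [
    {"pan": "4111 1111 1111 1111", "amount": "19.99"},
    {"pan": "4111111111111111", "amount": "5.00"},
    {"pan": "5555 5555 5555 4444", "amount": "42.10"},
]
replayable = [{"card": tokenize_pan(t["pan"]), "amount": t["amount"]} for t in history]
for row in replayable:
    print(row)
```

The first two rows share a token because they are the same card written differently, so per-card patterns survive while raw numbers never leave the tokenization step.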

Healthcare

  • Generate deterministic patient journeys
  • Integrate HL7/FHIR/EDIFACT exchanges
  • Reproduce QA datasets for regression testing

📚 Documentation & Community


🚀 Get Started

pip install datamimic-ce

Generate data that makes sense — deterministically. ⭐ Star us on GitHub if DATAMIMIC improves your testing workflow.


Download files


Source Distribution

datamimic_ce-2.1.0.tar.gz (12.7 MB)

Uploaded Source

Built Distribution


datamimic_ce-2.1.0-py3-none-any.whl (13.5 MB)

Uploaded Python 3

File details

Details for the file datamimic_ce-2.1.0.tar.gz.

File metadata

  • Download URL: datamimic_ce-2.1.0.tar.gz
  • Upload date:
  • Size: 12.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for datamimic_ce-2.1.0.tar.gz
Algorithm Hash digest
SHA256 c3de2ea9f90e575d874736af31504ab1402d750bce4b0e18ba40de1a80864471
MD5 d0cb12fe331aaa2081b15b614ec28e9e
BLAKE2b-256 a3cacdde0037c9d32b3dc2e914a64f638f6e1476b8f9d6ca989c613485e32e67


File details

Details for the file datamimic_ce-2.1.0-py3-none-any.whl.

File metadata

  • Download URL: datamimic_ce-2.1.0-py3-none-any.whl
  • Upload date:
  • Size: 13.5 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for datamimic_ce-2.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1e23eb1de5a6612b63c20b9aba382d7630935d9e5427cdc3371cf47b03bf829e
MD5 534832ad1b8e5dafda1bf40c85fc5baa
BLAKE2b-256 347d85f86d87cf196fc532cf38077f20f05d58926a31e27eb2aa661d3683706a
