DATAMIMIC — Deterministic Synthetic Test Data That Makes Sense
Generate realistic, interconnected, and reproducible test data for finance, healthcare, and beyond.
Faker gives you random data. DATAMIMIC gives you consistent, explainable datasets that respect business logic and domain constraints.
- 🧬 Patient medical histories that match age and demographics
- 💳 Bank transactions that obey balance constraints
- 🛡 Insurance policies aligned with real risk profiles
🧠 What Problem DATAMIMIC Solves
Typical data generators (like Faker) produce isolated random values. That works for unit tests, but it is of little use for system, analytics, or compliance testing, where records have to agree with each other.
Example:
# Faker – broken relationships
from faker import Faker
fake = Faker()
patient_name = fake.name()
patient_age = fake.random_int(1, 99)
conditions = [fake.word()]
# "25-year-old with Alzheimer's" – nonsense data.
# DATAMIMIC – contextual realism
from datamimic_ce.domains.healthcare.services import PatientService
patient = PatientService().generate()
print(f"{patient.full_name}, {patient.age}, {patient.conditions}")
# "Shirley Thompson, 72, ['Diabetes', 'Hypertension']"
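To make "contextual realism" concrete, here is a minimal standard-library sketch of age-conditioned sampling. It is not DATAMIMIC's implementation: `CONDITION_RULES`, its minimum ages and weights, and `sample_conditions` are all made up for illustration.

```python
import random

# Illustrative rules only: (condition, minimum plausible age, weight).
# These values are invented for this sketch and are not DATAMIMIC's data.
CONDITION_RULES = [
    ("Asthma", 0, 3),
    ("Hypertension", 30, 4),
    ("Type 2 Diabetes", 35, 3),
    ("Alzheimer's", 65, 2),
]


def sample_conditions(age: int, rng: random.Random, k: int = 2) -> list[str]:
    """Pick up to k distinct conditions that are plausible for the given age."""
    eligible = [(name, weight) for name, min_age, weight in CONDITION_RULES if age >= min_age]
    chosen: list[str] = []
    while eligible and len(chosen) < k:
        names, weights = zip(*eligible)
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        # Drop the picked condition so it cannot be drawn twice.
        eligible = [(name, weight) for name, weight in eligible if name != pick]
    return chosen


rng = random.Random(42)
young = sample_conditions(25, rng)   # only rules with min_age <= 25 apply
senior = sample_conditions(72, rng)  # every rule applies
print(young, senior)
```

Because the age filter runs before any random draw, an implausible pairing cannot occur, whatever the seed.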
⚙️ Quickstart (Community Edition)
Install and run:
pip install datamimic-ce
Deterministic Data Generation
DATAMIMIC lets you generate the same data every time, across machines, environments, and CI pipelines. Seeds, clocks, and UUIDv5 namespaces keep your synthetic datasets reproducible and traceable, no matter where or when they are generated.
from datamimic_ce.domains.facade import generate_domain
request = {
    "domain": "person",
    "version": "v1",
    "count": 1,
    "seed": "docs-demo",  # identical seed → identical output
    "locale": "en_US",
    "clock": "2025-01-01T00:00:00Z",  # fixed clock = stable time context
}
response = generate_domain(request)
print(response["items"][0]["id"])
Result:
Same input → same output.
Behind the scenes, every deterministic request combines:
- A stable seed (for idempotent randomness),
- A frozen clock (for time-dependent values), and
- A UUIDv5 namespace (for globally consistent identifiers).
Together, they form a reproducibility contract that suits CI/CD pipelines, agentic pipelines, and analytics verification.
Agents can safely re-invoke the same generation call and receive byte-for-byte identical data.
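The three ingredients above can be sketched with the standard library alone. This illustrates the idea under stated assumptions and does not show how DATAMIMIC works internally: `generate_people`, `NAMESPACE`, and the record fields are hypothetical.

```python
import random
import uuid
from datetime import datetime

# Hypothetical namespace for this sketch; any fixed UUID would do.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.test")


def generate_people(seed: str, clock: str, count: int) -> list[dict]:
    """Build records that depend only on (seed, clock, count)."""
    rng = random.Random(seed)  # stable seed -> same pseudo-random sequence
    now = datetime.fromisoformat(clock.replace("Z", "+00:00"))  # frozen clock
    items = []
    for index in range(count):
        age = rng.randint(18, 90)
        items.append({
            # UUIDv5 is a hash of namespace + name, so it is stable across runs.
            "id": str(uuid.uuid5(NAMESPACE, f"{seed}:{index}")),
            "age": age,
            "birth_year": now.year - age,  # derived from the frozen clock, not wall time
            "created_at": now.isoformat(),
        })
    return items


first = generate_people("docs-demo", "2025-01-01T00:00:00Z", 3)
second = generate_people("docs-demo", "2025-01-01T00:00:00Z", 3)
assert first == second  # same input, same output
```

Nothing in the function reads the wall clock, global random state, or `uuid4`, which is why repeating the call is safe.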
🧩 Domains & Examples
🏥 Healthcare
from datamimic_ce.domains.healthcare.services import PatientService
patient = PatientService().generate()
print(patient.full_name, patient.conditions)
- PatientService – Demographically realistic patients
- DoctorService – Specialties match conditions
- HospitalService – Realistic bed capacities and types
- MedicalRecordService – Longitudinal health records
💰 Finance
from datamimic_ce.domains.finance.services import BankAccountService
account = BankAccountService().generate()
print(account.account_number, account.balance)
- Balances respect transactions
- Card/IBAN formats per locale
- Distributions tuned for fraud analytics and reconciliation
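As a rough illustration of "balances respect transactions", the sketch below generates a ledger whose running balance never goes negative. It is a standard-library example, not the `BankAccountService` logic. `generate_transactions`, the amount range, and the 60% debit share are assumptions chosen for the example.

```python
import random
from decimal import Decimal


def generate_transactions(opening: Decimal, count: int, rng: random.Random) -> list[dict]:
    """Generate debits and credits while keeping the running balance >= 0."""
    balance = opening
    ledger = []
    for _ in range(count):
        # Draw the amount in cents to avoid float rounding, then convert.
        amount = Decimal(rng.randint(100, 50_000)) / 100
        is_debit = rng.random() < 0.6
        if is_debit and amount > balance:
            # A debit here would overdraw the account, so book a credit instead.
            is_debit = False
        balance = balance - amount if is_debit else balance + amount
        ledger.append({
            "type": "debit" if is_debit else "credit",
            "amount": amount,
            "balance_after": balance,
        })
    return ledger


ledger = generate_transactions(Decimal("250.00"), 20, random.Random(7))
for entry in ledger[:3]:
    print(entry["type"], entry["amount"], entry["balance_after"])
```

Each row stores `balance_after`, so a reconciliation test can replay the ledger from the opening balance and compare the result against the generated values.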
👤 Demographics
- PersonService – Culturally consistent names, addresses, phone patterns
- Locale packs for DE / US / VN, versioned and auditable
🔒 Deterministic by Design
- Frozen clocks and canonical hashing → reproducible IDs
- Seeded random generators → identical outputs across runs
- Schema validation (XSD, JSONSchema) → structural integrity
- Provenance hashing → audit-friendly lineage
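Canonical and provenance hashing can be approximated as follows. This is a sketch only: the helpers `canonical_json`, `content_hash`, and `provenance_hash` are hypothetical, and the exact canonical form DATAMIMIC uses is not described here.

```python
import hashlib
import json


def canonical_json(record: dict) -> str:
    """Serialize with sorted keys and fixed separators so equal data gives equal text."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=True)


def content_hash(record: dict) -> str:
    """SHA-256 of the canonical form; independent of key insertion order."""
    return hashlib.sha256(canonical_json(record).encode("utf-8")).hexdigest()


def provenance_hash(request: dict, items: list[dict]) -> str:
    """Chain the request hash with each item hash into one lineage digest."""
    digest = hashlib.sha256()
    digest.update(content_hash(request).encode("ascii"))
    for item in items:
        digest.update(content_hash(item).encode("ascii"))
    return digest.hexdigest()


request = {"domain": "person", "seed": "docs-demo", "count": 2}
items = [{"id": "a", "age": 41}, {"age": 67, "id": "b"}]
print(provenance_hash(request, items))
```

Storing this digest next to a dataset lets an audit confirm later that a given request produced exactly those rows.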
📘 See Developer Guide
🧮 XML / Python Model Workflow
Python-based generation:
from random import Random
from datamimic_ce.domains.common.models.demographic_config import DemographicConfig
from datamimic_ce.domains.healthcare.services import PatientService
cfg = DemographicConfig(age_min=70, age_max=75)
svc = PatientService(dataset="US", demographic_config=cfg, rng=Random(1337))
print(svc.generate().to_dict())
Equivalent XML model:
<setup>
  <generate name="seeded_seniors" count="3" target="CSV">
    <variable name="patient" entity="Patient" dataset="US" ageMin="70" ageMax="75" rngSeed="1337" />
    <key name="full_name" script="patient.full_name" />
    <key name="age" script="patient.age" />
    <array name="conditions" script="patient.conditions" />
  </generate>
</setup>
⚖️ CE vs EE Comparison
| Feature | Community (CE) | Enterprise (EE) |
|---|---|---|
| Deterministic domain generation | ✅ | ✅ |
| XML + Python pipelines | ✅ | ✅ |
| Healthcare & Finance domains | ✅ | ✅ |
| Multi-user collaboration | ❌ | ✅ |
| Governance & lineage dashboards | ❌ | ✅ |
| ML engines (Mostly AI, Synthcity, ... ) | ❌ | ✅ |
| RBAC & audit logging (HIPAA/GDPR/PCI) | ❌ | ✅ |
| Managed EDIFACT / SWIFT adapters | ❌ | ✅ |
👉 Compare editions • Book a strategy call
🧰 CLI & Automation
# Run instant healthcare demo
datamimic demo create healthcare-example
datamimic run ./healthcare-example/datamimic.xml
# Verify version
datamimic version
🧭 Architecture Snapshot
- Core pipeline: Determinism kit + domain services + schema validators
- Governance layer: Group tables, linkage audits, provenance hashing
- Execution layer: CLI, API, and XML runners
🌍 Industry Blueprints
Finance
- Simulate SWIFT / ISO 20022 flows
- Replay hashed PCI transaction histories
- Validate fraud and reconciliation pipelines
Healthcare
- Generate deterministic patient journeys
- Integrate HL7/FHIR/EDIFACT exchanges
- Reproduce QA datasets for regression testing
📚 Documentation & Community
🚀 Get Started
pip install datamimic-ce
Generate data that makes sense — deterministically. ⭐ Star us on GitHub if DATAMIMIC improves your testing workflow.