DATAMIMIC — Deterministic Synthetic Test Data That Makes Sense
Generate realistic, interconnected, and reproducible test data for finance, healthcare, and beyond.
Faker gives you random data. DATAMIMIC gives you consistent, explainable datasets that respect business logic and domain constraints.
- 🧬 Patient medical histories that match age and demographics
- 💳 Bank transactions that obey balance constraints
- 🛡 Insurance policies aligned with real risk profiles
✨ Why DATAMIMIC?
Typical data generators produce isolated random values. That’s fine for unit tests — but meaningless for system, analytics, or compliance testing.
# Faker: broken relationships
from faker import Faker
fake = Faker()
patient_name = fake.name()
patient_age = fake.random_int(1, 99)
conditions = [fake.word()]
# e.g. a "25-year-old with Alzheimer's": nonsense data

# DATAMIMIC: contextual realism
from datamimic_ce.domains.healthcare.services import PatientService
patient = PatientService().generate()
print(f"{patient.full_name}, {patient.age}, {patient.conditions}")
# "Shirley Thompson, 72, ['Diabetes', 'Hypertension']"
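To make the idea of contextual realism concrete, here is a minimal sketch of age-conditioned sampling using only the standard library. The age bands, condition pools, and the `sample_patient` helper are hypothetical and chosen for illustration; they are not DATAMIMIC's datasets or API.

```python
from random import Random

# Hypothetical age bands and condition pools, for illustration only;
# these are not DATAMIMIC's real datasets.
CONDITIONS_BY_AGE = [
    (0, 17, ["Asthma", "Allergies"]),
    (18, 64, ["Hypertension", "Migraine", "Asthma"]),
    (65, 120, ["Diabetes", "Hypertension", "Arthritis"]),
]


def sample_patient(rng: Random) -> dict:
    """Draw an age first, then pick conditions only from the matching age band."""
    age = rng.randint(1, 99)
    pool = next(p for lo, hi, p in CONDITIONS_BY_AGE if lo <= age <= hi)
    count = rng.randint(0, 2)  # every pool holds at least two entries
    return {"age": age, "conditions": sorted(rng.sample(pool, count))}


patient = sample_patient(Random(7))
print(patient)
```

Because the conditions are drawn after the age and only from the matching band, a record can never pair a young patient with a condition reserved for a different band.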
⚙️ Quickstart (Community Edition)
Install and run:
pip install datamimic-ce
Deterministic Generation
DATAMIMIC produces the same data for the same request, across machines and CI runs. Seeds, clocks, and UUIDv5 namespaces enforce reproducibility.
from datamimic_ce.domains.facade import generate_domain
request = {
    "domain": "person",
    "version": "v1",
    "count": 1,
    "seed": "docs-demo",  # identical seed → identical output
    "locale": "en_US",
    "clock": "2025-01-01T00:00:00Z",  # fixed clock = stable time context
}
response = generate_domain(request)
print(response["items"][0]["id"])
# Same input → same output
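The general technique behind seeds, fixed clocks, and UUIDv5 namespaces can be sketched with the standard library alone. This is an illustration under stated assumptions, not DATAMIMIC's implementation: the namespace, the `make_record` helper, and the field names are all made up for this example.

```python
import uuid
from datetime import datetime, timezone
from random import Random

# Assumed namespace, chosen for this sketch only.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")


def make_record(seed: str, clock: str, index: int) -> dict:
    """Derive every field from (seed, clock, index); never read the wall clock or global RNG."""
    rng = Random(f"{seed}:{index}")  # string seeds are hashed deterministically
    now = datetime.fromisoformat(clock.replace("Z", "+00:00"))
    return {
        "id": str(uuid.uuid5(NAMESPACE, f"{seed}:{index}")),  # name-based, so stable
        "birth_year": now.year - rng.randint(18, 90),
        "generated_at": now.astimezone(timezone.utc).isoformat(),
    }


first = make_record("docs-demo", "2025-01-01T00:00:00Z", 0)
second = make_record("docs-demo", "2025-01-01T00:00:00Z", 0)
print(first == second)  # True
```

The design point is that no field depends on hidden state: changing the seed changes the IDs and values, while repeating the same inputs reproduces the record exactly.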
Determinism Contract
- Inputs: {seed, clock, uuidv5-namespace, request body}
- Guarantees: byte-identical payloads + stable determinism_proof.content_hash
- Scope: all CE domains (see docs for domain-specific caveats)
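A stable content hash usually relies on serializing the payload canonically before hashing. The sketch below shows one common way to do that with sorted keys and SHA-256. The algorithm DATAMIMIC actually uses for determinism_proof.content_hash is not stated here, so treat the function and its choices as assumptions.

```python
import hashlib
import json


def content_hash(payload: dict) -> str:
    """Hash a canonical JSON form: sorted keys, no insignificant whitespace, UTF-8."""
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content, different key order: the canonical form is identical.
a = {"seed": "docs-demo", "items": [{"id": 1, "name": "Ada"}]}
b = {"items": [{"name": "Ada", "id": 1}], "seed": "docs-demo"}
print(content_hash(a) == content_hash(b))  # True
```

Comparing two such hashes is a cheap way to assert in CI that two runs produced the same payload without diffing the full output.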
⚡ MCP (Model Context Protocol)
Run DATAMIMIC as an MCP server so Claude / Cursor (and agents) can call deterministic data tools.
Install
pip install "datamimic-ce[mcp]"
# Development
pip install -e ".[mcp]"
Run (SSE transport)
export DATAMIMIC_MCP_HOST=127.0.0.1
export DATAMIMIC_MCP_PORT=8765
# Optional auth; clients must send the same token via Authorization: Bearer or X-API-Key
export DATAMIMIC_MCP_API_KEY=changeme
datamimic-mcp
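The token rule in the comment above (same token, sent either as a Bearer token or as X-API-Key) can be sketched as a small header check. This is a hypothetical illustration of the rule, not the server's actual code; `extract_token` and `is_authorized` are names invented for this example.

```python
import hmac
from typing import Mapping, Optional


def extract_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the token from 'Authorization: Bearer <token>' or 'X-API-Key', if any."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    auth = lowered.get("authorization", "")
    if auth.lower().startswith("bearer "):
        return auth[len("bearer "):].strip()
    return lowered.get("x-api-key")


def is_authorized(headers: Mapping[str, str], expected: Optional[str]) -> bool:
    """Allow everything when no key is configured; otherwise require a matching token."""
    if not expected:
        return True
    token = extract_token(headers)
    # compare_digest avoids leaking how many leading characters matched.
    return token is not None and hmac.compare_digest(token.encode(), expected.encode())


print(is_authorized({"Authorization": "Bearer changeme"}, "changeme"))  # True
print(is_authorized({"X-API-Key": "wrong"}, "changeme"))  # False
```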
In-proc example (determinism proof)
import anyio, json
from fastmcp.client import Client
from datamimic_ce.mcp.models import GenerateArgs
from datamimic_ce.mcp.server import create_server

async def main():
    args = GenerateArgs(domain="person", locale="en_US", seed=42, count=2)
    payload = args.model_dump(mode="python")
    async with Client(create_server()) as c:
        # Call the same tool twice with identical arguments
        a = await c.call_tool("generate", {"args": payload})
        b = await c.call_tool("generate", {"args": payload})
        print(json.loads(a[0].text)["determinism_proof"]["content_hash"]
              == json.loads(b[0].text)["determinism_proof"]["content_hash"])  # True

anyio.run(main)
Config keys
- DATAMIMIC_MCP_HOST (default 127.0.0.1)
- DATAMIMIC_MCP_PORT (default 8765)
- DATAMIMIC_MCP_API_KEY (unset = no auth)
- Requests over cap (count > 10_000) are rejected with 422.
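As a rough illustration of how these keys and the request cap fit together, the sketch below reads the three environment variables with the documented defaults and maps an oversized count to 422. The `McpSettings` class and both helper functions are hypothetical and only mirror the behavior described in the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

MAX_COUNT = 10_000  # cap taken from the list above


@dataclass(frozen=True)
class McpSettings:
    host: str
    port: int
    api_key: Optional[str]  # None means auth is disabled


def load_settings(env: Mapping[str, str] = os.environ) -> McpSettings:
    """Read the documented keys, falling back to the documented defaults."""
    return McpSettings(
        host=env.get("DATAMIMIC_MCP_HOST", "127.0.0.1"),
        port=int(env.get("DATAMIMIC_MCP_PORT", "8765")),
        api_key=env.get("DATAMIMIC_MCP_API_KEY") or None,
    )


def status_for_count(count: int) -> int:
    """Return the HTTP-style status a request with this count would receive."""
    return 422 if count > MAX_COUNT else 200


print(load_settings({}))
print(status_for_count(10_001))  # 422
```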
➡️ Full guide, IDE configs (Claude/Cursor), transports, errors: docs/mcp_quickstart.md
🧩 Domains & Examples
🏥 Healthcare
from datamimic_ce.domains.healthcare.services import PatientService
patient = PatientService().generate()
print(patient.full_name, patient.conditions)
- Demographically realistic patients
- Doctor specialties match conditions
- Hospital capacities and types
- Longitudinal medical records
💰 Finance
from datamimic_ce.domains.finance.services import BankAccountService
account = BankAccountService().generate()
print(account.account_number, account.balance)
- Balances respect transaction histories
- Card/IBAN formats per locale
- Distributions tuned for fraud/reconciliation tests
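One way to picture "balances respect transaction histories" is a generator that never lets a withdrawal exceed the running balance. The sketch below uses a seeded standard-library RNG and integer cents. The `generate_transactions` helper, the 60% withdrawal probability, and the deposit range are assumptions for illustration, not DATAMIMIC's finance model.

```python
from random import Random


def generate_transactions(rng: Random, opening_cents: int, n: int) -> list:
    """Build n transactions whose running balance never drops below zero."""
    balance = opening_cents
    history = []
    for _ in range(n):
        if balance > 0 and rng.random() < 0.6:
            amount = -rng.randint(1, balance)  # withdrawal capped by current balance
        else:
            amount = rng.randint(1, 50_000)  # deposit
        balance += amount
        history.append({"amount": amount, "balance_after": balance})
    return history


history = generate_transactions(Random(1337), opening_cents=100_000, n=5)
for tx in history:
    print(tx)
```

Each row carries the balance after the transaction, so a reconciliation test can replay the amounts from the opening balance and check that they match.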
🌐 Demographics
- PersonService with locale packs (DE / US / VN), versioned and auditable
🔒 Deterministic by Design
- Frozen clocks + canonical hashing → reproducible IDs
- Seeded RNG → identical outputs across runs
- Schema validation (XSD/JSONSchema) → structural integrity
- Provenance hashing → audit-ready lineage
📘 See Developer Guide
🧮 XML / Python Parity
Python:
from random import Random
from datamimic_ce.domains.common.models.demographic_config import DemographicConfig
from datamimic_ce.domains.healthcare.services import PatientService
cfg = DemographicConfig(age_min=70, age_max=75)
svc = PatientService(dataset="US", demographic_config=cfg, rng=Random(1337))
print(svc.generate().to_dict())
Equivalent XML:
<setup>
  <generate name="seeded_seniors" count="3" target="CSV">
    <variable name="patient" entity="Patient" dataset="US" ageMin="70" ageMax="75" rngSeed="1337" />
    <key name="full_name" script="patient.full_name" />
    <key name="age" script="patient.age" />
    <array name="conditions" script="patient.conditions" />
  </generate>
</setup>
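To see how the XML attributes line up with the Python arguments, the descriptor can be read with the standard library's ElementTree. This sketch only inspects the XML; it does not run DATAMIMIC, and the `python_kwargs` dictionary is an illustrative mapping rather than a documented conversion.

```python
import xml.etree.ElementTree as ET

DESCRIPTOR = """
<setup>
  <generate name="seeded_seniors" count="3" target="CSV">
    <variable name="patient" entity="Patient" dataset="US" ageMin="70" ageMax="75" rngSeed="1337" />
    <key name="full_name" script="patient.full_name" />
    <key name="age" script="patient.age" />
    <array name="conditions" script="patient.conditions" />
  </generate>
</setup>
"""

root = ET.fromstring(DESCRIPTOR)
variable = root.find("./generate/variable")

# Illustrative mapping from XML attributes to the Python-side arguments.
python_kwargs = {
    "dataset": variable.get("dataset"),
    "age_min": int(variable.get("ageMin")),
    "age_max": int(variable.get("ageMax")),
    "rng_seed": int(variable.get("rngSeed")),
}
# Output columns declared by <key> and <array> elements, in document order.
columns = [el.get("name") for el in root.find("generate") if el.tag in ("key", "array")]

print(python_kwargs)
print(columns)
```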
🧰 CLI
# Run instant healthcare demo
datamimic demo create healthcare-example
datamimic run ./healthcare-example/datamimic.xml
# Verify version
datamimic version
Quality gates (repo):
make typecheck # mypy --strict
make lint # pylint (≥9.0 score target)
make coverage # target ≥ 90%
🧭 Architecture Snapshot
- Core pipeline: Determinism kit • Domain services • Schema validators
- Governance layer: Group tables • Linkage audits • Provenance hashing
- Execution layer: CLI • API • XML runners • MCP server
⚖️ CE vs EE
| Feature | Community (CE) | Enterprise (EE) |
|---|---|---|
| Deterministic domain generation | ✅ | ✅ |
| XML + Python pipelines | ✅ | ✅ |
| Healthcare & Finance domains | ✅ | ✅ |
| Multi-user collaboration | ❌ | ✅ |
| Governance & lineage dashboards | ❌ | ✅ |
| ML engines (Mostly AI, Synthcity, …) | ❌ | ✅ |
| RBAC & audit logging (HIPAA/GDPR/PCI) | ❌ | ✅ |
| EDIFACT / SWIFT adapters | ❌ | ✅ |
👉 Compare editions • Book a strategy call
📚 Documentation & Community
🚀 Get Started
pip install datamimic-ce
Generate data that makes sense — deterministically. ⭐ Star us on GitHub if DATAMIMIC improves your testing workflow.
File details
Details for the file datamimic_ce-2.2.0.tar.gz.
File metadata
- Download URL: datamimic_ce-2.2.0.tar.gz
- Upload date:
- Size: 12.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e10eb8fb46f41392780bce8f71b33fe206a55038c47e04e2203712531c00ca04 |
| MD5 | 46ac8918b589aef5f1bfb33a40c5e3b0 |
| BLAKE2b-256 | e8c11d7d1ec0c51652630ab966c31b37a4cca182d37c30bf625519311d51274a |
File details
Details for the file datamimic_ce-2.2.0-py3-none-any.whl.
File metadata
- Download URL: datamimic_ce-2.2.0-py3-none-any.whl
- Upload date:
- Size: 13.6 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 71c5d60b10193ff6ffb396132902ccfc1b7be626457f040ae332a80e8428da95 |
| MD5 | 3a0f720c3bc3b5433183fdd1a0648a78 |
| BLAKE2b-256 | 888f97cd4b1d8649b99e77c6e6c9420ecdbdca07fecb32d21c80d8f3b748e4ae |