A standalone, schema-based data generator and bulk ingestion utility for MongoDB
mongo-synth: MongoDB Schema-Based Data Generator & Ingester
mongo-synth is a standalone Python utility and command-line tool designed to generate high-fidelity, deterministic synthetic datasets from JSON Schemas (or Pydantic models) and seed them directly into MongoDB collections at scale.
Whether you are performing database index optimization, latency stress testing, schema validation, or writing integration tests, mongo-synth allows you to rapidly populate mock databases with realistic data, statistical distributions, and edge-case anomalies.
Key Features
- 🧬 JSON Schema Synthesis: Translates arbitrary JSON Schema specifications (Draft 2020-12) into deterministic property-based generation strategies using hypothesis-jsonschema.
- 🍃 Native BSON Type Mapping: Supports MongoDB-specific types (ObjectId, ISODate, Decimal128, BinData) via custom "bsonType" schema annotations.
- 📊 Statistical Value Profiling: Inject real-world data properties by defining relative probability weights for specific fields (e.g., a status field containing 80% active / 20% inactive).
- ⚡ High-Performance Bulk Ingestion: Iterates over generated streams and inserts them in configurable batch chunks via PyMongo's unordered insert_many for maximum throughput.
- 🚨 Anomaly & Schema Drift Injection: Test system resilience under fire by injecting whitespace key anomalies, mixed-type arrays, extreme nesting depths, emojis, or string type impersonations.
- 🔒 Production Safety Lock: Protects production environments by automatically asserting connection strings against a configured live database URI and blocking execution on a match.
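To make the Statistical Value Profiling idea concrete, here is a minimal sketch of seeded, weighted sampling using only the standard library. It is not mongo-synth's implementation; the function name `sample_weighted` and its parameters are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def sample_weighted(weights: dict[str, float], n: int, seed: int) -> list[str]:
    """Draw n values according to relative weights, reproducibly.

    Hypothetical helper for illustration; not part of mongo-synth.
    """
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    values = list(weights)
    return rng.choices(values, weights=[weights[v] for v in values], k=n)


statuses = sample_weighted({"active": 0.8, "inactive": 0.2}, n=10_000, seed=42)
counts = Counter(statuses)
# The observed share is close to 0.8 but not exactly 0.8, since each draw is random.
print(counts["active"] / len(statuses))
```

Because the generator is seeded, the same seed yields the same sequence on every run, which is what makes a weighted dataset reproducible across test runs.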
Installation
pip install .
Quick Start
1. CLI Usage
Generate and ingest 10,000 orders into a local database using a schema:
mongo-synth \
  --schema path/to/order_schema.json \
  --uri mongodb://localhost:27017 \
  --db testing_db \
  --collection orders \
  --count 10000 \
  --clear
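The command above needs a schema file to read. As a sketch, the following script writes one possible order schema to disk. The field names (`customer_id`, `total`, `created_at`) and the enum values are invented for illustration, and the "bsonType" annotations mirror those in the Python API example. Whether the CLI expects a bare schema like this one or a blueprint wrapper with "schema" and "metadata" keys is not stated here, so treat the layout as an assumption.

```python
import json
from pathlib import Path

# Hypothetical order schema; field names and enum values are illustrative only.
order_schema = {
    "type": "object",
    "properties": {
        "_id": {"type": "string", "bsonType": "objectId"},
        "customer_id": {"type": "string"},
        "status": {"type": "string", "enum": ["pending", "shipped", "cancelled"]},
        "total": {"type": "number", "minimum": 0},
        "created_at": {"type": "string", "bsonType": "date"},
    },
    "required": ["customer_id", "status", "total"],
}

# Write the schema next to the script; pass this path to --schema.
schema_path = Path("order_schema.json")
schema_path.write_text(json.dumps(order_schema, indent=2), encoding="utf-8")
```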
2. Python API Usage
from pymongo import MongoClient
from mongo_synth.generators import JsonSchemaGenerator
from mongo_synth.ingestion import DataIngester

# 1. Define your blueprint and schema
blueprint = {
    "schema": {
        "type": "object",
        "properties": {
            "_id": {"type": "string", "bsonType": "objectId"},
            "device_id": {"type": "string"},
            "status": {"type": "string", "enum": ["online", "offline"]},
            "timestamp": {"type": "string", "bsonType": "date"}
        },
        "required": ["device_id", "status"]
    },
    "metadata": {
        "profile": {
            "status": {"online": 0.9, "offline": 0.1}  # 90% online, 10% offline
        }
    }
}

# 2. Generate synthetic data
generator = JsonSchemaGenerator(blueprint, documents_per_collection=5000, seed=42)
documents = generator.generate_batch()

# 3. Bulk ingest into MongoDB
client = MongoClient("mongodb://localhost:27017")
collection = client["iot_db"]["devices"]
ingester = DataIngester(
    target_collection=collection,
    target_uri="mongodb://localhost:27017",
    batch_size=1000,
    live_source_uri="mongodb+srv://prod-cluster"  # Safety guardrail
)
inserted_count = ingester.ingest(documents)
print(f"Successfully seeded {inserted_count} documents.")
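For readers curious about what the ingestion step might do internally, here is a rough sketch of a URI guard followed by chunked, unordered inserts. It is an assumption about the general technique, not DataIngester's actual code: `chunked` and `guarded_ingest` are hypothetical names, and the URI comparison is a simple string match that a real guard would likely make stricter. The only PyMongo call used is `Collection.insert_many(documents, ordered=False)`.

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def chunked(items: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield successive lists of at most `size` documents."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def guarded_ingest(collection: Any, documents: Iterable[dict], *,
                   target_uri: str, live_source_uri: str,
                   batch_size: int = 1000) -> int:
    """Refuse to write to the live URI, then insert in unordered batches.

    Hypothetical helper; `collection` is anything with a PyMongo-style
    insert_many(documents, ordered=...) method.
    """
    # Naive guard: plain string comparison, ignoring a trailing slash.
    if target_uri.rstrip("/") == live_source_uri.rstrip("/"):
        raise RuntimeError("target URI matches the live source URI; aborting")

    inserted = 0
    for batch in chunked(documents, batch_size):
        # ordered=False: the server attempts every document in the batch even
        # if some fail; PyMongo then raises BulkWriteError for the failures.
        result = collection.insert_many(batch, ordered=False)
        inserted += len(result.inserted_ids)
    return inserted
```

With a real PyMongo collection, a call such as `guarded_ingest(collection, documents, target_uri="mongodb://localhost:27017", live_source_uri="mongodb+srv://prod-cluster")` would return the number of inserted documents. Because `chunked` consumes an iterator lazily, only one batch is held in memory at a time.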
File details
Details for the file mongo_synth-1.0.0.tar.gz.
File metadata
- Download URL: mongo_synth-1.0.0.tar.gz
- Upload date:
- Size: 23.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | aba89ed891847947fbed682b59e10c9f9bf932c9c13eff958a9217d8d3342b93 |
| MD5 | 6dbda3d848889dbcaca553eccbf6de95 |
| BLAKE2b-256 | 9999e3105c30663403117440b35d1df6bef7304d0c7e812f9f7fc23eabdca2de |
Provenance
The following attestation bundles were made for mongo_synth-1.0.0.tar.gz:
Publisher: publish.yml on JMartynov/mongo-synth
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: mongo_synth-1.0.0.tar.gz
  - Subject digest: aba89ed891847947fbed682b59e10c9f9bf932c9c13eff958a9217d8d3342b93
- Sigstore transparency entry: 1703619114
- Sigstore integration time:
- Permalink: JMartynov/mongo-synth@b2abfb7ebbf049f26528def130b2244e1e125582
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/JMartynov
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b2abfb7ebbf049f26528def130b2244e1e125582
- Trigger Event: push
File details
Details for the file mongo_synth-1.0.0-py3-none-any.whl.
File metadata
- Download URL: mongo_synth-1.0.0-py3-none-any.whl
- Upload date:
- Size: 26.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d0cf71654cc6256fa7cc6f251eca51702652d8bfca8bf04a99ba8ec3ab89c11b |
| MD5 | 0d2bfedde5be0afd12752905bbac7d0c |
| BLAKE2b-256 | 18fa59012b2e761a6df33fa248b2dd15c94458bab0c074cab3dc3bb991945982 |
Provenance
The following attestation bundles were made for mongo_synth-1.0.0-py3-none-any.whl:
Publisher: publish.yml on JMartynov/mongo-synth
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: mongo_synth-1.0.0-py3-none-any.whl
  - Subject digest: d0cf71654cc6256fa7cc6f251eca51702652d8bfca8bf04a99ba8ec3ab89c11b
- Sigstore transparency entry: 1703619297
- Sigstore integration time:
- Permalink: JMartynov/mongo-synth@b2abfb7ebbf049f26528def130b2244e1e125582
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/JMartynov
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b2abfb7ebbf049f26528def130b2244e1e125582
- Trigger Event: push