A standalone, schema-based data generator and bulk ingestion utility for MongoDB

mongo-synth: MongoDB Schema-Based Data Generator & Ingester

mongo-synth is a standalone Python utility and command-line tool designed to generate high-fidelity, deterministic synthetic datasets from JSON Schemas (or Pydantic models) and seed them directly into MongoDB collections at scale.

Whether you are optimizing indexes, stress-testing latency, validating schemas, or writing integration tests, mongo-synth lets you rapidly populate mock databases with realistic data, statistical distributions, and edge-case anomalies.


Key Features

  • 🧬 JSON Schema Synthesis: Translates arbitrary JSON Schema specifications (Draft 2020-12) into deterministic property-based generation strategies using hypothesis-jsonschema.
  • 🍃 Native BSON Type Mapping: Supports MongoDB-specific types (ObjectId, ISODate, Decimal128, BinData) via custom "bsonType" schema annotations.
  • 📊 Statistical Value Profiling: Inject real-world data properties by defining relative probability weights for specific fields (e.g., a status field that is 80% active and 20% inactive).
  • High-Performance Bulk Ingestion: Streams generated documents into MongoDB in configurable batches via PyMongo's unordered insert_many for maximum throughput.
  • 🚨 Anomaly & Schema Drift Injection: Test system resilience under fire by injecting whitespace key anomalies, mixed-type arrays, extreme nesting depths, emojis, or string type impersonations.
  • 🔒 Production Safety Lock: Protects production environments by comparing the target connection string against a configured live database URI and blocking execution on a match.
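
To make the statistical profiling idea concrete, here is a minimal standalone sketch of seeded, weighted sampling using only the Python standard library. It is not mongo-synth's internal implementation; the function name sample_profiled_values is hypothetical, and the weights mirror the 80/20 status example from the feature list.

```python
import random
from collections import Counter


def sample_profiled_values(profile, count, seed):
    """Draw `count` values according to relative weights, reproducibly.

    Hypothetical helper for illustration; not part of mongo-synth.
    """
    rng = random.Random(seed)  # private RNG, so the global random state is untouched
    values = list(profile)
    weights = [profile[value] for value in values]
    return rng.choices(values, weights=weights, k=count)


profile = {"active": 0.8, "inactive": 0.2}
first = sample_profiled_values(profile, 10_000, seed=42)
second = sample_profiled_values(profile, 10_000, seed=42)

counts = Counter(first)
print(counts)           # roughly an 80/20 split
print(first == second)  # the same seed reproduces the same sequence
```

Using a dedicated random.Random instance rather than the module-level functions keeps the output reproducible even when other code also draws random numbers.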

Installation

# Run from the root of a source checkout
pip install .

Quick Start

1. CLI Usage

Generate and ingest 10,000 orders into a local database using a schema:

mongo-synth \
  --schema path/to/order_schema.json \
  --uri mongodb://localhost:27017 \
  --db testing_db \
  --collection orders \
  --count 10000 \
  --clear
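
The command above expects a schema file on disk. The sketch below writes one with the standard library. The field names are hypothetical, and it assumes the file holds a plain JSON Schema using the "bsonType" annotations described under Key Features; the exact layout the CLI accepts is not specified in this description.

```python
import json
from pathlib import Path

# Hypothetical order schema; the field names are illustrative only.
order_schema = {
    "type": "object",
    "properties": {
        "_id": {"type": "string", "bsonType": "objectId"},
        "customer_id": {"type": "string"},
        "status": {"type": "string", "enum": ["pending", "shipped", "cancelled"]},
        "total": {"type": "number", "minimum": 0},
        "created_at": {"type": "string", "bsonType": "date"},
    },
    "required": ["customer_id", "status", "total"],
}

schema_path = Path("order_schema.json")
schema_path.write_text(json.dumps(order_schema, indent=2), encoding="utf-8")

# Sanity check: every required field must be declared under "properties".
loaded = json.loads(schema_path.read_text(encoding="utf-8"))
missing = [name for name in loaded["required"] if name not in loaded["properties"]]
print(missing)
```

The resulting path can then be passed to --schema.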

2. Python API Usage

from pymongo import MongoClient
from mongo_synth.generators import JsonSchemaGenerator
from mongo_synth.ingestion import DataIngester

# 1. Define your blueprint and schema
blueprint = {
    "schema": {
        "type": "object",
        "properties": {
            "_id": {"type": "string", "bsonType": "objectId"},
            "device_id": {"type": "string"},
            "status": {"type": "string", "enum": ["online", "offline"]},
            "timestamp": {"type": "string", "bsonType": "date"}
        },
        "required": ["device_id", "status"]
    },
    "metadata": {
        "profile": {
            "status": {"online": 0.9, "offline": 0.1}  # 90% online, 10% offline
        }
    }
}

# 2. Generate synthetic data
generator = JsonSchemaGenerator(blueprint, documents_per_collection=5000, seed=42)
documents = generator.generate_batch()

# 3. Bulk ingest into MongoDB
client = MongoClient("mongodb://localhost:27017")
collection = client["iot_db"]["devices"]

ingester = DataIngester(
    target_collection=collection,
    target_uri="mongodb://localhost:27017",
    batch_size=1000,
    live_source_uri="mongodb+srv://prod-cluster"  # Safety guardrail
)

inserted_count = ingester.ingest(documents)
print(f"Successfully seeded {inserted_count} documents.")
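
For readers curious about what batch_size and live_source_uri imply, the following is a rough standalone sketch of the two ideas: refusing to run when the target shares a host with the live URI, and splitting a document stream into fixed-size batches. It is an assumption about the general technique, not DataIngester's actual code; hosts_of, assert_not_live, and batched are hypothetical helpers.

```python
from itertools import islice
from urllib.parse import urlsplit


def hosts_of(uri):
    """Return the lower-cased host[:port] entries of a URI, without credentials."""
    netloc = urlsplit(uri).netloc
    host_part = netloc.rsplit("@", 1)[-1]  # drop "user:password@" if present
    return {host.lower() for host in host_part.split(",") if host}


def assert_not_live(target_uri, live_source_uri):
    """Raise if the target shares any host with the protected live URI."""
    shared = hosts_of(target_uri) & hosts_of(live_source_uri)
    if shared:
        raise RuntimeError(f"Refusing to ingest into live host(s): {sorted(shared)}")


def batched(documents, batch_size):
    """Yield lists of at most `batch_size` documents from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(documents)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Different hosts, so this check passes silently.
assert_not_live("mongodb://localhost:27017", "mongodb+srv://prod-cluster")

# Each batch could be handed to PyMongo's collection.insert_many(batch, ordered=False).
sizes = [len(batch) for batch in batched(({"n": i} for i in range(2500)), 1000)]
print(sizes)  # [1000, 1000, 500]
```

Comparing hosts rather than raw strings means that differences in credentials or letter case do not bypass the check, although a real guard would likely need stricter rules.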

Download files

Source Distribution

mongo_synth-1.0.2.tar.gz (23.0 kB)

Uploaded Source

Built Distribution

mongo_synth-1.0.2-py3-none-any.whl (26.2 kB)

Uploaded Python 3

File details

Details for the file mongo_synth-1.0.2.tar.gz.

File metadata

  • Download URL: mongo_synth-1.0.2.tar.gz
  • Size: 23.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mongo_synth-1.0.2.tar.gz
Algorithm Hash digest
SHA256 786f32aab2221f8f7b64f7be13f9fd67ea62a495dff9a0edf3854a3febcbb74f
MD5 6c9a04987078bc2cd335f2326f3d5455
BLAKE2b-256 24dd3e026709102eef5f9c5cc4629eddb8f698fdb4d430e004615ba2c098a2c6

Provenance

The following attestation bundles were made for mongo_synth-1.0.2.tar.gz:

Publisher: publish.yml on JMartynov/mongo-synth

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mongo_synth-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: mongo_synth-1.0.2-py3-none-any.whl
  • Size: 26.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mongo_synth-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 dee4a1f2c41fda972fa7af5738fbcd2970be988e7a1dbdc79104aedbce9b5fdc
MD5 0ce4b75c2594a714e0720d31c8c53ebe
BLAKE2b-256 51fb67a46865af65e1323617d0d89b38fc9df674e48f371e60caa4a75581ca5e

Provenance

The following attestation bundles were made for mongo_synth-1.0.2-py3-none-any.whl:

Publisher: publish.yml on JMartynov/mongo-synth

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
