MongoMasker

MongoMasker is a tool designed to anonymize specified fields in a MongoDB collection. It uses the faker library to generate realistic fake data, processes documents in batches for improved performance, and leverages asynchronous processing with motor for efficiency.

Features

  • Anonymizes specified fields with realistic fake data
  • Supports nested fields and fields within objects in arrays
  • Processes documents in batches for better performance
  • Uses asynchronous processing for efficiency
  • Optionally reports warnings for missing fields or unsupported structures
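
The batching and asynchronous features listed above can be pictured with a short sketch. This is not MongoMasker's implementation: `batched`, `process_batch`, and `run` are hypothetical names, and an in-memory list stands in for the MongoDB collections so that the example runs without a database.

```python
import asyncio
from typing import Any, Dict, Iterable, Iterator, List

Document = Dict[str, Any]


def batched(docs: Iterable[Document], batch_size: int) -> Iterator[List[Document]]:
    """Yield successive lists of at most batch_size documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Document] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller, batch
        yield batch


async def process_batch(batch: List[Document], target: List[Document]) -> int:
    """Stand-in for an asynchronous bulk write to the target collection."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    target.extend(batch)
    return len(batch)


async def run(docs: Iterable[Document], batch_size: int = 100) -> List[Document]:
    """Copy docs into a target list one batch at a time."""
    target: List[Document] = []
    for batch in batched(docs, batch_size):
        await process_batch(batch, target)
    return target


if __name__ == "__main__":
    source = [{"_id": i} for i in range(250)]
    copied = asyncio.run(run(source, batch_size=100))
    print(len(copied))
```

Writing one batch per round trip, rather than one document, is the usual reason a batch size is exposed as a tuning option; the best value depends on document size and network latency.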

Requirements

  • Python 3.6+
  • motor library
  • pymongo library
  • faker library
  • typer library

Installation

Install the project and its dependencies with Poetry:

poetry install

Usage

Command-Line Example

To anonymize fields in a MongoDB collection, run the following command:

mongomasker \
    "mongodb://your_username:your_password@your_host:your_port" \
    source_database \
    source_collection \
    target_database \
    target_collection \
    fields_to_anonymize.json \
    --batch-size 100 \
    --show-warnings \
    --mongo-filter '{"status": "active", "createdAt": {"$gt": "2023-01-01"}}'
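
The `--mongo-filter` value in the command above is a JSON string. As a rough sketch of how such a string could be parsed and checked before being used as a query, consider the following; `parse_mongo_filter` is a hypothetical helper, not a function exposed by MongoMasker. Note that the example filter compares `createdAt` with a string, so it matches only documents that store `createdAt` as a string in a sortable format such as YYYY-MM-DD. Fields stored as BSON dates would need a date value in the query instead.

```python
import json
from typing import Any, Dict


def parse_mongo_filter(raw: str = "{}") -> Dict[str, Any]:
    """Parse a JSON filter string into a dict usable as a MongoDB query."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"--mongo-filter is not valid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        # A MongoDB query document is an object, never a list or scalar.
        raise ValueError("--mongo-filter must be a JSON object")
    return parsed


if __name__ == "__main__":
    query = parse_mongo_filter(
        '{"status": "active", "createdAt": {"$gt": "2023-01-01"}}'
    )
    print(query["status"], query["createdAt"]["$gt"])
```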

Arguments

  • mongo_uri: MongoDB connection URI (e.g., "mongodb://localhost:27017")
  • source_db: Name of the source database
  • source_collection: Name of the source collection
  • target_db: Name of the target database
  • target_collection: Name of the target collection
  • fields_to_anonymize_file: Path to the JSON file specifying the fields to anonymize
  • --batch-size: (Optional) Number of documents to process in each batch (default: 100)
  • --show-warnings: (Optional) Show warnings for missing fields or unsupported structures
  • --mongo-filter: (Optional) MongoDB filter as a JSON string used to select source documents (default: "{}")

Example fields_to_anonymize.json

Create a JSON file specifying the fields to anonymize and their corresponding data types. For example:

{
    "name": "name",
    "email": "email",
    "address.street": "address",
    "address.city": "city",
    "address.zipcode": "zipcode",
    "user.stateCode": "statecode",
    "user.lastname": "lastname",
    "user.fullname": "lastnamefirstname",
    "createdAt": "date",
    "updatedAt": "datestr",
    "order.id": "id"
}
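
The dotted keys in this file address nested fields. A minimal sketch of how a dotted path might be resolved, including fields inside objects held in arrays, is shown below. `anonymize_path` is a hypothetical function written for illustration; the tool's actual traversal and its handling of edge cases may differ.

```python
from typing import Any, Callable, List


def anonymize_path(node: Any, parts: List[str], make_value: Callable[[], Any]) -> bool:
    """Replace the value at a dotted path in place; return True if anything changed.

    Lists are traversed transparently, so "orders.id" reaches the "id"
    field of every object inside an "orders" array.
    """
    if isinstance(node, list):
        changed = False
        for item in node:
            if anonymize_path(item, parts, make_value):
                changed = True
        return changed
    if not isinstance(node, dict) or parts[0] not in node:
        return False  # missing field or unsupported structure
    key, rest = parts[0], parts[1:]
    if not rest:
        node[key] = make_value()
        return True
    return anonymize_path(node[key], rest, make_value)


if __name__ == "__main__":
    # "orders" is an invented array field, used only to show list traversal.
    doc = {
        "address": {"city": "New York"},
        "orders": [{"id": "1"}, {"id": "2"}],
    }
    anonymize_path(doc, "address.city".split("."), lambda: "Los Angeles")
    anonymize_path(doc, "orders.id".split("."), lambda: "0000")
    found = anonymize_path(doc, "user.lastname".split("."), lambda: "Smith")
    print(doc, found)
```

The boolean return value gives the caller a natural place to emit a warning when a configured field is absent from a document.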

Explanation of Transformations

The fields_to_anonymize.json file maps field names to the type of fake data to generate. Below are examples of transformations for various data types:

Field Name        Data Type            Example Transformation
name              name                 "John" → "Alice"
email             email                "john.doe@example.com" → "alice@example.com"
address.street    address              "123 Main St" → "456 Elm St"
address.city      city                 "New York" → "Los Angeles"
address.zipcode   zipcode              "10001" → "90210"
user.stateCode    statecode            "NY" → "CA"
user.lastname     lastname             "Doe" → "Smith"
user.fullname     lastnamefirstname    "Doe, John" → "Smith, Alice"
createdAt         date                 "2023-01-01" → "2025-03-22"
updatedAt         datestr              "2023-01-01" → "2025-03-22"
order.id          id                   "1234567890" → "9876543210"
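
One way to picture the mapping from data type names to generated values is a dispatch table. The sketch below deliberately uses Python's `random` module with small hard-coded sample lists instead of faker, so that it runs with no dependencies. `GENERATORS`, `fake_value`, and the sample lists are hypothetical, only a few of the data types are covered, and the real tool's generators and output formats may differ.

```python
import random
import string
from typing import Callable, Dict

# Tiny stand-in vocabularies; a library such as faker supplies far richer data.
FIRST_NAMES = ["Alice", "Bob", "Carol"]
LAST_NAMES = ["Smith", "Jones", "Brown"]
STATE_CODES = ["CA", "NY", "TX"]


def _digits(rng: random.Random, length: int) -> str:
    """Return a string of random decimal digits of the given length."""
    return "".join(rng.choice(string.digits) for _ in range(length))


# Maps a data type name from the JSON file to a generator function.
GENERATORS: Dict[str, Callable[[random.Random], str]] = {
    "name": lambda rng: f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
    "lastname": lambda rng: rng.choice(LAST_NAMES),
    "lastnamefirstname": lambda rng: f"{rng.choice(LAST_NAMES)}, {rng.choice(FIRST_NAMES)}",
    "statecode": lambda rng: rng.choice(STATE_CODES),
    "zipcode": lambda rng: _digits(rng, 5),
    "id": lambda rng: _digits(rng, 10),
}


def fake_value(data_type: str, rng: random.Random) -> str:
    """Generate a replacement value for a data type, or raise if it is unknown."""
    try:
        generator = GENERATORS[data_type]
    except KeyError:
        raise ValueError(f"unsupported data type: {data_type!r}") from None
    return generator(rng)


if __name__ == "__main__":
    rng = random.Random()
    for data_type in ("name", "lastnamefirstname", "zipcode"):
        print(data_type, "->", fake_value(data_type, rng))
```

Rejecting unknown type names early, before any document is written, avoids discovering a typo in the configuration file halfway through a run.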

Sample Workflow

  1. Prepare the fields_to_anonymize.json file: Create a JSON file specifying the fields to anonymize and their corresponding data types.
  2. Run the MongoMasker CLI: Use the command-line tool to anonymize the data in the source collection and copy it to the target collection.
  3. Verify the results: Check the target collection to ensure the data has been anonymized as expected.

Example

Input Document (Source Collection)

{
    "_id": "12345",
    "name": "John Doe",
    "email": "john.doe@example.com",
    "address": {
        "street": "123 Main St",
        "city": "New York",
        "zipcode": "10001"
    },
    "user": {
        "stateCode": "NY",
        "lastname": "Doe",
        "fullname": "Doe, John"
    },
    "createdAt": "2023-01-01",
    "updatedAt": "2023-01-02",
    "order": {
        "id": "1234567890"
    }
}

Output Document (Target Collection)

{
    "_id": "12345",
    "name": "Alice Smith",
    "email": "alice@example.com",
    "address": {
        "street": "456 Elm St",
        "city": "Los Angeles",
        "zipcode": "90210"
    },
    "user": {
        "stateCode": "CA",
        "lastname": "Smith",
        "fullname": "Smith, Alice"
    },
    "createdAt": "2025-03-22",
    "updatedAt": "2025-03-22",
    "order": {
        "id": "9876543210"
    }
}

License

This project is licensed under the MIT License. See the LICENSE file for details.
