MongoMasker

MongoMasker is a tool designed to anonymize specified fields in a MongoDB collection. It uses the faker library to generate realistic fake data, processes documents in batches for improved performance, and leverages asynchronous processing with motor for efficiency.
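
The batching idea described above can be sketched as follows. This is a minimal illustration, not MongoMasker's actual implementation: it uses only `asyncio`, and an in-memory async generator stands in for a `motor` cursor. The names `fake_cursor`, `batched`, `copy_in_batches`, and `mask_name` are hypothetical.

```python
import asyncio
from typing import AsyncIterator, Callable, Dict, List


async def fake_cursor(n: int) -> AsyncIterator[Dict]:
    """Stand-in for an async database cursor (hypothetical, in-memory)."""
    for i in range(n):
        await asyncio.sleep(0)  # yield control to the event loop, as real I/O would
        yield {"_id": i, "name": f"user{i}"}


async def batched(cursor: AsyncIterator[Dict], batch_size: int) -> AsyncIterator[List[Dict]]:
    """Group documents from an async iterator into lists of at most batch_size."""
    batch: List[Dict] = []
    async for doc in cursor:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly shorter, batch
        yield batch


async def copy_in_batches(
    cursor: AsyncIterator[Dict],
    transform: Callable[[Dict], Dict],
    sink: List[List[Dict]],
    batch_size: int = 100,
) -> int:
    """Transform each batch and append it to `sink`; return the document count."""
    total = 0
    async for batch in batched(cursor, batch_size):
        sink.append([transform(doc) for doc in batch])  # one "write" per batch
        total += len(batch)
    return total


def mask_name(doc: Dict) -> Dict:
    """Toy transformation: replace the name with a fixed placeholder."""
    return {**doc, "name": "REDACTED"}


async def main() -> None:
    written: List[List[Dict]] = []
    count = await copy_in_batches(fake_cursor(250), mask_name, written, batch_size=100)
    print(count, [len(b) for b in written])


if __name__ == "__main__":
    asyncio.run(main())
```

With 250 documents and a batch size of 100, the sketch produces batches of 100, 100, and 50 documents. In a real pipeline the `sink.append(...)` line would presumably be a bulk insert into the target collection.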

Features

  • Anonymizes specified fields with realistic fake data
  • Supports nested fields and fields within objects in arrays
  • Processes documents in batches for better performance
  • Uses asynchronous processing for efficiency
  • Optionally reports warnings for missing fields or unsupported structures
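
To make the nested-field and array support more concrete, here is a small sketch of one way a dotted path such as "address.city" could be applied to a document. It is an assumption about the general technique, not the tool's own code; `apply_at_path` and `mask_field` are hypothetical names, and the warning messages are invented for illustration.

```python
from typing import Any, Callable, List


def apply_at_path(
    node: Any,
    parts: List[str],
    fn: Callable[[Any], Any],
    warnings: List[str],
    path: str,
) -> None:
    """Replace the value at a dotted path in place, descending into lists of objects."""
    if isinstance(node, list):
        # A list in the middle of a path: apply the remaining path to every element.
        for item in node:
            apply_at_path(item, parts, fn, warnings, path)
        return
    if not isinstance(node, dict):
        warnings.append(f"unsupported structure at '{path}'")
        return
    key = parts[0]
    if key not in node:
        warnings.append(f"missing field '{path}'")
        return
    if len(parts) == 1:
        node[key] = fn(node[key])  # leaf reached: swap in the replacement value
    else:
        apply_at_path(node[key], parts[1:], fn, warnings, path)


def mask_field(doc: dict, dotted_path: str, fn: Callable[[Any], Any], warnings: List[str]) -> None:
    """Convenience wrapper that splits the dotted path before recursing."""
    apply_at_path(doc, dotted_path.split("."), fn, warnings, dotted_path)


doc = {"address": {"city": "New York"}, "orders": [{"id": "1"}, {"id": "2"}]}
collected: List[str] = []
mask_field(doc, "address.city", lambda _: "Los Angeles", collected)
mask_field(doc, "orders.id", lambda _: "0", collected)       # field inside objects in an array
mask_field(doc, "user.lastname", lambda _: "Smith", collected)  # absent, so a warning is recorded
print(doc)
print(collected)
```

Collecting warnings in a list rather than printing them immediately keeps the decision to show them (as with --show-warnings) separate from the traversal itself.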

Requirements

  • Python 3.6+
  • motor library
  • pymongo library
  • faker library
  • typer library

Installation

Install the project and its dependencies using Poetry:

poetry install

Usage

Command-Line Example

To anonymize fields in a MongoDB collection, run the following command:

mongomasker \
    "mongodb://your_username:your_password@your_host:your_port" \
    source_database \
    source_collection \
    target_database \
    target_collection \
    fields_to_anonymize.json \
    --batch-size 100 \
    --show-warnings \
    --mongo-filter '{"status": "active", "createdAt": {"$gt": "2023-01-01"}}'
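
The --mongo-filter value in the command above is a JSON string. As a rough sketch of how such a string can be turned into a query filter, the following uses only the standard `json` module; `parse_mongo_filter` is a hypothetical helper, not part of MongoMasker's documented API.

```python
import json
from typing import Any, Dict


def parse_mongo_filter(raw: str = "{}") -> Dict[str, Any]:
    """Parse a --mongo-filter style JSON string into a dict usable as a query filter."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"--mongo-filter is not valid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise ValueError("--mongo-filter must be a JSON object")
    return parsed


query = parse_mongo_filter('{"status": "active", "createdAt": {"$gt": "2023-01-01"}}')
print(query["createdAt"])
```

Plain JSON has no date type, so "2023-01-01" stays a string after parsing. A filter like this would be expected to match documents whose createdAt is stored as a string, not as a BSON date.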

Arguments

  • mongo_uri: MongoDB connection URI (e.g., "mongodb://localhost:27017")
  • source_db: Name of the source database
  • source_collection: Name of the source collection
  • target_db: Name of the target database
  • target_collection: Name of the target collection
  • fields_to_anonymize_file: Path to the JSON file specifying the fields to anonymize
  • --batch-size: (Optional) Number of documents to process in each batch (default: 100)
  • --show-warnings: (Optional) Show warnings for missing fields or unsupported structures
  • --mongo-filter: (Optional) MongoDB filter as a JSON string to filter source documents (default: "{}")

Example fields_to_anonymize.json

Create a JSON file specifying the fields to anonymize and their corresponding data types. For example:

{
    "name": "name",
    "email": "email",
    "address.street": "address",
    "address.city": "city",
    "address.zipcode": "zipcode",
    "user.stateCode": "statecode",
    "user.lastname": "lastname",
    "user.fullname": "lastnamefirstname",
    "createdAt": "date",
    "updatedAt": "datestr",
    "order.id": "id"
}
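
Before running the tool, it can help to check the mapping file for typos. The sketch below loads a mapping like the one above and rejects data type names it does not recognize. `KNOWN_TYPES` contains only the names that appear in this README, so the real tool may accept more; `validate_mapping` and `load_mapping` are hypothetical helpers.

```python
import json
from typing import Dict

# Data type names taken from this README's examples (assumption: not exhaustive).
KNOWN_TYPES = {
    "name", "email", "address", "city", "zipcode", "statecode",
    "lastname", "lastnamefirstname", "date", "datestr", "id",
}


def validate_mapping(mapping: Dict[str, str]) -> Dict[str, str]:
    """Return the mapping unchanged, or raise ValueError listing unrecognized data types."""
    unknown = {field: kind for field, kind in mapping.items() if kind not in KNOWN_TYPES}
    if unknown:
        raise ValueError(f"unrecognized data types: {unknown}")
    return mapping


def load_mapping(path: str) -> Dict[str, str]:
    """Read a fields_to_anonymize.json style file and validate its contents."""
    with open(path, encoding="utf-8") as handle:
        return validate_mapping(json.load(handle))
```

For example, `load_mapping("fields_to_anonymize.json")` would return the dictionary as-is when every value is in `KNOWN_TYPES`, and raise `ValueError` for a misspelling such as "emial".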

Explanation of Transformations

The fields_to_anonymize.json file maps field names to the type of fake data to generate. Below are examples of transformations for various data types:

| Field Name      | Data Type         | Example Transformation                       |
|-----------------|-------------------|----------------------------------------------|
| name            | name              | "John" → "Alice"                             |
| email           | email             | "john.doe@example.com" → "alice@example.com" |
| address.street  | address           | "123 Main St" → "456 Elm St"                 |
| address.city    | city              | "New York" → "Los Angeles"                   |
| address.zipcode | zipcode           | "10001" → "90210"                            |
| user.stateCode  | statecode         | "NY" → "CA"                                  |
| user.lastname   | lastname          | "Doe" → "Smith"                              |
| user.fullname   | lastnamefirstname | "Doe, John" → "Smith, Alice"                 |
| createdAt       | date              | "2023-01-01" → "2025-03-22"                  |
| updatedAt       | datestr           | "2023-01-01" → "2025-03-22"                  |
| order.id        | id                | "1234567890" → "9876543210"                  |
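
One way to picture the mapping from data type names to generated values is a dispatch table. The sketch below is an illustration under stated assumptions: MongoMasker uses the faker library, whereas this example substitutes small standard-library generators so it runs without dependencies. It covers only a subset of the type names, and the output formats (for instance an ISO date string for datestr) are guesses based on the examples above. All function names are hypothetical.

```python
import random
import string
from datetime import date, timedelta
from typing import Callable, Dict

FIRST = ["Alice", "Bob", "Carol"]
LAST = ["Smith", "Jones", "Lee"]


def random_date(rng: random.Random) -> date:
    """Pick a date within 2000 days after 2020-01-01."""
    return date(2020, 1, 1) + timedelta(days=rng.randrange(0, 2000))


def build_generators(rng: random.Random) -> Dict[str, Callable[[], object]]:
    """Map a subset of the README's data type names to stand-in generators."""
    return {
        "name": lambda: f"{rng.choice(FIRST)} {rng.choice(LAST)}",
        "lastname": lambda: rng.choice(LAST),
        "lastnamefirstname": lambda: f"{rng.choice(LAST)}, {rng.choice(FIRST)}",
        "zipcode": lambda: "".join(rng.choices(string.digits, k=5)),
        "datestr": lambda: random_date(rng).isoformat(),
        "id": lambda: "".join(rng.choices(string.digits, k=10)),
    }


def fake_value(kind: str, generators: Dict[str, Callable[[], object]]) -> object:
    """Look up the generator for a data type name and call it."""
    if kind not in generators:
        raise ValueError(f"no generator for data type '{kind}'")
    return generators[kind]()


gens = build_generators(random.Random(42))  # seeded for repeatable output
for kind in ("name", "zipcode", "datestr"):
    print(kind, "->", fake_value(kind, gens))
```

Keeping the generators in a dictionary means supporting a new data type is a one-line addition, and an unknown type name fails loudly instead of leaving a field unmasked.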

Sample Workflow

  1. Prepare the fields_to_anonymize.json file: Create a JSON file specifying the fields to anonymize and their corresponding data types.
  2. Run the MongoMasker CLI: Use the command-line tool to anonymize the data in the source collection and copy it to the target collection.
  3. Verify the results: Check the target collection to ensure the data has been anonymized as expected.

Example

Input Document (Source Collection)

{
    "_id": "12345",
    "name": "John Doe",
    "email": "john.doe@example.com",
    "address": {
        "street": "123 Main St",
        "city": "New York",
        "zipcode": "10001"
    },
    "user": {
        "stateCode": "NY",
        "lastname": "Doe",
        "fullname": "Doe, John"
    },
    "createdAt": "2023-01-01",
    "updatedAt": "2023-01-02",
    "order": {
        "id": "1234567890"
    }
}

Output Document (Target Collection)

{
    "_id": "12345",
    "name": "Alice Smith",
    "email": "alice@example.com",
    "address": {
        "street": "456 Elm St",
        "city": "Los Angeles",
        "zipcode": "90210"
    },
    "user": {
        "stateCode": "CA",
        "lastname": "Smith",
        "fullname": "Smith, Alice"
    },
    "createdAt": "2025-03-22",
    "updatedAt": "2025-03-22",
    "order": {
        "id": "9876543210"
    }
}

License

This project is licensed under the MIT License. See the LICENSE file for details.
