MongoMasker

MongoMasker is a tool designed to anonymize specified fields in a MongoDB collection. It uses the faker library to generate realistic fake data, processes documents in batches for improved performance, and leverages asynchronous processing with motor for efficiency.

Features

  • Anonymizes specified fields with realistic fake data
  • Supports nested fields and fields within objects in arrays
  • Processes documents in batches for better performance
  • Uses asynchronous processing for efficiency
  • Optionally reports warnings for missing fields or unsupported structures
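
To illustrate what support for nested fields and objects inside arrays can look like, here is a minimal sketch that uses only the standard library. The helper name `apply_to_path` and its traversal rules are assumptions made for illustration and are not taken from MongoMasker's source; the tool's actual behavior may differ.

```python
from typing import Any, Callable


def apply_to_path(node: Any, path: str, transform: Callable[[Any], Any]) -> int:
    """Apply `transform` to every value reachable via a dotted `path`.

    Lists are traversed transparently, so "orders.id" reaches the "id"
    of every object inside an "orders" array. Returns the number of
    values replaced; 0 means the field was not found.
    """
    if isinstance(node, list):
        # Fan out over array elements without consuming a path segment.
        return sum(apply_to_path(item, path, transform) for item in node)
    if not isinstance(node, dict):
        # A scalar where an object was expected: nothing to descend into.
        return 0

    head, _, rest = path.partition(".")
    if head not in node:
        return 0
    if not rest:
        # Last path segment: replace the value in place.
        node[head] = transform(node[head])
        return 1
    return apply_to_path(node[head], rest, transform)


doc = {
    "address": {"city": "New York"},
    "orders": [{"id": "1"}, {"id": "2"}],
}
changed = apply_to_path(doc, "address.city", lambda _old: "Los Angeles")
changed += apply_to_path(doc, "orders.id", lambda _old: "X")
missing = apply_to_path(doc, "user.lastname", lambda _old: "Smith")
```

A return value of 0, as for `user.lastname` here, is the kind of situation in which a tool could emit a "missing field" warning.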

Requirements

  • Python 3.6+
  • motor library
  • pymongo library
  • faker library
  • typer library

Installation

Install the dependencies with Poetry (run this from the project root, where pyproject.toml lives):

poetry install

Usage

Command-Line Example

To anonymize fields in a MongoDB collection, run the following command:

mongomasker \
    "mongodb://your_username:your_password@your_host:your_port" \
    source_database \
    source_collection \
    target_database \
    target_collection \
    fields_to_anonymize.json \
    --batch-size 100 \
    --show-warnings
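
The `--batch-size` option in the command above sets how many documents are handled at a time. The following is a minimal sketch of that batching pattern over an asynchronous stream, using only `asyncio`. The names `fake_cursor`, `in_batches`, and `copy_masked` are hypothetical, and the in-memory generator merely stands in for a database cursor; this is not MongoMasker's actual implementation.

```python
import asyncio
from typing import AsyncIterator, Dict, List


async def fake_cursor(n: int) -> AsyncIterator[Dict]:
    """Hypothetical stand-in for an async database cursor."""
    for i in range(n):
        yield {"_id": i, "name": f"user{i}"}


async def in_batches(
    cursor: AsyncIterator[Dict], batch_size: int
) -> AsyncIterator[List[Dict]]:
    """Group documents from an async stream into lists of `batch_size`."""
    batch: List[Dict] = []
    async for doc in cursor:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


async def copy_masked(n_docs: int, batch_size: int) -> List[int]:
    """Mask each batch and record its size (no database involved)."""
    sizes = []
    async for batch in in_batches(fake_cursor(n_docs), batch_size):
        masked = [{**doc, "name": "REDACTED"} for doc in batch]
        # A real tool would write `masked` to the target collection here,
        # for example with one bulk insert per batch.
        sizes.append(len(masked))
    return sizes


batch_sizes = asyncio.run(copy_masked(n_docs=250, batch_size=100))
```

With 250 documents and a batch size of 100, the sketch yields two full batches and one batch of 50. Larger batches mean fewer round trips but more documents held in memory at once.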

Arguments

  • mongo_uri: MongoDB connection URI (e.g., "mongodb://localhost:27017")
  • source_db: Name of the source database
  • source_collection: Name of the source collection
  • target_db: Name of the target database
  • target_collection: Name of the target collection
  • fields_to_anonymize_file: Path to the JSON file specifying the fields to anonymize
  • --batch-size: (Optional) Number of documents to process in each batch (default: 100)
  • --show-warnings: (Optional) Show warnings for missing fields or unsupported structures

Example fields_to_anonymize.json

Create a JSON file specifying the fields to anonymize and their corresponding data types. For example:

{
    "name": "name",
    "email": "email",
    "address.street": "address",
    "address.city": "city",
    "address.zipcode": "zipcode",
    "user.stateCode": "statecode",
    "user.lastname": "lastname",
    "user.fullname": "lastnamefirstname",
    "createdAt": "date",
    "updatedAt": "datestr",
    "order.id": "id"
}
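
Before running the tool, it can help to sanity-check the mapping file. The sketch below is a hypothetical pre-flight check and is not part of MongoMasker: the function name `check_mapping` is invented for illustration, and `KNOWN_TYPES` contains only the data types that appear in this README, so the tool itself may accept others.

```python
import json
from typing import List

# Data types that appear in this README; the tool may support more.
KNOWN_TYPES = {
    "name", "email", "address", "city", "zipcode", "statecode",
    "lastname", "lastnamefirstname", "date", "datestr", "id",
}


def check_mapping(raw_json: str) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    mapping = json.loads(raw_json)
    if not isinstance(mapping, dict):
        return ["top-level JSON value must be an object"]

    problems = []
    for field, data_type in mapping.items():
        # Reject empty path segments such as "user..id" or ".name".
        if any(part == "" for part in field.split(".")):
            problems.append(f"malformed field path: {field!r}")
        if not isinstance(data_type, str) or data_type not in KNOWN_TYPES:
            problems.append(f"unknown data type for {field!r}: {data_type!r}")
    return problems


example = '{"name": "name", "address.city": "city", "user..id": "uuid"}'
problems = check_mapping(example)
```

Here `problems` holds two entries, one for the malformed path `user..id` and one for the unrecognized type `uuid`.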

Explanation of Transformations

The fields_to_anonymize.json file maps field names to the type of fake data to generate. Below are examples of transformations for various data types:

| Field Name      | Data Type         | Example Transformation                       |
|-----------------|-------------------|----------------------------------------------|
| name            | name              | "John" → "Alice"                             |
| email           | email             | "john.doe@example.com" → "alice@example.com" |
| address.street  | address           | "123 Main St" → "456 Elm St"                 |
| address.city    | city              | "New York" → "Los Angeles"                   |
| address.zipcode | zipcode           | "10001" → "90210"                            |
| user.stateCode  | statecode         | "NY" → "CA"                                  |
| user.lastname   | lastname          | "Doe" → "Smith"                              |
| user.fullname   | lastnamefirstname | "Doe, John" → "Smith, Alice"                 |
| createdAt       | date              | "2023-01-01" → "2025-03-22"                  |
| updatedAt       | datestr           | "2023-01-01" → "2025-03-22"                  |
| order.id        | id                | "1234567890" → "9876543210"                  |

Sample Workflow

  1. Prepare the fields_to_anonymize.json file: Create a JSON file specifying the fields to anonymize and their corresponding data types.

  2. Run the MongoMasker CLI: Use the command-line tool to anonymize the data in the source collection and copy it to the target collection.

  3. Verify the Results: Check the target collection to ensure the data has been anonymized as expected.
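
The verification step can be partly automated. The sketch below compares a source document with its counterpart in the target collection and lists any masked field whose value did not change. The helpers `get_path` and `unchanged_fields` are hypothetical and work on plain dictionaries; fetching the two documents from MongoDB (for example by matching `_id`) is left out.

```python
from typing import Any, Dict, Iterable, List

_MISSING = object()  # Sentinel for "path not present in the document".


def get_path(doc: Any, path: str) -> Any:
    """Read a dotted path from nested dicts; return _MISSING if absent."""
    for key in path.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return _MISSING
        doc = doc[key]
    return doc


def unchanged_fields(
    source: Dict, target: Dict, fields: Iterable[str]
) -> List[str]:
    """List fields that exist in `source` and are identical in `target`."""
    return [
        field
        for field in fields
        if get_path(source, field) is not _MISSING
        and get_path(source, field) == get_path(target, field)
    ]


source = {"_id": "12345", "name": "John Doe", "address": {"city": "New York"}}
target = {"_id": "12345", "name": "Alice Smith", "address": {"city": "New York"}}
leaks = unchanged_fields(source, target, ["name", "address.city"])
```

In this example `leaks` contains `address.city`, because that value is the same in both documents. Since a randomly generated value can occasionally equal the original, a reported field is a prompt for manual inspection rather than proof of a failure.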

Example

Input Document (Source Collection)

{
    "_id": "12345",
    "name": "John Doe",
    "email": "john.doe@example.com",
    "address": {
        "street": "123 Main St",
        "city": "New York",
        "zipcode": "10001"
    },
    "user": {
        "stateCode": "NY",
        "lastname": "Doe",
        "fullname": "Doe, John"
    },
    "createdAt": "2023-01-01",
    "updatedAt": "2023-01-02",
    "order": {
        "id": "1234567890"
    }
}

Output Document (Target Collection)

{
    "_id": "12345",
    "name": "Alice Smith",
    "email": "alice@example.com",
    "address": {
        "street": "456 Elm St",
        "city": "Los Angeles",
        "zipcode": "90210"
    },
    "user": {
        "stateCode": "CA",
        "lastname": "Smith",
        "fullname": "Smith, Alice"
    },
    "createdAt": "2025-03-22",
    "updatedAt": "2025-03-22",
    "order": {
        "id": "9876543210"
    }
}

License

This project is licensed under the MIT License. See the LICENSE file for details.
