MongoMasker
MongoMasker is a tool designed to anonymize specified fields in a MongoDB collection. It uses the `faker` library to generate realistic fake data, processes documents in batches for improved performance, and leverages asynchronous processing with `motor` for efficiency.
Features
- Anonymizes specified fields with realistic fake data
- Supports nested fields and fields within objects in arrays
- Processes documents in batches for better performance
- Uses asynchronous processing for efficiency
- Allows warnings for missing fields or unsupported structures
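To illustrate what support for nested fields and objects in arrays involves, here is a minimal sketch of resolving a dotted path such as `address.street` against a document. The helper `apply_at_path` is hypothetical and is not taken from MongoMasker's source; it only shows one way the traversal could work, with a returned count of 0 standing in for the "missing field" case that would trigger a warning.

```python
from typing import Any, Callable


def apply_at_path(node: Any, path: str, make_value: Callable[[], Any]) -> int:
    """Replace the value at a dotted path, descending into arrays of objects.

    Returns the number of values replaced; 0 means the path was not found.
    (Hypothetical helper, not MongoMasker's actual implementation.)
    """
    if isinstance(node, list):
        # Apply the same remaining path to every element of the array.
        return sum(apply_at_path(item, path, make_value) for item in node)
    head, _, rest = path.partition(".")
    if not isinstance(node, dict) or head not in node:
        return 0
    if not rest:
        # Last path segment: overwrite the value in place.
        node[head] = make_value()
        return 1
    return apply_at_path(node[head], rest, make_value)


doc = {
    "address": {"street": "123 Main St"},
    "orders": [{"id": "1"}, {"id": "2"}],
}
replaced_street = apply_at_path(doc, "address.street", lambda: "456 Elm St")
replaced_ids = apply_at_path(doc, "orders.id", lambda: "0000")
replaced_missing = apply_at_path(doc, "user.lastname", lambda: "Smith")
```

In this sketch `make_value` is any zero-argument callable, so a fake-data generator could be passed in its place.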
Requirements
- Python 3.6+
- `motor` library
- `pymongo` library
- `faker` library
- `typer` library
Installation
Install the required libraries using Poetry:
poetry install
Usage
Command-Line Example
To anonymize fields in a MongoDB collection, run the following command:
mongomasker \
"mongodb://your_username:your_password@your_host:your_port" \
source_database \
source_collection \
target_database \
target_collection \
fields_to_anonymize.json \
--batch-size 100 \
--show-warnings \
--mongo-filter '{"status": "active", "createdAt": {"$gt": "2023-01-01"}}'
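The `--mongo-filter` value is passed as a JSON string and `--batch-size` controls how many documents are handled at once. The sketch below shows how such a filter string could be parsed and how a stream of documents could be grouped into batches. Both `parse_filter` and `batched` are hypothetical names used for illustration only; this is not MongoMasker's code, and it uses only the standard library.

```python
import json
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def parse_filter(raw: str = "{}") -> Dict[str, Any]:
    """Parse a JSON filter string and check that it is a JSON object."""
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError("filter must be a JSON object")
    return parsed


def batched(docs: Iterable[dict], batch_size: int = 100) -> Iterator[List[dict]]:
    """Yield lists of at most batch_size documents until docs is exhausted."""
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


query = parse_filter('{"status": "active", "createdAt": {"$gt": "2023-01-01"}}')
batch_sizes = [len(b) for b in batched(({"n": i} for i in range(250)), 100)]
```

Note that JSON has no date type, so the `$gt` operand here is the string `"2023-01-01"`. Unless the tool converts it, MongoDB compares it only against string values, so this filter matches documents whose `createdAt` is stored as a string.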
Arguments
- `mongo_uri`: MongoDB connection URI (e.g., "mongodb://localhost:27017")
- `source_db`: Name of the source database
- `source_collection`: Name of the source collection
- `target_db`: Name of the target database
- `target_collection`: Name of the target collection
- `fields_to_anonymize_file`: Path to the JSON file specifying the fields to anonymize
- `--batch-size`: (Optional) Number of documents to process in each batch (default: 100)
- `--show-warnings`: (Optional) Show warnings for missing fields or unsupported structures
- `--mongo-filter`: (Optional) MongoDB filter as a JSON string to filter source documents (default: "{}")
Example fields_to_anonymize.json
Create a JSON file specifying the fields to anonymize and their corresponding data types. For example:
{
"name": "name",
"email": "email",
"address.street": "address",
"address.city": "city",
"address.zipcode": "zipcode",
"user.stateCode": "statecode",
"user.lastname": "lastname",
"user.fullname": "lastnamefirstname",
"createdAt": "date",
"updatedAt": "datestr",
"order.id": "id"
}
Explanation of Transformations
The fields_to_anonymize.json file maps field names to the type of fake data to generate. Below are examples of transformations for various data types:
| Field Name | Data Type | Example Transformation |
|---|---|---|
| name | name | "John" → "Alice" |
| email | email | "john.doe@example.com" → "alice@example.com" |
| address.street | address | "123 Main St" → "456 Elm St" |
| address.city | city | "New York" → "Los Angeles" |
| address.zipcode | zipcode | "10001" → "90210" |
| user.stateCode | statecode | "NY" → "CA" |
| user.lastname | lastname | "Doe" → "Smith" |
| user.fullname | lastnamefirstname | "Doe, John" → "Smith, Alice" |
| createdAt | date | "2023-01-01" → "2025-03-22" |
| updatedAt | datestr | "2023-01-01" → "2025-03-22" |
| order.id | id | "1234567890" → "9876543210" |
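Because a typo in a data type name is easy to make, it can help to check the mapping file before running the tool. The following is a minimal sketch of such a check. `KNOWN_TYPES` contains only the data types that appear in the table above and may not be the tool's complete list; `check_mapping` is a hypothetical helper, not part of MongoMasker.

```python
import json
from typing import List

# Data types seen in the example mapping; an assumption, possibly incomplete.
KNOWN_TYPES = {
    "name", "email", "address", "city", "zipcode", "statecode",
    "lastname", "lastnamefirstname", "date", "datestr", "id",
}


def check_mapping(text: str) -> List[str]:
    """Return one message per field whose data type is not in KNOWN_TYPES."""
    mapping = json.loads(text)
    problems = []
    for field, data_type in mapping.items():
        if not isinstance(data_type, str) or data_type not in KNOWN_TYPES:
            problems.append(f"{field}: unknown data type {data_type!r}")
    return problems


problems = check_mapping('{"name": "name", "user.fullname": "fullname"}')
```

An empty list means every field in the file maps to a data type from the table.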
Sample Workflow
1. Prepare the fields_to_anonymize.json file: Create a JSON file specifying the fields to anonymize and their corresponding data types.
2. Run the MongoMasker CLI: Use the command-line tool to anonymize the data in the source collection and copy it to the target collection.
3. Verify the Results: Check the target collection to ensure the data has been anonymized as expected.
Example
Input Document (Source Collection)
{
"_id": "12345",
"name": "John Doe",
"email": "john.doe@example.com",
"address": {
"street": "123 Main St",
"city": "New York",
"zipcode": "10001"
},
"user": {
"stateCode": "NY",
"lastname": "Doe",
"fullname": "Doe, John"
},
"createdAt": "2023-01-01",
"updatedAt": "2023-01-02",
"order": {
"id": "1234567890"
}
}
Output Document (Target Collection)
{
"_id": "12345",
"name": "Alice Smith",
"email": "alice@example.com",
"address": {
"street": "456 Elm St",
"city": "Los Angeles",
"zipcode": "90210"
},
"user": {
"stateCode": "CA",
"lastname": "Smith",
"fullname": "Smith, Alice"
},
"createdAt": "2025-03-22",
"updatedAt": "2025-03-22",
"order": {
"id": "9876543210"
}
}
License
This project is licensed under the MIT License. See the LICENSE file for details.