A Python library for anonymizing and de-anonymizing text
Text Anonymizer
Text Anonymizer is a Python library that anonymizes text by replacing recognized entities with placeholders, and can later de-anonymize it by restoring the original values from the mapping produced during anonymization. It uses spaCy for natural language processing and entity recognition.
Features
- Text anonymization by replacing entities with placeholders
- De-anonymization to restore original text
- Entity recognition for various types (PERSON, ORGANIZATION, LOCATION, EMAIL, URL, etc.)
- Command-line interface for easy usage
Installation
You can install the Text Anonymizer library using pip:
pip install text-anonymizer
This will also install the required dependencies, including spaCy.
After installation, you need to download the spaCy English language model:
python -m spacy download en_core_web_sm
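If the model download is skipped, the problem usually only shows up when the model is first loaded. A minimal sketch for checking up front, assuming only that spaCy models are installed as regular Python packages (which is how `spacy download` delivers them). The helper name `is_installed` is illustrative and not part of Text Anonymizer:

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    model = "en_core_web_sm"
    if is_installed(model):
        print(f"{model} is available")
    else:
        print(f"{model} is missing; run: python -m spacy download {model}")
```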
Usage
As a Python Library
Here's a simple example of how to use the Text Anonymizer in your Python code:
from text_anonymizer import anonymize, deanonymize
# Original text
text = "John Smith from Acme Corporation called me at john.smith@acme.com."
# Anonymize the text
anonymized_text, anonymization_map = anonymize(text)
print("Anonymized text:", anonymized_text)
# Output: Anonymized text: [ENTITY_PERSON_1] from [ENTITY_ORG_1] called me at [ENTITY_EMAIL_1].
# De-anonymize the text
original_text = deanonymize(anonymized_text, anonymization_map)
print("Original text:", original_text)
# Output: Original text: John Smith from Acme Corporation called me at john.smith@acme.com.
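To make the placeholder-and-map idea concrete, here is a simplified, self-contained sketch of the general technique. It is not the library's implementation: it only detects email addresses with a regular expression (recognizing PERSON or ORG entities needs an NLP model such as spaCy), and the function names `sketch_anonymize` and `sketch_deanonymize` are hypothetical. The placeholder format mirrors the example output above.

```python
import re

# Illustrative pattern only; it is not a complete email grammar.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def sketch_anonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct email with a numbered placeholder.

    Returns the rewritten text and a map from placeholder to original value.
    """
    mapping: dict[str, str] = {}  # placeholder -> original
    seen: dict[str, str] = {}     # original -> placeholder

    def substitute(match: re.Match) -> str:
        original = match.group(0)
        if original not in seen:
            placeholder = f"[ENTITY_EMAIL_{len(seen) + 1}]"
            seen[original] = placeholder
            mapping[placeholder] = original
        return seen[original]

    return EMAIL_RE.sub(substitute, text), mapping


def sketch_deanonymize(text: str, mapping: dict[str, str]) -> str:
    """Restore the original values by substituting each placeholder back."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text
```

Reusing the same placeholder for repeated values keeps the map small and makes the round trip lossless, as long as the input text does not already contain strings that look like placeholders.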
Command-line Interface
Text Anonymizer also provides a command-line interface for easy usage:
# Anonymize a text file
text-anonymizer anonymize input.txt output.txt
# De-anonymize a text file
text-anonymizer deanonymize input.txt output.txt --map_file map.json
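De-anonymization needs the map that was produced during anonymization, so the map has to be stored between the two steps. A sketch of that round trip in Python, assuming the map is a plain JSON-serializable dict from placeholder to original text. The on-disk format the CLI expects for `--map_file` is not specified here, and the helper names `save_map` and `load_map` are hypothetical:

```python
import json
import tempfile
from pathlib import Path


def save_map(mapping: dict[str, str], path: Path) -> None:
    """Write the placeholder map to a UTF-8 JSON file."""
    path.write_text(
        json.dumps(mapping, ensure_ascii=False, indent=2), encoding="utf-8"
    )


def load_map(path: Path) -> dict[str, str]:
    """Read a placeholder map back from a JSON file."""
    return json.loads(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Demo with a throwaway directory; in practice the path would be map.json.
    with tempfile.TemporaryDirectory() as tmp:
        map_path = Path(tmp) / "map.json"
        save_map({"[ENTITY_PERSON_1]": "John Smith"}, map_path)
        print(load_map(map_path))
```

Because the map contains the original sensitive values, it should be stored with the same care as the unredacted text.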
Documentation
For detailed documentation, including API reference and advanced usage, please visit our documentation page.
Development
To set up the development environment:
- Clone the repository:
git clone https://github.com/viktorbezdek/text-anonymizer.git
cd text-anonymizer
- Install Poetry (if not already installed):
pip install poetry
- Install dependencies:
poetry install
- Activate the virtual environment:
poetry shell
- Run tests:
pytest
Contributing
Contributions are welcome! Please see our Contributing Guide for more details.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Changelog
See CHANGELOG.md for a detailed history of changes.