A Python library for anonymizing and de-anonymizing text
Text Anonymizer
Text Anonymizer is a Python library that anonymizes text by replacing recognized entities with placeholders and can later restore the original text (de-anonymization). It uses spaCy for natural language processing and entity recognition.
Features
- Text anonymization by replacing entities with placeholders
- De-anonymization to restore original text
- Entity recognition for various types (PERSON, ORGANIZATION, LOCATION, EMAIL, URL, etc.)
- Command-line interface for easy usage
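To make the placeholder idea concrete, here is a minimal sketch of reversible replacement. It is not the library's implementation: it uses a simple regular expression for email addresses instead of spaCy, and the names `EMAIL_RE`, `anonymize_emails`, and `restore` are hypothetical, introduced only for illustration.

```python
import re

# Hypothetical, deliberately simple email pattern for illustration only.
# The library itself relies on spaCy for entity recognition.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def anonymize_emails(text):
    """Replace each distinct email with a numbered placeholder.

    Returns the anonymized text and a map of placeholder -> original value.
    """
    mapping = {}  # placeholder -> original value
    seen = {}     # original value -> placeholder, so repeats reuse one placeholder

    def substitute(match):
        value = match.group(0)
        if value not in seen:
            placeholder = f"[ENTITY_EMAIL_{len(seen) + 1}]"
            seen[value] = placeholder
            mapping[placeholder] = value
        return seen[value]

    return EMAIL_RE.sub(substitute, text), mapping


def restore(text, mapping):
    """Put the original values back by replacing every placeholder."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text
```

Keeping the map separate from the anonymized text is what makes the process reversible: the text can be shared while the map stays private.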
Installation
You can install the Text Anonymizer library using pip:
pip install text-anonymizer
This will also install the required dependencies, including spaCy.
After installation, you need to download the spaCy English language model:
python -m spacy download en_core_web_sm
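If you want to confirm that the model was installed before running the anonymizer, a small check like the following may help. It is a sketch that relies on the fact that `spacy download` installs the model as a regular Python package; the helper name `model_installed` is hypothetical and not part of Text Anonymizer.

```python
import importlib.util


def model_installed(package_name="en_core_web_sm"):
    """Return True if the given spaCy model package can be found for import."""
    return importlib.util.find_spec(package_name) is not None


if not model_installed():
    print("Model missing. Run: python -m spacy download en_core_web_sm")
```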
Usage
As a Python Library
Here's a simple example of how to use the Text Anonymizer in your Python code:
from text_anonymizer import anonymize, deanonymize
# Original text
text = "John Smith from Acme Corporation called me at john.smith@acme.com."
# Anonymize the text
anonymized_text, anonymization_map = anonymize(text)
print("Anonymized text:", anonymized_text)
# Output: [ENTITY_PERSON_1] from [ENTITY_ORG_1] called me at [ENTITY_EMAIL_1].
# De-anonymize the text
original_text = deanonymize(anonymized_text, anonymization_map)
print("Original text:", original_text)
# Output: John Smith from Acme Corporation called me at john.smith@acme.com.
Command-line Interface
Text Anonymizer also provides a command-line interface for easy usage:
# Anonymize a text file
text-anonymizer anonymize input.txt output.txt
# De-anonymize a text file
text-anonymizer deanonymize input.txt output.txt --map_file map.json
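The `--map_file` option implies that the anonymization map is stored as JSON between the two steps. The sketch below shows what such a round trip could look like in plain Python. It assumes the map is a JSON-serializable dictionary of placeholder to original value, which this README does not confirm, and the helpers `save_map`, `load_map`, and `restore` are hypothetical names, not part of the library's API.

```python
import json
import tempfile
from pathlib import Path


def save_map(mapping, path):
    """Write the placeholder -> original map to a JSON file."""
    Path(path).write_text(
        json.dumps(mapping, ensure_ascii=False, indent=2), encoding="utf-8"
    )


def load_map(path):
    """Read a placeholder -> original map back from a JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def restore(text, mapping):
    """Replace every placeholder in the text with its original value."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


# Round trip through a temporary map file (assumed map structure, see above).
with tempfile.TemporaryDirectory() as tmp:
    map_path = Path(tmp) / "map.json"
    save_map({"[ENTITY_PERSON_1]": "John Smith"}, map_path)
    restored = restore("[ENTITY_PERSON_1] called.", load_map(map_path))

print(restored)
```

Whatever format the map file actually has, treat it as sensitive: it contains every original value that was removed from the text.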
Documentation
For detailed documentation, including API reference and advanced usage, please visit our documentation page.
Development
To set up the development environment:
- Clone the repository:
  git clone https://github.com/viktorbezdek/text-anonymizer.git
  cd text-anonymizer
- Install Poetry (if not already installed):
  pip install poetry
- Install dependencies:
  poetry install
- Activate the virtual environment:
  poetry shell
- Run tests:
  pytest
Contributing
Contributions are welcome! Please see our Contributing Guide for more details.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Changelog
See CHANGELOG.md for a detailed history of changes.
Hashes for text_anonymizer-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 96d2deeda36575427097de671a52bc6ec55709c27f11d73e0da9cfb4ec24a6d0
MD5 | 188ed7f7a8ed38ff9f2bf68a20f692c9
BLAKE2b-256 | 0f456abb291862d7b52af7c07e68f93c4988381fac95b872e14cc34bced96a75