A Python library for anonymizing and de-anonymizing text

Text Anonymizer

Text Anonymizer is a Python library that anonymizes text by replacing recognized entities with placeholders, and can later restore the original text from the mapping it produces (de-anonymization). It uses spaCy for natural language processing and entity recognition.

Features

  • Text anonymization by replacing entities with placeholders
  • De-anonymization to restore original text
  • Entity recognition for various types (PERSON, ORGANIZATION, LOCATION, EMAIL, URL, etc.)
  • Command-line interface for easy usage

Installation

You can install the Text Anonymizer library using pip:

pip install text-anonymizer

This will also install the required dependencies, including spaCy.

After installation, you need to download the spaCy English language model:

python -m spacy download en_core_web_sm
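
If you want to confirm that the model download succeeded before running the library, a quick check along these lines may help. It relies on the fact that spaCy models are installed as regular Python packages; the helper name `model_installed` is made up for this sketch and is not part of Text Anonymizer.

```python
import importlib.util


def model_installed(name="en_core_web_sm"):
    """Return True if a package with this name is importable.

    Hypothetical helper for illustration; spaCy models installed via
    `python -m spacy download` are ordinary Python packages.
    """
    return importlib.util.find_spec(name) is not None


if not model_installed():
    print("Model missing; run: python -m spacy download en_core_web_sm")
```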

Usage

As a Python Library

Here's a simple example of how to use the Text Anonymizer in your Python code:

from text_anonymizer import anonymize, deanonymize

# Original text
text = "John Smith from Acme Corporation called me at john.smith@acme.com."

# Anonymize the text
anonymized_text, anonymization_map = anonymize(text)

print("Anonymized text:", anonymized_text)
# Output: Anonymized text: [ENTITY_PERSON_1] from [ENTITY_ORG_1] called me at [ENTITY_EMAIL_1].

# De-anonymize the text
original_text = deanonymize(anonymized_text, anonymization_map)

print("Original text:", original_text)
# Output: Original text: John Smith from Acme Corporation called me at john.smith@acme.com.
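
To make the placeholder-and-map round trip above more concrete, here is a minimal standalone sketch of the idea. It is not the library's implementation: it uses a simple regular expression for email addresses instead of spaCy, and the names `anonymize_emails` and `restore` are hypothetical. The placeholder format follows the example output above.

```python
import re

# Simplified stand-in for entity recognition: matches email addresses only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def anonymize_emails(text):
    """Replace each distinct email with a numbered placeholder.

    Returns the masked text and a placeholder -> original value map.
    """
    mapping = {}  # placeholder -> original value
    seen = {}     # original value -> placeholder, so repeats share one placeholder

    def substitute(match):
        value = match.group(0)
        if value not in seen:
            placeholder = f"[ENTITY_EMAIL_{len(seen) + 1}]"
            seen[value] = placeholder
            mapping[placeholder] = value
        return seen[value]

    return EMAIL_RE.sub(substitute, text), mapping


def restore(text, mapping):
    """Put the original values back in place of their placeholders."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


text = "Write to john.smith@acme.com or jane@example.org; john.smith@acme.com is preferred."
masked, mapping = anonymize_emails(text)
print(masked)
print(restore(masked, mapping) == text)
```

Reusing one placeholder per distinct value keeps the masked text readable and means the map alone is enough to reverse the substitution.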

Command-line Interface

Text Anonymizer also provides a command-line interface for easy usage:

# Anonymize a text file
text-anonymizer anonymize input.txt output.txt

# De-anonymize a text file
text-anonymizer deanonymize input.txt output.txt --map_file map.json
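
The de-anonymize command reads its map from a JSON file. When using the Python API instead, you could persist the map yourself in a similar way. The sketch below assumes the map is a plain dictionary of strings, which this page does not confirm; the dictionary contents and the helpers `save_map` and `load_map` are hypothetical, and the file written here is not guaranteed to match the format the CLI expects.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical map contents: the real structure returned by anonymize() may differ.
anonymization_map = {
    "[ENTITY_PERSON_1]": "John Smith",
    "[ENTITY_ORG_1]": "Acme Corporation",
}


def save_map(mapping, path):
    """Write the map to a UTF-8 JSON file."""
    Path(path).write_text(
        json.dumps(mapping, indent=2, ensure_ascii=False), encoding="utf-8"
    )


def load_map(path):
    """Read a map previously written by save_map."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Round trip through a temporary file to show the map survives serialization.
with tempfile.TemporaryDirectory() as tmp:
    map_path = Path(tmp) / "map.json"
    save_map(anonymization_map, map_path)
    loaded_map = load_map(map_path)

print(loaded_map == anonymization_map)
```

Since the map contains the original sensitive values, store it separately from the anonymized text and with appropriate access controls.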

Documentation

For detailed documentation, including API reference and advanced usage, please visit our documentation page.

Development

To set up the development environment:

  1. Clone the repository:

    git clone https://github.com/viktorbezdek/text-anonymizer.git
    cd text-anonymizer
    
  2. Install Poetry (if not already installed):

    pip install poetry
    
  3. Install dependencies:

    poetry install
    
  4. Activate the virtual environment:

    poetry shell
    
  5. Run tests:

    pytest
    

Contributing

Contributions are welcome! Please see our Contributing Guide for more details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

See CHANGELOG.md for a detailed history of changes.

Download files

Source Distribution: text_anonymizer-0.1.0.tar.gz (4.8 kB)

Built Distribution: text_anonymizer-0.1.0-py3-none-any.whl (6.7 kB, Python 3)
