A Python library for anonymizing and de-anonymizing text

Text Anonymizer

Text Anonymizer is a Python library that anonymizes text by replacing recognized entities with placeholders, and can later restore the original text from the mapping it returns. It uses spaCy for natural language processing and entity recognition.

Features

  • Text anonymization by replacing entities with placeholders
  • De-anonymization to restore original text
  • Entity recognition for various types (PERSON, ORGANIZATION, LOCATION, EMAIL, URL, etc.)
  • Command-line interface for easy usage

Installation

You can install the Text Anonymizer library using pip:

pip install text-anonymizer

This will also install the required dependencies, including spaCy.

After installation, you need to download the spaCy English language model:

python -m spacy download en_core_web_sm
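
If you want to confirm the model is present before running the library, a check along these lines may help. This is a minimal sketch using only the standard library; the helper name model_available is hypothetical and not part of Text Anonymizer. It relies on spaCy models being installed as ordinary importable packages.

```python
import importlib.util


def model_available(package_name):
    """Return True if the named package can be located without importing it."""
    return importlib.util.find_spec(package_name) is not None


# en_core_web_sm is the model name used in the installation step above.
if not model_available("en_core_web_sm"):
    print("Model missing; run: python -m spacy download en_core_web_sm")
```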

Usage

As a Python Library

Here's a simple example of how to use the Text Anonymizer in your Python code:

from text_anonymizer import anonymize, deanonymize

# Original text
text = "John Smith from Acme Corporation called me at john.smith@acme.com."

# Anonymize the text
anonymized_text, anonymization_map = anonymize(text)

print("Anonymized text:", anonymized_text)
# Output: Anonymized text: [ENTITY_PERSON_1] from [ENTITY_ORG_1] called me at [ENTITY_EMAIL_1].

# De-anonymize the text
original_text = deanonymize(anonymized_text, anonymization_map)

print("Original text:", original_text)
# Output: Original text: John Smith from Acme Corporation called me at john.smith@acme.com.
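
To make the placeholder-and-map idea concrete, here is a minimal sketch of the general technique, restricted to email addresses and using a regular expression instead of spaCy. It is not the library's implementation: the function names, the regex, and the map layout (placeholder to original value) are assumptions chosen to mirror the output shown above.

```python
import re

# Hypothetical pattern; deliberately simple and not a full email grammar.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def anonymize_emails(text):
    """Replace each distinct email address with a numbered placeholder."""
    mapping = {}  # placeholder -> original value
    seen = {}     # original value -> placeholder, so repeats share one placeholder

    def substitute(match):
        value = match.group(0)
        if value not in seen:
            placeholder = f"[ENTITY_EMAIL_{len(seen) + 1}]"
            seen[value] = placeholder
            mapping[placeholder] = value
        return seen[value]

    return EMAIL_RE.sub(substitute, text), mapping


def restore(text, mapping):
    """Put the original values back by replacing each placeholder."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


sample = "Write to john.smith@acme.com or jane@example.org."
masked, email_map = anonymize_emails(sample)
print(masked)
print(restore(masked, email_map) == sample)
```

The round trip only works if the map produced during anonymization is kept, which is why anonymize returns it alongside the text.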

Command-line Interface

Text Anonymizer also provides a command-line interface for easy usage:

# Anonymize a text file
text-anonymizer anonymize input.txt output.txt

# De-anonymize a text file
text-anonymizer deanonymize input.txt output.txt --map_file map.json
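
The de-anonymize command reads its mapping from a JSON file. When working from Python instead, the map could be saved and reloaded in a similar way. The sketch below assumes the map is a JSON-serializable dictionary; the sample map contents and the helper names save_map and load_map are hypothetical, and the format the CLI actually expects is not documented here.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical mapping; the structure returned by anonymize() may differ.
anonymization_map = {
    "[ENTITY_PERSON_1]": "John Smith",
    "[ENTITY_ORG_1]": "Acme Corporation",
}


def save_map(mapping, path):
    """Write the mapping as UTF-8 JSON so it can be reloaded later."""
    payload = json.dumps(mapping, ensure_ascii=False, indent=2)
    Path(path).write_text(payload, encoding="utf-8")


def load_map(path):
    """Read a mapping previously written by save_map."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmp:
    map_path = Path(tmp) / "map.json"
    save_map(anonymization_map, map_path)
    reloaded = load_map(map_path)

print(reloaded == anonymization_map)
```

Because the map contains the original sensitive values, it should be stored at least as carefully as the source text.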

Documentation

For detailed documentation, including API reference and advanced usage, please visit our documentation page.

Development

To set up the development environment:

  1. Clone the repository:

    git clone https://github.com/viktorbezdek/text-anonymizer.git
    cd text-anonymizer
    
  2. Install Poetry (if not already installed):

    pip install poetry
    
  3. Install dependencies:

    poetry install
    
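  4. Activate the virtual environment (on Poetry 2.x, poetry shell requires the poetry-plugin-shell plugin; alternatively, skip this step and run the tests with poetry run pytest):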

    poetry shell
    
  5. Run tests:

    pytest
    

Contributing

Contributions are welcome! Please see our Contributing Guide for more details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

See CHANGELOG.md for a detailed history of changes.
