Redactor

A Python library for text redaction and anonymization, built on top of Microsoft Presidio. It lets individuals and enterprises build a custom layer that anonymizes prompts before LLM inference and deanonymizes the results afterward.

Features

  • 🔒 Automatic detection and redaction of sensitive information
  • 🔄 Reversible redaction with mapping preservation
  • 🎯 Custom entity recognition
  • 🛠 Configurable fuzzy matching
  • 📝 Support for custom word lists
  • 🔧 Extensible recognizer framework

Installation

pip install redactor

After installation, you'll need to download the required spaCy model:

python -m spacy download en_core_web_sm

Quick Start

from redactor import Redactor

# Initialize redactor
redactor = Redactor()

# Redact text
text = "My name is John Doe and my email is john.doe@example.com"
redacted_text, mappings = redactor.redact(text)
print(redacted_text)
# Output: "My name is PERSON_1 and my email is EMAIL_ADDRESS_1"

# Restore original text
original_text = redactor.restore(redacted_text, mappings)
print(original_text)
# Output: "My name is John Doe and my email is john.doe@example.com"
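
To show how a redact/restore round trip can wrap an LLM call, here is a minimal, self-contained sketch. It does not use this library or Presidio: redact_emails, restore, and fake_llm are hypothetical stand-ins, the email regex is deliberately simple, and the placeholder-to-original dictionary is an assumed mapping format, not necessarily the one Redactor returns.

```python
import re

# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text):
    """Replace each distinct email with EMAIL_ADDRESS_<n>; return (text, mapping)."""
    mapping = {}  # placeholder -> original value (assumed format)
    seen = {}     # original value -> placeholder

    def substitute(match):
        original = match.group(0)
        if original not in seen:
            placeholder = f"EMAIL_ADDRESS_{len(seen) + 1}"
            seen[original] = placeholder
            mapping[placeholder] = original
        return seen[original]

    return EMAIL_RE.sub(substitute, text), mapping


def restore(text, mapping):
    """Put the original values back in place of their placeholders."""
    # Longer placeholders first, so EMAIL_ADDRESS_10 is not hit by EMAIL_ADDRESS_1.
    for placeholder in sorted(mapping, key=len, reverse=True):
        text = text.replace(placeholder, mapping[placeholder])
    return text


def fake_llm(prompt):
    """Hypothetical stand-in for a real LLM call; it only echoes the prompt."""
    return f"Summary: {prompt}"


prompt = "Contact john.doe@example.com or jane@example.org, then john.doe@example.com again."
redacted, mapping = redact_emails(prompt)  # the model only ever sees placeholders
reply = fake_llm(redacted)
restored = restore(reply, mapping)         # originals are put back locally
print(redacted)
print(restored)
```

The same value always receives the same placeholder, which keeps the redacted prompt coherent for the model and makes the restore step unambiguous.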

Advanced Usage

Custom Word Lists

# Initialize with custom words to redact
redactor = Redactor(custom_words=["PROJECT-X", "OPERATION-Y"])

text = "Discussing PROJECT-X details"
redacted_text, mappings = redactor.redact(text)
print(redacted_text)
# Output: "Discussing CUSTOM_1 details"
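
For intuition, literal word-list redaction can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's implementation: redact_custom_words is a hypothetical helper, and whole-word, case-sensitive matching is a choice made for this sketch.

```python
import re


def redact_custom_words(text, custom_words, label="CUSTOM"):
    """Replace each listed word with <label>_<n>; return (text, mapping)."""
    if not custom_words:
        return text, {}
    # Longest words first, so a longer entry wins over a shorter prefix.
    ordered = sorted(set(custom_words), key=lambda w: (-len(w), w))
    # re.escape treats every word literally; the lookarounds require that the
    # match is not embedded in a longer run of word characters.
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(w) for w in ordered) + r")(?!\w)"
    )
    mapping = {}  # placeholder -> original word (assumed format)
    seen = {}     # original word -> placeholder

    def substitute(match):
        word = match.group(0)
        if word not in seen:
            seen[word] = f"{label}_{len(seen) + 1}"
            mapping[seen[word]] = word
        return seen[word]

    return pattern.sub(substitute, text), mapping


redacted, mapping = redact_custom_words(
    "Discussing PROJECT-X details, not PROJECT-XL",
    ["PROJECT-X", "OPERATION-Y"],
)
print(redacted)
print(mapping)
```

Escaping matters because entries such as PROJECT-X may contain characters that have a special meaning in regular expressions.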

Custom Entity Recognition

from redactor import RecognizerBuilder

# Create custom recognizer for product codes
product_recognizer = (RecognizerBuilder("PRODUCT")
                     .with_pattern(r"Product-[A-Z]+")
                     .with_context(["released", "shipment"])
                     .build())

redactor = Redactor()
redactor.add_recognizer(product_recognizer)

text = "The new Product-ALPHA will be released next month"
redacted_text, mappings = redactor.redact(text)
print(redacted_text)
# Output: "The new PRODUCT_1 will be released next month"
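
Context words are commonly used to raise the confidence of a pattern match when related terms appear nearby. The sketch below illustrates that idea with the standard library only. It is not how RecognizerBuilder or Presidio scores matches: find_with_context, the score values, and the character window are all assumptions made for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Match:
    entity: str
    start: int
    end: int
    score: float


def find_with_context(text, entity, pattern, context_words,
                      base_score=0.5, boost=0.35, window=40):
    """Find regex matches and boost the score when a context word is nearby."""
    matches = []
    for m in re.finditer(pattern, text):
        # Look at up to `window` characters on each side of the match.
        before = text[max(0, m.start() - window):m.start()]
        after = text[m.end():m.end() + window]
        nearby = (before + " " + after).lower()
        has_context = any(word.lower() in nearby for word in context_words)
        score = min(1.0, base_score + boost) if has_context else base_score
        matches.append(Match(entity, m.start(), m.end(), score))
    return matches


with_context = find_with_context(
    "The new Product-ALPHA will be released next month",
    "PRODUCT", r"Product-[A-Z]+", ["released", "shipment"],
)
without_context = find_with_context(
    "Product-BETA is blue",
    "PRODUCT", r"Product-[A-Z]+", ["released", "shipment"],
)
print(with_context)
print(without_context)
```

A downstream step could then keep only matches above a chosen score threshold, so that a bare pattern hit counts for less than one backed by context.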

Fuzzy Matching

# Enable fuzzy matching
redactor = Redactor(fuzzy_mapping=1)

text = """
John Smith is the CEO.
Jon Smith signed the document.
"""
redacted_text, mappings = redactor.redact(text)
# Similar names (here "John Smith" and "Jon Smith") will use the same replacement
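
One way to picture fuzzy matching is to group names whose similarity exceeds a threshold and give each group a single placeholder. The sketch below uses difflib.SequenceMatcher from the standard library. It is an assumption for illustration only: the library's actual matching method and the meaning of the fuzzy_mapping value are not documented here, and assign_placeholders and its threshold are hypothetical.

```python
from difflib import SequenceMatcher


def assign_placeholders(names, threshold=0.85, prefix="PERSON"):
    """Map each name to a placeholder, reusing it for sufficiently similar names."""
    representatives = []  # (first name seen in a group, its placeholder)
    result = {}
    for name in names:
        for representative, placeholder in representatives:
            similarity = SequenceMatcher(
                None, name.lower(), representative.lower()
            ).ratio()
            if similarity >= threshold:
                result[name] = placeholder
                break
        else:
            # No similar name seen so far: start a new group.
            placeholder = f"{prefix}_{len(representatives) + 1}"
            representatives.append((name, placeholder))
            result[name] = placeholder
    return result


placeholders = assign_placeholders(["John Smith", "Jon Smith", "Mary Jones"])
print(placeholders)
```

A lower threshold merges more spelling variants but also raises the risk of collapsing two different people into one placeholder.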

Supported Entity Types

Default entities include:

  • PERSON
  • EMAIL_ADDRESS
  • PHONE_NUMBER
  • CREDIT_CARD
  • DATE_TIME
  • LOCATION
  • ORGANIZATION

Development

Setup Development Environment

# Clone the repository
git clone https://github.com/xaisr/redactor.git
cd redactor

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install development dependencies
pip install -r requirements-dev.txt

# Install package in editable mode
pip install -e .

Run Tests

pytest

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.
