Redactor
A Python library for text redaction and anonymization, built on top of Microsoft Presidio. It lets individuals and enterprises build a custom anonymization layer for prompts and a matching deanonymization step for LLM inference.
Features
- 🔒 Automatic detection and redaction of sensitive information
- 🔄 Reversible redaction with mapping preservation
- 🎯 Custom entity recognition
- 🛠 Configurable fuzzy matching
- 📝 Support for custom word lists
- 🔧 Extensible recognizer framework
Installation
pip install redactor
After installation, you'll need to download the required spaCy model:
python -m spacy download en_core_web_sm
Quick Start
from redactor import Redactor
# Initialize redactor
redactor = Redactor()
# Redact text
text = "My name is John Doe and my email is john.doe@example.com"
redacted_text, mappings = redactor.redact(text)
print(redacted_text)
# Output: "My name is PERSON_1 and my email is EMAIL_ADDRESS_1"
# Restore original text
original_text = redactor.restore(redacted_text, mappings)
print(original_text)
# Output: "My name is John Doe and my email is john.doe@example.com"
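Because `redact` returns the mappings together with the redacted text, the same pair of calls can wrap an LLM request: redact the prompt, send only placeholders to the model, then restore the reply. The following is a minimal sketch that assumes nothing beyond the `redact` and `restore` signatures shown above. `anonymized_completion`, `toy_redact`, `toy_restore`, and `echo_llm` are hypothetical stand-ins so the example runs without the library or a model, and the dictionary shape of `Mappings` is an assumption, not the library's documented type.

```python
import re
from typing import Callable, Dict, Tuple

Mappings = Dict[str, str]  # assumed shape: placeholder -> original value


def anonymized_completion(
    prompt: str,
    redact: Callable[[str], Tuple[str, Mappings]],
    restore: Callable[[str, Mappings], str],
    llm: Callable[[str], str],
) -> str:
    """Redact the prompt, call the model, then restore the reply."""
    redacted_prompt, mappings = redact(prompt)
    reply = llm(redacted_prompt)  # the model only ever sees placeholders
    return restore(reply, mappings)


# --- Toy stand-ins (hypothetical, not the library's implementation) ---
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def toy_redact(text: str) -> Tuple[str, Mappings]:
    mappings: Mappings = {}      # placeholder -> original value
    seen: Dict[str, str] = {}    # original value -> placeholder

    def replace(match: re.Match) -> str:
        value = match.group(0)
        if value not in seen:
            seen[value] = f"EMAIL_ADDRESS_{len(seen) + 1}"
            mappings[seen[value]] = value
        return seen[value]

    return EMAIL_RE.sub(replace, text), mappings


def toy_restore(text: str, mappings: Mappings) -> str:
    # Longer placeholders first, so EMAIL_ADDRESS_10 is not clipped by EMAIL_ADDRESS_1.
    for placeholder in sorted(mappings, key=len, reverse=True):
        text = text.replace(placeholder, mappings[placeholder])
    return text


def echo_llm(prompt: str) -> str:
    # Stand-in for a real model call.
    return f"Summary: {prompt}"


result = anonymized_completion(
    "Contact john.doe@example.com for access.", toy_redact, toy_restore, echo_llm
)
print(result)
# Output: "Summary: Contact john.doe@example.com for access."
```

With the real library, the bound methods `redactor.redact` and `redactor.restore` would presumably be passed in place of the toy functions.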
Advanced Usage
Custom Word Lists
# Initialize with custom words to redact
redactor = Redactor(custom_words=["PROJECT-X", "OPERATION-Y"])
text = "Discussing PROJECT-X details"
redacted_text, mappings = redactor.redact(text)
print(redacted_text)
# Output: "Discussing CUSTOM_1 details"
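To make the word-list behaviour concrete, here is a rough sketch of how a list of custom words could be compiled into a single regular expression and replaced with numbered `CUSTOM_n` placeholders. It is an illustration only: `build_word_pattern` and `redact_words` are hypothetical names, and exact, case-sensitive matching is an assumption that may differ from what the library does.

```python
import re
from typing import Dict, List, Tuple


def build_word_pattern(words: List[str]) -> re.Pattern:
    # Escape each word so characters like "-" or "." match literally, and try
    # longer words first so a longer entry wins over a shorter prefix.
    ordered = sorted(set(words), key=len, reverse=True)
    alternation = "|".join(re.escape(word) for word in ordered)
    # Lookarounds stop a match from starting or ending inside a longer token.
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)")


def redact_words(text: str, words: List[str]) -> Tuple[str, Dict[str, str]]:
    if not words:
        return text, {}

    mappings: Dict[str, str] = {}   # placeholder -> original word
    assigned: Dict[str, str] = {}   # original word -> placeholder

    def replace(match: re.Match) -> str:
        word = match.group(0)
        if word not in assigned:
            assigned[word] = f"CUSTOM_{len(assigned) + 1}"
            mappings[assigned[word]] = word
        return assigned[word]

    return build_word_pattern(words).sub(replace, text), mappings


redacted, mappings = redact_words(
    "Discussing PROJECT-X details, then OPERATION-Y, then PROJECT-X again",
    ["PROJECT-X", "OPERATION-Y"],
)
print(redacted)
# Output: "Discussing CUSTOM_1 details, then CUSTOM_2, then CUSTOM_1 again"
print(mappings)
# Output: {'CUSTOM_1': 'PROJECT-X', 'CUSTOM_2': 'OPERATION-Y'}
```

Repeated occurrences reuse the same placeholder, which keeps the redacted text readable and the mapping small.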
Custom Entity Recognition
from redactor import Redactor, RecognizerBuilder
# Create custom recognizer for product codes
product_recognizer = (
    RecognizerBuilder("PRODUCT")
    .with_pattern(r"Product-[A-Z]+")
    .with_context(["released", "shipment"])
    .build()
)
redactor = Redactor()
redactor.add_recognizer(product_recognizer)
text = "The new Product-ALPHA will be released next month"
redacted_text, mappings = redactor.redact(text)
print(redacted_text)
# Output: "The new PRODUCT_1 will be released next month"
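The pairing of a pattern with context words can be read as "a regex hit is more trustworthy when certain words appear near it". The sketch below illustrates that idea with the standard library only. It is not how Redactor or Presidio implement context handling; `Hit`, `find_with_context`, and the `base_score`, `boost`, and `window` values are all assumptions chosen for the example.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    entity: str
    start: int
    end: int
    score: float


def find_with_context(
    text: str,
    entity: str,
    pattern: str,
    context: List[str],
    base_score: float = 0.5,   # assumed confidence for a bare pattern hit
    boost: float = 0.25,       # assumed bonus when a context word is nearby
    window: int = 5,           # words inspected on each side of the hit
) -> List[Hit]:
    context_words = {word.lower() for word in context}
    hits: List[Hit] = []
    for match in re.finditer(pattern, text):
        # Collect up to `window` words on each side of the match.
        before = re.findall(r"\w+", text[: match.start()].lower())[-window:]
        after = re.findall(r"\w+", text[match.end():].lower())[:window]
        nearby = set(before) | set(after)
        score = base_score + (boost if nearby & context_words else 0.0)
        hits.append(Hit(entity, match.start(), match.end(), min(score, 1.0)))
    return hits


context = ["released", "shipment"]
with_context = find_with_context(
    "The new Product-ALPHA will be released next month",
    "PRODUCT", r"Product-[A-Z]+", context,
)
without_context = find_with_context(
    "Ask about Product-BETA", "PRODUCT", r"Product-[A-Z]+", context
)
print(with_context[0].score)     # 0.75: "released" is within the window
print(without_context[0].score)  # 0.5: no context word nearby
```

A caller could then keep only hits above a chosen threshold, so the context list acts as a way to reduce false positives for loose patterns.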
Fuzzy Matching
# Enable fuzzy matching
redactor = Redactor(fuzzy_mapping=1)
text = """
John Smith is the CEO.
Jon Smith signed the document.
"""
redacted_text, mappings = redactor.redact(text)
# Similar names will use the same replacement
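One way to picture "similar names use the same replacement" is a similarity check against names that already have a placeholder. The sketch below uses `difflib.SequenceMatcher` from the standard library. It is an assumption-laden illustration, not the library's algorithm: `assign_placeholders` and the `0.85` threshold are hypothetical, and the meaning of `fuzzy_mapping=1` is not specified in this README.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def assign_placeholders(
    names: List[str], entity: str = "PERSON", threshold: float = 0.85
) -> Dict[str, str]:
    """Map each name to a placeholder, reusing one when a close match exists."""
    canonical: List[str] = []        # first-seen spelling behind each placeholder
    assigned: Dict[str, str] = {}    # original name -> placeholder
    for name in names:
        key = name.casefold()
        for index, known in enumerate(canonical, start=1):
            # ratio() is in [0, 1]; 1.0 means the two strings are identical.
            if SequenceMatcher(None, key, known).ratio() >= threshold:
                assigned[name] = f"{entity}_{index}"
                break
        else:
            # No sufficiently similar name seen yet: allocate a new placeholder.
            canonical.append(key)
            assigned[name] = f"{entity}_{len(canonical)}"
    return assigned


placeholders = assign_placeholders(["John Smith", "Jon Smith", "Alice Jones"])
print(placeholders)
# Output: {'John Smith': 'PERSON_1', 'Jon Smith': 'PERSON_1', 'Alice Jones': 'PERSON_2'}
```

The threshold is a trade-off: a lower value merges more misspellings but also risks collapsing two different people with similar names into one placeholder.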
Supported Entity Types
Default entities include:
- PERSON
- EMAIL_ADDRESS
- PHONE_NUMBER
- CREDIT_CARD
- DATE_TIME
- LOCATION
- ORGANIZATION
Development
Setup Development Environment
# Clone the repository
git clone https://github.com/xaisr/redactor.git
cd redactor
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install development dependencies
pip install -r requirements-dev.txt
# Install package in editable mode
pip install -e .
Run Tests
pytest
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built on top of Microsoft Presidio
- Uses spaCy for NLP tasks
File details
Details for the file redactor-0.1.0.tar.gz.
File metadata
- Download URL: redactor-0.1.0.tar.gz
- Upload date:
- Size: 18.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | faa21ad252084d1ce956380c9139a4ca63db6b6a0a87d2676957084c3b3d3430 |
| MD5 | 60ab511b2dc0206ba293cbeeb533610b |
| BLAKE2b-256 | d44a27209e3d2f08c47b39a5c33308e9dc6d0eae6f974396a4170380c70b8ed1 |
File details
Details for the file redactor-0.1.0-py3-none-any.whl.
File metadata
- Download URL: redactor-0.1.0-py3-none-any.whl
- Upload date:
- Size: 17.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0caf9057caacc3b2ff6cde28a5f30794f38cfe584c8937c049415ad6a34fd6d2 |
| MD5 | ba469f5df386687d1687176c779a53dd |
| BLAKE2b-256 | 486284243103d9f3e324f3f729c718784edb0f7be731549e5bb5d7b3c6f422b8 |