PyO3 bindings for sci-anonymizer — reversible entity anonymization for LLM round-trips
sci-anonymizer-py
PyO3 bindings for sci-anonymizer — reversible entity anonymization for LLM round-trips.
Built with maturin against abi3 (CPython's stable ABI), so one wheel per platform covers CPython 3.10 and later.
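As an illustration of what the cp310-abi3 tag in the wheel filenames implies, the sketch below splits a wheel filename into its components and checks whether a given CPython version could load it. The helper names `parse_wheel_filename` and `supports_interpreter` are hypothetical and not part of sci_anonymizer; the sketch also assumes the filename has no optional build tag.

```python
def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel filename into name, version, and its three compatibility tags."""
    stem = filename.removesuffix(".whl")
    # Layout assumed here: name-version-python_tag-abi_tag-platform_tag
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def supports_interpreter(tags: dict, major: int, minor: int) -> bool:
    """True if a cp3X-abi3 wheel can be loaded by CPython major.minor."""
    if tags["abi"] != "abi3" or not tags["python"].startswith("cp3"):
        return False
    minimum_minor = int(tags["python"][3:])  # "cp310" -> 10
    return major == 3 and minor >= minimum_minor


tags = parse_wheel_filename("sci_anonymizer-0.2.0-cp310-abi3-macosx_11_0_arm64.whl")
print(tags["python"], tags["abi"], supports_interpreter(tags, 3, 12))
```

With abi3 the python tag states the minimum supported version rather than the only one, which is why no separate cp311 or cp312 wheels are needed.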
Installation (from source)
Prerequisites
- Python 3.10+
- Rust 1.95+
- maturin (install via pipx install maturin or pip install maturin)
Build and install in a venv
cd core/crates/sci-anonymizer-py
# Create or activate a Python venv
python3 -m venv .venv
source .venv/bin/activate # on Windows: .venv\Scripts\activate
# Install maturin if not already present
pip install maturin
# Build and install the wheel into the venv
maturin develop
# Verify installation
python -c "import sci_anonymizer; print(sci_anonymizer.SESSION_FORMAT_VERSION)"
Quick Start
from sci_anonymizer import (
anonymize, deanonymize,
Entity, EntityType,
)
# Anonymize text
text = "Email casey@example.com about the Acme project deal"
result = anonymize(text)
print(result.text)
# Output: Email EMAIL_1 about the PROJECT_1 deal
# Deanonymize (reverse the tokens back to real entities)
model_reply = "I'll contact EMAIL_1 about PROJECT_1 next week"
restored = deanonymize(model_reply, result.token_map)
print(restored)
# Output: I'll contact casey@example.com about Acme next week
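# Use custom entities (domain-specific terms).
# Per the API overview, custom entities go through anonymize_with_custom.
from sci_anonymizer import anonymize_with_custom
custom = [Entity("InternalCodeXYZ", EntityType.Secret)]
result = anonymize_with_custom(text, custom_entities=custom)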
API Overview
Core Functions
- anonymize(text, existing=None): Detect entities and replace with tokens.
- anonymize_with_custom(text, existing=None, custom_entities=None): Same, plus custom entities.
- deanonymize(text, token_map): Reverse: tokens → entities.
- build_token_map(entities, existing=None): Lower-level: build a token map from entities.
- apply_token_map(text, token_map): Lower-level: apply substitutions to text.
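To make the division of labour between these functions concrete, here is a pure-Python toy model of the same idea: build a map from entities to numbered tokens, apply it, and reverse it. It is a sketch only, not the library's implementation. The `toy_*` names, the email regex, and the use of a plain dict are all assumptions made for illustration.

```python
import re

# Illustrative email pattern; the real detector's patterns are not shown here.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def toy_build_token_map(entities, existing=None):
    """Assign TYPE_N tokens to (text, type) pairs, reusing prior assignments."""
    token_map = dict(existing or {})  # entity text -> token
    counters = {}
    for token in token_map.values():
        prefix, _, number = token.rpartition("_")
        counters[prefix] = max(counters.get(prefix, 0), int(number))
    for text, entity_type in entities:
        if text in token_map:
            continue  # same entity keeps the token it already has
        counters[entity_type] = counters.get(entity_type, 0) + 1
        token_map[text] = f"{entity_type}_{counters[entity_type]}"
    return token_map


def toy_apply_token_map(text, token_map):
    """Replace entities with tokens, longest entity first to avoid partial hits."""
    for entity in sorted(token_map, key=len, reverse=True):
        text = text.replace(entity, token_map[entity])
    return text


def toy_deanonymize(text, token_map):
    """Replace tokens with entities, longest token first (EMAIL_10 before EMAIL_1)."""
    reverse = {token: entity for entity, token in token_map.items()}
    for token in sorted(reverse, key=len, reverse=True):
        text = text.replace(token, reverse[token])
    return text


text = "Email casey@example.com about the launch"
entities = [(m.group(), "EMAIL") for m in EMAIL_RE.finditer(text)]
token_map = toy_build_token_map(entities)
masked = toy_apply_token_map(text, token_map)
print(masked)                              # Email EMAIL_1 about the launch
print(toy_deanonymize(masked, token_map))  # Email casey@example.com about the launch
```

Passing an earlier map as `existing` is what keeps tokens stable across calls in this toy model, mirroring the role of the `existing` parameter described above.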
Types
- EntityType: Enum with members Person, Place, Org, Project, Email, Phone, Url, Handle, Secret, IpAddress.
- Entity: A detected span, constructed as Entity(text, entity_type).
- TokenMap: Bidirectional mapping. Can serialize/deserialize:
  - token_map.to_session_json() → JSON string (versioned envelope).
  - TokenMap.from_session_json(json_str) → TokenMap (raises ValueError if the version is unsupported).
- AnonymizeResult: Output of anonymize* with .text, .token_map, .entity_count, .entities.
Constants
- SESSION_FORMAT_VERSION: Current session format version (int). See the session serialization contract in ../sci-anonymizer/API.md.
Session Persistence
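from sci_anonymizer import TokenMap, anonymize
# Serialize a token map for storage (result comes from the Quick Start above)
json_str = result.token_map.to_session_json()
# Save json_str to disk/database
# Later, restore and extend
token_map = TokenMap.from_session_json(json_str)
new_text = "Follow up with casey@example.com"  # any later message in the same session
next_result = anonymize(new_text, existing=token_map)
# Same entity will get the same token as before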
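The "versioned envelope" idea can be sketched in plain Python as follows. This is an assumption-laden illustration, not the actual session format: the field names `version` and `tokens`, the value of `TOY_FORMAT_VERSION`, and the `toy_*` helpers are all hypothetical, and the real contract lives in ../sci-anonymizer/API.md.

```python
import json

TOY_FORMAT_VERSION = 1  # hypothetical stand-in for SESSION_FORMAT_VERSION


def toy_to_session_json(token_map: dict) -> str:
    """Wrap the mapping in an envelope that records the format version."""
    envelope = {"version": TOY_FORMAT_VERSION, "tokens": token_map}
    return json.dumps(envelope, sort_keys=True)


def toy_from_session_json(payload: str) -> dict:
    """Reject envelopes written by an unknown format version."""
    envelope = json.loads(payload)
    version = envelope.get("version")
    if version != TOY_FORMAT_VERSION:
        raise ValueError(f"unsupported session format version: {version!r}")
    return dict(envelope["tokens"])


saved = toy_to_session_json({"casey@example.com": "EMAIL_1"})
restored = toy_from_session_json(saved)
print(restored)
```

Checking the version before touching the payload means a stored session from an incompatible release fails loudly with ValueError instead of silently producing a wrong mapping.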
Testing
Run the Python smoke test:
cd core/crates/sci-anonymizer-py
python tests/test_smoke.py
The smoke test validates:
- Round-trip fidelity: deanonymize(anonymize(text).text, map) == text
- Multiple entity types detected correctly
- Session serialization/deserialization
- Custom entities
Limitations
This binding wraps the portable regex and CamelCase entity detection from sci-anonymizer. It does not include:
- NLP NER (Named Entity Recognition for PERSON/PLACE/ORG): Tracked in SCI-123. The Rust port uses a CamelCase heuristic to catch compound proper nouns, but bare "John Doe" style names are not detected without an NER model.
- Custom entity loading from identity_facts: Tracked in SCI-124. Users supply custom entities via the custom_entities parameter.
For production use with full NER, integrate with the Rust core directly or patch this layer with the SCI-123/124 implementations when available.
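The gap between a CamelCase heuristic and real NER can be seen with a small regex experiment. The pattern below is a hypothetical approximation written for this example, not the pattern used by sci-anonymizer.

```python
import re

# Hypothetical heuristic: a single word made of at least two capitalized segments.
CAMEL_CASE_RE = re.compile(r"\b[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]*)+\b")


def find_camel_case(text: str) -> list:
    """Return compound CamelCase words; plain capitalized words are ignored."""
    return CAMEL_CASE_RE.findall(text)


sample = "Ask John Doe about InternalCodeXYZ and DataPipeline"
print(find_camel_case(sample))  # ['InternalCodeXYZ', 'DataPipeline']
```

"John" and "Doe" are each a single capitalized segment, so a heuristic of this kind passes over them, which is the limitation described for bare personal names.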
License
Licensed under Apache-2.0 OR MIT, same as sci-anonymizer.
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
File details
Details for the file sci_anonymizer-0.2.0.tar.gz.
File metadata
- Download URL: sci_anonymizer-0.2.0.tar.gz
- Upload date:
- Size: 99.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.14.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5220eca5db3f691918efe9a071597c700c226e2acbe2300aa7b631097d35b1f6 |
| MD5 | 697d710016bcce385ca4c5e5c4756b95 |
| BLAKE2b-256 | 3be29f1a08e12af5299d7d6d38a78008c3d8a25069e43b4bb9c9393775c5f1b8 |
File details
Details for the file sci_anonymizer-0.2.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: sci_anonymizer-0.2.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 693.3 kB
- Tags: CPython 3.10+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.14.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c90122230408bf265e2fab5167e1530519ae426e7c63744320a79eb67cabbf58 |
| MD5 | a82ebb6fa35b47146bc14add0ab2bc2d |
| BLAKE2b-256 | 81cac5d05582e97b8c6e227f1c8b96459a0aacc469d610d7884378a9ac0e2944 |
File details
Details for the file sci_anonymizer-0.2.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: sci_anonymizer-0.2.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 682.2 kB
- Tags: CPython 3.10+, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.14.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4011b7afa886f3d8ae2b1d883e0b1500c1c60622dac59ac48e2a1acfbae35cb6 |
| MD5 | e508bf3393781d88ef863ddc84654f6c |
| BLAKE2b-256 | 2ae4c5c4c62f9d3e4fa97dfe7d3fff4789986bc2c5af661b62929f9af2cc6876 |
File details
Details for the file sci_anonymizer-0.2.0-cp310-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: sci_anonymizer-0.2.0-cp310-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 596.5 kB
- Tags: CPython 3.10+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.14.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fb88ab9b142e30fb8907fb090568b57307d9e3bfb4c80971745d90bfa1357992 |
| MD5 | a2008aecffcb14fdae37c36e7f6ebd1a |
| BLAKE2b-256 | 00e55e343b30453dcf1546354e08edd68d671828094e8e31df3275ed5de9684a |
File details
Details for the file sci_anonymizer-0.2.0-cp310-abi3-macosx_10_12_x86_64.whl.
File metadata
- Download URL: sci_anonymizer-0.2.0-cp310-abi3-macosx_10_12_x86_64.whl
- Upload date:
- Size: 622.0 kB
- Tags: CPython 3.10+, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.14.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 884aaf24dacc69e526dcc451077922397219b06444d6ab9bc3f5dc9db9f710d8 |
| MD5 | 69b25a40767eeaf06da0486d1a448e25 |
| BLAKE2b-256 | c454b6cee1b59362176d658bea86b2a6ae6a16c07a081d7a37e12dbf4cfff31f |
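A downloaded file can be checked against the SHA256 digests listed above with the standard library alone. The following is a minimal sketch; the helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the demo uses a throwaway temporary file rather than a real wheel.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large wheels need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with a published hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())


# Demo on a temporary file; substitute a downloaded wheel and its listed digest.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.bin")
    with open(sample, "wb") as fh:
        fh.write(b"example payload")
    expected = hashlib.sha256(b"example payload").hexdigest()
    demo_ok = matches_published_digest(sample, expected)
    demo_bad = matches_published_digest(sample, "0" * 64)

print(demo_ok, demo_bad)  # True False
```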