Python bindings for privacy-filter.cpp — fast GGML PII/NER token classification
privacy-filter
Fast PII/NER detection for Python: thin bindings over privacy-filter.cpp, a minimal GGML inference engine for OpenAI's privacy-filter token-classification models.
Detect names, emails, phone numbers and other PII with precise UTF-8 byte offsets, far faster than a stock Hugging Face Transformers pipeline. The upstream pf library and ggml are statically linked into a single compiled extension via nanobind + scikit-build-core, so an installed wheel is fully self-contained.
Features
- 🔎 Entity spans with byte offsets: every detection carries start/end/score/label.
- 🧩 Merging & dedup helpers: fold token-level spans into PERSON/MONEY/ADDRESS and collapse repeats.
- 🌍 Multilingual: works on non-ASCII text, offsets stay correct.
- 📦 Self-contained wheels: one abi3 wheel per platform, no external libggml to locate.
- ⚡ Releases the GIL during inference; load a model once and reuse it.
Scope: this release is CPU-only and targets CPython 3.12+. Prebuilt wheels are published for Linux x86_64 (manylinux) and macOS arm64 (Apple Silicon).
Installation
From PyPI (coming soon)
🚧 Not yet published. Once released, install with:
pip install pii-inference
From a prebuilt wheel (GitHub Releases)
Each release attaches abi3 wheels
for Linux x86_64 and macOS arm64. Grab the one for your platform:
pip install https://github.com/solipsy/pii-inference/releases/download/v0.1.0/privacy_filter-0.1.0-cp312-abi3-macosx_11_0_arm64.whl
From source
Building from source compiles the C++ engine, so you need a C++17 compiler.
CMake ≥ 3.21 and Ninja are also required, but the build fetches them automatically. The upstream
engine and its nested ggml are git submodules:
git clone --recursive https://github.com/solipsy/pii-inference.git
cd pii-inference
pip install . # or: uv sync && uv pip install -e . --no-build-isolation
Already cloned without --recursive? Run git submodule update --init --recursive.
Quick start
You supply a GGUF model at runtime (it is not bundled — see Getting a model):
from privacy_filter import PrivacyFilter

text = "Contact Jane Doe at jane.doe@acme.com or +1-202-555-0142."
with PrivacyFilter("model.gguf", device="cpu", n_threads=0) as pf:
    for e in pf.classify(text, threshold=0.5):
        print(f"{e.label:12} {e.score:.3f} {e.text(text)!r}")

FIRSTNAME    0.589 'Jane'
LASTNAME     0.855 'Doe'
EMAIL        0.991 'jane.doe@acme.com'
PHONE        0.987 '+1-202-555-0142'
Entity exposes .start/.end (UTF-8 byte offsets), .score, .label, and
.text(source). There are also merge_entities() / dedupe_entities() post-processing
helpers, tokenization, long-document windowing, and device selection.
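
Because the offsets are UTF-8 byte offsets rather than character indices, slicing a Python str with them goes wrong as soon as the text contains non-ASCII characters. The sketch below shows one way to redact by byte offset. It does not use the library: Hit, byte_span and redact are hypothetical names introduced here, and the spans are computed by hand to stand in for real detections.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical stand-in for a detection: UTF-8 byte offsets plus a label."""
    start: int  # inclusive byte offset
    end: int    # exclusive byte offset
    label: str


def byte_span(text: str, sub: str) -> tuple[int, int]:
    """Return the UTF-8 byte offsets of the first occurrence of sub in text."""
    start = len(text[: text.index(sub)].encode("utf-8"))
    return start, start + len(sub.encode("utf-8"))


def redact(text: str, hits: list[Hit]) -> str:
    """Replace each span with [LABEL], working on the encoded bytes."""
    data = text.encode("utf-8")
    out = bytearray()
    cursor = 0
    for hit in sorted(hits, key=lambda h: h.start):
        if hit.start < cursor:
            continue  # skip spans that overlap one already redacted
        out += data[cursor:hit.start]
        out += f"[{hit.label}]".encode("utf-8")
        cursor = hit.end
    out += data[cursor:]
    return out.decode("utf-8")


text = "Zoë Müller <zoe@example.com>"
hits = [
    Hit(*byte_span(text, "Zoë"), "FIRSTNAME"),
    Hit(*byte_span(text, "Müller"), "LASTNAME"),
    Hit(*byte_span(text, "zoe@example.com"), "EMAIL"),
]
print(redact(text, hits))  # [FIRSTNAME] [LASTNAME] <[EMAIL]>
```

Here "Müller" occupies bytes 5 to 12, while text[5:12] on the str would return 'üller <', which is why the sketch encodes first and slices the bytes.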
Documentation
Full usage and API reference live in docs/privacy-filter.md:
| Topic | Covers |
|---|---|
| Quick start · Detecting entities | classify, thresholds |
| Byte offsets & redaction | non-ASCII-safe slicing |
| Merging & deduplication | PERSON / MONEY / ADDRESS, dedup |
| Tokenization · Windowing | lower-level access |
| API reference | PrivacyFilter, Entity, Span, functions |
| Building wheels | cibuildwheel, CI |
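
To give a feel for the merging and deduplication topic in the table above, here is a rough, library-independent sketch of folding adjacent FIRSTNAME/LASTNAME spans into one PERSON span and dropping exact repeats. It is not the implementation behind merge_entities() / dedupe_entities(). Detection, merge_person, dedupe, the max_gap parameter and the choice to keep the lower score are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical span record; offsets are UTF-8 byte offsets."""
    start: int
    end: int
    label: str
    score: float


# Assumed mapping: which fine-grained labels fold into PERSON.
PERSON_PARTS = {"FIRSTNAME", "LASTNAME"}


def merge_person(spans, max_gap=1):
    """Fold name parts separated by at most max_gap bytes into PERSON spans."""
    merged = []
    for s in sorted(spans, key=lambda d: d.start):
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev.label == "PERSON"
            and s.label in PERSON_PARTS
            and s.start - prev.end <= max_gap
        ):
            # Extend the previous PERSON span; keep the lower score (assumption).
            merged[-1] = Detection(
                prev.start, max(prev.end, s.end), "PERSON", min(prev.score, s.score)
            )
        elif s.label in PERSON_PARTS:
            merged.append(Detection(s.start, s.end, "PERSON", s.score))
        else:
            merged.append(s)
    return merged


def dedupe(spans):
    """Drop spans whose (start, end, label) has already been seen."""
    seen, out = set(), []
    for s in spans:
        key = (s.start, s.end, s.label)
        if key not in seen:
            seen.add(key)
            out.append(s)
    return out


# Byte offsets into "Contact Jane Doe at jane.doe@acme.com"
spans = [
    Detection(8, 12, "FIRSTNAME", 0.589),
    Detection(13, 16, "LASTNAME", 0.855),
    Detection(20, 37, "EMAIL", 0.991),
    Detection(20, 37, "EMAIL", 0.991),  # repeated detection
]
result = merge_person(dedupe(spans))
```

With these inputs, result holds one PERSON span covering bytes 8 to 16 and a single EMAIL span.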
Testing
uv run pytest # model-free tests
PF_TEST_MODEL=/path/to/model.gguf uv run pytest # full suite incl. classify/tokenize
Maintainers: release/publishing steps live in PUBLISHING.md.
Roadmap
- Publish to PyPI
- Windows wheel
- Optional GPU builds (CUDA / Vulkan / Metal)
- Expose per-token logits (needs a small upstream addition)
License
MIT — see LICENSE. Bindings for the upstream
privacy-filter.cpp project.
Download files
Built Distributions
File details
Details for the file pii_inference-0.1.1-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: pii_inference-0.1.1-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 1.9 MB
- Tags: CPython 3.12+, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1614da06bf90d0a001a843479d472e82aeabe1d3707759a326209fa3be1c290d |
| MD5 | 4e17c378a0136c24a870f2462de328a9 |
| BLAKE2b-256 | b77347d85ffff89f55bc9d130325e00a6b29e97c83efdee033c7f2567641f300 |
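
A downloaded wheel can be checked against the SHA256 digest listed above before installing it. The following is a minimal sketch using only the standard library; sha256_of and verify are hypothetical helper names, and the local file path in the usage comment is an assumption about where the wheel was saved.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA256 matches the published digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage (assumes the wheel was saved in the current directory):
# verify(
#     "pii_inference-0.1.1-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
#     "1614da06bf90d0a001a843479d472e82aeabe1d3707759a326209fa3be1c290d",
# )
```

pip's hash-checking mode (--require-hashes with a requirements file) is an alternative that enforces the same check at install time.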
Provenance
The following attestation bundles were made for pii_inference-0.1.1-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl:
Publisher: wheels.yml on solipsy/pii-inference
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: pii_inference-0.1.1-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
  - Subject digest: 1614da06bf90d0a001a843479d472e82aeabe1d3707759a326209fa3be1c290d
- Sigstore transparency entry: 2036496287
- Sigstore integration time:
- Permalink: solipsy/pii-inference@195725bb5468a87e2592182041d2f66701e98eea
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/solipsy
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: wheels.yml@195725bb5468a87e2592182041d2f66701e98eea
- Trigger Event: push
File details
Details for the file pii_inference-0.1.1-cp312-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: pii_inference-0.1.1-cp312-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 1.5 MB
- Tags: CPython 3.12+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2359dfeea1c3b84d5e8c46bda19a90d84e706a0071acf551f83dc3d38824b05c |
| MD5 | 1459229084a31ca1fcb6365cb1dbc0a8 |
| BLAKE2b-256 | 86f380b3612db733e4d5936bc8264c6acf8ee5b1b6cba6fb2402f7f5eb1553c6 |
Provenance
The following attestation bundles were made for pii_inference-0.1.1-cp312-abi3-macosx_11_0_arm64.whl:
Publisher: wheels.yml on solipsy/pii-inference
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: pii_inference-0.1.1-cp312-abi3-macosx_11_0_arm64.whl
  - Subject digest: 2359dfeea1c3b84d5e8c46bda19a90d84e706a0071acf551f83dc3d38824b05c
- Sigstore transparency entry: 2036496687
- Sigstore integration time:
- Permalink: solipsy/pii-inference@195725bb5468a87e2592182041d2f66701e98eea
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/solipsy
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: wheels.yml@195725bb5468a87e2592182041d2f66701e98eea
- Trigger Event: push