
Python bindings for privacy-filter.cpp — fast GGML PII/NER token classification


privacy-filter

Fast PII/NER detection for Python — thin bindings over privacy-filter.cpp, a minimal GGML inference engine for OpenAI's privacy-filter token-classification models.


Detect names, emails, phone numbers and other PII with precise UTF-8 byte offsets, far faster than a stock Hugging Face Transformers pipeline. The upstream pf library and ggml are statically linked into a single compiled extension via nanobind + scikit-build-core, so an installed wheel is fully self-contained.
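
Offsets count UTF-8 bytes, not Python `str` characters, so they stop matching `str` indices once the text contains non-ASCII characters. Below is a minimal sketch of converting a byte span into `str` indices. `byte_to_char_span` is a hypothetical helper and is not part of the package:

```python
def byte_to_char_span(text: str, start: int, end: int) -> tuple[int, int]:
    """Map a UTF-8 byte span [start, end) of `text` to str indices (hypothetical helper)."""
    raw = text.encode("utf-8")
    # Decoding the byte prefix counts the characters before `start`; this raises
    # UnicodeDecodeError if an offset lands inside a multi-byte character.
    char_start = len(raw[:start].decode("utf-8"))
    char_end = char_start + len(raw[start:end].decode("utf-8"))
    return char_start, char_end


text = "Zoë Müller wrote to zoe@example.com"
# "ë" and "ü" take two bytes each, so "Müller" is bytes 5..12 but characters 4..10.
start, end = byte_to_char_span(text, 5, 12)
print(start, end, text[start:end])  # 4 10 Müller
print(repr(text[5:12]))            # 'üller w': slicing the str with byte offsets is wrong
```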

Features

  • 🔎 Entity spans with byte offsets — every detection carries start/end/score/label.
  • 🧩 Merging & dedup helpers — fold token-level spans into PERSON / MONEY / ADDRESS and collapse repeats.
  • 🌍 Multilingual — works on non-ASCII text, offsets stay correct.
  • 📦 Self-contained wheels — one abi3 wheel per platform, no external libggml to locate.
  • Releases the GIL during inference; load a model once and reuse it.
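
The sketch below shows what "folding" and "dedup" mean in practice. It uses a hypothetical `StubEntity` with the same fields as `Entity`. `merge_names` and `dedupe` are illustrative assumptions, not the package's `merge_entities()` / `dedupe_entities()`, whose exact rules and signatures are documented in docs/privacy-filter.md. Only the FIRSTNAME + LASTNAME → PERSON case is covered here:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StubEntity:
    """Hypothetical stand-in with the documented Entity fields (byte offsets)."""
    start: int
    end: int
    score: float
    label: str


def merge_names(source: str, ents: list[StubEntity]) -> list[StubEntity]:
    """Fold FIRSTNAME + LASTNAME separated only by whitespace into one PERSON span."""
    raw = source.encode("utf-8")
    out: list[StubEntity] = []
    for e in sorted(ents, key=lambda x: x.start):
        prev = out[-1] if out else None
        if (
            prev is not None
            and prev.label == "FIRSTNAME"
            and e.label == "LASTNAME"
            and prev.end <= e.start
            and raw[prev.end:e.start].strip() == b""
        ):
            # Assumption: the merged span keeps the weaker of the two scores.
            out[-1] = StubEntity(prev.start, e.end, min(prev.score, e.score), "PERSON")
        else:
            out.append(e)
    return out


def dedupe(ents: list[StubEntity]) -> list[StubEntity]:
    """Collapse entities with identical (start, end, label), keeping the highest score."""
    best: dict[tuple[int, int, str], StubEntity] = {}
    for e in ents:
        key = (e.start, e.end, e.label)
        if key not in best or e.score > best[key].score:
            best[key] = e
    return sorted(best.values(), key=lambda x: x.start)


src = "Contact Jane Doe at jane.doe@acme.com"
found = [
    StubEntity(8, 12, 0.589, "FIRSTNAME"),
    StubEntity(13, 16, 0.855, "LASTNAME"),
    StubEntity(20, 37, 0.991, "EMAIL"),
    StubEntity(20, 37, 0.950, "EMAIL"),  # duplicate, e.g. from overlapping windows
]
for e in dedupe(merge_names(src, found)):
    print(e.label, src.encode("utf-8")[e.start:e.end].decode("utf-8"))
# PERSON Jane Doe
# EMAIL jane.doe@acme.com
```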

Scope: this release is CPU-only and targets CPython 3.12+. Prebuilt wheels are published for Linux x86_64 (manylinux) and macOS arm64 (Apple Silicon).
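
If you are unsure whether a prebuilt wheel exists for your environment, you can check the scope rule above from Python. In this small sketch, the `has_prebuilt_wheel` helper is hypothetical. It encodes only the two published platforms and the CPython 3.12+ floor, and it assumes the values that `platform.system()` and `platform.machine()` usually report on those platforms:

```python
import platform
import sys

# Published wheel targets, as (platform.system(), platform.machine()) pairs.
PREBUILT = {("Linux", "x86_64"), ("Darwin", "arm64")}


def has_prebuilt_wheel(system: str, machine: str, version: tuple[int, int]) -> bool:
    """Return True if a prebuilt abi3 wheel should cover this interpreter and platform."""
    return version >= (3, 12) and (system, machine) in PREBUILT


if __name__ == "__main__":
    ok = has_prebuilt_wheel(platform.system(), platform.machine(), sys.version_info[:2])
    print("prebuilt wheel available" if ok else "no prebuilt wheel; build from source")
```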

Installation

From PyPI (coming soon)

🚧 Not yet published. Once released, install with:

pip install pii-inference

From a prebuilt wheel (GitHub Releases)

Each release attaches abi3 wheels for Linux x86_64 and macOS arm64. Grab the one for your platform; for example, the macOS arm64 wheel:

pip install https://github.com/solipsy/pii-inference/releases/download/v0.1.0/privacy_filter-0.1.0-cp312-abi3-macosx_11_0_arm64.whl

From source

Building from source compiles the C++ engine, so you need a C++17 compiler. The build also needs CMake ≥ 3.21 and Ninja, but it fetches them automatically if they are missing. The upstream engine and its nested ggml are git submodules:

git clone --recursive https://github.com/solipsy/pii-inference.git
cd pii-inference
pip install .          # or: uv sync && uv pip install -e . --no-build-isolation

Already cloned without --recursive? Run git submodule update --init --recursive.

Quick start

You supply a GGUF model at runtime (it is not bundled — see Getting a model):

from privacy_filter import PrivacyFilter

text = "Contact Jane Doe at jane.doe@acme.com or +1-202-555-0142."

with PrivacyFilter("model.gguf", device="cpu", n_threads=0) as pf:
    for e in pf.classify(text, threshold=0.5):
        print(f"{e.label:12} {e.score:.3f}  {e.text(text)!r}")

Output:

FIRSTNAME    0.589  'Jane'
LASTNAME     0.855  'Doe'
EMAIL        0.991  'jane.doe@acme.com'
PHONE        0.987  '+1-202-555-0142'

Entity exposes .start/.end (UTF-8 byte offsets), .score, .label, and .text(source). The package also provides merge_entities() / dedupe_entities() post-processing helpers, plus support for tokenization, long-document windowing, and device selection.
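
Offsets are byte positions, so the safest way to redact is on the encoded bytes. In this minimal sketch, `StubEntity` and `redact` are hypothetical names. The stub only mirrors the documented `.start`, `.end`, `.score` and `.label` fields, and the offsets match the Quick start sentence:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StubEntity:
    """Hypothetical stand-in for Entity; start/end are UTF-8 byte offsets."""
    start: int
    end: int
    score: float
    label: str


def redact(source: str, ents: list[StubEntity]) -> str:
    """Replace each entity span with [LABEL], slicing bytes so non-ASCII text stays aligned."""
    raw = source.encode("utf-8")
    out = bytearray()
    cursor = 0
    for e in sorted(ents, key=lambda x: x.start):
        if e.start < cursor:
            continue  # overlaps a span that was already redacted
        out += raw[cursor:e.start]
        out += f"[{e.label}]".encode("utf-8")
        cursor = e.end
    out += raw[cursor:]
    return out.decode("utf-8")


text = "Contact Jane Doe at jane.doe@acme.com or +1-202-555-0142."
found = [
    StubEntity(8, 12, 0.589, "FIRSTNAME"),
    StubEntity(13, 16, 0.855, "LASTNAME"),
    StubEntity(20, 37, 0.991, "EMAIL"),
    StubEntity(41, 56, 0.987, "PHONE"),
]
print(redact(text, found))
# Contact [FIRSTNAME] [LASTNAME] at [EMAIL] or [PHONE].
```

`redact` only reads `.start`, `.end` and `.label`, so it should also accept the `Entity` objects returned by `pf.classify(text)`, though that is an untested assumption. Overlapping spans are skipped rather than merged; if your results can overlap, dedupe them first.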

Documentation

Full usage and API reference live in docs/privacy-filter.md:

  • Quick start · Detecting entities: classify, thresholds
  • Byte offsets & redaction: non-ASCII-safe slicing
  • Merging & deduplication: PERSON / MONEY / ADDRESS, dedup
  • Tokenization · Windowing: lower-level access
  • API reference: PrivacyFilter, Entity, Span, functions
  • Building wheels: cibuildwheel, CI
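
Long documents presumably need windowing because the model sees a bounded number of tokens at once. The package handles this for you. As a generic illustration of the idea only (not the package's implementation or its parameters), here is how overlapping token windows can be computed so that tokens near a boundary are also seen with context from the neighbouring window. `windows` is a hypothetical helper:

```python
def windows(n_tokens: int, size: int, overlap: int) -> list[tuple[int, int]]:
    """Cover token positions [0, n_tokens) with [start, end) windows of at most
    `size` tokens, each sharing `overlap` tokens with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    spans: list[tuple[int, int]] = []
    start = 0
    while start < n_tokens:
        end = min(start + size, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        # Step back by `overlap` so boundary tokens are classified twice.
        start = end - overlap
    return spans


print(windows(10, 4, 1))  # [(0, 4), (3, 7), (6, 10)]
```

Entities found inside an overlap can be reported by both windows, which is one reason a dedup pass is useful afterwards.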

Testing

uv run pytest                                   # model-free tests
PF_TEST_MODEL=/path/to/model.gguf uv run pytest # full suite incl. classify/tokenize
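
You can reproduce the same split between model-free and model-dependent tests in your own suite by keying off PF_TEST_MODEL. Here is a minimal sketch using the standard-library unittest module; pytest also collects unittest.TestCase classes. The test names and assertions are illustrative, not the project's actual tests:

```python
import os
import unittest

# Model-dependent tests run only when PF_TEST_MODEL points at a GGUF file.
MODEL_PATH = os.environ.get("PF_TEST_MODEL")


class ModelFreeTests(unittest.TestCase):
    def test_byte_slice_roundtrip(self):
        # Needs no model: byte offsets slice UTF-8 text cleanly.
        text = "Zoë Müller"
        self.assertEqual(text.encode("utf-8")[5:12].decode("utf-8"), "Müller")


@unittest.skipUnless(MODEL_PATH, "set PF_TEST_MODEL=/path/to/model.gguf")
class ModelTests(unittest.TestCase):
    def test_classify_finds_email(self):
        # Imported here so the model-free tests do not require the extension.
        from privacy_filter import PrivacyFilter

        text = "Contact Jane Doe at jane.doe@acme.com or +1-202-555-0142."
        with PrivacyFilter(MODEL_PATH, device="cpu", n_threads=0) as pf:
            labels = {e.label for e in pf.classify(text, threshold=0.5)}
        self.assertIn("EMAIL", labels)
```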

Maintainers: release/publishing steps live in PUBLISHING.md.

Roadmap

  • Publish to PyPI
  • Windows wheel
  • Optional GPU builds (CUDA / Vulkan / Metal)
  • Expose per-token logits (needs a small upstream addition)

License

MIT — see LICENSE. Bindings for the upstream privacy-filter.cpp project.



Download files


Source Distributions

No source distribution files are available for this release.

Built Distributions


pii_inference-0.1.1-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (1.9 MB)

Uploaded for CPython 3.12+ · manylinux: glibc 2.27+ x86-64 · manylinux: glibc 2.28+ x86-64

pii_inference-0.1.1-cp312-abi3-macosx_11_0_arm64.whl (1.5 MB)

Uploaded for CPython 3.12+ · macOS 11.0+ ARM64

File details

Details for the file pii_inference-0.1.1-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.


File hashes

Hashes for pii_inference-0.1.1-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 1614da06bf90d0a001a843479d472e82aeabe1d3707759a326209fa3be1c290d
MD5 4e17c378a0136c24a870f2462de328a9
BLAKE2b-256 b77347d85ffff89f55bc9d130325e00a6b29e97c83efdee033c7f2567641f300


Provenance

The following attestation bundles were made for pii_inference-0.1.1-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl:

Publisher: wheels.yml on solipsy/pii-inference


File details

Details for the file pii_inference-0.1.1-cp312-abi3-macosx_11_0_arm64.whl.


File hashes

Hashes for pii_inference-0.1.1-cp312-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 2359dfeea1c3b84d5e8c46bda19a90d84e706a0071acf551f83dc3d38824b05c
MD5 1459229084a31ca1fcb6365cb1dbc0a8
BLAKE2b-256 86f380b3612db733e4d5936bc8264c6acf8ee5b1b6cba6fb2402f7f5eb1553c6


Provenance

The following attestation bundles were made for pii_inference-0.1.1-cp312-abi3-macosx_11_0_arm64.whl:

Publisher: wheels.yml on solipsy/pii-inference
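
To check a downloaded wheel against the SHA256 digests listed above, you can use a short standard-library sketch like the one below. The `sha256_of` and `verify_wheel` names are hypothetical:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the lowercase hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_wheel(path: Path, expected_sha256: str) -> bool:
    """Compare a file's SHA-256 with a published digest (case and whitespace tolerant)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, `verify_wheel(Path("pii_inference-0.1.1-cp312-abi3-macosx_11_0_arm64.whl"), "2359dfeea1c3b84d5e8c46bda19a90d84e706a0071acf551f83dc3d38824b05c")` should return True for an intact download of that file. pip's hash-checking mode (`--require-hashes`) performs the same check during installation.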
