
PII round-trip for LLM APIs: anonymize before the request, de-anonymize the stream on the way back.


gheim

gheim (Python). PII round-trip for LLM APIs.

Available on PyPI. Requires Python 3.11+. Licensed under Apache 2.0.

Detect PII in text, substitute it with stable sentinels (<PERSON_1>, <EMAIL_2>, ...), send the redacted text to any LLM, and restore the originals on the way back, including in streamed responses. The package is framework-agnostic and ships a drop-in openai client wrapper for zero-effort integration.
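To make the round trip concrete, here is a minimal sketch of stable sentinel substitution in plain Python. It does not use gheim's detectors: a toy email regex stands in for detection, and the names ToySession, anonymize, and deanonymize are hypothetical illustrations, not the package API (gheim's real Session may work differently).

```python
import re
from dataclasses import dataclass, field

# Toy stand-in for a real PII detector: matches simple email addresses only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
SENTINEL_RE = re.compile(r"<[A-Z_]+_\d+>")


@dataclass
class ToySession:
    """Hypothetical mapping store, for illustration only."""
    forward: dict = field(default_factory=dict)   # original value -> sentinel
    backward: dict = field(default_factory=dict)  # sentinel -> original value

    def sentinel_for(self, label: str, value: str) -> str:
        # Reuse an existing sentinel so the same value maps stably across calls.
        if value not in self.forward:
            sentinel = f"<{label}_{len(self.forward) + 1}>"
            self.forward[value] = sentinel
            self.backward[sentinel] = value
        return self.forward[value]


def anonymize(text: str, session: ToySession) -> str:
    return EMAIL_RE.sub(lambda m: session.sentinel_for("EMAIL", m.group()), text)


def deanonymize(text: str, session: ToySession) -> str:
    # Sentinels the session has never issued are left untouched.
    return SENTINEL_RE.sub(lambda m: session.backward.get(m.group(), m.group()), text)


original = "Mail joel@example.ch or anna@example.ch, then joel@example.ch."
session = ToySession()
redacted = anonymize(original, session)
restored = deanonymize(redacted, session)
print(redacted)
print(restored)
```

Because the session keeps both directions of the mapping, a repeated value receives the same sentinel, which is what lets a multi-turn conversation stay coherent.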

See the monorepo README for the cross-language overview and architecture.

Install

uv add gheim                          # core: pairs with a RemoteDetector or GHEIM_API_KEY
uv add "gheim[local]"                 # + torch and transformers for on-device detection
uv add "gheim[openai]"                # + drop-in OpenAI client
uv add "gheim[local,openai]"          # both

Model choice

LocalDetector runs a token-classification model in-process. Two models are recommended, depending on the deployment context. Both implement the same 33-class BIOES output schema, so they are interchangeable in LocalDetector and the rest of the API.

| Model | Best for | Parameters | Notes |
| --- | --- | --- | --- |
| joelbarmettler/gheim-ch-560m | Swiss-market text (de_CH, fr_CH, it_CH, rm, en) with CH-format account numbers (IBAN, AHV, VAT-CHE) | 560M | Apache 2.0. Test F1 = 0.916 on Swiss text. |
| openai/privacy-filter | English-first or general use, long-context (up to 128k tokens) | 1.4B (50M active, MoE) | Apache 2.0. Wider language coverage, larger weights. |

from gheim import LocalDetector

# Recommended for Swiss-market text:
det = LocalDetector(model_id="joelbarmettler/gheim-ch-560m")

# Alternative for English or general use:
det = LocalDetector(model_id="openai/privacy-filter")

The package default model_id is openai/privacy-filter. Set GHEIM_DEFAULT_MODEL=joelbarmettler/gheim-ch-560m in the environment, or pass model_id explicitly, to opt into the Swiss model.
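The selection rule can be sketched as a small resolver. This is an assumption about behaviour, not gheim's code: resolve_model_id is a hypothetical helper, and the precedence of an explicit argument over the environment variable is assumed rather than stated by the package.

```python
import os
from typing import Optional

PACKAGE_DEFAULT = "openai/privacy-filter"  # default named in the text above


def resolve_model_id(model_id: Optional[str] = None) -> str:
    """Hypothetical helper: explicit argument, then GHEIM_DEFAULT_MODEL, then default."""
    if model_id:
        return model_id
    # An unset or empty variable falls through to the package default.
    return os.environ.get("GHEIM_DEFAULT_MODEL") or PACKAGE_DEFAULT
```

Calling resolve_model_id() with no argument and no environment variable yields the package default; setting GHEIM_DEFAULT_MODEL changes the result without touching call sites.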

Drop-in OpenAI client

from gheim.openai import OpenAI

client = OpenAI()  # same constructor args as openai.OpenAI
r = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hi, my name is Joel"}],
)
# r.choices[0].message.content contains "Joel".
# OpenAI only ever saw "<PERSON_1>".

Streaming:

stream = client.chat.completions.create(..., stream=True)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Async:

import asyncio
from gheim.openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI()
    r = await client.chat.completions.create(...)

asyncio.run(main())  # await is only valid inside a coroutine

Per-call overrides:

from gheim import Session
session = Session()  # reuse across calls for multi-turn coherent sentinels
r = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    gheim_session=session,    # or gheim_detector=...
)

Framework-agnostic

from gheim import Session, LocalDetector, anonymize_text, deanonymize_text

session = Session(detector=LocalDetector(model_id="joelbarmettler/gheim-ch-560m"))
clean = anonymize_text("Hi, my name is Joel", session)
# ... call any LLM with clean ...
final = deanonymize_text(response_text, session)

Streaming deanonymizer:

from gheim import deanonymize_stream
for chunk in deanonymize_stream(my_chunk_iterator, session):
    print(chunk, end="", flush=True)
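A streaming deanonymizer is needed because a sentinel such as <PERSON_1> can arrive split across chunks. The sketch below shows one way to handle that: hold back any trailing fragment that could still grow into a sentinel and release it once it is complete. It is not gheim's implementation. toy_deanonymize_stream is a hypothetical name, and it takes a plain dict instead of a Session.

```python
import re
from typing import Dict, Iterable, Iterator

SENTINEL_RE = re.compile(r"<[A-Z_]+_\d+>")
# A trailing fragment that could still become a sentinel, e.g. "<PER" or "<PERSON_1".
PARTIAL_RE = re.compile(r"<[A-Z_]*\d*$")


def toy_deanonymize_stream(chunks: Iterable[str], mapping: Dict[str, str]) -> Iterator[str]:
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Split the buffer into text that is safe to emit and a held-back tail.
        partial = PARTIAL_RE.search(buffer)
        cut = partial.start() if partial else len(buffer)
        ready, buffer = buffer[:cut], buffer[cut:]
        if ready:
            yield SENTINEL_RE.sub(lambda m: mapping.get(m.group(), m.group()), ready)
    if buffer:
        # The stream ended mid-fragment: emit the leftover text unchanged.
        yield buffer


mapping = {"<PERSON_1>": "Joel", "<EMAIL_2>": "joel@example.ch"}
chunks = ["Hello <PER", "SON_1>, mail <EM", "AIL_2", "> now"]
pieces = list(toy_deanonymize_stream(chunks, mapping))
print("".join(pieces))
```

The trade-off is latency: text after a "<" is delayed until the fragment either completes or stops looking like a sentinel. A production version would also cap how long the held-back tail may grow.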

Chat-message helpers:

from gheim import anonymize_messages

redacted = anonymize_messages(messages, session)  # preserves role, name, tool_call_id

Wrapped endpoints

The drop-in OpenAI / AsyncOpenAI clients automatically protect every text-carrying endpoint: chat.completions, responses, completions (legacy), embeddings, moderations, audio.speech, audio.transcriptions, audio.translations, images.generate, images.edit. Tool-call arguments and SSE delta chunks are restored on the way back. See the monorepo README for the full coverage matrix and the embeddings caveat.

Strict mode

gheim_strict=True (default) raises RuntimeError if you call an unwrapped endpoint (beta.assistants, batches, files, uploads, fine_tuning, vector_stores). The error message names client.raw.<path> as the documented escape hatch.

client = OpenAI(gheim_strict=False)  # downgrade to one-time warnings
client.raw.beta.assistants.create(...)  # always works regardless of strict mode
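One way such a guard could work is an attribute proxy that checks each endpoint name before delegating to the underlying client. The sketch below is illustrative only: StrictProxy, the WRAPPED set, and the stand-in client are hypothetical and do not reflect gheim's internals or its real coverage list.

```python
import warnings
from types import SimpleNamespace

WRAPPED = {"chat", "embeddings"}  # illustrative subset, not the real coverage list


class StrictProxy:
    """Hypothetical guard around a client object."""

    def __init__(self, raw, strict=True):
        self.raw = raw        # escape hatch: always the unguarded client
        self._strict = strict
        self._warned = set()

    def __getattr__(self, name):
        # Called only for attributes not set on the proxy, i.e. endpoint names.
        if name not in WRAPPED:
            message = f"'{name}' is not wrapped; use client.raw.{name} to bypass"
            if self._strict:
                raise RuntimeError(message)
            if name not in self._warned:  # warn once per endpoint
                self._warned.add(name)
                warnings.warn(message, stacklevel=2)
        return getattr(self.raw, name)


# Stand-in for a real API client, with plain strings in place of endpoints.
fake_client = SimpleNamespace(chat="chat-endpoint", files="files-endpoint")
client = StrictProxy(fake_client)
print(client.chat)       # wrapped endpoint passes through
print(client.raw.files)  # raw access bypasses the guard
```

Failing loudly by default means an unprotected endpoint cannot be reached by accident, while client.raw keeps the bypass explicit in the calling code.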

Detector backends

import torch  # needed for the dtype argument below; installed with gheim[local]

from gheim import LocalDetector, RemoteDetector, default_detector

# Local inference. Weights download to the HF cache on first use.
det = LocalDetector(model_id="joelbarmettler/gheim-ch-560m",
                    device="auto", dtype=torch.bfloat16)

# Remote inference against your own gheim-server or api.gheim.ch.
det = RemoteDetector(base_url="http://your-host:8080", api_key="...")

# default_detector() picks remote if GHEIM_API_KEY is set, else local.
det = default_detector()

Composite detector (recommended for production)

For categories whose structure can be verified by checksum (CH-IBAN, AHV, VAT-CHE, credit cards, common token formats), the package ships a regex catalogue under gheim.detectors.composite that pairs with the model detector. The composite detector applies the regexes first, masks the matched spans, then runs the model on the remainder. This pushes effective recall on account_number, private_phone, and private_url close to 1.0 while keeping precision high. The underlying ML model handles person names, addresses, and dates.
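The regex-then-model pipeline can be sketched as follows, using a Swiss IBAN pattern with the standard ISO 13616 mod-97 checksum. This is a minimal illustration, not the API of gheim.detectors.composite: composite_detect, iban_checksum_ok, toy_model, and the "person" label are hypothetical, and the model is replaced by a stub.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (start, end, label)

# Swiss IBAN: "CH", 2 check digits, then 17 digits, optionally grouped by spaces.
CH_IBAN_RE = re.compile(r"\bCH\d{2}(?: ?\d){17}\b")


def iban_checksum_ok(candidate: str) -> bool:
    """Mod-97 check: move the first 4 characters to the end, map A=10..Z=35."""
    compact = candidate.replace(" ", "").upper()
    rearranged = compact[4:] + compact[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def composite_detect(text: str, model: Callable[[str], List[Span]]) -> List[Span]:
    # 1. Regex pass: keep only candidates whose checksum verifies.
    spans = [(m.start(), m.end(), "account_number")
             for m in CH_IBAN_RE.finditer(text) if iban_checksum_ok(m.group())]
    # 2. Mask matched spans with same-length filler so model offsets stay aligned.
    masked = list(text)
    for start, end, _ in spans:
        masked[start:end] = " " * (end - start)
    # 3. Model pass on the remainder only.
    spans += model("".join(masked))
    return sorted(spans)


def toy_model(text: str) -> List[Span]:
    # Stub standing in for the ML detector: finds one hard-coded name.
    return [(m.start(), m.end(), "person") for m in re.finditer(r"Joel", text)]


text = "Joel pays from CH93 0076 2011 6238 5295 7 today."
for start, end, label in composite_detect(text, toy_model):
    print(label, repr(text[start:end]))
```

Checksum validation is what keeps precision high: a digit string that merely looks like an IBAN is rejected before it is ever masked, so the model still gets a chance to classify it.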

License

Apache 2.0. Model weights remain under the upstream license of the model you select.


