gheim

PII round-trip for LLM APIs: anonymize before the request, de-anonymize the stream on the way back.

Maintainers

joelbarmettlerUZH

These details have not been verified by PyPI

Project links

Project description

gheim

gheim (Python). PII round-trip for LLM APIs.

Detect PII in text, replace it with stable sentinels (<PERSON_1>, <EMAIL_2>, ...), send the redacted text to any LLM, and restore the original values in the response, including streamed responses. The package is framework-agnostic and also ships a drop-in wrapper for the openai client, so existing OpenAI code only needs a changed import.

See the monorepo README for the cross-language overview and architecture.
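To make the round trip concrete, here is a minimal, self-contained sketch of the idea. It is not gheim's implementation: the detector is a toy (a regex for e-mail addresses plus a hard-coded name list), and ToySession is a hypothetical stand-in that hands out one sentinel per distinct value, so a repeated value always maps to the same sentinel. The numbering in the example above (<PERSON_1>, <EMAIL_2>) suggests a single running counter across categories; the sketch assumes that.

```python
import re

# Toy detector for illustration only: e-mail addresses plus a fixed name list.
# gheim itself uses an ML model (optionally combined with regexes).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
KNOWN_NAMES = ("Joel", "Anna")
SENTINEL_RE = re.compile(r"<[A-Z_]+_\d+>")


def detect(text):
    """Return sorted (start, end, label) spans found by the toy detector."""
    spans = [(m.start(), m.end(), "EMAIL") for m in EMAIL_RE.finditer(text)]
    for name in KNOWN_NAMES:
        for m in re.finditer(rf"\b{re.escape(name)}\b", text):
            spans.append((m.start(), m.end(), "PERSON"))
    return sorted(spans)


class ToySession:
    """Hypothetical session: one sentinel per distinct value, reused across calls."""

    def __init__(self):
        self.to_sentinel = {}  # original value -> sentinel
        self.to_value = {}     # sentinel -> original value
        self.counter = 0       # single running counter across all labels (assumed)

    def sentinel_for(self, value, label):
        if value not in self.to_sentinel:
            self.counter += 1
            sentinel = f"<{label}_{self.counter}>"
            self.to_sentinel[value] = sentinel
            self.to_value[sentinel] = value
        return self.to_sentinel[value]


def anonymize(text, session):
    out, cursor = [], 0
    for start, end, label in detect(text):
        if start < cursor:  # skip a span that overlaps one already replaced
            continue
        out.append(text[cursor:start])
        out.append(session.sentinel_for(text[start:end], label))
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


def deanonymize(text, session):
    # Unknown sentinels (e.g. ones the LLM invented) are left untouched.
    return SENTINEL_RE.sub(lambda m: session.to_value.get(m.group(0), m.group(0)), text)


session = ToySession()
clean = anonymize("Hi, I'm Joel (joel@example.com). Joel again.", session)
print(clean)  # Hi, I'm <PERSON_1> (<EMAIL_2>). <PERSON_1> again.
print(deanonymize("Nice to meet you, <PERSON_1>!", session))  # Nice to meet you, Joel!
```

Because the session owns the mapping, reusing it for a later anonymize call keeps Joel as <PERSON_1>, which is what coherent multi-turn conversations need.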

Install

uv add gheim                          # core only: use RemoteDetector or set GHEIM_API_KEY
uv add "gheim[local]"                 # + torch and transformers for on-device detection
uv add "gheim[openai]"                # + drop-in OpenAI client
uv add "gheim[local,openai]"          # both

Model choice

LocalDetector runs a token-classification model in-process. The default model is joelbarmettler/gheim-ch-560m, a 560M-parameter fine-tune of xlm-roberta-large optimised for Swiss-market PII (strict-span F1 0.916 on Swiss text; see MODEL_CARD.md). Any Hugging Face token-classification model that emits the same 33-class BIOES label schema can be substituted via the model_id constructor argument.

| Model | Best for | Parameters | Notes |
|---|---|---|---|
| `joelbarmettler/gheim-ch-560m` (default) | Swiss-market text (de_CH, fr_CH, it_CH, rm, en) with CH-format account numbers (IBAN, AHV, VAT-CHE) | 560M | Apache 2.0. Test F1 0.916. |
| `openai/privacy-filter` | English-first or general use, long context (up to 128k tokens) | 1.4B (50M active, MoE) | Apache 2.0. Wider language coverage, larger weights. |

from gheim import LocalDetector

# Default: Swiss-tuned, 560M
det = LocalDetector()

# Alternative for English or general use:
det = LocalDetector(model_id="openai/privacy-filter")
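A token-classification model labels individual tokens rather than returning spans, so some decoding step has to turn BIOES tags (Begin, Inside, End, Single, plus O for outside) into character spans. The sketch below is a generic decoder, not code taken from gheim; the input format (per-token character offsets plus a tag string such as "B-private_person") and the label names are illustrative assumptions.

```python
def bioes_to_spans(tokens):
    """Collapse per-token BIOES tags into (start, end, label) character spans.

    `tokens` is a list of (start, end, tag) triples, where `tag` is "O" or
    "<prefix>-<label>" with prefix B (begin), I (inside), E (end) or S (single).
    Ill-formed sequences (a B- never closed by a matching E-, or an I-/E-
    without a matching B-) are dropped rather than repaired.
    """
    spans = []
    open_label, open_start = None, None
    for start, end, tag in tokens:
        if tag == "O":
            open_label = None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "S":
            spans.append((start, end, label))
            open_label = None
        elif prefix == "B":
            open_label, open_start = label, start
        elif prefix == "I" and open_label != label:
            open_label = None  # continuation of a different (or no) entity
        elif prefix == "E":
            if open_label == label:
                spans.append((open_start, end, label))
            open_label = None
    return spans


# "Hi Joel Barmettler" with hypothetical tags for each whitespace token.
tokens = [(0, 2, "O"), (3, 7, "B-private_person"), (8, 18, "E-private_person")]
print(bioes_to_spans(tokens))  # [(3, 18, 'private_person')]
```

A production decoder would also have to merge sub-word tokens and might repair rather than drop ill-formed sequences; both are left out to keep the tag logic visible.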

Drop-in OpenAI client

from gheim.openai import OpenAI

client = OpenAI()  # same constructor args as openai.OpenAI
r = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hi, my name is Joel"}],
)
# r.choices[0].message.content contains "Joel".
# OpenAI only ever saw "<PERSON_1>".

Custom endpoint or key (e.g. OpenRouter, local vLLM):

client = OpenAI(api_key="sk-or-...", base_url="https://openrouter.ai/api/v1")

Streaming:

stream = client.chat.completions.create(..., stream=True)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Async:

from gheim.openai import AsyncOpenAI
client = AsyncOpenAI()
r = await client.chat.completions.create(...)  # inside an async function

Per-call overrides:

from gheim import Session
session = Session()  # reuse across calls so sentinels stay consistent across turns
r = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    gheim_session=session,    # or gheim_detector=...
)

Framework-agnostic

from gheim import Session, LocalDetector, anonymize_text, deanonymize_text

session = Session(detector=LocalDetector())  # gheim-ch-560m by default
clean = anonymize_text("Hi, my name is Joel", session)
# ... send clean to any LLM; its reply is response_text ...
final = deanonymize_text(response_text, session)

Streaming deanonymizer:

from gheim import deanonymize_stream
for chunk in deanonymize_stream(my_chunk_iterator, session):
    print(chunk, end="", flush=True)
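The tricky part of streaming restoration is that a sentinel can be split across chunks ("<PER" in one chunk, "SON_1>" in the next). One way to handle it, sketched below and not necessarily how deanonymize_stream works internally, is to hold back any trailing fragment that could still grow into a sentinel until more text arrives. The sentinel pattern and the 64-character cap on held-back text are assumptions.

```python
import re
from typing import Iterable, Iterator

SENTINEL_RE = re.compile(r"<[A-Z_]+_\d+>")
PARTIAL_RE = re.compile(r"<[A-Z_\d]*$")  # e.g. "<", "<PER", "<PERSON_1" at the end
MAX_HOLD = 64  # never hold back more than this many characters


def restore(text: str, mapping: dict) -> str:
    return SENTINEL_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


def restore_stream(chunks: Iterable[str], mapping: dict) -> Iterator[str]:
    """Yield restored text, buffering sentinels that straddle chunk boundaries."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        partial = PARTIAL_RE.search(buffer)
        if partial and len(buffer) - partial.start() <= MAX_HOLD:
            cut = partial.start()  # keep the possible sentinel prefix for later
        else:
            cut = len(buffer)
        ready, buffer = buffer[:cut], buffer[cut:]
        if ready:
            yield restore(ready, mapping)
    if buffer:  # end of stream: flush whatever is still held back
        yield restore(buffer, mapping)


mapping = {"<PERSON_1>": "Joel"}
pieces = list(restore_stream(["Hello <PER", "SON_1>", ", welcome!"], mapping))
print(pieces)  # ['Hello ', 'Joel', ', welcome!']
```

Text that merely contains "<" (for example "1 < 2") passes straight through, because a space breaks the partial-sentinel pattern; only uppercase, digit, and underscore runs after "<" are held back.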

Chat-message helpers:

from gheim import anonymize_messages

redacted = anonymize_messages(messages, session)  # preserves role, name, tool_call_id
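A message-level helper mostly has to rewrite text while leaving message metadata alone. The sketch below shows one way to do that for OpenAI-style message dicts; it is not gheim's code, and the `anonymize` callable parameter stands in for whatever text anonymizer you use (here a toy string replacement).

```python
import copy
from typing import Callable


def anonymize_messages_sketch(messages: list, anonymize: Callable[[str], str]) -> list:
    """Return redacted copies of OpenAI-style chat messages.

    Only text is rewritten: role, name, tool_call_id and any other keys are
    copied unchanged. Handles plain-string content and the list-of-parts form
    [{"type": "text", "text": ...}, ...]; non-text parts are copied as-is.
    """
    redacted = []
    for msg in messages:
        new = copy.deepcopy(msg)  # never mutate the caller's messages
        content = new.get("content")
        if isinstance(content, str):
            new["content"] = anonymize(content)
        elif isinstance(content, list):
            for part in content:
                if isinstance(part, dict) and part.get("type") == "text":
                    part["text"] = anonymize(part["text"])
        redacted.append(new)
    return redacted


def toy_anonymize(text: str) -> str:
    """Stand-in anonymizer for the example."""
    return text.replace("Joel", "<PERSON_1>")


messages = [
    {"role": "user", "name": "u1", "content": "I'm Joel"},
    {"role": "tool", "tool_call_id": "call_1",
     "content": [{"type": "text", "text": "Joel's file"}]},
]
print(anonymize_messages_sketch(messages, toy_anonymize))
```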

Wrapped endpoints

The drop-in OpenAI / AsyncOpenAI clients automatically protect every text-carrying endpoint: chat.completions, responses, completions (legacy), embeddings, moderations, audio.speech, audio.transcriptions, audio.translations, images.generate, images.edit. Tool-call arguments and SSE delta chunks are restored on the way back. See the monorepo README for the full coverage matrix and the embeddings caveat.
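Tool-call arguments arrive as a JSON-encoded string, so naive string replacement can produce invalid JSON when an original value contains quotes or backslashes. A safer pattern, sketched here as an assumption about the approach rather than a description of gheim's internals, is to parse the arguments, restore sentinels inside every string, and re-serialise. It applies once the arguments are complete; streamed argument fragments would need buffering first.

```python
import json
import re

SENTINEL_RE = re.compile(r"<[A-Z_]+_\d+>")


def restore(text: str, mapping: dict) -> str:
    return SENTINEL_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


def restore_json(value, mapping: dict):
    """Recursively restore sentinels in every string (keys and values) of a JSON value."""
    if isinstance(value, str):
        return restore(value, mapping)
    if isinstance(value, list):
        return [restore_json(v, mapping) for v in value]
    if isinstance(value, dict):
        return {restore(k, mapping): restore_json(v, mapping) for k, v in value.items()}
    return value  # numbers, booleans, None


def restore_tool_arguments(arguments: str, mapping: dict) -> str:
    # Parse and re-serialise instead of editing the raw JSON string, so values
    # containing quotes or backslashes are escaped correctly.
    return json.dumps(restore_json(json.loads(arguments), mapping), ensure_ascii=False)


mapping = {"<PERSON_1>": 'Joel "JB" Barmettler'}
args = '{"to": "<PERSON_1>", "cc": ["<PERSON_1>"], "priority": 2}'
print(restore_tool_arguments(args, mapping))
# {"to": "Joel \"JB\" Barmettler", "cc": ["Joel \"JB\" Barmettler"], "priority": 2}
```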

Strict mode

With gheim_strict=True (the default), calling an unwrapped endpoint (beta.assistants, batches, files, uploads, fine_tuning, vector_stores) raises RuntimeError. The error message names client.raw.<path> as the documented escape hatch for calling such endpoints without PII protection.

client = OpenAI(gheim_strict=False)  # downgrade to one-time warnings
client.raw.beta.assistants.create(...)  # always works regardless of strict mode
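One way to build such a guard is a thin proxy around the underlying client. The sketch below is hypothetical (StrictProxy is not part of gheim) and works at top-level-attribute granularity, so it blocks all of beta rather than only beta.assistants; that is a simplification.

```python
import warnings


class StrictProxy:
    """Hypothetical guard: forwards attributes to `client`, but refuses
    (strict) or warns once (non-strict) for endpoints it does not protect."""

    UNWRAPPED = {"beta", "batches", "files", "uploads", "fine_tuning", "vector_stores"}

    def __init__(self, client, strict=True):
        self._client = client
        self._strict = strict
        self._warned = set()
        self.raw = client  # escape hatch: the unprotected client itself

    def __getattr__(self, name):
        # Only called for attributes not found on the proxy itself.
        if name in self.UNWRAPPED:
            msg = f"{name} is not PII-protected; use client.raw.{name} to call it anyway"
            if self._strict:
                raise RuntimeError(msg)
            if name not in self._warned:
                self._warned.add(name)
                warnings.warn(msg, stacklevel=2)
        return getattr(self._client, name)
```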

Detector backends

import torch
from gheim import LocalDetector, RemoteDetector, default_detector

# Local inference. Weights download to the HF cache on first use.
# `model_id` defaults to "joelbarmettler/gheim-ch-560m"; pass
# `dtype=torch.bfloat16` for half-precision GPU inference.
det = LocalDetector(device="auto", dtype=torch.bfloat16)

# Remote inference against your own gheim-server or api.gheim.ch.
det = RemoteDetector(base_url="http://your-host:8080", api_key="...")

# default_detector() picks remote if GHEIM_API_KEY is set, else local.
det = default_detector()
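The selection rule behind default_detector() can be pictured as follows. This is a sketch with hypothetical stub classes; it assumes detectors share a detect(text) -> spans interface (the README does not specify gheim's interface) and treats an empty GHEIM_API_KEY as unset, which is also an assumption.

```python
import os
from typing import List, Mapping, Protocol, Tuple

Span = Tuple[int, int, str]  # (start, end, label)


class Detector(Protocol):
    def detect(self, text: str) -> List[Span]: ...


class StubLocalDetector:
    def detect(self, text: str) -> List[Span]:
        return []  # placeholder for in-process model inference


class StubRemoteDetector:
    def __init__(self, api_key: str):
        self.api_key = api_key

    def detect(self, text: str) -> List[Span]:
        return []  # placeholder for an HTTP call to a detection server


def pick_detector(env: Mapping[str, str] = os.environ) -> Detector:
    """Prefer the remote backend when an API key is configured, else run locally."""
    api_key = env.get("GHEIM_API_KEY")
    return StubRemoteDetector(api_key) if api_key else StubLocalDetector()
```

Passing the environment in as a parameter keeps the choice testable without touching the real process environment.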

Composite detector (recommended for production)

For categories whose structure can be verified by pattern and, where one exists, by checksum (CH-IBAN, AHV, VAT-CHE, credit cards, common token formats), the package ships a regex catalogue under gheim.detectors.composite that pairs with the model detector. The composite detector applies the regexes first, masks the matched spans, then runs the model on the remainder. This pushes effective recall on account_number, private_phone, and private_url close to 1.0 while keeping precision high; the underlying ML model handles person names, addresses, and dates.
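The regex-first, mask, then-model flow can be sketched as below. The rules are illustrative and not gheim's catalogue: a Swiss IBAN pattern validated with the ISO 13616 mod-97 check, and a card-number pattern validated with the Luhn check. The toy model and the label private_person are made up; only account_number comes from the text above. Masking with a same-length filler keeps character offsets aligned with the original text.

```python
import re


def iban_ok(candidate: str) -> bool:
    """ISO 13616 mod-97 check: move the first four characters to the end,
    map letters A..Z to 10..35, and require remainder 1."""
    s = candidate.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]+", s):
        return False
    rearranged = s[4:] + s[:4]
    return int("".join(str(int(ch, 36)) for ch in rearranged)) % 97 == 1


def luhn_ok(candidate: str) -> bool:
    """Luhn check used by payment card numbers."""
    digits = [int(d) for d in re.sub(r"\D", "", candidate)]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return len(digits) >= 12 and total % 10 == 0


# Illustrative rules only: (label, pattern, validator).
RULES = [
    ("account_number", re.compile(r"\bCH\d{2}(?: ?[0-9A-Z]{4}){4} ?[0-9A-Z]\b"), iban_ok),
    ("account_number", re.compile(r"\b(?:\d[ -]?){12,18}\d\b"), luhn_ok),
]


def composite_detect(text, model_detect):
    """Regex + checksum pass first, mask the hits, then run the model on the rest."""
    spans, masked = [], text
    for label, pattern, is_valid in RULES:
        for m in pattern.finditer(masked):
            if is_valid(m.group(0)):
                spans.append((m.start(), m.end(), label))
                # Same-length mask keeps offsets aligned with `text`.
                masked = masked[: m.start()] + "#" * (m.end() - m.start()) + masked[m.end():]
    spans.extend(model_detect(masked))
    return sorted(spans)


def toy_model(text):
    """Stand-in for the ML detector: only finds the name "Joel"."""
    return [(m.start(), m.end(), "private_person") for m in re.finditer(r"\bJoel\b", text)]


text = "Joel pays to CH93 0076 2011 6238 5295 7 with 4111 1111 1111 1111; bad CH00 0076 2011 6238 5295 7."
for start, end, label in composite_detect(text, toy_model):
    print(label, repr(text[start:end]))
```

The second IBAN-shaped string fails the mod-97 check and is not reported, which is where the precision gain comes from. Swiss AHV numbers end in an EAN-13 check digit and could be added as one more rule in the same shape.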

License

Apache 2.0. Model weights are not bundled with the package; they are downloaded on first use and carry the upstream license of the model you select.

Release history

0.1.6 (May 12, 2026)
0.1.5 (May 12, 2026)
0.1.4 (May 11, 2026), this version
0.1.3 (May 11, 2026)
0.1.2 (May 11, 2026)
0.1.0 (May 11, 2026)

Download files


Source Distribution

gheim-0.1.4.tar.gz (35.7 kB)

Uploaded May 11, 2026 Source

Built Distribution

gheim-0.1.4-py3-none-any.whl (47.0 kB)

Uploaded May 11, 2026 Python 3

File details

Details for the file gheim-0.1.4.tar.gz.

File metadata

Download URL: gheim-0.1.4.tar.gz
Upload date: May 11, 2026
Size: 35.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gheim-0.1.4.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `0e7bb3ee5c3b90e48e0effe0d6295b1b50dc30e39218bc58da3c1dd25f7ee6d3` |
| MD5 | `45bf79250cb8da971dd35ef648c9d947` |
| BLAKE2b-256 | `c4f61ae44655ae6d2be5481bd7c66150b035e5de64a65752a5483c94288bab99` |


Provenance

The following attestation bundles were made for gheim-0.1.4.tar.gz:

Publisher: release.yml on joelbarmettlerUZH/gheim

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gheim-0.1.4.tar.gz
- Subject digest: 0e7bb3ee5c3b90e48e0effe0d6295b1b50dc30e39218bc58da3c1dd25f7ee6d3
- Sigstore transparency entry: 1511228607
- Sigstore integration time: May 11, 2026
Source repository:
- Permalink: joelbarmettlerUZH/gheim@2a38cc9b29786b9df3c259435b50fdb8fa9e2738
- Branch / Tag: refs/heads/main
- Owner: https://github.com/joelbarmettlerUZH
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@2a38cc9b29786b9df3c259435b50fdb8fa9e2738
- Trigger Event: push

File details

Details for the file gheim-0.1.4-py3-none-any.whl.

File metadata

Download URL: gheim-0.1.4-py3-none-any.whl
Upload date: May 11, 2026
Size: 47.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gheim-0.1.4-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `597d6b0824b7d8fc549d23d7cbf69305b4c4b4e769bd7a15ce5ccf0d1ae0a5b1` |
| MD5 | `49e624ff8e0beb63b070439a7ef470d7` |
| BLAKE2b-256 | `bf9631fbbd75419c2e4018d8719f41d159ede90236eba864cb7babab7fbe4c4d` |


Provenance

The following attestation bundles were made for gheim-0.1.4-py3-none-any.whl:

Publisher: release.yml on joelbarmettlerUZH/gheim

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gheim-0.1.4-py3-none-any.whl
- Subject digest: 597d6b0824b7d8fc549d23d7cbf69305b4c4b4e769bd7a15ce5ccf0d1ae0a5b1
- Sigstore transparency entry: 1511229620
- Sigstore integration time: May 11, 2026
Source repository:
- Permalink: joelbarmettlerUZH/gheim@2a38cc9b29786b9df3c259435b50fdb8fa9e2738
- Branch / Tag: refs/heads/main
- Owner: https://github.com/joelbarmettlerUZH
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@2a38cc9b29786b9df3c259435b50fdb8fa9e2738
- Trigger Event: push
