datagate-llm

The inference boundary layer between your data and outbound AI requests.

Scan text for sensitive data — PII, secrets, credentials, and sector-specific identifiers — before it leaves your system and reaches an LLM API.

The Problem

In 2023, Samsung engineers accidentally leaked proprietary source code and internal meeting notes by pasting them into ChatGPT. Once submitted, that data sat on a third party's servers, where it could be retained and potentially used for training. This is not a hypothetical risk: it is what happens by default when you send unrestricted text to an external AI model.

datagate-llm is the layer you put in front of that API call. It checks what you are about to send, tells you what it found, and lets you decide: flag it, redact it, or block it.

Install

pip install datagate-llm

Zero dependencies. Python 3.9+. Works offline.

Quickstart

from datagate_llm import scan

# Basic scan
result = scan("Contact Alice at alice@company.com or call 415-555-0192")
print(result["safe"])        # False
print(result["risk_score"])  # 0.8 (or similar)
print(result["findings"])    # list of matched spans

# Redact mode — replace PII before sending to an LLM
result = scan(
    "My SSN is 123-45-6789 and card number 4111111111111111",
    mode="redact"
)
print(result["redacted_text"])
# "My SSN is [REDACTED:universal/ssn] and card number [REDACTED:universal/credit_card]"

# Block mode — hard stop on high-risk content
result = scan("AKIAIOSFODNN7EXAMPLE", sectors=["technology"], mode="block")
if result["action"] == "block":
    raise ValueError("Refusing to send credentials to LLM")

# Multi-sector scan
result = scan(
    "Patient MRN: AB12345, account 123456789012",
    sectors=["healthcare", "finance"]
)
for finding in result["findings"]:
    print(finding["rule_id"], finding["severity"], finding["confidence"])
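
The Quickstart calls can be combined into a single guard placed in front of an outbound request. The sketch below is illustrative only: `guarded_send`, `fake_scan`, `echo_llm`, and the `block_on` parameter are hypothetical names, and `fake_scan` is a stand-in that mimics only the result keys shown above (`safe`, `findings`, `redacted_text`) so the example runs without the package installed.

```python
import re
from typing import Any, Callable, Dict, Sequence

_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def fake_scan(text: str, mode: str = "flag") -> Dict[str, Any]:
    """Hypothetical stand-in for datagate_llm.scan; it knows one SSN pattern."""
    findings = [
        {"rule_id": "universal/ssn", "severity": "critical", "confidence": 0.9}
        for _ in _SSN.finditer(text)
    ]
    result: Dict[str, Any] = {"safe": not findings, "findings": findings}
    if mode == "redact":
        result["redacted_text"] = _SSN.sub("[REDACTED:universal/ssn]", text)
    return result


def guarded_send(
    prompt: str,
    scan_fn: Callable[..., Dict[str, Any]],
    send_fn: Callable[[str], str],
    block_on: Sequence[str] = (),
) -> str:
    """Scan in redact mode, refuse listed severities, otherwise send clean text."""
    result = scan_fn(prompt, mode="redact")
    if result["safe"]:
        return send_fn(prompt)
    if any(f["severity"] in block_on for f in result["findings"]):
        raise ValueError("Refusing to send high-risk content to LLM")
    return send_fn(result["redacted_text"])


def echo_llm(text: str) -> str:
    """Placeholder for a real API call; it only echoes what it was given."""
    return f"LLM received: {text}"


print(guarded_send("My SSN is 123-45-6789", fake_scan, echo_llm))
# LLM received: My SSN is [REDACTED:universal/ssn]
```

To use the real scanner, pass `scan` imported from `datagate_llm` as `scan_fn`, assuming it accepts the `mode` keyword as shown in the Quickstart. Passing `block_on=("critical",)` turns the same guard into a hard stop for critical findings.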

What It Detects

Category	Rule ID	Severity
Email address	`universal/email`	high
US phone number	`universal/phone_us`	medium
Social Security Number	`universal/ssn`	critical
Credit card number	`universal/credit_card`	critical
IP address	`universal/ip_address`	low
AWS access key	`technology/aws_access_key`	critical
OpenAI API key	`technology/openai_key`	critical
Anthropic API key	`technology/anthropic_key`	critical
GitHub token	`technology/github_token`	critical
Stripe key	`technology/stripe_key`	critical
JWT token	`technology/jwt_token`	high
Private key (PEM)	`technology/private_key`	critical
Database connection string	`technology/connection_string`	critical
NPI number	`healthcare/npi_number`	high
ICD-10 diagnosis code	`healthcare/icd10_code`	medium
Insurance member ID	`healthcare/insurance_member_id`	high
Medical record number	`healthcare/medical_record_number`	critical
DEA number	`healthcare/dea_number`	critical
IBAN	`finance/iban`	high
SWIFT/BIC code	`finance/swift_bic`	medium
ABA routing number	`finance/routing_number`	high
Bank account number	`finance/bank_account`	high
Tax ID / EIN	`finance/tax_id_ein`	critical
Bitcoin address	`finance/crypto_btc`	medium
Ethereum address	`finance/crypto_eth`	medium
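
Rule IDs follow a `sector/name` shape and every rule carries one of four severities, so findings can be filtered and grouped with a few lines of plain Python. This is a sketch, not part of the package: the helper names `at_least` and `by_sector` are hypothetical, and the ordering low < medium < high < critical is an assumption about how the severities rank.

```python
from typing import Dict, List

# Assumed ranking of the four severities listed in the table.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def at_least(findings: List[dict], minimum: str) -> List[dict]:
    """Keep findings whose severity is `minimum` or higher."""
    threshold = SEVERITY_ORDER[minimum]
    return [f for f in findings if SEVERITY_ORDER[f["severity"]] >= threshold]


def by_sector(findings: List[dict]) -> Dict[str, List[str]]:
    """Group rule names by the sector prefix of their rule ID."""
    grouped: Dict[str, List[str]] = {}
    for f in findings:
        sector, _, name = f["rule_id"].partition("/")
        grouped.setdefault(sector, []).append(name)
    return grouped


# Sample findings using rule IDs and severities from the table.
sample = [
    {"rule_id": "universal/email", "severity": "high"},
    {"rule_id": "universal/ip_address", "severity": "low"},
    {"rule_id": "technology/aws_access_key", "severity": "critical"},
    {"rule_id": "finance/swift_bic", "severity": "medium"},
]

print([f["rule_id"] for f in at_least(sample, "high")])
# ['universal/email', 'technology/aws_access_key']
print(by_sector(sample))
# {'universal': ['email', 'ip_address'], 'technology': ['aws_access_key'], 'finance': ['swift_bic']}
```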

How It Works

text input
    │
    ▼
tokenize()          ← NFKC normalization, zero-width char removal
    │
    ▼
match()             ← regex scan against compiled rule set
    │
    ▼
score()             ← context-aware confidence (boost / suppress words)
    │
    ▼
resolve()           ← remove overlapping spans, keep highest confidence
    │
    ▼
aggregate()         ← single risk_score in [0.0, 1.0]
    │
    ▼
build_result()      ← assemble final dict with action, findings, fingerprint

Every step is a pure function. No network calls. No disk writes. No global state except the in-process rule cache.
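
The stages above can be sketched as a chain of small pure functions. This is a toy reconstruction for illustration, not the package's source: the two rules, the base confidences, the boost and suppress amounts, the `start`/`end` keys, and the noisy-OR aggregation formula are all assumptions chosen to keep the example short.

```python
import re
import unicodedata

# Zero-width characters to strip after normalization (maps code point -> None).
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))

# Hypothetical rule set; "demo/short_number" exists only to force an overlap.
RULES = {
    "universal/ssn": {
        "pattern": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "base": 0.6,
        "boost": ("ssn", "social security"),
        "suppress": ("order", "ticket"),
    },
    "demo/short_number": {
        "pattern": re.compile(r"\b\d{4}\b"),
        "base": 0.3,
        "boost": (),
        "suppress": (),
    },
}


def tokenize(text):
    """NFKC-normalize, then drop zero-width characters."""
    return unicodedata.normalize("NFKC", text).translate(ZERO_WIDTH)


def match(text):
    """Run every compiled rule and record each matched span."""
    return [
        {"rule_id": rid, "start": m.start(), "end": m.end(), "confidence": rule["base"]}
        for rid, rule in RULES.items()
        for m in rule["pattern"].finditer(text)
    ]


def score(text, findings, window=24):
    """Adjust confidence using words found shortly before each span."""
    scored = []
    for f in findings:
        rule = RULES[f["rule_id"]]
        context = text[max(0, f["start"] - window):f["start"]].lower()
        conf = f["confidence"]
        if any(word in context for word in rule["boost"]):
            conf += 0.3
        if any(word in context for word in rule["suppress"]):
            conf -= 0.4
        scored.append({**f, "confidence": round(min(1.0, max(0.0, conf)), 2)})
    return scored


def resolve(findings):
    """Drop overlapping spans, keeping the highest-confidence one."""
    kept = []
    for f in sorted(findings, key=lambda f: -f["confidence"]):
        if all(f["end"] <= k["start"] or f["start"] >= k["end"] for k in kept):
            kept.append(f)
    return sorted(kept, key=lambda f: f["start"])


def aggregate(findings):
    """Combine confidences into one score in [0.0, 1.0] (noisy-OR, assumed)."""
    remaining = 1.0
    for f in findings:
        remaining *= 1.0 - f["confidence"]
    return round(1.0 - remaining, 2)


def run_pipeline(text):
    clean = tokenize(text)
    findings = resolve(score(clean, match(clean)))
    return {"risk_score": aggregate(findings), "findings": findings}


# The zero-width space inside the number is removed before matching.
result = run_pipeline("SSN: 123\u200b-45-6789")
print(result["risk_score"], [f["rule_id"] for f in result["findings"]])
# 0.9 ['universal/ssn']
```

In this toy run the 4-digit rule also matches the tail of the SSN, and `resolve()` discards it because the SSN span has the higher confidence after the "ssn" boost.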

Scan Modes

Mode	When risk > 0	Use case
`flag` (default)	`action = "flag"`	Log and review before sending
`redact`	`action = "flag"`, spans replaced in `redacted_text`	Strip PII, send cleaned text
`block`	`action = "block"`	Hard stop — raise an error upstream
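
The table can be read as a small decision function plus a span-replacement step. The sketch below is an assumption-laden illustration: `apply_mode`, `redact`, and `span` are hypothetical helpers, the `start`/`end` keys on findings are assumed, and the `"allow"` action for a zero risk score is a guess, since the table only covers the case where risk is above zero.

```python
def span(text, needle, rule_id):
    """Build a finding dict for the first occurrence of `needle` (demo helper)."""
    start = text.index(needle)
    return {"rule_id": rule_id, "start": start, "end": start + len(needle)}


def redact(text, findings):
    """Replace each span with [REDACTED:<rule_id>], working right to left
    so earlier offsets stay valid while later spans are rewritten."""
    out = text
    for f in sorted(findings, key=lambda f: f["start"], reverse=True):
        out = out[:f["start"]] + f"[REDACTED:{f['rule_id']}]" + out[f["end"]:]
    return out


def apply_mode(text, findings, risk_score, mode="flag"):
    """Map a scan mode to an action, following the table above."""
    if mode not in ("flag", "redact", "block"):
        raise ValueError(f"unknown mode: {mode!r}")
    if risk_score <= 0:
        return {"action": "allow"}  # assumed; not specified in the table
    if mode == "block":
        return {"action": "block"}
    result = {"action": "flag"}
    if mode == "redact":
        result["redacted_text"] = redact(text, findings)
    return result


text = "My SSN is 123-45-6789 and card number 4111111111111111"
findings = [
    span(text, "123-45-6789", "universal/ssn"),
    span(text, "4111111111111111", "universal/credit_card"),
]

print(apply_mode(text, findings, 0.9, mode="redact")["redacted_text"])
# My SSN is [REDACTED:universal/ssn] and card number [REDACTED:universal/credit_card]
print(apply_mode(text, findings, 0.9, mode="block"))
# {'action': 'block'}
```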

Honest Limits

- Regex-only: datagate-llm uses deterministic pattern matching. It will not catch PII embedded in obfuscated prose, paraphrased content, or formats its rules do not cover.
- US-centric: Phone and ID patterns currently target US formats. International variants may be missed.
- No semantic understanding: "The patient's temperature was 98.6" will not be flagged as health data because no pattern matches it. Semantic scanning is planned as an optional onnxruntime-based layer, which has not been released yet.
- False positives are possible: Short patterns like SWIFT codes can match arbitrary uppercase strings. Use `context.suppress` words in your rule JSON to reduce noise.
- Not a compliance tool: Passing a scan does not mean a document is HIPAA, GDPR, or PCI-DSS compliant. Use this as one layer of defense, not the only one.
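
To make the false-positive point concrete, the sketch below shows how a SWIFT-style pattern matches an ordinary uppercase word and how suppress words near the match can filter it out. The rule JSON layout is an assumption: only the `context.suppress` name comes from this README, while the `id` and `pattern` keys, the regex itself, and the `find_with_suppression` helper are hypothetical.

```python
import json
import re

# Hypothetical rule file content; the real schema may differ.
RULE_JSON = r"""
{
  "id": "finance/swift_bic",
  "pattern": "\\b[A-Z]{6}[A-Z0-9]{2}(?:[A-Z0-9]{3})?\\b",
  "context": {"suppress": ["error", "warning", "exception"]}
}
"""


def find_with_suppression(rule, text, window=30):
    """Return matches, skipping any with a suppress word within `window`
    characters before or after the matched span."""
    pattern = re.compile(rule["pattern"])
    suppress = [w.lower() for w in rule.get("context", {}).get("suppress", [])]
    hits = []
    for m in pattern.finditer(text):
        before = text[max(0, m.start() - window):m.start()]
        after = text[m.end():m.end() + window]
        context = (before + " " + after).lower()
        if any(word in context for word in suppress):
            continue
        hits.append(m.group())
    return hits


rule = json.loads(RULE_JSON)
no_context = {**rule, "context": {}}

log_line = "ERROR: TIMEDOUT connecting to host"
print(find_with_suppression(no_context, log_line))
# ['TIMEDOUT']
print(find_with_suppression(rule, log_line))
# []
print(find_with_suppression(rule, "Wire funds via SWIFT DEUTDEFF today"))
# ['DEUTDEFF']
```

Suppress words trade recall for precision: a genuine code that appears next to the word "error" would also be dropped, so the list is best kept short and specific to the noise you actually see.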

Contributing

See CONTRIBUTING.md. In short: add rules in JSON, add tests, open a PR.

License

MIT. See LICENSE.

Project details

This version

0.1.0

Jun 27, 2026

Download files

Source Distribution

datagate_llm-0.1.0.tar.gz (13.8 kB)

Uploaded Jun 27, 2026 Source

Built Distribution

datagate_llm-0.1.0-py3-none-any.whl (11.3 kB)

Uploaded Jun 27, 2026 Python 3

File details

Details for the file datagate_llm-0.1.0.tar.gz.

File metadata

Download URL: datagate_llm-0.1.0.tar.gz
Upload date: Jun 27, 2026
Size: 13.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for datagate_llm-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`8372c3ed289d11d10c48282dda71a028025f46b52b615449e674c92be63c191c`
MD5	`b1d269934f66cc3931c7722172b1b798`
BLAKE2b-256	`57f30a5963cde7a0f7e2e9f24e8e6439db712f31fe626f77412eb0ec509dbd17`

Provenance

The following attestation bundles were made for datagate_llm-0.1.0.tar.gz:

Publisher: publish.yml on PreethiAndichamy342/datagate-llm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: datagate_llm-0.1.0.tar.gz
- Subject digest: 8372c3ed289d11d10c48282dda71a028025f46b52b615449e674c92be63c191c
- Sigstore transparency entry: 1973930973
- Sigstore integration time: Jun 27, 2026
Source repository:
- Permalink: PreethiAndichamy342/datagate-llm@f2cc4eb2bc4b8c1956c1ede878313a2e25e99bef
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/PreethiAndichamy342
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f2cc4eb2bc4b8c1956c1ede878313a2e25e99bef
- Trigger Event: release

File details

Details for the file datagate_llm-0.1.0-py3-none-any.whl.

File metadata

Download URL: datagate_llm-0.1.0-py3-none-any.whl
Upload date: Jun 27, 2026
Size: 11.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for datagate_llm-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2ad3955409838b45201e1a54c8554c837d3e10ec7838436b7a23dc1d2652b923`
MD5	`5e7c7a3fcdd94af72cc8512c51d362c8`
BLAKE2b-256	`a220be4ed04a399cbd6e18a53d9df51f204588085cd73b1ece9798079b6dd2e0`

Provenance

The following attestation bundles were made for datagate_llm-0.1.0-py3-none-any.whl:

Publisher: publish.yml on PreethiAndichamy342/datagate-llm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: datagate_llm-0.1.0-py3-none-any.whl
- Subject digest: 2ad3955409838b45201e1a54c8554c837d3e10ec7838436b7a23dc1d2652b923
- Sigstore transparency entry: 1973931108
- Sigstore integration time: Jun 27, 2026
Source repository:
- Permalink: PreethiAndichamy342/datagate-llm@f2cc4eb2bc4b8c1956c1ede878313a2e25e99bef
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/PreethiAndichamy342
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f2cc4eb2bc4b8c1956c1ede878313a2e25e99bef
- Trigger Event: release
