
Local-first AI security gateway: PII redaction, token compression, and audit ledger


Airlock


Local-first AI security gateway. Redact PII, cut LLM token costs, and maintain a full audit trail — without sending a single byte to the cloud.

pip install airlock-rs       # Python SDK
cargo install airlock-rs     # CLI

The Problem

You have logs, support tickets, or user data you want to send to an AI model. But that data contains names, emails, phone numbers, and credit card numbers. Sending it to OpenAI or Claude as-is creates GDPR, HIPAA, and SOC2 exposure.

Airlock sits between your data and the model. It scrubs PII, compresses the JSON to cut token costs, and writes an auditable record of what was processed — all locally, zero network calls.


Python SDK

import airlock, json

records = [
    {"user": "Alice Johnson", "email": "alice@corp.com", "action": "login",  "ip": "192.168.1.1"},
    {"user": "Bob Smith",     "email": "bob@corp.com",   "action": "logout", "ip": "10.0.0.2"},
]

result = airlock.scrub(json.dumps(records), salt="my-org-secret")

print(result.pii_count)      # 6
print(result.risk_score)     # 75.0
print(result.reduction_pct)  # 38.4

for swap in result.swaps:
    print(f"{swap['original']} → {swap['synthetic']}")
# alice@corp.com    → alias_a@redacted.dev
# Alice Johnson     → User_A
# 192.168.1.1       → IP_A
# bob@corp.com      → alias_b@redacted.dev
# Bob Smith         → User_B
# 10.0.0.2          → IP_B

# Feed the clean JSON directly to your LLM
# (openai_client stands in for your own LLM client object; airlock does not define it)
response = openai_client.chat(messages=[{"role": "user", "content": result.json_str}])

API

# Scrub PII + compress
result = airlock.scrub(
    json_str,
    salt=None,          # str  — secret for stable cross-run aliases
    db_path=None,       # str  — path to SQLite audit ledger
    # Toggle individual entity types (all True by default):
    names=True,
    emails=True,
    phones=True,
    ssns=True,
    credit_cards=True,
    ip_addresses=True,
    jwt_tokens=True,
    aws_keys=True,
    env_secrets=True,
)
result.json_str       # str   — scrubbed, compressed JSON
result.pii_count      # int   — total PII instances found
result.risk_score     # float — 0–100 density score
result.reduction_pct  # float — token reduction percentage
result.swaps          # list[dict] — [{original, synthetic, entity_type}]
result.ledger_id      # int | None — SQLite row ID if db_path was set

# Compress only (no PII detection)
result = airlock.compress(json_str)
result.json_str
result.tokens_before
result.tokens_after
result.reduction_pct
result.entry_count

CLI

# Scrub PII from a JSON or NDJSON file
airlock scrub logs.json --diff

# Stable cross-run aliases (same person → same alias across files)
airlock scrub logs.json --salt my-secret --diff > clean.json

# Compress only
airlock compress logs.json

# View audit history
airlock ledger --last 20

All flags

airlock scrub <FILE>
  --salt <SALT>       Cross-run stable aliases via SHA-256(salt ‖ entity_type ‖ token)
  --diff              Print every original → alias swap to stderr
  --db <FILE>         SQLite ledger path [default: airlock_ledger.db]
  --output <FORMAT>   pretty (default) | compact
  -v / -vv / -vvv     Verbosity (info / debug / trace)

airlock compress <FILE>
  --output <FORMAT>   pretty | compact

airlock ledger
  --last <N>          Show N most recent entries [default: 10]
  --db <FILE>         SQLite ledger path

What Gets Redacted

PII Type       | Standard            | Example Input                        | Alias
---------------|---------------------|--------------------------------------|---------------------
Full name      | -                   | Alice Johnson                        | User_A
Email          | RFC 5322            | alice@corp.com                       | alias_a@redacted.dev
Phone          | NANP + E.164        | 555-867-5309, +44 7911 123456        | Phone_A
SSN            | SSA format          | 123-45-6789                          | SSN_A
Credit card    | ISO/IEC 7812 (Luhn) | 4111 1111 1111 1111                  | Card_A
IPv4 address   | RFC 791             | 192.168.1.100                        | IP_A
JWT token      | -                   | eyJhbGci...                          | Token_A
AWS access key | -                   | AKIAIOSFODNN7EXAMPLE                 | AwsKey_A
Env secret     | -                   | API_KEY=sk-abc123 → API_KEY=Secret_A | Secret_A

All 9 types are enabled by default and individually toggleable. Aliases are consistent within a run — User_A always refers to the same person, so AI models can still reason about behavior patterns without seeing real identities.
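
The credit-card row cites the Luhn checksum. As a rough illustration of what that check involves, here is a minimal Python sketch. The function name `luhn_valid` is hypothetical and not part of Airlock's API, and the sketch assumes the caller has already isolated a candidate digit run with a regex.

```python
def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum.

    Illustrative helper only; not an Airlock function. Non-digit
    characters (spaces, hyphens) are skipped.
    """
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))
print(luhn_valid("4111 1111 1111 1112"))
```

A checksum like this lets a detector reject arbitrary 16-digit numbers (order IDs, timestamps) that only look like card numbers.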


Config File

Drop a .airlock.toml in your project directory to set defaults:

[scrub]
salt = "my-org-secret"          # stable cross-run aliases
db   = "~/.airlock/ledger.db"   # shared ledger location

[redact]
ip_addresses = false            # keep IPs as-is

[[rules]]
name         = "EmployeeId"
pattern      = "EMP-\\d{5}"
alias_prefix = "Emp"            # EMP-00042 → Emp_A

CLI flags always take precedence over the config file.
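
The precedence rule and the custom rule pattern can be sketched in plain Python. This is an illustration under assumptions, not Airlock's loader: `resolve_settings` is a hypothetical helper, the config is written as a dict rather than parsed from TOML, and Python's `re` stands in for the Rust regex engine. Note that the TOML string `"EMP-\\d{5}"` decodes to the regex `EMP-\d{5}`.

```python
import re


def resolve_settings(config_file, cli_flags):
    """Overlay CLI flags on config-file values (hypothetical helper).

    A value of None means the flag was not passed on the command line.
    """
    merged = dict(config_file)
    merged.update({key: value for key, value in cli_flags.items() if value is not None})
    return merged


# Values mirroring the [scrub] table above, written as a plain dict.
config_file = {"salt": "my-org-secret", "db": "~/.airlock/ledger.db"}
# Only --db was passed on the command line in this example.
cli_flags = {"salt": None, "db": "run.db"}
settings = resolve_settings(config_file, cli_flags)

# The decoded pattern from the [[rules]] entry above.
employee_id = re.compile(r"EMP-\d{5}")
matches = employee_id.findall("EMP-00042 approved a request from EMP-00117")

print(settings)
print(matches)
```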


Token Compression

Repeated JSON keys are expensive for LLMs. Airlock extracts them into a single schema header:

Before (keys repeated on every row):

[
  {"timestamp": "2026-01-01T10:00:00Z", "user": "User_A", "action": "login"},
  {"timestamp": "2026-01-01T10:01:00Z", "user": "User_B", "action": "logout"}
]

After (keys extracted once, 43% fewer tokens):

{
  "__airlock_schema": ["timestamp", "user", "action"],
  "__airlock_rows": [
    ["2026-01-01T10:00:00Z", "User_A", "login"],
    ["2026-01-01T10:01:00Z", "User_B", "logout"]
  ],
  "__airlock_meta": { "tokens_before": 120, "tokens_after": 68, "reduction_pct": "43.3" }
}

Typical savings: 20–60% on structured log data.
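
The transformation above can be approximated in a few lines of Python. This is a sketch, not Airlock's compressor: `extract_schema`, `restore_records`, and `reduction_pct` are hypothetical names, the sketch assumes every record has the same keys, and it omits the `__airlock_meta` block because the tokenizer Airlock uses for counting is not described here.

```python
def extract_schema(records):
    """Turn a list of same-shaped dicts into one schema header plus value rows."""
    schema = list(records[0].keys()) if records else []
    rows = [[record.get(key) for key in schema] for record in records]
    return {"__airlock_schema": schema, "__airlock_rows": rows}


def restore_records(packed):
    """Rebuild the original list of dicts (used here only to check the round trip)."""
    schema = packed["__airlock_schema"]
    return [dict(zip(schema, row)) for row in packed["__airlock_rows"]]


def reduction_pct(tokens_before, tokens_after):
    """Percentage of tokens saved, rounded to one decimal place."""
    return round((tokens_before - tokens_after) / tokens_before * 100, 1)


records = [
    {"timestamp": "2026-01-01T10:00:00Z", "user": "User_A", "action": "login"},
    {"timestamp": "2026-01-01T10:01:00Z", "user": "User_B", "action": "logout"},
]
packed = extract_schema(records)
print(packed["__airlock_schema"])
print(reduction_pct(120, 68))
```

With the figures from the example, (120 - 68) / 120 gives the 43.3% shown in `__airlock_meta`. The saving grows with the number of rows, since the key names are paid for once instead of once per record.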


Cross-Run Stable Aliases (--salt)

By default, aliases are assigned in encounter order: the first name seen becomes User_A, the second User_B. This is consistent within a run but may differ between runs.
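
To see why encounter-order aliases can differ between runs, consider this small Python sketch. `AliasTable` is a hypothetical class written for illustration; in particular, the spreadsheet-style suffix after `Z` (`AA`, `AB`, ...) is an assumption, since the behavior past 26 entities is not documented here.

```python
import string


class AliasTable:
    """Hands out <prefix>_A, <prefix>_B, ... in the order values are first seen."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.seen = {}

    def alias_for(self, value):
        if value not in self.seen:
            self.seen[value] = f"{self.prefix}_{self._suffix(len(self.seen))}"
        return self.seen[value]

    @staticmethod
    def _suffix(index):
        # 0 -> A, 25 -> Z, 26 -> AA (assumed scheme, see lead-in).
        letters = ""
        index += 1
        while index > 0:
            index, remainder = divmod(index - 1, 26)
            letters = string.ascii_uppercase[remainder] + letters
        return letters


# Run 1 sees Alice first; run 2 sees Bob first.
run_one = AliasTable("User")
first = [run_one.alias_for(name) for name in ["Alice Johnson", "Bob Smith", "Alice Johnson"]]

run_two = AliasTable("User")
second = [run_two.alias_for(name) for name in ["Bob Smith", "Alice Johnson"]]

print(first)
print(second)
```

Within each run the mapping is stable, but Alice is `User_A` in the first run and `User_B` in the second, which is the gap the salted mode closes.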

Pass --salt <secret> to enable cross-run stability: every alias is derived from SHA-256(salt ‖ entity_type ‖ token) fed into a ChaCha8Rng. The same real identity always produces the same alias, regardless of which file is processed or what order records appear in.

airlock scrub january.json --salt prod-2026 > jan_clean.json
airlock scrub february.json --salt prod-2026 > feb_clean.json
# "Alice Johnson" → "User_GKQT" in both files

Keep your salt secret. It is the only thing preventing alias reversal.
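
The derivation can be sketched with Python's standard library. Several details here are assumptions made for illustration: the NUL byte used to separate the three fields, the four-letter suffix, and mapping digest bytes straight to letters instead of seeding a ChaCha8Rng. As a result, this sketch will not reproduce the aliases Airlock actually emits; it only shows why the output is independent of file and record order.

```python
import hashlib
import string


def stable_alias(salt, entity_type, token, prefix="User", length=4):
    """Derive a deterministic alias from SHA-256 over salt, entity type, and token.

    Hypothetical helper: the field separator and letter mapping are
    assumptions, and byte % 26 carries a small modulo bias.
    """
    material = b"\x00".join(part.encode("utf-8") for part in (salt, entity_type, token))
    digest = hashlib.sha256(material).digest()
    letters = "".join(string.ascii_uppercase[byte % 26] for byte in digest[:length])
    return f"{prefix}_{letters}"


january = stable_alias("prod-2026", "name", "Alice Johnson")
february = stable_alias("prod-2026", "name", "Alice Johnson")
print(january == february)
```

Because the alias depends only on the salt, the entity type, and the value itself, anyone who learns the salt can hash candidate names and match them against aliases, which is why the salt must stay secret.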


Audit Ledger

Every airlock scrub run writes a row to a local SQLite database:

  ╔══════╦════════════════════╦══════════╦═════════╦══════════╦══════════════════╗
  ║  ID  ║  Timestamp         ║ Entries  ║  PII    ║  Risk    ║  Compression     ║
  ╠══════╬════════════════════╬══════════╬═════════╬══════════╬══════════════════╣
  ║    1 ║ 2026-01-15T10:00   ║      500 ║      84 ║  42/100  ║          38.4%   ║
  ║    2 ║ 2026-01-15T14:22   ║     1200 ║     203 ║  71/100  ║          51.2%   ║
  ╚══════╩════════════════════╩══════════╩═════════╩══════════╩══════════════════╝

The ledger stores counts and statistics only — never the original PII values.
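
A statistics-only ledger of this kind can be sketched with Python's `sqlite3` module. The table name `runs` and its column names are hypothetical, chosen to mirror the columns in the output above; the actual schema Airlock creates is not documented here.

```python
import sqlite3

# Hypothetical schema: counts and scores only, no column that could hold PII.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp     TEXT    NOT NULL,
    entries       INTEGER NOT NULL,
    pii_count     INTEGER NOT NULL,
    risk_score    REAL    NOT NULL,
    reduction_pct REAL    NOT NULL
)
"""


def record_run(conn, timestamp, entries, pii_count, risk_score, reduction_pct):
    """Insert one summary row and return its row ID."""
    conn.execute(SCHEMA)
    cursor = conn.execute(
        "INSERT INTO runs (timestamp, entries, pii_count, risk_score, reduction_pct) "
        "VALUES (?, ?, ?, ?, ?)",
        (timestamp, entries, pii_count, risk_score, reduction_pct),
    )
    conn.commit()
    return cursor.lastrowid


conn = sqlite3.connect(":memory:")  # in-memory database for the example
first_id = record_run(conn, "2026-01-15T10:00", 500, 84, 42.0, 38.4)
second_id = record_run(conn, "2026-01-15T14:22", 1200, 203, 71.0, 51.2)

# Most recent runs first, similar in spirit to `airlock ledger --last 10`.
recent = conn.execute(
    "SELECT id, entries, pii_count FROM runs ORDER BY id DESC LIMIT 10"
).fetchall()
print(recent)
```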


Security Guarantees

Zero network calls: Airlock never opens a socket. All processing is in-process on your machine.
No PII on disk: The ledger stores counts and risk scores only, never names, emails, or the aliases themselves.
Alias irreversibility: In seeded mode, reversing an alias requires knowledge of your salt.
Deterministic: The same input + same salt always produces the same output. Fully auditable.
No third-party AI: The NER engine runs locally via compiled regex patterns. Your data never touches an external API.

Installation

Python (recommended)

pip install airlock-rs

Requires Python 3.8+. Pre-built wheels for Linux, macOS, and Windows.

CLI — Cargo

cargo install airlock-rs

CLI — pre-built binary

Download from GitHub Releases. Single static binary, no runtime dependencies.


Building from Source

git clone https://github.com/OxideOps/airlock
cd airlock

# Run tests
cargo test --all-features

# Build release binary
cargo build --release

# Build Python wheel (requires maturin: pip install maturin)
maturin develop --features python

# Lint
cargo clippy --all-features -- -D warnings

Architecture

src/
├── lib.rs       — Library entry point; Python module registration
├── main.rs      — CLI (clap): scrub / compress / ledger commands
├── types.rs     — EntityType, PiiSpan, SwapRecord, LedgerEntry
├── ner.rs       — Ner trait + RegexNer (9 built-in patterns + custom rules)
├── scrub.rs     — Pipeline: NER → alias → redact → compress → ledger
├── compress.rs  — Token-Tax compression (schema extraction + row compaction)
├── ledger.rs    — SQLite Risk Ledger (rusqlite, bundled)
└── config.rs    — .airlock.toml loader

Performance

  • Parallel NER scan and alias application via Rayon
  • Static regexes compiled once per process via OnceLock
  • Zero-copy span detection using byte offsets
  • ~3.5 MB statically-linked binary with no runtime dependencies
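
The first three points describe a common pattern: compile patterns once, record matches as offsets, and substitute afterwards. A Python approximation is sketched below. The two regexes are simplified stand-ins rather than Airlock's patterns (the IPv4 one does not check octet ranges), `find_spans` and `redact` are hypothetical names, overlapping spans are not handled, and Python string offsets count code points rather than bytes.

```python
import re

# Compiled once at import time, analogous to compiling static patterns once per process.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_spans(text):
    """Return sorted (start, end, entity_type) tuples without copying matched text."""
    spans = []
    for entity_type, pattern in (("email", EMAIL), ("ipv4", IPV4)):
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), entity_type))
    return sorted(spans)


def redact(text, spans, alias_for):
    """Replace each span with its alias, right to left so earlier offsets stay valid."""
    for start, end, entity_type in sorted(spans, reverse=True):
        text = text[:start] + alias_for(entity_type, text[start:end]) + text[end:]
    return text


line = "alice@corp.com logged in from 192.168.1.1"
spans = find_spans(line)
found = [(line[start:end], kind) for start, end, kind in spans]
clean = redact(line, spans, lambda kind, value: f"<{kind}>")
print(found)
print(clean)
```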

License

Apache 2.0 — see LICENSE.
