
Local-first AI security gateway: PII redaction, token compression, and audit ledger


Airlock


Local-first AI security gateway. Redact PII, cut LLM token costs, and maintain a full audit trail — without sending a single byte to the cloud.

pip install airlock-rs       # Python SDK
cargo install airlock-rs     # CLI

The Problem

You have logs, support tickets, or user data you want to send to an AI model. But that data contains names, emails, phone numbers, and credit card numbers. Sending it as-is to a hosted model such as OpenAI's GPT or Anthropic's Claude creates GDPR, HIPAA, and SOC 2 exposure.

Airlock sits between your data and the model. It scrubs PII, compresses the JSON to cut token costs, and writes an auditable record of what was processed — all locally, zero network calls.


Python SDK

import json

import airlock

records = [
    {"user": "Alice Johnson", "email": "alice@corp.com", "action": "login",  "ip": "192.168.1.1"},
    {"user": "Bob Smith",     "email": "bob@corp.com",   "action": "logout", "ip": "10.0.0.2"},
]

result = airlock.scrub(json.dumps(records), salt="my-org-secret")

print(result.pii_count)      # 6
print(result.risk_score)     # 75.0
print(result.reduction_pct)  # 38.4

for swap in result.swaps:
    print(f"{swap['original']:<17} → {swap['synthetic']}")
# alice@corp.com    → alias_a@redacted.dev
# Alice Johnson     → User_A
# 192.168.1.1       → IP_A
# bob@corp.com      → alias_b@redacted.dev
# Bob Smith         → User_B
# 10.0.0.2          → IP_B

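# Feed the clean JSON directly to your LLM (OpenAI Python SDK >= 1.0 shown)
response = openai_client.chat.completions.create(
    model="gpt-4o-mini",  # any chat model you have access to
    messages=[{"role": "user", "content": result.json_str}],
)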

API

# Scrub PII + compress
result = airlock.scrub(
    json_str,
    salt=None,          # str  — secret for stable cross-run aliases
    db_path=None,       # str  — path to SQLite audit ledger
    # Toggle individual entity types (all True by default):
    names=True,
    emails=True,
    phones=True,
    ssns=True,
    credit_cards=True,
    ip_addresses=True,
    jwt_tokens=True,
    aws_keys=True,
    env_secrets=True,
)
result.json_str       # str   — scrubbed, compressed JSON
result.pii_count      # int   — total PII instances found
result.risk_score     # float — 0–100 density score
result.reduction_pct  # float — token reduction percentage
result.swaps          # list[dict] — [{original, synthetic, entity_type}]
result.ledger_id      # int | None — SQLite row ID if db_path was set

# Compress only (no PII detection)
result = airlock.compress(json_str)
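result.json_str       # str   - compressed JSON
result.tokens_before  # token count before compression
result.tokens_after   # token count after compression
result.reduction_pct  # float - token reduction percentage
result.entry_count    # number of JSON entries processed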

CLI

# Scrub PII from a JSON or NDJSON file
airlock scrub logs.json --diff

# Stable cross-run aliases (same person → same alias across files)
airlock scrub logs.json --salt my-secret --diff > clean.json

# Compress only
airlock compress logs.json

# View audit history
airlock ledger --last 20

All flags

airlock scrub <FILE>
  --salt <SALT>       Cross-run stable aliases via SHA-256(salt ‖ entity ‖ token)
  --diff              Print every original → alias swap to stderr
  --db <FILE>         SQLite ledger path [default: airlock_ledger.db]
  --output <FORMAT>   pretty (default) | compact
  -v / -vv / -vvv     Verbosity (info / debug / trace)

airlock compress <FILE>
  --output <FORMAT>   pretty | compact

airlock ledger
  --last <N>          Show N most recent entries [default: 10]
  --db <FILE>         SQLite ledger path

What Gets Redacted

PII type        Standard             Example input                   Alias
Full name                            Alice Johnson                   User_A
Email           RFC 5322             alice@corp.com                  alias_a@redacted.dev
Phone           NANP + E.164         555-867-5309, +44 7911 123456   Phone_A
SSN             SSA format           123-45-6789                     SSN_A
Credit card     ISO/IEC 7812 (Luhn)  4111 1111 1111 1111             Card_A
IPv4 address    RFC 791              192.168.1.100                   IP_A
JWT token                            eyJhbGci...                     Token_A
AWS access key                       AKIAIOSFODNN7EXAMPLE            AwsKey_A
Env secret                           API_KEY=sk-abc123               Secret_A (API_KEY=Secret_A)

All 9 types are enabled by default and individually toggleable. Aliases are consistent within a run — User_A always refers to the same person, so AI models can still reason about behavior patterns without seeing real identities.
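To make "consistent within a run" concrete, here is a minimal Python sketch of per-run alias assignment. It is not Airlock's Rust code: `AliasTable` and `letter_suffix` are hypothetical names, and the sketch assumes each entity type keeps its own encounter-order counter, matching the `User_A` / `IP_A` examples above. What Airlock does after `_Z` is not documented; the sketch assumes a spreadsheet-style rollover to `_AA`.

```python
from collections import defaultdict


def letter_suffix(index: int) -> str:
    """Map 0 -> 'A', 25 -> 'Z', 26 -> 'AA' (spreadsheet-style rollover; an assumption)."""
    index += 1
    suffix = ""
    while index > 0:
        index, rem = divmod(index - 1, 26)
        suffix = chr(ord("A") + rem) + suffix
    return suffix


class AliasTable:
    """Hypothetical per-run alias table: first distinct value of a type -> _A, next -> _B, ..."""

    def __init__(self) -> None:
        self._aliases: dict[tuple[str, str], str] = {}
        self._counters: dict[str, int] = defaultdict(int)

    def alias(self, prefix: str, value: str) -> str:
        key = (prefix, value)
        if key not in self._aliases:
            # First sighting of this value: take the next letter for this prefix.
            self._aliases[key] = f"{prefix}_{letter_suffix(self._counters[prefix])}"
            self._counters[prefix] += 1
        return self._aliases[key]


table = AliasTable()
print(table.alias("User", "Alice Johnson"))  # User_A
print(table.alias("User", "Bob Smith"))      # User_B
print(table.alias("User", "Alice Johnson"))  # User_A (same person, same alias)
print(table.alias("IP", "192.168.1.1"))      # IP_A (each prefix counts separately)
```

Because the table lives only for one run, a second run that sees Bob first would call him `User_A`; that is exactly the gap the `--salt` mode closes.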


Config File

Drop a .airlock.toml in your project directory to set defaults:

[scrub]
salt = "my-org-secret"          # stable cross-run aliases
db   = "~/.airlock/ledger.db"   # shared ledger location

[redact]
ip_addresses = false            # keep IPs as-is

[[rules]]
name         = "EmployeeId"
pattern      = "EMP-\\d{5}"
alias_prefix = "Emp"            # EMP-00042 → Emp_A

CLI flags always take precedence over the config file.


Token Compression

Repeated JSON keys are expensive for LLMs. Airlock extracts them into a single schema header:

Before (keys repeated on every row):

[
  {"timestamp": "2026-01-01T10:00:00Z", "user": "User_A", "action": "login"},
  {"timestamp": "2026-01-01T10:01:00Z", "user": "User_B", "action": "logout"}
]

After (keys extracted once, 43% fewer tokens):

{
  "__airlock_schema": ["timestamp", "user", "action"],
  "__airlock_rows": [
    ["2026-01-01T10:00:00Z", "User_A", "login"],
    ["2026-01-01T10:01:00Z", "User_B", "logout"]
  ],
  "__airlock_meta": { "tokens_before": 120, "tokens_after": 68, "reduction_pct": "43.3" }
}

Typical savings: 20–60% on structured log data.
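A minimal Python sketch of the schema-extraction idea, assuming every record shares the same keys in the same order. `compress_rows` and `expand_rows` are hypothetical names; the sketch omits __airlock_meta and real token counting (Airlock's tokenizer is not described here), and it does not cover how Airlock handles records with differing keys.

```python
import json


def compress_rows(records: list[dict]) -> dict:
    """Hoist the shared keys into a schema header and keep only positional values per row."""
    schema = list(records[0].keys()) if records else []
    for r in records:
        if list(r.keys()) != schema:
            raise ValueError("this sketch only handles records with identical keys in the same order")
    return {
        "__airlock_schema": schema,
        "__airlock_rows": [[r[k] for k in schema] for r in records],
    }


def expand_rows(packed: dict) -> list[dict]:
    """Inverse transform: rebuild the original list of objects."""
    schema = packed["__airlock_schema"]
    return [dict(zip(schema, row)) for row in packed["__airlock_rows"]]


records = [
    {"timestamp": "2026-01-01T10:00:00Z", "user": "User_A", "action": "login"},
    {"timestamp": "2026-01-01T10:01:00Z", "user": "User_B", "action": "logout"},
]
packed = compress_rows(records)
print(json.dumps(packed["__airlock_schema"]))  # ["timestamp", "user", "action"]
assert expand_rows(packed) == records  # lossless round trip

# The example meta above is consistent with (1 - tokens_after / tokens_before) * 100
print(round((1 - 68 / 120) * 100, 1))  # 43.3
```

The schema header is a fixed cost paid once, while the key text it removes scales with the row count, so longer logs benefit more.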


Cross-Run Stable Aliases (--salt)

By default, aliases are assigned in encounter order: the first distinct name seen becomes User_A, the second User_B, and so on. This is consistent within a run but may differ between runs.

Pass --salt <secret> to enable cross-run stability: every alias is derived from SHA-256(salt ‖ entity_type ‖ token) fed into a ChaCha8Rng. The same real identity always produces the same alias, regardless of which file is processed or what order records appear in.

airlock scrub january.json --salt prod-2026 > jan_clean.json
airlock scrub february.json --salt prod-2026 > feb_clean.json
# "Alice Johnson" → "User_GKQT" in both files

Keep your salt secret. It is the only thing preventing alias reversal.
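The salted derivation can be approximated in Python to show why the alias no longer depends on file or record order. This is a simplified sketch, not Airlock's algorithm: `stable_alias` is a hypothetical name, the NUL separator between fields is an assumption, and it maps digest bytes to letters directly instead of seeding a ChaCha8Rng, so its output should not be expected to match Airlock's aliases.

```python
import hashlib
import string


def stable_alias(salt: str, entity_type: str, token: str, prefix: str, length: int = 4) -> str:
    """Derive an order-independent alias from SHA-256(salt ‖ entity_type ‖ token).

    Simplifications (assumptions, not Airlock's scheme): fields are joined with a NUL
    byte, and letters are read straight from the digest instead of seeding ChaCha8.
    """
    material = b"\x00".join(part.encode("utf-8") for part in (salt, entity_type, token))
    digest = hashlib.sha256(material).digest()
    # byte % 26 is slightly biased toward early letters; acceptable for a sketch.
    letters = "".join(string.ascii_uppercase[b % 26] for b in digest[:length])
    return f"{prefix}_{letters}"


jan = stable_alias("prod-2026", "name", "Alice Johnson", "User")
feb = stable_alias("prod-2026", "name", "Alice Johnson", "User")
print(jan == feb)  # True: same salt and identity give the same alias in every run
```

With four letters there are 26^4 = 456,976 possible suffixes, so a production scheme also needs collision handling, which this sketch omits.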


Audit Ledger

Every airlock scrub run writes a row to a local SQLite database:

  ╔══════╦════════════════════╦══════════╦═════════╦══════════╦══════════════════╗
  ║  ID  ║  Timestamp         ║ Entries  ║  PII    ║  Risk    ║  Compression     ║
  ╠══════╬════════════════════╬══════════╬═════════╬══════════╬══════════════════╣
  ║    1 ║ 2026-01-15T10:00   ║      500 ║      84 ║  42/100  ║          38.4%   ║
  ║    2 ║ 2026-01-15T14:22   ║     1200 ║     203 ║  71/100  ║          51.2%   ║
  ╚══════╩════════════════════╩══════════╩═════════╩══════════╩══════════════════╝

The ledger stores counts and statistics only — never the original PII values.
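For a feel of what a counts-only ledger looks like, here is a Python sqlite3 sketch. The table name and columns are hypothetical, modeled on the ID / Timestamp / Entries / PII / Risk / Compression columns shown above; Airlock's actual schema (in ledger.rs) is not documented here.

```python
import sqlite3
from datetime import datetime, timezone


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a ledger whose columns hold statistics only, never PII or aliases."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS runs (
               id            INTEGER PRIMARY KEY AUTOINCREMENT,
               ts            TEXT    NOT NULL,
               entries       INTEGER NOT NULL,
               pii_count     INTEGER NOT NULL,
               risk_score    REAL    NOT NULL,
               reduction_pct REAL    NOT NULL
           )"""
    )
    return conn


def record_run(conn: sqlite3.Connection, entries: int, pii_count: int,
               risk_score: float, reduction_pct: float) -> int:
    """Append one run and return its row ID (comparable to result.ledger_id)."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M")
    cur = conn.execute(
        "INSERT INTO runs (ts, entries, pii_count, risk_score, reduction_pct) VALUES (?, ?, ?, ?, ?)",
        (ts, entries, pii_count, risk_score, reduction_pct),
    )
    conn.commit()
    return cur.lastrowid


def last_runs(conn: sqlite3.Connection, n: int = 10) -> list[tuple]:
    """Most recent runs first, like `airlock ledger --last N`."""
    return conn.execute(
        "SELECT id, ts, entries, pii_count, risk_score, reduction_pct "
        "FROM runs ORDER BY id DESC LIMIT ?",
        (n,),
    ).fetchall()


conn = open_ledger()  # in-memory database for the demo
record_run(conn, entries=500, pii_count=84, risk_score=42.0, reduction_pct=38.4)
record_run(conn, entries=1200, pii_count=203, risk_score=71.0, reduction_pct=51.2)
for row in last_runs(conn, n=20):
    print(row)
```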


Security Guarantees

  • Zero network calls: Airlock never opens a socket. All processing happens in-process on your machine.
  • No PII on disk: the ledger stores counts and risk scores only, never names, emails, or the aliases themselves.
  • Alias irreversibility: in seeded mode, reversing an alias requires knowledge of your salt.
  • Deterministic: the same input with the same salt always produces the same output, so every run is fully auditable.
  • No third-party AI: the NER engine runs locally on compiled regex patterns. Your data never touches an external API.

Installation

Python (recommended)

pip install airlock-rs

Requires Python 3.8+. Pre-built wheels for Linux, macOS, and Windows.

CLI — Cargo

cargo install airlock-rs

CLI — pre-built binary

Download from GitHub Releases. Single static binary, no runtime dependencies.


Building from Source

git clone https://github.com/OxideOps/airlock
cd airlock

# Run tests
cargo test --all-features

# Build release binary
cargo build --release

# Build Python wheel (requires maturin: pip install maturin)
maturin develop --features python

# Lint
cargo clippy --all-features -- -D warnings

Architecture

src/
├── lib.rs       — Library entry point; Python module registration
├── main.rs      — CLI (clap): scrub / compress / ledger commands
├── types.rs     — EntityType, PiiSpan, SwapRecord, LedgerEntry
├── ner.rs       — Ner trait + RegexNer (9 built-in patterns + custom rules)
├── scrub.rs     — Pipeline: NER → alias → redact → compress → ledger
├── compress.rs  — Token-Tax compression (schema extraction + row compaction)
├── ledger.rs    — SQLite Risk Ledger (rusqlite, bundled)
└── config.rs    — .airlock.toml loader
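The detect-then-replace core of the scrub pipeline can be sketched in Python. This is a hypothetical analogue of the `Ner` trait and `RegexNer` listed above, not a port: the two patterns are deliberately simple illustrations (not Airlock's), overlapping spans are not resolved, Python slices by character offset where the Rust code is described as using byte offsets, and emails become `Email_A` here rather than `alias_a@redacted.dev`.

```python
import re
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    entity_type: str


class Ner(Protocol):
    def detect(self, text: str) -> list[Span]: ...


class RegexNer:
    """Hypothetical Python analogue of a regex-backed detector (patterns compiled once)."""

    def __init__(self, patterns: dict[str, str]) -> None:
        self._patterns = {kind: re.compile(p) for kind, p in patterns.items()}

    def detect(self, text: str) -> list[Span]:
        spans = [
            Span(m.start(), m.end(), kind)
            for kind, rx in self._patterns.items()
            for m in rx.finditer(text)
        ]
        return sorted(spans, key=lambda s: s.start)


def redact(text: str, spans: list[Span], alias_for: Callable[[str, str], str]) -> str:
    """Assumes non-overlapping spans. Aliases are chosen left to right (encounter order),
    then applied right to left so earlier offsets stay valid as the string changes length."""
    ordered = sorted(spans, key=lambda s: s.start)
    planned = [(s, alias_for(s.entity_type, text[s.start:s.end])) for s in ordered]
    out = text
    for s, alias in reversed(planned):
        out = out[: s.start] + alias + out[s.end :]
    return out


def make_alias_for() -> Callable[[str, str], str]:
    """Per-run alias function: <Type>_A, <Type>_B, ... per entity type (up to 26)."""
    seen: dict[tuple[str, str], str] = {}

    def alias_for(kind: str, value: str) -> str:
        if (kind, value) not in seen:
            n = sum(1 for k, _ in seen if k == kind)
            seen[(kind, value)] = f"{kind}_{chr(ord('A') + n)}"
        return seen[(kind, value)]

    return alias_for


ner: Ner = RegexNer({
    "Email": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "IP": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
})
text = '{"email": "alice@corp.com", "ip": "192.168.1.1", "peer": "10.0.0.2"}'
print(redact(text, ner.detect(text), make_alias_for()))
# {"email": "Email_A", "ip": "IP_A", "peer": "IP_B"}
```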

Performance

  • Parallel NER scan and alias application via Rayon
  • Static regexes compiled once per process via OnceLock
  • Zero-copy span detection using byte offsets
  • ~3.5 MB statically linked binary with no runtime dependencies

License

Apache 2.0 — see LICENSE.



Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions


airlock_rs-0.3.1-cp313-cp313-win_amd64.whl (1.9 MB)

CPython 3.13, Windows x86-64

airlock_rs-0.3.1-cp313-cp313-macosx_11_0_arm64.whl (1.8 MB)

CPython 3.13, macOS 11.0+ ARM64

airlock_rs-0.3.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)

CPython 3.8, manylinux: glibc 2.17+ x86-64

File details

Details for the file airlock_rs-0.3.1-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: airlock_rs-0.3.1-cp313-cp313-win_amd64.whl
  • Size: 1.9 MB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for airlock_rs-0.3.1-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 43adf4712948cb361679ae6189727943409619ed2b5eb42661ec1d851e200470
MD5 af24adcf8fd737eac3f46f64e6b7d8c3
BLAKE2b-256 656088b290610be7264660669d4ec74aeecc696a61e0485ad7a8a4ac6b9036f9


Provenance

The following attestation bundles were made for airlock_rs-0.3.1-cp313-cp313-win_amd64.whl:

Publisher: release.yml on OxideOps/airlock

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file airlock_rs-0.3.1-cp313-cp313-macosx_11_0_arm64.whl.


File hashes

Hashes for airlock_rs-0.3.1-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 566fa230532acba899278ad3a913466e46ef7613361ac30e18fc8db09450cfde
MD5 eb1f8ba55ee523b17a3ad33587b915ee
BLAKE2b-256 97a9c06ef21618e7c5094ca3745c54ecd39808b054f742a4f7d11cba90eea495


Provenance

The following attestation bundles were made for airlock_rs-0.3.1-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: release.yml on OxideOps/airlock


File details

Details for the file airlock_rs-0.3.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.


File hashes

Hashes for airlock_rs-0.3.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 647d712ab59e6603bcdd4d458e23e0dca28a8b07fb4b04d04e3987949e38eaac
MD5 347d9804a4ea2dac0694a812a31bd995
BLAKE2b-256 af70e870ef037ba4c71606e94024e09f0bd17d3bf7de1ff252f40b72432cfd62


Provenance

The following attestation bundles were made for airlock_rs-0.3.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on OxideOps/airlock

