Input sanitization pipeline for untrusted text. Deterministic. No ML. No false positives.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

project-navi

These details have not been verified by PyPI

Project description

navi-sanitize

Deterministic input sanitization for untrusted text. Zero dependencies, zero false positives.

from navi_sanitize import clean

clean("Неllo Wоrld")  # "Hello World" — Cyrillic Н/о replaced
clean("price:\u200b 0")  # "price: 0" — zero-width space stripped
clean("file\x00.txt")  # "file.txt" — null byte removed

Opt-in utilities for deeper analysis: decode_evasion() peels nested URL/HTML/hex encodings, detect_scripts() and is_mixed_script() flag mixed-script spoofing.

Why This Matters

Untrusted text contains invisible attacks: homoglyph substitution, zero-width characters, null bytes, fullwidth encoding, template/prompt injection delimiters. These bypass validation, poison templates, and fool humans.

navi-sanitize fixes the text before it reaches your application. It doesn't detect attacks — it removes them.

LLM prompt pipelines — User input flows into system prompts, RAG context, and tool calls. Invisible Unicode (tag block characters, bidi overrides) encodes instructions that tokenizers read but humans can't see. Homoglyphs bypass keyword filters. navi-sanitize strips these vectors before text reaches the model, and the pluggable escaper lets you add vendor-specific prompt escaping on top.

Web applications — Jinja2 SSTI, path traversal, and fullwidth encoding bypasses are well-known but tedious to cover manually. A single clean(user_input, escaper=jinja2_escaper) call handles homoglyph-disguised payloads like {{ cоnfig }} (Cyrillic о) that naive escaping misses.

Config and data ingestion — YAML, TOML, and JSON parsed from untrusted sources can carry null bytes that truncate C-extension processing, zero-width characters that break key matching, and homoglyphs that create near-duplicate keys. walk(parsed_config) sanitizes every string in a nested structure in one call.

Log analysis and SIEM — Attackers embed bidi overrides and zero-width characters in log entries to hide indicators of compromise from analysts and pattern-matching tools. Sanitizing log data on ingest ensures what you search is what's actually there.

Identity and anti-phishing — pаypal.com (Cyrillic а) renders identically to paypal.com in most fonts. Homoglyph replacement normalizes display names, URLs, and email addresses to catch spoofing that visual inspection misses.

How It Compares

navi-sanitize is the only library that combines invisible character stripping, homoglyph replacement, NFKC normalization, and pluggable escaping in a single zero-dependency pipeline. Existing tools solve pieces of this problem:

	navi-sanitize	Unidecode / anyascii	confusable_homoglyphs	ftfy	MarkupSafe / nh3
Purpose	Security sanitization	ASCII transliteration	Homoglyph detection	Encoding repair	HTML escaping
Invisible chars	Strips 411 (bidi, tag block, ZW, VS)	Incidental	No	Partial (preserves bidi, ZW, VS)	No
Homoglyphs	Replaces 51 curated pairs	Transliterates all non-ASCII	Detects only (no replace)	No	No
NFKC	Yes	No	No	NFC (NFKC optional)	No
Null bytes	Yes	No	No	No	No
Preserves Unicode	Yes (CJK, Arabic, emoji intact)	No (destroys all non-ASCII)	Yes	Yes	Yes
Pluggable escaper	Yes	No	No	No	N/A (HTML-specific)
Dependencies	Zero	Zero	Zero	wcwidth	C ext / Rust ext

Key differences:

Unidecode / anyascii transliterate all non-ASCII to Latin. They turn " into "Zhong" and Cyrillic sentences into gibberish. navi-sanitize normalizes only the 51 highest-risk lookalikes and leaves legitimate Unicode intact.
confusable_homoglyphs uses the full Unicode Consortium confusables dataset (thousands of pairs) but only detects — you'd need to write your own replacement layer. It's also archived.
ftfy is complementary, not competing. It fixes encoding corruption and explicitly preserves bidi overrides and zero-width characters that navi-sanitize strips. Different threat model.
MarkupSafe / nh3 handle HTML structure; navi-sanitize handles the character-level content inside that structure. They compose naturally.
pydantic / cerberus are validation frameworks — call navi_sanitize.clean() inside a pydantic AfterValidator or cerberus coercion chain for validated, sanitized output.

Pipeline

Every string passes through stages in order. Each stage returns clean output and a warning if it changed anything.

Stage	What it does
Null bytes	Strip `\x00`
Invisibles	Strip zero-width, Unicode Tag block, bidi controls
NFKC	Normalize fullwidth ASCII to standard ASCII
Homoglyphs	Replace Cyrillic/Greek lookalikes with Latin equivalents
Escaper	Pluggable — you choose what to escape for

The first four stages are universal. The escaper is where you tell the pipeline what the output is for.

Escapers

from navi_sanitize import clean, jinja2_escaper, path_escaper

# For Jinja2 templates
clean("{{ malicious }}", escaper=jinja2_escaper)

# For filesystem paths
clean("../../etc/passwd", escaper=path_escaper)

# For LLM prompts — bring your own
clean(user_input, escaper=my_prompt_escaper)

# No escaper — just the universal stages
clean(user_input)

An escaper is a function: str -> str. Write one in three lines.

Framework Integration

# Pydantic — validate then sanitize
from typing import Annotated
from pydantic import BaseModel, AfterValidator
from navi_sanitize import clean

SafeStr = Annotated[str, AfterValidator(clean)]

class UserInput(BaseModel):
    name: SafeStr
    bio: SafeStr

# FastAPI — sanitize at the edge
from fastapi import Depends, Query
from navi_sanitize import clean

def safe_query(q: str = Query()) -> str:
    return clean(q)

@app.get("/search")
def search(q: str = Depends(safe_query)):
    return {"results": find(q)}

# Jinja2 — sanitize before rendering
from navi_sanitize import clean, jinja2_escaper

safe_context = {k: clean(v, escaper=jinja2_escaper) for k, v in user_data.items()}
template.render(**safe_context)

Install

pip install navi-sanitize

Walk untrusted data structures

from navi_sanitize import walk

# Recursively sanitize every string in a dict/list
spec = walk(untrusted_json)

Opt-in Utilities

These utilities are not part of clean() and are never run automatically. You must call them explicitly.

from navi_sanitize import decode_evasion, clean, detect_scripts, is_mixed_script, path_escaper

# Double-encoded path traversal
raw = "%252e%252e%252fetc%252fpasswd"

# 1. Peel nested encodings (URL → HTML entities → hex escapes)
peeled = decode_evasion(raw)           # "../etc/passwd"

# 2. Sanitize through the universal pipeline
cleaned = clean(peeled, escaper=path_escaper)  # "etc/passwd"

# 3. Check for mixed-script spoofing (useful on raw or pre-clean input)
if is_mixed_script(raw) or is_mixed_script(peeled):
    flag_for_review(raw)

decode_evasion(text, *, max_layers=3) — iterative URL/HTML/hex decoding; stops when a pass produces no change
detect_scripts(text) — returns script buckets present in text (latin, cyrillic, greek, etc.)
is_mixed_script(text) — True when 2+ scripts detected

Script detection can be applied pre-clean too — most useful on raw input for phishing detection.

What This Doesn't Do

navi-sanitize operates at the character level. It does not cover:

HTML/XSS — use your template engine's auto-escaping (markupsafe.escape(), nh3.clean())
SQL injection — use parameterized queries
Schema validation — use pydantic, cerberus, or similar (they compose with clean())
LLM prompt injection — vendor syntax is a moving target; write a custom escaper

These are different problems with mature, purpose-built solutions. navi-sanitize handles what they don't: the invisible, character-level content that slips past them.

Warnings

The pipeline never errors. It always produces output. When it changes something, it logs a warning.

import logging
logging.basicConfig()

clean("pаypal.com")
# WARNING:navi_sanitize: Replaced 1 homoglyph(s) in value
# Returns: "paypal.com"

Performance

Measured on Python 3.12, single thread. clean() is the per-string cost; walk() includes deepcopy.

Scenario	Mean	Ops/sec
`clean()` — short, clean text (no-op)	2.8 us	358K
`clean()` — short, hostile (all stages fire)	67 us	15K
`clean()` — 13KB clean text	810 us	1.2K
`clean()` — 10KB hostile text	449 us	2.2K
`clean()` — 100KB hostile payload	5.7 ms	176
`walk()` — 100-item nested dict, clean	537 us	1.9K
`walk()` — 100-item nested dict, hostile	6.9 ms	144

License

MIT

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

project-navi

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.2.1

Apr 5, 2026

0.2.0

Apr 5, 2026

0.1.1

Mar 2, 2026

This version

0.1.0

Mar 1, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

navi_sanitize-0.1.0.tar.gz (49.7 kB view details)

Uploaded Mar 1, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

navi_sanitize-0.1.0-py3-none-any.whl (15.0 kB view details)

Uploaded Mar 1, 2026 Python 3

File details

Details for the file navi_sanitize-0.1.0.tar.gz.

File metadata

Download URL: navi_sanitize-0.1.0.tar.gz
Upload date: Mar 1, 2026
Size: 49.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for navi_sanitize-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`0f20aa257963a371dc2d2475060d97f9f5bc317713597d7be8cc38c77d4ac9ee`
MD5	`279b7b98106ad6ee8cd74d3cd1b5b4f9`
BLAKE2b-256	`8b1a39992bde198be4ec2f2978ea0cd6519ed77b6d0973f30fc76abe2826082f`

See more details on using hashes here.

Provenance

The following attestation bundles were made for navi_sanitize-0.1.0.tar.gz:

Publisher: publish.yml on Project-Navi/navi-sanitize

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: navi_sanitize-0.1.0.tar.gz
- Subject digest: 0f20aa257963a371dc2d2475060d97f9f5bc317713597d7be8cc38c77d4ac9ee
- Sigstore transparency entry: 1006692536
- Sigstore integration time: Mar 1, 2026
Source repository:
- Permalink: Project-Navi/navi-sanitize@d7bafaa1f56ab24f6bc9507175c4a72b51f37521
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Project-Navi
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d7bafaa1f56ab24f6bc9507175c4a72b51f37521
- Trigger Event: release

File details

Details for the file navi_sanitize-0.1.0-py3-none-any.whl.

File metadata

Download URL: navi_sanitize-0.1.0-py3-none-any.whl
Upload date: Mar 1, 2026
Size: 15.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for navi_sanitize-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`58c9230227de26ab90702a35cf2133b626064d90739da2f7882c2d875f096c0f`
MD5	`a1d3527b7bc153e8ee43c152e428b158`
BLAKE2b-256	`844cd6db86dc84308c564c5624834114085182daf6f665ecf9b5f25f7a52e334`

See more details on using hashes here.

Provenance

The following attestation bundles were made for navi_sanitize-0.1.0-py3-none-any.whl:

Publisher: publish.yml on Project-Navi/navi-sanitize

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: navi_sanitize-0.1.0-py3-none-any.whl
- Subject digest: 58c9230227de26ab90702a35cf2133b626064d90739da2f7882c2d875f096c0f
- Sigstore transparency entry: 1006692539
- Sigstore integration time: Mar 1, 2026
Source repository:
- Permalink: Project-Navi/navi-sanitize@d7bafaa1f56ab24f6bc9507175c4a72b51f37521
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Project-Navi
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d7bafaa1f56ab24f6bc9507175c4a72b51f37521
- Trigger Event: release

navi-sanitize 0.1.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

navi-sanitize

Why This Matters

How It Compares

Pipeline

Escapers

Framework Integration

Install

Walk untrusted data structures

Opt-in Utilities

What This Doesn't Do

Warnings

Performance

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance