Sanitext

Sanitize text from LLMs

Sanitext is a command-line tool and Python library for detecting and removing unwanted characters in text. It supports:

  • ASCII-only sanitization (default)
  • Custom character allowlists (--allow-chars, --allow-file)
  • Interactive review of non-allowed characters (--interactive)
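To make the default ASCII-only mode concrete, here is a minimal sketch of how non-ASCII characters could be detected and named using only the standard library. The function name find_non_ascii is hypothetical and this is not sanitext's own implementation; it simply treats every code point below 128 as allowed, which may differ from the library's actual default allowlist.

```python
import unicodedata


def find_non_ascii(text):
    """Return (character, Unicode name) pairs for each distinct
    non-ASCII character, in order of first appearance.

    Hypothetical helper for illustration; not part of sanitext.
    """
    seen = set()
    found = []
    for ch in text:
        # Code points 0..127 are ASCII; skip them and any repeats.
        if ord(ch) < 128 or ch in seen:
            continue
        seen.add(ch)
        # Fall back to "UNKNOWN" for code points without a name.
        found.append((ch, unicodedata.name(ch, "UNKNOWN")))
    return found


print(find_non_ascii("“2×3 – 4 = 5”"))
```

The pairs have the same (character, name) shape as the detection output shown in the library usage example.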

Installation

pip install sanitext

By default, sanitext reads its input from your clipboard. Pass --string to supply the text directly instead.

CLI usage example

# Process the clipboard content & copy back to clipboard
sanitext
# Detect characters but don't modify
sanitext --detect
# Process clipboard + show detected characters (most common command)
sanitext -v
# Process clipboard + show input, detected characters & output
sanitext -vv
# Process the provided string and print it
sanitext --string "Héllø, 𝒲𝑜𝓇𝓁𝒹!"
# Allow additional characters (for now, only characters that are a single Unicode code point)
sanitext --allow-chars "αøñç"
# Allow characters from a file
sanitext --allow-file allowed_chars.txt
# Allow single code point emoji
sanitext --allow-emoji
# Prompt user for handling disallowed characters
# y (Yes) -> keep it
# n (No) -> remove it
# r (Replace) -> provide a replacement character
sanitext --interactive
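The y / n / r choices offered by --interactive can be sketched as a small review loop. This is an illustration under stated assumptions, not the tool's actual code: review_text and its decide callback are hypothetical names, and remembering one answer per distinct character is a design choice made here for brevity.

```python
import string


def review_text(text, allowed, decide):
    """Rebuild `text`, consulting `decide` for each character not in `allowed`.

    `decide(ch)` must return "y" (keep), "n" (remove), or a tuple
    ("r", replacement). Hypothetical helper; not part of sanitext.
    """
    out = []
    decisions = {}  # assumption: ask once per distinct character
    for ch in text:
        if ch in allowed:
            out.append(ch)
            continue
        if ch not in decisions:
            decisions[ch] = decide(ch)
        answer = decisions[ch]
        if answer == "y":
            out.append(ch)  # keep the character as-is
        elif answer == "n":
            continue  # drop the character
        elif isinstance(answer, tuple) and answer[0] == "r":
            out.append(answer[1])  # substitute the replacement
        else:
            raise ValueError(f"unexpected answer for {ch!r}: {answer!r}")
    return "".join(out)


# Scripted answers stand in for a user typing at a prompt.
answers = {"é": "y", "ø": ("r", "o"), "…": "n"}
reviewed = review_text("Héllø… ok", set(string.printable), answers.__getitem__)
print(reviewed)  # Héllo ok
```

In a real command-line session, decide would wrap input() and loop until it receives a valid answer.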

Python library usage example

from sanitext.text_sanitization import (
    sanitize_text,
    detect_suspicious_characters,
    get_allowed_characters,
)

text = "“2×3 – 4 = 5”😎󠅒󠅟󠅣󠅣"

# Detect suspicious characters
suspicious_characters = detect_suspicious_characters(text)
print(f"Suspicious characters: {suspicious_characters}")
# [('“', 'LEFT DOUBLE QUOTATION MARK'), ('×', 'MULTIPLICATION SIGN'), ('–', 'EN DASH'), ('”', 'RIGHT DOUBLE QUOTATION MARK')]

# Sanitize text to all ASCII
sanitized_text = sanitize_text(text)
print(f"Sanitized text: {sanitized_text}")  # "2x3 - 4 = 5"
# Allow the multiplication sign
allowed_characters = get_allowed_characters()
allowed_characters.add("×")
sanitized_text = sanitize_text(text, allowed_characters=allowed_characters)
print(f"Sanitized text: {sanitized_text}")  # "2×3 - 4 = 5"
# Allow the emoji, but strip the hidden message ("boss") encoded in the invisible characters that follow it
allowed_characters = get_allowed_characters(allow_emoji=True)
sanitized_text = sanitize_text(text, allowed_characters=allowed_characters)
print(f"Sanitized text: {sanitized_text}")  # "2x3 - 4 = 5"😎
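The invisible characters after 😎 in the example above are Unicode variation selectors, which render as nothing but can carry arbitrary bytes. The sketch below shows how such a payload can be hidden, revealed, and stripped. It assumes the commonly described mapping (bytes 0-15 to U+FE00..U+FE0F, bytes 16-255 to U+E0100..U+E01EF); all function names are hypothetical and none of this is sanitext's own code.

```python
def byte_to_selector(b):
    """Map a byte value (0..255) to a variation selector character."""
    return chr(0xFE00 + b) if b < 16 else chr(0xE0100 + b - 16)


def selector_to_byte(ch):
    """Inverse mapping; returns None for non-selector characters."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None


def hide(carrier, message):
    """Append `message` to `carrier` as invisible variation selectors."""
    return carrier + "".join(byte_to_selector(b) for b in message.encode("utf-8"))


def reveal(text):
    """Decode any bytes carried by variation selectors in `text`."""
    data = bytes(b for b in map(selector_to_byte, text) if b is not None)
    return data.decode("utf-8", errors="replace")


def strip_selectors(text):
    """Remove every variation selector. Note this also drops legitimate
    ones such as U+FE0F (emoji presentation), so it is only a sketch."""
    return "".join(ch for ch in text if selector_to_byte(ch) is None)


payload = hide("😎", "boss")
print(len(payload))              # 5: one emoji plus four invisible characters
print(reveal(payload))           # boss
print(strip_selectors(payload))  # 😎
```

A sanitizer that only inspects what is visibly rendered would miss this payload, which is why the allowlist has to be applied per code point.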

Dev setup

# Install dependencies
poetry install
# Use it
poetry run python sanitext/cli.py --help
poetry run python sanitext/cli.py --string "your string"
# Run tests
poetry run pytest
poetry run pytest -s tests/test_cli.py
# Run tests across different Python versions (TODO: set up a GitHub Action)
poetry run tox
# Publish to PyPI
poetry build
poetry publish
