A Python package for detecting hidden Unicode and ASCII characters.
🕵️♂️ ByteSleuth — The Ghost Hunter for Hidden Characters
"Elementary, my dear dev. The ghosts of hidden characters won't escape this audit!"
— CharlockHolmes, the detective inside ByteSleuth
ByteSleuth is a powerful Unicode & ASCII character scanner designed to detect obfuscation, invisible threats, and suspicious bytes lurking in text or code. Whether you're hunting down ghost characters or analyzing unexpected encoding issues, ByteSleuth reports each suspicious character it finds by code point and name.
🚀 Key Features
✅ Detects ASCII control characters (e.g., NUL, BEL, ESC)
✅ Flags Unicode invisibles and directional controls (e.g., U+200B, U+202E)
✅ Optionally sanitizes input by removing hidden/malicious characters
✅ Works seamlessly with files and directories
✅ Supports logging for audit trails
✅ Can be embedded in existing workflows
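The first two checks can be approximated with Python's standard unicodedata module. The sketch below illustrates the general technique under assumed rules and is not ByteSleuth's own logic. C0 control codes and DEL count as ASCII findings, with tab, line feed and carriage return let through. Non-ASCII characters in Unicode category Cf (format, which covers U+200B and U+202E) or Cc (C1 controls) count as Unicode findings. The names classify, is_ascii_control and is_unicode_invisible are hypothetical.

```python
import unicodedata
from typing import Optional

# Tab, line feed and carriage return are control codes, but normal in text files.
ALLOWED_CONTROLS = {"\t", "\n", "\r"}


def is_ascii_control(ch: str) -> bool:
    """C0 controls (U+0000..U+001F) and DEL (U+007F), except the allowed ones."""
    cp = ord(ch)
    return (cp < 0x20 or cp == 0x7F) and ch not in ALLOWED_CONTROLS


def is_unicode_invisible(ch: str) -> bool:
    """Non-ASCII format characters (Cf, e.g. U+200B, U+202E) and C1 controls (Cc)."""
    return ord(ch) > 0x7F and unicodedata.category(ch) in ("Cf", "Cc")


def classify(ch: str) -> Optional[str]:
    """Return "ascii", "unicode", or None for an ordinary character."""
    if is_ascii_control(ch):
        return "ascii"
    if is_unicode_invisible(ch):
        return "unicode"
    return None


print([classify(c) for c in "a\x1b\u200b\u202e"])
# [None, 'ascii', 'unicode', 'unicode']
```

Splitting the rule into two predicates mirrors the ASCII / Unicode split that the mode option exposes.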
🔧 CLI Usage
python byte_sleuth.py <target> [-m MODE] [-s] [-l LOG_FILE]
CLI Arguments
| Argument | Description |
|---|---|
| target | File or directory to scan |
| -m, --mode | Scan only ASCII, only Unicode, or both (all) |
| -s, --sanitize | Automatically remove suspicious characters |
| -l, --log | Log file to write results (default: scanner.log) |
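If you are wiring a similar interface into your own tools, here is a hypothetical argparse re-creation of the documented flags. It is a sketch, not ByteSleuth's source. Only the all mode value and the scanner.log default appear above. The ascii and unicode choice names, and all as the default mode, are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical re-creation of the documented CLI; byte_sleuth.py may differ.
    parser = argparse.ArgumentParser(prog="byte_sleuth.py")
    parser.add_argument("target", help="File or directory to scan")
    parser.add_argument(
        "-m", "--mode",
        choices=["ascii", "unicode", "all"],  # "ascii"/"unicode" are assumed names
        default="all",                        # assumed default
        help="Scan only ASCII, only Unicode, or both (all)",
    )
    parser.add_argument("-s", "--sanitize", action="store_true",
                        help="Automatically remove suspicious characters")
    parser.add_argument("-l", "--log", default="scanner.log",
                        help="Log file to write results")
    return parser


args = build_parser().parse_args(["suspicious.txt", "-m", "all", "-s"])
print(args.target, args.mode, args.sanitize, args.log)
```

With choices set, argparse rejects any other -m value with a usage error. The store_true action turns -s into a plain boolean switch.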
CLI Example
python byte_sleuth.py suspicious.txt -m all -s
Scans suspicious.txt for both ASCII & Unicode anomalies, removes them, and logs results.
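At its core, sanitizing means dropping every flagged character and writing the rest back. The minimal sketch below assumes a simple rule: Unicode categories Cc and Cf, except tab, line feed and carriage return. That rule may differ from what -s actually removes. sanitize and sanitize_file are hypothetical names.

```python
import unicodedata
from typing import Tuple


def is_suspicious(ch: str) -> bool:
    # Assumed rule: control (Cc) and format (Cf) characters, except tab/LF/CR.
    if ch in "\t\n\r":
        return False
    return unicodedata.category(ch) in ("Cc", "Cf")


def sanitize(text: str) -> Tuple[str, int]:
    """Return the cleaned text and the number of characters removed."""
    kept = [ch for ch in text if not is_suspicious(ch)]
    return "".join(kept), len(text) - len(kept)


def sanitize_file(path: str) -> int:
    # newline="" keeps the file's original line endings untouched.
    # Note: a leading UTF-8 BOM (U+FEFF, category Cf) is also removed.
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    clean, removed = sanitize(text)
    if removed:
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write(clean)
    return removed
```

The file is only rewritten when something was removed, so clean files keep their modification time.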
📦 Using ByteSleuth in Your Python Projects
Because ByteSleuth is modular, you can import its scanner directly into an existing application.
Installing ByteSleuth
Install it from PyPI:
pip install bytesleuth
Basic Usage in Python
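from byte_sleuth import CharacterScanner

# sanitize=True: remove the suspicious characters that the scan finds
scanner = CharacterScanner(sanitize=True)
# mode="all" checks both ASCII control codes and Unicode invisibles
findings = scanner.scan_file("example.txt", mode="all")
# Each finding is a (code point, character name, character) tuple
for cp, name, char in findings:
    print(f"⚠️ Suspicious Character: {name} (U+{cp:04X}) → {repr(char)}")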
This scans example.txt for hidden characters and, since sanitize=True, removes any it finds.
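The loop above unpacks each finding as (cp, name, char). The stand-alone sketch below produces tuples of that same shape from a string, so you can see what each field holds. It is an illustration under assumed detection rules, not CharacterScanner itself. find_suspicious and CONTROL_NAMES are hypothetical. Python's unicodedata has no names for control characters, hence the small fallback table.

```python
import unicodedata
from typing import List, Tuple

Finding = Tuple[int, str, str]  # (code point, character name, the character)

# unicodedata.name() has no entry for control codes, so label a few by hand.
CONTROL_NAMES = {0x00: "NULL", 0x07: "BELL", 0x1B: "ESCAPE", 0x7F: "DELETE"}


def find_suspicious(text: str) -> List[Finding]:
    findings = []
    for ch in text:
        if ch in "\t\n\r":
            continue  # ordinary whitespace controls are allowed
        if unicodedata.category(ch) in ("Cc", "Cf"):
            cp = ord(ch)
            name = unicodedata.name(ch, CONTROL_NAMES.get(cp, "<control>"))
            findings.append((cp, name, ch))
    return findings


for cp, name, char in find_suspicious("user\u200bname\x1b"):
    print(f"Suspicious Character: {name} (U+{cp:04X}) -> {repr(char)}")
```

The {cp:04X} format pads code points to four hex digits, matching the usual U+XXXX notation.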
🔁 Embedding ByteSleuth in Workflows
Beyond one-off scans, ByteSleuth fits well into automation and security audits:
- 🛠️ Pre-commit hook — Block commits containing obfuscated characters.
- 🔍 CI/CD pipelines — Ensure clean and readable source code before deployment.
- 📜 Log analysis — Detect and clean malformed logs with invisible characters.
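For log analysis, knowing where a hidden character sits matters as much as knowing that it exists. The sketch below reports 1-based line and column positions. It assumes a simple detection rule (categories Cc and Cf, tab allowed). suspicious_positions is a hypothetical helper, not part of ByteSleuth.

```python
import unicodedata
from typing import Iterable, Iterator, Tuple


def suspicious_positions(lines: Iterable[str]) -> Iterator[Tuple[int, int, str]]:
    """Yield (line number, column, "U+XXXX"); both positions are 1-based."""
    for line_no, line in enumerate(lines, start=1):
        # Drop the line terminator so normal LF/CRLF endings are not reported.
        for col, ch in enumerate(line.rstrip("\r\n"), start=1):
            if ch != "\t" and unicodedata.category(ch) in ("Cc", "Cf"):
                yield line_no, col, f"U+{ord(ch):04X}"


log_lines = ["GET /index 200\n", "GET /ad\u200bmin 403\n"]
print(list(suspicious_positions(log_lines)))
# [(2, 8, 'U+200B')]
```

Iterating an open text file yields lines, so a file object can be passed in place of the list.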
Example: Pre-commit Hook
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: byte-sleuth-scan
        name: ByteSleuth Unicode & ASCII Scanner
        entry: python byte_sleuth.py src/ -m all -s
        language: system
        pass_filenames: false
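Because of -s, the hook above rewrites files under src/. If you would rather have commits fail without touching any files, a check-only script is one option. The minimal sketch below assumes a Cc/Cf detection rule; it is hypothetical and not shipped with ByteSleuth. Register it with an entry such as python check_hidden.py (file name illustrative) and without pass_filenames: false. pre-commit then passes the staged file paths as arguments.

```python
import sys
import unicodedata
from typing import List, Optional


def file_has_hidden_chars(path: str) -> bool:
    # errors="replace" keeps scanning even if a file is not valid UTF-8.
    with open(path, encoding="utf-8", errors="replace", newline="") as f:
        text = f.read()
    return any(
        ch not in "\t\n\r" and unicodedata.category(ch) in ("Cc", "Cf")
        for ch in text
    )


def main(argv: Optional[List[str]] = None) -> int:
    paths = sys.argv[1:] if argv is None else argv
    bad = [p for p in paths if file_has_hidden_chars(p)]
    for p in bad:
        print(f"hidden characters found in {p}", file=sys.stderr)
    return 1 if bad else 0  # a non-zero exit makes pre-commit fail the hook
```

As a standalone script, it should end with a __main__ guard that calls sys.exit(main()), so the exit status reaches pre-commit.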
🧠 Why Use ByteSleuth?
Some characters are invisible but dangerous: they can change how source code, configs, or documents behave or display without leaving any visible trace.
Common attack vectors include:
🔹 Zero-width spaces used for code obfuscation
🔹 Bidirectional override characters that reorder how text is displayed
🔹 Hidden ASCII control codes that alter behavior unexpectedly
🔹 Formatting trickery affecting debugging & diffs
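The first two vectors are easy to reproduce with plain Python strings. The values below are illustrative.

```python
import unicodedata

plain = "admin"
spoofed = "ad\u200bmin"  # ZERO WIDTH SPACE hidden between "d" and "m"

# Both usually render as "admin", yet they are different strings.
print(plain == spoofed)          # False
print(len(plain), len(spoofed))  # 5 6

# Listing format (Cf) characters by code point exposes the ghost.
print([f"U+{ord(c):04X}" for c in spoofed if unicodedata.category(c) == "Cf"])  # ['U+200B']

# RIGHT-TO-LEFT OVERRIDE (U+202E): many viewers draw the text after it
# reversed, so this name may be displayed as "invoiceexe.pdf".
name = "invoice\u202efdp.exe"
print(name.endswith(".exe"))  # True: it is still an .exe file name
```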
ByteSleuth gives you a detective's magnifying glass to expose them all. 🔍
🚀 Roadmap
✔️ Expand sanitization methods
✔️ Improve CLI interactivity
✔️ Output JSON reports
🟡 VSCode Extension (planned)
🟡 Interactive CLI with rich or curses UI (planned)
🕵️♂️ Honorary Agent: CharlockHolmes
When Unicode hides... he seeks.
When ASCII misbehaves... he strikes.
Because no character escapes... the ByteSleuth.
📄 License
MIT — Feel free to sleuth away!
File details
Details for the file bytesleuth-1.0.0-py3-none-any.whl.
File metadata
- Download URL: bytesleuth-1.0.0-py3-none-any.whl
- Upload date:
- Size: 4.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9d16ab2078a817179856c71f46f426ea4da9c600d5dcc2ac3ce52acda4d8aa78 |
| MD5 | 0c681110922ff67cdf4d7fb1afba1d06 |
| BLAKE2b-256 | 593e4181185bac489f465fa4e110690adb82297e44d6acfe8eb8eee98804ab6c |
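Python's hashlib is enough to check a downloaded wheel against the published SHA256 digest. Below is a minimal sketch; sha256_of and verify are illustrative names, not part of ByteSleuth. The expected value is the SHA256 digest listed for bytesleuth-1.0.0-py3-none-any.whl.

```python
import hashlib

# Published SHA256 digest for bytesleuth-1.0.0-py3-none-any.whl.
EXPECTED_SHA256 = "9d16ab2078a817179856c71f46f426ea4da9c600d5dcc2ac3ce52acda4d8aa78"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    # hexdigest() is lowercase, so normalize the expected value too.
    return sha256_of(path) == expected.lower()
```

Calling verify("bytesleuth-1.0.0-py3-none-any.whl") on the downloaded file should return True. pip can also enforce this check during installation: pin the digest with --hash=sha256:... in a requirements file.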