A Python package for detecting hidden Unicode and ASCII characters.

🕵️‍♂️ ByteSleuth — The Ghost Hunter for Hidden Characters

"Elementary, my dear dev. The ghosts of hidden characters won't escape this audit!" — CharlockHolmes, the detective inside ByteSleuth

ByteSleuth is a Unicode & ASCII character scanner that detects obfuscation, invisible characters, and suspicious bytes lurking in text or code. Whether you're hunting down ghost characters or debugging unexpected encoding issues, ByteSleuth reports exactly what it finds and can optionally strip it out.


🚀 Key Features

  • ✅ Detects ASCII control characters (e.g., NUL, BEL, ESC)
  • ✅ Flags Unicode invisibles and directional controls (e.g., U+200B, U+202E)
  • ✅ Optionally sanitizes input by removing hidden/malicious characters
  • ✅ Works seamlessly with files, directories, and stdin/PIPE
  • ✅ Supports logging for audit trails
  • ✅ Generates SHA256 hash before/after sanitization
  • ✅ Outputs JSON reports (stdout or file)
  • Concurrent directory scanning for speed
  • Fail on detect mode for CI/CD/pre-commit
  • Backup/restore before sanitization
  • VSCode extension for easy integration
  • Pre-commit & CI/CD integration examples
  • Real-world examples included
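
To picture how the sanitize and before/after hash features fit together, here is a short standalone sketch using only the standard library. It is not ByteSleuth's implementation: the `sanitize_text` and `sha256_hex` helpers, and the choice to strip Unicode categories Cc and Cf while keeping tab, newline, and carriage return, are assumptions made for illustration.

```python
import hashlib
import unicodedata

# Assumption: whitespace controls that are normally legitimate in text files.
ALLOWED_CONTROLS = {"\t", "\n", "\r"}


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hex digest of the UTF-8 encoding of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sanitize_text(text: str) -> str:
    """Drop control (Cc) and format (Cf) characters, keeping common whitespace."""
    return "".join(
        ch
        for ch in text
        if ch in ALLOWED_CONTROLS or unicodedata.category(ch) not in ("Cc", "Cf")
    )


# A zero-width space (U+200B) and a BEL (0x07) hidden in ordinary-looking text.
original = "pass\u200bword\x07 = 'ok'\n"
cleaned = sanitize_text(original)

print("before:", sha256_hex(original))
print("after: ", sha256_hex(cleaned))
print("changed:", sha256_hex(original) != sha256_hex(cleaned))
```

Comparing the two digests is a cheap way to record in an audit log whether sanitization actually altered the content.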

🔧 CLI Usage

python src/byte_sleuth.py <target> [options]

CLI Options

Option                 Description
target                 File or directory to scan (or use PIPE input)
-s, --sanitize         Automatically remove suspicious characters
-l, --log              Log file to write results (default: scanner.log)
-r, --report [file]    Print JSON report to stdout or save to file
-f, --no-backup        Disable backup creation
-v, --verbose          Enable verbose output (shows hashes, findings)
-d, --debug            Enable debug output
-q, --quiet            Suppress all output except errors
-S, --sanitize-only    Only sanitize, do not scan/report
-F, --fail-on-detect   Exit with code 1 if suspicious characters are found
-V, --version          Show version and exit

CLI Examples

python src/byte_sleuth.py suspicious.txt -s -v
python src/byte_sleuth.py ./data/ -r report.json
cat file.txt | python src/byte_sleuth.py -s > sanitized.txt
python src/byte_sleuth.py src/ -F  # For CI/pre-commit: fail if any issue found
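
The same invocations can also be driven from a Python script, for example in a custom build step. The sketch below is an illustration, not part of ByteSleuth: `build_command` and `scan` are hypothetical helpers, and the script path `src/byte_sleuth.py` is assumed to match the examples above. Only the `-F` and `-r` flags from the options table are used.

```python
import subprocess
import sys


def build_command(target, fail_on_detect=True, report=None):
    """Build an argument list using only flags listed in the options table."""
    cmd = [sys.executable, "src/byte_sleuth.py", target]
    if fail_on_detect:
        cmd.append("-F")  # exit with code 1 if suspicious characters are found
    if report is not None:
        cmd.extend(["-r", report])  # save the JSON report to this file
    return cmd


def scan(target, **kwargs):
    """Run the scanner as a subprocess and return its exit code."""
    return subprocess.run(build_command(target, **kwargs)).returncode


# Show the command that would be run, without executing it.
print(build_command("src/", report="report.json"))
```

Calling `scan("src/")` and passing the result to `sys.exit()` would propagate a failure to whatever tool invoked the script.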

📦 Using ByteSleuth in Your Python Projects

Installation

Install from PyPI (the distribution name matches the published wheel, bytesleuth):

pip install bytesleuth

Basic Usage in Python

from byte_sleuth import ByteSleuth
scanner = ByteSleuth(sanitize=True)
findings = scanner.scan_file("example.txt")
for cp, name, char, idx in findings:
    print(f"⚠️ Suspicious Character: {name} (U+{cp:04X}) at position {idx}: {char!r}")
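
Once findings are available, they are easy to post-process. The sketch below assumes findings keep the `(code point, name, character, index)` tuple shape unpacked in the loop above. It uses a hand-written sample list rather than calling the library, and `summarize` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical findings in the (code point, name, character, index) shape
# used by the loop above; real results would come from scan_file().
findings = [
    (0x200B, "ZERO WIDTH SPACE", "\u200b", 4),
    (0x202E, "RIGHT-TO-LEFT OVERRIDE", "\u202e", 19),
    (0x200B, "ZERO WIDTH SPACE", "\u200b", 42),
]


def summarize(findings):
    """Count findings per character name, most frequent first."""
    counts = Counter(name for _cp, name, _char, _idx in findings)
    return counts.most_common()


for name, count in summarize(findings):
    print(f"{name}: {count}")
```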

🔁 Automation & Integration

  • Pre-commit hook: Block commits with hidden characters
  • CI/CD pipelines: Fail builds if issues are found
  • VSCode extension: Scan open files with one click
  • JSON reports: For audit or further automation

Pre-commit Example

# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: byte-sleuth-scan
        name: ByteSleuth Unicode & ASCII Scanner
        entry: python src/byte_sleuth.py src/ -F
        language: system
        pass_filenames: false

GitHub Actions Example

- name: Scan for hidden characters
  run: python src/byte_sleuth.py src/ -F

🧑‍💻 VSCode Extension

  • Scan the current file for hidden/suspicious characters
  • See results directly in VSCode
  • Easy to install and use (see vscode-extension/README.md)

🧠 Why Use ByteSleuth?

Some characters are invisible but dangerous—causing confusion in source code, configs, or documents. Common attack vectors include:

  • Zero-width spaces for code obfuscation
  • Bidirectional override characters
  • Hidden ASCII control codes
  • Formatting trickery affecting debugging & diffs

ByteSleuth gives you a detective's magnifying glass to expose them all. 🔍
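
As a rough illustration of how the attack vectors listed above can be spotted, the sketch below flags characters by Unicode general category using the standard `unicodedata` module. It is not ByteSleuth's detection logic: `find_hidden` is a hypothetical helper, and treating every Cc and Cf character except tab, newline, and carriage return as suspicious is an assumption.

```python
import unicodedata


def find_hidden(text: str):
    """Yield (code_point, name, index) for control/format characters in text."""
    for idx, ch in enumerate(text):
        if ch in "\t\n\r":
            continue  # ordinary whitespace, not treated as suspicious here
        if unicodedata.category(ch) in ("Cc", "Cf"):
            # Control characters have no Unicode name, so fall back to a label.
            name = unicodedata.name(ch, "CONTROL CHARACTER")
            yield ord(ch), name, idx


# A bidi override, a zero-width space, and an ESC hidden in one line of code.
sample = "is_admin\u202e = False\u200b\x1b"
for cp, name, idx in find_hidden(sample):
    print(f"U+{cp:04X} {name} at index {idx}")
```

The string prints as an ordinary assignment in most terminals and editors, which is exactly why such characters tend to slip through code review.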

Comparison with other tools

Tool Unicode ASCII Control Sanitization JSON Report CLI/Automation VSCode Integration
ByteSleuth ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
grep/sed ✔️ ✔️
ad-hoc scripts ✔️ ✔️
  • ByteSleuth detects both Unicode and ASCII control characters, sanitizes input, generates JSON reports, and integrates with automation and VSCode.
  • grep/sed work well for simple ASCII patterns, but do not cover Unicode invisibles or sanitization out of the box.
  • Ad-hoc scripts are fragile and hard to maintain.

🚀 Roadmap

  • Expand sanitization methods
  • Improve CLI interactivity
  • Output JSON reports
  • VSCode Extension
  • HTML reports
  • Support for more file formats (zip, PDF, etc.)
  • Public changelog/roadmap

📄 License

MIT — Feel free to sleuth away!


Built Distribution


bytesleuth-1.0.1-py3-none-any.whl (5.3 kB view details)

Uploaded Python 3

File details

Details for the file bytesleuth-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: bytesleuth-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 5.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.2

File hashes

Hashes for bytesleuth-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 354a2de6f414a77543b3ee658b93678b2577daca454410cd37a2d8be44064189
MD5 8e191601c45a07aa2c584f66974968f5
BLAKE2b-256 965f3de4406cc7e90f414c36597dd48bcf5938013b3c410df8bbaa6637bbc5b8
