
A Python package for detecting hidden Unicode and ASCII characters.

Project description


🕵️‍♂️ ByteSleuth — The Ghost Hunter for Hidden Characters

"Elementary, my dear dev. The ghosts of hidden characters won't escape this audit!" — CharlockHolmes, the detective inside ByteSleuth

ByteSleuth is a powerful Unicode & ASCII character scanner designed to detect obfuscation, invisible threats, and suspicious bytes lurking in text or code. Whether you're hunting down ghost characters or analyzing unexpected encoding issues, ByteSleuth ensures a clean and transparent result.


🚀 Key Features

  • ✅ Detects ASCII control characters (e.g., NUL, BEL, ESC)
  • ✅ Flags Unicode invisibles and directional controls (e.g., U+200B, U+202E)
  • ✅ Optionally sanitizes input by removing hidden/malicious characters
  • ✅ Works seamlessly with files, directories, and stdin/PIPE
  • ✅ Supports logging for audit trails
  • ✅ Generates SHA256 hash before/after sanitization
  • ✅ Outputs JSON reports (stdout or file)
  • Concurrent directory scanning for speed
  • Fail-on-detect mode for CI/CD and pre-commit
  • Backup/restore before sanitization
  • VSCode extension for easy integration
  • Pre-commit & CI/CD integration examples
  • Real-world examples included
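
To make the first two features concrete, here is a minimal sketch of how control and invisible characters can be found with Python's standard `unicodedata` module. It is an illustration only, not ByteSleuth's actual implementation: the function name `find_hidden_characters` and the choice to flag the `Cc` and `Cf` categories (while ignoring tab, newline, and carriage return) are assumptions made for this example.

```python
import unicodedata

def find_hidden_characters(text):
    """Return (code point, name, char, index) for control and format characters."""
    findings = []
    for idx, char in enumerate(text):
        # Ordinary whitespace (tab, newline, carriage return) is left alone.
        if char in "\t\n\r":
            continue
        category = unicodedata.category(char)
        # Cc = control characters (NUL, BEL, ESC, ...)
        # Cf = format characters (zero-width space, bidi overrides, ...)
        if category in ("Cc", "Cf"):
            # Control characters have no Unicode name, so supply a fallback.
            name = unicodedata.name(char, f"<unnamed {category}>")
            findings.append((ord(char), name, char, idx))
    return findings

sample = "pay\u200bload\x1b[0m \u202eevil"
for cp, name, char, idx in find_hidden_characters(sample):
    print(f"U+{cp:04X} {name} at index {idx}")
```

The sample string hides a zero-width space (U+200B), an ESC byte, and a right-to-left override (U+202E), matching the kinds of characters listed above.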

🔧 CLI Usage

python byte_sleuth/byte_sleuth.py <target> [options]

CLI Options

| Option | Description |
|---|---|
| target | File or directory to scan (or use PIPE input) |
| -s, --sanitize | Automatically remove suspicious characters |
| -l, --log | Log file to write results (default: scanner.log) |
| -r, --report [file] | Print JSON report to stdout or save to file |
| -f, --no-backup | Disable backup creation |
| -v, --verbose | Enable verbose output (shows hashes, findings) |
| -d, --debug | Enable debug output |
| -q, --quiet | Suppress all output except errors |
| -S, --sanitize-only | Only sanitize, do not scan/report |
| -F, --fail-on-detect | Exit with code 1 if suspicious characters are found |
| -V, --version | Show version and exit |
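
The idea behind combining sanitization with before/after hashes can be sketched in a few lines of standard-library Python. This is a hedged illustration, not ByteSleuth's code: the helpers `sha256_hex` and `sanitize` are hypothetical names, and the rule used here (drop `Cc` and `Cf` characters except tab, newline, and carriage return) is an assumption about what counts as suspicious.

```python
import hashlib
import unicodedata

def sha256_hex(text):
    """Hex SHA-256 digest of the UTF-8 encoding of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def sanitize(text):
    """Drop control (Cc) and format (Cf) characters, keeping tab/newline/CR."""
    return "".join(
        ch for ch in text
        if ch in "\t\n\r" or unicodedata.category(ch) not in ("Cc", "Cf")
    )

original = "user\u200bname = 'root'\x00\n"
cleaned = sanitize(original)

# Differing digests are evidence that sanitization changed the content.
print("before: ", sha256_hex(original))
print("after:  ", sha256_hex(cleaned))
print("changed:", sha256_hex(original) != sha256_hex(cleaned))
```

Recording both digests gives an audit trail: anyone can later verify which exact bytes went in and which came out.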

CLI Examples

# Scan and sanitize a file, showing hashes and findings
python byte_sleuth/byte_sleuth.py suspicious.txt -s -v

# Scan a directory, output JSON report to file
python byte_sleuth/byte_sleuth.py ./data/ -r report.json

# Sanitize stdin (PIPE), output to sanitized.txt
cat file.txt | python byte_sleuth/byte_sleuth.py -s > sanitized.txt

# Scan from PIPE and fail (exit 1) if any suspicious character is found (for CI/pre-commit)
cat file.txt | python byte_sleuth/byte_sleuth.py -F

# Log all removed characters from PIPE to a custom log file
cat file.txt | python byte_sleuth/byte_sleuth.py -s -l removed_chars.log > sanitized.txt

# Scan a directory and fail if any file contains suspicious characters (CI/pre-commit)
python byte_sleuth/byte_sleuth.py src/ -F
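
If you drive the CLI from a Python build script rather than a shell, the fail-on-detect exit code can be checked with `subprocess`. The sketch below assumes that the scanner exits with 0 when nothing is found (the option table only documents exit code 1 on detection); the helper name `is_clean` is hypothetical.

```python
import subprocess
import sys

def is_clean(command):
    """Run a scan command; treat exit code 0 as clean and 1 as a detection."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    # Any other exit code is assumed to mean the scan itself failed.
    raise RuntimeError(
        f"scan failed with exit code {result.returncode}: {result.stderr}"
    )

# Assumed invocation, mirroring the directory example above.
scan_command = [sys.executable, "byte_sleuth/byte_sleuth.py", "src/", "-F"]
```

Calling `is_clean(scan_command)` from the project root would then return `False` when the scan reports a detection. Note that an uncaught Python exception also produces exit code 1, so a crash in the scanner would be indistinguishable from a detection under this simple scheme.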

📦 Using ByteSleuth in Your Python Projects

Installation

Install from PyPI (the distribution is published under the name bytesleuth):

pip install bytesleuth

Basic Usage in Python

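from byte_sleuth import ByteSleuth

# Enable sanitization (removal of suspicious characters)
scanner = ByteSleuth(sanitize=True)
findings = scanner.scan_file("example.txt")

# Each finding is a (code point, name, character, index) tuple
for cp, name, char, idx in findings:
    print(f"⚠️ Suspicious Character: {name} (U+{cp:04X}) at position {idx}: {char!r}")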
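
Once you have findings in that tuple shape, summarizing them for a dashboard or audit log is straightforward. The sketch below uses a hand-written, hypothetical list of findings rather than real scanner output, and the `summarize` helper and its JSON layout are assumptions for illustration; they are not ByteSleuth's own report format.

```python
import json
from collections import Counter

# Hypothetical findings in the (code point, name, char, index) shape.
findings = [
    (0x200B, "ZERO WIDTH SPACE", "\u200b", 3),
    (0x202E, "RIGHT-TO-LEFT OVERRIDE", "\u202e", 13),
    (0x200B, "ZERO WIDTH SPACE", "\u200b", 27),
]

def summarize(findings):
    """Build a JSON-serializable summary: totals, per-name counts, positions."""
    counts = Counter(name for _cp, name, _char, _idx in findings)
    return {
        "total": len(findings),
        "by_name": dict(counts),
        "positions": [
            {"codepoint": f"U+{cp:04X}", "name": name, "index": idx}
            for cp, name, _char, idx in findings
        ],
    }

print(json.dumps(summarize(findings), indent=2))
```

The raw characters are deliberately left out of the summary so that the report itself does not carry invisible characters.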

🔁 Automation & Integration

  • Pre-commit hook: Block commits with hidden characters
  • CI/CD pipelines: Fail builds if issues are found
  • VSCode extension: Scan open files with one click
  • JSON reports: For audit or further automation

Pre-commit Example

# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: byte-sleuth-scan
        name: ByteSleuth Unicode & ASCII Scanner
        entry: python byte_sleuth/byte_sleuth.py src/ -F
        language: system
        pass_filenames: false

GitHub Actions Example

- name: Scan for hidden characters
  run: cat file.txt | python byte_sleuth/byte_sleuth.py -F

🧑‍💻 VSCode Extension

  • Scan the current file for hidden/suspicious characters
  • See results directly in VSCode
  • Easy to install and use (see vscode-extension/README.md)

🧠 Why Use ByteSleuth?

Some characters are invisible but dangerous—causing confusion in source code, configs, or documents. Common attack vectors include:

  • Zero-width spaces for code obfuscation
  • Bidirectional override characters
  • Hidden ASCII control codes
  • Formatting trickery affecting debugging & diffs

ByteSleuth gives you a detective's magnifying glass to expose them all. 🔍
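
A short, standard-library demonstration shows why these characters matter. The identifiers and file name below are made up for illustration; the point is that strings which look alike on screen can differ in content.

```python
visible = "is_admin"
spoofed = "is_\u200badmin"  # contains U+200B ZERO WIDTH SPACE

# The two names typically render identically, but they are different strings.
print(visible == spoofed)          # False
print(len(visible), len(spoofed))  # 8 9
print(ascii(spoofed))              # 'is_\u200badmin'

# U+202E RIGHT-TO-LEFT OVERRIDE reverses the display order of what follows,
# so a bidi-aware viewer may show this name as "invoice_exe.pdf".
filename = "invoice_\u202efdp.exe"
print(filename.endswith(".exe"))   # True
```

Escaping with `ascii()` or `repr()` is a quick manual check; a scanner automates the same inspection across whole files and directories.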

Comparison with other tools

Tool Unicode ASCII Control Sanitization JSON Report CLI/Automation VSCode Integration
ByteSleuth ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
grep/sed ✔️ ✔️
ad-hoc scripts ✔️ ✔️
  • ByteSleuth covers Unicode, ASCII, sanitizes, generates reports, and integrates easily with automation and VSCode.
  • grep/sed are great for simple ASCII, but do not cover Unicode or sanitization.
  • Ad-hoc scripts are fragile and hard to maintain.

🚀 Roadmap

  • Expand sanitization methods
  • Improve CLI interactivity
  • Output JSON reports
  • VSCode Extension
  • HTML reports
  • Support for more file formats (zip, PDF, etc.)
  • Public changelog/roadmap

📄 License

MIT — Feel free to sleuth away!

Download files

Source Distribution

bytesleuth-1.0.5.tar.gz (491.1 kB)

Uploaded Source

Built Distribution

bytesleuth-1.0.5-py3-none-any.whl (490.1 kB)

Uploaded Python 3

File details

Details for the file bytesleuth-1.0.5.tar.gz.

File metadata

  • Download URL: bytesleuth-1.0.5.tar.gz
  • Size: 491.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for bytesleuth-1.0.5.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9c4b36e9ecb66ffbad0d79284f5ba6d6a6e463b5cf18076ca0198eba618a44d2 |
| MD5 | b4d89799450a1957440a4da63aa4292c |
| BLAKE2b-256 | a239377041cd4a7c633f5b72e2ecc935535d998c7a7e900ebdac2e206b5666d4 |

File details

Details for the file bytesleuth-1.0.5-py3-none-any.whl.

File metadata

  • Download URL: bytesleuth-1.0.5-py3-none-any.whl
  • Size: 490.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for bytesleuth-1.0.5-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4d3846cca184b2334f4e1079b6ebeb8c8160cd642d5aaae7e1952520c73f77ab |
| MD5 | ff42a14c4c552e1ad0ce2f16ac2ef6d2 |
| BLAKE2b-256 | 3de26c5c892458d566f055fb134c05fb6e60a6427fe9b62d1f7bfd965a02aae9 |
