Simple text encoding type classifier

whatenc

Text encoding type classifier.

whatenc is a command-line tool that uses a gradient-boosted tree classifier to detect the encoding of a given string or file.

The model is trained on text samples from the Wikipedia corpus, with lines encoded using multiple encoding schemes to generate labeled examples.
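
The labeling idea can be sketched with the standard library alone. This is only an illustration under assumptions, not the project's training code: the `ENCODERS` mapping and `make_labeled_examples` are hypothetical names, only a subset of the supported encodings is shown, and the choice of `base64.b85encode` for base85 is a guess at the variant.

```python
import base64
import codecs
import hashlib
import urllib.parse

# Hypothetical label -> encoder mapping (a subset of the supported encodings).
ENCODERS = {
    "plain": lambda s: s,
    "base32": lambda s: base64.b32encode(s.encode("utf-8")).decode("ascii"),
    "base64": lambda s: base64.b64encode(s.encode("utf-8")).decode("ascii"),
    # Assumption: the b85 variant; Ascii85 (a85encode) is another candidate.
    "base85": lambda s: base64.b85encode(s.encode("utf-8")).decode("ascii"),
    "hex": lambda s: s.encode("utf-8").hex(),
    "url": lambda s: urllib.parse.quote(s, safe=""),
    "rot13": lambda s: codecs.encode(s, "rot13"),
    "md5": lambda s: hashlib.md5(s.encode("utf-8")).hexdigest(),
    "sha256": lambda s: hashlib.sha256(s.encode("utf-8")).hexdigest(),
}


def make_labeled_examples(lines):
    """Return (encoded_text, label) pairs: every non-empty line under every encoder."""
    examples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the corpus
        for label, encode in ENCODERS.items():
            examples.append((encode(line), label))
    return examples


if __name__ == "__main__":
    for text, label in make_labeled_examples(["hello world"]):
        print(f"{label:>7}  {text}")
```

Each corpus line yields one example per encoder, so the classes are balanced by construction in this sketch.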

How It Works

whatenc applies a feature-based approach to characterize text, then feeds these features into a gradient-boosted decision tree model to classify the encoding.

Feature Extraction

Each input string is converted into a feature vector describing its statistical properties.

Features include:

| Feature | Description |
| --- | --- |
| Length (n) | Number of characters in the input |
| n % 4 | Input length modulo 4; useful for identifying base-N encodings |
| Printable Ratio | Fraction of characters in string.printable |
| Alphabetic / Digit Ratios | Ratios of letters and of digits to total length |
| Padding Ratio (=) | Fraction of = padding characters; padding is common in Base64/Base32 encodings |
| Compressibility | Ratio of compressed to raw byte length |
| Shannon Entropy | Measure of randomness in the character distribution |
| English Letter Correlation | Correlation between the input's letter frequencies and the English letter frequency distribution |
| Stopword Ratio | Fraction of words that are English stopwords |
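
A minimal sketch of how such a feature vector might be computed is shown below. It is not whatenc's actual implementation: the function names, the dictionary keys, the use of zlib for compressibility, the Pearson correlation, the approximate letter-frequency table, and the tiny stopword set are all assumptions made for illustration.

```python
import math
import string
import zlib
from collections import Counter

# Approximate English letter frequencies in percent, a to z (assumed values).
ENGLISH_FREQ = [8.17, 1.49, 2.78, 4.25, 12.70, 2.23, 2.02, 6.09, 6.97, 0.15,
                0.77, 4.03, 2.41, 6.75, 7.51, 1.93, 0.10, 5.99, 6.33, 9.06,
                2.76, 0.98, 2.36, 0.15, 1.97, 0.07]
# Tiny illustrative stopword set; a real list would be much longer.
STOPWORDS = {"the", "and", "of", "to", "a", "in", "is", "it", "that", "for"}


def shannon_entropy(text):
    """Entropy in bits per character of the character distribution."""
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def extract_features(text):
    """Map a non-empty string to a dict of statistical features."""
    if not text:
        raise ValueError("input must be non-empty")
    n = len(text)
    raw = text.encode("utf-8")
    words = text.lower().split()
    letter_counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    observed = [letter_counts[ch] for ch in string.ascii_lowercase]
    return {
        "length": n,
        "mod4": n % 4,
        "printable_ratio": sum(ch in string.printable for ch in text) / n,
        "alpha_ratio": sum(ch.isalpha() for ch in text) / n,
        "digit_ratio": sum(ch.isdigit() for ch in text) / n,
        "padding_ratio": text.count("=") / n,
        # zlib adds a fixed overhead, so short inputs can score above 1.0.
        "compressibility": len(zlib.compress(raw)) / len(raw),
        "entropy": shannon_entropy(text),
        "english_corr": pearson(observed, ENGLISH_FREQ),
        "stopword_ratio": sum(w in STOPWORDS for w in words) / len(words) if words else 0.0,
    }


if __name__ == "__main__":
    for name, value in extract_features("aGVsbG8gd29ybGQ=").items():
        print(f"{name:>16} = {value}")
```

Returning a dictionary keeps the sketch readable; a classifier would typically consume the values as a fixed-order numeric vector.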

Supported Encodings

whatenc currently recognizes the following formats and transformations:

| Category | Encodings |
| --- | --- |
| Base encodings | base32, base64, base85, hex, url |
| Text ciphers | rot13, rot47, morse |
| Compression | gzip64 |
| Hash digests | md5, sha1, sha224, sha256, sha384, sha512 |
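
For readers unfamiliar with the less common labels, here is a rough sketch of what the text ciphers and the compression format could look like. The function names are hypothetical, the Morse word separator (` / `) is one common convention rather than a confirmed detail, and reading gzip64 as "gzip-compress, then Base64-encode" is an assumption based on the name.

```python
import base64
import gzip

# International Morse code for letters and digits.
MORSE = {
    "a": ".-", "b": "-...", "c": "-.-.", "d": "-..", "e": ".", "f": "..-.",
    "g": "--.", "h": "....", "i": "..", "j": ".---", "k": "-.-", "l": ".-..",
    "m": "--", "n": "-.", "o": "---", "p": ".--.", "q": "--.-", "r": ".-.",
    "s": "...", "t": "-", "u": "..-", "v": "...-", "w": ".--", "x": "-..-",
    "y": "-.--", "z": "--..",
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}


def rot47(text):
    """Rotate the 94 printable ASCII characters '!'..'~' by 47 positions."""
    out = []
    for ch in text:
        code = ord(ch)
        if 33 <= code <= 126:
            out.append(chr(33 + (code - 33 + 47) % 94))
        else:
            out.append(ch)  # spaces and non-ASCII pass through unchanged
    return "".join(out)


def morse(text):
    """Encode letters and digits; symbols without a Morse entry are dropped."""
    words = text.lower().split()
    return " / ".join(" ".join(MORSE[ch] for ch in word if ch in MORSE) for word in words)


def gzip64(text):
    """Assumed meaning of 'gzip64': gzip-compress, then Base64-encode."""
    return base64.b64encode(gzip.compress(text.encode("utf-8"))).decode("ascii")


if __name__ == "__main__":
    print(rot47("Hello"))
    print(morse("hi there"))
    print(gzip64("hello world"))
```

Because 47 is half of 94, applying `rot47` twice returns the original text, the same property rot13 has over the 26 letters.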

Installation

You can install whatenc using pipx:

pipx install whatenc

Usage

Pass either a string or a path to a file:

whatenc aGVsbG8gd29ybGQ=
whatenc samples.txt

Examples

[+] input: aGVsbG8gd29ybGQ=
   [=] top guess   = base64
      [~] base64   = 0.455
      [~] plain    = 0.312
      [~] url      = 0.126

[+] input: hello
   [=] top guess   = plain
      [~] plain    = 0.552
      [~] url      = 0.246
      [~] rot13    = 0.192

[+] input: uryyb jbeyq
   [=] top guess   = rot13
      [~] rot13    = 0.555
      [~] plain    = 0.440
      [~] url      = 0.004
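
Each report above shows the top guess followed by the three most probable classes. A small sketch of how such a report could be built from a mapping of class probabilities follows; `format_report` and its parameters are hypothetical and only mimic the layout shown, they are not part of whatenc.

```python
def format_report(text, probabilities, top_k=3):
    """Render the top_k most probable labels in the layout used above."""
    if not probabilities:
        raise ValueError("probabilities must not be empty")
    # Sort labels by probability, highest first, and keep the best top_k.
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)[:top_k]
    lines = [f"[+] input: {text}", f"   [=] top guess   = {ranked[0][0]}"]
    for label, prob in ranked:
        lines.append(f"      [~] {label:<8} = {prob:.3f}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report("hello", {"plain": 0.552, "url": 0.246, "rot13": 0.192, "hex": 0.010}))
```

Since only the top three classes are printed, the displayed probabilities need not sum to 1.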
