
Text encoding type classifier


whatenc



whatenc is a command-line tool that identifies the encoding or transformation of a given string or file.

The model is trained on text samples from the English, Greek, Russian, Hebrew, and Arabic Wikipedia corpora, chosen to represent a diverse set of writing systems (Latin, Greek, Cyrillic, Hebrew, and Arabic scripts). Each line is encoded using multiple encoding schemes to generate labeled examples.
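The data-generation step can be sketched with the Python standard library. This is a minimal illustration, not whatenc's training code: `ENCODERS` and `make_examples` are hypothetical names, base85 is assumed to mean Python's `base64.b85encode` (not Ascii85), `gzip64` is assumed to mean gzip compression followed by base64, hashes are assumed to be lowercase hex digests, and morse is left out because it needs a character table.

```python
import base64
import gzip
import hashlib
import urllib.parse

# Hypothetical encoder table (an assumption, not whatenc's pipeline).
# Each entry maps a label to a function from UTF-8 bytes to encoded text.
ENCODERS = {
    "base32": lambda b: base64.b32encode(b).decode("ascii"),
    "base64": lambda b: base64.b64encode(b).decode("ascii"),
    "base85": lambda b: base64.b85encode(b).decode("ascii"),  # assumed b85, not a85
    "hex": lambda b: b.hex(),
    "url": lambda b: urllib.parse.quote(b),
    # Assumed meaning of "gzip64": gzip-compress, then base64-encode.
    "gzip64": lambda b: base64.b64encode(gzip.compress(b)).decode("ascii"),
}
# Hash digests as lowercase hex; n=_name binds the algorithm at definition time.
for _name in ("md5", "sha1", "sha224", "sha256", "sha384", "sha512"):
    ENCODERS[_name] = lambda b, n=_name: hashlib.new(n, b).hexdigest()


def make_examples(line):
    """Return (text, label) pairs: the raw line plus one encoded copy per scheme."""
    raw = line.encode("utf-8")
    examples = [(line, "plain")]
    for label, encode in ENCODERS.items():
        examples.append((encode(raw), label))
    return examples


for text, label in make_examples("hello"):
    print(f"{label:>7}  {text}")
```

Applied to every sampled line, this yields one labeled example per scheme plus the unencoded original.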

How It Works

whatenc uses a 1D convolutional neural network (CNN) trained directly on sequences of character-bigram tokens.

Each training sample is represented as:

  • a sequence of character bigrams, padded to a fixed maximum length
  • a scalar feature holding the input's true (unpadded) length, which lets the network learn relative string lengths
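A minimal sketch of that input representation, assuming overlapping bigrams, a small made-up vocabulary, and reserved ids 0 (padding) and 1 (unknown bigram). `MAX_LEN`, `bigrams`, `build_vocab`, and `featurize` are hypothetical; whatenc's real maximum length, tokenization, and any scaling of the length feature are not documented here.

```python
# Hypothetical constants; whatenc's real values are not documented here.
MAX_LEN = 16
PAD_ID = 0
UNK_ID = 1


def bigrams(text):
    """Overlapping character bigrams: 'abc' -> ['ab', 'bc']."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def build_vocab(corpus):
    """Assign an integer id to every bigram seen; ids 0 and 1 are reserved."""
    vocab = {}
    for text in corpus:
        for bg in bigrams(text):
            vocab.setdefault(bg, len(vocab) + 2)
    return vocab


def featurize(text, vocab, max_len=MAX_LEN):
    """Return (padded bigram ids, true length) for one input string."""
    ids = [vocab.get(bg, UNK_ID) for bg in bigrams(text)][:max_len]  # truncate
    ids += [PAD_ID] * (max_len - len(ids))                           # pad
    true_length = float(len(text))  # unscaled; any normalization is an assumption
    return ids, true_length
```

For example, with `vocab = build_vocab(["hello", "help"])`, `featurize("hello", vocab, max_len=8)` returns `([2, 3, 4, 5, 0, 0, 0, 0], 5.0)`: four bigram ids, four padding ids, and the unpadded length.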

This neural approach achieves near-perfect classification accuracy after only a few epochs.
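To make the 1D CNN concrete, here is a forward pass over a padded bigram-id sequence in plain Python with random, untrained weights: embedding lookup, a valid 1D convolution with ReLU, global max pooling over time, the length scalar appended to the pooled features, and a dense layer with softmax. The layer sizes, the single convolutional layer, the example token ids, and the point where the length feature joins the network are all assumptions; whatenc's actual architecture is not specified on this page.

```python
import math
import random

random.seed(0)

# Hypothetical sizes; the real model's dimensions are not documented here.
VOCAB_SIZE, EMBED_DIM, NUM_FILTERS, KERNEL, NUM_CLASSES = 50, 8, 4, 3, 5


def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


embedding = rand_matrix(VOCAB_SIZE, EMBED_DIM)
filters = [rand_matrix(KERNEL, EMBED_DIM) for _ in range(NUM_FILTERS)]
# One extra input column for the length scalar appended after pooling (assumption).
dense = rand_matrix(NUM_CLASSES, NUM_FILTERS + 1)


def conv1d_relu(seq):
    """Valid 1D convolution: slide each KERNEL-wide filter along the sequence."""
    out = []
    for t in range(len(seq) - KERNEL + 1):
        window = seq[t:t + KERNEL]
        out.append([
            max(0.0, sum(w * x for wrow, xrow in zip(f, window)
                         for w, x in zip(wrow, xrow)))
            for f in filters
        ])
    return out


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(token_ids, true_length):
    seq = [embedding[i] for i in token_ids]            # (T, EMBED_DIM)
    feature_maps = conv1d_relu(seq)                     # (T - KERNEL + 1, NUM_FILTERS)
    pooled = [max(col) for col in zip(*feature_maps)]   # global max pool over time
    features = pooled + [true_length]                   # append length scalar
    logits = [sum(w * x for w, x in zip(row, features)) for row in dense]
    return softmax(logits)                              # class probabilities


probs = forward([2, 3, 4, 5, 0, 0, 0, 0], 5.0)
print([round(p, 3) for p in probs])
```

Global max pooling records how strongly each filter fired anywhere in the sequence and discards position; training would then adjust the embedding, filter, and dense weights with a cross-entropy loss over the labeled examples.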

Supported Encodings

Besides a plain class for unencoded text, whatenc currently recognizes the following formats and transformations:

Category         Encodings
Base encodings   base32, base64, base85, hex, url
Text ciphers     morse
Compression      gzip64
Hash digests     md5, sha1, sha224, sha256, sha384, sha512
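When written as hex, the hash classes differ mainly in length, which is plausibly part of why the length feature helps. The snippet below, assuming lowercase hex digests as in the md5 example further down, computes each algorithm's digest length with Python's `hashlib`; `HASHES` and `hex_lengths` are illustrative names.

```python
import hashlib

# Assumes digests are rendered as hex strings (two characters per byte).
HASHES = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")
hex_lengths = {
    name: len(hashlib.new(name, b"hello, world!").hexdigest()) for name in HASHES
}
for name, n in hex_lengths.items():
    print(f"{name:<7} {n:>3} hex chars")
```

This gives 32, 40, 56, 64, 96, and 128 characters respectively. A hex-encoded 16-byte string is also 32 characters long, so length alone cannot separate md5 from hex; the character patterns have to do the rest.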

Installation

You can install whatenc using pipx:

pipx install whatenc

Usage

API

from whatenc import Classifier

classifier = Classifier()
# prints: [('plain', 1.0), ('md5', 7.686760500681856e-26), ('base85', 2.864714171264974e-35)]
print(classifier.predict("hello, world!"))
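`predict` appears to return `(label, probability)` pairs, so picking a label with a confidence cutoff is straightforward. `top_guess` and `min_confidence` are hypothetical names, not part of whatenc; the sample list is copied from the output above so the snippet runs without the package installed.

```python
# Copied from the documented output; in real use this would come from
# classifier.predict(...). top_guess is a hypothetical helper, not whatenc API.
predictions = [("plain", 1.0), ("md5", 7.686760500681856e-26), ("base85", 2.864714171264974e-35)]


def top_guess(predictions, min_confidence=0.5):
    """Return the most probable label, or None if it falls below min_confidence."""
    label, prob = max(predictions, key=lambda pair: pair[1])
    return label if prob >= min_confidence else None


print(top_guess(predictions))  # plain
```

Using `max` rather than taking the first element avoids relying on the list being sorted, although the documented output is sorted by probability.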

CLI

Pass either a string or a path to a file:

whatenc hello
whatenc samples.txt

Examples

[+] input: ZW5jb2RlIHRvIGJhc2U2NCBmb3JtYXQ=
   [~] top guess   = base64
      [=] base64   = 1.000
      [=] base85   = 0.000
      [=] plain    = 0.000

[+] input: hello
   [~] top guess   = plain
      [=] plain    = 1.000
      [=] md5      = 0.000
      [=] base64   = 0.000

[*] loading model
[+] input: האקדמיה ללשון העברית
   [~] top guess   = plain
      [=] plain    = 1.000
      [=] base64   = 0.000
      [=] base85   = 0.000

[*] loading model
[+] input: bfa99df33b137bc8fb5f5407d7e58da8
   [~] top guess   = md5
      [=] md5      = 0.999
      [=] sha1     = 0.001
      [=] sha224   = 0.000
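The first and last inputs can be checked by hand. Decoding the base64 sample with the standard library recovers its plaintext, while for the md5 sample only the shape (32 hex characters) can be verified; the input behind that digest is not given here, and the same shape would also match any 16-byte value written as hex. `looks_like_md5` is an illustrative name.

```python
import base64
import string

sample = "ZW5jb2RlIHRvIGJhc2U2NCBmb3JtYXQ="
print(base64.b64decode(sample, validate=True).decode("utf-8"))  # encode to base64 format

# Shape check only: 32 hexadecimal characters, as an md5 hex digest would be.
digest = "bfa99df33b137bc8fb5f5407d7e58da8"
looks_like_md5 = len(digest) == 32 and all(c in string.hexdigits for c in digest)
print(looks_like_md5)  # True
```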

Download files


Source Distribution

whatenc-0.9.0.tar.gz (7.8 MB)

Uploaded Source

Built Distribution


whatenc-0.9.0-py3-none-any.whl (7.8 MB)

Uploaded Python 3

File details

Details for the file whatenc-0.9.0.tar.gz.

File metadata

  • Download URL: whatenc-0.9.0.tar.gz
  • Upload date:
  • Size: 7.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for whatenc-0.9.0.tar.gz
Algorithm Hash digest
SHA256 e38c35b8926e42828060989224f295d6ec30faaeda09bb021df27aad52ead474
MD5 ecc970e903ffcdf31cf1c62870242c9e
BLAKE2b-256 c22109a610f28a2dfe08f4c39e14a3b438d56f9de16430c3758062979a43e596


File details

Details for the file whatenc-0.9.0-py3-none-any.whl.

File metadata

  • Download URL: whatenc-0.9.0-py3-none-any.whl
  • Upload date:
  • Size: 7.8 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for whatenc-0.9.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8cf831a3cedaead9e947390fae315d5f7cd5eab9fbc342a54174cbcda4e147ce
MD5 60759c23cee52e1976bc006e4065de71
BLAKE2b-256 ee5d55613c45a2148217f3c0891af58d8088816fe3e154221a6925d5b952a870

