Text encoding type classifier
whatenc is a command-line tool that identifies the encoding or transformation of a given string or file.
The model is trained on text samples from the English, Greek, Russian, Hebrew, and Arabic Wikipedia corpora, chosen to represent a diverse set of writing systems (Latin, Greek, Cyrillic, Hebrew, and Arabic scripts). Each line is encoded using multiple encoding schemes to generate labeled examples.
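The labeling step described above can be sketched with the Python standard library. This is an illustration only, not the project's actual training pipeline: the names `ENCODERS` and `make_examples` are hypothetical, the base85 variant is a guess, "gzip64" is assumed to mean gzip followed by base64, and morse is omitted because the standard library has no encoder for it.

```python
import base64
import gzip
import hashlib
import urllib.parse


def _hash_encoder(name):
    # Bind the algorithm name so each label gets its own hex-digest function.
    return lambda raw: hashlib.new(name, raw).hexdigest()


# Hypothetical mapping: label -> function turning UTF-8 bytes into encoded text.
ENCODERS = {
    "plain": lambda raw: raw.decode("utf-8"),
    "base32": lambda raw: base64.b32encode(raw).decode("ascii"),
    "base64": lambda raw: base64.b64encode(raw).decode("ascii"),
    # Assumption: the b85 alphabet; the project may use another base85 variant.
    "base85": lambda raw: base64.b85encode(raw).decode("ascii"),
    "hex": lambda raw: raw.hex(),
    "url": lambda raw: urllib.parse.quote(raw),
    # Assumption: gzip, then base64. mtime=0 keeps the output reproducible.
    "gzip64": lambda raw: base64.b64encode(gzip.compress(raw, mtime=0)).decode("ascii"),
}
for _name in ("md5", "sha1", "sha224", "sha256", "sha384", "sha512"):
    ENCODERS[_name] = _hash_encoder(_name)


def make_examples(line):
    """Return one (encoded_text, label) pair per encoding for a corpus line."""
    raw = line.encode("utf-8")
    return [(encode(raw), label) for label, encode in ENCODERS.items()]


if __name__ == "__main__":
    for text, label in make_examples("encode to base64 format")[:4]:
        print(label, text)
```

Running every corpus line through a table like this yields one labeled sample per encoding, so all classes are represented equally in the resulting dataset.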
How It Works
whatenc uses a character-level 1D convolutional neural network (CNN) trained directly on sequences of character-bigram tokens.
Each training sample is represented as:
- a sequence of character bigrams, padded to a fixed maximum length
- a scalar feature holding the true (unpadded) length, which lets the network take string length into account
This neural approach achieves near-perfect classification accuracy after only a few epochs.
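The sample representation described above can be sketched in plain Python. Several details here are assumptions rather than facts about whatenc: the bigrams overlap, ids 0 and 1 are reserved for padding and unknown bigrams, the length feature is the raw character count, and the names `build_vocab` and `encode_sample` are hypothetical.

```python
PAD_ID = 0  # assumed id for padding positions
UNK_ID = 1  # assumed id for bigrams not seen when the vocabulary was built


def char_bigrams(text):
    """Return overlapping character bigrams, e.g. 'abc' -> ['ab', 'bc']."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def build_vocab(samples):
    """Assign an integer id to every bigram seen in the samples."""
    vocab = {}
    for sample in samples:
        for bigram in char_bigrams(sample):
            # Ids 0 and 1 are reserved, so real bigrams start at 2.
            vocab.setdefault(bigram, len(vocab) + 2)
    return vocab


def encode_sample(text, vocab, max_len):
    """Return (padded bigram ids, true length) for one input string."""
    ids = [vocab.get(bigram, UNK_ID) for bigram in char_bigrams(text)[:max_len]]
    ids += [PAD_ID] * (max_len - len(ids))
    return ids, len(text)


if __name__ == "__main__":
    vocab = build_vocab(["hello"])
    print(encode_sample("hello", vocab, max_len=6))
```

The padded id list would feed an embedding layer followed by the 1D convolutions, while the length scalar would typically be joined to the pooled convolution output before the final classification layer.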
Supported Encodings
whatenc currently recognizes the following formats and transformations, and reports unencoded text as plain:
| Category | Encodings |
|---|---|
| Base encodings | base32, base64, base85, hex, url |
| Text ciphers | morse |
| Compression | gzip64 |
| Hash digests | md5, sha1, sha224, sha256, sha384, sha512 |
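The six hash classes all look like hexadecimal strings, so length is the most obvious signal that separates them from one another and from generic hex. The sketch below uses `hashlib` to list each algorithm's hex digest length. It shows why a length feature could help, and is not a description of what the trained model actually relies on. The helper name `hex_digest_lengths` is hypothetical.

```python
import hashlib

HASH_NAMES = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")


def hex_digest_lengths(data=b"hello"):
    """Map each hash name to the length of its hex digest (input-independent)."""
    return {name: len(hashlib.new(name, data).hexdigest()) for name in HASH_NAMES}


if __name__ == "__main__":
    for name, length in hex_digest_lengths().items():
        print(f"{name}: {length} hex characters")
```

A 32-character hex string such as bfa99df33b137bc8fb5f5407d7e58da8 therefore matches the md5 length, although arbitrary hex-encoded data can happen to have the same length.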
Installation
You can install whatenc using pipx:
pipx install whatenc
Usage
API
from whatenc import Classifier
classifier = Classifier()
print(classifier.predict("hello, world!")) # returns: [('plain', 1.0), ('md5', 7.686760500681856e-26), ('base85', 2.864714171264974e-35)]
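Judging by the output above, `predict` returns a list of `(label, probability)` tuples. Assuming that shape holds, a small helper can pick the top guess and print a report in the style of the CLI output. `format_predictions` is a hypothetical function, not part of whatenc, and the sample list is copied from the output above so that the sketch runs without the package installed.

```python
def format_predictions(text, predictions):
    """Render (label, probability) pairs as a CLI-style report, best guess first."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    ranked = sorted(predictions, key=lambda pair: pair[1], reverse=True)
    lines = [f"[+] input: {text}", f"[~] top guess = {ranked[0][0]}"]
    lines += [f"[=] {label} = {score:.3f}" for label, score in ranked]
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample values copied from the predict() output shown above.
    sample = [
        ("plain", 1.0),
        ("md5", 7.686760500681856e-26),
        ("base85", 2.864714171264974e-35),
    ]
    print(format_predictions("hello, world!", sample))
```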
CLI
whatenc hello
whatenc samples.txt
Examples
[+] input: ZW5jb2RlIHRvIGJhc2U2NCBmb3JtYXQ=
[~] top guess = base64
[=] base64 = 1.000
[=] base85 = 0.000
[=] plain = 0.000
[+] input: hello
[~] top guess = plain
[=] plain = 1.000
[=] md5 = 0.000
[=] base64 = 0.000
[*] loading model
[+] input: האקדמיה ללשון העברית
[~] top guess = plain
[=] plain = 1.000
[=] base64 = 0.000
[=] base85 = 0.000
[*] loading model
[+] input: bfa99df33b137bc8fb5f5407d7e58da8
[~] top guess = md5
[=] md5 = 0.999
[=] sha1 = 0.001
[=] sha224 = 0.000
Download files
File details
Details for the file whatenc-0.9.1.tar.gz.
File metadata
- Download URL: whatenc-0.9.1.tar.gz
- Upload date:
- Size: 7.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 473b44a271839dba4ed25580baae1254a765318c266b3ce461cb69dbb0bfc561 |
| MD5 | 46f087972e420dda16cd88a6a31eb307 |
| BLAKE2b-256 | 295becc5f581286933d41f58ed00af38d570a211658fb5e2276cc9e610867c36 |
File details
Details for the file whatenc-0.9.1-py3-none-any.whl.
File metadata
- Download URL: whatenc-0.9.1-py3-none-any.whl
- Upload date:
- Size: 7.8 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1fd774ab242999c6ee90229e26a2830d543179e34cc6182edab4a3d98940bbbb |
| MD5 | 6a782188e050d5208e20d19d96a6fe41 |
| BLAKE2b-256 | 6615bc3d68854d9087c4bd51f88f2d86341ec4bf5fd3a8cf3685b2a508c5a6b5 |