Native codecs extension
CodExt
Encode/decode anything.
CodExt is a Python 2/3-compatible library that extends the native codecs library (namely by adding new custom encodings and character mappings) and provides 120+ new codecs, hence its name, which combines CODecs and EXTension. It also features a guess mode for decoding multiple layers of encoding and CLI tools for convenience.
$ pip install codext
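To make "extending the native codecs library" concrete, here is a minimal sketch that registers a toy text-to-text codec with the standard library alone. It is not CodExt's implementation: the codec name `morse_sketch`, the helper names and the letters-only table are assumptions made for illustration.

```python
import codecs

# Letters-only International Morse table (illustrative subset, no digits).
_TABLE = {
    "a": ".-", "b": "-...", "c": "-.-.", "d": "-..", "e": ".", "f": "..-.",
    "g": "--.", "h": "....", "i": "..", "j": ".---", "k": "-.-", "l": ".-..",
    "m": "--", "n": "-.", "o": "---", "p": ".--.", "q": "--.-", "r": ".-.",
    "s": "...", "t": "-", "u": "..-", "v": "...-", "w": ".--", "x": "-..-",
    "y": "-.--", "z": "--..",
}
_REVERSE = {code: letter for letter, code in _TABLE.items()}


def morse_encode(text, errors="strict"):
    # Letters are separated by a space, words by " / ".
    words = text.lower().split(" ")
    encoded = " / ".join(" ".join(_TABLE[c] for c in word) for word in words)
    return encoded, len(text)  # codec functions return (output, consumed)


def morse_decode(code, errors="strict"):
    words = code.split(" / ")
    decoded = " ".join(
        "".join(_REVERSE[symbol] for symbol in word.split(" ")) for word in words
    )
    return decoded, len(code)


def _search(name):
    # Hypothetical codec name; the native registry calls this on lookup.
    if name == "morse_sketch":
        return codecs.CodecInfo(morse_encode, morse_decode, name="morse_sketch")
    return None


codecs.register(_search)

print(codecs.encode("this is a test", "morse_sketch"))
print(codecs.decode("- . ... -", "morse_sketch"))
```

Once the search function is registered, the plain `codecs.encode` and `codecs.decode` calls resolve the new name like any built-in encoding.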
Want to contribute a new codec ? | Want to contribute a new macro ? |
---|---|
Check the documentation first, then PR your new codec | PR your updated version of macros.json |
Demonstrations
$ codext -i test.txt encode dna-1
GTGAGCGGGTATGTGA
$ echo -en "test" | codext encode morse
- . ... -
$ echo -en "test" | codext encode braille
⠞⠑⠎⠞
$ echo -en "test" | codext encode base100
👫👜👪👫
Chaining codecs
$ echo -en "Test string" | codext encode reverse
gnirts tseT
$ echo -en "Test string" | codext encode reverse morse
--. -. .. .-. - ... / - ... . -
$ echo -en "Test string" | codext encode reverse morse dna-2
AGTCAGTCAGTGAGAAAGTCAGTGAGAAAGTGAGTGAGAAAGTGAGTCAGTGAGAAAGTCAGAAAGTGAGTGAGTGAGAAAGTTAGAAAGTCAGAAAGTGAGTGAGTGAGAAAGTGAGAAAGTC
$ echo -en "Test string" | codext encode reverse morse dna-2 octal
101107124103101107124103101107124107101107101101101107124103101107124107101107101101101107124107101107124107101107101101101107124107101107124103101107124107101107101101101107124103101107101101101107124107101107124107101107124107101107101101101107124124101107101101101107124103101107101101101107124107101107124107101107124107101107101101101107124107101107101101101107124103
$ echo -en "AGTCAGTCAGTGAGAAAGTCAGTGAGAAAGTGAGTGAGAAAGTGAGTCAGTGAGAAAGTCAGAAAGTGAGTGAGTGAGAAAGTTAGAAAGTCAGAAAGTGAGTGAGTGAGAAAGTGAGAAAGTC" | codext -d dna-2 morse reverse
test string
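Chaining as shown above amounts to composing encoders left to right and undoing them right to left. The sketch below illustrates this with bytes-to-bytes codecs that ship with Python (`zlib`, `base64`, `hex`); the helper names `chain_encode` and `chain_decode` are hypothetical and are not part of CodExt.

```python
import codecs
from functools import reduce


def chain_encode(data: bytes, *encodings: str) -> bytes:
    # Apply each codec in the order given: enc_n(...enc_2(enc_1(data))).
    return reduce(codecs.encode, encodings, data)


def chain_decode(data: bytes, *encodings: str) -> bytes:
    # Undo the chain by walking the same list backwards.
    return reduce(codecs.decode, reversed(encodings), data)


blob = chain_encode(b"Test string", "zlib", "base64", "hex")
print(blob)
print(chain_decode(blob, "zlib", "base64", "hex"))
```

Unlike the CLI example, where the decoders are listed in reverse order by hand, `chain_decode` here takes the encoding order and reverses it internally.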
Using macros
$ codext add-macro my-encoding-chain gzip base63 lzma base64
$ codext list macros
example-macro, my-encoding-chain
$ echo -en "Test string" | codext encode my-encoding-chain
CQQFAF0AAIAAABuTgySPa7WaZC5Sunt6FS0ko71BdrYE8zHqg91qaqadZIR2LafUzpeYDBalvE///ug4AA==
$ codext remove-macro my-encoding-chain
$ codext list macros
example-macro
$ echo "Test string !" | base122
*.7!ft9�-f9Â
$ echo "Test string !" | base91
"ONK;WDZM%Z%xE7L
$ echo "Test string !" | base91 | base85
B2P|BJ6A+nO(j|-cttl%
$ echo "Test string !" | base91 | base85 | base36 | base58-flickr
QVx5tvgjvCAkXaMSuKoQmCnjeCV1YyyR3WErUUErFf
$ echo "Test string !" | base91 | base85 | base36 | base58-flickr | base58-flickr -d | base36 -d | base85 -d | base91 -d
Test string !
$ echo "Test string !" | base91 | base85 | base36 | base58-flickr | unbase -m 3
Test string !
$ echo "Test string !" | base91 | base85 | base36 | base58-flickr | unbase -f Test
Test string !
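The multi-layer decoding above can be pictured as a bounded search over decoder chains. The following is a rough sketch of that idea using only standard-library decoders; it is not how `unbase` or CodExt's guess mode is implemented, and the function name `guess_decode` and its `found` and `max_depth` parameters are assumptions loosely modeled on the `-f` and `-m` options shown above.

```python
import base64
import binascii

# Candidate decoders (standard library only); all raise binascii.Error on bad input.
DECODERS = {
    "base16": base64.b16decode,
    "base32": base64.b32decode,
    "base64": lambda blob: base64.b64decode(blob, validate=True),
}


def guess_decode(data: bytes, found: bytes, max_depth: int = 3):
    """Breadth-first search over decoder chains.

    Returns (chain, output) for the first output containing `found`,
    or None if nothing matches within `max_depth` layers.
    """
    frontier = [((), data)]
    for _ in range(max_depth):
        next_frontier = []
        for chain, blob in frontier:
            for name, decode in DECODERS.items():
                try:
                    out = decode(blob)
                except (binascii.Error, ValueError):
                    continue  # this decoder does not apply to this layer
                if found in out:
                    return chain + (name,), out
                if out:
                    next_frontier.append((chain + (name,), out))
        frontier = next_frontier
    return None


layered = base64.b32encode(base64.b64encode(b"Test string !"))
print(guess_decode(layered, b"Test"))
```

A stop condition such as a known substring keeps the search from accepting garbage produced by a decoder that merely happens to accept the input alphabet.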
Usage (Python)
Getting the list of available codecs:
>>> import codecs
>>> import codext
>>> codext.list()
['ascii85', 'base85', 'base100', 'base122', ..., 'tomtom', 'dna', 'html', 'markdown', 'url', 'resistor', 'sms', 'whitespace', 'whitespace-after-before']
>>> codext.encode("this is a test", "base58-bitcoin")
'jo91waLQA1NNeBmZKUF'
>>> codext.encode("this is a test", "base58-ripple")
'jo9rA2LQwr44eBmZK7E'
>>> codext.encode("this is a test", "base58-url")
'JN91Wzkpa1nnDbLyjtf'
>>> codecs.encode("this is a test", "base100")
'👫👟👠👪🐗👠👪🐗👘🐗👫👜👪👫'
>>> codecs.decode("👫👟👠👪🐗👠👪🐗👘🐗👫👜👪👫", "base100")
'this is a test'
>>> for i in range(8):
...     print(codext.encode("this is a test", "dna-%d" % (i + 1)))
...
GTGAGCCAGCCGGTATACAAGCCGGTATACAAGCAGACAAGTGAGCGGGTATGTGA
CTCACGGACGGCCTATAGAACGGCCTATAGAACGACAGAACTCACGCCCTATCTCA
ACAGATTGATTAACGCGTGGATTAACGCGTGGATGAGTGGACAGATAAACGCACAG
AGACATTCATTAAGCGCTCCATTAAGCGCTCCATCACTCCAGACATAAAGCGAGAC
TCTGTAAGTAATTCGCGAGGTAATTCGCGAGGTAGTGAGGTCTGTATTTCGCTCTG
TGTCTAACTAATTGCGCACCTAATTGCGCACCTACTCACCTGTCTATTTGCGTGTC
GAGTGCCTGCCGGATATCTTGCCGGATATCTTGCTGTCTTGAGTGCGGGATAGAGT
CACTCGGTCGGCCATATGTTCGGCCATATGTTCGTCTGTTCACTCGCCCATACACT
>>> codext.decode("GTGAGCCAGCCGGTATACAAGCCGGTATACAAGCAGACAAGTGAGCGGGTATGTGA", "dna-1")
'this is a test'
>>> codecs.encode("this is a test", "morse")
'- .... .. ... / .. ... / .- / - . ... -'
>>> codecs.decode("- .... .. ... / .. ... / .- / - . ... -", "morse")
'this is a test'
>>> with open("morse.txt", 'w', encoding="morse") as f:
...     f.write("this is a test")
...
14
>>> with open("morse.txt", encoding="morse") as f:
...     f.read()
...
'this is a test'
>>> codext.decode("""
=
X
:
x
n
r
y
Y
y
p
a
`
n
|
a
o
h
`
g
o
z """, "whitespace-after+before")
'CSC{not_so_invisible}'
>>> print(codext.encode("An example test string", "baudot-tape"))
***.**
. *
***.*
* .
.*
* .*
. *
** .*
***.**
** .**
.*
* .
* *. *
.*
* *.
* *. *
* .
* *.
* *. *
***.
*.*
***.*
* .*
List of codecs
BaseXX
- base1: useless, but for the sake of completeness
- base2: simple conversion to binary (with a variant with a reversed alphabet)
- base3: conversion to ternary (with a variant with a reversed alphabet)
- base4: conversion to quaternary (with a variant with a reversed alphabet)
- base8: simple conversion to octal (with a variant with a reversed alphabet)
- base10: simple conversion to decimal
- base11: conversion to digits with a "a"
- base16: simple conversion to hexadecimal (with a variant holding an alphabet with digits and letters inverted)
- base26: conversion to alphabet letters
- base32: classical conversion according to RFC 4648 with all its variants (zbase32, extended hexadecimal, geohash, Crockford)
- base36: Base36 conversion to letters and digits (with a variant inverting both groups)
- base45: Base45 DRAFT algorithm (with a variant inverting letters and digits)
- base58: multiple versions of Base58 (bitcoin, flickr, ripple)
- base62: Base62 conversion to lower- and uppercase letters and digits (with a variant with letters and digits inverted)
- base63: similar to base62 with the "_" added
- base64: classical conversion according to RFC 4648 with its URL (or file) variant (it also holds a variant with letters and digits inverted)
- base67: custom conversion using some more special characters (also with a variant with letters and digits inverted)
- base85: all variants of Base85 (Ascii85, z85, Adobe, (x)btoa, RFC1924, XML)
- base91: Base91 custom conversion
- base100 (or emoji): Base100 custom conversion
- base122: Base122 custom conversion
- base-genericN: see base encodings; supports any possible base
This category also contains ascii85, adobe, [x]btoa, zeromq with the base85 codec.
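Many of the BaseXX conversions listed above boil down to rewriting a number in another radix over a chosen alphabet. Below is a minimal sketch of that general idea, treating the input as one big-endian integer and keeping leading zero bytes the way Base58 does. The helper names and the alphabet constant are hypothetical, and CodExt's own codecs may differ in alphabets, padding and chunking.

```python
BASE36 = "0123456789abcdefghijklmnopqrstuvwxyz"  # illustrative alphabet


def base_encode(data: bytes, alphabet: str) -> str:
    base = len(alphabet)
    n = int.from_bytes(data, "big")
    digits = []
    while n:
        n, remainder = divmod(n, base)
        digits.append(alphabet[remainder])
    # Leading zero bytes vanish in the integer, so keep them as zero digits.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return alphabet[0] * pad + "".join(reversed(digits))


def base_decode(text: str, alphabet: str) -> bytes:
    base = len(alphabet)
    index = {char: i for i, char in enumerate(alphabet)}
    stripped = text.lstrip(alphabet[0])
    pad = len(text) - len(stripped)
    n = 0
    for char in stripped:
        n = n * base + index[char]
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return b"\x00" * pad + body


encoded = base_encode(b"Test string !", BASE36)
print(encoded)
print(base_decode(encoded, BASE36))
```

Swapping the alphabet string is enough to get another base, which is roughly what a generic base codec needs to parameterize.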
Binary
- baudot: supports CCITT-1, CCITT-2, EU/FR, ITA1, ITA2, MTK-2 (Python3 only), UK, ...
- baudot-spaced: variant of baudot; groups of 5 bits are whitespace-separated
- baudot-tape: variant of baudot; outputs a string that looks like a perforated tape
- bcd: Binary Coded Decimal, encodes characters from their (zero-left-padded) ordinals
- bcd-extended0: variant of bcd; encodes characters from their (zero-left-padded) ordinals using prefix bits 0000
- bcd-extended1: variant of bcd; encodes characters from their (zero-left-padded) ordinals using prefix bits 1111
- excess3: uses Excess-3 (aka Stibitz code) binary encoding to convert characters from their ordinals
- gray: aka reflected binary code
- manchester: XORs each bit of the input with 01
- manchester-inverted: variant of manchester; XORs each bit of the input with 10
- rotateN: rotates characters by the specified number of bits (N belongs to [1, 7]; Python 3 only)
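Two of the entries above are easy to show in a few lines. The sketch below XORs each input bit with a two-bit pattern (the Manchester description in the list) and converts integers to and from reflected binary (Gray) code. Function names are hypothetical, and the output is a string of `0`/`1` characters for readability, which is an assumption and probably not CodExt's output format.

```python
def manchester(data: bytes, pattern: str = "01") -> str:
    # Each input bit b becomes two bits: b XOR pattern[0], b XOR pattern[1].
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(str(int(b) ^ int(p)) for b in bits for p in pattern)


def gray_encode(n: int) -> int:
    # Reflected binary code: adjacent integers differ by exactly one bit.
    return n ^ (n >> 1)


def gray_decode(g: int) -> int:
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


print(manchester(b"t"))        # pattern "01"
print(manchester(b"t", "10"))  # inverted variant
print([gray_encode(i) for i in range(8)])
```

With pattern `01` a 0 bit becomes `01` and a 1 bit becomes `10`; the inverted variant swaps the two.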
Common
- a1z26: keeps words whitespace-separated and uses a custom character separator
- cases: set of case-related encodings (including camel-, kebab-, lower-, pascal-, upper-, snake- and swap-case, slugify, capitalize, title)
- dummy: set of simple encodings (including integer, replace, reverse, word-reverse, substitute and strip-spaces)
- octal: dummy octal conversion (converts to 3-digit groups)
- octal-spaced: variant of octal; dummy octal conversion, handling whitespace separators
- ordinal: dummy character ordinals conversion (converts to 3-digit groups)
- ordinal-spaced: variant of ordinal; dummy character ordinals conversion, handling whitespace separators
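The "3-digit groups" of the octal codec can be reproduced from the chaining demonstration, where `AGTC...` encodes to `101107124103...`: each character becomes its ordinal written as three octal digits. A minimal sketch of that behavior follows; the function names are hypothetical and it only covers ordinals below 512.

```python
def octal_encode(text: str) -> str:
    # One zero-padded 3-digit octal group per character (ordinals < 512 only).
    return "".join(f"{ord(char):03o}" for char in text)


def octal_decode(digits: str) -> str:
    # Read the string back three digits at a time.
    return "".join(
        chr(int(digits[i:i + 3], 8)) for i in range(0, len(digits), 3)
    )


print(octal_encode("AGTC"))
print(octal_decode("101107124103"))
```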
Compression
- gzip: standard Gzip compression/decompression
- lz77: compresses the given data with the algorithm of Lempel and Ziv of 1977
- lz78: compresses the given data with the algorithm of Lempel and Ziv of 1978
- pkzip_deflate: standard Zip-deflate compression/decompression
- pkzip_bzip2: standard BZip2 compression/decompression
- pkzip_lzma: standard LZMA compression/decompression
:warning: Compression functions are of course definitely NOT encoding functions; they are implemented for leveraging the .encode(...) API from codecs.
Cryptography
- affine: aka Affine Cipher
- atbash: aka Atbash Cipher
- bacon: aka Baconian Cipher
- barbie-N: aka Barbie Typewriter (N belongs to [1, 4])
- citrix: aka Citrix CTX1 password encoding
- railfence: aka Rail Fence Cipher
- rotN: aka Caesar cipher (N belongs to [1,25])
- scytaleN: encrypts using the number of letters on the rod (N belongs to [1,∞[)
- shiftN: shift ordinals (N belongs to [1,255])
- xorN: XOR with a single byte (N belongs to [1,255])
:warning: Crypto functions are of course definitely NOT encoding functions; they are implemented for leveraging the .encode(...) API from codecs.
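As an illustration of the two simplest entries in the list above, here is a small sketch of a Caesar rotation and a single-byte XOR. The function names are hypothetical and this is not CodExt's code; it only shows the transformations that the rotN and xorN descriptions refer to.

```python
import string


def rot_n(text: str, n: int) -> str:
    # Caesar cipher: rotate ASCII letters by n positions, leave the rest alone.
    n %= 26
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[n:] + lower[:n] + upper[n:] + upper[:n],
    )
    return text.translate(table)


def xor_n(data: bytes, n: int) -> bytes:
    # XOR every byte with the same key byte; applying it twice restores the input.
    return bytes(byte ^ n for byte in data)


print(rot_n("Test string", 13))
print(rot_n(rot_n("Test string", 3), 26 - 3))
print(xor_n(xor_n(b"Test string", 42), 42))
```

Decoding a rotation by N is a rotation by 26 - N, while XOR with a single byte is its own inverse.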
Hashing
- blake: includes BLAKE2b and BLAKE2s (Python 3 only; relies on hashlib)
- checksums: includes Adler32 and CRC32 (relies on zlib)
- crypt: Unix's crypt hash for passwords (Python 3 and Unix only; relies on crypt)
- md: aka Message Digest; includes MD4 and MD5 (relies on hashlib)
- sha: aka Secure Hash Algorithms; includes SHA1, 224, 256, 384, 512 (Python2/3) but also SHA3-224, -256, -384 and -512 (Python 3 only; relies on hashlib)
- shake: aka SHAKE hashing (Python 3 only; relies on hashlib)
:warning: Hash functions are of course definitely NOT encoding functions; they are implemented for convenience with the .encode(...) API from codecs and are useful for chaining codecs.
Languages
- braille: well-known braille language (Python 3 only)
- ipsum: aka lorem ipsum
- galactic: aka galactic alphabet or Minecraft enchantment language (Python 3 only)
- leetspeak: based on minimalistic elite speaking rules
- morse: uses whitespace as a separator
- navajo: only handles letters (not full words from the Navajo dictionary)
- radio: aka NATO or radio phonetic alphabet
- southpark: converts letters to Kenny's language from South Park (whitespace is also handled)
- southpark-icase: case-insensitive variant of southpark
- tap: converts text to tap/knock code, commonly used by prisoners
- tomtom: similar to morse, using slashes and backslashes
Others
- dna: implements the 8 rules of DNA sequences (N belongs to [1,8])
- letter-indices: encodes consonants and/or vowels with their corresponding indices
- markdown: unidirectional encoding from Markdown to HTML
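The dna-1 output shown in the usage examples is consistent with a mapping of bit pairs to nucleotides where 00, 01, 10 and 11 become A, G, C and T. The sketch below reproduces that rule only; the mapping is inferred from the sample output rather than taken from CodExt's source, and the names are hypothetical.

```python
# Rule inferred from the dna-1 sample output in this README (an assumption).
RULE_1 = {"00": "A", "01": "G", "10": "C", "11": "T"}


def dna_encode(data: bytes, rule=RULE_1) -> str:
    # Split every byte into four 2-bit pairs and map each pair to a nucleotide.
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(rule[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_decode(sequence: str, rule=RULE_1) -> bytes:
    inverse = {nucleotide: pair for pair, nucleotide in rule.items()}
    bits = "".join(inverse[nucleotide] for nucleotide in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


print(dna_encode(b"test"))
print(dna_decode("GTGAGCGGGTATGTGA"))
```

The other rules would only change the dictionary, so a different `rule` argument is all that is needed to try them.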
Steganography
- hexagram: uses Base64 and encodes the result to a charset of I Ching hexagrams (as implemented here)
- klopf: aka Klopf code; Polybius square with trivial alphabetical distribution
- resistor: aka resistor color codes
- rick: aka Rick cipher (in reference to Rick Astley's song "Never gonna give you up")
- sms: also called T9 code; uses "-" as a separator for encoding, "-" or "_" or whitespace for decoding
- whitespace: replaces bits with whitespaces and tabs
- whitespace_after_before: variant of whitespace; encodes characters as new characters with whitespaces before and after according to an equation described in the codec name (e.g. "whitespace+2*after-3*before")
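For the sms entry above, a multi-tap (T9-style) scheme can be sketched as follows: each letter becomes its keypad digit repeated once per position on the key, groups are joined with "-" when encoding, and "-", "_" or whitespace are accepted when decoding. This is a letters-only illustration with hypothetical names; how CodExt handles spaces, digits or punctuation is not covered here.

```python
import re

# Standard phone keypad layout.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
# e.g. "s" is the 4th letter on key 7, so it maps to "7777".
PRESSES = {
    letter: digit * (position + 1)
    for digit, letters in KEYPAD.items()
    for position, letter in enumerate(letters)
}


def sms_encode(text: str) -> str:
    return "-".join(PRESSES[char] for char in text.lower())


def sms_decode(code: str) -> str:
    # Accept "-", "_" or whitespace between groups.
    groups = [group for group in re.split(r"[-_\s]+", code) if group]
    return "".join(KEYPAD[group[0]][len(group) - 1] for group in groups)


print(sms_encode("test"))
print(sms_decode("8_33 7777-8"))
```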
Web
- html: implements entities according to this reference
- url: aka URL encoding
Download files
File details
Details for the file codext-1.15.4.tar.gz.
File metadata
- Download URL: codext-1.15.4.tar.gz
- Upload date:
- Size: 4.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | e3d2e83289421d21d8f55281d426966ccea4054d96a0ea0ac15b354b386accbf |
MD5 | 4104d4b21d17ddf24ca092fc303bd921 |
BLAKE2b-256 | 3a4b08c52d43e1c9b34132c0027a5123619cab646bd1aeae9f81798e2ab5b286 |
File details
Details for the file codext-1.15.4-py3-none-any.whl.
File metadata
- Download URL: codext-1.15.4-py3-none-any.whl
- Upload date:
- Size: 137.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | d0248f5c821b8443bdb4678164430d460d6136ee8fa8d62c7847f248906664c0 |
MD5 | 68fd9a37e01a45e7c4eced68a11b62ba |
BLAKE2b-256 | c67b2fd9f63256cad6960cece9ccbc2e1c56e7d6bc39a6a72960b0b5f2ed57c3 |