Advanced string encoding / decoding toolkit — 24 formats, auto-detection, deep decode, pipelines, plugins

stringshift

Advanced string encoding / decoding toolkit for Python.

24 built-in formats · Auto-detection engine · Deep multi-layer unwrapping · Operation pipelines · Runtime plugin system · Full CLI · Zero dependencies


What's new in v4.0

  • 16 new formats: base16, base58, base85, ascii85, rot47, nato, braille, caesar, atbash, vigenere, xor, reverse, unicode_escape, punycode, quoted_printable, uuencode
  • magic_decode — rank every possible interpretation of an unknown string
  • smart_decode — one-call auto-detection and decoding
  • deep_decode — recursively unwrap multi-layer encodings (like CyberChef)
  • pipeline — chain encode/decode operations in sequence
  • Plugin system — register custom codecs at runtime
  • Proper exceptions: DecodeError, EncodeError, UnknownFormatError, PipelineError
  • Bug fixes: exceptions.py now defines real exception classes, and all public functions are properly exported

Installation

pip install stringshift

# Optional: smarter byte-encoding detection
pip install "stringshift[full]"

Quick Start

import stringshift

# Encode & decode
stringshift.encode("hello", "base64")       # 'aGVsbG8='
stringshift.decode("aGVsbG8=", "base64")    # 'hello'

# Don't know what something is? Auto-detect it
stringshift.smart_decode("SGVsbG8=")        # 'Hello'

# See every possible interpretation, ranked by confidence
stringshift.magic_decode("SGVsbG8=")
# [{'format': 'base64', 'confidence': 0.89, 'decoded': 'Hello'}, ...]

# Unwrap multi-layer encodings in one call
stringshift.deep_decode("SGVsbG8%3D")
# {
#   'result': 'Hello',
#   'total_layers': 2,
#   'layers': [
#     {'layer': 1, 'format': 'url',    'value': 'SGVsbG8='},
#     {'layer': 2, 'format': 'base64', 'value': 'Hello'}
#   ]
# }

# Chain operations
stringshift.pipeline("hello", ["base64_encode", "url_encode"])  # 'aGVsbG8%3D'

Supported Formats (24 built-in)

Category          Formats
Base encodings    base64, base32, base16, base58, base85, ascii85
Binary / Hex      hex, binary
Web / Text        url, html, quoted_printable, punycode, uuencode
Classic ciphers   caesar, atbash, vigenere, rot13, rot47, xor
Symbol / Human    morse, nato, braille
Misc              reverse, unicode_escape

CLI

After installation, the stringshift command is available immediately.

# Auto-detect and decode
$ stringshift "SGVsbG8="
Hello

# Encode
$ stringshift "Hello" -e base64
SGVsbG8=

# Decode a specific format
$ stringshift "48656c6c6f" -d hex
Hello

# Show all possible interpretations with confidence scores
$ stringshift "SGVsbG8=" --magic
Confidence   Format             Decoded
------------------------------------------------------
89%          base64             Hello

# Unwrap multi-layer encodings (like CyberChef)
$ stringshift "SGVsbG8%3D" --deep
Layer  1  [url             ]  SGVsbG8=
Layer  2  [base64          ]  Hello

Final result: Hello

# Chain operations via pipeline
$ stringshift "hello" --pipeline base64_encode url_encode
aGVsbG8%3D

# Cipher options
$ stringshift "Hello" -e caesar --shift 3      # Khoor
$ stringshift "Hello" -e vigenere --key secret
$ stringshift "Hello" -e xor --xor-key 99

# Batch process — one item per line from stdin
$ echo -e "aGVsbG8=\nd29ybGQ=" | stringshift --batch -d base64
Hello
world

# Process a file
$ stringshift -i encoded.txt -d base64 > decoded.txt

# List every available format
$ stringshift --list

# Benchmark processing time
$ stringshift "SGVsbG8=" --benchmark

# Interactive mode (no arguments)
$ stringshift
stringshift 4.0.0    interactive mode
Commands:  encode <fmt> <text>
           decode <fmt> <text>
           magic  <text>
           deep   <text>
           list
stringshift>

Python API

Encode & Decode

import stringshift

stringshift.encode("hello", "hex")                            # '68656c6c6f'
stringshift.encode("Hello", "caesar", shift=3)                # 'Khoor'
stringshift.encode("Hello", "vigenere", key="secret")         # 'Zincs'
stringshift.encode("SOS", "morse")                            # '... --- ...'
stringshift.encode("ABC", "nato")                             # 'Alpha Bravo Charlie'
stringshift.encode("hi", "braille")                           # '⠓⠊'

stringshift.decode("68656c6c6f", "hex")                       # 'hello'
stringshift.decode("Khoor", "caesar", shift=3)                # 'Hello'
stringshift.decode("Zincs", "vigenere", key="secret")         # 'Hello'

You can also call individual format functions directly:

from stringshift import encode_base64, decode_morse, encode_braille
encode_base64("hello")           # 'aGVsbG8='
decode_morse("... --- ...")      # 'SOS'
encode_braille("hello")          # '⠓⠑⠇⠇⠕'

Auto-Detection

# Best guess — returns a single string
stringshift.smart_decode("68656c6c6f")         # 'hello'
stringshift.smart_decode("hello%20world")      # 'hello world'
stringshift.smart_decode("... --- ...")        # 'SOS'

# All candidates, ranked by confidence
results = stringshift.magic_decode("SGVsbG8=")
for r in results:
    print(f"{r['confidence']:.0%}  {r['format']:15s}  {r['decoded']}")

# Detection only — no decoding
results = stringshift.detect_format("SGVsbG8=")
# [{'format': 'base64', 'confidence': 0.89, 'decoded': 'Hello'}]
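
One way to picture confidence-ranked detection is a loop that tries each candidate decoder and scores whatever decodes cleanly. The sketch below is not stringshift's detection engine: `rank_candidates`, the three standard-library decoders, and the printable-character score are all assumptions made for illustration.

```python
import base64
import string
from urllib.parse import unquote


def _try_base64(text):
    try:
        return base64.b64decode(text, validate=True).decode("utf-8")
    except ValueError:  # binascii.Error and UnicodeDecodeError subclass ValueError
        return None


def _try_hex(text):
    try:
        return bytes.fromhex(text).decode("utf-8")
    except ValueError:
        return None


def _try_url(text):
    decoded = unquote(text)
    return decoded if decoded != text else None  # unchanged means "not URL-encoded"


# Hypothetical candidate table; a real engine would cover many more formats.
CANDIDATES = {"base64": _try_base64, "hex": _try_hex, "url": _try_url}


def rank_candidates(text):
    """Return every successful decoding, best score first."""
    results = []
    for fmt, attempt in CANDIDATES.items():
        decoded = attempt(text)
        if not decoded:
            continue
        # Toy confidence: the share of printable characters in the result.
        printable = sum(ch in string.printable for ch in decoded)
        results.append(
            {"format": fmt, "confidence": printable / len(decoded), "decoded": decoded}
        )
    return sorted(results, key=lambda r: r["confidence"], reverse=True)


print(rank_candidates("SGVsbG8=")[0]["decoded"])  # Hello
```

A printable-ratio score is deliberately crude. It only shows the shape of the result list that magic_decode is described as returning.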

Deep Decode

Peels successive encoding layers off a string automatically, in the spirit of CyberChef's "Magic" operation.

# Two layers: url → base64
info = stringshift.deep_decode("SGVsbG8%3D")
print(info["result"])          # 'Hello'
print(info["total_layers"])    # 2
for layer in info["layers"]:
    print(layer["layer"], layer["format"], layer["value"])
# 1  url     SGVsbG8=
# 2  base64  Hello

# Three layers: url → base64 → hex
tripled = stringshift.encode(
    stringshift.encode(stringshift.encode("Hi", "hex"), "base64"),
    "url"
)
info = stringshift.deep_decode(tripled)
print(info["result"])         # 'Hi'
print(info["total_layers"])   # 3
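
The layer-peeling idea can be sketched with the standard library alone: keep applying the first decoder that succeeds until none does. This is an illustrative approximation, not stringshift's implementation. `deep_unwrap`, the decoder order, and the `max_layers` guard are assumptions, and short plain-text inputs can occasionally be misread as another layer.

```python
import base64
from urllib.parse import unquote


def _url(text):
    decoded = unquote(text)
    return decoded if decoded != text else None


def _base64(text):
    try:
        return base64.b64decode(text, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None


def _hex(text):
    try:
        return bytes.fromhex(text).decode("utf-8")
    except ValueError:
        return None


# Order matters: the first decoder that succeeds claims the layer.
DECODERS = [("url", _url), ("base64", _base64), ("hex", _hex)]


def deep_unwrap(text, max_layers=10):
    """Peel layers until no decoder applies or max_layers is reached."""
    layers = []
    current = text
    while len(layers) < max_layers:
        for fmt, attempt in DECODERS:
            decoded = attempt(current)
            if decoded:  # skip failures and empty results
                layers.append(
                    {"layer": len(layers) + 1, "format": fmt, "value": decoded}
                )
                current = decoded
                break
        else:
            break  # no decoder applied, so treat the value as plain text
    return {"result": current, "total_layers": len(layers), "layers": layers}


info = deep_unwrap("SGVsbG8%3D")
print(info["result"], info["total_layers"])  # Hello 2
```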

Pipeline

Chain any number of encode/decode steps. Each step name is a format followed by _encode or _decode (for example, base64_encode). To pass keyword arguments to a cipher, give the step as a (name, kwargs) tuple.

# Simple chain
result = stringshift.pipeline("hello", [
    "base64_encode",
    "url_encode",
])
# 'aGVsbG8%3D'

# Reverse it
stringshift.pipeline(result, ["url_decode", "base64_decode"])
# 'hello'

# With cipher kwargs
stringshift.pipeline("hello", [
    ("caesar_encode", {"shift": 5}),
    "base64_encode",
    "url_encode",
])
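
The step-naming rule can be illustrated with a small parser that splits each step on its final underscore and accepts either a string or a (name, kwargs) tuple. The names `parse_step`, `run_pipeline`, and `CODECS` are hypothetical, and the two codecs come from the standard library rather than from stringshift.

```python
import base64
from urllib.parse import quote, unquote

# Stand-in codec table keyed by (format, operation).
CODECS = {
    ("base64", "encode"): lambda s: base64.b64encode(s.encode("utf-8")).decode("ascii"),
    ("base64", "decode"): lambda s: base64.b64decode(s).decode("utf-8"),
    ("url", "encode"): lambda s: quote(s, safe=""),
    ("url", "decode"): lambda s: unquote(s),
}


def parse_step(step):
    """Turn 'base64_encode' or ('caesar_encode', {...}) into (fmt, op, kwargs)."""
    name, kwargs = (step, {}) if isinstance(step, str) else step
    # rpartition keeps underscores inside format names such as quoted_printable.
    fmt, sep, op = name.rpartition("_")
    if not sep or not fmt or op not in ("encode", "decode"):
        raise ValueError(f"step {name!r} must end with _encode or _decode")
    return fmt, op, dict(kwargs)


def run_pipeline(text, steps):
    """Feed the output of each step into the next one."""
    for step in steps:
        fmt, op, kwargs = parse_step(step)
        text = CODECS[(fmt, op)](text, **kwargs)
    return text


print(run_pipeline("hello", ["base64_encode", "url_encode"]))  # aGVsbG8%3D
```

Splitting on the last underscore rather than the first is what lets multi-word format names pass through unchanged.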

Batch Processing

All batch functions use a thread pool internally and scale automatically to your CPU count.

texts = ["hello", "world", "foo"]

# Encode all in parallel
stringshift.batch_process(texts, operation="encode", fmt="base64")
# ['aGVsbG8=', 'd29ybGQ=', 'Zm9v']

# Decode all — explicit format
encoded = [stringshift.encode(t, "hex") for t in texts]
stringshift.batch_process(encoded, operation="decode", fmt="hex")
# ['hello', 'world', 'foo']

# Decode all — auto-detect format per item
mixed = ["SGVsbG8=", "68656c6c6f", "hello%20world"]
stringshift.batch_process(mixed)
# ['Hello', 'hello', 'hello world']

# Control worker threads
stringshift.batch_process(texts, operation="encode", fmt="base64", workers=8)
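
For readers curious what a thread-pool batch helper looks like, here is a minimal sketch built on concurrent.futures. It is not stringshift's code: `batch_apply` and `to_base64` are illustrative names, and defaulting the worker count to os.cpu_count() is an assumption about what "scales to your CPU count" means.

```python
import base64
import os
from concurrent.futures import ThreadPoolExecutor


def batch_apply(func, items, workers=None):
    """Apply func to every item on a thread pool, preserving input order."""
    if workers is None:
        workers = os.cpu_count() or 1  # cpu_count() can return None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(func, items))


def to_base64(text):
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


print(batch_apply(to_base64, ["hello", "world", "foo"]))
# ['aGVsbG8=', 'd29ybGQ=', 'Zm9v']
```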

Plugin System

Register your own codec at runtime. It immediately becomes available to encode(), decode(), pipeline(), the CLI, and list_formats().

# Simple functional style
stringshift.register_codec(
    "shout",
    encoder=str.upper,
    decoder=str.lower,
)
stringshift.encode("hello", "shout")    # 'HELLO'
stringshift.decode("HELLO", "shout")    # 'hello'

# Class decorator style — cleaner for complex codecs
@stringshift.codec("reverse_words")
class ReverseWords:
    def encode(self, text: str) -> str:
        return " ".join(word[::-1] for word in text.split())
    def decode(self, text: str) -> str:
        return self.encode(text)   # self-inverse

stringshift.encode("hello world", "reverse_words")   # 'olleh dlrow'

# Use in a pipeline
stringshift.pipeline("hello world", [
    "reverse_words_encode",
    "base64_encode",
])

# See all formats including plugins
stringshift.list_formats()
# {'builtin': ['ascii85', 'atbash', 'base16', ...], 'plugins': ['shout', 'reverse_words']}

Error Handling

import stringshift
from stringshift import (
    DecodeError, EncodeError,
    UnknownFormatError, PipelineError,
)

# Bad input for a known format
try:
    stringshift.decode("not!!valid!!", "base64")
except stringshift.DecodeError as exc:
    print(exc.original)    # the input that failed
    print(exc.error)       # the underlying exception

# Requesting a format that doesn't exist
try:
    stringshift.encode("hello", "made_up")
except stringshift.UnknownFormatError as exc:
    print(exc.fmt)         # 'made_up'
    print(exc.available)   # full list of valid format names

# Pipeline step failure
try:
    stringshift.pipeline("hello", ["badstep"])
except stringshift.PipelineError as exc:
    print(exc.step)        # 'badstep'
    print(exc.index)       # 0  (position in the pipeline)

# Auto-detect on truly unrecognisable input
try:
    stringshift.smart_decode("!@#$%^&*()")
except stringshift.DecodeError:
    print("Could not determine encoding")
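
An exception that carries both the failing input and the underlying error, like the `original` and `error` attributes shown above, can be built by wrapping the low-level failure. This is a sketch under assumptions: `SketchDecodeError` and `strict_base64_decode` are hypothetical names, and stringshift's real classes may differ.

```python
import base64


class SketchDecodeError(Exception):
    """Hypothetical stand-in that records the input and the root cause."""

    def __init__(self, original, error):
        super().__init__(f"could not decode {original!r}: {error}")
        self.original = original  # the input that failed
        self.error = error        # the underlying exception


def strict_base64_decode(text):
    try:
        return base64.b64decode(text, validate=True).decode("utf-8")
    except ValueError as exc:  # binascii.Error and UnicodeDecodeError subclass ValueError
        # "from exc" keeps the original traceback attached as __cause__.
        raise SketchDecodeError(text, exc) from exc


try:
    strict_base64_decode("not!!valid!!")
except SketchDecodeError as exc:
    print(exc.original)          # not!!valid!!
    print(type(exc.error).__name__)
```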

Legacy Helpers (v1 compatible)

These functions are kept for backward compatibility.

# decode_all: applies URL + HTML + escape-sequence decoding in one pass
stringshift.decode_all("hello%20world")              # 'hello world'
stringshift.decode_all("&lt;b&gt;hi&lt;/b&gt;")     # '<b>hi</b>'
stringshift.decode_all("\\x", fallback="[error]")    # '[error]'  ← invalid escape

# Normalise Unicode
stringshift.normalize_text("café", "NFC")
stringshift.normalize_text("café", "NFD")

# Parallel decode_all over a list
stringshift.batch_decode(["hello%20world", "&amp;foo"])
# ['hello world', '&foo']
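
The URL and HTML parts of such a one-pass decode, and the effect of the two normalisation forms, can be reproduced with the standard library. The helper name `decode_url_and_html` is hypothetical, and the sketch omits the escape-sequence handling and `fallback` behaviour described above.

```python
import html
import unicodedata
from urllib.parse import unquote


def decode_url_and_html(text):
    """Undo percent-encoding first, then HTML entities."""
    return html.unescape(unquote(text))


print(decode_url_and_html("hello%20world"))         # hello world
print(decode_url_and_html("&lt;b&gt;hi&lt;/b&gt;"))  # <b>hi</b>

# NFC composes 'e' + combining acute into one code point; NFD splits it again.
composed = unicodedata.normalize("NFC", "cafe\u0301")
decomposed = unicodedata.normalize("NFD", "caf\u00e9")
print(len(composed), len(decomposed))               # 4 5
```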

Format Reference

Format            Encode: "Hi"              Notes
base64            SGk=                      RFC 4648, auto-padded
base32            JBUQ====                  uppercase alphabet
base16            4869                      uppercase hex
base58            6Wc                       Bitcoin alphabet, no 0/O/I/l
base85            NNE                       Python base64.b85encode
ascii85           88/                       Adobe variant
hex               4869                      lowercase, strips 0x/spaces/colons on decode
binary            01001000 01101001         8-bit groups, space-separated
url               Hi ("Hi!" → Hi%21)        quote(safe="")
html              Hi ("<b>" → &lt;b&gt;)    full entity escaping
quoted_printable  Hi                        email-safe encoding
punycode          caf-dma (for café)        IDN domain encoding
uuencode          *2&D                      classic Unix transfer encoding
rot13             Uv                        letter-only, self-inverse
rot47             w:                        all printable ASCII, self-inverse
caesar            Ij (shift=1)              kwarg: shift (default 13)
atbash            Sr                        A↔Z substitution, self-inverse
vigenere          Rs (key="k")              kwarg: key (default "key")
xor               62 43                     kwarg: key int 0-255 (default 42)
morse             .... ..                   dots, dashes, / for space
nato              Hotel India               full NATO phonetic alphabet
braille           ⠓⠊                        Grade 1 Braille
unicode_escape    \u0048\u0069              \uXXXX / \xXX sequences
reverse           iH                        self-inverse
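
Several rows of this table correspond to encodings that Python's standard library also implements, so reference values for "Hi" can be computed independently. The snippet below uses only the standard library. It does not call stringshift, and the `stdlib_reference` name is illustrative, so treat it as a cross-check rather than a statement of the package's exact output.

```python
import base64
import codecs

sample = b"Hi"

stdlib_reference = {
    "base64": base64.b64encode(sample).decode("ascii"),     # 'SGk='
    "base32": base64.b32encode(sample).decode("ascii"),     # 'JBUQ===='
    "base16": base64.b16encode(sample).decode("ascii"),     # '4869'
    "base85": base64.b85encode(sample).decode("ascii"),     # 'NNE'
    "ascii85": base64.a85encode(sample).decode("ascii"),    # '88/'
    "hex": sample.hex(),                                     # '4869'
    "binary": " ".join(f"{byte:08b}" for byte in sample),   # '01001000 01101001'
    "rot13": codecs.encode("Hi", "rot_13"),                 # 'Uv'
    "punycode": "caf\u00e9".encode("punycode").decode("ascii"),  # 'caf-dma'
}

for name, value in stdlib_reference.items():
    print(f"{name:10s} {value}")
```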

Running Tests

pip install pytest
pytest tests/ -v

Contributing

Pull requests are welcome. To add a new codec:

  1. Add encode_<name> and decode_<name> functions to core.py
  2. Register them in ENCODE_REGISTRY and DECODE_REGISTRY
  3. Add a round-trip test in tests/test_core.py
  4. Update the format table in this README
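
The first three steps might look roughly like the sketch below for a toy `swapcase` codec. The layout of ENCODE_REGISTRY and DECODE_REGISTRY inside core.py is not shown in this README, so plain dictionaries are an assumption here, and the codec itself is purely illustrative.

```python
# Stand-ins for the registries named above; their real structure in core.py
# is not documented here, so plain dicts are an assumption.
ENCODE_REGISTRY = {}
DECODE_REGISTRY = {}


def encode_swapcase(text: str) -> str:
    """Toy codec: flip the case of every letter."""
    return text.swapcase()


def decode_swapcase(text: str) -> str:
    """Swapping case twice restores ASCII text, so decode mirrors encode."""
    return text.swapcase()


# Step 2: register both directions under the same format name.
ENCODE_REGISTRY["swapcase"] = encode_swapcase
DECODE_REGISTRY["swapcase"] = decode_swapcase


# Step 3: a pytest-style round-trip test.
def test_swapcase_round_trip():
    for sample in ["", "hello", "Hello World 123"]:
        encoded = ENCODE_REGISTRY["swapcase"](sample)
        assert DECODE_REGISTRY["swapcase"](encoded) == sample


test_swapcase_round_trip()
print(encode_swapcase("Hello"))  # hELLO
```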

License

MIT — free for personal and commercial use.
