base26 ([A-Z]) and base52 ([A-Za-z]) encodings

alphacodings

🌟 overview

transform any string into alphabetic-only text with lossless base26 ([A-Z]) and base52 ([A-Za-z]) encodings; useful for transmitting textual data over restrictive channels or for training AI models and tokenizers on simpler vocabularies.

alphacodings is a fast and lightweight library using GMP arithmetic.
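as a rough illustration of the idea (not the library's actual implementation, which relies on GMP), the sketch below treats the UTF-8 bytes of a string as one large integer and rewrites it in base 26 or base 52 using Python's built-in integers. the names sketch_encode and sketch_decode, the sentinel byte, and the alphabet ordering are assumptions made for this example, so its output will not necessarily match what alphacodings produces.

```python
import string

BASE26 = string.ascii_uppercase                            # A-Z
BASE52 = string.ascii_uppercase + string.ascii_lowercase   # A-Z then a-z (assumed order)


def sketch_encode(text: str, alphabet: str = BASE26) -> str:
    """Rewrite the bytes of `text` as digits drawn from `alphabet`."""
    # Assumption: a leading 0x01 sentinel byte keeps leading zero bytes
    # (and the empty string) recoverable after the round trip.
    number = int.from_bytes(b"\x01" + text.encode("utf-8"), "big")
    base = len(alphabet)
    digits = []
    while number:
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))


def sketch_decode(encoded: str, alphabet: str = BASE26) -> str:
    """Invert sketch_encode for the same alphabet."""
    base = len(alphabet)
    number = 0
    for char in encoded:
        number = number * base + alphabet.index(char)
    raw = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return raw[1:].decode("utf-8")  # drop the sentinel byte


if __name__ == "__main__":
    sample = "<h1>welcome!</h1>"
    for alphabet in (BASE26, BASE52):
        encoded = sketch_encode(sample, alphabet)
        assert encoded.isalpha()
        assert sketch_decode(encoded, alphabet) == sample
```

the larger alphabet packs more information into each letter, which is why a base52 encoding is shorter than a base26 encoding of the same input.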

⚙️ installation

python -m pip install alphacodings

🚀 usage

from alphacodings import base26_encode, base26_decode, base52_encode, base52_decode


string = """\
<!DOCTYPE html>
<html>
<head>
    <title>sample page</title>
</head>
<body>
    <h1>welcome!</h1>
    <p>you are reading a sample HTML string.</p>
</body>
</html>
"""


if __name__ == "__main__":
    encoding_base26 = base26_encode(string)
    print(encoding_base26)
    # >>> ["YBPNLKVNQWZQCMDHMLNDTVQCCRKQLNCFGMQPNGQCIXHUUPHFUNKUFEPDLKIGARFOKTDEZKQHXGCPYHDZKKVIUDNFOAYYAUOQFBJFFGSTKAXNWGDPVUJNBARPNXBASHZBXIBSSEFTAIQRPEADSOVVNXUMQXVDWTAIVCIVWQZAHAGYAVZYKGMETJOOUQNOEXMSOOGSKVMFBYZIBZDAITICYVXMJTTCCHPMSCABLYUMFDUNLVSLNKHSBPKCGASXJSFYDHZFAOEQTUACEBIFKQGYC"]

    encoding_base52 = base52_encode(string)
    print(encoding_base52)
    # >>> ["EgcgYRPxckylMQWRLDADNZxPJiJcHaVwYHLnicahBgaotGGANZuvsvcpSSOJFLXvKPjRlNQCJqqdviiIdtnwJyDOnWojsrpkWSTZFHbMIREvREjpsODtSxoLlLjQZOoehsGFzawGQecyuomgpZQNyFnZQLWPiDhzClwxBFCCwdqduGJoshrwFdwHWMtJpSTmjxzaYmNvzOIOwLkJvyQHCaFtrODPhbhBpPBmC"]

    assert base26_decode(encoding_base26) == string
    assert base52_decode(encoding_base52) == string

🧠 motivation

The library is inspired by R. Heaton's base26 implementation and his story of manipulating data transmission in restrictive network channels on long-distance flights using alphabetic-only encodings and tokenization.

have a look at the original repository and the accompanying blog post, and show him some love.

📊 benchmarking

our implementation is orders of magnitude more efficient on strings of 100k+ characters:

Figure 1: runtime and memory usage compared with Heaton's original implementation, with and without automatic chunking and SIMD, on variable-length strings under a strict 60-second timeout; averaged over 5 trials.
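a minimal sketch of how a comparable runtime measurement could be reproduced locally. the helper name average_runtime, the random input generator, and the use of time.perf_counter are assumptions for illustration, not the project's benchmark script; it measures runtime only, with no memory tracking and no timeout.

```python
import random
import string
import time
from statistics import mean
from typing import Callable, Dict, Iterable


def average_runtime(
    encode: Callable[[str], str],
    lengths: Iterable[int],
    trials: int = 5,
) -> Dict[int, float]:
    """Return the mean wall-clock seconds per call for each input length."""
    rng = random.Random(0)  # fixed seed so every run sees the same inputs
    results = {}
    for length in lengths:
        sample = "".join(rng.choices(string.printable, k=length))
        timings = []
        for _ in range(trials):
            start = time.perf_counter()
            encode(sample)
            timings.append(time.perf_counter() - start)
        results[length] = mean(timings)
    return results


if __name__ == "__main__":
    # str.upper is only a stand-in so the sketch runs without alphacodings installed
    print(average_runtime(str.upper, [1_000, 10_000]))
```

passing base26_encode or base52_encode from alphacodings as the encode argument would time the library itself on the same inputs.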

🤝 contributing

contributions to alphacodings are welcome!

feel free to submit pull requests or open issues on our repository.

📄 license

see the LICENSE file for more details.
