Radix32 with safe alphabet.
Project description
safe_radix32
A simple Python package that encodes any 64bit integer (long) into to a string that is free from accidentally intelligible words.
What?
The primary use-case for this library is to encode (and decode) 64bit database IDs into compact strings. This functionality was defined by a few goals:
- translated IDs must be string-type to be compatible with JavaScript where 64bit integers are not (well) supported
- IDs must be usable in URLs (path, query-value and fragment) without escaping
- IDs must avoid translating into intelligible words
- the translated form has to be as compact as feasible
Why?
There are numerous well supported and established encodings to represent an integer as string, but hardly any (here I mean I could not find one suitable) where the translated value is guaranteed to... look and feel professional.
Examples:
- A system generates a database ID randomly for an account:
6733255313934373709
. It seems perfectly innocuous, but if you base64 encode it the result is highly undesirable:XXFUCKER000
. Normal base32 has similar issues, but without both letter cases. - Encoding into hexadecimal representation is mostly acceptable. Some undesired values:
B00B5
,DEAD
,FACE
, etc. One could argue that these are not very bad... maybe childish, but certainly not "professional".
I'll leave even worse randomly translated words up to imagination. The point is that immutable database IDs for a business account mustn't have such handles.
How?
I could not find many detailed sources/references on how to construct a suitable alphabet. After gathering some information from the internet, some good ideas were presented and those were combined:
- Avoid vowels. This greatly limits random word creation. There are less vowels than consonants so excluding the smaller group leaves us more to work with.
- Avoid visually similar characters:
5Ss
,i1Il
,o0O
, etc. (depends on font, but generally true) - Trim further letters in order of their frequency in the language (english).
I've started out with the base64 alphabet and reduced it. To make processing easy I wished to use a base that is a power of 2. Thankfully trimming half of the base64 alphabet seemed to be just about optimal. A base16 alphabet would not yield meaningful benefits.
The safe_radix32 alphabet is 2346789BCFGJKLMPQVWZbcfgjkmpqvwz
which was generated by the script at tools/alphabet.py
.
Implementation
There is a Cython extension implementation with good performance and a pure python fallback module for compatibility.
Install
A source distribution is available on PyPI:
$ python -m pip install safe_radix32
Python 3.6+ and PyPy3 are supported.
Usage
>>> import safe_radix32
>>> safe_radix32.encode(12345678987654321)
'GwqGVVF6v8V'
>>> safe_radix32.decode(_)
12345678987654321
# or fixed width encoding (13 chars)
>>> safe_radix32.encode_fw(12345678987654321)
'22GwqGVVF6v8V'
>>> safe_radix32.decode(_)
12345678987654321
Security
Encoding and decoding will raise OverflowException
if the value cannot be faithfully represented in a C long.
Decoding a string will also raise OverflowException
if an invalid character is found. If the input is badly
formatted or invalid UnicodeError
will be raised.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file safe_radix32-0.4.0.tar.gz
.
File metadata
- Download URL: safe_radix32-0.4.0.tar.gz
- Upload date:
- Size: 41.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | fc2f7ebd99cbaab2d121c52a8530ad523d9e1ce866e98a7d53c02716331450c9 |
|
MD5 | 70352069fe5c7060c0a3ded0d0b8a703 |
|
BLAKE2b-256 | 7dd70ee39b156c4cdc9c3d4b8f8d0df0fef2eca2c51880081456032691fcbf67 |