Count using a lowercase alphabet and numbers (without 0, 1, and l)

Project description

erdi8

erdi8 is a unique identifier scheme and counter that operates on the following alphabet:

['2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 
'i', 'j', 'k', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

It is basically a base36 alphabet that intentionally avoids the ambiguous characters [0, 1, and l] and therefore shrinks to 33 characters. In addition, it ensures that no identifier starts with a numeric value by using an offset of 8 (the eight digits 2 to 9). Zero is represented by 'a', 24 by 'z', 25 by 'a2', etc. Since the first character can only be one of the 25 letters and every further character can be any of the 33 symbols, three characters or fewer yield 28'075 (25 + 25 * 33 + 25 * 33 * 33) different identifiers. With six characters or fewer there are 1'008'959'350 options.

In a traditional identifier world one would use a prefix, e.g. M, followed by an integer. With up to 6 characters, this only gives you 100k identifiers (M0 to M99999). The erdi8 scheme enables consecutive counting and is therefore free of collisions. Note, however, that it is not a method to create secret identifiers.
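To make the numbers above concrete, here is a minimal Python sketch of the counting scheme. It is not taken from the erdi8 package; the names `encode`, `decode` and `capacity` are hypothetical, and the sketch simply assumes the rules stated above: a 33-symbol alphabet, 25 possible first characters, and identifiers ordered first by length and then by alphabet position.

```python
# Sketch of the erdi8 counting scheme as described in this README.
# Assumption: this mirrors the documented behaviour, not the package's code.
ALPHABET = "23456789abcdefghijkmnopqrstuvwxyz"  # 33 symbols: no 0, 1 or l
OFFSET = 8               # the first character skips the digits '2'..'9'
BASE = len(ALPHABET)     # 33
FIRST = BASE - OFFSET    # 25 possible first characters


def capacity(max_len: int) -> int:
    """Number of distinct identifiers with at most max_len characters."""
    return sum(FIRST * BASE**k for k in range(max_len))


def encode(n: int) -> str:
    """Map a non-negative integer to its identifier."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Identifiers of length k form a block of FIRST * BASE**(k - 1) values;
    # skip whole blocks until n falls inside one.
    length, block = 1, FIRST
    while n >= block:
        n -= block
        length += 1
        block *= BASE
    chars = []
    for _ in range(length - 1):  # trailing characters: plain base 33
        n, r = divmod(n, BASE)
        chars.append(ALPHABET[r])
    chars.append(ALPHABET[OFFSET + n])  # leading character: always a letter
    return "".join(reversed(chars))


def decode(s: str) -> int:
    """Inverse of encode()."""
    if not s or s[0] not in ALPHABET[OFFSET:]:
        raise ValueError(f"not a valid identifier: {s!r}")
    value = ALPHABET.index(s[0]) - OFFSET
    for c in s[1:]:
        value = value * BASE + ALPHABET.index(c)
    return value + capacity(len(s) - 1)  # add all shorter identifiers
```

For instance, `encode(25)` returns 'a2', `decode("erdi8")` returns 6545185, and `capacity(3)` returns 28075, matching the figures on this page.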

Usage

$ python3

>>> from erdi8 import Erdi8
>>> e8 = Erdi8()
>>> e8.increment("erdi8")
'erdi9'
>>> e8.decode_int("erdi8")
6545185
>>> e8.encode_int(6545185)
'erdi8'
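The increment call above can also be performed directly on the string, without a round trip through integers. The following sketch is an assumption about how such an increment could work, not the package's code (the function name `increment` is reused only for readability): it adds one to the last character and carries to the left, and when every character is 'z' the identifier grows by one character.

```python
ALPHABET = "23456789abcdefghijkmnopqrstuvwxyz"
OFFSET = 8  # index of 'a', the smallest allowed first character


def increment(s: str) -> str:
    """Return the identifier following s (sketch; assumes s is valid)."""
    chars = list(s)
    for i in range(len(chars) - 1, -1, -1):
        pos = ALPHABET.index(chars[i])
        if pos + 1 < len(ALPHABET):
            chars[i] = ALPHABET[pos + 1]  # no overflow: done
            return "".join(chars)
        chars[i] = ALPHABET[0]  # 'z' wraps to '2', carry one to the left
    # Every character was 'z': grow by one ("z" -> "a2", "zz" -> "a22").
    return ALPHABET[OFFSET] + ALPHABET[0] * len(s)
```

With this sketch, `increment("erdi8")` returns 'erdi9', as in the session above, and `increment("z")` returns 'a2'.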

Test cases

$ python3 -m unittest test/erdi8_test.py 

Intended use

When you run an identifier redirect service of the type https://purl.example.org/, your users can reserve "their space" for their current business application and/or domain. We encourage the administrators of such a service to offer opaque folder names for long-term identifier stability. These folder names can be chosen to follow the erdi8 scheme, which offers 825 (25 * 33) potential two-character folder names. Subfolder names and local accession identifiers can also be generated with this scheme, so that FAIR data objects can be identified with URIs of the type https://purl.example.org/b7/a/erdi8.
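The two-character folder names mentioned above can be enumerated directly. This is a small sketch: `TWO_CHAR_FOLDERS` and the `purl` helper are hypothetical names, and https://purl.example.org/ is the placeholder service from the text.

```python
from itertools import product

ALPHABET = "23456789abcdefghijkmnopqrstuvwxyz"
LETTERS = ALPHABET[8:]  # allowed first characters: 'a'..'z' without 'l'

# All two-character erdi8 folder names in counting order ("a2", "a3", ...).
TWO_CHAR_FOLDERS = [first + rest for first, rest in product(LETTERS, ALPHABET)]


def purl(base: str, *segments: str) -> str:
    """Join folder, subfolder and accession names into a URI (hypothetical helper)."""
    return "/".join([base.rstrip("/"), *segments])
```

`len(TWO_CHAR_FOLDERS)` is 825, and `purl("https://purl.example.org/", "b7", "a", "erdi8")` yields the example URI above.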

FAQ

Why no upper case characters?

Because we don't want erdi8 to be confused with Erdi8.

Why not start with a number?

Because we want to avoid "number-only" identifiers. If identifiers were allowed to start with a number, there would be identifiers such as 42 and 322, which could be mistaken for integers. The same goal could also be reached with a more complex scheme that only avoids number-only combinations and would therefore still allow IDs like 2z (still to be investigated).

How about combinations that form actual (bad) words?

This depends on the use case and the way erdi8 is used. We therefore recommend working with filter lists. In addition, an erdi8 object that avoids the characters a, e, i, o, and u can be created with Erdi8(safe=True). This shrinks the available character space to 28, and the produced output is not compatible with Erdi8(safe=False) (the default). The risk of unintentionally creating English words is lower with this setting. It is recommended for erdi8 identifiers longer than three characters, where filter lists start to become impractical.
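A filter list can be applied while counting by simply skipping blocked identifiers. The sketch below assumes that the safe alphabet is the default alphabet without a, e, i, o and u (which gives the 28 characters mentioned above) and that the first character is still offset by the eight digits; whether this matches the exact output of Erdi8(safe=True) is not verified. `encode` and `filtered_ids` are hypothetical helpers, not part of the erdi8 API.

```python
from typing import Iterable, Iterator

DEFAULT_ALPHABET = "23456789abcdefghijkmnopqrstuvwxyz"
# Assumption: the "safe" alphabet is the default one minus a, e, i, o, u.
SAFE_ALPHABET = "".join(c for c in DEFAULT_ALPHABET if c not in "aeiou")
OFFSET = 8  # the first character is never one of the digits '2'..'9'


def encode(n: int, alphabet: str = DEFAULT_ALPHABET) -> str:
    """Counting-scheme encoding of n for an arbitrary alphabet (sketch)."""
    base, first = len(alphabet), len(alphabet) - OFFSET
    length, block = 1, first
    while n >= block:  # skip all shorter identifiers
        n, length, block = n - block, length + 1, block * base
    chars = []
    for _ in range(length - 1):
        n, r = divmod(n, base)
        chars.append(alphabet[r])
    chars.append(alphabet[OFFSET + n])
    return "".join(reversed(chars))


def filtered_ids(blocklist: Iterable[str], alphabet: str = DEFAULT_ALPHABET) -> Iterator[str]:
    """Yield identifiers in counting order, skipping any containing a blocked word."""
    blocked = [word.lower() for word in blocklist]
    n = 0
    while True:
        candidate = encode(n, alphabet)
        if not any(word in candidate for word in blocked):
            yield candidate
        n += 1
```

Skipping candidates leaves gaps in the underlying integer sequence but does not introduce collisions, since each remaining identifier still corresponds to a unique counter value.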

How does this relate to binary-to-text encodings such as base32 and base64?

erdi8 can be used as a binary-to-text encoding, and the basic functions to implement this are provided by encode_int and decode_int. However, its primary purpose is to provide a short counting scheme for identifiers.
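As a sketch of such a binary-to-text encoding, bytes can be turned into an integer with Python's `int.from_bytes` and then encoded with the counting scheme. A 0x01 sentinel byte is prepended so that leading zero bytes survive the round trip; this trick and the helper names `int_to_id`, `id_to_int`, `encode_bytes` and `decode_bytes` are assumptions for illustration and not part of erdi8.

```python
ALPHABET = "23456789abcdefghijkmnopqrstuvwxyz"
OFFSET = 8  # the first character must be a letter
BASE, FIRST = len(ALPHABET), len(ALPHABET) - OFFSET  # 33 and 25


def int_to_id(n: int) -> str:
    """Counting-scheme encoding of a non-negative integer (sketch)."""
    length, block = 1, FIRST
    while n >= block:  # skip all shorter identifiers
        n, length, block = n - block, length + 1, block * BASE
    chars = []
    for _ in range(length - 1):
        n, r = divmod(n, BASE)
        chars.append(ALPHABET[r])
    chars.append(ALPHABET[OFFSET + n])
    return "".join(reversed(chars))


def id_to_int(s: str) -> int:
    """Inverse of int_to_id (sketch, no input validation)."""
    value = ALPHABET.index(s[0]) - OFFSET
    for c in s[1:]:
        value = value * BASE + ALPHABET.index(c)
    return value + sum(FIRST * BASE**k for k in range(len(s) - 1))


def encode_bytes(data: bytes) -> str:
    # The 0x01 sentinel keeps leading zero bytes from being lost.
    return int_to_id(int.from_bytes(b"\x01" + data, "big"))


def decode_bytes(text: str) -> bytes:
    n = id_to_int(text)
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:]  # drop the sentinel byte
```

Compared with base32 or base64, the output length grows with the counting scheme rather than in fixed blocks, which is one more reason why erdi8 is primarily meant for short identifiers.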

Download files

Source Distribution

erdi8-0.0.1.tar.gz (4.3 kB)

Uploaded Source

Built Distribution

erdi8-0.0.1-py3-none-any.whl (16.7 kB)

Uploaded Python 3

File details

Details for the file erdi8-0.0.1.tar.gz.

File metadata

  • Download URL: erdi8-0.0.1.tar.gz
  • Upload date:
  • Size: 4.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.30.0 CPython/3.8.10

File hashes

Hashes for erdi8-0.0.1.tar.gz
Algorithm Hash digest
SHA256 4df0d70f2163257b0cffc5a44363bef507fb0ea75c045ddc6726b2f6bef72b15
MD5 6ef256dde37344a8b59e75edd247fcdf
BLAKE2b-256 7f8de0277dd7cd73bd43f69d4836ecaf91793b7d4ce08d5adb772daee92c3b93

File details

Details for the file erdi8-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: erdi8-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 16.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.30.0 CPython/3.8.10

File hashes

Hashes for erdi8-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 d81d08f91b1cab49669261d9d9c47cb495694233e27f094b3adfbce29a5120be
MD5 dfcd619ca588b6d0502df52548218c3e
BLAKE2b-256 29e6ae2f0887a99df86dba4644e1ae963c7c335eb20c570a58cf7dc41500ee05
