Count according to lower case alphabet and numbers (without ambiguous 0, 1, and l) and always start with a letter

erdi8

erdi8 is a unique identifier scheme and identifier generator that counts with the following alphabet:

['2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 
'i', 'j', 'k', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

It is essentially a base36 alphabet that intentionally omits the ambiguous characters 0, 1, and l, which shrinks it to 33 characters. In addition, an offset of 8 ensures that no identifier starts with a digit: the first character is always one of the 25 letters. Zero is represented by 'a', 25 is represented by 'a2', etc. With three characters or fewer one can create 28'075 (25 + 25 * 33 + 25 * 33 * 33) different identifiers. With six characters or fewer there are 1'008'959'350 options. In a traditional identifier scheme, one would use a prefix, e.g. M, followed by an integer, which gives only 100k identifiers (M0 to M99999) with up to six characters. The scheme counts consecutively and is therefore free of collisions. Note that it is not a method for creating secret identifiers.
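The counting rule described above can be sketched in a few lines of plain Python. This is an illustrative re-implementation based only on the description and the numbers in this README, not the erdi8 source code; the names `sketch_encode`, `sketch_decode` and `count_up_to` are hypothetical.

```python
# Illustrative sketch of the counting scheme; NOT the erdi8 implementation.
# All function names below are hypothetical.
ALPHABET = "23456789abcdefghijkmnopqrstuvwxyz"  # 33 characters: no 0, 1, l
OFFSET = 8                 # index of 'a'; the first character must be a letter
BASE = len(ALPHABET)       # 33
LETTERS = BASE - OFFSET    # 25 choices for the first character


def count_up_to(length: int) -> int:
    """Number of identifiers with at most `length` characters."""
    return sum(LETTERS * BASE ** (n - 1) for n in range(1, length + 1))


def sketch_encode(value: int) -> str:
    """Map 0, 1, 2, ... to 'a', 'b', ..., 'z', 'a2', 'a3', ..."""
    if value < 0:
        raise ValueError("value must be non-negative")
    # Find the identifier length by skipping all shorter identifiers.
    length = 1
    while value >= LETTERS * BASE ** (length - 1):
        value -= LETTERS * BASE ** (length - 1)
        length += 1
    # Trailing characters use the full alphabet, least significant first.
    chars = []
    for _ in range(length - 1):
        value, digit = divmod(value, BASE)
        chars.append(ALPHABET[digit])
    # The leading character is shifted by OFFSET so it is always a letter.
    chars.append(ALPHABET[value + OFFSET])
    return "".join(reversed(chars))


def sketch_decode(identifier: str) -> int:
    """Inverse of sketch_encode."""
    if not identifier:
        raise ValueError("identifier must not be empty")
    value = ALPHABET.index(identifier[0]) - OFFSET
    if value < 0:
        raise ValueError("identifier must start with a letter")
    for ch in identifier[1:]:
        value = value * BASE + ALPHABET.index(ch)
    # Add the number of all shorter identifiers.
    return value + count_up_to(len(identifier) - 1)
```

With these definitions, `count_up_to(3)` reproduces the 28'075 figure above, and `sketch_decode("erdi8")` gives 6545185, the same value the README shows for `decode_int`.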

Usage

Basic (counting)

$ python3

>>> from erdi8 import Erdi8
>>> e8 = Erdi8()
>>> e8.increment("erdi8")
'erdi9'
>>> e8.decode_int("erdi8")
6545185
>>> e8.encode_int(6545185)
'erdi8'

Advanced (still counting)

Fixed length "fancy" identifiers with safe=True

$ python3

>>> from erdi8 import Erdi8
>>> safe = True
>>> start = 'b222222222'
>>> stride = 30321718760514
>>> e8 = Erdi8(safe)
>>> e8.increment_fancy(start, stride)
'fmzz7cwc43'
>>> current = e8.increment_fancy('fmzz7cwc43', stride)
>>> print(current)
k7zydqrp64

# reverse engineer stride from two consecutive identifiers
>>> e8.compute_stride('fmzz7cwc43', current)
{'stride_effective': 30321718760517, 'stride_other_candidates': [30321718760516, 30321718760515, 30321718760514]}

NOTE

  1. These sequences may have a "fancy" appearance but they are not random. They are perfectly predictable and are designed to "fill up the whole mod space" before previously coined identifiers start re-appearing.
  2. The safe=True option helps you avoid unintended words (i.e. it removes the characters [aeiou] from the alphabet).
  3. The fancy increment works with fixed lengths. If you work with a length of 10 (like above) you will have 20 * 28^9 = 211'569'119'068'160 options with safe=True. If you think you may have more things to identify at some point, you have two options: a) start directly with more characters or b) check for the start value (in this case b222222222) to re-appear - this will be the first identifier to "show up twice".
  4. Store the following three parts in a safe place: a) the safe parameter b) the start value c) the stride value. On top of that, keep good track of the current value.
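Why a stride can "fill up the whole mod space" can be illustrated with a little modular arithmetic: stepping by a stride that is coprime to the size of the space visits every position exactly once before returning to the start. The sketch below is an inference from the example output above (the effective stride 30321718760517 is the smallest value from 30321718760514 upwards that shares no factor with 20 * 28^9). It is not a description of the library's internals, and all function names are hypothetical.

```python
from math import gcd


def safe_mod_space(length: int) -> int:
    """Number of fixed-length identifiers with the 28-character safe alphabet."""
    letters, base = 20, 28  # 25 letters and 33 characters, minus the 5 vowels
    return letters * base ** (length - 1)


def effective_stride(stride: int, space: int) -> int:
    """Smallest integer >= stride that is coprime to space (assumed rule)."""
    while gcd(stride, space) != 1:
        stride += 1
    return stride


def walk(space: int, stride: int, start: int = 0):
    """Yield start, start + stride, ... (mod space) until start re-appears."""
    position = start
    while True:
        yield position
        position = (position + stride) % space
        if position == start:
            return


space = safe_mod_space(10)
print(space)                                    # 211569119068160
print(effective_stride(30321718760514, space))  # 30321718760517

# Tiny demonstration: a coprime stride covers a space of 20 completely,
# a stride sharing a factor with 20 only covers part of it.
print(len(list(walk(20, 7))))  # 20
print(len(list(walk(20, 6))))  # 10
```

The small demonstration at the end shows why a non-coprime stride is undesirable: identifiers would start repeating before the space is exhausted.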

Advanced (random)

Also see the documentation of Python's built-in random and secrets modules. In particular, the random documentation warns: "The pseudo-random generators of this module should not be used for security purposes. For security or cryptographic uses, see the secrets module". In any case, you should know what you are doing.

random module:

$ python3

>>> import random
>>> from erdi8 import Erdi8
>>> e8 = Erdi8()

# get random erdi8 identifiers with length 10
>>> mini, maxi, space = e8.mod_space(10)
>>> e8.encode_int(random.randint(mini, maxi))
'vvctyx7c6o'

secrets module:

$ python3

>>> import secrets
>>> from erdi8 import Erdi8
>>> e8 = Erdi8()

>>> e8.encode_int(int.from_bytes(secrets.token_bytes(), "big"))  # explicit byteorder is required before Python 3.11
'jtx3i83pii8wo98wzuucu7uag6khrfpabrdn3qrqrxdxauvcgjg'

>>> e8.encode_int(secrets.randbits(256))
'a53mpn3xntywcbdcvfa932ub34evne9oha8pzoy6ii3ur2e364z'

Even more advanced

Run a light-weight erdi8 identifier service via fasterid

Test cases

$ python3 -m unittest test/erdi8_test.py 

FAQ

Why no upper case characters?

Because we don't want erdi8 to be confused with Erdi8.

Why not start with a number?

Because we want to avoid "number-only" identifiers. If identifiers were allowed to start with a number, we would get identifiers such as 42 and 322, which could be mistaken for integers. This could also be achieved with a more complex scheme that excludes only number-only combinations (and would therefore still allow ids like 2z; to be investigated). It is important to note that programs like Excel are really creative when transforming input data, for example 08342 -> 8342, 12e34 -> 12E+34, SEPT1 -> Sep-01 etc. erdi8 with the safe option on avoids 99% of these types of issues.

How about combinations that form actual (bad) words?

This depends on the use case and the way erdi8 is used. We therefore recommend working with filter lists. In addition, an erdi8 object that avoids the aeiou characters can be created with Erdi8(safe=True). This shrinks the available character space to 28, and the produced output is not compatible with Erdi8(safe=False) (the default). The risk of creating unintended English words is lower with this setting. It is recommended for erdi8 identifiers longer than three characters, where filter lists start to become impractical.
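One possible way to apply a filter list is to keep incrementing until the candidate contains none of the blocked substrings. The helper below is a minimal sketch under that assumption; `next_allowed` and its parameters are hypothetical and not part of erdi8. It takes the increment function as an argument so that it does not depend on any particular counter.

```python
from typing import Callable, Iterable


def next_allowed(
    current: str,
    increment: Callable[[str], str],
    blocked: Iterable[str],
    max_tries: int = 1000,
) -> str:
    """Return the next identifier after `current` that contains no blocked word.

    `increment` is any function mapping an identifier to its successor.
    """
    blocked_words = [word.lower() for word in blocked]
    candidate = increment(current)
    for _ in range(max_tries):
        if not any(word in candidate for word in blocked_words):
            return candidate
        # Candidate contains a blocked word: skip it and try the next one.
        candidate = increment(candidate)
    raise RuntimeError("no allowed identifier found within max_tries")
```

With the library installed, this could be called as `next_allowed("erdi8", Erdi8().increment, my_filter_list)`, where `my_filter_list` is a list of substrings you maintain yourself. Skipped identifiers are simply never issued, so counting stays collision-free.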

How does this relate to binary-to-text encodings such as base32 and base64?

erdi8 can be used for a binary-to-text encoding and the basic functions to implement this are provided with encode_int and decode_int. However, the primary purpose is to provide a short counting scheme for identifiers.
