
Universally Unique Prefixed Lexicographically Sortable Identifier


UPID

pronounced YOO-pid

aka Universally Unique Prefixed Lexicographically Sortable Identifier

This is the spec and Python implementation for UPID.

UPID is based on ULID but with some modifications, inspired by this article and Stripe IDs.

The core idea is that a meaningful prefix is stored, alongside a timestamp and random bits, in a 128-bit UUID-shaped slot. Thus a UPID is human-readable (like a Stripe ID) but still efficient to store, sort and index.

UPID allows a prefix of up to 4 characters (right-padded if shorter than 4), includes a non-wrapping timestamp with about 250 millisecond precision, and has 64 bits of entropy.

This is a UPID in Python:

from upid import upid
upid("user")            # user_2accvpp5guht4dts56je5a

And in Rust:

Upid::new("user")      // user_2accvpp5guht4dts56je5a

And in Postgres too:

CREATE TABLE users (id upid NOT NULL DEFAULT gen_upid('user') PRIMARY KEY);
INSERT INTO users DEFAULT VALUES;
SELECT id FROM users;  -- user_2accvpp5guht4dts56je5a

-- this also works
SELECT id FROM users WHERE id = 'user_2accvpp5guht4dts56je5a';

Plays nice with your server code too, no extra work needed:

import psycopg

with psycopg.connect("postgresql://...") as conn:
    res = conn.execute("SELECT id FROM users").fetchone()
    print(res[0])       # user_2accvpp5guht4dts56je5a

Specification

Key changes relative to ULID:

  1. Uses a modified form of Crockford's base32 that uses lower-case and includes the full alphabet (for prefix flexibility).
  2. Does not permit upper-case/lower-case to be decoded interchangeably.
  3. The text encoding is still 5 bits per base32 character.
  4. 20 bits assigned to the prefix (4 characters at 5 bits each)
  5. 40 bits (down from 48) assigned to the timestamp, placed first in binary for sorting
  6. 64 bits (down from 80) for randomness
  7. 4 bits as a version specifier
    user       2accvpp5      guht4dts56je5       a
   |----|     |--------|    |-------------|   |-----|
   prefix       time            random        version     total
   4 chars      8 chars         13 chars      1 char      26 chars
       \________/________________|___________    |
               /                 |           \   |
              /                  |            \  |
           40 bits            64 bits         24 bits    128 bits
           5 bytes            8 bytes         3 bytes     16 bytes
           time               random      prefix+version
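As a minimal sketch of items 1 to 4, the Python below packs a prefix of up to 4 characters into 20 bits at 5 bits per character. It assumes an alphabet of the digits 2 to 7 followed by a to z, and a hypothetical padding character `z`; neither is stated in the spec above, so check the reference implementation before relying on them. `encode_prefix` and `decode_prefix` are illustrative names, not the library's API.

```python
# Hypothetical alphabet: digits 2-7 then a-z. That gives 32 lower-case symbols
# covering the full Latin alphabet; the spec text does not list the alphabet.
ALPHABET = "234567abcdefghijklmnopqrstuvwxyz"
PAD = "z"          # hypothetical padding character for short prefixes
PREFIX_CHARS = 4   # prefix length in characters
BITS_PER_CHAR = 5  # 2**5 == len(ALPHABET)

def encode_prefix(prefix: str) -> int:
    """Pack a prefix of up to 4 characters into a 20-bit integer."""
    if len(prefix) > PREFIX_CHARS:
        raise ValueError("prefix must be at most 4 characters")
    value = 0
    for ch in prefix.ljust(PREFIX_CHARS, PAD):  # right-pad short prefixes
        index = ALPHABET.find(ch)  # exact match only: "U" is not "u"
        if index < 0:
            raise ValueError(f"invalid prefix character {ch!r}")
        value = (value << BITS_PER_CHAR) | index
    return value

def decode_prefix(value: int) -> str:
    """Unpack a 20-bit integer back into its 4-character (padded) prefix."""
    if not 0 <= value < 1 << (PREFIX_CHARS * BITS_PER_CHAR):
        raise ValueError("value does not fit in 20 bits")
    shifts = range(BITS_PER_CHAR * (PREFIX_CHARS - 1), -1, -BITS_PER_CHAR)
    return "".join(ALPHABET[(value >> s) & 0b11111] for s in shifts)

# The bit budget from the list: prefix + time + random + version.
assert PREFIX_CHARS * BITS_PER_CHAR + 40 + 64 + 4 == 128
```

Under these assumptions, `decode_prefix(encode_prefix("ab"))` gives back `"abzz"`, and `encode_prefix("USER")` raises because upper-case characters are not part of the alphabet.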

Binary layout

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            time_high                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    time_low   |                     random                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             random                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     random    |                  prefix_and_version           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
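The layout above can be sketched as plain integer packing into 16 big-endian bytes. This is an illustrative sketch, not the library's code: `pack_upid` and `unpack_upid` are hypothetical names, and the 24-bit `prefix_and_version` field is treated as opaque, since the diagram does not show how the 20 prefix bits and 4 version bits are arranged inside it. Putting time in the most significant bytes is what lets a byte-wise comparison sort IDs by time.

```python
import os

TIME_BITS = 40
RANDOM_BITS = 64
PV_BITS = 24  # prefix_and_version: 20 prefix bits + 4 version bits

def pack_upid(time_field: int, random_field: int, prefix_and_version: int) -> bytes:
    """Pack the three fields into 16 big-endian bytes: time, random, prefix+version."""
    for value, bits in ((time_field, TIME_BITS),
                        (random_field, RANDOM_BITS),
                        (prefix_and_version, PV_BITS)):
        if not 0 <= value < 1 << bits:
            raise ValueError(f"field value does not fit in {bits} bits")
    n = (time_field << (RANDOM_BITS + PV_BITS)) | (random_field << PV_BITS) | prefix_and_version
    return n.to_bytes(16, "big")

def unpack_upid(raw: bytes) -> tuple[int, int, int]:
    """Split 16 bytes back into (time_field, random_field, prefix_and_version)."""
    if len(raw) != 16:
        raise ValueError("expected 16 bytes")
    n = int.from_bytes(raw, "big")
    return (
        n >> (RANDOM_BITS + PV_BITS),
        (n >> PV_BITS) & ((1 << RANDOM_BITS) - 1),
        n & ((1 << PV_BITS) - 1),
    )

# Example: fill the random field from the OS random source.
example = pack_upid(123_456, int.from_bytes(os.urandom(8), "big"), 0)
assert unpack_upid(example)[0] == 123_456
```

Because the time field occupies the leading bytes, an ID with a later time field compares greater than an earlier one regardless of the random and prefix bits.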

Collision

Relative to ULID, the timestamp is reduced from 48 to 40 bits (keeping the most significant bits, so overflow still won't occur until 10889 AD), and the randomness is reduced from 80 to 64 bits.
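The arithmetic behind the 40-bit timestamp can be checked directly. This is a minimal sketch that assumes the 40-bit field is simply a 48-bit Unix millisecond timestamp shifted right by 8 bits; `upid_time_field` is a hypothetical helper, not part of the library.

```python
ULID_TIME_BITS = 48
UPID_TIME_BITS = 40
DROPPED_BITS = ULID_TIME_BITS - UPID_TIME_BITS  # the 8 least significant bits

def upid_time_field(unix_ms: int) -> int:
    """Keep the 40 most significant bits of a 48-bit millisecond timestamp."""
    if not 0 <= unix_ms < 1 << ULID_TIME_BITS:
        raise ValueError("timestamp does not fit in 48 bits")
    return unix_ms >> DROPPED_BITS

# Each step of the 40-bit field covers 2**8 = 256 ms.
TICK_MS = 1 << DROPPED_BITS

# Keeping the high bits means the 40-bit field overflows exactly when a 48-bit
# millisecond counter would: roughly 8,919.6 years after 1970.
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.2425  # mean Gregorian year
OVERFLOW_YEAR = int(1970 + (1 << ULID_TIME_BITS) / MS_PER_YEAR)
```

With these definitions `TICK_MS` is 256 and `OVERFLOW_YEAR` is 10889, matching the figures in the text.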

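The timestamp precision at 40 bits is around 250 milliseconds: dropping the 8 least significant bits of a millisecond timestamp leaves ticks of 2^8 = 256 ms. To reach a 50% probability of collision with 64 bits of randomness, you would need to generate on the order of 2^32 items (about 5 billion) within a single 250 millisecond window.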
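The collision estimate comes from the standard birthday-bound approximation. A short sketch to check it, assuming the 64 random bits are uniformly distributed and only counting collisions between IDs that share the same timestamp tick (the function names are illustrative):

```python
import math

def collision_probability(n: int, bits: int = 64) -> float:
    """Birthday approximation: P(collision) ~= 1 - exp(-n(n-1) / (2 * 2**bits))."""
    space = 2.0 ** bits
    return -math.expm1(-n * (n - 1) / (2 * space))

def items_for_probability(p: float, bits: int = 64) -> float:
    """Invert the approximation: n ~= sqrt(2 * 2**bits * ln(1 / (1 - p)))."""
    return math.sqrt(2 * (2.0 ** bits) * math.log(1 / (1 - p)))

print(f"{items_for_probability(0.5):.3e}")    # prints 5.057e+09
print(f"{collision_probability(2**32):.3f}")  # prints 0.393
```

So a 50% chance needs roughly 5.06 billion IDs in one tick, while 2^32 (about 4.3 billion) IDs gives a chance of roughly 39%.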

Python implementation

It aims to be as simple as possible, to convey the core workings of the spec. The current Python implementation is based entirely on mdomke/python-ulid.

Installation

pip install upid

Usage

Run from the CLI:

python -m upid user

Use in a program:

from upid import upid
upid("user")

Development

Code and tests are in the py/ directory. Development uses Rye (see the Rye documentation for installation instructions).

# can be run from the repo root
rye sync
rye run all  # or fmt/lint/check/test

If you just want to have a look around, pip should also work:

pip install -e .

Rust implementation

The current Rust implementation is based on dylanhart/ulid-rs, but uses the same base32 lookup method as the Python implementation.
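A table-driven decoder is a common way to implement base32 lookup. The sketch below is a generic illustration in Python, not a port of either implementation: it assumes an alphabet of the digits 2 to 7 followed by a to z, and `decode_base32` is a hypothetical name. Every byte outside the alphabet maps to an invalid marker, which is how upper-case input ends up rejected instead of being folded to lower-case.

```python
# Hypothetical alphabet (digits 2-7, then a-z); the spec text does not list it.
ALPHABET = "234567abcdefghijklmnopqrstuvwxyz"
INVALID = 0xFF  # marker for bytes that are not base32 digits

# 256-entry table indexed by byte value. Only the lower-case alphabet is
# filled in, so upper-case input is rejected rather than folded to lower-case.
DECODE_TABLE = bytearray([INVALID]) * 256
for digit, ch in enumerate(ALPHABET):
    DECODE_TABLE[ord(ch)] = digit

def decode_base32(text: str) -> int:
    """Decode base32 text (5 bits per character) into an integer via table lookup."""
    value = 0
    for byte in text.encode("ascii"):  # non-ASCII raises UnicodeEncodeError
        digit = DECODE_TABLE[byte]
        if digit == INVALID:
            raise ValueError(f"invalid base32 character {chr(byte)!r}")
        value = (value << 5) | digit
    return value
```

A lookup table replaces a search through the alphabet with a single indexed read per character, which is the usual reason to prefer it.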

Installation

cargo add upid

Usage

use upid::Upid;
Upid::new("user");

Development

Code and tests are in the upid_rs/ directory.

cd upid_rs
cargo check  # or fmt/clippy/build/test/run

Postgres extension

There is also a Postgres extension built on the Rust implementation, using pgrx and based on the very similar extension pksunkara/pgx_ulid.

Installation

You can try out the Docker image carderne/postgres-upid:16:

docker run -e POSTGRES_HOST_AUTH_METHOD=trust -p 5432:5432 carderne/postgres-upid:16

If you want to install it into another Postgres, you'll need to install pgrx and follow its installation instructions. Something like this:

cargo install --locked cargo-pgrx
cargo pgrx init
cd upid_pg
cargo pgrx install

Installable binaries will come soon.

Usage

CREATE EXTENSION upid_pg;


CREATE TABLE users (
    id   upid NOT NULL DEFAULT gen_upid('user') PRIMARY KEY,
    name text NOT NULL
);
INSERT INTO users (name) VALUES('Bob');
SELECT * FROM users;

Development

Code and tests are in the upid_pg/ directory.

cd upid_pg
cargo check  # or fmt/clippy

# must test/run/install with pgrx
# this will compile it into a Postgres installation
# and run the tests there, or drop you into a psql prompt
cargo pgrx test  # or run/install

