Universally Unique Prefixed Lexicographically Sortable Identifier

UPID

pronounced YOO-pid

aka Universally Unique Prefixed Lexicographically Sortable Identifier

This repository contains the spec and the Python and Rust implementations of UPID. The TypeScript implementation lives in carderne/upid-ts.

UPID is based on ULID but with some modifications, inspired by this article and Stripe IDs.

The core idea is that a meaningful prefix is stored inside the same 128-bit UUID-shaped slot as the timestamp and randomness. Thus a UPID is human-readable (like a Stripe ID), but still efficient to store, sort and index.

UPID allows a prefix of up to 4 characters (right-padded if shorter than 4), and includes a non-wrapping timestamp with about 250 millisecond precision and 64 bits of entropy.

This is a UPID in Python:

upid("user")            # user_2accvpp5guht4dts56je5a

And in Rust:

Upid::new("user")      // user_2accvpp5guht4dts56je5a

And in Postgres too:

CREATE TABLE users (id upid NOT NULL DEFAULT gen_upid('user') PRIMARY KEY);
INSERT INTO users DEFAULT VALUES;
SELECT id FROM users;  -- user_2accvpp5guht4dts56je5a

-- this also works
SELECT id FROM users WHERE id = 'user_2accvpp5guht4dts56je5a';

Plays nice with your server code, no extra work needed:

import psycopg

with psycopg.connect("postgresql://...") as conn:
    res = conn.execute("SELECT id FROM users").fetchone()
    print(res[0])       # user_2accvpp5guht4dts56je5a

Examples

You can try out the Python and Rust examples in this repository. They both involve spinning up a Postgres DB and inserting a UPID as itself, as a UUID and as text.

There are also TypeScript examples for browser and Node (with Postgres) in the upid-ts repo.

Demo

You can give it a spin at upid.rdrn.me.

Benefits

  • Context: You'll never forget what kind of ID you're staring at. Is it a user_ or a prod_ or maybe a role_? Your product team will thank you, as will your API users.
  • Compatible: Under the hood it's just 128 bits, so you can pass it to Postgres or anything else and pretend it's a UUID. But you'll know your prefix is safely waiting to remind you what it is.
  • K-Sortable: UPID has 256 millisecond timestamp precision. This ensures data locality while not leaking too much information about timing and ordering.
  • Pretty: The encoding is short, easily copy-pastable and URL-safe. It uses lower-case letters, which are prettier than upper-case ones.
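
To illustrate why a leading timestamp makes the IDs K-sortable, here is a minimal sketch (not the library's code). It builds 128-bit integers with the time bits on top and the random bits beneath, as described in the specification below, leaving the prefix and version bits at zero for brevity. The helper name `sketch_id` is hypothetical.

```python
import secrets

def sketch_id(milliseconds: int) -> int:
    """Hypothetical helper: 40-bit time | 64-bit random | 24 zero bits."""
    time_bits = (milliseconds >> 8) & ((1 << 40) - 1)  # drop the low 8 bits (256 ms buckets)
    return (time_bits << 88) | (secrets.randbits(64) << 24)

t0 = 1720366572288  # example timestamp in milliseconds
late, early, latest = sketch_id(t0 + 1000), sketch_id(t0), sketch_id(t0 + 2000)

# Sorting the raw integers (or their big-endian bytes) orders them by time bucket,
# regardless of the random bits that follow.
ordered = sorted([late, early, latest])
print(ordered == [early, late, latest])  # True
```

IDs that fall into the same 256 ms bucket are ordered by their random bits, which is what keeps exact ordering within a bucket private.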

Implementations

If you don't have time for ASCII art, you can skip to the good stuff:

Language     Link
Python       in this repo (scroll down)
Postgres     in this repo (scroll down)
Rust         in this repo (scroll down)
TypeScript   carderne/upid-ts

Specification

Key changes relative to ULID:

  1. Uses a modified form of Crockford's base32 that uses lower-case and includes the full alphabet (for prefix flexibility).
  2. Does not permit upper-case/lower-case to be decoded interchangeably.
  3. The text encoding is still 5 bits per base32 character.
  4. 20 bits assigned to the prefix.
  5. 40 bits (down from 48) assigned to the timestamp, placed first in binary for sorting.
  6. 64 bits (down from 80) for randomness.
  7. 4 bits as a version specifier.

    user       2accvpp5      guht4dts56je5       a
   └────┘     └────────┘    └─────────────┘   └─────┘
   prefix       time            random        version     total
   4 chars      8 chars         13 chars      1 char      26 chars
       └────────│────────────────│───────────┐  │
                │                │           │  │
                │                │           │  │
             40 bits            64 bits      24 bits     128 bits
             5 bytes            8 bytes      3 bytes      16 bytes
             time               random       prefix+version
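
The 5-bits-per-character text encoding can be sketched as follows. The alphabet string is an assumption (digits 2-7 followed by a-z), since this document does not spell it out. It is at least consistent with the example ID used throughout, whose time characters `2accvpp5` map to the first five bytes of the hex value shown in the Python usage section. The function names are illustrative only.

```python
ALPHABET = "234567abcdefghijklmnopqrstuvwxyz"  # assumed alphabet, 32 symbols

def encode_b32(value: int, n_chars: int) -> str:
    """Encode an integer as n_chars symbols, 5 bits each, most significant first."""
    return "".join(
        ALPHABET[(value >> (5 * i)) & 0b11111] for i in reversed(range(n_chars))
    )

def decode_b32(text: str) -> int:
    """Inverse of encode_b32; raises ValueError on symbols outside the alphabet."""
    value = 0
    for ch in text:
        value = (value << 5) | ALPHABET.index(ch)
    return value

print(encode_b32(0x01908DD6A3, 8))  # 2accvpp5
print(hex(decode_b32("user")))      # 0xd6157
```

Because the lookup is case-sensitive, `decode_b32("USER")` raises `ValueError`, which mirrors the rule that upper-case and lower-case are not interchangeable.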

Binary layout

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            time_high                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    time_low   |                     random                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             random                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     random    |                  prefix_and_version           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
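
As a rough illustration of the layout above, the following sketch packs the four fields into 16 big-endian bytes. It is not the library's implementation: `pack_upid` is a hypothetical helper, the alphabet is an assumption, and padding of short prefixes is left out. The inputs are taken from the examples in this document, and the version nibble `0x6` is inferred from the last hex digit of the example ID.

```python
ALPHABET = "234567abcdefghijklmnopqrstuvwxyz"  # assumed alphabet (not stated in this document)

def pack_upid(prefix: str, milliseconds: int, random_bits: int, version: int) -> bytes:
    """Hypothetical helper: lay out the four fields as 16 big-endian bytes."""
    if len(prefix) != 4:
        raise ValueError("this sketch skips padding and expects exactly 4 characters")
    time_bits = (milliseconds >> 8) & ((1 << 40) - 1)  # 40 most significant of 48 bits
    prefix_bits = 0
    for ch in prefix:                                  # 4 chars x 5 bits = 20 bits
        prefix_bits = (prefix_bits << 5) | ALPHABET.index(ch)
    value = (
        (time_bits << 88)                              # bytes 0-4
        | ((random_bits & ((1 << 64) - 1)) << 24)      # bytes 5-12
        | (prefix_bits << 4)                           # top 20 bits of bytes 13-15
        | (version & 0xF)                              # final 4 bits
    )
    return value.to_bytes(16, "big")

packed = pack_upid("user", 1720366572288, 0x669B912738191EA3, 0x6)
print(packed.hex())  # 01908dd6a3669b912738191ea3d61576
```

The output matches the `u.hex` value shown in the Python usage section, which suggests the sketch agrees with the layout for this one example.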

Collision

Relative to ULID, the time precision is reduced from 48 to 40 bits (keeping the most significant bits, so overflow still won't occur until 10,889 AD), and the randomness reduced from 80 to 64 bits.
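
A small sketch of what keeping the most significant 40 of 48 bits means in practice: the low 8 bits of the millisecond timestamp are dropped, so times are bucketed into 256 ms windows, while the overflow date stays the same as for a 48-bit millisecond counter. The function name is illustrative, and the year calculation uses an average year of 365.25 days.

```python
def to_40_bits(milliseconds: int) -> int:
    """Keep the 40 most significant bits of a 48-bit millisecond timestamp."""
    return (milliseconds >> 8) & ((1 << 40) - 1)

ms = 1720366572388                 # example: 100 ms into a 256 ms bucket
stored = to_40_bits(ms)
recovered = stored << 8            # the low 8 bits are gone
print(ms - recovered)              # 100

# The range is still 2**48 milliseconds counted from the Unix epoch.
overflow_years = (1 << 48) / (1000 * 60 * 60 * 24 * 365.25)
print(int(1970 + overflow_years))  # 10889
```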

The timestamp precision at 40 bits is around 250 milliseconds. In order to have a 50% probability of collision with 64 bits of randomness, you would need to generate around 5 billion items per 250 millisecond window.
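
The collision estimate can be checked with the standard birthday approximation. This is a back-of-the-envelope sketch, assuming the 64 random bits are uniformly distributed and that all IDs in question share the same time bucket; the function names are illustrative.

```python
import math

def ids_for_collision_probability(p: float, random_bits: int = 64) -> float:
    """Birthday approximation: n ~ sqrt(2 * N * ln(1 / (1 - p))), with N = 2**random_bits."""
    return math.sqrt(2 * (2 ** random_bits) * math.log(1 / (1 - p)))

def collision_probability(n: float, random_bits: int = 64) -> float:
    """Birthday approximation: p ~ 1 - exp(-n**2 / (2 * N))."""
    return 1 - math.exp(-(n ** 2) / (2 * 2 ** random_bits))

print(f"{ids_for_collision_probability(0.5):.3g}")  # 5.06e+09
print(f"{collision_probability(2 ** 32):.2f}")      # 0.39
```

Under this approximation the 50% threshold sits at roughly 5 billion IDs per window, while 2^32 (about 4.3 billion) IDs give a collision probability of roughly 39%.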

Python implementation

This aims to be maximally simple to convey the core working of the spec. The current Python implementation is entirely based on mdomke/python-ulid.

Installation

pip install upid

Usage

Run from the CLI:

python -m upid user

Use in a program:

from upid import upid
upid("user")

Or more explicitly:

from upid import UPID
UPID.from_prefix("user")

Or specifying your own timestamp or datetime:

import time, datetime
milliseconds = int(time.time() * 1000)  # current Unix time in milliseconds
UPID.from_prefix_and_milliseconds("user", milliseconds)
UPID.from_prefix_and_datetime("user", datetime.datetime.now())

From and to a string:

u = UPID.from_str("user_2accvpp5guht4dts56je5a")
u.to_str()        # user_2a...

Get stuff out:

u.prefix     # user
u.datetime   # 2024-07-07 ...

Convert to other formats:

int(u)       # 2079795568564925668398930358940603766
u.hex        # 01908dd6a3669b912738191ea3d61576
u.to_uuid()  # UUID('01908dd6-a366-9b91-2738-191ea3d61576')
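
Since the UUID form is just the same 128 bits, the fields can also be recovered with nothing but the standard library. The sketch below does not use the `upid` package; it takes the example UUID above and applies the bit layout from the specification by hand.

```python
import uuid
from datetime import datetime, timezone

raw = uuid.UUID("01908dd6-a366-9b91-2738-191ea3d61576")  # the UUID form shown above

value = raw.int                                 # the same 128 bits as an integer
milliseconds = (value >> 88) << 8               # top 40 bits, scaled back to milliseconds
random_bits = (value >> 24) & ((1 << 64) - 1)   # the 64 bits after the timestamp
version = value & 0xF                           # final 4 bits

print(raw.hex)                                  # 01908dd6a3669b912738191ea3d61576
print(datetime.fromtimestamp(milliseconds / 1000, tz=timezone.utc).date())  # 2024-07-07
```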

Development

Code and tests are in the py/ directory. Development uses Rye; see the Rye documentation for installation instructions.

# can be run from the repo root
rye sync
rye run all  # or fmt/lint/check/test

If you just want to have a look around, pip should also work:

pip install -e .

Please open a PR if you spot a bug or improvement!

Rust implementation

The current Rust implementation is based on dylanhart/ulid-rs, but uses the same base32 lookup method as the Python implementation.

Installation

cargo add upid

Usage

use upid::Upid;
Upid::new("user");

Or specifying your own timestamp or datetime:

use std::time::SystemTime;
Upid::from_prefix_and_milliseconds("user", 1720366572288);
Upid::from_prefix_and_datetime("user", SystemTime::now());

From and to a string:

let u = Upid::from_string("user_2accvpp5guht4dts56je5a");
u.to_string();

Get stuff out:

u.prefix();       // user
u.datetime();     // 2024-07-07 ...
u.milliseconds(); // 17203...

Convert to other formats:

u.to_bytes();

Development

Code and tests are in the upid_rs/ directory.

cd upid_rs
cargo check  # or fmt/clippy/build/test/run

Please open a PR if you spot a bug or improvement!

Postgres extension

There is also a Postgres extension built on the Rust implementation, using pgrx and based on the very similar extension pksunkara/pgx_ulid.

Installation

The easiest option is to try the Docker image carderne/postgres-upid:16, currently built for arm64 and amd64 but only for Postgres 16:

docker run -e POSTGRES_HOST_AUTH_METHOD=trust -p 5432:5432 carderne/postgres-upid:16

You can also grab a Linux .deb from the Releases page. This is built for Postgres 16 and amd64 only.

More architectures and versions will follow once it is out of alpha.

Usage

CREATE EXTENSION upid_pg;

CREATE TABLE users (
    id   upid NOT NULL DEFAULT gen_upid('user') PRIMARY KEY,
    name text NOT NULL
);

INSERT INTO users (name) VALUES('Bob');

SELECT * FROM users;
--              id              | name
-- -----------------------------+------
--  user_2accvpp5guht4dts56je5a | Bob

You can get the raw bytea data, or the prefix or timestamp:

SELECT upid_to_bytea(id) FROM users;
-- \x019...

SELECT upid_to_prefix(id) FROM users;
-- 'user'

SELECT upid_to_timestamp(id) FROM users;
-- 2024-07-07 ...

You can convert a UPID to a regular Postgres UUID:

SELECT upid_to_uuid(gen_upid('user'));

Or the reverse (although the prefix and timestamp will no longer make sense):

SELECT upid_from_uuid(gen_random_uuid());

Development

If you want to install it into another Postgres, you'll need to install pgrx and follow its installation instructions. Something like this:

cd upid_pg
cargo install --locked cargo-pgrx
cargo pgrx init
cargo pgrx install

Some cargo commands work as normal:

cargo check  # or fmt/clippy

But building, testing and running must be done via pgrx. This will compile it into a Postgres installation, and allow an interactive session and tests there.

cargo pgrx test pg16
# or       run
# or       install

Related work

  • ULID: like UPID, but without the prefix
  • UUIDv7: like ULID, but an IETF standard and using standard hexadecimal UUID-style (long) string encoding
  • TypeID: a UUID with a prefix, except the prefix is separate from the 128-bit binary so must be added/stripped at some boundary (or everything stored as text)
  • cuid2: only text (can't store binary), not K-Sortable (deliberately), slower (deliberately) and more random
  • Nano ID: only text (can't store binary), only random, bigger alphabet (shorter string)
