Graph data modeling and code generator for Postgres.

Project description

GrapDB

A tool for building graph structures on top of Postgres.

Goals

Scale out Postgres for use across many machines.
Do not sacrifice Postgres features on single DB shard.
- A single shard should be a fully featured Postgres instance.
- Data types & indexes
- ACID
Patterns for efficiently querying within shard and across shards.
Multi-shard ACID via two-phase commit.
[WIP] Automated cache layer.
No dependencies on unofficial Postgres features.
Autogenerate bindings for Python (SqlAlchemy) and Rust (Diesel).

Minimum Requirements

Python >= 3.8
Postgres >= 9.1 (ALTER TYPE... ADD VALUE... support)

Example

Create graph.toml:

[vertex.user]
cols = [
    ["username", "Text"],
    ["email", "Text"],
    ["is_subscriber", "Boolean"],
]
# WARNING: This design requires usernames to be immutable.
entry_point = "username"
col.email.nullable = true
col.is_subscriber.default = "false"

[enum.animal_type]
values = ["dog", "cat"]

[vertex.animal]
cols = [
    ["type", "enum:animal_type"],
    ["name", "Text"],
]

[edge.owner]
src = "user"
tgt = "animal"
cols = [
    ["type", "enum:animal_type"],  # Used by `Counter` example below
    ["adopted_on", "Timestamp"],
]

Generate SQL migration plan:

$ grapdb plan graph.toml

{
  "migration_plan": [
    "CREATE TYPE gid AS (\n  id BigInt,\n  vshard SmallInt\n);",
    "CREATE FUNCTION gid_pk_insert_trig() RETURNS trigger LANGUAGE plpgsql AS\n  $$BEGIN\n   IF (NEW.gid).id IS NULL THEN\n    NEW.gid = (nextval(TG_ARGV[0]), (NEW.gid).vshard);\n   END IF;\n   RETURN NEW;\nEND;$$;",
    "CREATE TYPE graph.animal_type AS ENUM ('dog', 'cat');",
    "CREATE TABLE graph.\"user\" (\n  \"gid\" gid NOT NULL,\n  \"data\" JSONB NOT NULL DEFAULT '{}',\n  PRIMARY KEY (gid),\n  CHECK ((gid).id IS NOT NULL AND (gid).vshard IS NOT NULL)\n);",
    "CREATE SEQUENCE graph.user_gid_id_seq OWNED BY graph.user.gid;",
    "CREATE TRIGGER user_insert_trigger\n  BEFORE INSERT ON graph.\"user\"\n  FOR EACH ROW\n  EXECUTE PROCEDURE\n    gid_pk_insert_trig(\"graph.user_gid_id_seq\");",
    "CREATE TABLE graph.\"animal\" (\n  \"gid\" gid NOT NULL,\n  \"data\" JSONB NOT NULL DEFAULT '{}',\n  PRIMARY KEY (gid),\n  CHECK ((gid).id IS NOT NULL AND (gid).vshard IS NOT NULL)\n);",
    "CREATE SEQUENCE graph.animal_gid_id_seq OWNED BY graph.animal.gid;",
    "CREATE TRIGGER animal_insert_trigger\n  BEFORE INSERT ON graph.\"animal\"\n  FOR EACH ROW\n  EXECUTE PROCEDURE\n    gid_pk_insert_trig(\"graph.animal_gid_id_seq\");",
    "ALTER TABLE graph.\"user\" ADD COLUMN \"username\" Text NOT NULL;",
    "ALTER TABLE graph.\"user\" ADD COLUMN \"email\" Text;",
    "ALTER TABLE graph.\"user\" ADD COLUMN \"is_subscriber\" Boolean NOT NULL DEFAULT false;",
    "ALTER TABLE graph.\"animal\" ADD COLUMN \"type\" graph.animal_type NOT NULL;",
    "ALTER TABLE graph.\"animal\" ADD COLUMN \"name\" Text NOT NULL;",
    "CREATE UNIQUE INDEX CONCURRENTLY user__entry_point ON graph.\"user\" USING btree (username);",
    "CREATE TABLE graph.\"owner\" (\n  \"src\" gid NOT NULL,\n  \"tgt\" gid NOT NULL,\n  \"data\" JSONB NOT NULL DEFAULT '{}',\n  PRIMARY KEY (src, tgt),\n  FOREIGN KEY (src) REFERENCES graph.\"user\" (gid),\n  CHECK ((src).id IS NOT NULL AND (src).vshard IS NOT NULL),\n  CHECK ((tgt).id IS NOT NULL AND (tgt).vshard IS NOT NULL)\n);",
    "ALTER TABLE graph.\"owner\" ADD COLUMN \"type\" graph.animal_type NOT NULL;",
    "ALTER TABLE graph.\"owner\" ADD COLUMN \"adopted_on\" Timestamp NOT NULL;"
  ],
  "drop_plan": [
    "DROP TABLE graph.\"owner\";",
    "DROP TABLE graph.\"animal\";",
    "DROP TABLE graph.\"user\";",
    "DROP TYPE graph.animal_type;",
    "DROP TYPE gid;"
  ]
}

Generate Python SQLAlchemy bindings:

grapdb sqlalchemy-orm graph.toml .

Concepts

True to its Graph DB aspirations, there are vertices and edges.

Vertex

Analogous to what would be stored in a traditional RDBMS table. A vertex is keyed by a Global ID (Gid) which is 64 bits of an auto-increment integer, and 16 bits of shard ID. A vertex can have any number of columns as well as a JSONB column data.

Look up

A vertex can be looked up by its Gid. Additionally, an entry_point can be specified which allows look up by one other text column (or composite type of text fields; or array of gids) with unique values. Internally, the hash of an entry_point fully determines its shard ID. For this reason, an entry_point column must be treated as immutable once set.

Colocation

A vertex V that extends another vertex W guarantees that each of its rows has a corresponding row with matching Gid in W. This necessarily guarantees that rows in V are colocated with their corresponding rows in W. In other words, they reside on the same shard.

Indices

Indices can be added to a vertex. Indices can specify columns with functions, indexing method, and a where clause. A query using an index will need to be run on all shards. A future optimization is to limit the query to one shard if one of the query-where-clauses bounds the query to a specific shard (e.g. vertex colocation reference).

Mapping to Postgres

A vertex maps to a regular table with a column of composite type for the Gid. An entry_point adds a unique index for the target column.

A data JSONB column is allocated automatically for extending the schema without requiring a migration.

Edge

A directional edge that connects two vertices together. Similar to a vertex but includes columns for source and target vertices.

Colocation

Edge entries are colocated with the source vertex they're associated with. Edges can have indexes which allow sorting and filtering on a per source-vertex basis.

Indices

Indices can be added to an edge. Indices can specify columns with functions, indexing method, and a where clause. An edge index is always key-ed by the src gid so queries will efficiently access a single shard.

Mapping to Postgres

An edge maps to a regular table with columns for the source and target vertex gids.

A data JSONB column is allocated automatically for extending the schema without requiring a migration.

Columns

Columns define the data that entities (vertices & edges) contain.

Available Data Types

Text
SmallInt
Integer
BigInt
Binary
Timestamp
Date
Vertex: ->vertex
Colocated vertex -)vertex
Enum: enum:name
Composite Types: ctype:name
Array: type[]

Vertex references and colocation

A column can reference a vertex using the -> notation. If the vertex is guaranteed to be colocated with the entity, use the -) notation, which will allow for efficient same-shard joins.

Mapping to Postgres

A straightforward mapping to database columns.

Enums

Enums limit the values for a given column to a finite set of strings. Enums are stored more efficiently than text.

[enum.animal_type]
values = [
    "dog",
    "cat"]
added_values = [
    "tiger",
]

Note the use of added_values when adding enum variants after the initial migration.

Mapping to Postgres

Enums map to Postgres enums. The use of added_values creates a migration plan that utilizes Postgres's ALTER TYPE... ADD VALUE... query which can be run on a live database as it does not require rewriting dependent tables.

Composite Types

Composite types enable the creation of a new type that is a tuple of types.

[ctype.point]
fields = [
    ["x", "Integer"],
    ["y", "Integer"],
]

Mapping to Postgres

Maps directly to Postgres's composite types feature.

Dropping

Entities and indices can be dropped by marking them as drop=true. An explicit drop annotation is preferred over deleting a definition because drop adds the appropriate queries to the migration plan.

drop is also useful in development environments to quickly purge and re-create entities as an entity definition is iterated.

Counters

A counter is defined on an edge type and tracks the cardinality of the set of edges matching specific column values (src is required).

[edge.owner.counter.animal_type_count]
cols = ["animal_type"]

Language Generators

Once the graph is specified in .toml, GrapDB can generate ORM mappings. Currently, only SqlAlchemy for Python3 is supported.

Migration Invariants

The general rule is that your graph config file can expand in definition, but nothing should ever be mutated or removed.

Use drop rather than deletion for entities and indices.
Do not change a column's data type.
Do not change a column's default.
Do not change a column's nullability.
A new column for an entity must have a default or be nullable.
Use added_values for expanding enums.

As long as these rules are followed, the migration_plan is idempotent and can be executed after every graph config change.

Client-Server Architecture

Outside of 2PC transaction, GrapDB does not add any layer between applications and Postgres servers. For this reason, clients are responsible for:

Mapping shard IDs to servers.
Mapping virtual shard IDs to shard IDs using jump consistent hashing.
Extracting the virtual shard ID from a gid.
Mapping entry points to virtual shard IDs using a SHA-1 hash.
Setting virtual shard IDs for new entities.

[WIP] pg-2pc-tm

GrapDB's Postgres 2PC Transaction Manager offers a barebones interface for applications to register a 2pc transaction.

A critical missing feature is its lack of deadlock detection.

Quirks

SQLAlchemy interaction with dropped entities

When an entity is marked as dropped, the associated Python definition is also eliminated. Naturally, references in Python code to the definition should be removed. However, it can be impossible to remove the definition when it has been used in a migration script. It's not clear whether the entity should be retained purely for legacy migration scripts, or whether the migration script should be removed once unnecessary.

SQLAlchemy versions

Using the --sa-version flag, the generated code can be made suitable for v1.3 (13), v1.4 (14), or v2.0 (20).

Project details

Release history Release notifications | RSS feed

This version

0.23.0

Jun 30, 2024

0.22.5

Feb 9, 2024

0.22.4

Feb 9, 2024

0.22.3

Feb 9, 2024

0.22.2

Feb 9, 2024

0.22.1

Feb 9, 2024

0.22.0

Feb 9, 2024

0.21.0

Feb 7, 2024

0.20.0

Feb 1, 2024

0.19.0

Dec 7, 2023

0.18.0

Sep 19, 2023

0.17.1

Aug 8, 2023

0.17

Aug 8, 2023

0.16

Jun 10, 2023

0.15

May 4, 2023

0.14

May 4, 2023

0.13

Mar 12, 2023

0.12

Nov 12, 2022

0.11

Dec 30, 2021

0.10

Oct 8, 2021

0.9

Oct 6, 2021

0.8

Oct 4, 2021

0.7

Aug 25, 2021

0.6

Aug 22, 2021

0.5

Aug 21, 2021

0.4

Aug 20, 2021

0.3

Jul 10, 2021

0.2

Feb 24, 2021

0.1

Feb 13, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

grapdb-0.23.0.tar.gz (38.3 kB view details)

Uploaded Jun 30, 2024 Source

Built Distribution

grapdb-0.23.0-py3-none-any.whl (48.7 kB view details)

Uploaded Jun 30, 2024 Python 3

File details

Details for the file grapdb-0.23.0.tar.gz.

File metadata

Download URL: grapdb-0.23.0.tar.gz
Upload date: Jun 30, 2024
Size: 38.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: pdm/2.12.2 CPython/3.10.12

File hashes

Hashes for grapdb-0.23.0.tar.gz
Algorithm	Hash digest
SHA256	`b2f8a425ff7340de21102795e869ac20d640fd62a99b12b9f5cf8a2f1a921676`
MD5	`849934c3dc249c477dd5b8f5708f7533`
BLAKE2b-256	`78fe3dedeed7c86ca8606a6a853b16b44a289f2dc09a16dd07f528ed3c653ed3`

See more details on using hashes here.

File details

Details for the file grapdb-0.23.0-py3-none-any.whl.

File metadata

Download URL: grapdb-0.23.0-py3-none-any.whl
Upload date: Jun 30, 2024
Size: 48.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: pdm/2.12.2 CPython/3.10.12

File hashes

Hashes for grapdb-0.23.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2fd2aae3b5633da7d3913a88d5e6d2f6533f21b5a795cbb928a988f1a428d964`
MD5	`59a34e821100b713d34a1f3c11c25f67`
BLAKE2b-256	`3e7039eded7fda33200d6dce3e20f8458e00766b814ff72dc78f60009b924159`

See more details on using hashes here.

grapdb 0.23.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

GrapDB

Goals

Minimum Requirements

Example

Concepts

Vertex

Look up

Colocation

Indices

Mapping to Postgres

Edge

Colocation

Indices

Mapping to Postgres

Columns

Available Data Types

Vertex references and colocation

Mapping to Postgres

Enums

Mapping to Postgres

Composite Types

Mapping to Postgres

Dropping

Counters

Language Generators

Migration Invariants

Client-Server Architecture

[WIP] pg-2pc-tm

Quirks

SQLAlchemy interaction with dropped entities

SQLAlchemy versions

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes