DeltaGraphAr

A mutable, versioned property-graph store built on the GraphAr physical layout (chunked Parquet + YAML metadata) with ACID semantics delegated to LakeFS.

DeltaGraphAr is a pure-Python reference implementation, suited to graph datasets that evolve over time and need repeatable reads at arbitrary historical snapshots.

What it does

  • Stores vertices and edges as chunked Parquet files following the GraphAr layout spec.
  • Appends edges to an unordered "delta" region; CSR-ordered adjacency is built on demand via compact().
  • Every mutating operation produces a versioned commit. Any commit ref can be used as a ref= argument to read historical state.
  • Vertices are identified by arbitrary string logical IDs; the ID map translates them to contiguous, chunk-aligned physical integer IDs for storage.
  • The LakeFS backend delegates branching, tagging, and atomic commits to a running LakeFS instance. The local backend (copy-on-commit) requires no external dependencies.
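
The logical-to-physical ID translation described above can be illustrated with a small standalone sketch. ToyIDMap and its methods are hypothetical names for illustration only, not DeltaGraphAr's classes, and the sketch assumes that "chunk-aligned" means the chunk index is the physical ID divided by the chunk size.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIDMap:
    """Hypothetical stand-in for an ID map; not DeltaGraphAr's own class."""

    chunk_size: int
    _to_physical: dict[str, int] = field(default_factory=dict)
    _to_logical: list[str] = field(default_factory=list)

    def assign(self, logical_id: str) -> int:
        """Return the physical ID, allocating the next integer if the ID is new."""
        if logical_id not in self._to_physical:
            self._to_physical[logical_id] = len(self._to_logical)
            self._to_logical.append(logical_id)
        return self._to_physical[logical_id]

    def locate(self, logical_id: str) -> tuple[int, int]:
        """Return (chunk index, row within chunk) for an already-assigned ID."""
        return divmod(self._to_physical[logical_id], self.chunk_size)

    def logical(self, physical_id: int) -> str:
        """Translate a physical ID back to its logical string ID."""
        return self._to_logical[physical_id]


id_map = ToyIDMap(chunk_size=2)
for name in ["alice", "bob", "carol"]:
    id_map.assign(name)

print(id_map.locate("carol"))  # third vertex lands in the second chunk
```

Because physical IDs are dense, a vertex's chunk and row follow from integer division alone, which is what makes fixed-size chunk files addressable without an index.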

Install

pip install deltagraphar

Requires Python ≥ 3.10.

For development (includes pytest, hypothesis, pandas):

git clone https://github.com/nishankmahore/DeltaGraphAr.git
cd DeltaGraphAr
pip install -e ".[dev]"

Quickstart

python examples/quickstart.py

Or with LakeFS (start the services with docker compose first):

docker compose up -d
python examples/ldbc_snb_tiny_loader.py

API

from deltagraphar.versioning.local_backend import LocalBackend
from deltagraphar.store.graphstore import GraphStore
from deltagraphar.format.schema import GraphInfo, VertexInfo, EdgeInfo

# Local copy-on-commit backend; no external services needed
b = LocalBackend("/path/to/repo")

# Schema: one vertex label and one edge type
vi = VertexInfo(label="person", chunk_size=65_536)
ei = EdgeInfo("person", "knows", "person", chunk_size=1_048_576, src_chunk_size=65_536)
gi = GraphInfo(name="social", prefix="", vertex_infos=[vi], edge_infos=[ei])

# Each mutating call below produces a versioned commit
gs = GraphStore.create(b, gi)
gs.add_vertices("person", [{"id": "alice"}, {"id": "bob"}])
gs.add_edges(("person", "knows", "person"), [{"src": "alice", "dst": "bob"}])
# Build the CSR-ordered adjacency from the unordered delta region
gs.compact(("person", "knows", "person"))

neighbors = gs.out_neighbors("person", "alice", ("person", "knows", "person"))
# → ["bob"]

# Time travel: pass a commit ref to read historical state
ref = gs.snapshots()[1].ref
old_neighbors = gs.out_neighbors("person", "alice", ("person", "knows", "person"), ref=ref)

CLI

deltagraphar log --repo /path/to/repo
deltagraphar neighbors --repo /path/to/repo --label person --vertex alice --etype person,knows,person
deltagraphar compact --repo /path/to/repo --etype person,knows,person
deltagraphar tag --repo /path/to/repo v1

Schema evolution

Add a new property group to existing vertices without rewriting existing data:

from deltagraphar.format.schema import PropertyGroup, Property

pg = PropertyGroup([Property("score", "float64")], prefix="person_score")
gs.add_property_group("vertex:person", pg, {"alice": 0.9, "bob": 0.7})
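
Why adding a property group needs no rewrite can be seen from the layout: each group lives in its own directory under the vertex label. The following is a minimal sketch of that idea, not the library's implementation. write_group is a hypothetical helper, JSON stands in for Parquet, and rows are assumed to align by position (physical ID).

```python
import json
import tempfile
from pathlib import Path


def write_group(root: Path, label: str, prefix: str, rows: list[dict]) -> Path:
    """Write one property group as its own chunk file (JSON stands in for Parquet)."""
    chunk = root / "vertex" / label / prefix / "chunk0"
    chunk.parent.mkdir(parents=True, exist_ok=True)
    chunk.write_text(json.dumps(rows))
    return chunk


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    names = write_group(root, "person", "person_name", [{"name": "alice"}, {"name": "bob"}])
    before = names.read_bytes()

    # The new group is a sibling directory; row i belongs to physical vertex i.
    write_group(root, "person", "person_score", [{"score": 0.9}, {"score": 0.7}])

    names_untouched = names.read_bytes() == before
    groups = sorted(p.name for p in (root / "vertex" / "person").iterdir())

print(groups, names_untouched)
```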

Tests

pytest

51 tests, 2 skipped (LakeFS integration — requires docker compose up).

Benchmarks

python benchmarks/bench_v1.py --rows 10000 --queries 1000

Architecture

GraphStore
├── IDMap          — logical ↔ physical vertex ID, chunk-aligned Parquet
├── compaction.py  — delta→CSR merge, offset sweep, property reorder
└── VersioningBackend (ABC)
    ├── LocalBackend   — copy-on-commit snapshots, no external deps
    └── LakeFSBackend  — atomic commits, branching, tagging via LakeFS API

Physical layout (GraphAr spec)
  vertex/<label>/<pg_prefix>/chunk<k>         — vertex property tables
  vertex/<label>/__vid_map__/chunk<k>         — ID map
  edge/<src>_<et>_<dst>/ordered_by_source/    — CSR adj list + offsets
  edge/<src>_<et>_<dst>/unordered_by_source/  — delta (append-only per vchunk)
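
The delta→CSR merge named above can be pictured with a generic textbook construction: sort the unordered edge pairs by source, count edges per source, and prefix-sum the counts into an offset array. This is a sketch on physical integer IDs, not the code in compaction.py; build_csr and out_neighbors are illustrative names.

```python
def build_csr(num_vertices: int, delta_edges: list[tuple[int, int]]):
    """Sort unordered (src, dst) pairs by source and compute CSR offsets."""
    ordered = sorted(delta_edges)  # by src, then dst
    offsets = [0] * (num_vertices + 1)
    for src, _ in ordered:
        offsets[src + 1] += 1  # count out-edges per source
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]  # prefix sum turns counts into offsets
    return ordered, offsets


def out_neighbors(ordered, offsets, src: int) -> list[int]:
    """Slice the sorted adjacency list using the offset array."""
    return [dst for _, dst in ordered[offsets[src]:offsets[src + 1]]]


delta = [(2, 0), (0, 1), (0, 2), (2, 1)]  # append order, unsorted
adj, offsets = build_csr(3, delta)

print(offsets)                         # [0, 2, 2, 4]
print(out_neighbors(adj, offsets, 2))  # [0, 1]
```

After this step a neighbor lookup is a single slice, whereas reading from the delta region requires scanning every appended pair.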

Data storage layout

Data is stored as chunked Parquet files under a local repo directory. Using the movie graph as an example (repo_dir = "/tmp/movies_repo"):

/tmp/movies_repo/
├── work/                                          ← current HEAD (mutable working copy)
│   ├── movies.graph.yml                           ← graph manifest
│   ├── Person.vertex.yml                          ← vertex schema
│   ├── Movie.vertex.yml
│   ├── vertex/
│   │   ├── Person/
│   │   │   ├── person_name/
│   │   │   │   └── chunk0                        ← name column (Parquet)
│   │   │   └── __vid_map__/
│   │   │       └── chunk0                        ← logical↔physical ID map
│   │   └── Movie/
│   │       └── movie_props/
│   │           └── chunk0                        ← title, released columns (Parquet)
│   └── edge/
│       └── Person_ACTED_IN_Movie/
│           ├── Person_ACTED_IN_Movie.edge.yml     ← edge schema
│           ├── ordered_by_source/                 ← CSR (written after compact)
│           │   ├── adj_list/
│           │   │   └── part0/chunk0              ← sorted src/dst pairs (Parquet)
│           │   └── offset/
│           │       └── part0/chunk0              ← CSR offset array (Parquet)
│           └── unordered_by_source/               ← delta (append-only, pre-compact)
│               └── adj_list/
│                   └── part0/chunk0              ← unsorted src/dst pairs (Parquet)
└── snapshots/
    ├── <sha1ref>/                                 ← immutable copy-on-commit snapshot
    ├── <sha1ref>/
    └── ...                                        ← one directory per commit
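
The work/ and snapshots/ split above amounts to copy-on-commit: each commit copies the working tree into an immutable directory named by a ref. A minimal sketch of that mechanism follows. The commit function and the way the SHA-1 ref is derived (message plus file contents) are assumptions for illustration; the real LocalBackend may compute refs differently.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def commit(repo: Path, message: str) -> str:
    """Copy work/ into snapshots/<ref>/ and return the ref (toy scheme)."""
    work = repo / "work"
    digest = hashlib.sha1(message.encode())
    for path in sorted(work.rglob("*")):
        if path.is_file():
            digest.update(path.relative_to(work).as_posix().encode())
            digest.update(path.read_bytes())
    ref = digest.hexdigest()
    shutil.copytree(work, repo / "snapshots" / ref)
    return ref


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "work").mkdir()
    chunk = repo / "work" / "chunk0"

    chunk.write_text("alice")
    first = commit(repo, "add alice")
    chunk.write_text("alice,bob")
    second = commit(repo, "add bob")

    # Reading at an old ref means reading from that snapshot directory.
    old_state = (repo / "snapshots" / first / "chunk0").read_text()
    new_state = (repo / "snapshots" / second / "chunk0").read_text()

print(old_state, new_state)
```

Full copies are simple and make historical reads trivial, at the cost of disk usage that grows with every commit.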

To persist data across runs, replace tempfile.TemporaryDirectory() with a fixed path:

repo_dir = "/tmp/movies_repo"
b = LocalBackend(repo_dir)

To inspect any chunk file directly:

import pyarrow.parquet as pq
pq.read_table("/tmp/movies_repo/work/vertex/Person/person_name/chunk0").to_pandas()
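
To find which chunk files exist before opening one, a directory walk is enough. The sketch below uses only the standard library and builds a throwaway tree that mimics the layout above; list_chunks is a hypothetical helper, not part of DeltaGraphAr.

```python
import tempfile
from pathlib import Path


def list_chunks(work_dir: Path) -> list[str]:
    """Return relative paths of all files whose name starts with 'chunk'."""
    return sorted(
        p.relative_to(work_dir).as_posix()
        for p in work_dir.rglob("chunk*")
        if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "work"
    for rel in ["vertex/Person/person_name/chunk0", "vertex/Person/__vid_map__/chunk0"]:
        target = work / rel
        target.parent.mkdir(parents=True)
        target.touch()  # empty placeholder instead of a real Parquet file
    chunks = list_chunks(work)

print(chunks)
```

Pointing list_chunks at a real repo's work/ directory should give the paths to pass to pq.read_table.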

License

MIT — see LICENSE

