DeltaGraphAr

A mutable, versioned property-graph store built on the GraphAr physical layout (chunked Parquet + YAML metadata) with ACID semantics delegated to LakeFS.

DeltaGraphAr is a pure-Python reference implementation, suited to graph datasets that evolve over time and need repeatable reads at arbitrary historical snapshots.

What it does

  • Stores vertices and edges as chunked Parquet files following the GraphAr layout spec.
  • Appends edges to an unordered "delta" region; CSR-ordered adjacency is built on demand via compact().
  • Every mutating operation produces a versioned commit, and any commit ref can be passed as the ref= argument to read historical state.
  • Vertices are identified by arbitrary string logical IDs; the ID map translates them to contiguous, chunk-aligned physical integer IDs for storage.
  • The LakeFS backend delegates branching, tagging, and atomic commits to a running LakeFS instance. The local backend (copy-on-commit) requires no external dependencies.
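
The logical-to-physical ID translation mentioned in the list above can be sketched roughly as follows. This is an illustration only, not DeltaGraphAr's actual IDMap: the class name SketchIDMap and its methods are hypothetical, and it assumes physical IDs are handed out in insertion order.

```python
class SketchIDMap:
    """Hypothetical sketch: logical string IDs <-> contiguous physical ints."""

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size
        self._to_physical: dict[str, int] = {}
        self._to_logical: list[str] = []

    def add(self, logical_id: str) -> int:
        # Re-adding a known ID returns its existing physical ID.
        if logical_id in self._to_physical:
            return self._to_physical[logical_id]
        physical_id = len(self._to_logical)  # next free contiguous integer
        self._to_physical[logical_id] = physical_id
        self._to_logical.append(logical_id)
        return physical_id

    def logical(self, physical_id: int) -> str:
        return self._to_logical[physical_id]

    def locate(self, logical_id: str) -> tuple[int, int]:
        # (chunk index, row within chunk) for a chunk-aligned layout.
        return divmod(self._to_physical[logical_id], self.chunk_size)


id_map = SketchIDMap(chunk_size=2)
for name in ["alice", "bob", "carol"]:
    id_map.add(name)

carol_location = id_map.locate("carol")  # third vertex, chunk size 2
```

With a chunk size of 2, the third vertex lands in row 0 of chunk 1, which is the sense in which physical IDs are "chunk-aligned".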

Install

pip install deltagraphar

Requires Python ≥ 3.10.

For development (includes pytest, hypothesis, pandas):

git clone https://github.com/nishankmahore/DeltaGraphAr.git
cd DeltaGraphAr
pip install -e ".[dev]"

Quickstart

python examples/quickstart.py

Or with LakeFS (requires docker compose up first):

docker compose up -d
python examples/ldbc_snb_tiny_loader.py

API

from deltagraphar.versioning.local_backend import LocalBackend
from deltagraphar.store.graphstore import GraphStore
from deltagraphar.format.schema import GraphInfo, VertexInfo, EdgeInfo

b = LocalBackend("/path/to/repo")
vi = VertexInfo(label="person", chunk_size=65_536)
ei = EdgeInfo("person", "knows", "person", chunk_size=1_048_576, src_chunk_size=65_536)
gi = GraphInfo(name="social", prefix="", vertex_infos=[vi], edge_infos=[ei])

gs = GraphStore.create(b, gi)
gs.add_vertices("person", [{"id": "alice"}, {"id": "bob"}])
gs.add_edges(("person", "knows", "person"), [{"src": "alice", "dst": "bob"}])
gs.compact(("person", "knows", "person"))

neighbors = gs.out_neighbors("person", "alice", ("person", "knows", "person"))
# → ["bob"]

# Time travel
ref = gs.snapshots()[1].ref
old_neighbors = gs.out_neighbors("person", "alice", ("person", "knows", "person"), ref=ref)
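
To make the compact() and out_neighbors() calls above more concrete, here is a minimal in-memory sketch of a delta-to-CSR build over physical integer IDs. It is not the library's compaction code; the function names build_csr and csr_out_neighbors are hypothetical, and it ignores chunking, properties, and Parquet I/O.

```python
def build_csr(delta: list[tuple[int, int]], num_vertices: int):
    """Turn unordered (src, dst) pairs into a sorted adjacency list plus offsets."""
    adj = sorted(delta)  # order by source, then destination
    offsets = [0] * (num_vertices + 1)
    for src, _ in adj:
        offsets[src + 1] += 1  # count out-edges per source
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]  # prefix sum gives slice boundaries
    return adj, offsets


def csr_out_neighbors(adj, offsets, src: int) -> list[int]:
    # Edges of `src` occupy adj[offsets[src]:offsets[src + 1]].
    return [dst for _, dst in adj[offsets[src]:offsets[src + 1]]]


delta_edges = [(2, 0), (0, 1), (0, 2), (1, 2)]  # appended in arbitrary order
adj_list, offset_array = build_csr(delta_edges, num_vertices=3)
neighbors_of_0 = csr_out_neighbors(adj_list, offset_array, 0)
```

After the build, a neighbor lookup is a single slice of the adjacency list rather than a scan over every appended edge.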

CLI

deltagraphar log --repo /path/to/repo
deltagraphar neighbors --repo /path/to/repo --label person --vertex alice --etype person,knows,person
deltagraphar compact --repo /path/to/repo --etype person,knows,person
deltagraphar tag --repo /path/to/repo v1
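
The --etype flag above takes a comma-separated triple, while the Python API takes a (source label, edge type, destination label) tuple. One plausible way to bridge the two with argparse is sketched below; the parse_etype helper and the etype-demo parser are hypothetical and are not taken from the deltagraphar CLI.

```python
import argparse


def parse_etype(text: str) -> tuple[str, str, str]:
    """Split 'src,edge,dst' into a triple, rejecting malformed input."""
    parts = tuple(part.strip() for part in text.split(","))
    if len(parts) != 3 or not all(parts):
        raise argparse.ArgumentTypeError(
            f"expected src,edge,dst but got {text!r}"
        )
    return parts


parser = argparse.ArgumentParser(prog="etype-demo")
parser.add_argument("--etype", type=parse_etype, required=True)

args = parser.parse_args(["--etype", "person,knows,person"])
```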

Schema evolution

Add a new property group to existing vertices without rewriting existing data:

from deltagraphar.format.schema import PropertyGroup, Property

pg = PropertyGroup([Property("score", "float64")], prefix="person_score")
gs.add_property_group("vertex:person", pg, {"alice": 0.9, "bob": 0.7})
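
A rough sketch of why this need not rewrite existing data: the new values can be laid out as a separate column ordered by physical vertex ID and split into chunks, leaving the existing property groups untouched. The helper build_property_chunks, the example ID map, and the choice of None for missing values are assumptions for illustration, not the library's implementation.

```python
def build_property_chunks(id_to_physical, values, chunk_size, default=None):
    """Align logical-ID-keyed values to physical row order, then chunk them."""
    column = [default] * len(id_to_physical)
    for logical_id, value in values.items():
        column[id_to_physical[logical_id]] = value
    return [
        column[start:start + chunk_size]
        for start in range(0, len(column), chunk_size)
    ]


# Hypothetical ID map for three existing vertices.
physical_ids = {"alice": 0, "bob": 1, "carol": 2}
score_chunks = build_property_chunks(
    physical_ids, {"alice": 0.9, "bob": 0.7}, chunk_size=2
)
```

Here carol has no score, so her row in the second chunk holds the default value.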

Tests

pytest

51 tests, 2 skipped (the LakeFS integration tests, which require docker compose up).

Benchmarks

python benchmarks/bench_v1.py --rows 10000 --queries 1000

Architecture

GraphStore
├── IDMap          — logical ↔ physical vertex ID, chunk-aligned Parquet
├── compaction.py  — delta→CSR merge, offset sweep, property reorder
└── VersioningBackend (ABC)
    ├── LocalBackend   — copy-on-commit snapshots, no external deps
    └── LakeFSBackend  — atomic commits, branching, tagging via LakeFS API

Physical layout (GraphAr spec)
  vertex/<label>/<pg_prefix>/chunk<k>         — vertex property tables
  vertex/<label>/__vid_map__/chunk<k>         — ID map
  edge/<src>_<et>_<dst>/ordered_by_source/    — CSR adj list + offsets
  edge/<src>_<et>_<dst>/unordered_by_source/  — delta (append-only per vchunk)
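
The path templates above can be turned into small helpers. The sketch below is illustrative only: the function names are hypothetical, and it assumes the chunk index is the physical vertex ID divided by the chunk size, rounded down.

```python
from pathlib import PurePosixPath


def vertex_chunk_path(label: str, pg_prefix: str,
                      physical_id: int, chunk_size: int) -> PurePosixPath:
    # Assumption: chunk k holds physical IDs [k * chunk_size, (k + 1) * chunk_size).
    chunk = physical_id // chunk_size
    return PurePosixPath("vertex") / label / pg_prefix / f"chunk{chunk}"


def edge_dir(src: str, etype: str, dst: str, ordered: bool) -> PurePosixPath:
    # CSR data lives under ordered_by_source, the delta under unordered_by_source.
    region = "ordered_by_source" if ordered else "unordered_by_source"
    return PurePosixPath("edge") / f"{src}_{etype}_{dst}" / region


score_path = vertex_chunk_path("person", "person_score",
                               physical_id=70_000, chunk_size=65_536)
delta_dir = edge_dir("person", "knows", "person", ordered=False)
```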

Data storage layout

Data is stored as chunked Parquet files under a local repo directory. Using the movie graph as an example (repo_dir = "/tmp/movies_repo"):

/tmp/movies_repo/
├── work/                                          ← current HEAD (mutable working copy)
│   ├── movies.graph.yml                           ← graph manifest
│   ├── Person.vertex.yml                          ← vertex schema
│   ├── Movie.vertex.yml
│   ├── vertex/
│   │   ├── Person/
│   │   │   ├── person_name/
│   │   │   │   └── chunk0                        ← name column (Parquet)
│   │   │   └── __vid_map__/
│   │   │       └── chunk0                        ← logical↔physical ID map
│   │   └── Movie/
│   │       └── movie_props/
│   │           └── chunk0                        ← title, released columns (Parquet)
│   └── edge/
│       └── Person_ACTED_IN_Movie/
│           ├── Person_ACTED_IN_Movie.edge.yml     ← edge schema
│           ├── ordered_by_source/                 ← CSR (written after compact)
│           │   ├── adj_list/
│           │   │   └── part0/chunk0              ← sorted src/dst pairs (Parquet)
│           │   └── offset/
│           │       └── part0/chunk0              ← CSR offset array (Parquet)
│           └── unordered_by_source/               ← delta (append-only, pre-compact)
│               └── adj_list/
│                   └── part0/chunk0              ← unsorted src/dst pairs (Parquet)
└── snapshots/
    ├── <sha1ref>/                                 ← immutable copy-on-commit snapshot
    ├── <sha1ref>/
    └── ...                                        ← one directory per commit

To persist data across runs, create the repo at a fixed path instead of inside a tempfile.TemporaryDirectory():

repo_dir = "/tmp/movies_repo"
b = LocalBackend(repo_dir)

To inspect any chunk file directly:

import pyarrow.parquet as pq
pq.read_table("/tmp/movies_repo/work/vertex/Person/person_name/chunk0").to_pandas()
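
To find which chunk files exist before opening one, a directory walk with pathlib is enough. The sketch below builds a throwaway tree that mimics part of the layout above so that it runs on its own; list_chunks is a hypothetical helper, and it assumes chunk files are named chunk<k> as shown in the tree.

```python
import tempfile
from pathlib import Path


def list_chunks(root: Path) -> list[str]:
    """Return repo-relative paths of all files whose name starts with 'chunk'."""
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("chunk*")
        if path.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "work"
    # Empty placeholder files standing in for real Parquet chunks.
    for rel in ["vertex/Person/person_name/chunk0",
                "vertex/Person/__vid_map__/chunk0",
                "vertex/Movie/movie_props/chunk0"]:
        target = work / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    (work / "movies.graph.yml").touch()
    found_chunks = list_chunks(work)
```

Pointing list_chunks at a real repo's work/ directory gives the paths that can then be passed to pq.read_table.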

License

MIT — see LICENSE
