
Composable graph tooling for analysis, construction, and refinement


GraphForge



A lightweight, embedded, openCypher-compatible graph engine for research and investigative workflows




Why GraphForge?

We are not building a database for applications. We are building a graph execution environment for thinking.

Modern data science and ML workflows increasingly produce graph-shaped data — entity relationships extracted by LLMs, citation networks, dependency graphs, social connections, knowledge bases. Working with this data shouldn't require running a database server. GraphForge brings the full expressiveness of the openCypher query language to the Python notebook and script environment: zero configuration, single-file persistence, and first-class Python integration.

                  | NetworkX    | GraphForge          | Neo4j / Memgraph
Setup             | pip install | pip install         | Run a server
Query language    | Python API  | Full openCypher     | Full Cypher
Persistence       | Manual      | SQLite (automatic)  | Native
Notebook-friendly | Yes         | Yes                 | Requires connection
Graph size        | Millions    | Up to ~20M edges†   | Billions
TCK compliance    | N/A         | 100% (3,885/3,885)  | ~100%

Use GraphForge for: knowledge graphs, citation networks, research workflows, LLM output storage, social network analysis in notebooks.

Use a production database for: high throughput, multi-user access, or graphs larger than the limits in the † note below.

† Traversal queries with LIMIT scale to ~20M edges; full-scan aggregations are practical up to ~1M edges.
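A likely reason for the gap between those two limits is how much of the graph each query must touch: a traversal that is consumed lazily can stop as soon as LIMIT is satisfied, while an aggregation such as count() has to visit every edge before returning anything. The sketch below models that difference with plain Python generators over a hypothetical adjacency dict. It illustrates the idea behind a LIMIT short-circuit and is not GraphForge's executor.

```python
from itertools import islice


def traverse(adjacency, counter):
    """Lazily yield (src, dst) edges, counting how many are examined."""
    for src, targets in adjacency.items():
        for dst in targets:
            counter["edges_examined"] += 1
            yield src, dst


def first_k_edges(adjacency, k):
    # LIMIT k: islice stops pulling from the generator after k items.
    counter = {"edges_examined": 0}
    rows = list(islice(traverse(adjacency, counter), k))
    return rows, counter["edges_examined"]


def count_all_edges(adjacency):
    # count(): an aggregation must drain the whole generator.
    counter = {"edges_examined": 0}
    total = sum(1 for _ in traverse(adjacency, counter))
    return total, counter["edges_examined"]


# Hypothetical graph: 1,000 nodes, each with 100 outgoing edges.
graph = {n: [(n + i) % 1000 for i in range(1, 101)] for n in range(1000)}

rows, examined = first_k_edges(graph, 5)
print(len(rows), examined)   # 5 5
total, examined = count_all_edges(graph)
print(total, examined)       # 100000 100000
```

The limited traversal does a constant amount of work regardless of graph size, while the aggregation grows linearly with the edge count.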

v0.3.9 — Performance Release

v0.3.9 delivers substantial performance improvements over v0.3.8: LALR(1) linear-time parsing, O(1) property equality index, LIMIT short-circuit for traversal and UNWIND, bulk ingestion API, SQLite PRAGMA tuning, and elementId(). TCK compliance is maintained at 3,885/3,885 (100%).
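An equality index turns a predicate such as `{name: 'Alice'}` or `WHERE n.name = 'Alice'` from a scan over every node into a single hash lookup. Here is a minimal sketch of that idea, assuming a dict keyed by (label, property, value). The class and method names are hypothetical and are not GraphForge's internal API.

```python
from collections import defaultdict


class PropertyIndex:
    """Hypothetical equality index: (label, key, value) -> set of node ids.

    Caveat: Python treats True == 1 (and hashes them equally), unlike Cypher,
    so a production index would need type-aware keys.
    """

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, node_id, labels, properties):
        for label in labels:
            for key, value in properties.items():
                # Only hashable values can be indexed this way.
                self._index[(label, key, value)].add(node_id)

    def remove(self, node_id, labels, properties):
        for label in labels:
            for key, value in properties.items():
                bucket = self._index.get((label, key, value))
                if bucket is not None:
                    bucket.discard(node_id)
                    if not bucket:
                        del self._index[(label, key, value)]

    def lookup(self, label, key, value):
        # Average O(1): one hash lookup instead of scanning all nodes.
        # .get() avoids creating empty buckets in the defaultdict.
        return set(self._index.get((label, key, value), ()))


idx = PropertyIndex()
idx.add(1, ["Person"], {"name": "Alice", "age": 30})
idx.add(2, ["Person"], {"name": "Bob", "age": 25})
idx.add(3, ["Person", "Employee"], {"name": "Alice", "age": 41})

print(idx.lookup("Person", "name", "Alice"))    # {1, 3}
print(idx.lookup("Employee", "name", "Alice"))  # {3}
```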

See CHANGELOG.md for the full list of changes.


Installation

pip install graphforge
# or
uv add graphforge

Requirements: Python 3.10–3.14

Core dependencies: pydantic>=2.6, lark>=1.1, msgpack>=1.0


Quick Start

In-memory graph

from graphforge import GraphForge

db = GraphForge()

# Create nodes and relationships
db.execute("""
    CREATE (alice:Person {name: 'Alice', age: 30})
    CREATE (bob:Person {name: 'Bob', age: 25})
    CREATE (alice)-[:KNOWS {since: 2020}]->(bob)
""")

# Query the graph
results = db.execute("""
    MATCH (p:Person)-[:KNOWS]->(friend)
    WHERE p.age > 25
    RETURN p.name AS person, friend.name AS friend, p.age AS age
    ORDER BY p.age DESC
""")

for row in results:
    print(f"{row['person'].value} (age {row['age'].value}) knows {row['friend'].value}")

Persistent graph

# Save to SQLite
db = GraphForge("research.db")
db.execute("CREATE (:Paper {title: 'Graph Neural Networks', year: 2024})")
db.close()

# Reload later
db = GraphForge("research.db")
result = db.execute("MATCH (p:Paper) RETURN p.title AS t")
print(result[0]['t'].value)  # Graph Neural Networks

Python builder API

alice = db.create_node(['Person', 'Employee'], name='Alice', age=30)
bob = db.create_node(['Person'], name='Bob', age=25)
db.create_relationship(alice, bob, 'KNOWS', since=2020)

Access result values

Results contain CypherValue objects — use .value to get the Python value:

results = db.execute("MATCH (p:Person) RETURN p.name AS name, p.age AS age")

for row in results:
    name: str = row['name'].value
    age: int  = row['age'].value
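When downstream code (a DataFrame constructor, JSON serialization) expects plain Python values, a small helper can unwrap every column of a row at once. The sketch below assumes only what is documented here, that each cell has a `.value` attribute, plus one extra assumption: that rows behave like dicts with `.items()`. `FakeCypherValue` is a hypothetical stand-in so the example runs without GraphForge installed.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Mapping


@dataclass(frozen=True)
class FakeCypherValue:
    """Hypothetical stand-in for GraphForge's CypherValue (only .value is used)."""
    value: Any


def to_plain(row: Mapping[str, Any]) -> dict[str, Any]:
    """Unwrap every column of one result row into a plain Python value."""
    return {column: cell.value for column, cell in row.items()}


def rows_to_dicts(results: Iterable[Mapping[str, Any]]) -> list[dict[str, Any]]:
    """Convert a whole result set, e.g. for pandas.DataFrame(...) or json.dumps(...)."""
    return [to_plain(row) for row in results]


# Simulated result set shaped like the query above (assumption: rows are dict-like).
results = [
    {"name": FakeCypherValue("Alice"), "age": FakeCypherValue(30)},
    {"name": FakeCypherValue("Bob"), "age": FakeCypherValue(25)},
]
print(rows_to_dicts(results))
# [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]
```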

Cypher Features

GraphForge implements the full openCypher language (100% TCK compliant as of v0.3.8).

Clauses

-- Reading
MATCH (n:Person)-[:KNOWS]->(friend)
WHERE n.age > 25
OPTIONAL MATCH (n)-[:WORKS_AT]->(company)
WITH n, count(DISTINCT friend) AS friends
RETURN n.name, friends
ORDER BY friends DESC
LIMIT 10

-- Writing
CREATE (n:Person {name: 'Alice'})
MERGE (n:Person {name: 'Alice'})
SET n.age = 30
REMOVE n.temp
DELETE n
DETACH DELETE n

-- Iteration
UNWIND [1, 2, 3] AS x
RETURN x * 2 AS doubled

-- Subqueries
MATCH (n) WHERE EXISTS { MATCH (n)-[:KNOWS]->() }
RETURN n

Patterns

(n)                                -- Any node
(n:Person)                         -- Node with label
(n:Person {age: 30})               -- Node with property
(a)-[r:KNOWS]->(b)                 -- Directed relationship
(a)-[r:KNOWS|LIKES]->(b)           -- Multiple types
(a)-[*1..3]->(b)                   -- Variable-length (1 to 3 hops)
(a)-[*]->(b)                       -- Any length
p = (a)-[*]->(b)                   -- Bind path to variable
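Variable-length patterns such as `(a)-[*1..3]->(b)` enumerate every path of 1 to 3 hops, and under Cypher's matching rules a single path never reuses the same relationship, although nodes may repeat. The sketch below models those semantics with a depth-first search over a hypothetical edge list. It illustrates what the pattern matches, not how GraphForge's planner executes it.

```python
from collections import defaultdict


def var_length_paths(edges, start, min_hops, max_hops):
    """Yield node sequences matching (start)-[*min_hops..max_hops]->(b).

    edges: list of (rel_id, src, dst) tuples (hypothetical representation).
    A relationship may appear at most once per path; nodes may repeat.
    """
    out = defaultdict(list)
    for rel_id, src, dst in edges:
        out[src].append((rel_id, dst))

    def dfs(node, nodes, used_rels):
        hops = len(nodes) - 1
        if hops >= min_hops:
            yield tuple(nodes)
        if hops == max_hops:
            return
        for rel_id, nxt in out[node]:
            if rel_id in used_rels:
                continue  # relationship uniqueness within one path
            yield from dfs(nxt, nodes + [nxt], used_rels | {rel_id})

    yield from dfs(start, [start], frozenset())


# A -> B -> C -> A forms a cycle; A -> C is a shortcut.
edges = [("r1", "A", "B"), ("r2", "B", "C"), ("r3", "C", "A"), ("r4", "A", "C")]
for path in var_length_paths(edges, "A", 1, 3):
    print(" -> ".join(path))
```

Note that `A -> C -> A -> B` is matched (node A repeats), but `A -> C -> A -> C` is not, because it would traverse relationship r4 twice.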

Functions

Category    | Functions
String      | toLower, toUpper, trim, split, replace, substring, left, right, reverse, size
Math        | abs, ceil, floor, round, sqrt, pow, exp, log, sin, cos, tan, pi, e
List        | head, tail, last, range, size, reverse, sort, collect, reduce, filter, extract
Aggregation | count, sum, avg, min, max, collect, stDev, percentileDisc
Predicate   | all, any, none, single, exists, isEmpty
Temporal    | date, datetime, localdatetime, time, localtime, duration, now
Spatial     | point, distance
Graph       | id, labels, type, keys, properties, nodes, relationships, startNode, endNode
Conversion  | toInteger, toFloat, toString, toBoolean, coalesce

Temporal types (full precision)

-- Dates, times, datetimes
RETURN date('2024-01-15')
RETURN datetime('2024-01-15T14:30:00[Europe/London]')  -- IANA timezone
RETURN duration('P1Y2M3DT4H5M6.789S')

-- Nanosecond precision
RETURN duration('PT0.000000789S').nanoseconds  -- 789

-- Extreme years (outside Python's 1-9999 range)
RETURN localdatetime('+999999999-12-31T23:59:59')

-- Arithmetic
RETURN date('2024-01-01') + duration('P1M')  -- 2024-02-01
RETURN duration.between(date('2020-01-01'), date('2024-01-01'))
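Python's standard `datetime.date` has no calendar-month arithmetic and stops at year 9999 (`datetime.MAXYEAR`), so values like the `+999999999` year above cannot be mapped onto it directly. The stdlib sketch below shows two pieces of the bookkeeping involved: adding months with end-of-month clamping, and converting a fractional seconds field to integer nanoseconds without float rounding. The clamping rule (2024-01-31 + P1M gives 2024-02-29) matches java.time-style behavior but is an assumption here, not a description of GraphForge's internals.

```python
import calendar
import datetime


def add_months(d: datetime.date, months: int) -> datetime.date:
    """Add calendar months, clamping the day to the target month's length.

    Assumption: mirrors java.time-style clamping, e.g. 2024-01-31 + P1M -> 2024-02-29.
    """
    total = d.year * 12 + (d.month - 1) + months
    year, month = divmod(total, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    # datetime.date raises ValueError past datetime.MAXYEAR (9999).
    return datetime.date(year, month, min(d.day, last_day))


def seconds_field_to_ns(seconds_text: str) -> tuple[int, int]:
    """Split an ISO-8601 seconds field like '6.789' into (seconds, nanoseconds).

    Works on the decimal string directly, so no float rounding is involved;
    digits beyond nanosecond precision are truncated.
    """
    whole, _, frac = seconds_text.partition(".")
    return int(whole), int((frac + "000000000")[:9])


print(add_months(datetime.date(2024, 1, 1), 1))    # 2024-02-01
print(add_months(datetime.date(2024, 1, 31), 1))   # 2024-02-29
print(seconds_field_to_ns("0.000000789"))          # (0, 789)
print(datetime.MAXYEAR)                            # 9999
```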

Datasets

Load any of 100+ pre-registered real-world graphs with a single call (each dataset is downloaded on first use and cached):

from graphforge import GraphForge
from graphforge.datasets import load_dataset, list_datasets

db = GraphForge()

# Load any pre-registered dataset (auto-downloads and caches)
load_dataset(db, "snap-ego-facebook")   # Facebook ego networks (SNAP)
load_dataset(db, "ldbc-snb-sf0.1")      # Social network benchmark (LDBC)
load_dataset(db, "netrepo-karate")      # Karate club (NetworkRepository)

# Browse available datasets
for ds in list_datasets(source="snap")[:3]:
    print(f"{ds.name}: {ds.nodes:,} nodes, {ds.edges:,} edges")

# Analyze immediately
results = db.execute("""
    MATCH (n)-[r]->()
    RETURN n.id AS user, count(r) AS degree
    ORDER BY degree DESC LIMIT 5
""")

Available sources:

  • SNAP (Stanford): 95 social, web, email, citation, and collaboration networks
  • LDBC: 10 social network benchmark datasets with temporal data
  • NetworkRepository: 10 pre-registered datasets
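The degree query in the dataset example counts outgoing relationships per node, since `MATCH (n)-[r]->()` only follows edges in their stored direction. For intuition, or to cross-check results, the same top-k out-degree computation over a plain edge list looks like this. The edge list is a made-up sample, not one of the registered datasets.

```python
from collections import Counter


def top_out_degree(edges, k=5):
    """Return the k nodes with the most outgoing edges, highest first.

    Mirrors: MATCH (n)-[r]->() RETURN n.id AS user, count(r) AS degree
             ORDER BY degree DESC LIMIT k
    Ties are broken by node id here so the output is deterministic;
    Cypher's ORDER BY leaves the order of ties unspecified.
    """
    degree = Counter(src for src, _dst in edges)
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical sample edge list of (source_id, target_id) pairs.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 1), (5, 1)]
for user, degree in top_out_degree(edges, k=3):
    print(user, degree)
```

As in the Cypher version, nodes with no outgoing edges never appear in the result.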

Transactions

db = GraphForge("graph.db")

db.begin()
try:
    db.execute("MATCH (p:Person {id: 123}) SET p.status = 'inactive'")
    db.execute("CREATE (:AuditLog {action: 'deactivate', user_id: 123})")
    db.commit()
except Exception:
    db.rollback()
    raise
finally:
    db.close()

Architecture

GraphForge is built in four independent layers:

┌─────────────────────────────────────────────────┐
│  Parser         cypher.lark + parser.py         │  Cypher → AST
├─────────────────────────────────────────────────┤
│  Planner        planner.py + operators.py       │  AST → Logical plan
├─────────────────────────────────────────────────┤
│  Executor       executor.py + evaluator.py      │  Plan → Results
├─────────────────────────────────────────────────┤
│  Storage        memory.py + sqlite_backend.py   │  In-memory + SQLite
└─────────────────────────────────────────────────┘

Storage uses MessagePack for efficient binary encoding of graph properties. Persistence is a single SQLite file with WAL mode for durability.
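To make the storage description concrete, here is a minimal sketch of single-file SQLite persistence with WAL journaling and a batched insert, using only the standard library. The table layout is hypothetical (GraphForge's actual schema is not documented here), the `synchronous=NORMAL` pragma is a common pairing with WAL rather than a known GraphForge setting, and `json` stands in for MessagePack so the example needs no third-party packages.

```python
import json
import os
import sqlite3
import tempfile


def open_store(path):
    """Open (or create) a single-file graph store in WAL mode."""
    conn = sqlite3.connect(path)
    # journal_mode returns the mode actually in effect ('wal' for file databases).
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    # Illustrative tuning choice, not GraphForge's documented setting.
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nodes ("
        " id INTEGER PRIMARY KEY, labels TEXT NOT NULL, props BLOB NOT NULL)"
    )
    return conn, mode


def bulk_insert_nodes(conn, nodes):
    """Insert many (id, labels, props) rows in a single transaction."""
    rows = [
        # json stands in for MessagePack: properties are stored as a binary blob.
        (node_id, json.dumps(labels), json.dumps(props).encode("utf-8"))
        for node_id, labels, props in nodes
    ]
    with conn:  # commits once at the end, or rolls back on error
        conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", rows)


with tempfile.TemporaryDirectory() as tmp:
    conn, mode = open_store(os.path.join(tmp, "graph.db"))
    bulk_insert_nodes(conn, [(i, ["Person"], {"n": i}) for i in range(1000)])
    count = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
    conn.close()
    print(mode, count)  # wal 1000
```

Batching all rows into one transaction avoids a separate commit (and fsync) per node, which is the usual motivation for a bulk ingestion path.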


Development

# Install with dev dependencies
uv sync --dev

# Run all checks (mirrors CI)
make pre-push

# Run tests
uv run pytest tests/unit tests/integration
uv run pytest tests/tck/ -n auto   # Full TCK (3,885 scenarios)

# Coverage
make coverage

Roadmap

Version | Focus | Status
v0.3.8  | Full TCK compliance (3,885/3,885) | Released
v0.3.9  | Performance: LALR parser, property indexes, bulk ingest, SQLite tuning, LIMIT short-circuit | Released
v0.3.10 | Analytics integration: NetworkX/igraph export, parse/plan cache | Planned
v0.4.0  | Native SNA algorithms: PageRank, betweenness, WCC, shortest path via CALL gf.algo.* | Planned
v1.0    | Production-ready: thread safety, large graph support | Future

See CHANGELOG.md for full release history.


License

MIT © David Spencer — see LICENSE for details.

Built on Lark, Pydantic, MessagePack, and the openCypher specification.



Download files


Source Distribution

graphforge-0.3.10.tar.gz (1.4 MB)

Built Distribution


graphforge-0.3.10-py3-none-any.whl (262.0 kB)

File details

Details for the file graphforge-0.3.10.tar.gz.

File metadata

  • Download URL: graphforge-0.3.10.tar.gz
  • Size: 1.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.9 (CI, Ubuntu 24.04)

File hashes

Hashes for graphforge-0.3.10.tar.gz
Algorithm Hash digest
SHA256 5a6e667665855d061c2648d902c909367d1214c15d233120045bcc39d17acafc
MD5 6e1083d2d862b8d4d02cf362ed2c9df6
BLAKE2b-256 769d99524dc368265c32a4d93cfd0c06b8255fb0f33a8e88743f498d5562056f


File details

Details for the file graphforge-0.3.10-py3-none-any.whl.

File metadata

  • Download URL: graphforge-0.3.10-py3-none-any.whl
  • Size: 262.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.9 (CI, Ubuntu 24.04)

File hashes

Hashes for graphforge-0.3.10-py3-none-any.whl
Algorithm Hash digest
SHA256 e717ac07131b23df94194cfa58fbde34964e5a3543d5ba51276e2560c2211e87
MD5 d6c18885e28042e373f0e0837b85bf28
BLAKE2b-256 145cdd79acd42d7dbf3319c1bfa6cd51ba71abde3fcffa7a7704daf29cd17df4

