GraphForge
Composable graph tooling for analysis, construction, and refinement
A lightweight, embedded, openCypher-compatible graph engine for research and investigative workflows
Table of Contents
- Why GraphForge?
- Installation
- Quick Start
- Cypher Features
- Datasets
- Transactions
- Architecture
- Development
- Roadmap
- License
Why GraphForge?
Modern data science and ML workflows increasingly produce graph-shaped data — entity relationships extracted by LLMs, citation networks, dependency graphs, social connections, knowledge bases. Working with this data shouldn't require running a database server. GraphForge brings the full expressiveness of the openCypher query language to the Python notebook and script environment: zero configuration, single-file persistence, and first-class Python integration.
| | NetworkX | GraphForge | Neo4j / Memgraph |
|---|---|---|---|
| Setup | pip install | pip install | Run a server |
| Query language | Python API | Full openCypher | Full Cypher |
| Persistence | Manual | SQLite (automatic) | Native |
| Notebook-friendly | ✓ | ✓ | Requires connection |
| Graph size | Millions | < 10M nodes | Billions |
| TCK compliance | N/A | 100% (3,885/3,885) | ~100% |
Use GraphForge for: knowledge graphs, citation networks, research workflows, LLM output storage, social network analysis in notebooks.
Use a production database for: high throughput, multi-user access, graphs > 10M nodes.
We are not building a database for applications. We are building a graph execution environment for thinking.
v0.3.8 — Full TCK Compliance
As of v0.3.8, GraphForge passes all 3,885 openCypher TCK scenarios with zero failures and zero expected failures. This is the first embedded Python graph database to achieve complete openCypher TCK compliance.
Installation
pip install graphforge
# or
uv add graphforge
Requirements: Python 3.10–3.14
Core dependencies: pydantic>=2.6, lark>=1.1, msgpack>=1.0
Quick Start
In-memory graph
from graphforge import GraphForge
db = GraphForge()
# Create nodes and relationships
db.execute("""
CREATE (alice:Person {name: 'Alice', age: 30})
CREATE (bob:Person {name: 'Bob', age: 25})
CREATE (alice)-[:KNOWS {since: 2020}]->(bob)
""")
# Query the graph
results = db.execute("""
MATCH (p:Person)-[:KNOWS]->(friend)
WHERE p.age > 25
RETURN p.name AS person, friend.name AS friend, p.age AS age
ORDER BY p.age DESC
""")
for row in results:
    print(f"{row['person'].value} (age {row['age'].value}) knows {row['friend'].value}")
Persistent graph
# Save to SQLite
db = GraphForge("research.db")
db.execute("CREATE (:Paper {title: 'Graph Neural Networks', year: 2024})")
db.close()
# Reload later
db = GraphForge("research.db")
result = db.execute("MATCH (p:Paper) RETURN p.title AS t")
print(result[0]['t'].value) # Graph Neural Networks
Python builder API
alice = db.create_node(['Person', 'Employee'], name='Alice', age=30)
bob = db.create_node(['Person'], name='Bob', age=25)
db.create_relationship(alice, bob, 'KNOWS', since=2020)
Access result values
Results contain CypherValue objects — use .value to get the Python value:
results = db.execute("MATCH (p:Person) RETURN p.name AS name, p.age AS age")
for row in results:
    name: str = row['name'].value
    age: int = row['age'].value
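Because every cell is wrapped, it can be handy to unwrap whole rows before passing them to JSON or a DataFrame. The helper below is a minimal sketch and not part of GraphForge: `unwrap_rows` is a hypothetical name, and it assumes each row behaves like a mapping with `.items()`, which the examples above only suggest through `row['name']` indexing. `FakeValue` stands in for a CypherValue so the snippet runs without a database.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Mapping


@dataclass
class FakeValue:
    """Stand-in for a CypherValue: only the `.value` attribute is modelled."""
    value: Any


def unwrap_rows(rows: Iterable[Mapping[str, Any]]) -> list[dict[str, Any]]:
    """Convert rows of wrapped values into plain dicts (hypothetical helper).

    Cells without a `.value` attribute are passed through unchanged.
    """
    return [
        {column: getattr(cell, "value", cell) for column, cell in row.items()}
        for row in rows
    ]


# Fake rows shaped like the result of the query above.
rows = [
    {"name": FakeValue("Alice"), "age": FakeValue(30)},
    {"name": FakeValue("Bob"), "age": FakeValue(25)},
]
plain = unwrap_rows(rows)
print(plain)
```

If real result rows turn out not to expose `.items()`, the same idea works by iterating over the known column names instead.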
Cypher Features
GraphForge implements the full openCypher language (100% TCK compliant as of v0.3.8).
Clauses
// Reading
MATCH (n:Person)-[:KNOWS]->(friend)
OPTIONAL MATCH (n)-[:WORKS_AT]->(company)
WHERE n.age > 25
WITH n, count(friend) AS friends
RETURN n.name, friends
ORDER BY friends DESC
LIMIT 10
// Writing
CREATE (n:Person {name: 'Alice'})
MERGE (n:Person {name: 'Alice'})
SET n.age = 30
REMOVE n.temp
DELETE n
DETACH DELETE n
// Iteration
UNWIND [1, 2, 3] AS x
RETURN x * 2 AS doubled
// Subqueries
MATCH (n) WHERE EXISTS { MATCH (n)-[:KNOWS]->() }
RETURN n
Patterns
(n) // Any node
(n:Person) // Node with label
(n:Person {age: 30}) // Node with property
(a)-[r:KNOWS]->(b) // Directed relationship
(a)-[r:KNOWS|LIKES]->(b) // Multiple types
(a)-[*1..3]->(b) // Variable-length (1 to 3 hops)
(a)-[*]->(b) // Any length
p = (a)-[*]->(b) // Bind path to variable
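To make the variable-length pattern concrete, here is a plain-Python sketch of what `(a)-[*1..3]->(b)` enumerates on a toy edge list. It is an illustration of the matching semantics, not GraphForge's executor: `variable_length_paths` and the sample edges are made up for this example. It follows the openCypher rule that a relationship is not traversed twice within one matched pattern, while nodes may repeat.

```python
from collections import defaultdict


def variable_length_paths(edges, start, min_hops=1, max_hops=3):
    """Enumerate directed paths from `start` with min_hops..max_hops relationships.

    `edges` is a list of (source, target) pairs. Each edge index is used at
    most once per path, so relationships never repeat within a path.
    """
    outgoing = defaultdict(list)
    for index, (source, target) in enumerate(edges):
        outgoing[source].append((index, target))

    found = []

    def walk(node, path, used):
        # Record the path once it is long enough to satisfy the lower bound.
        if len(used) >= min_hops:
            found.append(tuple(path))
        # Stop expanding at the upper bound.
        if len(used) == max_hops:
            return
        for index, target in outgoing[node]:
            if index not in used:
                walk(target, path + [target], used | {index})

    walk(start, [start], frozenset())
    return found


# Toy graph: a chain a->b->c->d->e plus a back edge b->a.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("b", "a")]
paths = variable_length_paths(edges, "a", 1, 3)
for path in paths:
    print(" -> ".join(path))
```

Node `e` is four hops from `a`, so it falls outside the `1..3` bound, while the cycle `a -> b -> a` is a valid two-hop match because it uses two different relationships.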
Functions
| Category | Functions |
|---|---|
| String | toLower, toUpper, trim, split, replace, substring, left, right, reverse, size |
| Math | abs, ceil, floor, round, sqrt, pow, exp, log, sin, cos, tan, pi, e |
| List | head, tail, last, range, size, reverse, sort, collect, reduce, filter, extract |
| Aggregation | count, sum, avg, min, max, collect, stDev, percentileDisc |
| Predicate | all, any, none, single, exists, isEmpty |
| Temporal | date, datetime, localDatetime, time, localtime, duration, now |
| Spatial | point, distance |
| Graph | id, labels, type, keys, properties, nodes, relationships, startNode, endNode |
| Conversion | toInteger, toFloat, toString, toBoolean, coalesce |
Temporal types (full precision)
// Dates, times, datetimes
RETURN date('2024-01-15')
RETURN datetime('2024-01-15T14:30:00[Europe/London]') // IANA timezone
RETURN duration('P1Y2M3DT4H5M6.789S')
// Nanosecond precision
RETURN duration('PT0.000000789S').nanoseconds // 789
// Extreme years (outside Python's 1–9999 range)
RETURN localdatetime('+999999999-12-31T23:59:59')
// Arithmetic
RETURN date('2024-01-01') + duration('P1M') // 2024-02-01
RETURN duration.between(date('2020-01-01'), date('2024-01-01'))
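The nanosecond and extreme-year cases matter because Python's standard `datetime` module cannot represent them directly. The snippet below shows those limits and one integer-only way to keep nanoseconds when parsing a seconds field. How GraphForge stores temporal values internally is not described here; `split_seconds` is a hypothetical helper written only for this illustration.

```python
import datetime

# The standard library caps years at 1..9999 and durations at microseconds.
print(datetime.MINYEAR, datetime.MAXYEAR)
print(datetime.timedelta.resolution)

try:
    datetime.date(10000, 1, 1)
    year_supported = True
except ValueError:
    # Years beyond MAXYEAR are rejected outright.
    year_supported = False


def split_seconds(text: str) -> tuple[int, int]:
    """Split a decimal seconds string into (whole seconds, nanoseconds).

    Uses integer arithmetic only, so no precision is lost to floats.
    Digits beyond the ninth decimal place are truncated.
    """
    whole, _, frac = text.partition(".")
    nanos = int(frac.ljust(9, "0")[:9]) if frac else 0
    return int(whole), nanos


print(split_seconds("0.000000789"))
print(split_seconds("6.789"))
```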
Datasets
Load more than 100 real-world graphs with a single call (each dataset is downloaded on first use, then cached):
from graphforge import GraphForge
from graphforge.datasets import load_dataset, list_datasets
db = GraphForge()
# Load any pre-registered dataset (auto-downloads and caches)
load_dataset(db, "snap-ego-facebook") # Facebook ego networks (SNAP)
load_dataset(db, "ldbc-snb-sf0.1") # Social network benchmark (LDBC)
load_dataset(db, "netrepo-karate") # Karate club (NetworkRepository)
# Load from URL directly
load_dataset(db, "https://nrvis.com/download/data/labeled/karate.zip")
# Browse available datasets
for ds in list_datasets(source="snap")[:3]:
    print(f"{ds.name}: {ds.nodes:,} nodes, {ds.edges:,} edges")
# Analyze immediately
results = db.execute("""
MATCH (n)-[r]->()
RETURN n.id AS user, count(r) AS degree
ORDER BY degree DESC LIMIT 5
""")
Available sources:
- SNAP (Stanford): 95 social, web, email, citation, and collaboration networks
- LDBC: 10 social network benchmark datasets with temporal data
- NetworkRepository: 10 pre-registered + thousands via direct URL
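For readers new to Cypher aggregation, the degree query in the example above (`count(r)` grouped by the source node, ordered descending, limited to 5) computes the same thing as the following plain-Python sketch. The edge list is made up for illustration and is not taken from any of the datasets.

```python
from collections import Counter

# Toy edge list of (source_id, target_id) pairs; invented for this example.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 1), (3, 4), (4, 1)]

# count(r) grouped by the source node is the out-degree of each node.
out_degree = Counter(source for source, _ in edges)

# ORDER BY degree DESC LIMIT 5
top = out_degree.most_common(5)
print(top)
```

As with the `MATCH (n)-[r]->()` pattern, nodes that have no outgoing relationships do not appear in the result at all.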
Transactions
db = GraphForge("graph.db")
db.begin()
try:
    db.execute("MATCH (p:Person {id: 123}) SET p.status = 'inactive'")
    db.execute("CREATE (:AuditLog {action: 'deactivate', user_id: 123})")
    db.commit()
except Exception:
    db.rollback()
    raise
finally:
    db.close()
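The begin/commit/rollback pattern above can be wrapped in a context manager so it is harder to forget the rollback path. This is a sketch under stated assumptions: `transaction` is a hypothetical helper, not a GraphForge API, and it relies only on the `begin()`, `commit()`, and `rollback()` methods shown above. `RecordingDB` is a stand-in that records calls so the example runs without a database.

```python
from contextlib import contextmanager


@contextmanager
def transaction(db):
    """Commit on success, roll back on any exception (hypothetical helper)."""
    db.begin()
    try:
        yield db
    except Exception:
        db.rollback()
        raise
    else:
        db.commit()


class RecordingDB:
    """Stand-in object that records calls instead of touching a real graph."""

    def __init__(self):
        self.calls = []

    def begin(self):
        self.calls.append("begin")

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")

    def execute(self, query):
        self.calls.append("execute")
        if "FAIL" in query:
            raise ValueError("simulated query failure")


# Successful block: begin, execute, commit.
ok_db = RecordingDB()
with transaction(ok_db) as tx:
    tx.execute("CREATE (:AuditLog {action: 'deactivate'})")

# Failing block: begin, execute, rollback, and the error propagates.
bad_db = RecordingDB()
try:
    with transaction(bad_db) as tx:
        tx.execute("FAIL")
except ValueError:
    pass

print(ok_db.calls)
print(bad_db.calls)
```

Closing the database is deliberately left to the caller, since one connection typically spans many transactions.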
Architecture
GraphForge is built in four independent layers:
┌─────────────────────────────────────────────────┐
│ Parser cypher.lark + parser.py │ Cypher → AST
├─────────────────────────────────────────────────┤
│ Planner planner.py + operators.py │ AST → Logical plan
├─────────────────────────────────────────────────┤
│ Executor executor.py + evaluator.py │ Plan → Results
├─────────────────────────────────────────────────┤
│ Storage memory.py + sqlite_backend.py │ In-memory + SQLite
└─────────────────────────────────────────────────┘
Storage uses MessagePack for efficient binary encoding of graph properties. Persistence is a single SQLite file with WAL mode for durability.
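To illustrate the storage idea with the standard library only, the sketch below opens a file-backed SQLite database, switches it to WAL mode, and round-trips a property map stored as an opaque blob. The `nodes` table and its columns are a hypothetical schema, not GraphForge's actual layout, and JSON bytes stand in for MessagePack so that no third-party package is needed.

```python
import json
import os
import sqlite3
import tempfile

# Hypothetical schema: one row per node, properties stored as an opaque blob.
db_path = os.path.join(tempfile.mkdtemp(), "sketch.db")
conn = sqlite3.connect(db_path)

# WAL applies to file-backed databases; the pragma returns the active mode.
journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, props BLOB NOT NULL)")

# JSON bytes stand in for MessagePack here (assumption for this sketch).
props = {"name": "Alice", "age": 30}
encoded = json.dumps(props).encode("utf-8")
with conn:  # commits on success, rolls back on error
    conn.execute("INSERT INTO nodes (id, props) VALUES (?, ?)", (1, encoded))

blob = conn.execute("SELECT props FROM nodes WHERE id = ?", (1,)).fetchone()[0]
restored = json.loads(blob.decode("utf-8"))
conn.close()

print(journal_mode, restored)
```

Note that while a WAL-mode database is open, SQLite keeps `-wal` and `-shm` sidecar files next to the main file; they are normally cleaned up when the last connection closes.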
Development
# Install with dev dependencies
uv sync --dev
# Run all checks (mirrors CI)
make pre-push
# Run tests
uv run pytest tests/unit tests/integration
uv run pytest tests/tck/ -n auto # Full TCK (3,885 scenarios)
# Coverage
make coverage
Roadmap
| Version | Focus | Status |
|---|---|---|
| v0.3.8 | Full TCK compliance (3,885/3,885) | Released |
| v0.3.9 | Performance: LALR parser, property indexes, SQLite tuning | Planned |
| v0.4.0 | Analytics: NetworkX/igraph integration, SNA algorithms | Planned |
| v1.0 | Production-ready: thread safety, large graph support | Future |
See CHANGELOG.md for full release history.
License
MIT © David Spencer — see LICENSE for details.
Built on Lark, Pydantic, MessagePack, and the openCypher specification.