
GraphForge


Composable graph tooling for analysis, construction, and refinement

A lightweight, embedded, openCypher-compatible graph engine for research and investigative workflows


Why GraphForge?

We are not building a database for applications. We are building a graph execution environment for thinking.

Modern data science and ML workflows increasingly produce graph-shaped data — entity relationships extracted by LLMs, citation networks, dependency graphs, social connections, knowledge bases. Working with this data shouldn't require running a database server. GraphForge brings the full expressiveness of the openCypher query language to the Python notebook and script environment: zero configuration, single-file persistence, and first-class Python integration.

                     NetworkX       GraphForge             Neo4j / Memgraph
Setup                pip install    pip install            Run a server
Query language       Python API     Full openCypher        Full Cypher
Persistence          Manual         SQLite (automatic)     Native
Notebook-friendly    Yes            Yes                    Requires connection
Graph size           Millions       up to ~20M edges†      Billions
TCK compliance       N/A            100% (3,885/3,885)     ~100%

Use GraphForge for: knowledge graphs, citation networks, research workflows, LLM output storage, social network analysis in notebooks.

Use a production database for: high throughput, multi-user access, or graphs beyond the limits in Scale Limits.

† Traversal queries with LIMIT scale to ~20M edges; full-scan aggregations are practical up to ~1M edges.

v0.4.0 — Three-Surface API

v0.4.0 ships two new API surfaces alongside the existing Cypher executor, plus a recipes module:

  • db.gds — 8 compiled graph algorithms (PageRank, betweenness, Louvain, triangle count, and more) dispatched to igraph or NetworkX. Results write back to node properties and are immediately queryable via Cypher.
  • db.search — hybrid retrieval combining FTS5 text search and vector cosine similarity via RRF fusion. Returns SearchHit objects with score provenance; every result is addressable in db.execute().
  • graphforge.recipes — composable helper functions; neighbourhood() builds n-hop context for LLM prompts.
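The RRF fusion step in db.search can be sketched in a few lines of plain Python. This is an illustrative sketch, not GraphForge's implementation; the k=60 constant (from the original RRF formulation) and the example node ids are assumptions.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked lists of ids.

    Each id's fused score is sum(1 / (k + rank)) over the lists it
    appears in, where rank is 1-based. Ids high in any list float up.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Text search and vector search each return a ranked list of node ids;
# fusion rewards ids that both signals rank highly.
fts_hits = ["n3", "n1", "n7"]
vec_hits = ["n1", "n9", "n3"]
fused = rrf_fuse([fts_hits, vec_hits])  # ["n1", "n3", "n9", "n7"]
```

Here "n1" wins overall despite topping only the vector list, because it appears near the top of both rankings.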

See CHANGELOG.md for the full list of changes.


Installation

pip install graphforge
# or
uv add graphforge

Requirements: Python 3.10–3.14

Core dependencies: pydantic>=2.6, lark>=1.1, msgpack>=1.0


Quick Start

In-memory graph

from graphforge import GraphForge

db = GraphForge()

# Create nodes and relationships
db.execute("""
    CREATE (alice:Person {name: 'Alice', age: 30})
    CREATE (bob:Person {name: 'Bob', age: 25})
    CREATE (alice)-[:KNOWS {since: 2020}]->(bob)
""")

# Query the graph
results = db.execute("""
    MATCH (p:Person)-[:KNOWS]->(friend)
    WHERE p.age > 25
    RETURN p.name AS person, friend.name AS friend, p.age AS age
    ORDER BY p.age DESC
""")

for row in results:
    print(f"{row['person'].value} (age {row['age'].value}) knows {row['friend'].value}")

Persistent graph

# Save to SQLite
db = GraphForge("research.db")
db.execute("CREATE (:Paper {title: 'Graph Neural Networks', year: 2024})")
db.close()

# Reload later
db = GraphForge("research.db")
result = db.execute("MATCH (p:Paper) RETURN p.title AS t")
print(result[0]['t'].value)  # Graph Neural Networks

Python builder API

alice = db.create_node(['Person', 'Employee'], name='Alice', age=30)
bob = db.create_node(['Person'], name='Bob', age=25)
db.create_relationship(alice, bob, 'KNOWS', since=2020)

Graph algorithms

# Compute PageRank and write scores back to nodes
db.gds.pagerank(write_property="rank")

# Query the written scores via Cypher
top = db.execute("MATCH (n) RETURN n.name, n.rank ORDER BY n.rank DESC LIMIT 5")

# Stream mode — returns dict[node_id, score] without mutating the graph
bc = db.gds.betweenness_centrality()
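For intuition, write-back PageRank computes something like the power iteration below. This is a conceptual sketch over an adjacency dict; GraphForge dispatches the real computation to igraph or NetworkX.

```python
def pagerank(adj, damping=0.85, iterations=50):
    """Power-iteration PageRank over {node: [successor, ...]}."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, succs in adj.items():
            if succs:
                # Each node shares its damped rank among its successors
                share = damping * rank[v] / len(succs)
                for w in succs:
                    new[w] += share
            else:
                # Dangling node: spread its rank uniformly
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

Node "c" ends up ranked highest here: it receives links from both "a" and "b", while "b" receives only half of "a"'s share.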

Hybrid search

db = GraphForge("research.db")

# Index node text for full-text search
db.search.index_all(node_label="Paper", properties=["title", "abstract"])

# Store a precomputed embedding (bring your own model)
db.search.set_node_vector(node_id, embedding, space="text-embedding-3-small")

# Hybrid retrieval — text + vector signals fused via RRF
results = db.search("graph neural networks", vector=query_embedding, top_k=10)
for hit in results:
    print(hit.ref.properties["title"].value, hit.score, hit.sources)
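The vector half of the hybrid score is cosine similarity between the query embedding and each stored node vector. A minimal pure-Python sketch (the actual index implementation may differ):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Rank stored node vectors against a query embedding
stored = {"n1": [1.0, 0.0], "n2": [0.6, 0.8], "n3": [0.0, 1.0]}
query = [1.0, 0.1]
ranked = sorted(stored, key=lambda nid: cosine_similarity(query, stored[nid]),
                reverse=True)  # ["n1", "n2", "n3"]
```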

Access result values

Results contain CypherValue objects — use .value to get the Python value:

results = db.execute("MATCH (p:Person) RETURN p.name AS name, p.age AS age")

for row in results:
    name: str = row['name'].value
    age: int  = row['age'].value

Cypher Features

GraphForge implements the full openCypher language (100% TCK compliant as of v0.3.8).

Clauses

// Reading
MATCH (n:Person)-[:KNOWS]->(friend)
OPTIONAL MATCH (n)-[:WORKS_AT]->(company)
WHERE n.age > 25
WITH n, count(friend) AS friends
RETURN n.name, friends
ORDER BY friends DESC
LIMIT 10

// Writing
CREATE (n:Person {name: 'Alice'})
MERGE (n:Person {name: 'Alice'})
SET n.age = 30
REMOVE n.temp
DELETE n
DETACH DELETE n

// Iteration
UNWIND [1, 2, 3] AS x
RETURN x * 2 AS doubled

// Subqueries
MATCH (n) WHERE EXISTS { MATCH (n)-[:KNOWS]->() }
RETURN n

Patterns

(n)                                // Any node
(n:Person)                         // Node with label
(n:Person {age: 30})               // Node with property
(a)-[r:KNOWS]->(b)                 // Directed relationship
(a)-[r:KNOWS|LIKES]->(b)           // Multiple types
(a)-[*1..3]->(b)                   // Variable-length (1 to 3 hops)
(a)-[*]->(b)                       // Any length
p = (a)-[*]->(b)                   // Bind path to variable
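A variable-length pattern like (a)-[*1..3]->(b) is conceptually a bounded breadth-first search from the start node. An illustrative sketch over an adjacency dict (not GraphForge's executor, which also enforces Cypher's relationship-uniqueness rules):

```python
from collections import deque

def reachable(adj, start, min_hops=1, max_hops=3):
    """Nodes reachable from `start` in min_hops..max_hops directed hops."""
    found = set()
    frontier = deque([(start, 0)])
    seen = {(start, 0)}
    while frontier:
        node, depth = frontier.popleft()
        if min_hops <= depth <= max_hops:
            found.add(node)
        if depth < max_hops:
            for nxt in adj.get(node, []):
                if (nxt, depth + 1) not in seen:
                    seen.add((nxt, depth + 1))
                    frontier.append((nxt, depth + 1))
    return found

chain = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}
hits = reachable(chain, "a")  # {"b", "c", "d"} -- "e" is 4 hops away
```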

Functions

Category Functions
String toLower, toUpper, trim, split, replace, substring, left, right, reverse, size
Math abs, ceil, floor, round, sqrt, pow, exp, log, sin, cos, tan, pi, e
List head, tail, last, range, size, reverse, sort, collect, reduce, filter, extract
Aggregation count, sum, avg, min, max, collect, stDev, percentileDisc
Predicate all, any, none, single, exists, isEmpty
Temporal date, datetime, localDatetime, time, localtime, duration, now
Spatial point, distance
Graph id, labels, type, keys, properties, nodes, relationships, startNode, endNode
Conversion toInteger, toFloat, toString, toBoolean, coalesce

Temporal types (full precision)

// Dates, times, datetimes
RETURN date('2024-01-15')
RETURN datetime('2024-01-15T14:30:00[Europe/London]')  // IANA timezone
RETURN duration('P1Y2M3DT4H5M6.789S')

// Nanosecond precision
RETURN duration('PT0.000000789S').nanoseconds  // 789

// Extreme years (outside Python's 1-9999 range)
RETURN localdatetime('+999999999-12-31T23:59:59')

// Arithmetic
RETURN date('2024-01-01') + duration('P1M')  // 2024-02-01
RETURN duration.between(date('2020-01-01'), date('2024-01-01'))
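The duration literals above follow ISO 8601 designators: P introduces the calendar part (years, months, days) and T the time part (hours, minutes, seconds). A rough pure-Python parser showing how such a string decomposes; this is a sketch of the format only, not how GraphForge's temporal types are implemented.

```python
import re

# Hypothetical simplified pattern: positive components only, no weeks
ISO_DURATION = re.compile(
    r"P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)

def parse_duration(text):
    """Split an ISO 8601 duration string into named numeric components."""
    m = ISO_DURATION.fullmatch(text)
    if m is None:
        raise ValueError(f"not an ISO 8601 duration: {text!r}")
    return {k: float(v) for k, v in m.groupdict().items() if v is not None}

parts = parse_duration("P1Y2M3DT4H5M6.789S")
# {'years': 1.0, 'months': 2.0, 'days': 3.0,
#  'hours': 4.0, 'minutes': 5.0, 'seconds': 6.789}
```

Note the same letter M means months before the T and minutes after it, which is why the T separator is mandatory for time components.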

Datasets

Load 100+ real-world graphs instantly:

from graphforge import GraphForge
from graphforge.datasets import load_dataset, list_datasets

db = GraphForge()

# Load any pre-registered dataset (auto-downloads and caches)
load_dataset(db, "snap-ego-facebook")   # Facebook ego networks (SNAP)
load_dataset(db, "ldbc-snb-sf0.1")      # Social network benchmark (LDBC)
load_dataset(db, "netrepo-karate")      # Karate club (NetworkRepository)

# Browse available datasets
for ds in list_datasets(source="snap")[:3]:
    print(f"{ds.name}: {ds.nodes:,} nodes, {ds.edges:,} edges")

# Analyze immediately
results = db.execute("""
    MATCH (n)-[r]->()
    RETURN n.id AS user, count(r) AS degree
    ORDER BY degree DESC LIMIT 5
""")

Available sources:

  • SNAP (Stanford): 95 social, web, email, citation, and collaboration networks
  • LDBC: 10 social network benchmark datasets with temporal data
  • NetworkRepository: 10 pre-registered datasets

Transactions

db = GraphForge("graph.db")

db.begin()
try:
    db.execute("MATCH (p:Person {id: 123}) SET p.status = 'inactive'")
    db.execute("CREATE (:AuditLog {action: 'deactivate', user_id: 123})")
    db.commit()
except Exception:
    db.rollback()
    raise
finally:
    db.close()
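The begin/try/commit/rollback pattern above can be wrapped in a context manager so the rollback path is never forgotten. A hedged sketch: `transaction` here is a hypothetical helper, not a GraphForge API; it works with any object exposing the begin(), commit(), and rollback() methods shown above.

```python
from contextlib import contextmanager

@contextmanager
def transaction(db):
    """Run a block inside begin/commit, rolling back on any exception."""
    db.begin()
    try:
        yield db
        db.commit()
    except Exception:
        db.rollback()
        raise

# Usage with a GraphForge-style connection:
# with transaction(db):
#     db.execute("MATCH (p:Person {id: 123}) SET p.status = 'inactive'")
#     db.execute("CREATE (:AuditLog {action: 'deactivate', user_id: 123})")
```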

Architecture

GraphForge exposes three independent API surfaces over a shared storage layer:

db.execute("MATCH ...")    →  Cypher path   (Parser → Planner → Executor → Storage)
db.gds.pagerank(...)       →  Algorithm path (export → compiled backend → write-back)
db.search.fts(...)         →  Search path   (SQLite FTS5 / vector index → NodeRef list)

The Cypher path is four independent layers:

┌─────────────────────────────────────────────────┐
│  Parser         cypher.lark + parser.py         │  Cypher → AST
├─────────────────────────────────────────────────┤
│  Planner        planner.py + operators.py       │  AST → Logical plan
├─────────────────────────────────────────────────┤
│  Executor       executor.py + evaluator.py      │  Plan → Results
├─────────────────────────────────────────────────┤
│  Storage        memory.py + sqlite_backend.py   │  In-memory + SQLite WAL
└─────────────────────────────────────────────────┘

The algorithm and search paths bypass the Cypher executor entirely — db.gds and db.search are Python-method surfaces, not Cypher extensions. Storage uses MessagePack for efficient binary encoding of graph properties.


Development

# Install with dev dependencies
uv sync --dev

# Run all checks (mirrors CI)
make pre-push

# Run tests
uv run pytest tests/unit tests/integration
uv run pytest tests/tck/ -n auto   # Full TCK (3,885 scenarios)

# Coverage
make coverage

Roadmap

Version   Focus                                                                                         Status
v0.3.8    Full TCK compliance (3,885/3,885)                                                             Released
v0.3.9    Performance: LALR parser, property indexes, bulk ingest, SQLite tuning, LIMIT short-circuit   Released
v0.3.10   Analytics integration: NetworkX/igraph export, parse/plan cache, add_graph_documents()        Released
v0.4.0    Three-surface API: db.gds.* graph algorithms + db.search.* hybrid retrieval                   Released

See CHANGELOG.md for full release history.


License

MIT © David Spencer — see LICENSE for details.

Built on Lark, Pydantic, MessagePack, and the openCypher specification.
