GraphForge
Composable graph tooling for analysis, construction, and refinement
A lightweight, embedded, openCypher-compatible graph engine for research and investigative workflows
Table of Contents
- Why GraphForge?
- Installation
- Quick Start
- Core Concepts
- Python API Reference
- Cypher Query Language
- Usage Patterns
- Examples
- Advanced Features
- Design Principles
- Contributing
- License
Why GraphForge?
Modern data science and ML workflows increasingly produce graph-shaped data—entities and relationships extracted from text, tables, and LLM outputs. Yet practitioners face a painful choice:
| | NetworkX | GraphForge | Production DBs (Neo4j, Memgraph) |
|---|---|---|---|
| Durability | Manual serialization | ✓ SQLite backend | ✓ Persistent |
| Query language | None | openCypher subset | Full Cypher |
| Operational overhead | Minimal | Minimal (embedded) | High (services, config) |
| Notebook-friendly | ✓ | ✓ | ✗ |
| Iterative analysis | ✓ | ✓ | Poor |
GraphForge fills the gap—embedded, durable, and declarative—without running external services.
We are not building a database for applications. We are building a graph execution environment for thinking.
Use Cases
Knowledge Graph Construction
- Extract entities and relationships from unstructured text
- Build and query knowledge graphs from documents
- Iteratively refine graph structures during analysis
Data Lineage and Provenance
- Track data transformations and dependencies
- Query upstream and downstream impacts
- Maintain audit trails of analytical workflows
Network Analysis in Notebooks
- Analyze social networks, citation graphs, dependency graphs
- Persist analysis results alongside code
- Share reproducible graph analyses
LLM-Powered Graph Generation
- Store LLM-extracted entities and relationships
- Query structured outputs from language models
- Build hybrid retrieval systems with graph context
Installation
# Using uv (recommended)
uv add graphforge
# Using pip
pip install graphforge
Requirements: Python 3.10+
Dependencies: pydantic>=2.6, lark>=1.1, msgpack>=1.0
Quick Start
5-Minute Introduction
from graphforge import GraphForge
# Create an in-memory graph
db = GraphForge()
# Option 1: Python API (imperative)
alice = db.create_node(['Person'], name='Alice', age=30)
bob = db.create_node(['Person'], name='Bob', age=25)
db.create_relationship(alice, bob, 'KNOWS', since=2020)
# Option 2: Cypher queries (declarative)
db.execute("CREATE (c:Person {name: 'Charlie', age: 35})")
db.execute("MATCH (a:Person {name: 'Alice'}), (c:Person {name: 'Charlie'}) CREATE (a)-[:KNOWS]->(c)")
# Query the graph
results = db.execute("""
MATCH (p:Person)-[:KNOWS]->(friend:Person)
WHERE p.age > 25
RETURN p.name AS person, friend.name AS friend
ORDER BY p.age DESC
""")
for row in results:
    print(f"{row['person'].value} knows {row['friend'].value}")
# Output (only Alice has outgoing KNOWS relationships and age > 25):
# Alice knows Bob
# Alice knows Charlie
Persistent Graphs
# Create a persistent graph
db = GraphForge("my-research.db")
# Add data (persists automatically on close)
db.execute("CREATE (p:Paper {title: 'Graph Neural Networks', year: 2021})")
db.close()
# Later: reload the same graph
db = GraphForge("my-research.db")
results = db.execute("MATCH (p:Paper) RETURN p.title AS title")
print(results[0]['title'].value) # Graph Neural Networks
Core Concepts
Nodes and Relationships
Nodes represent entities with:
- Labels: Categories like `Person`, `Document`, `Gene`
- Properties: Key-value attributes (strings, integers, booleans, lists, maps)
- IDs: Auto-generated unique identifiers
Relationships connect nodes with:
- Type: Semantic connection like `KNOWS`, `CITES`, `REGULATES`
- Direction: From source node to destination node
- Properties: Attributes on the relationship itself
# Python API
alice = db.create_node(['Person', 'Employee'],
                       name='Alice',
                       age=30,
                       skills=['Python', 'ML'])
bob = db.create_node(['Person'], name='Bob', age=25)
knows = db.create_relationship(alice, bob, 'KNOWS',
                               since=2020,
                               strength='strong')
# Cypher equivalent
db.execute("""
CREATE (a:Person:Employee {name: 'Alice', age: 30, skills: ['Python', 'ML']})
CREATE (b:Person {name: 'Bob', age: 25})
CREATE (a)-[:KNOWS {since: 2020, strength: 'strong'}]->(b)
""")
Graph Patterns
GraphForge uses graph patterns for both matching and creating:
(n:Person)                 // Node with label
(n:Person {age: 30})       // Node with properties
(a)-[r:KNOWS]->(b)         // Directed relationship
(a)-[r:KNOWS]-(b)          // Undirected relationship
(a)-[:KNOWS|LIKES]->(b)    // Multiple relationship types
Python API Reference
GraphForge Class
__init__(path: str | Path | None = None)
Initialize a GraphForge instance.
Parameters:
- `path` (optional): Path to SQLite database file. If `None`, uses in-memory storage.
Example:
# In-memory (data lost on exit)
db = GraphForge()
# Persistent (data saved to disk)
db = GraphForge("graphs/social-network.db")
create_node(labels: list[str] | None = None, **properties) -> NodeRef
Create a node with labels and properties.
Parameters:
- `labels`: List of label strings (e.g., `['Person', 'Employee']`)
- `**properties`: Property key-value pairs (str, int, float, bool, None, list, dict)
Returns: NodeRef for the created node
Example:
alice = db.create_node(
    ['Person', 'Employee'],
    name='Alice',
    age=30,
    active=True,
    skills=['Python', 'SQL'],
    metadata={'department': 'Engineering'}
)
create_relationship(src: NodeRef, dst: NodeRef, rel_type: str, **properties) -> EdgeRef
Create a directed relationship between two nodes.
Parameters:
- `src`: Source node (NodeRef)
- `dst`: Destination node (NodeRef)
- `rel_type`: Relationship type string (e.g., `'KNOWS'`, `'WORKS_AT'`)
- `**properties`: Property key-value pairs
Returns: EdgeRef for the created relationship
Example:
alice = db.create_node(['Person'], name='Alice')
company = db.create_node(['Company'], name='Acme Corp')
works_at = db.create_relationship(
    alice,
    company,
    'WORKS_AT',
    since=2020,
    role='Engineer'
)
execute(query: str) -> list[dict]
Execute an openCypher query.
Parameters:
query: openCypher query string
Returns: List of result rows as dictionaries
Example:
results = db.execute("""
MATCH (p:Person)-[r:KNOWS]->(friend:Person)
WHERE p.age > 25
RETURN p.name AS person, count(friend) AS friend_count
ORDER BY friend_count DESC
LIMIT 10
""")
for row in results:
    print(f"{row['person'].value}: {row['friend_count'].value} friends")
begin()
Start an explicit transaction.
Example:
db.begin()
db.execute("CREATE (n:Person {name: 'Alice'})")
db.commit() # or db.rollback()
commit()
Commit the current transaction. Saves changes to disk if using persistence.
Raises: RuntimeError if not in a transaction
rollback()
Roll back the current transaction. Reverts all changes made since begin().
Raises: RuntimeError if not in a transaction
close()
Save graph and close database. Safe to call multiple times.
Example:
db = GraphForge("my-graph.db")
# ... make changes ...
db.close() # Saves to disk
Accessing Result Values
Query results contain CypherValue objects. Access the underlying Python value with .value:
results = db.execute("MATCH (p:Person) RETURN p.name AS name, p.age AS age")
for row in results:
    name = row['name'].value  # str
    age = row['age'].value    # int
    print(f"{name} is {age} years old")
Supported Value Types:
- `CypherString`: Python `str`
- `CypherInt`: Python `int`
- `CypherFloat`: Python `float`
- `CypherBool`: Python `bool`
- `CypherNull`: Python `None`
- `CypherList`: Python `list` (nested CypherValues)
- `CypherMap`: Python `dict` (string keys, CypherValue values)
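Lists and maps wrap their elements in CypherValues too, so deeply nested results need recursive unwrapping. A small sketch that relies only on the documented `.value` attribute (`unwrap` is a hypothetical helper, not part of GraphForge):

```python
def unwrap(value):
    """Recursively convert a CypherValue (or plain value) to plain Python.

    Assumes each wrapper exposes `.value`, and that CypherList/CypherMap
    contain nested CypherValues, as described above.
    """
    inner = getattr(value, "value", value)
    if isinstance(inner, list):
        return [unwrap(v) for v in inner]
    if isinstance(inner, dict):
        return {k: unwrap(v) for k, v in inner.items()}
    return inner
```

With this, `unwrap(row['skills'])` yields a plain list of strings instead of a list of CypherValues.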
Cypher Query Language
GraphForge supports a subset of openCypher for declarative graph queries and mutations.
MATCH - Pattern Matching
Find nodes and relationships matching a pattern.
// Match all nodes
MATCH (n)
RETURN n
// Match nodes by label
MATCH (p:Person)
RETURN p.name
// Match with multiple labels
MATCH (p:Person:Employee)
RETURN p
// Match relationships
MATCH (a:Person)-[r:KNOWS]->(b:Person)
RETURN a.name, b.name, r.since
// Match specific direction
MATCH (a)-[:FOLLOWS]->(b)   // Outgoing
MATCH (a)<-[:FOLLOWS]-(b)   // Incoming
MATCH (a)-[:FOLLOWS]-(b)    // Either direction
// Multiple relationship types
MATCH (a)-[r:KNOWS|LIKES]->(b)
RETURN type(r), a.name, b.name
WHERE - Filtering
Filter matched patterns with predicates.
// Property comparisons
MATCH (p:Person)
WHERE p.age > 30
RETURN p.name
// Logical operators
MATCH (p:Person)
WHERE p.age > 25 AND p.city = 'NYC'
RETURN p.name
MATCH (p:Person)
WHERE p.age < 30 OR p.active = true
RETURN p.name
// Property existence (comparing with <> null yields null, filtering out every row)
MATCH (p:Person)
WHERE p.email IS NOT NULL
RETURN p.name
RETURN - Projection
Select and transform query results.
// Return specific properties
MATCH (p:Person)
RETURN p.name, p.age
// With aliases
MATCH (p:Person)
RETURN p.name AS person_name, p.age AS person_age
// Return entire nodes/relationships
MATCH (p:Person)-[r:KNOWS]->(friend)
RETURN p, r, friend
CREATE - Graph Construction
Create new nodes and relationships.
// Create single node
CREATE (n:Person {name: 'Alice', age: 30})
// Create multiple nodes
CREATE (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
// Create nodes with relationship
CREATE (a:Person {name: 'Alice'})-[r:KNOWS {since: 2020}]->(b:Person {name: 'Bob'})
// Create with RETURN
CREATE (n:Person {name: 'Alice'})
RETURN n.name AS name
SET - Update Properties
Update properties on existing nodes and relationships.
// Update single property
MATCH (p:Person {name: 'Alice'})
SET p.age = 31
// Update multiple properties
MATCH (p:Person {name: 'Alice'})
SET p.age = 31, p.city = 'NYC', p.active = true
// Update relationship properties
MATCH (a)-[r:KNOWS]->(b)
WHERE a.name = 'Alice'
SET r.strength = 'strong'
DELETE - Remove Elements
Delete nodes and relationships.
// Delete specific node (and its relationships)
MATCH (p:Person {name: 'Alice'})
DELETE p
// Delete relationship only
MATCH (a)-[r:KNOWS]->(b)
WHERE a.name = 'Alice' AND b.name = 'Bob'
DELETE r
// Delete multiple elements
MATCH (a)-[r:KNOWS]->(b)
WHERE b.name = 'Bob'
DELETE r, b
MERGE - Idempotent Creation
Create nodes if they don't exist, or match existing ones.
// Create or match
MERGE (p:Person {name: 'Alice'})
// Always matches the same node (idempotent)
MERGE (p:Person {name: 'Alice', age: 30})
MERGE (p:Person {name: 'Alice', age: 30})
// Results in only 1 node
// With RETURN
MERGE (p:Person {name: 'Alice'})
RETURN p.name
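MERGE makes re-runnable ingestion loops straightforward: loading the same records twice leaves one node per key. A sketch with a hypothetical `ingest_people` helper, assuming `execute` accepts a parameter dict as the ETL pattern later in this README does:

```python
def ingest_people(db, records):
    """Idempotently load records: MERGE matches an existing Person node
    by name or creates one, so duplicates in `records` are harmless."""
    for r in records:
        db.execute("MERGE (:Person {name: $name})", {"name": r["name"]})
```

Re-running `ingest_people` over the same data leaves the node count unchanged.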
ORDER BY - Sorting
Sort query results.
// Ascending (default)
MATCH (p:Person)
RETURN p.name, p.age
ORDER BY p.age
// Descending
MATCH (p:Person)
RETURN p.name, p.age
ORDER BY p.age DESC
// Multiple sort keys
MATCH (p:Person)
RETURN p.name, p.age, p.city
ORDER BY p.city ASC, p.age DESC
LIMIT and SKIP - Pagination
Limit and paginate results.
// Get first 10 results
MATCH (p:Person)
RETURN p.name
ORDER BY p.name
LIMIT 10
// Skip first 20, return next 10
MATCH (p:Person)
RETURN p.name
ORDER BY p.name
SKIP 20
LIMIT 10
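SKIP/LIMIT pagination can be wrapped in a small generator. `paginate` is a hypothetical helper, not part of GraphForge, and the query passed in should end with an ORDER BY so page boundaries are stable:

```python
def paginate(db, query, page_size=100):
    """Yield result pages by appending SKIP/LIMIT to a MATCH...RETURN query.

    `query` should include ORDER BY, otherwise pages may overlap or skip
    rows between calls.
    """
    skip = 0
    while True:
        rows = db.execute(f"{query} SKIP {skip} LIMIT {page_size}")
        if not rows:
            return
        yield rows
        skip += page_size
```

For example, `for page in paginate(db, "MATCH (p:Person) RETURN p.name ORDER BY p.name"):` processes people in stable chunks.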
Aggregations
Compute aggregate functions over groups.
// Count all
MATCH (p:Person)
RETURN count(*) AS total
// Count with grouping
MATCH (p:Person)
RETURN p.city, count(*) AS population
ORDER BY population DESC
// Multiple aggregations
MATCH (p:Person)
RETURN
  count(*) AS total,
  sum(p.age) AS total_age,
  avg(p.age) AS avg_age,
  min(p.age) AS youngest,
  max(p.age) AS oldest
// Aggregation with WHERE
MATCH (p:Person)
WHERE p.active = true
RETURN p.department, count(*) AS active_count
Supported Functions:
- `count(*)` - Count all rows
- `count(expr)` - Count non-null values
- `sum(expr)` - Sum numeric values
- `avg(expr)` - Average of numeric values
- `min(expr)` - Minimum value
- `max(expr)` - Maximum value
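Aggregates come back as a row of CypherValue cells like any other result, so reading several at once is just dictionary access plus `.value`. A minimal sketch (`age_summary` is a hypothetical helper; the query assumes Person nodes with an `age` property):

```python
def age_summary(db):
    """Return (row_count, average_age) from a single aggregation query."""
    row = db.execute(
        "MATCH (p:Person) RETURN count(*) AS n, avg(p.age) AS avg_age"
    )[0]
    return row["n"].value, row["avg_age"].value
```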
Usage Patterns
Pattern 1: Exploratory Analysis
Use in-memory graphs for quick exploration, then persist interesting results.
# Start with in-memory for speed
db = GraphForge()
# Load and explore data
db.execute("CREATE (:Author {name: 'Alice', h_index: 42})")
db.execute("CREATE (:Author {name: 'Bob', h_index: 38})")
# ... load more data ...
# Explore interactively
results = db.execute("""
MATCH (a:Author)
WHERE a.h_index > 40
RETURN a.name, a.h_index
ORDER BY a.h_index DESC
""")
# If analysis is valuable, save it
if len(results) > 0:
    db_persistent = GraphForge("high-impact-authors.db")
    # Copy relevant subgraph...
    db_persistent.close()
Pattern 2: Incremental Construction
Build graphs incrementally across sessions.
# Session 1: Initial data
db = GraphForge("knowledge-graph.db")
db.execute("CREATE (:Concept {name: 'Machine Learning'})")
db.close()
# Session 2: Add related concepts
db = GraphForge("knowledge-graph.db")
db.execute("""
MATCH (ml:Concept {name: 'Machine Learning'})
CREATE (dl:Concept {name: 'Deep Learning'})
CREATE (ml)-[:SPECIALIZES_TO]->(dl)
""")
db.close()
# Session 3: Add more relationships
db = GraphForge("knowledge-graph.db")
db.execute("""
MATCH (dl:Concept {name: 'Deep Learning'})
CREATE (cv:Concept {name: 'Computer Vision'})
CREATE (dl)-[:APPLIED_IN]->(cv)
""")
db.close()
Pattern 3: Transactional Updates
Use transactions for atomic updates.
db = GraphForge("production-graph.db")
try:
    db.begin()
    # Update multiple related entities
    db.execute("MATCH (p:Person {id: 123}) SET p.status = 'inactive'")
    db.execute("MATCH (p:Person {id: 123})-[r:WORKS_AT]->() DELETE r")
    db.execute("CREATE (:AuditLog {action: 'deactivate', user_id: 123, timestamp: 1234567890})")
    db.commit()
except Exception as e:
    db.rollback()
    print(f"Transaction failed: {e}")
finally:
    db.close()
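The begin/commit/rollback boilerplate above can be packaged once as a context manager. `transaction` is a hypothetical convenience wrapper, not part of GraphForge's API; it commits on success and rolls back on any exception:

```python
from contextlib import contextmanager

@contextmanager
def transaction(db):
    """Run a block inside an explicit transaction: commit on success,
    roll back (and re-raise) on any exception."""
    db.begin()
    try:
        yield db
        db.commit()
    except Exception:
        db.rollback()
        raise
```

Usage: `with transaction(db): db.execute(...)` replaces the try/except/finally scaffolding.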
Pattern 4: ETL Pipelines
Extract, transform, and load data into graph format.
import pandas as pd
# Load tabular data
papers = pd.read_csv("papers.csv")
citations = pd.read_csv("citations.csv")
# Transform to graph
db = GraphForge("citation-network.db")
# Create nodes from DataFrame
for _, row in papers.iterrows():
    db.execute("""
        CREATE (:Paper {
            id: $id,
            title: $title,
            year: $year,
            citations: $citations
        })
    """, {'id': row['id'], 'title': row['title'],
          'year': int(row['year']), 'citations': int(row['citation_count'])})
# Create relationships from edges DataFrame
for _, row in citations.iterrows():
    db.execute("""
        MATCH (citing:Paper {id: $citing_id})
        MATCH (cited:Paper {id: $cited_id})
        CREATE (citing)-[:CITES]->(cited)
    """, {'citing_id': row['citing_paper'], 'cited_id': row['cited_paper']})
db.close()
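If each `execute` outside a transaction is committed individually, row-at-a-time loading pays commit overhead per row. Grouping inserts into explicit transactions amortizes that cost; `load_papers_batched` is a hypothetical sketch (helper name and batch size are illustrative):

```python
def load_papers_batched(db, rows, batch_size=500):
    """Load rows in batches, one transaction per batch, so the SQLite
    backend commits once per batch instead of once per row."""
    for start in range(0, len(rows), batch_size):
        db.begin()
        for row in rows[start:start + batch_size]:
            db.execute("CREATE (:Paper {id: $id, title: $title})",
                       {"id": row["id"], "title": row["title"]})
        db.commit()
```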
Pattern 5: Testing and Validation
Use throwaway in-memory graphs so each test starts from a clean slate.
def test_graph_algorithm():
    db = GraphForge()
    # Set up test data in one statement so the node variables stay bound
    db.execute("CREATE (a:Node {id: 1})-[:LINKS]->(b:Node {id: 2})-[:LINKS]->(c:Node {id: 3})")
    # Test query (fixed-length pattern; variable-length paths are on the roadmap)
    results = db.execute("""
        MATCH (a:Node {id: 1})-[:LINKS]->(b:Node)-[:LINKS]->(c:Node)
        RETURN count(*) AS chain_count
    """)
    assert results[0]['chain_count'].value == 1
Examples
Example 1: Social Network Analysis
from graphforge import GraphForge
# Create social network
db = GraphForge("social-network.db")
# Add people
people = [
("Alice", 30, "NYC"),
("Bob", 25, "NYC"),
("Charlie", 35, "LA"),
("Diana", 28, "NYC"),
]
for name, age, city in people:
    db.execute(f"""
        CREATE (:Person {{name: '{name}', age: {age}, city: '{city}'}})
    """)
# Add friendships
friendships = [
("Alice", "Bob", 2015),
("Alice", "Charlie", 2018),
("Bob", "Diana", 2019),
("Charlie", "Diana", 2020),
]
for person1, person2, since in friendships:
    db.execute(f"""
        MATCH (a:Person {{name: '{person1}'}})
        MATCH (b:Person {{name: '{person2}'}})
        CREATE (a)-[:KNOWS {{since: {since}}}]->(b)
    """)
# Analysis: Who has the most friends?
results = db.execute("""
MATCH (p:Person)-[:KNOWS]-(friend:Person)
RETURN p.name AS person, count(DISTINCT friend) AS friend_count
ORDER BY friend_count DESC
""")
print("Friend counts:")
for row in results:
    print(f"  {row['person'].value}: {row['friend_count'].value} friends")
# Analysis: People in NYC who know each other
results = db.execute("""
    MATCH (a:Person)-[:KNOWS]-(b:Person)
    WHERE a.city = 'NYC' AND b.city = 'NYC' AND a.name < b.name
    RETURN a.name AS person1, b.name AS person2
""")
print("\nNYC connections:")
for row in results:
    print(f"  {row['person1'].value} ↔ {row['person2'].value}")
db.close()
Example 2: Document Citation Network
from graphforge import GraphForge
db = GraphForge("citations.db")
# Create papers
papers = [
("P1", "Graph Neural Networks", 2021, "Smith"),
("P2", "Deep Learning Fundamentals", 2019, "Jones"),
("P3", "GNN Applications", 2022, "Smith"),
]
for paper_id, title, year, author in papers:
    db.execute("""
        MERGE (p:Paper {id: $id})
        SET p.title = $title, p.year = $year
        MERGE (a:Author {name: $author})
        CREATE (a)-[:AUTHORED]->(p)
    """, {'id': paper_id, 'title': title, 'year': year, 'author': author})
# Add citations
db.execute("""
MATCH (p1:Paper {id: 'P3'})
MATCH (p2:Paper {id: 'P1'})
CREATE (p1)-[:CITES]->(p2)
""")
db.execute("""
MATCH (p1:Paper {id: 'P1'})
MATCH (p2:Paper {id: 'P2'})
CREATE (p1)-[:CITES]->(p2)
""")
# Find most cited papers
results = db.execute("""
MATCH (p:Paper)<-[:CITES]-(citing:Paper)
RETURN p.title AS paper, count(citing) AS citation_count
ORDER BY citation_count DESC
""")
print("Most cited papers:")
for row in results:
    print(f"  {row['paper'].value}: {row['citation_count'].value} citations")
# Find papers by prolific authors
results = db.execute("""
MATCH (a:Author)-[:AUTHORED]->(p:Paper)
RETURN a.name AS author, count(p) AS paper_count
ORDER BY paper_count DESC
""")
print("\nAuthor productivity:")
for row in results:
    print(f"  {row['author'].value}: {row['paper_count'].value} papers")
db.close()
Example 3: Knowledge Graph from LLM Output
from graphforge import GraphForge
import json
db = GraphForge("knowledge-graph.db")
# Simulated LLM extraction result
llm_output = {
"entities": [
{"name": "Python", "type": "Language", "properties": {"paradigm": "multi"}},
{"name": "Java", "type": "Language", "properties": {"paradigm": "OOP"}},
{"name": "Django", "type": "Framework", "properties": {"category": "web"}},
],
"relationships": [
{"source": "Django", "target": "Python", "type": "WRITTEN_IN"},
{"source": "Python", "target": "Java", "type": "INFLUENCED_BY"},
]
}
# Import entities
for entity in llm_output["entities"]:
    props_str = ", ".join(f"{k}: '{v}'" for k, v in entity["properties"].items())
    db.execute(f"""
        CREATE (:{entity['type']} {{name: '{entity['name']}', {props_str}}})
    """)
# Import relationships
for rel in llm_output["relationships"]:
    db.execute(f"""
        MATCH (source {{name: '{rel['source']}'}})
        MATCH (target {{name: '{rel['target']}'}})
        CREATE (source)-[:{rel['type']}]->(target)
    """)
# Query the knowledge graph
results = db.execute("""
MATCH (f:Framework)-[:WRITTEN_IN]->(l:Language)
RETURN f.name AS framework, l.name AS language
""")
print("Frameworks and their languages:")
for row in results:
    print(f"  {row['framework'].value} is written in {row['language'].value}")
# Find influence chains
results = db.execute("""
MATCH (a:Language)-[:INFLUENCED_BY]->(b:Language)
RETURN a.name AS language, b.name AS influenced_by
""")
print("\nLanguage influences:")
for row in results:
    print(f"  {row['language'].value} was influenced by {row['influenced_by'].value}")
db.close()
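The f-string interpolation above breaks if an extracted name contains a quote. Property values can be passed as query parameters instead (as in the ETL pattern), but Cypher cannot parameterize labels or relationship types, so a whitelist check is a reasonable safety net for LLM-derived strings. `safe_label` is a hypothetical helper, not part of GraphForge:

```python
import re

def safe_label(label):
    """Validate a label or relationship type before interpolating it
    into a query string; labels can't be passed as parameters."""
    if not re.fullmatch(r"[A-Za-z_]\w*", label):
        raise ValueError(f"unsafe label: {label!r}")
    return label
```

For example, `f"CREATE (:{safe_label(entity['type'])} ...)"` rejects anything that isn't a plain identifier.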
Advanced Features
Transaction Isolation
Transactions are isolated: queries inside a transaction observe that transaction's uncommitted changes, and a rollback discards them so later queries see the pre-transaction state.
db = GraphForge("test.db")
db.execute("CREATE (:Person {name: 'Alice'})")
db.begin()
db.execute("CREATE (:Person {name: 'Bob'})")
# Query sees uncommitted Bob
results = db.execute("MATCH (p:Person) RETURN count(*) AS count")
print(results[0]['count'].value) # 2
db.rollback()
# After rollback, Bob is gone
results = db.execute("MATCH (p:Person) RETURN count(*) AS count")
print(results[0]['count'].value) # 1
Deep Property Access
Access nested properties in complex structures.
db.execute("""
CREATE (:Document {
metadata: {
author: 'Alice',
tags: ['ML', 'Python'],
version: {major: 1, minor: 2}
}
})
""")
results = db.execute("""
MATCH (d:Document)
RETURN d.metadata AS metadata
""")
metadata = results[0]['metadata'].value
print(metadata['author'].value) # 'Alice'
print(metadata['tags'].value[0].value) # 'ML'
print(metadata['version'].value['major'].value) # 1
Graph Export
Export subgraphs for sharing or archival.
import json

def export_subgraph(db, query, output_file):
    """Export query results to JSON."""
    results = db.execute(query)
    nodes = set()
    edges = []
    for row in results:
        # Extract nodes and relationships from result
        # (Implementation depends on your export format)
        pass
    with open(output_file, 'w') as f:
        json.dump({'nodes': list(nodes), 'edges': edges}, f)
# Export high-impact authors
export_subgraph(
db,
"MATCH (a:Author) WHERE a.h_index > 40 RETURN a",
"high-impact-authors.json"
)
Design Principles
Spec-Driven Correctness
GraphForge prioritizes semantic correctness over raw performance. All query behavior is validated against the openCypher TCK (Technology Compatibility Kit).
What this means:
- Queries behave predictably and correctly
- Null handling follows openCypher semantics
- Aggregations produce deterministic results
- Type coercion is explicit and safe
Deterministic & Reproducible
GraphForge produces stable, reproducible results across runs.
What this means:
- Same query on same data always produces same results
- Transaction isolation guarantees snapshot consistency
- No hidden state or random behavior
- Ideal for scientific workflows and testing
Inspectable
GraphForge makes query execution observable and debuggable.
What this means:
- Query plans can be inspected (future feature)
- Storage layout is simple SQLite (readable with any SQLite tool)
- Execution behavior is predictable and traceable
- No magic or hidden optimizations
Replaceable Internals
GraphForge components are modular and replaceable.
What this means:
- Parser, planner, executor, storage are independent
- SQLite backend can be swapped for other storage
- Minimal operational dependencies
- Zero configuration required
Architecture
GraphForge is built in four layers:
┌─────────────────────────────────┐
│ Parser (Lark + AST) │ Cypher → Abstract Syntax Tree
├─────────────────────────────────┤
│ Planner (Logical Operators) │ AST → Logical Plan
├─────────────────────────────────┤
│ Executor (Pipeline Engine) │ Plan → Results
├─────────────────────────────────┤
│ Storage (Graph + SQLite) │ In-Memory + Persistence
└─────────────────────────────────┘
- Parser: Lark-based openCypher parser with full AST generation
- Planner: Logical plan generation (ScanNodes, ExpandEdges, Filter, Project, Sort, Aggregate)
- Executor: Pipeline-based query execution with streaming rows
- Storage: Dual-mode storage, in-memory graphs with optional SQLite persistence
Storage Backend
GraphForge uses SQLite with Write-Ahead Logging (WAL) for durability:
- ACID guarantees: Atomicity, Consistency, Isolation, Durability
- Zero configuration: No server setup or connection management
- Single-file databases: Easy to version control and share
- Concurrent reads: Multiple readers, single writer
- MessagePack serialization: Efficient binary encoding for complex types
The architecture prioritizes correctness and developer experience over raw performance, with all components designed to be testable, inspectable, and replaceable.
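Because the on-disk format is a single SQLite file, the stdlib `sqlite3` module can inspect it directly, in line with the inspectability principle above. A read-only sketch; the actual table names are internal to GraphForge and may differ between versions:

```python
import sqlite3

def list_tables(path):
    """List the tables inside a GraphForge database file using plain SQLite.

    Treat this as read-only exploration: the schema is an internal detail.
    """
    with sqlite3.connect(path) as conn:
        return [name for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```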
Performance Characteristics
GraphForge is optimized for interactive analysis on small-to-medium graphs (thousands to millions of nodes).
Expected Performance:
- Node/edge creation: ~10-50K operations/sec (in-memory)
- Simple traversals: ~100K-1M edges/sec
- Complex queries: Depends on query complexity and graph size
- Persistence overhead: ~2-5x slower than in-memory
When to Use GraphForge:
- Graphs with < 10M nodes
- Interactive analysis in notebooks
- Iterative graph construction
- Research and exploration workflows
When NOT to Use GraphForge:
- Production applications requiring high throughput
- Graphs with > 100M nodes
- Real-time query serving
- Multi-user concurrent writes
For production workloads, consider Neo4j, Memgraph, or other production graph databases.
Roadmap
Completed (v0.1):
- ✅ MATCH, WHERE, RETURN, ORDER BY, LIMIT, SKIP
- ✅ Aggregations (COUNT, SUM, AVG, MIN, MAX)
- ✅ CREATE, SET, DELETE, MERGE clauses
- ✅ Python builder API
- ✅ SQLite persistence
- ✅ ACID transactions
Planned (v0.2):
- MATCH-CREATE combinations (connecting existing nodes)
- DETACH DELETE (cascading relationship deletion)
- Path expressions and variable-length patterns
- UNWIND for list processing
- CASE expressions
Future Considerations:
- Query plan visualization
- Performance profiling tools
- Import/export to standard formats (GraphML, CSV)
- Integration with popular data science libraries
- Pydantic schema validation
Contributing
GraphForge is in active development. Contributions are welcome!
Development Workflow
Before pushing code, run:
make pre-push
This runs:
- Code formatting checks (ruff format --check)
- Linting (ruff check)
- Type checking (mypy)
- Tests with coverage measurement
- Coverage threshold validation (minimum 85%)
Coverage
View detailed coverage report:
make coverage-report
Check coverage for your changes only:
make coverage-diff
Run tests with coverage manually:
make coverage
For new features, optionally check against a stricter 90% threshold:
make coverage-strict
Coverage Requirements
- Project coverage: 85% of entire codebase (checked by `make pre-push`)
- Patch coverage: 80% of new/changed lines (checked by Codecov in CI)
Best practice: Aim for 100% coverage of new code to ensure both thresholds pass.
Test Analytics
GraphForge uses Codecov Test Analytics to monitor test performance and reliability across our 8,203 tests (481 unit/integration + 7,722 TCK compliance tests).
What we track:
- ⏱️ Test execution time and performance trends
- 🔄 Flaky tests (tests that intermittently fail)
- ❌ Test failure rates and patterns
- 📊 Test suite health over time
Benefits:
- Identify slow tests that need optimization
- Catch flaky tests before they become problematic
- Track test performance degradation
- Improve CI/CD reliability
Test analytics data is automatically collected in CI and viewable on the Codecov dashboard.
Areas for Contribution
- Additional Cypher features
- Performance optimizations
- Documentation and examples
- Bug reports and fixes
- Integration with data science tools
See CONTRIBUTING.md for guidelines.
Documentation
- Quick Start Tutorial — Step-by-step guide for new users
- API Reference — Complete Python API documentation
- Cypher Language Guide — openCypher subset reference
- Architecture Overview — System design and internals
- Requirements Document — Full scope and design rationale
Testing
GraphForge has 368 tests covering:
- Unit tests for parser, planner, executor, storage
- Integration tests for end-to-end workflows
- openCypher TCK compliance tests (17 passing)
Run the test suite:
# Install dev dependencies
uv sync --dev
# Run all tests
pytest
# Run with coverage
pytest --cov=graphforge --cov-report=html
# Run specific test categories
pytest -m unit # Unit tests only
pytest -m integration # Integration tests only
pytest -m tck # TCK compliance tests
FAQ
Q: How does GraphForge differ from NetworkX?
A: GraphForge adds declarative querying (openCypher), automatic persistence (SQLite), and ACID transactions. NetworkX is great for algorithms; GraphForge is great for data management.

Q: Can I use GraphForge in production?
A: GraphForge is designed for research and analysis, not production applications. For production workloads, use Neo4j or Memgraph.

Q: Does GraphForge support distributed queries?
A: No. GraphForge is embedded and single-node only.

Q: Can I import data from Neo4j?
A: Not directly yet. You can export from Neo4j to CSV and import via Python scripts.

Q: What's the maximum graph size?
A: Practical limit is ~10M nodes. Beyond that, query performance degrades significantly.

Q: Is GraphForge thread-safe?
A: No. Use one GraphForge instance per thread, or use external synchronization.
License
MIT © David Spencer
GraphForge is open source software released under the MIT License. See LICENSE for details.
Acknowledgments
GraphForge is built on excellent open-source projects:
- Lark — Fast, modern parsing library
- Pydantic — Data validation and settings management
- MessagePack — Efficient binary serialization
- openCypher — Declarative graph query language
Special thanks to the openCypher community for the TCK suite and language specification.
Happy Graph Forging! 🔨📊