Knowledge graph builder with extractor, builder, and enricher components
GraphForge
Knowledge graph construction toolkit — extract entities and relationships from structured records or free text, build queryable directed graphs, and enrich them with network metrics.
Features
- Dual-mode extraction — parse entities and relationships from dict records or unstructured text via configurable regex patterns
- Domain configuration — define entity types, relationship types, and validation rules in YAML; swap domains without touching code
- Graph querying — find nodes by type, compute shortest paths, list neighbors/predecessors, and extract subgraphs
- Network enrichment — compute PageRank, degree centrality, clustering coefficient, and normalize edge weights in one call
- Community detection — partition graphs using greedy modularity optimization (NetworkX)
- Portable serialization — round-trip graphs to/from plain dicts via node-link format
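As a rough illustration of the pattern-driven text extraction listed above, the sketch below uses only the standard library `re` module. The `PATTERNS` table, the `extract_from_text` helper, and the tuple shapes are hypothetical and are not GraphForge's API; they only show how a relationship type can be tied to a regex with named groups.

```python
import re

# Hypothetical pattern table: relationship type -> regex with named groups.
PATTERNS = {
    "knows": re.compile(r"(?P<source>[A-Z]\w+) knows (?P<target>[A-Z]\w+)"),
    "works_at": re.compile(r"(?P<source>[A-Z]\w+) works at (?P<target>[A-Z]\w+)"),
}


def extract_from_text(text):
    """Return (entities, relationships) found in text.

    entities is a sorted list of lowercase ids; relationships is a list of
    (source, target, type) tuples in pattern order, then match order.
    """
    entities = set()
    relationships = []
    for rel_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            source = match.group("source").lower()
            target = match.group("target").lower()
            entities.update((source, target))
            relationships.append((source, target, rel_type))
    return sorted(entities), relationships


entities, relationships = extract_from_text("Alice knows Bob. Bob works at Acme.")
print(entities)
print(relationships)
```

Swapping the pattern table changes what gets extracted without touching the loop, which is the general idea behind keeping patterns configurable.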
Quick Start
pip install tkm-graphforge
from graphforge import GraphBuilder, GraphExtractor, GraphEnricher
from graphforge.models import Entity, Relationship
# Build a graph manually
builder = GraphBuilder()
alice = Entity(id="alice", type="person", properties={"name": "Alice"})
bob = Entity(id="bob", type="person", properties={"name": "Bob"})
rel = Relationship(source="alice", target="bob", type="knows", weight=1.0)
builder.add_entity(alice)
builder.add_entity(bob)
builder.add_relationship(rel)
# Query
print(builder.get_neighbors("alice")) # ['bob']
print(builder.get_shortest_path("alice", "bob"))
# Extract from records
extractor = GraphExtractor()
records = [{"id": "p1", "type": "paper", "cites": "p2"}]
entities, relationships = extractor.extract_from_records(records)
# Enrich with metrics
enricher = GraphEnricher(builder.graph)
enricher.compute_centrality()
enricher.compute_pagerank()
enricher.detect_communities()
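To make the enrichment step more concrete, here is a pure-Python sketch of two of the metrics named in the feature list. It does not use GraphForge or NetworkX. The function names are hypothetical, degree centrality follows the common definition (degree divided by n - 1, counting in- and out-edges together), and dividing by the largest weight is only one assumed normalization scheme; the README does not say which one GraphEnricher applies.

```python
def degree_centrality(nodes, edges):
    """Degree divided by (n - 1), counting in- and out-edges together.

    Graphs with fewer than two nodes get 0.0, a choice made for this sketch.
    """
    if len(nodes) < 2:
        return {node: 0.0 for node in nodes}
    degree = {node: 0 for node in nodes}
    for source, target, _weight in edges:
        degree[source] += 1
        degree[target] += 1
    scale = 1.0 / (len(nodes) - 1)
    return {node: count * scale for node, count in degree.items()}


def normalize_weights(edges):
    """Scale weights so the largest becomes 1.0 (assumed scheme)."""
    if not edges:
        return []
    largest = max(weight for _source, _target, weight in edges)
    if largest == 0:
        return list(edges)
    return [(source, target, weight / largest) for source, target, weight in edges]


nodes = ["alice", "bob", "carol"]
edges = [("alice", "bob", 2.0), ("bob", "carol", 4.0)]
centrality = degree_centrality(nodes, edges)
normalized = normalize_weights(edges)
print(centrality)
print(normalized)
```

In this small chain, `bob` touches both edges and so gets the highest centrality, while the heavier `bob` to `carol` edge becomes the reference weight of 1.0.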
Architecture
graph-forge/
├── graphforge/
│ ├── models.py # Entity and Relationship dataclasses
│ ├── domains.py # DomainLoader — reads YAML domain configs
│ ├── builder.py # GraphBuilder — constructs and queries DiGraph
│ ├── extractor.py # GraphExtractor — parses records and free text
│ └── enricher.py # GraphEnricher — computes network metrics
├── domains/
│ ├── technology.yaml
│ ├── science.yaml
│ └── social.yaml
└── tests/ # pytest suite, one file per module
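The files under domains/ define entity types, relationship types, and validation rules, but their schema is not shown here. The sketch below uses a plain dict as a hypothetical stand-in for a parsed domain file; the keys `entity_types`, `relationship_types`, and `required_properties`, and the `validate_entity` helper, are assumptions made for illustration and may not match what DomainLoader expects.

```python
# Hypothetical stand-in for a parsed domain YAML file; the real schema may differ.
SOCIAL_DOMAIN = {
    "entity_types": ["person", "organization"],
    "relationship_types": ["knows", "works_at"],
    "required_properties": {"person": ["name"]},
}


def validate_entity(entity, domain):
    """Return a list of problems; an empty list means the entity is valid."""
    entity_type = entity.get("type")
    if entity_type not in domain["entity_types"]:
        return [f"unknown entity type: {entity_type!r}"]
    problems = []
    required = domain["required_properties"].get(entity_type, [])
    for name in required:
        if name not in entity.get("properties", {}):
            problems.append(f"missing property: {name!r}")
    return problems


valid = validate_entity(
    {"id": "alice", "type": "person", "properties": {"name": "Alice"}},
    SOCIAL_DOMAIN,
)
missing = validate_entity(
    {"id": "bob", "type": "person", "properties": {}}, SOCIAL_DOMAIN
)
unknown = validate_entity(
    {"id": "p1", "type": "paper", "properties": {}}, SOCIAL_DOMAIN
)
print(valid, missing, unknown)
```

Because the rules live in data rather than in the validator, pointing the same function at a different domain dict is enough to change what counts as valid.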
Data flow:
Raw data (dicts / text)
│
GraphExtractor ← domain YAML controls entity/rel types
│
GraphBuilder ← NetworkX DiGraph under the hood
│
GraphEnricher ← PageRank, centrality, communities
│
Serialized dict / downstream query
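The last stage of the flow above ends in a plain dict. The sketch below shows what a node-link round trip can look like using only the standard library. The helper names are hypothetical, and the key names (`nodes`, `links`, `source`, `target`) follow the layout NetworkX has traditionally used for node-link data; treat them as an assumption about what GraphForge emits.

```python
import json


def to_node_link(nodes, edges):
    """nodes: {id: attrs}; edges: list of (source, target, attrs) tuples."""
    return {
        "directed": True,
        "multigraph": False,
        "graph": {},
        "nodes": [{"id": node_id, **attrs} for node_id, attrs in nodes.items()],
        "links": [
            {"source": source, "target": target, **attrs}
            for source, target, attrs in edges
        ],
    }


def from_node_link(data):
    """Inverse of to_node_link: rebuild the nodes dict and the edge list."""
    nodes = {}
    for entry in data["nodes"]:
        attrs = dict(entry)
        node_id = attrs.pop("id")
        nodes[node_id] = attrs
    edges = []
    for entry in data["links"]:
        attrs = dict(entry)
        source = attrs.pop("source")
        target = attrs.pop("target")
        edges.append((source, target, attrs))
    return nodes, edges


nodes = {"alice": {"type": "person"}, "bob": {"type": "person"}}
edges = [("alice", "bob", {"type": "knows", "weight": 1.0})]

payload = to_node_link(nodes, edges)
# Passing through JSON shows the payload contains only plain, portable types.
restored_nodes, restored_edges = from_node_link(json.loads(json.dumps(payload)))
print(restored_nodes == nodes, restored_edges == edges)
```

Keeping node and edge attributes flat inside each entry is what lets the payload survive JSON or any other dict-friendly transport unchanged.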
Development
git clone https://github.com/techknowmad/graph-forge.git
cd graph-forge
pip install -e ".[dev]"
# Lint
ruff check .
# Test
pytest -v
All tests must pass and ruff check must be clean before opening a PR.
Contributing
See CONTRIBUTING.md for branch conventions, commit style, and the PR checklist.
License
Built by TechKnowMad Labs
File details
Details for the file tkm_graphforge-0.1.0.tar.gz.
File metadata
- Download URL: tkm_graphforge-0.1.0.tar.gz
- Upload date:
- Size: 32.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fb8197e2b66163da6eb90ab923d5d42305f0cac20532c477505087bea6f42327 |
| MD5 | 855c630f5b90a5f57b024ec731c5796b |
| BLAKE2b-256 | 2b0f2d8c33f1f5dc717ebb6d13dbce75a64d301ece7daec08465ee49b96590e0 |
File details
Details for the file tkm_graphforge-0.1.0-py3-none-any.whl.
File metadata
- Download URL: tkm_graphforge-0.1.0-py3-none-any.whl
- Upload date:
- Size: 13.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f1c224c2757a1dc01722f7c9cfefcedc6fc73335e17e7b69f01ef4d2c5ebcf03 |
| MD5 | d87c56566ca6e04ffb3514bd7b4a76bf |
| BLAKE2b-256 | adb338f7dc27fe7fabcff0cf8814ccd1b0a61c51145132b8568da4af45d81a6b |