Knowledge graph builder with extractor, builder, and enricher components

GraphForge

License: MIT Python 3.12 Tests

Knowledge graph construction toolkit — extract entities and relationships from structured records or free text, build queryable directed graphs, and enrich them with network metrics.


Features

  • Dual-mode extraction — parse entities and relationships from dict records or unstructured text via configurable regex patterns
  • Domain configuration — define entity types, relationship types, and validation rules in YAML; swap domains without touching code
  • Graph querying — find nodes by type, compute shortest paths, list neighbors/predecessors, and extract subgraphs
  • Network enrichment — compute PageRank, degree centrality, clustering coefficient, and normalize edge weights in one call
  • Community detection — partition graphs using greedy modularity optimization (NetworkX)
  • Portable serialization — round-trip graphs to/from plain dicts via node-link format
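
As an illustration of the text side of dual-mode extraction, here is a minimal standalone sketch of regex-driven extraction. It does not use GraphForge itself: the `PATTERNS` and `ENDPOINT_TYPES` tables, the `extract_from_text` helper, and the simplified `Entity` and `Relationship` classes are all hypothetical stand-ins, and the library's real pattern configuration may look different.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    id: str
    type: str


@dataclass(frozen=True)
class Relationship:
    source: str
    target: str
    type: str
    weight: float = 1.0


# Hypothetical pattern table: relationship type -> regex with two named groups.
PATTERNS = {
    "knows": re.compile(r"(?P<source>[A-Z]\w+) knows (?P<target>[A-Z]\w+)"),
    "works_at": re.compile(r"(?P<source>[A-Z]\w+) works at (?P<target>[A-Z]\w+)"),
}

# Hypothetical mapping: relationship type -> (source entity type, target entity type).
ENDPOINT_TYPES = {
    "knows": ("person", "person"),
    "works_at": ("person", "organization"),
}


def extract_from_text(text):
    """Return (entities, relationships) matched by the patterns above."""
    entities = {}  # keyed by id so each entity is emitted once
    relationships = []
    for rel_type, pattern in PATTERNS.items():
        source_type, target_type = ENDPOINT_TYPES[rel_type]
        for match in pattern.finditer(text):
            source = match["source"].lower()
            target = match["target"].lower()
            entities.setdefault(source, Entity(source, source_type))
            entities.setdefault(target, Entity(target, target_type))
            relationships.append(Relationship(source, target, rel_type))
    return list(entities.values()), relationships


entities, relationships = extract_from_text("Alice knows Bob. Bob works at Initech.")
```

Keeping the patterns in a table keyed by relationship type means a new domain only needs new table entries, which is the same idea as swapping domain files without touching code.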

Quick Start

pip install tkm-graphforge

from graphforge import GraphBuilder, GraphExtractor, GraphEnricher
from graphforge.models import Entity, Relationship

# Build a graph manually
builder = GraphBuilder()
alice = Entity(id="alice", type="person", properties={"name": "Alice"})
bob   = Entity(id="bob",   type="person", properties={"name": "Bob"})
rel   = Relationship(source="alice", target="bob", type="knows", weight=1.0)

builder.add_entity(alice)
builder.add_entity(bob)
builder.add_relationship(rel)

# Query
print(builder.get_neighbors("alice"))   # ['bob']
print(builder.get_shortest_path("alice", "bob"))

# Extract from records
extractor = GraphExtractor()
records = [{"id": "p1", "type": "paper", "cites": "p2"}]
entities, relationships = extractor.extract_from_records(records)

# Enrich with metrics
enricher = GraphEnricher(builder.graph)
enricher.compute_centrality()
enricher.compute_pagerank()
enricher.detect_communities()
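
To make the enrichment step more concrete, the following standalone sketch computes two of the metrics by hand on a small made-up edge list. It is not GraphForge's implementation. The degree centrality follows the usual definition for a directed graph (in-degree plus out-degree, divided by n - 1), and the max-based weight scaling is only an assumed normalization scheme, since the one the library uses is not stated here.

```python
from collections import defaultdict

# Directed, weighted edges as (source, target, weight) tuples (made-up data).
EDGES = [
    ("alice", "bob", 2.0),
    ("bob", "carol", 4.0),
    ("alice", "carol", 1.0),
    ("carol", "dave", 1.0),
]


def degree_centrality(edges):
    """Total degree (in + out) of each node divided by n - 1."""
    degree = defaultdict(int)
    for source, target, _weight in edges:
        degree[source] += 1
        degree[target] += 1
    n = len(degree)
    if n < 2:
        # Guard against division by zero on trivial graphs.
        return {node: 0.0 for node in degree}
    return {node: count / (n - 1) for node, count in degree.items()}


def normalize_weights(edges):
    """Divide every weight by the largest one (assumed scheme, positive weights)."""
    largest = max((weight for _s, _t, weight in edges), default=0.0)
    if largest <= 0:
        return list(edges)
    return [(s, t, weight / largest) for s, t, weight in edges]


centrality = degree_centrality(EDGES)
normalized = normalize_weights(EDGES)
```

Here `carol` touches every other node, so her centrality is 1.0, and the heaviest edge (`bob` to `carol`) ends up with weight 1.0 after scaling.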

Architecture

graph-forge/
├── graphforge/
│   ├── models.py      # Entity and Relationship dataclasses
│   ├── domains.py     # DomainLoader — reads YAML domain configs
│   ├── builder.py     # GraphBuilder — constructs and queries DiGraph
│   ├── extractor.py   # GraphExtractor — parses records and free text
│   └── enricher.py    # GraphEnricher — computes network metrics
├── domains/
│   ├── technology.yaml
│   ├── science.yaml
│   └── social.yaml
└── tests/             # pytest suite, one file per module
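
The schema of the files under `domains/` is not shown here, so the sketch below uses a hypothetical layout to illustrate how a domain config could drive validation. `DOMAIN` stands in for the dict that parsing a YAML file would produce; the key names `entity_types` and `relationship_types` and the `validate` helper are assumptions, not the `DomainLoader` API.

```python
# Hypothetical domain config, shaped like the result of parsing a YAML file.
DOMAIN = {
    "entity_types": ["person", "organization"],
    "relationship_types": {
        "knows": {"source": "person", "target": "person"},
        "works_at": {"source": "person", "target": "organization"},
    },
}


def validate(entities, relationships, domain):
    """Check entities ({id: type}) and relationships ([(source, target, type)]).

    Returns a list of human-readable error strings; an empty list means valid.
    """
    errors = []
    for entity_id, entity_type in entities.items():
        if entity_type not in domain["entity_types"]:
            errors.append(f"{entity_id}: unknown entity type {entity_type!r}")
    for source, target, rel_type in relationships:
        rule = domain["relationship_types"].get(rel_type)
        if rule is None:
            errors.append(f"{source}->{target}: unknown relationship type {rel_type!r}")
            continue
        # Each endpoint must exist and have the type the rule requires.
        for role, node in (("source", source), ("target", target)):
            if entities.get(node) != rule[role]:
                errors.append(f"{source}->{target}: {role} {node!r} must be a {rule[role]}")
    return errors


entities = {"alice": "person", "initech": "organization"}
relationships = [("alice", "initech", "works_at"), ("initech", "alice", "knows")]
errors = validate(entities, relationships, DOMAIN)
```

The second relationship is rejected because an organization cannot be the source of `knows` under this made-up rule set.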

Data flow:

Raw data (dicts / text)
        │
   GraphExtractor        ← domain YAML controls entity/rel types
        │
   GraphBuilder          ← NetworkX DiGraph under the hood
        │
   GraphEnricher         ← PageRank, centrality, communities
        │
   Serialized dict / downstream query
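
The last step of the flow, a plain-dict serialization, can be sketched without any dependencies. The example below builds a node-link style payload (a list of node dicts plus a list of edge dicts) and reads it back. The helper names are hypothetical and the `"nodes"` / `"links"` key names are an assumption; the exact keys GraphForge or NetworkX emit may differ by version.

```python
import json


def to_node_link(nodes, edges):
    """nodes: {id: attrs}; edges: {(source, target): attrs} -> node-link style dict."""
    return {
        "directed": True,
        "nodes": [{"id": node_id, **attrs} for node_id, attrs in nodes.items()],
        "links": [
            {"source": source, "target": target, **attrs}
            for (source, target), attrs in edges.items()
        ],
    }


def from_node_link(data):
    """Inverse of to_node_link: rebuild the nodes and edges mappings."""
    nodes = {}
    for entry in data["nodes"]:
        attrs = dict(entry)  # copy so the payload is left untouched
        node_id = attrs.pop("id")
        nodes[node_id] = attrs
    edges = {}
    for entry in data["links"]:
        attrs = dict(entry)
        source = attrs.pop("source")
        target = attrs.pop("target")
        edges[(source, target)] = attrs
    return nodes, edges


nodes = {
    "alice": {"type": "person", "name": "Alice"},
    "bob": {"type": "person", "name": "Bob"},
}
edges = {("alice", "bob"): {"type": "knows", "weight": 1.0}}

payload = to_node_link(nodes, edges)
# The payload contains only lists, dicts, strings, and numbers, so it survives JSON.
restored = from_node_link(json.loads(json.dumps(payload)))
```

Because the payload is JSON-safe, the same dict can be written to disk, sent over HTTP, or stored in a document database without a custom encoder.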

Development

git clone https://github.com/techknowmad/graph-forge.git
cd graph-forge
pip install -e ".[dev]"

# Lint
ruff check .

# Test
pytest -v

All tests must pass and ruff check must be clean before opening a PR.


Contributing

See CONTRIBUTING.md for branch conventions, commit style, and the PR checklist.


License

MIT


Built by TechKnowMad Labs
