
Lightweight knowledge graph ingestion and enrichment pipeline


LiteGraf

Lightweight knowledge graph ingestion and query pipeline. Insert text or documents, extract entities and relationships with an LLM, store them in a graph database, and query with natural language.

from pipeline.litegraf import LiteGraf

kg = LiteGraf()
kg.insert("TP53 is associated with multiple cancers including breast and lung cancer.")
result = kg.query("What cancers are associated with TP53?")
print(result.answer)
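Conceptually, `insert()` turns text into (subject, relation, object) triples and `query()` retrieves them from the graph. The toy sketch below illustrates that flow with an in-memory store and a hard-coded stand-in for the LLM extraction step. `Triple`, `ToyGraph`, and `fake_extract` are hypothetical names, not part of LiteGraf's API; in LiteGraf the extraction is done by the configured LLM and storage by the graph backend.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


def fake_extract(text: str) -> list[Triple]:
    """Hypothetical stand-in for LLM extraction: returns fixed triples for the TP53 sentence."""
    if "TP53" in text:
        return [
            Triple("TP53", "ASSOCIATED_WITH", "breast cancer"),
            Triple("TP53", "ASSOCIATED_WITH", "lung cancer"),
        ]
    return []


class ToyGraph:
    """In-memory adjacency map standing in for a graph database."""

    def __init__(self) -> None:
        self.edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def insert(self, text: str) -> None:
        # Extract triples, then store each one as an outgoing edge of its subject.
        for t in fake_extract(text):
            self.edges[t.subject].append((t.relation, t.obj))

    def neighbors(self, entity: str, relation: str) -> list[str]:
        # .get() avoids creating empty entries for unknown entities.
        return [o for r, o in self.edges.get(entity, []) if r == relation]


g = ToyGraph()
g.insert("TP53 is associated with multiple cancers including breast and lung cancer.")
print(g.neighbors("TP53", "ASSOCIATED_WITH"))  # ['breast cancer', 'lung cancer']
```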

Features

  • Single entry point — LiteGraf dataclass with sensible defaults; override only what you need
  • Pluggable backends — Neo4j, Memgraph, Ollama, Cloudflare Workers AI, AWS Bedrock
  • Sync and async — insert() / ainsert(), query() / aquery()
  • Content deduplication — hash-based, idempotent inserts
  • LLM response caching — disk-based, avoids redundant API calls
  • Rate limiting — async concurrency limiter for LLM providers
  • PDF and document ingestion — via MarkItDown + PyMuPDF
  • Benchmarking suite — compare extraction quality across LLM providers
  • Enrichment pipeline — entity resolution, ontology integration, evidence scoring

Install

Requires Python 3.11+.

pip install litegraf

With optional backends:

pip install "litegraf[neo4j]"     # Neo4j graph store
pip install "litegraf[bedrock]"   # AWS Bedrock LLM
pip install "litegraf[all]"       # Everything

Or from source with uv:

git clone https://github.com/graffold/litegraf.git
cd litegraf
uv sync --all-extras

Quick Start

Default setup (Ollama + Neo4j)

Start Ollama and Neo4j locally, then:

from pipeline.litegraf import LiteGraf

kg = LiteGraf()  # connects to localhost defaults

# Insert text
kg.insert("BRCA1 interacts with RAD51 in DNA repair pathways.")

# Insert a PDF (passed as raw bytes)
with open("paper.pdf", "rb") as f:
    kg.insert(f.read())

# Query
result = kg.query("What proteins interact with BRCA1?")
print(result.answer)
print(result.context)  # retrieved graph context

Cloudflare Workers AI (free tier)

kg = LiteGraf(
    llm="cloudflare",
    llm_model="@cf/meta/llama-3.1-8b-instruct-fp8",
)

Memgraph backend

kg = LiteGraf(
    graph_store="memgraph",
    graph_uri="bolt://localhost:7687",
    graph_user="",
    graph_password="",
)

Async usage

import asyncio
from pipeline.litegraf import LiteGraf

async def main():
    kg = LiteGraf()
    await kg.ainsert("TP53 suppresses tumor growth.")
    result = await kg.aquery("What does TP53 do?")
    print(result.answer)

asyncio.run(main())
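The rate limiter listed under Features caps how many LLM requests run at once, which matters when many `ainsert()` calls are gathered concurrently. A minimal sketch of that idea using `asyncio.Semaphore`; `ConcurrencyLimiter` and `fake_llm_call` are hypothetical names, and this is not LiteGraf's internal implementation.

```python
import asyncio
from typing import Any, Awaitable, Callable


class ConcurrencyLimiter:
    """Allows at most `max_concurrent` calls inside `run()` at the same time."""

    def __init__(self, max_concurrent: int) -> None:
        self._sem = asyncio.Semaphore(max_concurrent)
        self.active = 0
        self.peak = 0  # highest concurrency observed, tracked for illustration

    async def run(self, coro_fn: Callable[..., Awaitable[Any]], *args: Any) -> Any:
        async with self._sem:  # waits here once the limit is reached
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                return await coro_fn(*args)
            finally:
                self.active -= 1


async def fake_llm_call(prompt: str) -> str:
    await asyncio.sleep(0.01)  # simulate provider latency
    return f"answer to {prompt!r}"


async def demo() -> None:
    limiter = ConcurrencyLimiter(max_concurrent=2)
    prompts = [f"q{i}" for i in range(5)]
    results = await asyncio.gather(*(limiter.run(fake_llm_call, p) for p in prompts))
    print(len(results), limiter.peak)  # 5 2


asyncio.run(demo())
```

All five requests complete, but never more than two are in flight at once.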

Query modes

# Full pipeline: retrieve context → LLM synthesis
result = kg.query("What cancers involve TP53?")

# Context only (bring your own LLM prompt)
result = kg.query("TP53", mode="only_context")
for chunk in result.context:
    print(chunk.text, chunk.score)
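With `mode="only_context"` you assemble the prompt yourself. A minimal sketch, assuming each context item exposes `.text` and `.score` as in the loop above and that a higher score means more relevant; `Chunk` is a hypothetical stand-in for the items of `result.context` so the example runs without a graph backend, and `build_prompt` is likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for an item of result.context."""
    text: str
    score: float


def build_prompt(question: str, chunks: list[Chunk], top_k: int = 3) -> str:
    """Keep the top_k highest-scoring chunks and format them as a numbered context block."""
    best = sorted(chunks, key=lambda c: c.score, reverse=True)[:top_k]
    context = "\n".join(f"[{i}] {c.text}" for i, c in enumerate(best, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


chunks = [
    Chunk("TP53 ASSOCIATED_WITH lung cancer", 0.71),
    Chunk("TP53 ASSOCIATED_WITH breast cancer", 0.92),
    Chunk("BRCA1 INTERACTS_WITH RAD51", 0.10),
]
print(build_prompt("What cancers involve TP53?", chunks, top_k=2))
```

In real use, `result.context` would take the place of `chunks`, and the resulting string goes to whichever LLM you choose.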

Configuration

All parameters can be set via the LiteGraf constructor:

Parameter          Default                    Description
graph_store        "neo4j"                    Graph backend: "neo4j", "memgraph", or a store instance
graph_uri          "bolt://localhost:7687"    Bolt connection URI
graph_user         "neo4j"                    Graph database username
graph_password     ""                         Graph database password
llm                "ollama"                   LLM provider: "ollama", "cloudflare", or "bedrock"
llm_model          "llama3"                   Model name or ID
embedding          "local"                    Embedding provider (local sentence-transformers)
chunk_token_size   512                        Tokens per chunk
enable_cache       True                       Cache LLM responses to disk
enable_dedup       True                       Skip duplicate content on insert

Benchmarks

Compare extraction quality across LLM providers on biomedical datasets:

python -m pipeline.benchmarks

Results are written to docs/ for viewing on GitHub Pages.
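One common way to score extraction quality is triple-level precision, recall, and F1 against a gold-standard annotation. A minimal sketch, assuming exact matching on case-normalized (subject, relation, object) triples; the metrics the LiteGraf benchmark suite actually reports are not specified here, and `score_triples` is a hypothetical helper.

```python
Triple = tuple[str, str, str]


def normalize(t: Triple) -> Triple:
    """Case-fold and strip each field so trivial formatting differences still match."""
    s, r, o = t
    return (s.strip().lower(), r.strip().lower(), o.strip().lower())


def score_triples(predicted: list[Triple], gold: list[Triple]) -> dict[str, float]:
    pred_set = {normalize(t) for t in predicted}
    gold_set = {normalize(t) for t in gold}
    tp = len(pred_set & gold_set)  # triples found in both sets
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


gold = [
    ("BRCA1", "INTERACTS_WITH", "RAD51"),
    ("BRCA1", "PARTICIPATES_IN", "DNA repair"),
]
predicted = [
    ("brca1", "interacts_with", "rad51"),       # matches after normalization
    ("RAD51", "PARTICIPATES_IN", "DNA repair"),  # wrong subject: a false positive
]
print(score_triples(predicted, gold))  # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```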

Development

uv sync --all-extras --group dev
uv run pytest
uv run ruff check src/

License

AGPL-3.0



Download files


Source Distribution

litegraf-0.1.0.tar.gz (410.8 kB)

Uploaded Source

Built Distribution


litegraf-0.1.0-py3-none-any.whl (471.6 kB)

Uploaded Python 3

File details

Details for the file litegraf-0.1.0.tar.gz.

File metadata

  • Download URL: litegraf-0.1.0.tar.gz
  • Upload date:
  • Size: 410.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for litegraf-0.1.0.tar.gz
Algorithm Hash digest
SHA256 f729d200726449dda19a7472b73c89218f373be43e23facd9c6bd9a07a2e7bae
MD5 1717940169d00e79015c01479f619d5c
BLAKE2b-256 bcb18b1beedc5aa565d00a8bca35e8f7ac00931015e4d235ecf0877c4fe65295


File details

Details for the file litegraf-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: litegraf-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 471.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for litegraf-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2413732667ac4ae6f40431a434ec09d79dd17ef85617770cb8c8f705dc161b8e
MD5 07c105369c4e69b1eccd844cfe27698a
BLAKE2b-256 1c0ebd53aa6e97228fb366aa00c5693d29f725cd9159aea4477ced212f5fb0dc

