A high-performance graph database library with Python bindings written in Rust
KGLite — Lightweight Knowledge Graph for Python
An embedded, in-memory knowledge graph database for Python — built in Rust for speed, with a Cypher query engine, semantic search, and first-class support for RAG pipelines and AI agents. No server, no setup, no infrastructure. Just pip install kglite and go.
Why KGLite?
- Zero infrastructure — runs inside your Python process. No database server to install, configure, or maintain.
- Fast — Rust core (via PyO3 + petgraph) with zero-copy where possible. Load millions of nodes without leaving Python.
- Query with Cypher — familiar graph query language for pattern matching, mutations, aggregations, and traversals.
- Built for AI — semantic search with text_score(), schema introspection via describe(), and a ready-made MCP server for LLM tool use.
- DataFrames in, DataFrames out — bulk-load from pandas, query results as DataFrames. Fits naturally into data science workflows.
Quick Start
pip install kglite
import pandas as pd
import kglite
# Three storage modes — pick by graph size:
# default (in-memory) — small/medium graphs, fastest queries
# storage="mapped" — mmap columns, RAM-friendly as you grow
# storage="disk", path=… — 100M+ nodes, Wikidata-scale, loaded lazily
graph = kglite.KnowledgeGraph()
# Bulk-load nodes from a DataFrame (also: add_nodes_bulk, from_blueprint,
# load_ntriples, or Cypher CREATE for ad-hoc inserts).
people = pd.DataFrame({
    "id": ["alice", "bob", "eve"],
    "name": ["Alice", "Bob", "Eve"],
    "age": [28, 35, 41],
    "city": ["Oslo", "Bergen", "Trondheim"],
})
graph.add_nodes(people, node_type="Person", unique_id_field="id", node_title_field="name")
# Bulk-load relationships the same way (also: add_connections_bulk,
# add_connections_from_source for auto-filter by loaded types).
knows = pd.DataFrame({"src": ["alice", "bob"], "tgt": ["bob", "eve"]})
graph.add_connections(knows, connection_type="KNOWS",
                      source_type="Person", source_id_field="src",
                      target_type="Person", target_id_field="tgt")
# Query — returns a ResultView (lazy; data stays in Rust until accessed).
result = graph.cypher("""
MATCH (p:Person) WHERE p.age > 30
RETURN p.name AS name, p.city AS city
ORDER BY p.age DESC
""")
for row in result:
    print(row['name'], row['city'])
# Or get a pandas DataFrame directly.
df = graph.cypher("MATCH (p:Person) RETURN p.name, p.age ORDER BY p.age", to_df=True)
# Persist to disk and reload.
graph.save("my_graph.kgl")
loaded = kglite.load("my_graph.kgl")
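Before bulk-loading relationships, it can help to confirm that every source and target id appears in the node table. This README does not say how add_connections treats edges whose endpoints are missing, so the following is a plain-Python pre-flight check and not part of the kglite API; `find_dangling_edges` is a hypothetical helper name.

```python
def find_dangling_edges(node_ids, edges):
    """Return the (src, tgt) pairs where at least one endpoint is not a known node id."""
    known = set(node_ids)
    return [(src, tgt) for src, tgt in edges if src not in known or tgt not in known]


# Same ids as the Quick Start DataFrames, plus one deliberately bad edge.
person_ids = ["alice", "bob", "eve"]
knows_edges = [("alice", "bob"), ("bob", "eve"), ("eve", "mallory")]

dangling = find_dangling_edges(person_ids, knows_edges)
print(dangling)  # [('eve', 'mallory')]
```

With DataFrames, the same inputs could come from `people["id"]` and `zip(knows["src"], knows["tgt"])`.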
Use Cases
Codebase analysis
Parse Python, Rust, TypeScript, Go, Java, C#, and C++ into a graph of functions, classes, calls, and imports. Trace who-calls-what, find dead code, and review structure without leaving your editor. Pairs naturally with the MCP server so an agent can reason over your repo.
from kglite.code_tree import build
graph = build(".") # parse current directory
graph.cypher("""
MATCH (f:Function)-[:CALLS]->(g:Function)
RETURN g.name, count(f) AS callers
ORDER BY callers DESC LIMIT 10
""")
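To make the idea of a CALLS edge concrete, here is a minimal sketch that extracts caller/callee pairs from Python source with the standard-library `ast` module and counts callers per function, mirroring the query above. It is only an illustration: kglite's own parser is tree-sitter based and handles several languages, and the `call_edges` helper and `SOURCE` sample are hypothetical.

```python
import ast
from collections import Counter

SOURCE = '''
def helper():
    return 1

def a():
    return helper()

def b():
    return helper() + a()
'''


def call_edges(source):
    """Yield (caller, callee) for calls to plain names inside top-level functions."""
    tree = ast.parse(source)
    for func in tree.body:
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            # Only direct calls like helper(); method calls such as obj.f() are skipped.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                yield func.name, node.func.id


edges = sorted(set(call_edges(SOURCE)))
callers = Counter(callee for _, callee in edges)
print(callers.most_common())  # [('helper', 2), ('a', 1)]
```

The sketch ignores methods, nested functions, and imports, which is where a real code graph earns its keep.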
Agentic AI — memory and tool use
Give an LLM a structured memory it can query. describe() emits a
compact XML schema that fits in a system prompt, and the bundled MCP
server exposes the whole graph as a Cypher tool — drop-in for Claude,
Cursor, or any MCP-capable agent.
xml = graph.describe() # schema for the agent's context
prompt = f"You have a knowledge graph:\n{xml}\nAnswer via graph.cypher()."
# Or: python examples/mcp_server.py path/to/graph.kgl
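When an agent can submit arbitrary Cypher, you may want to restrict it to read-only queries. The sketch below is one conservative way to do that in front of any query function; it is not a kglite feature, and `is_read_only`, `run_agent_query`, and `fake_cypher` are hypothetical names. The keyword list covers only the write clauses named in this README's feature table.

```python
import re

# Write clauses listed in this README; a real deployment may need more.
WRITE_CLAUSES = re.compile(r"\b(CREATE|SET|DELETE|MERGE)\b", re.IGNORECASE)


def is_read_only(query):
    """Conservative check: True only if no write keyword appears anywhere in the text."""
    return WRITE_CLAUSES.search(query) is None


def run_agent_query(run_cypher, query):
    """Forward the query to run_cypher (for example graph.cypher) only if it looks read-only."""
    if not is_read_only(query):
        raise PermissionError("write clauses are not allowed for this agent")
    return run_cypher(query)


# Stand-in for graph.cypher so the sketch runs without a graph.
def fake_cypher(query):
    return f"ran: {query}"


print(run_agent_query(fake_cypher, "MATCH (p:Person) RETURN p.name"))
```

Because the check is purely textual, it also rejects harmless queries that mention a write keyword inside a string literal or property name; that trade-off favors safety over convenience.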
RAG retrieval
Store documents, chunks, and entities together as one graph. Combine
text_score() semantic similarity with Cypher structure — hybrid
retrieval in one query, no second vector DB.
graph.cypher("""
MATCH (c:Chunk)-[:IN_DOC]->(d:Document)
RETURN c.text, d.title,
text_score(c.embedding, $query_vec) AS score
ORDER BY score DESC LIMIT 5
""", params={"query_vec": query_embedding})
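For intuition about what a similarity-ranked retrieval step does, here is a small pure-Python sketch that scores chunk embeddings against a query vector and keeps the best matches. It assumes cosine similarity, which this README does not confirm as the measure behind text_score(); `cosine`, `top_k`, and the toy vectors are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(chunks, query_vec, k=5):
    """Return the k highest-scoring (score, text) pairs, best first."""
    scored = [(cosine(vec, query_vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


chunks = [
    ("graph databases store nodes and edges", [1.0, 0.0, 0.0]),
    ("pandas loads tabular data", [0.0, 1.0, 0.0]),
    ("cypher matches graph patterns", [0.8, 0.0, 0.6]),
]
query_vec = [1.0, 0.0, 0.0]

ranked = top_k(chunks, query_vec, k=2)
for score, text in ranked:
    print(f"{score:.2f}  {text}")
```

In the Cypher version, the MATCH pattern supplies the structural filter (here, chunks that belong to a document) and the score only orders what survives it.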
Data exploration and analysis
Load CSVs or DataFrames, walk relationships, run graph algorithms (shortest path, centrality, community detection), and export — all from a notebook.
graph.add_nodes(users_df, node_type="User", unique_id_field="user_id", node_title_field="name")
graph.cypher("""
MATCH path = shortestPath((a:User {name:'Alice'})-[*]-(b:User {name:'Eve'}))
RETURN path
""")
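As a reference point for what an unweighted shortest-path query computes, the sketch below runs a breadth-first search over undirected edges in plain Python. It is not how kglite implements shortestPath (the README does not describe that); the `shortest_path` helper and the sample `friend_edges` are hypothetical.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over undirected edges; returns a list of nodes or None."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbours.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


friend_edges = [
    ("Alice", "Bob"), ("Bob", "Eve"),
    ("Alice", "Carol"), ("Carol", "Dan"), ("Dan", "Eve"),
]
print(shortest_path(friend_edges, "Alice", "Eve"))  # ['Alice', 'Bob', 'Eve']
```

Breadth-first search finds a path with the fewest hops, which matches the `-[*]-` pattern above where every relationship counts equally.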
Examples
The examples/ directory has runnable, self-contained scripts covering each of the use cases above:
- code_graph.py — build a code knowledge graph from a source directory via code_tree.build. Produces Function, Class, Module, File nodes with CALLS, DEFINES, IMPORTS edges.
- legal_graph.py — end-to-end add_nodes / add_connections from pandas DataFrames, covering laws, regulations, and court decisions with citation relationships. Good template for adapting to your own domain.
- mcp_server.py — drop-in MCP server that exposes any .kgl file to an LLM (Claude, Cursor, …) as a Cypher query tool, with schema disclosure and code-graph–aware helpers.
- spatial_graph.py — declarative CSV→graph loading via a JSON blueprint; regions, facilities, and sensors with lat/lon coordinates and pipeline-path traversal queries.
- wikidata_disk.py — Wikidata-scale build + disk-mode storage; loads hundreds of millions of triples via load_ntriples into a mmap-backed graph.
Key Features
| Feature | Description |
|---|---|
| Cypher queries | MATCH, CREATE, SET, DELETE, MERGE, aggregations, ORDER BY, LIMIT, SKIP |
| Semantic search | Vector embeddings + text_score() for similarity ranking |
| Graph algorithms | Shortest path, centrality, community detection, clustering |
| Spatial | Coordinates, WKT geometry, distance and containment queries |
| Timeseries | Time-indexed data with ts_*() Cypher functions |
| Bulk loading | Fluent API (add_nodes / add_connections) for DataFrames |
| Blueprints | Declarative CSV-to-graph loading via JSON config |
| Import/Export | Save/load snapshots, GraphML, CSV export |
| AI integration | describe() introspection, MCP server, agent prompts |
| Code analysis | Parse codebases via tree-sitter (kglite.code_tree) |
Documentation
Full docs at kglite.readthedocs.io:
- Getting Started — installation, first graph, core concepts
- Cypher Guide — queries, mutations, parameters
- Semantic Search — embeddings, vector search
- AI Agents — MCP server, describe(), agent prompts
- API Reference — full auto-generated reference
Requirements
Python 3.10+ (CPython) | macOS (ARM/Intel), Linux (x86_64/aarch64), Windows (x86_64) | pandas >= 1.5
License
MIT — see LICENSE for details.