Skip to main content

A high-performance graph database library with Python bindings written in Rust

Project description

KGLite — Lightweight Knowledge Graph for Python

PyPI version Python versions License: MIT Docs

KGLite is an embedded knowledge graph for Python: pip install, no server, no setup. It speaks Cypher, loads pandas DataFrames, and ships with the connective tissue for AI agents — an MCP server so Claude / Cursor / any MCP-capable LLM can query your graph as a tool, a describe() method that emits a compact XML schema for system prompts, and a code_tree parser that turns any source directory into a graph of functions, classes, calls, and imports across 9 languages.

Three storage modes scale from in-memory (millisecond queries on small graphs) to mmap-backed on disk (1 B+ edges, Wikidata-scale).

Why KGLite?

  • Built for LLM agentsdescribe() XML schema, bundled MCP server, and an agent-oriented query surface (cypher(), graph.select(...).traverse(...)).
  • Codebase → graph in one linekglite.code_tree.build(".") parses Python, Rust, TypeScript, Go, Java, C#, C++, and more into Function / Class / Module nodes with CALLS / DEFINES / IMPORTS edges.
  • Scales without leaving Python — in-memory for prototyping, mmap-backed for notebook-scale, disk-mode CSR for graphs too large for RAM. Same API across modes.
  • Query with CypherMATCH, MERGE, OPTIONAL MATCH, aggregations, parameters, semantic search via text_score().
  • DataFrames in, DataFrames out — bulk-load from pandas, query results as DataFrames.

Quick Start

pip install kglite
import pandas as pd
import kglite

# Three storage modes — pick by graph size:
#   default (in-memory)   — small/medium graphs, fastest queries
#   storage="mapped"      — mmap columns, RAM-friendly as you grow
#   storage="disk", path=…  — 100M+ nodes, Wikidata-scale, loaded lazily
graph = kglite.KnowledgeGraph()

# Bulk-load nodes from a DataFrame (also: add_nodes_bulk, from_blueprint,
# load_ntriples, or Cypher CREATE for ad-hoc inserts).
people = pd.DataFrame({
    "id":   ["alice", "bob", "eve"],
    "name": ["Alice", "Bob", "Eve"],
    "age":  [28, 35, 41],
    "city": ["Oslo", "Bergen", "Trondheim"],
})
graph.add_nodes(people, node_type="Person", unique_id_field="id", node_title_field="name")

# Bulk-load relationships the same way (also: add_connections_bulk,
# add_connections_from_source for auto-filter by loaded types).
knows = pd.DataFrame({"src": ["alice", "bob"], "tgt": ["bob", "eve"]})
graph.add_connections(knows, connection_type="KNOWS",
                      source_type="Person", source_id_field="src",
                      target_type="Person", target_id_field="tgt")

# Query — returns a ResultView (lazy; data stays in Rust until accessed).
result = graph.cypher("""
    MATCH (p:Person) WHERE p.age > 30
    RETURN p.name AS name, p.city AS city
    ORDER BY p.age DESC
""")
for row in result:
    print(row['name'], row['city'])

# Or get a pandas DataFrame directly.
df = graph.cypher("MATCH (p:Person) RETURN p.name, p.age ORDER BY p.age", to_df=True)

# Persist to disk and reload.
graph.save("my_graph.kgl")
loaded = kglite.load("my_graph.kgl")

Use Cases

Codebase analysis

Parse Python, Rust, TypeScript, Go, Java, C#, and C++ into a graph of functions, classes, calls, and imports. Trace who-calls-what, find dead code, and review structure without leaving your editor. Pairs naturally with the MCP server so an agent can reason over your repo.

from kglite.code_tree import build

graph = build(".")                                # parse current directory
graph.cypher("""
    MATCH (f:Function)-[:CALLS]->(g:Function)
    RETURN g.name, count(f) AS callers
    ORDER BY callers DESC LIMIT 10
""")

Agentic AI — memory and tool use

Give an LLM a structured memory it can query. describe() emits a compact XML schema that fits in a system prompt, and the bundled MCP server exposes the whole graph as a Cypher tool — drop-in for Claude, Cursor, or any MCP-capable agent.

xml = graph.describe()                            # schema for the agent's context
prompt = f"You have a knowledge graph:\n{xml}\nAnswer via graph.cypher()."
# Or: python examples/mcp_server.py path/to/graph.kgl

RAG retrieval

Store documents, chunks, and entities together as one graph. Combine text_score() semantic similarity with Cypher structure — hybrid retrieval in one query, no second vector DB.

graph.cypher("""
    MATCH (c:Chunk)-[:IN_DOC]->(d:Document)
    RETURN c.text, d.title,
           text_score(c.embedding, $query_vec) AS score
    ORDER BY score DESC LIMIT 5
""", params={"query_vec": query_embedding})

Data exploration and analysis

Load CSVs or DataFrames, walk relationships, run graph algorithms (shortest path, centrality, community detection), and export — all from a notebook.

graph.add_nodes(users_df, node_type="User", unique_id_field="user_id", node_title_field="name")
graph.cypher("""
    MATCH path = shortestPath((a:User {name:'Alice'})-[*]-(b:User {name:'Eve'}))
    RETURN path
""")

Examples

The examples/ directory has runnable, self-contained scripts covering each of the use cases above:

  • code_graph.py — build a code knowledge graph from a source directory via code_tree.build. Produces Function, Class, Module, File nodes with CALLS, DEFINES, IMPORTS edges.
  • legal_graph.py — end-to-end add_nodes / add_connections from pandas DataFrames, covering laws, regulations, and court decisions with citation relationships. Good template for adapting to your own domain.
  • mcp_server.py — drop-in MCP server that exposes any .kgl file to an LLM (Claude, Cursor, …) as a Cypher query tool, with schema disclosure and code-graph–aware helpers.
  • spatial_graph.py — declarative CSV→graph loading via a JSON blueprint; regions, facilities, and sensors with lat/lon coordinates and pipeline-path traversal queries.
  • wikidata_disk.py — Wikidata-scale build + disk-mode storage; loads hundreds of millions of triples via load_ntriples into a mmap-backed graph.

Benchmarks

KGLite builds and queries Wikidata-scale graphs on a laptop. Measured with bench/wiki_benchmark.py on an M-series MacBook.

Ingest — full pipeline from compressed N-Triples to a queryable graph:

dataset triples nodes edges ingest throughput peak RAM
wiki100m 100 M 938 K 748 K 29 s 3.4 M triples/s 1.3 GB
wiki500m 500 M 5.6 M 6.7 M 157 s 3.2 M triples/s 5.2 GB
wiki1000m 1 B 14.7 M 15.4 M 395 s 2.5 M triples/s 7.0 GB

Reloading a saved 1 B-triple graph from disk (7 GB on-disk): 3.5 s.

Query latency on the 1 B-triple graph (mapped storage):

Cypher wall
MATCH (n)-[:P31]->(:Q5) RETURN count(n) — typed aggregation 0.5 ms
MATCH (a)-[:P31]->(b)-[:P279]->(c) LIMIT 10 — 2-hop typed 0.9 ms
MATCH (a)-[:P31]->(b {nid:'Q64'}) RETURN a LIMIT 20 — pivot 1 ms
MATCH (a)-[:P31]->(:Q5) MATCH (a)-[:P27]->(c) LIMIT 10 — join 44 ms

Disk and mapped storage track within 1 % on build; mapped wins on query shapes backed by its in-memory inverted index, disk wins on unbounded typed traversals by staying on sorted-CSR mmap I/O.

No server, no tuning, same Python process as your code.

Key Features

Feature Description
Cypher queries MATCH, CREATE, SET, DELETE, MERGE, aggregations, ORDER BY, LIMIT, SKIP
Semantic search Vector embeddings + text_score() for similarity ranking
Graph algorithms Shortest path, centrality, community detection, clustering
Spatial Coordinates, WKT geometry, distance and containment queries
Timeseries Time-indexed data with ts_*() Cypher functions
Bulk loading Fluent API (add_nodes / add_connections) for DataFrames
Blueprints Declarative CSV-to-graph loading via JSON config
Import/Export Save/load snapshots, GraphML, CSV export
AI integration describe() introspection, MCP server, agent prompts
Code analysis Parse codebases via tree-sitter (kglite.code_tree)

Documentation

Full docs at kglite.readthedocs.io:

Requirements

Python 3.10+ (CPython) | macOS (ARM/Intel), Linux (x86_64/aarch64), Windows (x86_64) | pandas >= 1.5

License

MIT — see LICENSE for details.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

kglite-0.8.15-cp310-abi3-win_amd64.whl (6.0 MB view details)

Uploaded CPython 3.10+Windows x86-64

kglite-0.8.15-cp310-abi3-manylinux_2_39_x86_64.whl (6.0 MB view details)

Uploaded CPython 3.10+manylinux: glibc 2.39+ x86-64

kglite-0.8.15-cp310-abi3-macosx_11_0_arm64.whl (5.4 MB view details)

Uploaded CPython 3.10+macOS 11.0+ ARM64

kglite-0.8.15-cp310-abi3-macosx_10_12_x86_64.whl (5.8 MB view details)

Uploaded CPython 3.10+macOS 10.12+ x86-64

File details

Details for the file kglite-0.8.15-cp310-abi3-win_amd64.whl.

File metadata

  • Download URL: kglite-0.8.15-cp310-abi3-win_amd64.whl
  • Upload date:
  • Size: 6.0 MB
  • Tags: CPython 3.10+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for kglite-0.8.15-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 c269c0cf2d6a799cab5a9b9f004375ffcbcd0ecdef17ba9d02b2987797df29d9
MD5 7158b9c1346110a6108333985e96c938
BLAKE2b-256 402e43048b070eed5d1db521721562e2b98a11fbef12621a5d7276ff96cce3e6

See more details on using hashes here.

File details

Details for the file kglite-0.8.15-cp310-abi3-manylinux_2_39_x86_64.whl.

File metadata

File hashes

Hashes for kglite-0.8.15-cp310-abi3-manylinux_2_39_x86_64.whl
Algorithm Hash digest
SHA256 d5cc8d34df41ac0a17101283c418dd369dedfcd5ffc3c80b06a8a794c2b853d2
MD5 8d8c982b6131af4b2bce0ad6c26d7bee
BLAKE2b-256 5f24733358297ec0b90609df021941d43547675fc86a7b27b42dfe0e50041b31

See more details on using hashes here.

File details

Details for the file kglite-0.8.15-cp310-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for kglite-0.8.15-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 1bfe46f38ef520ca02e07b3855163cb7274c87fb4e08252f2053f03ab64daf1c
MD5 d81085b0d51479f44a1f47f82885e45e
BLAKE2b-256 db4e10a1d357888434733187da848c234b940fefac1d45eb75dc6949cda8705e

See more details on using hashes here.

File details

Details for the file kglite-0.8.15-cp310-abi3-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for kglite-0.8.15-cp310-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 b2d85e8e218f81c6be996eec50f2734c240eb0e424ff07da4ef5a63f38b9c364
MD5 b7c62a13ebca3edc8816d7694541fead
BLAKE2b-256 615cf1af87ba84806431f25ac6784232914ba3c654f567dec7b41354580b1cf6

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page