Skip to main content

A high-performance graph database library with Python bindings written in Rust

Project description

KGLite — Lightweight Knowledge Graph for Python

PyPI version Python versions License: MIT Docs

An embedded, in-memory knowledge graph database for Python — built in Rust for speed, with a Cypher query engine, semantic search, and first-class support for RAG pipelines and AI agents. No server, no setup, no infrastructure. Just pip install kglite and go.

Why KGLite?

  • Zero infrastructure — runs inside your Python process. No database server to install, configure, or maintain.
  • Fast — Rust core (via PyO3 + petgraph) with zero-copy where possible. Load millions of nodes without leaving Python.
  • Query with Cypher — familiar graph query language for pattern matching, mutations, aggregations, and traversals.
  • Built for AI — semantic search with text_score(), schema introspection via describe(), and a ready-made MCP server for LLM tool use.
  • DataFrames in, DataFrames out — bulk-load from pandas, query results as DataFrames. Fits naturally into data science workflows.

Quick Start

pip install kglite
import pandas as pd
import kglite

# Three storage modes — pick by graph size:
#   default (in-memory)   — small/medium graphs, fastest queries
#   storage="mapped"      — mmap columns, RAM-friendly as you grow
#   storage="disk", path=…  — 100M+ nodes, Wikidata-scale, loaded lazily
graph = kglite.KnowledgeGraph()

# Bulk-load nodes from a DataFrame (also: add_nodes_bulk, from_blueprint,
# load_ntriples, or Cypher CREATE for ad-hoc inserts).
people = pd.DataFrame({
    "id":   ["alice", "bob", "eve"],
    "name": ["Alice", "Bob", "Eve"],
    "age":  [28, 35, 41],
    "city": ["Oslo", "Bergen", "Trondheim"],
})
graph.add_nodes(people, node_type="Person", unique_id_field="id", node_title_field="name")

# Bulk-load relationships the same way (also: add_connections_bulk,
# add_connections_from_source for auto-filter by loaded types).
knows = pd.DataFrame({"src": ["alice", "bob"], "tgt": ["bob", "eve"]})
graph.add_connections(knows, connection_type="KNOWS",
                      source_type="Person", source_id_field="src",
                      target_type="Person", target_id_field="tgt")

# Query — returns a ResultView (lazy; data stays in Rust until accessed).
result = graph.cypher("""
    MATCH (p:Person) WHERE p.age > 30
    RETURN p.name AS name, p.city AS city
    ORDER BY p.age DESC
""")
for row in result:
    print(row['name'], row['city'])

# Or get a pandas DataFrame directly.
df = graph.cypher("MATCH (p:Person) RETURN p.name, p.age ORDER BY p.age", to_df=True)

# Persist to disk and reload.
graph.save("my_graph.kgl")
loaded = kglite.load("my_graph.kgl")

Use Cases

Codebase analysis

Parse Python, Rust, TypeScript, Go, Java, C#, and C++ into a graph of functions, classes, calls, and imports. Trace who-calls-what, find dead code, and review structure without leaving your editor. Pairs naturally with the MCP server so an agent can reason over your repo.

from kglite.code_tree import build

graph = build(".")                                # parse current directory
graph.cypher("""
    MATCH (f:Function)-[:CALLS]->(g:Function)
    RETURN g.name, count(f) AS callers
    ORDER BY callers DESC LIMIT 10
""")

Agentic AI — memory and tool use

Give an LLM a structured memory it can query. describe() emits a compact XML schema that fits in a system prompt, and the bundled MCP server exposes the whole graph as a Cypher tool — drop-in for Claude, Cursor, or any MCP-capable agent.

xml = graph.describe()                            # schema for the agent's context
prompt = f"You have a knowledge graph:\n{xml}\nAnswer via graph.cypher()."
# Or: python examples/mcp_server.py path/to/graph.kgl

RAG retrieval

Store documents, chunks, and entities together as one graph. Combine text_score() semantic similarity with Cypher structure — hybrid retrieval in one query, no second vector DB.

graph.cypher("""
    MATCH (c:Chunk)-[:IN_DOC]->(d:Document)
    RETURN c.text, d.title,
           text_score(c.embedding, $query_vec) AS score
    ORDER BY score DESC LIMIT 5
""", params={"query_vec": query_embedding})

Data exploration and analysis

Load CSVs or DataFrames, walk relationships, run graph algorithms (shortest path, centrality, community detection), and export — all from a notebook.

graph.add_nodes(users_df, node_type="User", unique_id_field="user_id", node_title_field="name")
graph.cypher("""
    MATCH path = shortestPath((a:User {name:'Alice'})-[*]-(b:User {name:'Eve'}))
    RETURN path
""")

Examples

The examples/ directory has runnable, self-contained scripts covering each of the use cases above:

  • code_graph.py — build a code knowledge graph from a source directory via code_tree.build. Produces Function, Class, Module, File nodes with CALLS, DEFINES, IMPORTS edges.
  • legal_graph.py — end-to-end add_nodes / add_connections from pandas DataFrames, covering laws, regulations, and court decisions with citation relationships. Good template for adapting to your own domain.
  • mcp_server.py — drop-in MCP server that exposes any .kgl file to an LLM (Claude, Cursor, …) as a Cypher query tool, with schema disclosure and code-graph–aware helpers.
  • spatial_graph.py — declarative CSV→graph loading via a JSON blueprint; regions, facilities, and sensors with lat/lon coordinates and pipeline-path traversal queries.
  • wikidata_disk.py — Wikidata-scale build + disk-mode storage; loads hundreds of millions of triples via load_ntriples into a mmap-backed graph.

Key Features

Feature Description
Cypher queries MATCH, CREATE, SET, DELETE, MERGE, aggregations, ORDER BY, LIMIT, SKIP
Semantic search Vector embeddings + text_score() for similarity ranking
Graph algorithms Shortest path, centrality, community detection, clustering
Spatial Coordinates, WKT geometry, distance and containment queries
Timeseries Time-indexed data with ts_*() Cypher functions
Bulk loading Fluent API (add_nodes / add_connections) for DataFrames
Blueprints Declarative CSV-to-graph loading via JSON config
Import/Export Save/load snapshots, GraphML, CSV export
AI integration describe() introspection, MCP server, agent prompts
Code analysis Parse codebases via tree-sitter (kglite.code_tree)

Documentation

Full docs at kglite.readthedocs.io:

Requirements

Python 3.10+ (CPython) | macOS (ARM/Intel), Linux (x86_64/aarch64), Windows (x86_64) | pandas >= 1.5

License

MIT — see LICENSE for details.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

kglite-0.8.14-cp310-abi3-win_amd64.whl (5.9 MB view details)

Uploaded CPython 3.10+Windows x86-64

kglite-0.8.14-cp310-abi3-manylinux_2_39_x86_64.whl (6.0 MB view details)

Uploaded CPython 3.10+manylinux: glibc 2.39+ x86-64

kglite-0.8.14-cp310-abi3-macosx_11_0_arm64.whl (5.4 MB view details)

Uploaded CPython 3.10+macOS 11.0+ ARM64

kglite-0.8.14-cp310-abi3-macosx_10_12_x86_64.whl (5.8 MB view details)

Uploaded CPython 3.10+macOS 10.12+ x86-64

File details

Details for the file kglite-0.8.14-cp310-abi3-win_amd64.whl.

File metadata

  • Download URL: kglite-0.8.14-cp310-abi3-win_amd64.whl
  • Upload date:
  • Size: 5.9 MB
  • Tags: CPython 3.10+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for kglite-0.8.14-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 4725c04ec54ec4a4501cfbbe5c254ab26b3512a0fece5f28fcceeaa4694bccde
MD5 57a09122370883ed6d5e65d187436582
BLAKE2b-256 c461521c536d82e1b14ff08bc6a9c5c87b357dcb03d13ce353b07f7a3db89fa1

See more details on using hashes here.

File details

Details for the file kglite-0.8.14-cp310-abi3-manylinux_2_39_x86_64.whl.

File metadata

File hashes

Hashes for kglite-0.8.14-cp310-abi3-manylinux_2_39_x86_64.whl
Algorithm Hash digest
SHA256 9487b92ef5186bee8f8bded2d48bd664647339b0b447223c8b91b6be1ecbe197
MD5 a1a1dc220f46a9e095f12a3aaa248d59
BLAKE2b-256 ba28297eb5a43a3855a647e6331780b4fccded6499d3ab0bd9e29056a29a3c64

See more details on using hashes here.

File details

Details for the file kglite-0.8.14-cp310-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for kglite-0.8.14-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 3e606e94b369df8b11ea065316ef54d5f338a95044517c85267bb153b3114e91
MD5 a35825e809d271bdb5dfd1bb97bc13a7
BLAKE2b-256 192d48ace6f814f18a00c142a1475d856eb66250d8ce9bebdf7c79d0010d838a

See more details on using hashes here.

File details

Details for the file kglite-0.8.14-cp310-abi3-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for kglite-0.8.14-cp310-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 93c1bfec64f33d4d116f2709bb2087e8a988e446f5e589699616131973e6b5b2
MD5 f2185b95046131da90ead78ba917ac07
BLAKE2b-256 6b1a16655017784f83469b08d20b02421d766ee7181eb9889d3eafbfa951ecfb

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page