
A high-performance graph database library with Python bindings written in Rust

KGLite — Lightweight Knowledge Graph for Python

KGLite is an embedded knowledge graph for Python: pip install, no server, no setup. It speaks Cypher, loads pandas DataFrames, and ships with the connective tissue for AI agents — an MCP server so Claude / Cursor / any MCP-capable LLM can query your graph as a tool, a describe() method that emits a compact XML schema for system prompts, and a code_tree parser that turns any source directory into a graph of functions, classes, calls, and imports across 9 languages.

Three storage modes scale from in-memory (millisecond queries on small graphs) to mmap-backed on disk (1 B+ edges, Wikidata-scale). Bundled dataset wrappers turn pip install kglite into a queryable Wikidata or petroleum-domain graph in one line.

Why KGLite?

  • Built for LLM agents: describe() XML schema, bundled MCP server, an agent-oriented query surface (cypher(), graph.select(...).traverse(...)), and structural validators (CALL orphan_node({type: ...}) YIELD node) for data-integrity checks that compose with the rest of Cypher.
  • One-line public datasets: wikidata.open(path) and sodir.open(path) handle fetch, parallel build, and caching; re-runs reload the cached graph instantly.
  • Codebase → graph in one line: kglite.code_tree.build(".") parses Python, Rust, TypeScript, Go, Java, C#, C++, and more into Function / Class / Module nodes with CALLS / DEFINES / IMPORTS edges.
  • Scales without leaving Python: in-memory for prototyping, mmap-backed for notebook-scale, disk-mode CSR for graphs too large for RAM. Same API across modes.
  • Query with Cypher: MATCH, MERGE, OPTIONAL MATCH, aggregations, parameters, and semantic search via text_score().
  • DataFrames in, DataFrames out: bulk-load nodes and edges from pandas with add_nodes / add_connections, and get query results back as DataFrames. End-to-end walkthrough in the Data Loading guide.

Quick Start

pip install kglite

import pandas as pd
import kglite

# Three storage modes — pick by graph size:
#   default (in-memory)   — small/medium graphs, fastest queries
#   storage="mapped"      — mmap columns, RAM-friendly as you grow
#   storage="disk", path=…  — 100M+ nodes, Wikidata-scale, loaded lazily
graph = kglite.KnowledgeGraph()

# Bulk-load nodes from a DataFrame (also: add_nodes_bulk, from_blueprint,
# load_ntriples, or Cypher CREATE for ad-hoc inserts).
people = pd.DataFrame({
    "id":   ["alice", "bob", "eve"],
    "name": ["Alice", "Bob", "Eve"],
    "age":  [28, 35, 41],
    "city": ["Oslo", "Bergen", "Trondheim"],
})
graph.add_nodes(people, node_type="Person", unique_id_field="id", node_title_field="name")

# Bulk-load relationships the same way (also: add_connections_bulk,
# add_connections_from_source for auto-filter by loaded types).
knows = pd.DataFrame({"src": ["alice", "bob"], "tgt": ["bob", "eve"]})
graph.add_connections(knows, connection_type="KNOWS",
                      source_type="Person", source_id_field="src",
                      target_type="Person", target_id_field="tgt")

# Query — returns a ResultView (lazy; data stays in Rust until accessed).
result = graph.cypher("""
    MATCH (p:Person) WHERE p.age > 30
    RETURN p.name AS name, p.city AS city
    ORDER BY p.age DESC
""")
for row in result:
    print(row['name'], row['city'])

# Or get a pandas DataFrame directly.
df = graph.cypher("MATCH (p:Person) RETURN p.name, p.age ORDER BY p.age", to_df=True)

# Persist to disk and reload.
graph.save("my_graph.kgl")
loaded = kglite.load("my_graph.kgl")

Try it instantly: ready-to-query datasets

Two bundled wrappers turn well-known public sources into queryable graphs without writing a loader. Each call handles the fetch + build + cache cycle, returns a KnowledgeGraph you can cypher() against, and respects a per-dataset cooldown so re-running just loads the cached graph in seconds. KGLite is independent of the upstream organisations — see each module docstring for non-affiliation notes.

Wikidata

Single-stream latest-truthy.nt.bz2 from dumps.wikimedia.org — parallel-decoded with a bit-level block scanner, parsed, built into a queryable graph in one call:

from kglite.datasets import wikidata

g = wikidata.open("/data/wd")                                    # full graph
g = wikidata.open("/data/wd", entity_limit_millions=100)         # 100M slice
g = wikidata.open("/data/wd", storage="memory",                  # in-memory, fast tests
                  entity_limit_millions=10)
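The bit-level block scanner is what makes parallel decoding possible: bzip2 starts every compressed block with the 48-bit magic number 0x314159265359, but blocks are packed bit by bit rather than byte by byte, so their boundaries can only be found by sliding a 48-bit window one bit at a time. Below is a minimal pure-Python sketch of that search, assuming standard bzip2 framing. find_block_offsets is a hypothetical helper, not part of the kglite API, and KGLite's own Rust scanner is not shown here.

```python
import bz2

BLOCK_MAGIC = 0x314159265359  # bzip2 block header: BCD digits of pi
MAGIC_BITS = 48


def find_block_offsets(data: bytes) -> list[int]:
    """Return every bit offset in `data` where the bzip2 block magic begins.

    Hypothetical helper for illustration; a real scanner would also verify
    each candidate, since the 48-bit pattern can occur inside compressed data.
    """
    mask = (1 << MAGIC_BITS) - 1
    window = 0      # the most recent 48 bits, newest bit in the lowest position
    bits_seen = 0
    offsets = []
    for byte in data:
        for shift in range(7, -1, -1):  # bzip2 packs bits most-significant first
            window = ((window << 1) | ((byte >> shift) & 1)) & mask
            bits_seen += 1
            if bits_seen >= MAGIC_BITS and window == BLOCK_MAGIC:
                offsets.append(bits_seen - MAGIC_BITS)
    return offsets


# A real stream: the "BZh9" header is 4 bytes, so the first block header is at bit 32.
stream = bz2.compress(b"kglite " * 1000)
print(find_block_offsets(stream)[:1])

# The same magic planted 3 bits into a buffer, i.e. not byte-aligned.
unaligned = (BLOCK_MAGIC << 5).to_bytes(7, "big")
print(find_block_offsets(unaligned))  # [3]
```

Once the block offsets are known, each block can be handed to a separate worker for decompression, which is the point of scanning first.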

Sodir (Norwegian Offshore Directorate)

Petroleum-domain graph from the public ArcGIS REST FeatureServer at factmaps.sodir.no — 33 baseline node types (Field, Wellbore, Discovery, Licence, Stratigraphy, …), ~480 k nodes, parallel-fetched and built in seconds:

from kglite.datasets import sodir

g = sodir.open("/data/sodir")  # in-memory by default; ~30s first run
g = sodir.open("/data/sodir", complement_blueprint="my_extras.json")  # extend

Two-tier cooldown: cheap row-count probes every 14 days, and a full per-dataset re-fetch every 30 days. Add a complement blueprint to extend the baseline (new node types, custom edges) without touching the canonical schema; the file is persisted into the workdir on first use and auto-loaded on later runs.
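The two-tier schedule can be read as a small decision function. The sketch below is one assumption-laden reading of the paragraph above, not the wrapper's code. The helper names are hypothetical, the thresholds are treated as inclusive, a due re-fetch is assumed to take priority over a probe, and a probe is assumed to trigger an early re-fetch only when the upstream row count has changed.

```python
from datetime import datetime, timedelta

PROBE_EVERY = timedelta(days=14)    # cheap row-count probe
REFETCH_EVERY = timedelta(days=30)  # full per-dataset re-fetch


def cooldown_action(last_probe: datetime, last_fetch: datetime, now: datetime) -> str:
    """Return 'refetch', 'probe', or 'reuse' for one cached dataset (hypothetical)."""
    if now - last_fetch >= REFETCH_EVERY:
        return "refetch"  # the hard 30-day limit wins
    if now - last_probe >= PROBE_EVERY:
        return "probe"    # ask upstream for a row count only
    return "reuse"        # load the cached graph as-is


def after_probe(cached_rows: int, upstream_rows: int) -> str:
    """Assumed follow-up to a probe: re-fetch early only if the row count moved."""
    return "refetch" if upstream_rows != cached_rows else "reuse"


now = datetime(2026, 4, 30)
print(cooldown_action(now - timedelta(days=3), now - timedelta(days=3), now))    # reuse
print(cooldown_action(now - timedelta(days=15), now - timedelta(days=20), now))  # probe
print(cooldown_action(now - timedelta(days=1), now - timedelta(days=31), now))   # refetch
```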

Use Cases

Agentic AI — memory and tool use

Give an LLM a structured memory it can query. describe() emits a compact XML schema that fits in a system prompt, and the bundled MCP server exposes the whole graph as a Cypher tool — drop-in for Claude, Cursor, or any MCP-capable agent.

xml = graph.describe()                            # schema for the agent's context
prompt = f"You have a knowledge graph:\n{xml}\nAnswer via graph.cypher()."

# Or serve the whole graph over MCP. Since 0.9.14 the server is a
# Rust-native single binary; install from a kglite source clone:
git clone https://github.com/kkollsga/kglite && cd kglite
cargo install --path crates/kglite-mcp-server
kglite-mcp-server --graph path/to/graph.kgl
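describe() builds its schema from the live graph, and its exact XML vocabulary isn't documented on this page. To illustrate why an XML summary is compact enough for a system prompt, here is a sketch using the standard library's ElementTree. schema_xml and every tag and attribute name in it are hypothetical, not KGLite's format.

```python
import xml.etree.ElementTree as ET


def schema_xml(node_types: dict[str, dict[str, str]],
               edge_types: list[tuple[str, str, str]]) -> str:
    """Render node types, their property types, and edge types as one XML string."""
    root = ET.Element("graph")
    for label, props in node_types.items():
        node = ET.SubElement(root, "node", type=label)
        for name, dtype in props.items():
            ET.SubElement(node, "prop", name=name, type=dtype)
    for source, rel, target in edge_types:
        ET.SubElement(root, "edge", type=rel, source=source, target=target)
    return ET.tostring(root, encoding="unicode")


# Schema of the Quick Start graph, written out by hand for this illustration.
xml = schema_xml(
    {"Person": {"name": "str", "age": "int", "city": "str"}},
    [("Person", "KNOWS", "Person")],
)
prompt = f"You have a knowledge graph:\n{xml}\nAnswer via graph.cypher()."
print(prompt)
```

The schema lists types and properties, not individual nodes, so its size depends on how many types the graph has rather than on its node count.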

Migrating from pip install "kglite[mcp]"? The 0.9.13 Python server was replaced by the Rust binary in 0.9.14. The YAML manifest schema is unchanged — drop your existing <basename>_mcp.yaml next to a .kgl and the new binary picks it up. Update any pinned paths (~/.claude.json, ~/.claude/settings.json) from the old conda-bin location to wherever cargo placed the new binary, typically ~/.cargo/bin/kglite-mcp-server.

Multiple Pythons on your system? Through PyO3, the binary is linked at build time against one specific Python (its libpython), so install kglite and the embedder deps into that Python, not a sub-env. To discover which one: otool -L $(which kglite-mcp-server) | grep -i python (macOS) or ldd $(which kglite-mcp-server) | grep -i python (Linux). To force a specific interpreter at install time, prefix the cargo install with PYO3_PYTHON=/abs/path/to/python. See the MCP guide for the long version.
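If you'd rather script that check, for example in a setup or CI step, the sketch below mirrors the otool/ldd plus grep -i python pipeline from Python. The helper names are hypothetical. The script only reports what the loader output says; it does not change which interpreter the binary uses. It assumes otool on macOS and ldd on Linux, the same tools as above.

```python
import shutil
import subprocess
import sys


def python_lines(loader_output: str) -> list[str]:
    """Equivalent of `grep -i python`: keep loader lines that mention Python."""
    return [line.strip() for line in loader_output.splitlines()
            if "python" in line.lower()]


def linked_python(binary: str = "kglite-mcp-server") -> list[str]:
    """Run otool -L (macOS) or ldd (Linux) on `binary` and filter for Python."""
    path = shutil.which(binary)
    if path is None:
        raise FileNotFoundError(f"{binary} is not on PATH")
    tool = ["otool", "-L"] if sys.platform == "darwin" else ["ldd"]
    result = subprocess.run([*tool, path], capture_output=True, text=True, check=True)
    return python_lines(result.stdout)


# Usage (requires the installed binary): print("\n".join(linked_python()))
```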

Drop a <basename>_mcp.yaml next to the graph to auto-extend the tool surface — source_root: for read/grep/list over your source files, inline Cypher templates as named tools, optional Python hooks behind --trust-tools. No fork required for most customisation. See the MCP guide.
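If <basename> means the graph's file name without the .kgl extension (an assumption; the MCP guide is the authority), you can derive the manifest location with pathlib. manifest_for is a hypothetical helper. It is handy for checking that a manifest sits where the server will look before you launch the server.

```python
from pathlib import Path
from typing import Union


def manifest_for(graph_path: Union[str, Path]) -> Path:
    """Return the <basename>_mcp.yaml path that sits next to a .kgl graph."""
    graph = Path(graph_path)
    return graph.with_name(f"{graph.stem}_mcp.yaml")


manifest = manifest_for("path/to/graph.kgl")
print(manifest, "exists" if manifest.exists() else "missing")
```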

Codebase analysis

Parse Python, Rust, TypeScript, Go, Java, C#, and C++ into a graph of functions, classes, calls, and imports. Trace who-calls-what, find dead code, and review structure without leaving your editor. Pairs naturally with the MCP server so an agent can reason over your repo.

from kglite.code_tree import build

graph = build(".")                                # parse current directory
graph.cypher("""
    MATCH (f:Function)-[:CALLS]->(g:Function)
    RETURN g.name, count(f) AS callers
    ORDER BY callers DESC LIMIT 10
""")

RAG retrieval

Store documents, chunks, and entities together as one graph. Combine text_score() semantic similarity with Cypher structure — hybrid retrieval in one query, no second vector DB.

graph.cypher("""
    MATCH (c:Chunk)-[:IN_DOC]->(d:Document)
    RETURN c.text, d.title,
           text_score(c.embedding, $query_vec) AS score
    ORDER BY score DESC LIMIT 5
""", params={"query_vec": query_embedding})

Data exploration and analysis

Load CSVs or DataFrames, walk relationships, run graph algorithms (shortest path, centrality, community detection), and export — all from a notebook.

graph.add_nodes(users_df, node_type="User", unique_id_field="user_id", node_title_field="name")
graph.cypher("""
    MATCH path = shortestPath((a:User {name:'Alice'})-[*]-(b:User {name:'Eve'}))
    RETURN path
""")

Structural validators — surface data-integrity gaps in one query

Built-in CALL procedures find the gaps that aren't visible from normal queries: nodes with zero edges, missing-required-edge violations, two-step cycles, duplicate titles, and more. They compose with the rest of Cypher, so their output can feed WHERE, ORDER BY, or downstream aggregation in a single pass.

# Wellbores in our sodir graph that lack a production licence
graph.cypher("""
    CALL missing_required_edge({type: 'Wellbore', edge: 'IN_LICENCE'}) YIELD node
    RETURN node.id, node.title
""")  # 502 violations on the Sodir April-2026 snapshot

# Cross-reference flagged IDs against any query result, in one Cypher pass
graph.cypher("""
    MATCH (l:Licence {title: '057'})<-[:IN_LICENCE]-(w:Wellbore)
    WITH collect(w.id) AS pl057
    CALL missing_required_edge({type: 'Wellbore', edge: 'DRILLED_BY'}) YIELD node
    WHERE node.id IN pl057
    RETURN count(*) AS pl057_missing_drilled_by
""")

missing_required_edge and missing_inbound_edge validate the (type, edge) direction against the graph's actual schema and refuse to execute when misused. See docs/guides/cypher.md for the full procedure list.
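To make the semantics concrete, here are plain-Python counterparts of three of these checks over an in-memory edge list. They are a sketch under stated assumptions, not KGLite's implementation. The function names echo the procedures but are hypothetical. Required edges are assumed to be outgoing from the checked type, as IN_LICENCE is from Wellbore above, and the 2-step cycle check ignores edge types.

```python
def orphan_nodes(nodes: list[str], edges: list[tuple[str, str, str]]) -> list[str]:
    """Nodes that appear in no edge at all, as source or target."""
    touched = {src for src, _, _ in edges} | {dst for _, _, dst in edges}
    return [n for n in nodes if n not in touched]


def missing_required_edge(nodes_by_type: dict[str, list[str]],
                          edges: list[tuple[str, str, str]],
                          node_type: str, edge_type: str) -> list[str]:
    """Nodes of node_type without at least one outgoing edge_type edge."""
    has_edge = {src for src, rel, _ in edges if rel == edge_type}
    return [n for n in nodes_by_type.get(node_type, []) if n not in has_edge]


def cycle_2step(edges: list[tuple[str, str, str]]) -> list[tuple[str, str]]:
    """Unordered node pairs linked in both directions (a -> b and b -> a)."""
    links = {(src, dst) for src, _, dst in edges if src != dst}
    return sorted({tuple(sorted(pair)) for pair in links if pair[::-1] in links})


nodes_by_type = {"Wellbore": ["w1", "w2", "w3"], "Licence": ["l1"], "Company": ["c1"]}
edges = [("w1", "IN_LICENCE", "l1"),
         ("w2", "DRILLED_BY", "c1"),
         ("c1", "OPERATES", "w2")]
all_nodes = [n for group in nodes_by_type.values() for n in group]
print(orphan_nodes(all_nodes, edges))                                         # ['w3']
print(missing_required_edge(nodes_by_type, edges, "Wellbore", "IN_LICENCE"))  # ['w2', 'w3']
print(cycle_2step(edges))                                                     # [('c1', 'w2')]
```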

Examples

The examples/ directory has runnable, self-contained artifacts covering each of the use cases above:

  • conference_graph_mcp.yaml — annotated MCP manifest. Drop next to a .kgl file and kglite-mcp-server auto-loads it: source_root: registers sandboxed file-access tools, inline Cypher templates become typed MCP tools, and a trust-gated python: hook adds custom logic. The zero-Python-fork starting point for any new project.
  • legal_graph.py — end-to-end add_nodes / add_connections from pandas DataFrames, covering laws, regulations, and court decisions with citation relationships. The imperative-API alternative when you're building the graph itself, not configuring a server on top.
  • code_graph.py — build a code knowledge graph from a source directory via code_tree.build. Produces Function, Class, Module, File nodes with CALLS, DEFINES, IMPORTS edges.
  • spatial_graph.py — declarative CSV→graph loading via a JSON blueprint; regions, facilities, and sensors with lat/lon coordinates and pipeline-path traversal queries.
  • crates/kglite-mcp-server/ — Rust-native single-binary MCP server (built on rmcp + the mcp-methods framework). Reach for it when the manifest doesn't express what you need; the binary is the reference for layering domain-specific tools on top of the generic source / GitHub / python-tool surface.

For Wikidata- and Sodir-scale builds, see the "Try it instantly: ready-to-query datasets" section above: kglite.datasets.wikidata.open(...) and kglite.datasets.sodir.open(...) cover those workflows in one call.

Benchmarks

KGLite builds and queries Wikidata-scale graphs on a laptop. Measured with bench/wiki_benchmark.py on an M-series MacBook.

Ingest — full pipeline from compressed N-Triples to a queryable graph:

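| dataset   | triples | nodes  | edges  | ingest time | throughput      | peak RAM |
|-----------|---------|--------|--------|-------------|-----------------|----------|
| wiki100m  | 100 M   | 938 K  | 748 K  | 29 s        | 3.4 M triples/s | 1.3 GB   |
| wiki500m  | 500 M   | 5.6 M  | 6.7 M  | 157 s       | 3.2 M triples/s | 5.2 GB   |
| wiki1000m | 1 B     | 14.7 M | 15.4 M | 395 s       | 2.5 M triples/s | 7.0 GB   |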

Reloading a saved 1 B-triple graph from disk (7 GB on-disk): 3.5 s.

Query latency on the 1 B-triple graph (mapped storage). Type names match the labels Wikidata ships per language — with languages=["en"] (the default), Q5 is renamed to human:

| Cypher | Shape | Wall time |
|--------|-------|-----------|
| MATCH (n)-[:P31]->(:human) RETURN count(n) | typed aggregation | 0.5 ms |
| MATCH (a)-[:P31]->(b)-[:P279]->(c) LIMIT 10 | 2-hop typed | 0.9 ms |
| MATCH (a)-[:P31]->(b {nid:'Q64'}) RETURN a LIMIT 20 | pivot | 1 ms |
| MATCH (a)-[:P31]->(:human) MATCH (a)-[:P27]->(c) LIMIT 10 | join | 44 ms |

Disk and mapped storage build times are within 1% of each other. Mapped wins on query shapes backed by its in-memory inverted index; disk wins on unbounded typed traversals by staying on sorted-CSR mmap I/O.
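Sorted CSR (compressed sparse row) is the layout behind that traversal advantage. Every node's outgoing neighbours sit in one contiguous, sorted slice of a single targets array, located through an offsets array. A traversal therefore reads sequential memory (or sequential mmap pages), and an edge lookup is a binary search. Below is a minimal in-memory sketch with integer node ids. The helper names are hypothetical, and KGLite's on-disk format is not described on this page.

```python
from bisect import bisect_left
from itertools import accumulate


def build_csr(num_nodes: int, edges: list[tuple[int, int]]) -> tuple[list[int], list[int]]:
    """Pack (src, dst) edges into CSR: offsets[i]:offsets[i+1] slices node i's targets."""
    counts = [0] * (num_nodes + 1)
    for src, _ in edges:
        counts[src + 1] += 1                 # out-degree, shifted by one slot
    offsets = list(accumulate(counts))       # prefix sums give each slice start
    targets = [0] * len(edges)
    cursor = offsets[:-1]                    # next free slot per node (slicing copies)
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    for i in range(num_nodes):               # sorted slices allow binary search
        targets[offsets[i]:offsets[i + 1]] = sorted(targets[offsets[i]:offsets[i + 1]])
    return offsets, targets


def neighbours(offsets: list[int], targets: list[int], node: int) -> list[int]:
    """All outgoing neighbours of `node`, as one contiguous slice."""
    return targets[offsets[node]:offsets[node + 1]]


def has_edge(offsets: list[int], targets: list[int], src: int, dst: int) -> bool:
    """Binary-search src's sorted slice for dst."""
    lo, hi = offsets[src], offsets[src + 1]
    i = bisect_left(targets, dst, lo, hi)
    return i < hi and targets[i] == dst


offsets, targets = build_csr(3, [(0, 2), (0, 1), (2, 0), (1, 2)])
print(offsets, targets)  # [0, 2, 3, 4] [1, 2, 2, 0]
```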

No server, no tuning, same Python process as your code.

Key Features

| Feature | Description |
|---------|-------------|
| Cypher queries | MATCH, CREATE, SET, DELETE, MERGE, UNION/INTERSECT/EXCEPT, aggregations (incl. median, percentile_cont, variance), reduce(), ORDER BY, LIMIT, SKIP |
| Semantic search | Vector embeddings + text_score() for similarity ranking |
| Text predicates | text_edit_distance, text_normalize, text_jaccard, text_ngrams, text_contains_any / text_starts_with_any for fuzzy match |
| Graph algorithms | Shortest path (BFS or Dijkstra via weight_property), centrality, community detection, clustering |
| Structural validators | 14 CALL procedures: orphan_node, missing_required_edge, cycle_2step, inverse_violation, transitivity_violation, cardinality_violation, parallel_edges, null_property, type_domain/range_violation, etc.; agent-discoverable integrity checks composable with normal Cypher |
| Spatial | Coordinates, WKT geometry, distance + containment, geometry primitives (geom_buffer, geom_convex_hull, geom_union/intersection/difference, geom_is_valid, geom_length), kg_knn k-nearest-neighbour |
| Timeseries | Time-indexed data with ts_*() Cypher functions |
| Bulk loading | Fluent API (add_nodes / add_connections) for DataFrames |
| Blueprints | Declarative CSV-to-graph loading via JSON config |
| Import/Export | Save/load snapshots, GraphML, CSV export |
| AI integration | describe() introspection, MCP server, agent prompts |
| Code analysis | Parse codebases via tree-sitter (kglite.code_tree) |
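This page doesn't give the exact definitions behind text_edit_distance and text_jaccard (normalisation, tokenisation, n-gram size). As a reference point, this sketch implements the textbook versions: Levenshtein distance, and Jaccard similarity over lower-cased character trigrams. The helper names and the trigram choice are assumptions, so scores may not match KGLite's functions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character inserts, deletes, substitutions."""
    prev = list(range(len(b) + 1))      # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def ngrams(text: str, n: int = 3) -> set[str]:
    """Lower-cased character n-grams, padded so word edges count too."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: str, b: str, n: int = 3) -> float:
    """|shared n-grams| / |all n-grams|; 1.0 when both strings are empty."""
    sa, sb = ngrams(a, n), ngrams(b, n)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


print(edit_distance("Trondheim", "Trondhjem"))  # 2
print(round(jaccard("Trondheim", "Trondhjem"), 2))
```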

Documentation

Full docs at kglite.readthedocs.io.

Requirements

Python 3.10+ (CPython) | macOS (ARM), Linux (x86_64/aarch64), Windows (x86_64) | pandas >= 1.5

License

MIT — see LICENSE for details.


Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • kglite-0.9.18-cp313-cp313-win_amd64.whl (24.4 MB): CPython 3.13, Windows x86-64
  • kglite-0.9.18-cp313-cp313-manylinux_2_39_x86_64.whl (25.4 MB): CPython 3.13, manylinux (glibc 2.39+) x86-64
  • kglite-0.9.18-cp313-cp313-macosx_11_0_arm64.whl (22.9 MB): CPython 3.13, macOS 11.0+ ARM64
  • kglite-0.9.18-cp312-cp312-win_amd64.whl (24.4 MB): CPython 3.12, Windows x86-64
  • kglite-0.9.18-cp312-cp312-manylinux_2_39_x86_64.whl (25.4 MB): CPython 3.12, manylinux (glibc 2.39+) x86-64
  • kglite-0.9.18-cp312-cp312-macosx_11_0_arm64.whl (22.9 MB): CPython 3.12, macOS 11.0+ ARM64
  • kglite-0.9.18-cp311-cp311-win_amd64.whl (24.4 MB): CPython 3.11, Windows x86-64
  • kglite-0.9.18-cp311-cp311-manylinux_2_39_x86_64.whl (25.4 MB): CPython 3.11, manylinux (glibc 2.39+) x86-64
  • kglite-0.9.18-cp311-cp311-macosx_11_0_arm64.whl (22.9 MB): CPython 3.11, macOS 11.0+ ARM64
  • kglite-0.9.18-cp310-cp310-win_amd64.whl (24.4 MB): CPython 3.10, Windows x86-64
  • kglite-0.9.18-cp310-cp310-manylinux_2_39_x86_64.whl (25.4 MB): CPython 3.10, manylinux (glibc 2.39+) x86-64
  • kglite-0.9.18-cp310-cp310-macosx_11_0_arm64.whl (22.9 MB): CPython 3.10, macOS 11.0+ ARM64

File details

Details for the file kglite-0.9.18-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: kglite-0.9.18-cp313-cp313-win_amd64.whl
  • Size: 24.4 MB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for kglite-0.9.18-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 7f00a6deab4ec73d45569b7d06ca0911c246193a30efbcf03fcac55e7d485857
MD5 7c7cbea8b1db3456b2a63b244e520156
BLAKE2b-256 6bd160d69cd0a333b20ddaddb633c25d72aa71d5dbfaa76411f1391a19c04a69

See more details on using hashes here.

File details

Details for the file kglite-0.9.18-cp313-cp313-manylinux_2_39_x86_64.whl.

File hashes

Hashes for kglite-0.9.18-cp313-cp313-manylinux_2_39_x86_64.whl
Algorithm Hash digest
SHA256 a0bfb4ad2627fe24dbb82bfd4a2d351a331867713fbd3aa99d33eae8e06028fc
MD5 ac37882dffdaf82e10e87de008099af6
BLAKE2b-256 9c5eaf2b03c8ecab51c033ca3d6e26d8b68bb1f818ad67d4cf2dc822b8573623

File details

Details for the file kglite-0.9.18-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for kglite-0.9.18-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ff0a3e46742b31930dd54ff18607862cc2f3f9663f8497410524885ca39f4e6a
MD5 81e1e26ef29f2c21f89b8438aac543a0
BLAKE2b-256 da8e6702b0935448252d2e1bf85d9c3144ca7c1aea3963abf65a4cc5e2d8d15a

File details

Details for the file kglite-0.9.18-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: kglite-0.9.18-cp312-cp312-win_amd64.whl
  • Size: 24.4 MB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for kglite-0.9.18-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 66d0800eb2de262517c0051556fa306c95e3b578fa3e7427540fbfaf9ec1ef40
MD5 d6bb8ae8c7c9d8019f5a7de699846289
BLAKE2b-256 97848a689d4910f0b7f80126ec28cd9a791a21bade8e1a2b19242354f9aa070f

File details

Details for the file kglite-0.9.18-cp312-cp312-manylinux_2_39_x86_64.whl.

File hashes

Hashes for kglite-0.9.18-cp312-cp312-manylinux_2_39_x86_64.whl
Algorithm Hash digest
SHA256 4693c3ac48fbe80f0b6c48ce703d05f7b0d0ce65807fb84d83d3cf097537d57b
MD5 b40c881a14474b2af654315e9cf7b424
BLAKE2b-256 b306415a5a4a0909f209d9fb300f7363d6d05939dce49febcbee8248d66a4424

File details

Details for the file kglite-0.9.18-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for kglite-0.9.18-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 39e8961f8ed7888122bb718fcdc5620c0f76d36f43b64ebe02ad58655c714206
MD5 a9f4a5c75bae2fbca55e8b516007c795
BLAKE2b-256 213f6986a67a6fd9c3b2652aa2d6a2e090a124e0d83790f81872aedafb7e32bf

File details

Details for the file kglite-0.9.18-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: kglite-0.9.18-cp311-cp311-win_amd64.whl
  • Size: 24.4 MB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for kglite-0.9.18-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 40799401146b574a5b66100aef6614bc7a572aac2e5a3db9789ada1dc2a7717e
MD5 2745574b58ea5d7888cedd438c5ef078
BLAKE2b-256 092256be97019ef4458e08a965dd262bc4fafc6743354a20a6434b977e1146f9

File details

Details for the file kglite-0.9.18-cp311-cp311-manylinux_2_39_x86_64.whl.

File hashes

Hashes for kglite-0.9.18-cp311-cp311-manylinux_2_39_x86_64.whl
Algorithm Hash digest
SHA256 94f73448de02e7b715794ddf0dbfac2ace7804e27e20b60695b45ce694083ee3
MD5 242100c2b2d4fd74d1d80eeca50c230b
BLAKE2b-256 2d9655c5f528cb624e529e9bdd093dc1a5e5d4968b82dce6a5a9d52c85efa5ed

File details

Details for the file kglite-0.9.18-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for kglite-0.9.18-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 454ae2b7dacd0f2f3540161faab28363519d1334bfc967b36d24d79331386128
MD5 2c6498781132973b32e4e20517855c14
BLAKE2b-256 82872215ae339b3a9b4c3813a25218372a36a4103729285c2f0c36741b95f487

File details

Details for the file kglite-0.9.18-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: kglite-0.9.18-cp310-cp310-win_amd64.whl
  • Size: 24.4 MB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for kglite-0.9.18-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 80d1fbdd553428e3dd090bb8e1220884d3769fc8cb3a6d03b50cdbad761e60fc
MD5 6700f96fdf9b0156c6f3ba6aaab6b2c6
BLAKE2b-256 d11cc0e773e91ba892a290f2baf0b3c403467036a6d73272453304761d7e9a55

File details

Details for the file kglite-0.9.18-cp310-cp310-manylinux_2_39_x86_64.whl.

File hashes

Hashes for kglite-0.9.18-cp310-cp310-manylinux_2_39_x86_64.whl
Algorithm Hash digest
SHA256 88ec8137ab44b1a727d7f6fa647802aefa3b99165a6def24990b335e50500c0e
MD5 c749f5f85193be7ad4ed7eddf4f02ce4
BLAKE2b-256 f6f47349ebe9ac047c5ec08e2c2b782050c432f940262799b15426721868ec8b

File details

Details for the file kglite-0.9.18-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for kglite-0.9.18-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 da395899f9e05985fdcc79726b3c6a6992ce025674ca6473560d9cf594336ad6
MD5 1508dfc3294856a68f24bbe560a1e656
BLAKE2b-256 3753ea6a505e4ffae803af50fb27980b1281269bf15720afeaa75cb93a458394
