A high-performance graph database library written in Rust, with Python bindings


KGLite — Lightweight Knowledge Graph for Python

KGLite is an embedded knowledge graph for Python: pip install, no server, no setup. It speaks Cypher, loads pandas DataFrames, and ships with the connective tissue for AI agents — an MCP server so Claude / Cursor / any MCP-capable LLM can query your graph as a tool, a describe() method that emits a compact XML schema for system prompts, and a code_tree parser that turns any source directory into a graph of functions, classes, calls, and imports across 9 languages.

Three storage modes scale from in-memory (millisecond queries on small graphs) to mmap-backed on disk (1 B+ edges, Wikidata-scale). Bundled dataset wrappers turn pip install kglite into a queryable Wikidata or petroleum-domain graph in one line.

Why KGLite?

Built for LLM agents — describe() XML schema, bundled MCP server, an agent-oriented query surface (cypher(), graph.select(...).traverse(...)), and structural validators (CALL orphan_node({type: ...}) YIELD node) for data-integrity checks that compose with the rest of Cypher.
One-line public datasets — wikidata.open(path) and sodir.open(path) handle fetch, parallel build, and caching; re-runs reload the cached graph instantly.
Codebase → graph in one line — kglite.code_tree.build(".") parses Python, Rust, TypeScript, Go, Java, C#, C++, and more into Function / Class / Module nodes with CALLS / DEFINES / IMPORTS edges.
Scales without leaving Python — in-memory for prototyping, mmap-backed for notebook-scale, disk-mode CSR for graphs too large for RAM. Same API across modes.
Query with Cypher — MATCH, MERGE, OPTIONAL MATCH, aggregations, parameters, semantic search via text_score().
DataFrames in, DataFrames out — bulk-load from pandas, query results as DataFrames.

Quick Start

pip install kglite

import pandas as pd
import kglite

# Three storage modes — pick by graph size:
#   default (in-memory)   — small/medium graphs, fastest queries
#   storage="mapped"      — mmap columns, RAM-friendly as you grow
#   storage="disk", path=…  — 100M+ nodes, Wikidata-scale, loaded lazily
graph = kglite.KnowledgeGraph()

# Bulk-load nodes from a DataFrame (also: add_nodes_bulk, from_blueprint,
# load_ntriples, or Cypher CREATE for ad-hoc inserts).
people = pd.DataFrame({
    "id":   ["alice", "bob", "eve"],
    "name": ["Alice", "Bob", "Eve"],
    "age":  [28, 35, 41],
    "city": ["Oslo", "Bergen", "Trondheim"],
})
graph.add_nodes(people, node_type="Person", unique_id_field="id", node_title_field="name")

# Bulk-load relationships the same way (also: add_connections_bulk,
# add_connections_from_source for auto-filter by loaded types).
knows = pd.DataFrame({"src": ["alice", "bob"], "tgt": ["bob", "eve"]})
graph.add_connections(knows, connection_type="KNOWS",
                      source_type="Person", source_id_field="src",
                      target_type="Person", target_id_field="tgt")

# Query — returns a ResultView (lazy; data stays in Rust until accessed).
result = graph.cypher("""
    MATCH (p:Person) WHERE p.age > 30
    RETURN p.name AS name, p.city AS city
    ORDER BY p.age DESC
""")
for row in result:
    print(row['name'], row['city'])

# Or get a pandas DataFrame directly.
df = graph.cypher("MATCH (p:Person) RETURN p.name, p.age ORDER BY p.age", to_df=True)

# Persist to disk and reload.
graph.save("my_graph.kgl")
loaded = kglite.load("my_graph.kgl")
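
To make the semantics of the MATCH query above concrete, here is a plain-Python sketch of the same filter, projection, and ordering over the three Person rows. It makes no KGLite calls; `people_rows` and `older_than` are illustrative names, not part of the library.

```python
# Plain-Python equivalent of:
#   MATCH (p:Person) WHERE p.age > 30
#   RETURN p.name AS name, p.city AS city ORDER BY p.age DESC
people_rows = [
    {"id": "alice", "name": "Alice", "age": 28, "city": "Oslo"},
    {"id": "bob", "name": "Bob", "age": 35, "city": "Bergen"},
    {"id": "eve", "name": "Eve", "age": 41, "city": "Trondheim"},
]


def older_than(rows, min_age):
    """Filter by age, sort oldest first, and project name and city."""
    matched = [r for r in rows if r["age"] > min_age]
    matched.sort(key=lambda r: r["age"], reverse=True)
    return [{"name": r["name"], "city": r["city"]} for r in matched]


result_rows = older_than(people_rows, 30)
for row in result_rows:
    print(row["name"], row["city"])
# prints:
#   Eve Trondheim
#   Bob Bergen
```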

Try it instantly: ready-to-query datasets

Two bundled wrappers turn well-known public sources into queryable graphs without writing a loader. Each call handles the fetch + build + cache cycle, returns a KnowledgeGraph you can cypher() against, and respects a per-dataset cooldown so re-running just loads the cached graph in seconds. KGLite is independent of the upstream organisations — see each module docstring for non-affiliation notes.

Wikidata

Single-stream latest-truthy.nt.bz2 from dumps.wikimedia.org — parallel-decoded with a bit-level block scanner, parsed, built into a queryable graph in one call:

from kglite.datasets import wikidata

g = wikidata.open("/data/wd")                                    # full graph
g = wikidata.open("/data/wd", entity_limit_millions=100)         # 100M slice
g = wikidata.open("/data/wd", storage="memory",                  # in-memory, fast tests
                  entity_limit_millions=10)
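
For readers unfamiliar with the input format, the following is a minimal, single-threaded sketch of reading bz2-compressed N-Triples with the Python standard library. It is not KGLite's parallel block scanner; the regular expression handles only IRI-to-IRI triples, and `iter_entity_triples` is a hypothetical helper name.

```python
import bz2
import io
import re

# Matches "<subject> <predicate> <object> ." where all three terms are IRIs.
# Literal objects (strings, dates, numbers) are skipped in this simplified sketch.
TRIPLE_RE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")


def iter_entity_triples(stream):
    """Yield (subject, predicate, object) local names for IRI-only triples."""
    for raw in stream:
        line = raw.decode("utf-8").strip()
        match = TRIPLE_RE.match(line)
        if match is None:
            continue
        # Keep only the last path segment, e.g. ".../entity/Q42" -> "Q42".
        yield tuple(iri.rsplit("/", 1)[-1] for iri in match.groups())


# Two sample lines: one entity-to-entity triple and one label literal.
sample = (
    b"<http://www.wikidata.org/entity/Q42> "
    b"<http://www.wikidata.org/prop/direct/P31> "
    b"<http://www.wikidata.org/entity/Q5> .\n"
    b"<http://www.wikidata.org/entity/Q42> "
    b'<http://www.w3.org/2000/01/rdf-schema#label> "Douglas Adams"@en .\n'
)
compressed = bz2.compress(sample)

with bz2.open(io.BytesIO(compressed), "rb") as fh:
    triples = list(iter_entity_triples(fh))

print(triples)  # [('Q42', 'P31', 'Q5')]
```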

Sodir (Norwegian Offshore Directorate)

Petroleum-domain graph from the public ArcGIS REST FeatureServer at factmaps.sodir.no — 33 baseline node types (Field, Wellbore, Discovery, Licence, Stratigraphy, …), ~480 k nodes, parallel-fetched and built in seconds:

from kglite.datasets import sodir

g = sodir.open("/data/sodir")  # in-memory by default; ~30s first run
g = sodir.open("/data/sodir", complement_blueprint="my_extras.json")  # extend

Two-tier cooldown — cheap row-count probes every 14 days; full per-dataset re-fetch every 30 days. Add a complement blueprint to extend the baseline (new node types, custom edges) without touching the canonical schema; the file is persisted into the workdir on first use and auto-loaded after.
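
The two-tier cooldown can be pictured as a small decision function. This is a sketch under stated assumptions: the 14- and 30-day intervals come from the text above, while the function name, the returned labels, and the rule that a due re-fetch takes precedence over a probe are illustrative guesses rather than KGLite's actual logic.

```python
from datetime import datetime, timedelta

PROBE_INTERVAL = timedelta(days=14)    # cheap row-count probe
REFETCH_INTERVAL = timedelta(days=30)  # full per-dataset re-fetch


def next_action(now, last_probe, last_fetch):
    """Return 'refetch', 'probe', or 'use_cache' for one dataset."""
    if now - last_fetch >= REFETCH_INTERVAL:
        return "refetch"
    if now - last_probe >= PROBE_INTERVAL:
        return "probe"
    return "use_cache"


now = datetime(2026, 5, 4)
print(next_action(now, now - timedelta(days=3), now - timedelta(days=3)))    # use_cache
print(next_action(now, now - timedelta(days=15), now - timedelta(days=15)))  # probe
print(next_action(now, now - timedelta(days=2), now - timedelta(days=31)))   # refetch
```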

Use Cases

Agentic AI — memory and tool use

Give an LLM a structured memory it can query. describe() emits a compact XML schema that fits in a system prompt, and the bundled MCP server exposes the whole graph as a Cypher tool — drop-in for Claude, Cursor, or any MCP-capable agent.

xml = graph.describe()                            # schema for the agent's context
prompt = f"You have a knowledge graph:\n{xml}\nAnswer via graph.cypher()."
# Or: python examples/mcp_server.py path/to/graph.kgl
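
To show the idea of a compact schema in a system prompt without depending on the library, here is a standard-library sketch. The XML element and attribute names are invented for illustration and are not the actual output format of describe(); `sketch_schema_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def sketch_schema_xml(node_types, edge_types):
    """Build a compact XML summary of node and edge types (invented format)."""
    root = ET.Element("graph")
    for name, props in node_types.items():
        ET.SubElement(root, "node", {"type": name, "props": ",".join(props)})
    for name, (src, dst) in edge_types.items():
        ET.SubElement(root, "edge", {"type": name, "from": src, "to": dst})
    return ET.tostring(root, encoding="unicode")


schema_xml = sketch_schema_xml(
    node_types={"Person": ["name", "age", "city"]},
    edge_types={"KNOWS": ("Person", "Person")},
)
system_prompt = (
    "You have a knowledge graph with this schema:\n"
    f"{schema_xml}\n"
    "Answer questions by writing Cypher queries against it."
)
print(system_prompt)
```

Keeping the schema to one element per type is what lets it fit in a prompt even when the graph itself is large.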

Codebase analysis

Parse Python, Rust, TypeScript, Go, Java, C#, C++, and more into a graph of functions, classes, calls, and imports. Trace who-calls-what, find dead code, and review structure without leaving your editor. Pairs naturally with the MCP server so an agent can reason over your repo.

from kglite.code_tree import build

graph = build(".")                                # parse current directory
graph.cypher("""
    MATCH (f:Function)-[:CALLS]->(g:Function)
    RETURN g.name, count(f) AS callers
    ORDER BY callers DESC LIMIT 10
""")
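
The shape of a CALLS graph can be illustrated with Python's built-in `ast` module. Note the swap: the feature table says kglite.code_tree parses via tree-sitter, whereas this sketch uses `ast`, handles Python only, and resolves only direct calls between top-level functions. `call_edges` and `SOURCE` are illustrative names.

```python
import ast
from collections import Counter

SOURCE = '''
def load(path):
    return parse(read(path))

def read(path):
    return path

def parse(text):
    return text.strip()

def main():
    load("a")
    read("b")
'''


def call_edges(source):
    """Return (caller, callee) pairs for calls between top-level functions."""
    tree = ast.parse(source)
    defined = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
    edges = set()
    for func in tree.body:
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            # Only plain-name calls such as read(...); method calls are ignored.
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id in defined):
                edges.add((func.name, node.func.id))
    return edges


edges = call_edges(SOURCE)
# Same aggregation as the Cypher query: count distinct callers per callee.
callers = Counter(callee for _, callee in edges)
for name, count in sorted(callers.items(), key=lambda kv: (-kv[1], kv[0])):
    print(name, count)
# prints:
#   read 2
#   load 1
#   parse 1
```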

RAG retrieval

Store documents, chunks, and entities together as one graph. Combine text_score() semantic similarity with Cypher structure — hybrid retrieval in one query, no second vector DB.

graph.cypher("""
    MATCH (c:Chunk)-[:IN_DOC]->(d:Document)
    RETURN c.text, d.title,
           text_score(c.embedding, $query_vec) AS score
    ORDER BY score DESC LIMIT 5
""", params={"query_vec": query_embedding})
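
The ranking step of such a query can be sketched in plain Python. This assumes cosine similarity as the scoring function, which is a common choice but is not confirmed by this page as what text_score() computes; the three-dimensional vectors and chunk texts are made-up toy data.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy chunks with hand-written embeddings (illustrative values only).
chunks = [
    {"text": "Oslo is the capital of Norway", "doc": "Geography", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Rust has no garbage collector", "doc": "Programming", "embedding": [0.0, 0.2, 0.9]},
    {"text": "Bergen lies on the west coast", "doc": "Geography", "embedding": [0.8, 0.3, 0.1]},
]
query_vec = [1.0, 0.0, 0.0]

# ORDER BY score DESC LIMIT 2
ranked = sorted(chunks, key=lambda c: cosine(c["embedding"], query_vec), reverse=True)
top = [(c["text"], c["doc"]) for c in ranked[:2]]
for text, doc in top:
    print(doc, "|", text)
```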

Data exploration and analysis

Load CSVs or DataFrames, walk relationships, run graph algorithms (shortest path, centrality, community detection), and export — all from a notebook.

graph.add_nodes(users_df, node_type="User", unique_id_field="user_id", node_title_field="name")
graph.cypher("""
    MATCH path = shortestPath((a:User {name:'Alice'})-[*]-(b:User {name:'Eve'}))
    RETURN path
""")
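
For intuition about what an unweighted shortestPath computes, here is a breadth-first search over a plain adjacency dict. It is a textbook sketch, not KGLite's implementation; the adjacency data mirrors the KNOWS edges from the Quick Start, treated as undirected.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the node list of one shortest path or None."""
    if start == goal:
        return [start]
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor in parents:
                continue  # already reached by an equally short or shorter route
            parents[neighbor] = node
            if neighbor == goal:
                # Walk parent links back to the start, then reverse.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Undirected KNOWS edges, stored in both directions.
adjacency = {"Alice": ["Bob"], "Bob": ["Alice", "Eve"], "Eve": ["Bob"]}
print(shortest_path(adjacency, "Alice", "Eve"))  # ['Alice', 'Bob', 'Eve']
```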

Structural validators — surface data-integrity gaps in one query

Built-in CALL procedures find the gaps that aren't visible from normal queries: nodes with zero edges, missing-required-edge violations, two-step cycles, duplicate titles, and more (the Key Features table lists 14). They compose with the rest of Cypher: feed the output into WHERE, ORDER BY, or downstream aggregation in a single pass.

# Wellbores in our sodir graph that lack a production licence
graph.cypher("""
    CALL missing_required_edge({type: 'Wellbore', edge: 'IN_LICENCE'}) YIELD node
    RETURN node.id, node.title
""")  # 502 violations on the Sodir April-2026 snapshot

# Cross-reference flagged IDs against any query result, in one Cypher pass
graph.cypher("""
    MATCH (l:Licence {title: '057'})<-[:IN_LICENCE]-(w:Wellbore)
    WITH collect(w.id) AS pl057
    CALL missing_required_edge({type: 'Wellbore', edge: 'DRILLED_BY'}) YIELD node
    WHERE node.id IN pl057
    RETURN count(*) AS pl057_missing_drilled_by
""")

missing_required_edge and missing_inbound_edge validate the (type, edge) direction against the graph's actual schema and refuse to execute when misused. See docs/guides/cypher.md for the full procedure list.
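
What two of these checks look for can be sketched over plain lists of nodes and edges. The helper functions below borrow the procedure names for readability but are ordinary Python, not the library's implementation; treating the required edge as outgoing from the node is an assumption, and the SIDETRACK_OF edge type and all IDs are invented sample data.

```python
def orphan_nodes(nodes, edges, node_type):
    """Nodes of node_type that appear in no edge at all."""
    touched = {e["src"] for e in edges} | {e["dst"] for e in edges}
    return [n["id"] for n in nodes if n["type"] == node_type and n["id"] not in touched]


def missing_required_edge(nodes, edges, node_type, edge_type):
    """Nodes of node_type with no outgoing edge of edge_type."""
    have = {e["src"] for e in edges if e["type"] == edge_type}
    return [n["id"] for n in nodes if n["type"] == node_type and n["id"] not in have]


nodes = [
    {"id": "w1", "type": "Wellbore"},
    {"id": "w2", "type": "Wellbore"},
    {"id": "w3", "type": "Wellbore"},
    {"id": "l1", "type": "Licence"},
]
edges = [
    {"src": "w1", "dst": "l1", "type": "IN_LICENCE"},
    {"src": "w2", "dst": "w1", "type": "SIDETRACK_OF"},  # hypothetical edge type
]

print(orphan_nodes(nodes, edges, "Wellbore"))                         # ['w3']
print(missing_required_edge(nodes, edges, "Wellbore", "IN_LICENCE"))  # ['w2', 'w3']
```

The second result is a superset of the first here: an orphan node necessarily lacks every required edge, while `w2` has an edge but not the required one.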

Examples

The examples/ directory has runnable, self-contained scripts covering each of the use cases above:

code_graph.py — build a code knowledge graph from a source directory via code_tree.build. Produces Function, Class, Module, File nodes with CALLS, DEFINES, IMPORTS edges.
legal_graph.py — end-to-end add_nodes / add_connections from pandas DataFrames, covering laws, regulations, and court decisions with citation relationships. Good template for adapting to your own domain.
mcp_server.py — drop-in MCP server that exposes any .kgl file to an LLM (Claude, Cursor, …) as a Cypher query tool, with schema disclosure and code-graph–aware helpers.
spatial_graph.py — declarative CSV→graph loading via a JSON blueprint; regions, facilities, and sensors with lat/lon coordinates and pipeline-path traversal queries.

For Wikidata- and Sodir-scale builds, see the "Try it instantly: ready-to-query datasets" section above: kglite.datasets.wikidata.open(...) and kglite.datasets.sodir.open(...) cover those workflows in one call.

Benchmarks

KGLite builds and queries Wikidata-scale graphs on a laptop. Measured with bench/wiki_benchmark.py on an M-series MacBook.

Ingest — full pipeline from compressed N-Triples to a queryable graph:

dataset	triples	nodes	edges	ingest	throughput	peak RAM
wiki100m	100 M	938 K	748 K	29 s	3.4 M triples/s	1.3 GB
wiki500m	500 M	5.6 M	6.7 M	157 s	3.2 M triples/s	5.2 GB
wiki1000m	1 B	14.7 M	15.4 M	395 s	2.5 M triples/s	7.0 GB

Reloading a saved 1 B-triple graph from disk (7 GB on-disk): 3.5 s.
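
The throughput column follows directly from the triples and ingest columns. As a quick arithmetic check using only the figures in the table above:

```python
# (triples, ingest seconds) per dataset, copied from the ingest table.
runs = {
    "wiki100m": (100_000_000, 29),
    "wiki500m": (500_000_000, 157),
    "wiki1000m": (1_000_000_000, 395),
}

# Throughput in millions of triples per second, rounded to one decimal.
throughput = {
    name: round(triples / seconds / 1e6, 1)
    for name, (triples, seconds) in runs.items()
}
print(throughput)  # {'wiki100m': 3.4, 'wiki500m': 3.2, 'wiki1000m': 2.5}
```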

Query latency on the 1 B-triple graph (mapped storage). Type names match the labels Wikidata ships per language — with languages=["en"] (the default), Q5 is renamed to human:

Cypher	wall
`MATCH (n)-[:P31]->(:human) RETURN count(n)` — typed aggregation	0.5 ms
`MATCH (a)-[:P31]->(b)-[:P279]->(c) LIMIT 10` — 2-hop typed	0.9 ms
`MATCH (a)-[:P31]->(b {nid:'Q64'}) RETURN a LIMIT 20` — pivot	1 ms
`MATCH (a)-[:P31]->(:human)` `MATCH (a)-[:P27]->(c) LIMIT 10` — join	44 ms

Disk and mapped storage track within 1 % on build; mapped wins on query shapes backed by its in-memory inverted index, disk wins on unbounded typed traversals by staying on sorted-CSR mmap I/O.
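
For readers unfamiliar with CSR (compressed sparse row), the sketch below shows the general idea: all neighbour lists are packed into one array, with an offsets array marking where each node's slice begins. This is the textbook layout only; KGLite's actual on-disk format is not described on this page, and `build_csr` and `neighbors` are illustrative names.

```python
def build_csr(num_nodes, edges):
    """Build (offsets, targets) from (src, dst) pairs; targets sorted per source."""
    offsets = [0] * (num_nodes + 1)
    for src, _ in edges:
        offsets[src + 1] += 1          # count out-degree of each source
    for i in range(num_nodes):
        offsets[i + 1] += offsets[i]   # prefix sum turns counts into slice starts
    targets = [dst for _, dst in sorted(edges)]
    return offsets, targets


def neighbors(offsets, targets, node):
    """Out-neighbours of node as one contiguous slice."""
    return targets[offsets[node]:offsets[node + 1]]


edges = [(0, 2), (0, 1), (2, 0), (1, 2)]
offsets, targets = build_csr(3, edges)
print(offsets)                         # [0, 2, 3, 4]
print(targets)                         # [1, 2, 2, 0]
print(neighbors(offsets, targets, 0))  # [1, 2]
```

Because each node's neighbours are contiguous, a traversal reads sequential ranges of the array, which is the access pattern that suits memory-mapped files.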

No server, no tuning, same Python process as your code.

Key Features

Feature	Description
Cypher queries	MATCH, CREATE, SET, DELETE, MERGE, UNION/INTERSECT/EXCEPT, aggregations (incl. `median`, `percentile_cont`, `variance`), `reduce()`, ORDER BY, LIMIT, SKIP
Semantic search	Vector embeddings + `text_score()` for similarity ranking
Text predicates	`text_edit_distance`, `text_normalize`, `text_jaccard`, `text_ngrams`, `text_contains_any` / `text_starts_with_any` for fuzzy match
Graph algorithms	Shortest path (BFS or Dijkstra via `weight_property`), centrality, community detection, clustering
Structural validators	14 `CALL` procedures: `orphan_node`, `missing_required_edge`, `cycle_2step`, `inverse_violation`, `transitivity_violation`, `cardinality_violation`, `parallel_edges`, `null_property`, `type_domain/range_violation`, etc. — agent-discoverable integrity checks composable with normal Cypher
Spatial	Coordinates, WKT geometry, distance + containment, geometry primitives (`geom_buffer`, `geom_convex_hull`, `geom_union/intersection/difference`, `geom_is_valid`, `geom_length`), `kg_knn` k-nearest-neighbour
Timeseries	Time-indexed data with `ts_*()` Cypher functions
Bulk loading	Fluent API (`add_nodes` / `add_connections`) for DataFrames
Blueprints	Declarative CSV-to-graph loading via JSON config
Import/Export	Save/load snapshots, GraphML, CSV export
AI integration	`describe()` introspection, MCP server, agent prompts
Code analysis	Parse codebases via tree-sitter (`kglite.code_tree`)

Documentation

Full docs at kglite.readthedocs.io:

Getting Started — installation, first graph, core concepts
Cypher Guide — queries, mutations, parameters
Semantic Search — embeddings, vector search
AI Agents — MCP server, describe(), agent prompts
API Reference — full auto-generated reference

Requirements

Python 3.10+ (CPython) | macOS (ARM), Linux (x86_64/aarch64), Windows (x86_64) | pandas >= 1.5

License

MIT — see LICENSE for details.


Release history

This version

0.9.7

May 4, 2026

