HNSW vector search + graph traversal + GGUF embeddings + Node2Vec for SQLite
Project description
sqlite-muninn
Odin's mythic raven of Memory.
Huginn and Muninn fly each day over the wide world.
I fear for Huginn that he may not return,
yet I worry more for Muninn.
- Poetic Edda (Grimnismal, stanza 20)
Odin fears losing Memory more than Thought.
This project aims to build agentic memory and knowledge graph primitives for sqlite as a native C extension. It is an advanced collection of knowledge graph primitives like Vector Similarity Search, HNSW Indexes, Graph database, Community Detection, Node2Vec capabilities and loading GGUF models via llama.cpp integration.
Features
- HNSW Vector Index — O(log N) approximate nearest neighbor search with incremental insert/delete
- Graph Traversal — BFS, DFS, shortest path, connected components, PageRank on any edge table, dbt syntax graph node selection.
llama.cppnative models; Load and use GGUF LLM models natively in sqlite.- Centrality Measures — Degree, betweenness (Brandes), and closeness centrality with weighted/temporal support
- Community Detection — Leiden algorithm for discovering graph communities with modularity scoring
- Node2Vec — Learn structural node embeddings from graph topology, store in HNSW for similarity search
- Zero dependencies — compiles to a single
.dylib/.so/.dll - SIMD accelerated — ARM NEON and x86 SSE distance functions
Build
Requires SQLite development headers and a C11 compiler.
# macOS (Homebrew SQLite recommended)
brew install sqlite
make all
# Linux
sudo apt-get install libsqlite3-dev
make all
# Run tests
make test # C unit tests
make test-python # Python integration tests
make test-all # Both
Quick Start
.load ./muninn
-- Create an HNSW vector index
CREATE VIRTUAL TABLE my_vectors USING hnsw_index(
dimensions=384, metric='cosine', m=16, ef_construction=200
);
-- Insert vectors
INSERT INTO my_vectors (rowid, vector) VALUES (1, ?); -- 384-dim float32 blob
-- KNN search
SELECT rowid, distance FROM my_vectors
WHERE vector MATCH ?query AND k = 10 AND ef_search = 64;
-- Graph traversal on any edge table
SELECT node, depth FROM graph_bfs
WHERE edge_table = 'friendships' AND src_col = 'user_a'
AND dst_col = 'user_b' AND start_node = 'alice' AND max_depth = 3
AND direction = 'both';
-- Connected components
SELECT node, component_id, component_size FROM graph_components
WHERE edge_table = 'friendships' AND src_col = 'user_a' AND dst_col = 'user_b';
-- PageRank
SELECT node, rank FROM graph_pagerank
WHERE edge_table = 'citations' AND src_col = 'citing' AND dst_col = 'cited'
AND damping = 0.85 AND iterations = 20;
-- Betweenness centrality (find bridge nodes)
SELECT node, centrality FROM graph_betweenness
WHERE edge_table = 'friendships' AND src_col = 'user_a' AND dst_col = 'user_b'
AND direction = 'both'
ORDER BY centrality DESC LIMIT 10;
-- Community detection (Leiden algorithm)
SELECT node, community_id, modularity FROM graph_leiden
WHERE edge_table = 'friendships' AND src_col = 'user_a' AND dst_col = 'user_b';
-- Learn structural embeddings from graph topology
SELECT node2vec_train(
'friendships', 'user_a', 'user_b', 'my_vectors',
64, 1.0, 1.0, 10, 80, 5, 5, 0.025, 5
);
Examples
Self-contained examples in the examples/ directory:
| Example | Demonstrates |
|---|---|
| Semantic Search | HNSW index, KNN queries, point lookup, delete |
| Movie Recommendations | Vector similarity for content-based recommendations |
| Social Network | Graph TVFs on a social graph (BFS, components, PageRank) |
| Research Papers | Citation graph analysis with Node2Vec embeddings |
| Transit Routes | Shortest path and graph traversal on route networks |
make all
python examples/semantic_search/semantic_search.py
API Reference
HNSW Virtual Table (hnsw_index)
CREATE VIRTUAL TABLE name USING hnsw_index(
dimensions=N, -- vector dimensionality (required)
metric='l2', -- 'l2' | 'cosine' | 'inner_product'
m=16, -- max connections per node per layer
ef_construction=200 -- beam width during index construction
);
Columns:
| Column | Type | Hidden | Description |
|---|---|---|---|
rowid |
INTEGER | Yes | User-assigned ID for joining with application tables |
vector |
BLOB | No | float32[dim] — input for INSERT, MATCH constraint for search |
distance |
REAL | No | Computed distance (output only, during search) |
k |
INTEGER | Yes | Top-k parameter (search constraint) |
ef_search |
INTEGER | Yes | Search beam width (search constraint) |
Operations:
-- Insert
INSERT INTO t (rowid, vector) VALUES (42, ?blob);
-- KNN search
SELECT rowid, distance FROM t WHERE vector MATCH ?query AND k = 10;
-- Point lookup
SELECT vector FROM t WHERE rowid = 42;
-- Delete (with automatic neighbor reconnection)
DELETE FROM t WHERE rowid = 42;
-- Drop (removes index and all shadow tables)
DROP TABLE t;
Shadow tables (auto-managed):
{name}_config— HNSW parameters{name}_nodes— stored vectors and level assignments{name}_edges— the proximity graph (usable by graph TVFs)
Graph Table-Valued Functions
All graph TVFs work on any existing SQLite table with source/target columns. Table and column names are validated against SQL injection.
graph_bfs / graph_dfs
Breadth-first or depth-first traversal from a start node.
SELECT node, depth, parent FROM graph_bfs
WHERE edge_table = 'edges'
AND src_col = 'src'
AND dst_col = 'dst'
AND start_node = 'node-42'
AND max_depth = 5
AND direction = 'forward'; -- 'forward' | 'reverse' | 'both'
| Output Column | Type | Description |
|---|---|---|
node |
TEXT | Node identifier |
depth |
INTEGER | Hop distance from start |
parent |
TEXT | Parent node in traversal tree (NULL for start) |
graph_shortest_path
Unweighted (BFS) or weighted (Dijkstra) shortest path.
-- Unweighted
SELECT node, distance, path_order FROM graph_shortest_path
WHERE edge_table = 'edges' AND src_col = 'src' AND dst_col = 'dst'
AND start_node = 'A' AND end_node = 'Z' AND weight_col IS NULL;
-- Weighted (Dijkstra)
SELECT node, distance, path_order FROM graph_shortest_path
WHERE edge_table = 'edges' AND src_col = 'src' AND dst_col = 'dst'
AND start_node = 'A' AND end_node = 'Z' AND weight_col = 'weight';
| Output Column | Type | Description |
|---|---|---|
node |
TEXT | Node on the path |
distance |
REAL | Cumulative distance from start |
path_order |
INTEGER | Position in path (0-indexed) |
graph_components
Connected components via Union-Find with path compression.
SELECT node, component_id, component_size FROM graph_components
WHERE edge_table = 'edges' AND src_col = 'src' AND dst_col = 'dst';
| Output Column | Type | Description |
|---|---|---|
node |
TEXT | Node identifier |
component_id |
INTEGER | Component index (0-based) |
component_size |
INTEGER | Number of nodes in this component |
graph_pagerank
Iterative power method PageRank with configurable damping and iterations.
SELECT node, rank FROM graph_pagerank
WHERE edge_table = 'edges' AND src_col = 'src' AND dst_col = 'dst'
AND damping = 0.85 -- optional, default 0.85
AND iterations = 20; -- optional, default 20
| Output Column | Type | Description |
|---|---|---|
node |
TEXT | Node identifier |
rank |
REAL | PageRank score (sums to ~1.0) |
graph_degree
Degree centrality for all nodes.
SELECT node, in_degree, out_degree, degree, centrality FROM graph_degree
WHERE edge_table = 'edges' AND src_col = 'src' AND dst_col = 'dst';
| Output Column | Type | Description |
|---|---|---|
node |
TEXT | Node identifier |
in_degree |
REAL | Count (or weighted sum) of incoming edges |
out_degree |
REAL | Count (or weighted sum) of outgoing edges |
degree |
REAL | Total degree (in + out) |
centrality |
REAL | Normalized degree centrality |
Optional constraints: weight_col, direction, normalized, timestamp_col, time_start, time_end.
graph_betweenness
Betweenness centrality via Brandes' O(VE) algorithm.
SELECT node, centrality FROM graph_betweenness
WHERE edge_table = 'edges' AND src_col = 'src' AND dst_col = 'dst'
AND direction = 'both';
| Output Column | Type | Description |
|---|---|---|
node |
TEXT | Node identifier |
centrality |
REAL | Betweenness centrality score |
Optional constraints: weight_col, direction, normalized, timestamp_col, time_start, time_end.
graph_closeness
Closeness centrality with Wasserman-Faust normalization for disconnected graphs.
SELECT node, centrality FROM graph_closeness
WHERE edge_table = 'edges' AND src_col = 'src' AND dst_col = 'dst'
AND direction = 'both';
| Output Column | Type | Description |
|---|---|---|
node |
TEXT | Node identifier |
centrality |
REAL | Closeness centrality score |
Optional constraints: weight_col, direction, timestamp_col, time_start, time_end.
graph_leiden
Community detection via the Leiden algorithm (Traag et al., 2019).
SELECT node, community_id, modularity FROM graph_leiden
WHERE edge_table = 'edges' AND src_col = 'src' AND dst_col = 'dst';
| Output Column | Type | Description |
|---|---|---|
node |
TEXT | Node identifier |
community_id |
INTEGER | Community assignment (0-based) |
modularity |
REAL | Global modularity score of the partition |
Optional constraints: weight_col, resolution (default 1.0), timestamp_col, time_start, time_end.
node2vec_train()
Learn vector embeddings from graph structure using biased random walks (Node2Vec) and Skip-gram with Negative Sampling (SGNS).
SELECT node2vec_train(
edge_table, -- name of edge table
src_col, -- source column name
dst_col, -- destination column name
output_table, -- HNSW table to store embeddings (must exist)
dimensions, -- embedding size (must match HNSW table)
p, -- return parameter (1.0 = uniform/DeepWalk)
q, -- in-out parameter (1.0 = uniform/DeepWalk)
num_walks, -- walks per node
walk_length, -- max steps per walk
window_size, -- SGNS context window
negative_samples, -- negative samples per positive
learning_rate, -- initial learning rate (decays linearly)
epochs -- training epochs
);
-- Returns: number of nodes embedded
p, q parameter guide:
| Setting | Walk Behavior | Best For |
|---|---|---|
| p=1, q=1 | Uniform (DeepWalk) | General structural similarity |
| Low p (0.25) | BFS-like, stays local | Community/cluster detection |
| Low q (0.5) | DFS-like, explores far | Structural role similarity |
Benchmarks
The project includes a comprehensive benchmark suite comparing muninn against other SQLite extensions across real-world workloads.
Vector search benchmarks compare against sqlite-vector, sqlite-vec, and vectorlite using 3 embedding models (MiniLM, MPNet, BGE-Large) and 2 text datasets (AG News, Wealth of Nations) at scales up to 250K vectors.
Graph traversal benchmarks compare muninn TVFs against recursive CTEs and GraphQLite on synthetic graphs (Erdos-Renyi, Barabasi-Albert) at scales up to 100K nodes.
Results include interactive Plotly charts for insert throughput, search latency, recall, database size, and tipping-point analysis. See the full benchmark results on the documentation site.
make -C benchmarks help # List all benchmark targets
make -C benchmarks analyze # Generate charts and reports from existing results
Project Structure
src/ C11 source (extension entry point, HNSW, graph TVFs, Node2Vec)
test/ C unit tests (custom minimal framework)
pytests/ Python integration tests (pytest)
examples/ Self-contained usage examples
benchmarks/
scripts/ Benchmark runners and analysis scripts
charts/ Plotly JSON chart specs (committed for docs site)
results/ JSONL benchmark data (generated, not committed)
docs/ MkDocs documentation source
Documentation
Full documentation is published at neozenith.github.io/sqlite-muninn via MkDocs Material with interactive Plotly charts.
make docs-serve # Local dev server with live reload
make docs-build # Build static site
Research References
| Feature | Paper |
|---|---|
| HNSW | Malkov & Yashunin, TPAMI 2020 |
| MN-RU insert repair | arXiv:2407.07871, 2024 |
| Patience early termination | SISAP 2025 |
| Betweenness centrality | Brandes, J. Math. Sociol. 2001 |
| Leiden community detection | Traag, Waltman & van Eck, Sci. Rep. 2019 |
| Node2Vec | Grover & Leskovec, KDD 2016 |
| SGNS | Mikolov et al., 2013 |
License
MIT. See LICENSE.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distributions
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sqlite_muninn-0.3.0a2-py3-none-win_amd64.whl.
File metadata
- Download URL: sqlite_muninn-0.3.0a2-py3-none-win_amd64.whl
- Upload date:
- Size: 1.4 MB
- Tags: Python 3, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.1 {"installer":{"name":"uv","version":"0.11.1","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d0e73c9794b51ff48328bd325ed51dc18e9bdd64f58dbc145beb899ff6c435cd
|
|
| MD5 |
971f750c94b1037e73536c52b4a550f6
|
|
| BLAKE2b-256 |
f4cdb43aae94b2fe78e34ff6e594663a76fa05c9b55053d51b165969917d9a35
|
File details
Details for the file sqlite_muninn-0.3.0a2-py3-none-manylinux_2_17_x86_64.whl.
File metadata
- Download URL: sqlite_muninn-0.3.0a2-py3-none-manylinux_2_17_x86_64.whl
- Upload date:
- Size: 1.5 MB
- Tags: Python 3, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.1 {"installer":{"name":"uv","version":"0.11.1","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7aa9c2e9786213315355ed86074cea15b96da9e5e6347e863f3b7432534dd9f1
|
|
| MD5 |
e994af9d06d3e026370b1b872d8edf55
|
|
| BLAKE2b-256 |
96e6a5021111ffa9433882eb23db372e764d99b4a2caec53e30c7a2cbedd4959
|
File details
Details for the file sqlite_muninn-0.3.0a2-py3-none-manylinux_2_17_aarch64.whl.
File metadata
- Download URL: sqlite_muninn-0.3.0a2-py3-none-manylinux_2_17_aarch64.whl
- Upload date:
- Size: 1.4 MB
- Tags: Python 3, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.1 {"installer":{"name":"uv","version":"0.11.1","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ac0e97db7b1385d90c546075f548ccbcf67100c2506327375f2465948364a19f
|
|
| MD5 |
14a4b11fcf0c8a577de3e6a2ff85bb35
|
|
| BLAKE2b-256 |
c78cf592b2db23423ec68027670026e23f40a6e3ac5d18a35423d35fbe02c60e
|
File details
Details for the file sqlite_muninn-0.3.0a2-py3-none-macosx_11_0_universal2.whl.
File metadata
- Download URL: sqlite_muninn-0.3.0a2-py3-none-macosx_11_0_universal2.whl
- Upload date:
- Size: 2.9 MB
- Tags: Python 3, macOS 11.0+ universal2 (ARM64, x86-64)
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.1 {"installer":{"name":"uv","version":"0.11.1","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
33118d5206a749c5f5eba0d9119e072a9bf731073cc8c6c8615926013253765f
|
|
| MD5 |
ebe9b60a8358592fe1b467259c3183bc
|
|
| BLAKE2b-256 |
1ea3b8034f91a2a02d8785b0eb44ad087839ae28b21f10ba5da349ecbb334667
|