Skip to main content

HNSW vector search + graph traversal + Node2Vec for SQLite

Project description

sqlite-muninn

Muninn Raven Logo

Odin's mythic raven of Memory.

A zero-dependency C extension for SQLite to add an advanced collection of knowledge graph primitives like Vector Similarity Search, HNSW Indexes, Graph database, Community Detection, Node2Vec capabilities.

Documentation | GitHub

Features

  • HNSW Vector Index — O(log N) approximate nearest neighbor search with incremental insert/delete
  • Graph Traversal — BFS, DFS, shortest path, connected components, PageRank on any edge table
  • Centrality Measures — Degree, betweenness (Brandes), and closeness centrality with weighted/temporal support
  • Community Detection — Leiden algorithm for discovering graph communities with modularity scoring
  • Node2Vec — Learn structural node embeddings from graph topology, store in HNSW for similarity search
  • Zero dependencies — Pure C11, compiles to a single .dylib/.so/.dll
  • SIMD accelerated — ARM NEON and x86 SSE distance functions

Build

Requires SQLite development headers and a C11 compiler.

# macOS (Homebrew SQLite recommended)
brew install sqlite
make all

# Linux
sudo apt-get install libsqlite3-dev
make all

# Run tests
make test        # C unit tests
make test-python # Python integration tests
make test-all    # Both

Quick Start

.load ./muninn

-- Create an HNSW vector index
CREATE VIRTUAL TABLE my_vectors USING hnsw_index(
    dimensions=384, metric='cosine', m=16, ef_construction=200
);

-- Insert vectors
INSERT INTO my_vectors (rowid, vector) VALUES (1, ?);  -- 384-dim float32 blob

-- KNN search
SELECT rowid, distance FROM my_vectors
WHERE vector MATCH ?query AND k = 10 AND ef_search = 64;

-- Graph traversal on any edge table
SELECT node, depth FROM graph_bfs
WHERE edge_table = 'friendships' AND src_col = 'user_a'
  AND dst_col = 'user_b' AND start_node = 'alice' AND max_depth = 3
  AND direction = 'both';

-- Connected components
SELECT node, component_id, component_size FROM graph_components
WHERE edge_table = 'friendships' AND src_col = 'user_a' AND dst_col = 'user_b';

-- PageRank
SELECT node, rank FROM graph_pagerank
WHERE edge_table = 'citations' AND src_col = 'citing' AND dst_col = 'cited'
  AND damping = 0.85 AND iterations = 20;

-- Betweenness centrality (find bridge nodes)
SELECT node, centrality FROM graph_betweenness
WHERE edge_table = 'friendships' AND src_col = 'user_a' AND dst_col = 'user_b'
  AND direction = 'both'
ORDER BY centrality DESC LIMIT 10;

-- Community detection (Leiden algorithm)
SELECT node, community_id, modularity FROM graph_leiden
WHERE edge_table = 'friendships' AND src_col = 'user_a' AND dst_col = 'user_b';

-- Learn structural embeddings from graph topology
SELECT node2vec_train(
    'friendships', 'user_a', 'user_b', 'my_vectors',
    64, 1.0, 1.0, 10, 80, 5, 5, 0.025, 5
);

Examples

Self-contained examples in the examples/ directory:

Example Demonstrates
Semantic Search HNSW index, KNN queries, point lookup, delete
Movie Recommendations Vector similarity for content-based recommendations
Social Network Graph TVFs on a social graph (BFS, components, PageRank)
Research Papers Citation graph analysis with Node2Vec embeddings
Transit Routes Shortest path and graph traversal on route networks
make all
python examples/semantic_search/example.py

API Reference

HNSW Virtual Table (hnsw_index)

CREATE VIRTUAL TABLE name USING hnsw_index(
    dimensions=N,            -- vector dimensionality (required)
    metric='l2',             -- 'l2' | 'cosine' | 'inner_product'
    m=16,                    -- max connections per node per layer
    ef_construction=200      -- beam width during index construction
);

Columns:

Column Type Hidden Description
rowid INTEGER Yes User-assigned ID for joining with application tables
vector BLOB No float32[dim] — input for INSERT, MATCH constraint for search
distance REAL No Computed distance (output only, during search)
k INTEGER Yes Top-k parameter (search constraint)
ef_search INTEGER Yes Search beam width (search constraint)

Operations:

-- Insert
INSERT INTO t (rowid, vector) VALUES (42, ?blob);

-- KNN search
SELECT rowid, distance FROM t WHERE vector MATCH ?query AND k = 10;

-- Point lookup
SELECT vector FROM t WHERE rowid = 42;

-- Delete (with automatic neighbor reconnection)
DELETE FROM t WHERE rowid = 42;

-- Drop (removes index and all shadow tables)
DROP TABLE t;

Shadow tables (auto-managed):

  • {name}_config — HNSW parameters
  • {name}_nodes — stored vectors and level assignments
  • {name}_edges — the proximity graph (usable by graph TVFs)

Graph Table-Valued Functions

All graph TVFs work on any existing SQLite table with source/target columns. Table and column names are validated against SQL injection.

graph_bfs / graph_dfs

Breadth-first or depth-first traversal from a start node.

SELECT node, depth, parent FROM graph_bfs
WHERE edge_table = 'edges'
  AND src_col = 'src'
  AND dst_col = 'dst'
  AND start_node = 'node-42'
  AND max_depth = 5
  AND direction = 'forward';   -- 'forward' | 'reverse' | 'both'
Output Column Type Description
node TEXT Node identifier
depth INTEGER Hop distance from start
parent TEXT Parent node in traversal tree (NULL for start)

graph_shortest_path

Unweighted (BFS) or weighted (Dijkstra) shortest path.

-- Unweighted
SELECT node, distance, path_order FROM graph_shortest_path
WHERE edge_table = 'edges' AND src_col = 'src' AND dst_col = 'dst'
  AND start_node = 'A' AND end_node = 'Z' AND weight_col IS NULL;

-- Weighted (Dijkstra)
SELECT node, distance, path_order FROM graph_shortest_path
WHERE edge_table = 'edges' AND src_col = 'src' AND dst_col = 'dst'
  AND start_node = 'A' AND end_node = 'Z' AND weight_col = 'weight';
Output Column Type Description
node TEXT Node on the path
distance REAL Cumulative distance from start
path_order INTEGER Position in path (0-indexed)

graph_components

Connected components via Union-Find with path compression.

SELECT node, component_id, component_size FROM graph_components
WHERE edge_table = 'edges' AND src_col = 'src' AND dst_col = 'dst';
Output Column Type Description
node TEXT Node identifier
component_id INTEGER Component index (0-based)
component_size INTEGER Number of nodes in this component

graph_pagerank

Iterative power method PageRank with configurable damping and iterations.

SELECT node, rank FROM graph_pagerank
WHERE edge_table = 'edges' AND src_col = 'src' AND dst_col = 'dst'
  AND damping = 0.85        -- optional, default 0.85
  AND iterations = 20;      -- optional, default 20
Output Column Type Description
node TEXT Node identifier
rank REAL PageRank score (sums to ~1.0)

graph_degree

Degree centrality for all nodes.

SELECT node, in_degree, out_degree, degree, centrality FROM graph_degree
WHERE edge_table = 'edges' AND src_col = 'src' AND dst_col = 'dst';
Output Column Type Description
node TEXT Node identifier
in_degree REAL Count (or weighted sum) of incoming edges
out_degree REAL Count (or weighted sum) of outgoing edges
degree REAL Total degree (in + out)
centrality REAL Normalized degree centrality

Optional constraints: weight_col, direction, normalized, timestamp_col, time_start, time_end.

graph_betweenness

Betweenness centrality via Brandes' O(VE) algorithm.

SELECT node, centrality FROM graph_betweenness
WHERE edge_table = 'edges' AND src_col = 'src' AND dst_col = 'dst'
  AND direction = 'both';
Output Column Type Description
node TEXT Node identifier
centrality REAL Betweenness centrality score

Optional constraints: weight_col, direction, normalized, timestamp_col, time_start, time_end.

graph_closeness

Closeness centrality with Wasserman-Faust normalization for disconnected graphs.

SELECT node, centrality FROM graph_closeness
WHERE edge_table = 'edges' AND src_col = 'src' AND dst_col = 'dst'
  AND direction = 'both';
Output Column Type Description
node TEXT Node identifier
centrality REAL Closeness centrality score

Optional constraints: weight_col, direction, timestamp_col, time_start, time_end.

graph_leiden

Community detection via the Leiden algorithm (Traag et al., 2019).

SELECT node, community_id, modularity FROM graph_leiden
WHERE edge_table = 'edges' AND src_col = 'src' AND dst_col = 'dst';
Output Column Type Description
node TEXT Node identifier
community_id INTEGER Community assignment (0-based)
modularity REAL Global modularity score of the partition

Optional constraints: weight_col, resolution (default 1.0), timestamp_col, time_start, time_end.

node2vec_train()

Learn vector embeddings from graph structure using biased random walks (Node2Vec) and Skip-gram with Negative Sampling (SGNS).

SELECT node2vec_train(
    edge_table,       -- name of edge table
    src_col,          -- source column name
    dst_col,          -- destination column name
    output_table,     -- HNSW table to store embeddings (must exist)
    dimensions,       -- embedding size (must match HNSW table)
    p,                -- return parameter (1.0 = uniform/DeepWalk)
    q,                -- in-out parameter (1.0 = uniform/DeepWalk)
    num_walks,        -- walks per node
    walk_length,      -- max steps per walk
    window_size,      -- SGNS context window
    negative_samples, -- negative samples per positive
    learning_rate,    -- initial learning rate (decays linearly)
    epochs            -- training epochs
);
-- Returns: number of nodes embedded

p, q parameter guide:

Setting Walk Behavior Best For
p=1, q=1 Uniform (DeepWalk) General structural similarity
Low p (0.25) BFS-like, stays local Community/cluster detection
Low q (0.5) DFS-like, explores far Structural role similarity

Benchmarks

The project includes a comprehensive benchmark suite comparing muninn against other SQLite extensions across real-world workloads.

Vector search benchmarks compare against sqlite-vector, sqlite-vec, and vectorlite using 3 embedding models (MiniLM, MPNet, BGE-Large) and 2 text datasets (AG News, Wealth of Nations) at scales up to 250K vectors.

Graph traversal benchmarks compare muninn TVFs against recursive CTEs and GraphQLite on synthetic graphs (Erdos-Renyi, Barabasi-Albert) at scales up to 100K nodes.

Results include interactive Plotly charts for insert throughput, search latency, recall, database size, and tipping-point analysis. See the full benchmark results on the documentation site.

make -C benchmarks help       # List all benchmark targets
make -C benchmarks analyze    # Generate charts and reports from existing results

Project Structure

src/                  C11 source (extension entry point, HNSW, graph TVFs, Node2Vec)
test/                 C unit tests (custom minimal framework)
pytests/              Python integration tests (pytest)
examples/             Self-contained usage examples
benchmarks/
  scripts/            Benchmark runners and analysis scripts
  charts/             Plotly JSON chart specs (committed for docs site)
  results/            JSONL benchmark data (generated, not committed)
docs/                 MkDocs documentation source

Documentation

Full documentation is published at neozenith.github.io/sqlite-muninn via MkDocs Material with interactive Plotly charts.

make docs-serve    # Local dev server with live reload
make docs-build    # Build static site

Research References

Feature Paper
HNSW Malkov & Yashunin, TPAMI 2020
MN-RU insert repair arXiv:2407.07871, 2024
Patience early termination SISAP 2025
Betweenness centrality Brandes, J. Math. Sociol. 2001
Leiden community detection Traag, Waltman & van Eck, Sci. Rep. 2019
Node2Vec Grover & Leskovec, KDD 2016
SGNS Mikolov et al., 2013

License

MIT. See LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

sqlite_muninn-0.1.0a5-py3-none-win_amd64.whl (116.0 kB view details)

Uploaded Python 3Windows x86-64

sqlite_muninn-0.1.0a5-py3-none-manylinux_2_17_x86_64.whl (49.4 kB view details)

Uploaded Python 3manylinux: glibc 2.17+ x86-64

sqlite_muninn-0.1.0a5-py3-none-manylinux_2_17_aarch64.whl (48.9 kB view details)

Uploaded Python 3manylinux: glibc 2.17+ ARM64

sqlite_muninn-0.1.0a5-py3-none-macosx_11_0_universal2.whl (82.7 kB view details)

Uploaded Python 3macOS 11.0+ universal2 (ARM64, x86-64)

File details

Details for the file sqlite_muninn-0.1.0a5-py3-none-win_amd64.whl.

File metadata

  • Download URL: sqlite_muninn-0.1.0a5-py3-none-win_amd64.whl
  • Upload date:
  • Size: 116.0 kB
  • Tags: Python 3, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.10.2 {"installer":{"name":"uv","version":"0.10.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for sqlite_muninn-0.1.0a5-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 708262dbab1bde671df59126656f9db72de96796b2fd756f674800d773e755f8
MD5 f2bdd7a4a1441a3b9149a03bc4ad40ae
BLAKE2b-256 f871d06eb6851614bf39392b872e3adf72f43a40e79a861d1ecb11b308803b49

See more details on using hashes here.

File details

Details for the file sqlite_muninn-0.1.0a5-py3-none-manylinux_2_17_x86_64.whl.

File metadata

  • Download URL: sqlite_muninn-0.1.0a5-py3-none-manylinux_2_17_x86_64.whl
  • Upload date:
  • Size: 49.4 kB
  • Tags: Python 3, manylinux: glibc 2.17+ x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.10.2 {"installer":{"name":"uv","version":"0.10.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for sqlite_muninn-0.1.0a5-py3-none-manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 7f39de0d4f4c6ce6e95fa4c829d7fa3bc7eb27b325ad339f4140df8bc464eccb
MD5 eb7a7d360dec5116da755f12f1d8e331
BLAKE2b-256 ba13435b6fe48d5d5c9ed1b44a52eb1c3b4436d45d094dd023f0a294f839a1c9

See more details on using hashes here.

File details

Details for the file sqlite_muninn-0.1.0a5-py3-none-manylinux_2_17_aarch64.whl.

File metadata

  • Download URL: sqlite_muninn-0.1.0a5-py3-none-manylinux_2_17_aarch64.whl
  • Upload date:
  • Size: 48.9 kB
  • Tags: Python 3, manylinux: glibc 2.17+ ARM64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.10.2 {"installer":{"name":"uv","version":"0.10.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for sqlite_muninn-0.1.0a5-py3-none-manylinux_2_17_aarch64.whl
Algorithm Hash digest
SHA256 3a80d1faf536588b40f9dc594dcf6597265897d4f7f3d4c13d63e70982a95fce
MD5 b5727c7a250e276db08f0b7c1da0b3bb
BLAKE2b-256 e21d1f678ca2010a1dc8373cd87b242b2a54dc65589711e4744b4a1840c09721

See more details on using hashes here.

File details

Details for the file sqlite_muninn-0.1.0a5-py3-none-macosx_11_0_universal2.whl.

File metadata

  • Download URL: sqlite_muninn-0.1.0a5-py3-none-macosx_11_0_universal2.whl
  • Upload date:
  • Size: 82.7 kB
  • Tags: Python 3, macOS 11.0+ universal2 (ARM64, x86-64)
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.10.2 {"installer":{"name":"uv","version":"0.10.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for sqlite_muninn-0.1.0a5-py3-none-macosx_11_0_universal2.whl
Algorithm Hash digest
SHA256 08566f3aa7e4dec8bfd4e4955d0170dd9d9e599efe1ec95a0af6af00167ebd27
MD5 5e2440c12fd8d12bc5fdb269603c8aac
BLAKE2b-256 91c2a7d28c112980a889ee3e62e5f81cf96454c4f1e98a7c6865bc745b1b407c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page