Local-first second-brain CLI for Obsidian vaults: hybrid search, typed knowledge graph, MCP server.

smolbren

Local-first second-brain CLI for Obsidian vaults.

Turns a folder of Markdown into a queryable knowledge base:

  • Hybrid search — vector (sqlite-vec) + keyword (FTS5/BM25), fused with Reciprocal Rank Fusion
  • Typed knowledge graph — self-wired from frontmatter relations, wikilinks, and regex patterns
  • Embedding cache keyed by content hash — renaming or moving chunks costs zero embedding calls
  • Watch mode — re-ingests on file changes with per-file debouncing
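
The content-hash cache can be pictured with a minimal sketch. It is not smolbren's actual code: `EmbeddingCache` and `fake_embed` are hypothetical names, SHA-256 is an assumed hash function, and an in-memory dict stands in for the SQLite-backed store. It only shows why the same chunk text never costs a second embedding call, whichever file it lives in.

```python
import hashlib
from typing import Callable

Vector = list[float]


class EmbeddingCache:
    """In-memory stand-in for a cache keyed on (content_hash, model)."""

    def __init__(self, embed_fn: Callable[[str], Vector], model: str) -> None:
        self._embed_fn = embed_fn  # hypothetical embedding backend
        self._model = model
        self._store: dict[tuple[str, str], Vector] = {}
        self.calls = 0  # number of times the backend was actually hit

    def get(self, text: str) -> Vector:
        # The key depends only on the chunk text and the model name,
        # never on the file path, so moves and renames are cache hits.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        key = (digest, self._model)
        if key not in self._store:
            self.calls += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]


def fake_embed(text: str) -> Vector:
    # Placeholder so the example runs without Ollama.
    return [float(len(text))]


cache = EmbeddingCache(fake_embed, model="nomic-embed-text")
chunk = "## On-call\nJane is on call this week."
cache.get(chunk)  # chunk as first seen in one note
cache.get(chunk)  # identical chunk after the note was moved or renamed
print(cache.calls)
```

Editing the chunk text changes the hash, so only edited chunks reach the model again.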

Install

Requires Python 3.12+ and a running Ollama instance with the embedding model pulled:

ollama pull nomic-embed-text

From PyPI:

uv tool install smolbren

From source:

git clone https://github.com/junaidrahim/smolbren && cd smolbren
uv sync
uv run smolbren --help

Quickstart

cd /path/to/vault
smolbren init                          # writes .smolbren/config.toml + db
smolbren ingest                        # parse → chunk → upsert → embed
smolbren search "who's on call?"
smolbren graph neighbors people/jane --depth 2
smolbren stats

Run smolbren ingest --watch to keep the index live while you edit.
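
Per-file debouncing means a burst of saves to one note triggers a single re-ingest, and a busy file does not hold back a quiet one. The sketch below is an assumption about how such a debouncer could look, not the project's implementation: `PerFileDebouncer`, `touch`, `due`, and the 0.5 s delay are all hypothetical. The clock is injectable so the example runs deterministically.

```python
import time
from pathlib import Path
from typing import Callable


class PerFileDebouncer:
    """Tracks the last change time per file and reports files that went quiet."""

    def __init__(self, delay: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._delay = delay
        self._clock = clock
        self._last_event: dict[Path, float] = {}

    def touch(self, path: Path) -> None:
        # Record a change event; this restarts the quiet period for this file only.
        self._last_event[path] = self._clock()

    def due(self) -> list[Path]:
        # Return (and forget) every file whose quiet period has elapsed.
        now = self._clock()
        ready = sorted(p for p, t in self._last_event.items() if now - t >= self._delay)
        for p in ready:
            del self._last_event[p]
        return ready


# Simulated timeline with a fake clock (seconds).
now = [0.0]
debouncer = PerFileDebouncer(delay=0.5, clock=lambda: now[0])
debouncer.touch(Path("a.md"))
debouncer.touch(Path("b.md"))
now[0] = 0.3
debouncer.touch(Path("a.md"))  # a second save restarts a.md's timer, not b.md's
now[0] = 0.6
first = debouncer.due()   # only b.md has been quiet for 0.5 s
now[0] = 0.9
second = debouncer.due()  # a.md is ready now
```

A watch loop would call `due()` periodically and re-ingest whatever it returns.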

Commands

| Command | What it does |
| --- | --- |
| init | Scaffold .smolbren/ and write the default config |
| ingest [--watch] [--no-embed] | Parse, chunk, upsert, embed. --watch stays running with debounced re-ingest. |
| embed | Embed any chunks that don't yet have a vector (cache-aware) |
| search QUERY [--mode hybrid\|vector\|keyword] [--top-k N] | Semantic / keyword / hybrid search |
| stats | Page / chunk / edge counts and type distribution |
| graph neighbors SLUG [--type T] [--depth N] [--direction out\|in\|both] | BFS reachable neighbors |
| graph path SRC DST | Shortest path between two slugs |
| graph stats [--top N] | Node/edge counts, components, top-degree nodes |

Every command accepts --vault PATH (defaults to $SMOLBREN_VAULT, then the current working directory) and --json for machine-readable output.

How it works

  1. Ingest — Markdown is parsed with frontmatter, chunked by H2 with a token-window overlap, and stored in SQLite (WAL). Code fences are stripped before edge extraction so `[[fake]]` doesn't pollute the graph.
  2. Embed — Chunks without a vector are embedded via Ollama (nomic-embed-text, 768d, L2-normalized). The cache is keyed on (content_hash, model) so chunks that move between files don't re-hit the model.
  3. Search — Vector via sqlite-vec; keyword via FTS5 with BM25. Hybrid overfetches both branches and fuses them with RRF: each branch contributes weight / (k + rank) for every result it returns, and the contributions are summed per result. A multiplicative backlink boost (score × (1 + boost·log(1+backlinks))) is then applied before slicing to top-k. The boost coefficient lives in config (backlink_boost); set it to 0 to disable.
  4. Graph — Edges come from frontmatter relations (allow-listed types: works_on, owns, member_of, reports_to, depends_on, attended, decided_in, references, mentions), wikilinks (as mentions), and a small set of regex patterns (X works at [[Y]], X owns [[Y]], attended [[Z]], depends on [[W]], decided in [[V]]). The edges are loaded into a NetworkX MultiDiGraph and cached process-locally; the cache invalidates on any edge mutation via a DB-side version counter.
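
The hybrid scoring in step 3 can be sketched in a few lines. This is an illustration built from the formulas above, not the project's code: `hybrid_fuse` is a hypothetical name, ranks are assumed to start at 1, the logarithm is assumed to be natural, and backlink counts are passed in as a plain dict.

```python
import math


def hybrid_fuse(vector_hits: list[str], keyword_hits: list[str],
                backlinks: dict[str, int], *, rrf_k: int = 60,
                weights: tuple[float, float] = (1.0, 1.0),
                backlink_boost: float = 0.15, top_k: int = 5) -> list[tuple[str, float]]:
    """Fuse two ranked ID lists with RRF, apply the backlink boost, keep top_k."""
    scores: dict[str, float] = {}
    for weight, hits in zip(weights, (vector_hits, keyword_hits)):
        for rank, item in enumerate(hits, start=1):
            # Each branch contributes weight / (k + rank); contributions add up.
            scores[item] = scores.get(item, 0.0) + weight / (rrf_k + rank)
    boosted = {
        item: score * (1 + backlink_boost * math.log(1 + backlinks.get(item, 0)))
        for item, score in scores.items()
    }
    # Sort by descending score; ties broken by ID for a stable order.
    ranked = sorted(boosted.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


vector_hits = ["a", "b", "c"]
keyword_hits = ["b", "d", "a"]
with_boost = [i for i, _ in hybrid_fuse(vector_hits, keyword_hits, {"a": 3})]
no_boost = [i for i, _ in hybrid_fuse(vector_hits, keyword_hits, {"a": 3}, backlink_boost=0.0)]
print(with_boost, no_boost)
```

In this toy input, "b" wins on rank alone (1/62 + 1/61 versus 1/61 + 1/63 for "a"), while three backlinks multiply "a" by about 1.21 and move it to first place. Because the boost is multiplicative, it reorders results that are already close rather than promoting unrelated pages.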

Config

.smolbren/config.toml:

[embeddings]
model = "nomic-embed-text"
ollama_url = "http://localhost:11434"

[chunking]
strategy = "h2"
max_chunk_tokens = 512
overlap_tokens = 50

[search]
rrf_k = 60
backlink_boost = 0.15
hybrid_weights = [1.0, 1.0]

[ignore]
patterns = ["**/.obsidian/**", "**/node_modules/**"]

Development

uv sync
uv run pytest
uv run ruff check
uv run mypy

Release process

Releases are fully automated. Every push to main is parsed for Conventional Commits; if any commit since the last tag is feat:, fix:, perf:, or contains BREAKING CHANGE, python-semantic-release bumps the version, commits the bump back to main (with [skip ci]), tags it, creates a GitHub Release, and publishes the wheel + sdist to PyPI via the pypi trusted-publisher environment.

Bump rules:

| Prefix | Bump |
| --- | --- |
| feat: | minor |
| fix:, perf: | patch |
| any commit body with BREAKING CHANGE: | major |
| chore:, docs:, ci:, style:, test:, refactor:, build: | none |

Non-conforming commit messages are ignored (no version bump). Reference implementation: .github/workflows/publish.yaml and the [tool.semantic_release] block in pyproject.toml.
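
The bump rules can be restated as a small classifier. This is a simplified sketch of the table above, not python-semantic-release's real parser: `bump_for_commit` and `bump_for_release` are hypothetical names, and accepting an optional scope such as fix(search): is an assumption taken from the Conventional Commits format.

```python
import re
from collections.abc import Iterable

# Header like "feat: ..." or "fix(search): ..." (the scope is optional).
_HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?: ")

BUMP_BY_TYPE = {"feat": "minor", "fix": "patch", "perf": "patch"}
ORDER = {"none": 0, "patch": 1, "minor": 2, "major": 3}


def bump_for_commit(message: str) -> str:
    """Return the bump level a single commit message asks for."""
    match = _HEADER.match(message)
    if match is None:
        return "none"  # non-conforming messages are ignored
    if "BREAKING CHANGE:" in message:
        return "major"
    return BUMP_BY_TYPE.get(match.group("type"), "none")


def bump_for_release(messages: Iterable[str]) -> str:
    """The release takes the highest bump among commits since the last tag."""
    return max((bump_for_commit(m) for m in messages),
               key=ORDER.__getitem__, default="none")


commits = ["docs: fix typo", "fix(search): handle empty query", "feat: add graph path"]
print(bump_for_release(commits))
```

A release made only of chore: or docs: commits resolves to "none", which matches the rule that such pushes publish nothing.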

