obsidian-semantic

License: MIT

Semantic search for Obsidian vaults. Index your vault into vector embeddings, then search by meaning rather than keywords.

Using this with an AI agent (Claude Code, Cursor, etc.)? See SKILL.md for agent-facing guidance — score interpretation, workflows, and known gotchas.

Install

# As a standalone CLI (recommended)
uv tool install obsidian-semantic

# Or with pipx (also installs into an isolated environment)
pipx install obsidian-semantic

# With Gemini embedder support
uv tool install "obsidian-semantic[gemini]"

Then configure:

obsidian-semantic configure

Configuration is stored in ~/.config/obsidian-semantic/config.yaml. Supports Ollama (local), LM Studio (local), and Gemini embedders.

From source

git clone https://github.com/ravila4/obsidian-semantic-search
cd obsidian-semantic-search
uv sync
uv run obsidian-semantic configure

Usage

Index your vault

obsidian-semantic index                # incremental (new/modified files only)
obsidian-semantic index --full         # reindex everything

Search

obsidian-semantic search "dependency injection patterns"
obsidian-semantic search "python testing" --limit 5
obsidian-semantic search "docker" --folder "Programming/"
obsidian-semantic search "habits" --tag "review"
obsidian-semantic search "fisher" --score-min 0.6     # drop low-relevance hits
obsidian-semantic search "fisher" --per-file 0        # show every matching chunk

By default, results are deduped to one chunk per file. Pass --per-file N to allow up to N chunks per file (or 0 for unlimited).
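
The per-file limit described above can be sketched in a few lines. This is an illustration only, not the tool's actual implementation: the `Hit` tuple and the `limit_per_file` helper are hypothetical names, and it assumes hits carry a note path and a similarity score where higher is better.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    path: str     # vault-relative note path
    chunk: str    # chunk label (illustrative)
    score: float  # similarity score, higher is better


def limit_per_file(hits: list[Hit], per_file: int = 1) -> list[Hit]:
    """Keep at most `per_file` best-scoring hits per note; 0 means unlimited."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    if per_file == 0:
        return ranked
    kept: list[Hit] = []
    seen: dict[str, int] = defaultdict(int)
    for hit in ranked:
        if seen[hit.path] < per_file:
            kept.append(hit)
            seen[hit.path] += 1
    return kept


hits = [
    Hit("Stats/Fisher.md", "Definition", 0.72),
    Hit("Stats/Fisher.md", "Worked example", 0.68),
    Hit("Stats/Chi-Square.md", "Comparison", 0.61),
]
deduped = limit_per_file(hits)        # default: one chunk per file
everything = limit_per_file(hits, 0)  # like --per-file 0
```

With the default limit, the second `Stats/Fisher.md` chunk is dropped and `Stats/Chi-Square.md` moves up to second place.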

Choose --score-min thresholds with dedup in mind. Once the extra chunks from the top file are removed, the next file's surviving chunk often scores ~0.05–0.10 lower than the duplicate chunks it displaced, so a threshold tuned against raw chunk scores can drop relevant notes. Calibrate against the post-dedup output instead.

Rough absolute bands for Ollama with nomic-embed-text: ≥0.65 is a strong, title-level match; ≥0.5 is topical; <0.4 is likely noise. Other embedders (qwen3, gemini) sit on different scales.
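
A small sketch of both points, under stated assumptions: `score_band` is a hypothetical helper that encodes the rough Ollama + Nomic bands quoted above (the 0.4 to 0.5 range has no label in the text, so it is returned as "unclassified"), and the sample scores are invented purely to show why raw and post-dedup scores calibrate differently.

```python
def score_band(score: float) -> str:
    """Map a score to the rough Ollama + Nomic bands quoted in the text."""
    if score >= 0.65:
        return "strong"
    if score >= 0.5:
        return "topical"
    if score < 0.4:
        return "likely noise"
    return "unclassified"  # 0.4-0.5: the text gives no label for this range


# Invented raw chunk scores: three chunks from A.md outrank the best B.md chunk.
raw = [("A.md", 0.72), ("A.md", 0.70), ("A.md", 0.69), ("B.md", 0.62)]

# Post-dedup view: only the best chunk per file survives.
best_per_file: dict[str, float] = {}
for path, score in raw:
    best_per_file[path] = max(score, best_per_file.get(path, 0.0))

# A threshold tuned to keep the "top three" raw chunks (0.69) would drop B.md,
# even though B.md is the second-best note after dedup.
raw_tuned_threshold = 0.69
survivors = [p for p, s in best_per_file.items() if s >= raw_tuned_threshold]
```

Here `survivors` contains only `A.md`, while a threshold chosen from the post-dedup scores (anything at or below 0.62) would keep both notes.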

Find related notes

Find notes similar to a given note. This is useful for discovering connections, linking, or deduplication.

obsidian-semantic related "Programming/Python/Unit Testing.md"
obsidian-semantic related "Daily/2026-02-05.md" --limit 5

If the note isn't in the index, it's chunked and embedded on the fly.

Show a note

Print the full contents of a note straight to stdout. Accepts a vault-relative path or a bare filename (with or without .md); if the basename is unique, it's resolved automatically. Reads from disk, so it works on un-indexed files too (unlike search).

obsidian-semantic show "Fisher's Exact in Empiroar.md"
obsidian-semantic show "Programming/Python/Unit Testing.md"
obsidian-semantic show "Unit Testing.md#Setup#Installation"   # specific section

Append #Heading (or #Parent#Child for nested sections) to print just that section. Heading paths are matched against the breadcrumb suffix and are case-insensitive; ambiguous headings are listed with line numbers.
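
The heading lookup can be pictured as a case-insensitive suffix match on each section's breadcrumb. The sketch below is an assumption about how such matching could work, not the tool's code; `parse_target`, `matches_suffix`, and `find_sections` are hypothetical names, and sections are modeled as `(line_number, breadcrumb)` pairs.

```python
def parse_target(target: str) -> tuple[str, list[str]]:
    """Split 'Note.md#Parent#Child' into the note name and its heading path."""
    note, *headings = target.split("#")
    return note, headings


def matches_suffix(breadcrumb: list[str], wanted: list[str]) -> bool:
    """True if `wanted` equals the tail of `breadcrumb`, ignoring case."""
    if not wanted or len(wanted) > len(breadcrumb):
        return False
    tail = breadcrumb[-len(wanted):]
    return [h.casefold() for h in tail] == [h.casefold() for h in wanted]


def find_sections(sections, wanted):
    """Return every (line, breadcrumb) whose breadcrumb ends with `wanted`."""
    return [(line, crumb) for line, crumb in sections if matches_suffix(crumb, wanted)]


# Illustrative outline of a note: (line number, breadcrumb of headings).
sections = [
    (3, ["Unit Testing", "Setup"]),
    (5, ["Unit Testing", "Setup", "Installation"]),
    (40, ["Unit Testing", "CI", "Installation"]),
]

note, path = parse_target("Unit Testing.md#setup#installation")
unique = find_sections(sections, path)                 # one match, line 5
ambiguous = find_sections(sections, ["Installation"])  # two matches, lines 5 and 40
```

A bare `#Installation` matches two sections here, which is the situation in which listing candidates with their line numbers helps; adding the parent (`#Setup#Installation`) narrows it to one.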

Suggest missing links

Find semantically similar notes that aren't linked to each other -- surfaces missing wikilinks and potential duplicates.

obsidian-semantic suggest-links
obsidian-semantic suggest-links --threshold 0.85 --limit 10
obsidian-semantic suggest-links --exclude-same-folder "Daily Log"

Folders to exclude can also be set in config so you don't have to type them every time:

suggest_links:
  exclude_same_folder:
    - "Daily Log"
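
Conceptually, link suggestion is a pairwise similarity scan that skips pairs already connected by a link. The following is a minimal sketch under several assumptions that the text does not confirm: one vector per note, cosine similarity as the score, and "exclude same folder" meaning that pairs where both notes sit in a listed folder are skipped. All names (`cosine`, `suggest_links`, the sample vectors) are hypothetical.

```python
import math
from itertools import combinations
from pathlib import PurePosixPath


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def suggest_links(vectors, links, threshold=0.85, exclude_same_folder=()):
    """Return (note_a, note_b, score) for similar notes with no link between them."""
    suggestions = []
    for a, b in combinations(sorted(vectors), 2):
        if b in links.get(a, set()) or a in links.get(b, set()):
            continue  # already linked in at least one direction
        folder_a = PurePosixPath(a).parent.as_posix()
        folder_b = PurePosixPath(b).parent.as_posix()
        if folder_a == folder_b and folder_a in exclude_same_folder:
            continue  # assumed meaning of exclude_same_folder
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            suggestions.append((a, b, score))
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


# Toy 2-dimensional vectors; real embeddings have hundreds of dimensions.
vectors = {
    "Python/Testing.md": [1.0, 0.0],
    "Python/Pytest.md": [0.9, 0.1],
    "Daily Log/2026-02-05.md": [0.0, 1.0],
    "Daily Log/2026-02-06.md": [0.0, 1.0],
}
no_links: dict[str, set[str]] = {}
all_pairs = suggest_links(vectors, no_links)
filtered = suggest_links(vectors, no_links, exclude_same_folder=("Daily Log",))
```

Without the folder filter, the two daily notes dominate the suggestions because they are nearly identical; with it, only the unlinked Python pair remains.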

Status

obsidian-semantic status

Options

All commands accept --vault <path> to specify the vault. Alternatively, set OBSIDIAN_VAULT or configure a default with obsidian-semantic configure --vault <path>.
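
The three ways of naming a vault imply a lookup order. The sketch below assumes the common precedence of command-line flag, then environment variable, then configured default; the text does not state the order, and `resolve_vault` is a hypothetical helper rather than part of the package.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_vault(
    cli_vault: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    config_vault: Optional[str] = None,
) -> Path:
    """Pick the vault path: --vault, then OBSIDIAN_VAULT, then the config default.

    The precedence order is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    for candidate in (cli_vault, env.get("OBSIDIAN_VAULT"), config_vault):
        if candidate:
            return Path(candidate).expanduser()
    raise ValueError(
        "no vault given: pass --vault, set OBSIDIAN_VAULT, or configure a default"
    )


from_flag = resolve_vault("/vaults/work", {"OBSIDIAN_VAULT": "/vaults/home"}, "/vaults/default")
from_env = resolve_vault(None, {"OBSIDIAN_VAULT": "/vaults/home"}, "/vaults/default")
from_config = resolve_vault(None, {}, "/vaults/default")
```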

Embedding Backends

Configuration lives in ~/.config/obsidian-semantic/config.yaml. You can also place a .obsidian-semantic.yaml in your vault root to override per-vault.
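
One way to picture the per-vault override is as a recursive merge in which vault-level keys win over global ones. Whether the tool merges key by key or replaces whole sections is not stated here, so treat the following as an assumption; `merge_config` and the sample dictionaries are hypothetical and stand in for the parsed YAML files.

```python
def merge_config(base: dict, override: dict) -> dict:
    """Return a new dict where `override` wins, merging nested dicts key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Stand-in for ~/.config/obsidian-semantic/config.yaml after parsing.
global_config = {
    "vault": "~/Documents/Obsidian-Notes",
    "embedder": {"type": "ollama", "model": "nomic-embed-text", "dimension": 768},
}
# Stand-in for a .obsidian-semantic.yaml in the vault root.
vault_config = {"embedder": {"timeout": 60.0}}

effective = merge_config(global_config, vault_config)
```

Under this assumed merge, the vault file only needs to list the keys that differ; the global embedder settings stay in effect alongside the longer timeout.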

After changing the embedder or model, reindex with obsidian-semantic index --full.

Ollama with Nomic (default)

Local embeddings with nomic-embed-text (768 dimensions). Uses search_query:/search_document: prefixes for asymmetric retrieval.

vault: ~/Documents/Obsidian-Notes
embedder:
  type: ollama
  model: nomic-embed-text
  dimension: 768
  query_prefix: "search_query: "
  document_prefix: "search_document: "

Pull the model:

ollama pull nomic-embed-text
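
Asymmetric retrieval here simply means that queries and documents get different text prefixes before they are embedded. A minimal sketch of that step, assuming the prefixes are plain string prepends taken from the config above; the helper names are hypothetical and the embedding call itself is left out.

```python
QUERY_PREFIX = "search_query: "        # embedder.query_prefix
DOCUMENT_PREFIX = "search_document: "  # embedder.document_prefix


def prepare_query(query: str, prefix: str = QUERY_PREFIX) -> str:
    """Text that would be sent to the embedder for a search query."""
    return f"{prefix}{query}"


def prepare_documents(chunks: list[str], prefix: str = DOCUMENT_PREFIX) -> list[str]:
    """Texts that would be sent to the embedder when indexing note chunks."""
    return [f"{prefix}{chunk}" for chunk in chunks]


query_text = prepare_query("dependency injection patterns")
doc_texts = prepare_documents(["Constructor injection keeps wiring explicit."])
```

Because the prefix is part of the embedded text, changing it alters the stored vectors, which is consistent with the advice to run a full reindex after changing embedder settings.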

Ollama with Qwen3-embedding

Higher-quality embeddings with qwen3-embedding (4096 dimensions). Uses an instruction prefix for queries to improve retrieval.

vault: ~/Documents/Obsidian-Notes
embedder:
  type: ollama
  model: qwen3-embedding:8b
  dimension: 4096
  query_prefix: "Instruct: Given a search query, retrieve relevant notes\nQuery: "

Pull the model:

ollama pull qwen3-embedding:8b

LM Studio

Local embeddings via LM Studio's OpenAI-compatible API (/v1/embeddings on port 1234). Start the server first:

lms server start
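
To check that the server is answering before indexing, you can call the endpoint directly. The sketch below uses only the standard library and follows the OpenAI embeddings request shape (`model` and `input` in, a `data` list with `embedding` entries out); the function names are hypothetical, the model name is the one from the Nomic config in this README, and the 30-second timeout mirrors the default mentioned under Advanced Options.

```python
import json
import urllib.request


def build_embedding_request(
    texts: list[str], model: str, base_url: str = "http://localhost:1234"
) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/embeddings endpoint."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts: list[str], model: str, timeout: float = 30.0) -> list[list[float]]:
    """Send the request and return one vector per input text (needs a running server)."""
    request = build_embedding_request(texts, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Sort by the item index when present so vectors line up with the inputs.
    items = sorted(body["data"], key=lambda item: item.get("index", 0))
    return [item["embedding"] for item in items]


request = build_embedding_request(
    ["search_query: docker"], "text-embedding-nomic-embed-text-v1.5"
)
```

Calling `embed(...)` with the same arguments should return a single 768-element vector for the Nomic model once the server is up and the model is loaded.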

LM Studio with Nomic

vault: ~/Documents/Obsidian-Notes
embedder:
  type: lmstudio
  model: text-embedding-nomic-embed-text-v1.5
  dimension: 768
  query_prefix: "search_query: "
  document_prefix: "search_document: "

Download the model:

lms get -y nomic-ai/nomic-embed-text-v1.5

LM Studio with Qwen3-embedding

Higher-quality embeddings (4096 dimensions). Like the Ollama variant, uses an instruction prefix for queries to improve retrieval.

vault: ~/Documents/Obsidian-Notes
embedder:
  type: lmstudio
  model: text-embedding-qwen3-embedding-8b
  dimension: 4096
  query_prefix: "Instruct: Given a search query, retrieve relevant notes\nQuery: "

Gemini

Cloud embeddings via Google's gemini-embedding-001 (3072 dimensions). Handles query vs. document task types automatically -- no prefix config needed. Requires a GEMINI_API_KEY environment variable.

vault: ~/Documents/Obsidian-Notes
embedder:
  type: gemini
  model: gemini-embedding-001
  dimension: 3072

Advanced Options

Timeout Configuration

The embedder request timeout (default: 30 seconds) can be increased for large files or slower models:

embedder:
  timeout: 60.0  # seconds

If you see timeout errors during indexing, try increasing this value. Very large notes with extensive JSON or code blocks may need 60-120 seconds.

Automatic Indexing

Linux (systemd)

Create a service and timer in ~/.config/systemd/user/:

obsidian-semantic-index.service

[Unit]
Description=Index Obsidian vault for semantic search

[Service]
Type=oneshot
EnvironmentFile=%h/.config/obsidian-semantic/env
ExecStart=/home/youruser/.local/bin/obsidian-semantic index

obsidian-semantic-index.timer

[Unit]
Description=Run Obsidian semantic index hourly

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target

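The EnvironmentFile line is optional; use it to keep secrets such as GEMINI_API_KEY outside the main config. If you keep the line but the file might not exist, prefix the path with - (EnvironmentFile=-%h/.config/obsidian-semantic/env) so systemd skips a missing file instead of failing the unit. Otherwise, remove the line.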

Enable and start:

systemctl --user enable --now obsidian-semantic-index.timer

Multiple vaults

To index additional vaults, add more ExecStart lines to the service (they run sequentially):

[Service]
Type=oneshot
EnvironmentFile=%h/.config/obsidian-semantic/env
ExecStart=/home/youruser/.local/bin/obsidian-semantic index
ExecStart=/home/youruser/.local/bin/obsidian-semantic index --vault /path/to/second-vault

macOS (launchd)

A ready-to-edit plist + wrapper script lives in scripts/launchd/. The wrapper opportunistically starts the LM Studio server (lms server start) before each run, so the agent works whether or not you remembered to leave the server up.

Install once:

# Make obsidian-semantic available on PATH
uv tool install -e .

# Edit the absolute paths in the plist to match your home directory, then:
cp scripts/launchd/com.ravila.obsidian-semantic-index.plist ~/Library/LaunchAgents/
launchctl load -w ~/Library/LaunchAgents/com.ravila.obsidian-semantic-index.plist

Logs land at ~/Library/Logs/obsidian-semantic-index.log.

To check status or unload:

launchctl list | grep obsidian-semantic
launchctl unload ~/Library/LaunchAgents/com.ravila.obsidian-semantic-index.plist
