Local LLM-powered Zettelkasten pipeline for Obsidian — daily note splitter, semantic backlinks, keyword linker, synonym clustering.
Turn any Obsidian vault into a Zettelkasten graph — locally, processing a dozen years of notes in minutes.
Quick Start • How It Works • Before / After • Phases • Models • Troubleshooting
Why?
Obsidian's graph view is only as good as the [[wikilinks]] you remember to write. Years of daily notes pile up with zero connections. LLM "second brain" tools exist — but they send every note to a cloud API.
Umbra runs entirely on your own machine. It splits daily journal entries into titled topic notes, then weaves four layers of backlinks across your whole vault so the graph actually lights up.
┌─────────────────────────┐ ┌─────────────────────────┐
│ OBSIDIAN VAULT │ │ LOCAL GPU │
│ │ file I/O only │ │
│ Daily notes │◄──────────────────►│ Qwen3-4B-Instruct │
│ Project notes │ │ Potion-32M │
│ Folder structure │ │ GTE-large + HDBSCAN │
│ │ │ │
│ [[wikilinks]] graph │ nothing leaves │ 4-phase pipeline │
└─────────────────────────┘ your machine └─────────────────────────┘
Quick Start
Prerequisites
| Requirement | Details |
|---|---|
| OS | Linux (tested on Ubuntu 22.04+) |
| Python | 3.10+ |
| GPU | NVIDIA, 12GB+ VRAM |
| CUDA | 12.0+ |
| Model | Qwen3-4B-Instruct Q8_0 GGUF (~4GB) |
| Obsidian | 0.16+ (any recent release) |
1. Install
pip install obsidian-umbra
Or from source if you want the latest main:
git clone https://github.com/jimnoneill/obsidian-umbra
cd obsidian-umbra
pip install -e .
2. Download a Model
Default — Qwen3-4B-Instruct Q8_0 (4.3 GB, best quality on 24GB+ VRAM):
mkdir -p ~/models
huggingface-cli download Qwen/Qwen3-4B-Instruct-2507-GGUF \
Qwen3-4B-Instruct-2507-Q8_0.gguf --local-dir ~/models
Lighter — Qwen3-4B-Instruct Q4_K_M (2.5 GB, runs fine on 12GB):
huggingface-cli download Qwen/Qwen3-4B-Instruct-2507-GGUF \
Qwen3-4B-Instruct-2507-Q4_K_M.gguf --local-dir ~/models
Any GGUF instruct model works — Llama 3, Mistral, Gemma, Phi-3. See docs/models.md for the supported-models matrix.
3. Configure
cp config.yaml.example config.yaml
Edit config.yaml:
vault: ~/Documents/MyVault # Your Obsidian vault
model_path: ~/models/Qwen3-4B-Instruct-2507-Q8_0.gguf # Any GGUF instruct model
model_name: Qwen3-4B-Instruct-2507 # Label used in logs
chat_format: chatml # See docs/models.md
output_subdir: umbra # Topic notes land here
state_dir: ~/.obsidian-umbra # State/cache/logs
cuda_visible_devices: "0" # Which GPU
4. Run
./deploy.sh all
First run on a large vault takes ~2 min per 50 daily notes + ~1 min for the other three phases combined. Re-runs are idempotent and take seconds.
5. Install the Cron Job (Optional)
(crontab -l 2>/dev/null; echo "0 4 * * * $PWD/deploy.sh all") | crontab -
Every morning at 4am, new daily notes get split and the graph is refreshed.
How It Works
Four phases run in sequence. Each is idempotent and can be run alone.
daily notes ─► Phase 1 ─► topic notes ─► Phase 2 ─► ## Related Notes
               Qwen3-4B                  Potion-32M
               JSON                      top-K cosine

                                      ─► Phase 3 ─► inline [[wikilinks]]
                                         keyword
                                         matcher

                                      ─► Phase 4 ─► ## Same Concept
                                         GTE-large
                                         + HDBSCAN
- Daily Splitter — reads each daily note (MM-DD-YYYY or YYYY-MM-DD), calls Qwen3-4B-Instruct locally via llama-cpp-python in JSON mode, extracts distinct topics, and writes one titled markdown note per topic with YAML frontmatter and source backlinks.
- Semantic Backlinks — embeds every note with Potion-32M (256-dim static embeddings, deterministic, fast), computes pairwise cosine similarity plus a tag-overlap bonus, and appends a ## Related Notes section with the top-5 links and similarity %.
- Keyword Linker — builds a keyword index from non-daily note stems, titles, and folder names, then injects inline [[wikilinks]] wherever a keyword appears in body text. Skips YAML, code blocks, existing links, headings, URLs, and HTML comments. Single-word keywords must be CamelCase, an acronym, or digit-bearing to avoid false positives on common English words.
- Synonym Linker — embeds concept-note titles with GTE-large (1024-dim), clusters them with cuML HDBSCAN, and writes a ## Same Concept section between cluster siblings. Mega-clusters (>20 members) collapse to hub-and-spoke: each member gets one link to the centroid-closest representative.
All three generated sections (## Topics, ## Related Notes, ## Same Concept) are wrapped in marker comments (<!-- umbra: ... -->) so re-runs regenerate them without mangling your writing. The hedged sketches below illustrate roughly how each phase works.
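To make Phase 1 concrete, here is a minimal sketch of a JSON-mode splitting call with llama-cpp-python. The prompt wording, schema, and filename are illustrative stand-ins, not Umbra's actual internals:

import json, os
from llama_cpp import Llama

llm = Llama(
    model_path=os.path.expanduser("~/models/Qwen3-4B-Instruct-2507-Q8_0.gguf"),
    chat_format="chatml",
    n_gpu_layers=-1,   # offload all layers to the GPU
    n_ctx=8192,
    verbose=False,
)

daily_note = open("01-15-2024.md").read()  # hypothetical daily note

# JSON mode constrains sampling to valid JSON, so the output always parses.
resp = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": (
            "Split the journal entry into distinct topics. Reply as JSON: "
            '{"topics": [{"title": str, "body": str, "tags": [str]}]}'
        )},
        {"role": "user", "content": daily_note},
    ],
    response_format={"type": "json_object"},
)

topics = json.loads(resp["choices"][0]["message"]["content"])["topics"]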
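Phase 2's scoring reduces to normalized dot products over Potion-32M vectors. A self-contained sketch with the public model2vec API; the tag-overlap bonus and Umbra's exact thresholds are omitted, and the corpus is made up:

import numpy as np
from model2vec import StaticModel

# Hypothetical corpus: note title -> body text.
notes = {
    "Divided Line": "Plato's four levels of cognition along a divided line...",
    "Form of the Good": "The sun analogy: the Good illuminates the intelligible...",
    "01-15-2024": "Thought experiment: shadows are sense-data, the sun is the Good...",
}
titles = list(notes)

model = StaticModel.from_pretrained("minishlab/potion-base-32M")  # 256-dim, deterministic
emb = model.encode([notes[t] for t in titles])       # shape (n_notes, 256)

emb /= np.linalg.norm(emb, axis=1, keepdims=True)    # unit vectors: dot product = cosine
sim = emb @ emb.T
np.fill_diagonal(sim, -1.0)                          # a note never links to itself

for i, t in enumerate(titles):
    top = np.argsort(sim[i])[::-1][:5]               # top-5 neighbours (fewer here)
    print(t, "->", [(titles[j], f"{sim[i, j]:.0%}") for j in top])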
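Phase 3's single-word filter can be approximated as below (a guess at the shape of is_specific_keyword, not the shipped code):

import re

def is_specific_keyword(word: str) -> bool:
    """Single-word keywords must look like proper identifiers, not common English."""
    if re.search(r"\d", word):                  # digit-bearing: "Qwen3", "CUDA12"
        return True
    if word.isupper() and len(word) >= 2:       # acronym: "HDBSCAN", "GPU"
        return True
    # CamelCase: an interior uppercase letter following a lowercase one.
    return re.search(r"[a-z][A-Z]", word) is not None

assert is_specific_keyword("HDBSCAN") and is_specific_keyword("Qwen3")
assert not is_specific_keyword("money")         # common words never auto-link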
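And the Phase 4 clustering step, sketched with sentence-transformers plus cuML; the titles and min_cluster_size are illustrative, not Umbra's defaults:

from collections import defaultdict
from sentence_transformers import SentenceTransformer
from cuml.cluster import HDBSCAN

titles = ["Divided Line", "The Line Analogy", "Form of the Good", "The Good Itself"]

encoder = SentenceTransformer("thenlper/gte-large")          # 1024-dim embeddings
emb = encoder.encode(titles, normalize_embeddings=True)      # unit vectors

# cuML's HDBSCAN is euclidean-only; on unit vectors euclidean distance
# is monotone in cosine similarity, so the clustering is equivalent.
labels = HDBSCAN(min_cluster_size=2).fit_predict(emb)        # -1 = noise, unlinked

clusters = defaultdict(list)
for title, label in zip(titles, labels):
    if label != -1:
        clusters[label].append(title)
# Clusters with >20 members collapse to hub-and-spoke instead of full cross-links.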
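As for the markers themselves, regeneration is essentially a bounded text swap. A sketch, assuming a hypothetical marker string and helper name:

import re

MARKER = "<!-- umbra: related notes -->"

def replace_section(text: str, new_section: str) -> str:
    """Swap the marked section in place, or append it if absent."""
    pattern = re.compile(re.escape(MARKER) + r".*?(?=\n<!-- umbra: |\Z)", re.DOTALL)
    block = f"{MARKER}\n{new_section.rstrip()}\n"
    if pattern.search(text):
        return pattern.sub(lambda _: block, text)   # lambda avoids backslash expansion
    return text.rstrip() + "\n\n" + block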
Before / After
Try the included demo vault — a grad student studying Plato's Allegory of the Cave (22 daily notes + 24 topic notes, seeded with real OpenAlex references).
# Compare a single daily note
diff examples/before/01-15-2024.md examples/after/01-15-2024.md
Before (daily note as written):
Distracted day. Reading around the edges.
Thought experiment: what if the cave is literally about sensory
perception vs. mathematical knowledge? The shadows are sense-data.
The puppets are physical objects. The sun is the Form of the Good.
Then the ascent tracks: aisthesis → doxa → dianoia → noesis. The
divided line literally fits inside the cave.
After (same note, Umbra-processed):
Distracted day. Reading around the edges.
Thought experiment: what if the cave is literally about sensory
perception vs. mathematical knowledge? The shadows are sense-data.
The puppets are physical objects. The sun is the [[Form of the Good|Form of the Good]].
Then the ascent tracks: aisthesis → doxa → dianoia → noesis. The
[[Divided Line|divided line]] literally fits inside the cave.
<!-- umbra: generated topic links -->
## Topics
- [[cave-sensory-perception-mathematical-knowledge-2024-01-15|Cave as Narrative of Sensory vs Mathematical Knowledge]] #plato #allegory #perception
- [[plato-revee-intro-comparison-2024-01-15|Reeve's Intro Offers Similar Framework]] #reeve #plato
<!-- umbra: related notes -->
## Related Notes
- [[cave-sensory-perception-mathematical-knowledge-2024-01-15|Cave as Narrative of Sensory vs Mathematical Knowledge]] (93%)
- [[01-08-2024|01-08-2024]] (81%)
- [[01-28-2024|01-28-2024]] (80%)
- [[plato-cave-epistemic-contexts-2024-01-28|The Cave as Epistemic Context Shift]] (77%)
Browse the full examples/after/ directory to see generated topic notes, hub/spoke synonym clusters, and the auto-built NOTE_INDEX.md.
Commands
./deploy.sh install # pip install -e .
./deploy.sh all # Run Phase 1 → 2 → 3 → 4
./deploy.sh split # Phase 1 — daily note splitter
./deploy.sh semantic # Phase 2 — semantic backlinks
./deploy.sh keywords # Phase 3 — keyword linker
./deploy.sh synonyms # Phase 4 — synonym clustering
./deploy.sh status # Tail each phase's log
./deploy.sh logs # Live-tail all logs
./deploy.sh help # Show all commands
Each phase also accepts per-phase flags (--dry-run, --rebuild, --one PATH, --stats). Pass them after the phase name:
./deploy.sh split --dry-run --since 2024-06-01
./deploy.sh synonyms --stats
Documentation
| Document | Description |
|---|---|
| Phases | Deep dive on each of the four phases |
| Models | Supported LLMs + Q4 vs Q8 quantization |
| Configuration | All settings reference |
| Troubleshooting | Common issues & fixes |
| Manual Setup | Step-by-step without scripts |
| Releasing | Maintainer version + PyPI workflow |
| Changelog | Version history (SemVer) |
Troubleshooting
| Issue | Fix |
|---|---|
| llama-cpp-python won't build with CUDA | Rebuild with CMAKE_ARGS="-DGGML_CUDA=on"; see troubleshooting |
| cuml import fails | Install from RAPIDS conda channels, not pip |
| Every run re-embeds all notes | Writes change mtimes; Umbra refreshes the mtime cache after writes — check state_dir/cache/ |
| Phase 3 links generic words like "money" | Already filtered; check your STOP_WORDS / is_specific_keyword logic |
| Phase 4 mega-clusters unusable | Lower max_cluster_full_crosslink in config; hub-and-spoke always kicks in |
Requirements
- Obsidian: 0.16+ (any recent version)
- Python: 3.10+
- NVIDIA driver: 525+ for CUDA 12
- llama-cpp-python: 0.3.0+ (built with GGML_CUDA=on for speed)
- sentence-transformers: 3.0+ (pulls GTE-large on first run, ~500MB)
- model2vec: 0.3.0+ (Potion-32M, ~40MB)
- cuml: RAPIDS release matching your CUDA version (HDBSCAN on GPU)
Support
If this saved you from hand-wikilinking a decade of journal entries, you can throw a few bucks my way. No pressure.
License
MIT © 2026
Shadows on the wall. The real forms are your notes.