
Local LLM-powered Zettelkasten pipeline for Obsidian — daily note splitter, semantic backlinks, keyword linker, synonym clustering.


Obsidian Umbra

Turn any Obsidian vault into a Zettelkasten graph — locally, with a dozen years of notes in minutes.

Quick Start · How It Works · Before / After · Phases · Models · Troubleshooting



Why?

Obsidian's graph view is only as good as the [[wikilinks]] you remember to write. Years of daily notes pile up with zero connections. LLM "second brain" tools exist — but they send every note to a cloud API.

Umbra runs entirely on your own machine. It splits daily journal entries into titled topic notes, then weaves four layers of backlinks across your whole vault so the graph actually lights up.

┌─────────────────────────┐                    ┌─────────────────────────┐
│     OBSIDIAN VAULT      │                    │        LOCAL GPU        │
│                         │    file I/O only   │                         │
│   Daily notes           │◄──────────────────►│   Qwen3-4B-Instruct     │
│   Project notes         │                    │   Potion-32M            │
│   Folder structure      │                    │   GTE-large + HDBSCAN   │
│                         │                    │                         │
│   [[wikilinks]] graph   │   nothing leaves   │   4-phase pipeline      │
└─────────────────────────┘   your machine     └─────────────────────────┘

Quick Start

Prerequisites

Your machine:

  • OS: Linux (tested on Ubuntu 22.04+)
  • Python: 3.10+
  • GPU: NVIDIA, 12GB+ VRAM
  • CUDA: 12.0+
  • Model: Qwen3-4B-Instruct Q8_0 GGUF (~4GB)
  • Obsidian: 0.16+ (any recent release)

1. Install

pip install obsidian-umbra

Or from source if you want the latest main:

git clone https://github.com/jimnoneill/obsidian-umbra
cd obsidian-umbra
pip install -e .

2. Download a Model

Default — Qwen3-4B-Instruct Q8_0 (4.3 GB, best quality on 24GB+ VRAM):

mkdir -p ~/models
huggingface-cli download Qwen/Qwen3-4B-Instruct-2507-GGUF \
  Qwen3-4B-Instruct-2507-Q8_0.gguf --local-dir ~/models

Lighter — Qwen3-4B-Instruct Q4_K_M (2.5 GB, runs fine on 12GB):

huggingface-cli download Qwen/Qwen3-4B-Instruct-2507-GGUF \
  Qwen3-4B-Instruct-2507-Q4_K_M.gguf --local-dir ~/models

Any GGUF instruct model works — Llama 3, Mistral, Gemma, Phi-3. See docs/models.md for the supported-models matrix.

3. Configure

cp config.yaml.example config.yaml

Edit config.yaml:

vault: ~/Documents/MyVault                                 # Your Obsidian vault
model_path: ~/models/Qwen3-4B-Instruct-2507-Q8_0.gguf      # Any GGUF instruct model
model_name: Qwen3-4B-Instruct-2507                         # Label used in logs
chat_format: chatml                                        # See docs/models.md
output_subdir: umbra                                       # Topic notes land here
state_dir: ~/.obsidian-umbra                               # State/cache/logs
cuda_visible_devices: "0"                                  # Which GPU

4. Run

./deploy.sh all

First run on a large vault takes ~2 min per 50 daily notes + ~1 min for the other three phases combined. Re-runs are idempotent and take seconds.

5. Install the Cron Job (Optional)

(crontab -l 2>/dev/null; echo "0 4 * * *  $PWD/deploy.sh all") | crontab -

Every morning at 4am, new daily notes get split and the graph is refreshed.


How It Works

Four phases run in sequence. Each is idempotent and can be run alone.

daily notes  ─►  Phase 1  ─►  topic notes  ─►  Phase 2  ─►  Related Notes
                Qwen3-4B                        Potion-32M
                  JSON                          top-K cosine

                                  ─►  Phase 3  ─►  inline [[wikilinks]]
                                      keyword
                                      matcher

                                  ─►  Phase 4  ─►  ## Same Concept
                                      GTE-large
                                      + HDBSCAN
  1. Daily Splitter — reads each daily note (MM-DD-YYYY or YYYY-MM-DD), calls Qwen3-4B-Instruct locally via llama-cpp-python in JSON mode, extracts distinct topics, writes one titled markdown note per topic with YAML frontmatter and source backlinks (first sketch after this list).
  2. Semantic Backlinks — embeds every note with Potion-32M (256-dim static embeddings, deterministic, fast), computes pairwise cosine similarity plus a tag-overlap bonus, appends a ## Related Notes section with top-5 links and similarity % (second sketch after this list).
  3. Keyword Linker — builds a keyword index from non-daily note stems, titles, and folder names. Injects inline [[wikilinks]] wherever a keyword appears in body text. Skips YAML, code blocks, existing links, headings, URLs, HTML comments. Single-word keywords must be CamelCase / acronym / digit-bearing to avoid false positives on common English.
  4. Synonym Linker — embeds concept-note titles with GTE-large (1024-dim), clusters with cuML HDBSCAN, writes a ## Same Concept section between cluster siblings. Mega-clusters (>20 members) collapse to hub-and-spoke — each member gets one link to the centroid-closest representative (third sketch after this list).
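
To make Phase 1 concrete, here is a minimal sketch of JSON-mode topic extraction with llama-cpp-python. The prompt wording, the "topics" schema, and the generation settings are illustrative assumptions, not Umbra's actual internals.

# First sketch (Phase 1): ask a local GGUF model to split one daily note into topics.
# The prompt and the {"topics": [...]} shape are assumptions for illustration only.
import json
import os

from llama_cpp import Llama

llm = Llama(
    model_path=os.path.expanduser("~/models/Qwen3-4B-Instruct-2507-Q8_0.gguf"),
    n_gpu_layers=-1,   # offload all layers to the GPU
    n_ctx=8192,
    verbose=False,
)

def split_daily_note(text: str) -> list[dict]:
    """Return [{"title": ..., "body": ..., "tags": [...]}, ...] for one daily note."""
    resp = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": (
                "Split the journal entry into distinct topics. Reply with JSON: "
                '{"topics": [{"title": str, "body": str, "tags": [str]}]}'
            )},
            {"role": "user", "content": text},
        ],
        response_format={"type": "json_object"},  # llama-cpp-python JSON mode
        temperature=0.2,
    )
    return json.loads(resp["choices"][0]["message"]["content"])["topics"]

Each returned topic then becomes a titled note with YAML frontmatter and a backlink to the source daily note.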
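
Phase 2 boils down to top-K cosine over static embeddings. In this sketch the model id ("minishlab/potion-base-32M") and the flat 0.05-per-shared-tag bonus are assumptions; Umbra's actual weighting may differ.

# Second sketch (Phase 2): embed every note once, add a tag-overlap bonus,
# and keep the top-K cosine neighbours. Model id and bonus weight are assumptions.
import numpy as np
from model2vec import StaticModel

model = StaticModel.from_pretrained("minishlab/potion-base-32M")

def related_notes(notes: dict[str, str], tags: dict[str, set[str]], k: int = 5):
    names = list(notes)
    emb = model.encode([notes[n] for n in names])
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)    # unit-normalise
    sims = emb @ emb.T                                        # pairwise cosine similarity
    related = {}
    for i, name in enumerate(names):
        bonus = np.array([0.05 * len(tags.get(name, set()) & tags.get(m, set()))
                          for m in names])
        scores = sims[i] + bonus
        scores[i] = -np.inf                                   # never link a note to itself
        top = np.argsort(scores)[::-1][:k]
        # keep (neighbour, similarity as a percentage) pairs for ## Related Notes
        related[name] = [(names[j], round(float(sims[i, j]) * 100)) for j in top]
    return related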
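
Phase 4's hub-and-spoke collapse looks roughly like this. The model id, HDBSCAN parameters, and the CPU hdbscan package are illustrative stand-ins (Umbra runs cuML HDBSCAN on the GPU), and for brevity the sketch links spokes to a hub in every cluster rather than only in mega-clusters.

# Third sketch (Phase 4): cluster concept-note titles, pick the member closest to
# the centroid as the hub, and link every other member (spoke) to it.
# Parameters are assumptions; Umbra uses cuML HDBSCAN and only collapses
# clusters with more than 20 members this way.
import numpy as np
from hdbscan import HDBSCAN
from sentence_transformers import SentenceTransformer

titles = ["Divided Line", "The Divided Line", "Form of the Good", "Noesis vs Dianoia"]
emb = SentenceTransformer("thenlper/gte-large").encode(titles, normalize_embeddings=True)
labels = HDBSCAN(min_cluster_size=2).fit_predict(emb)

for label in set(labels) - {-1}:                    # -1 is HDBSCAN's noise bucket
    members = np.where(labels == label)[0]
    centroid = emb[members].mean(axis=0)
    hub = members[np.argmin(np.linalg.norm(emb[members] - centroid, axis=1))]
    for m in members:
        if m != hub:
            print(f"[[{titles[m]}]] -> [[{titles[hub]}]]")   # spoke links only to the hub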

All three section markers (<!-- umbra: ... -->) let re-runs regenerate their sections safely without mangling your own writing (sketched below).
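
Regeneration is the usual marker-replace pattern. A minimal sketch, assuming marker strings like the ones visible in the Before / After example; the real marker names and layout are Umbra's own.

# Sketch of marker-based regeneration: swap out everything between a phase's
# marker and the next marker (or end of file), leaving hand-written text alone.
import re

def upsert_section(note: str, marker: str, body: str) -> str:
    block = f"<!-- umbra: {marker} -->\n{body.rstrip()}\n"
    pattern = re.compile(
        rf"<!-- umbra: {re.escape(marker)} -->\n.*?(?=\n<!-- umbra: |\Z)",
        re.S,
    )
    if pattern.search(note):
        return pattern.sub(lambda _: block, note)   # regenerate the existing section
    return note.rstrip() + "\n\n" + block           # first run: append a new section

Everything outside the markers is never touched, which is what keeps re-runs idempotent.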


Before / After

Try the included demo vault — a grad student studying Plato's Allegory of the Cave (22 daily notes + 24 topic notes, seeded with real OpenAlex references).

# Compare a single daily note
diff examples/before/01-15-2024.md examples/after/01-15-2024.md

Before (daily note as written):

Distracted day. Reading around the edges.

Thought experiment: what if the cave is literally about sensory
perception vs. mathematical knowledge? The shadows are sense-data.
The puppets are physical objects. The sun is the Form of the Good.

Then the ascent tracks: aisthesis → doxa → dianoia → noesis. The
divided line literally fits inside the cave.

After (same note, Umbra-processed):

Distracted day. Reading around the edges.

Thought experiment: what if the cave is literally about sensory
perception vs. mathematical knowledge? The shadows are sense-data.
The puppets are physical objects. The sun is the [[Form of the Good|Form of the Good]].

Then the ascent tracks: aisthesis → doxa → dianoia → noesis. The
[[Divided Line|divided line]] literally fits inside the cave.

<!-- umbra: generated topic links -->
## Topics
- [[cave-sensory-perception-mathematical-knowledge-2024-01-15|Cave as Narrative of Sensory vs Mathematical Knowledge]] #plato #allegory #perception
- [[plato-revee-intro-comparison-2024-01-15|Reeve's Intro Offers Similar Framework]] #reeve #plato

<!-- umbra: related notes -->
## Related Notes
- [[cave-sensory-perception-mathematical-knowledge-2024-01-15|Cave as Narrative of Sensory vs Mathematical Knowledge]] (93%)
- [[01-08-2024|01-08-2024]] (81%)
- [[01-28-2024|01-28-2024]] (80%)
- [[plato-cave-epistemic-contexts-2024-01-28|The Cave as Epistemic Context Shift]] (77%)

Browse the full examples/after/ directory to see generated topic notes, hub/spoke synonym clusters, and the auto-built NOTE_INDEX.md.


Commands

./deploy.sh install    # pip install -e .
./deploy.sh all        # Run Phase 1 → 2 → 3 → 4
./deploy.sh split      # Phase 1 — daily note splitter
./deploy.sh semantic   # Phase 2 — semantic backlinks
./deploy.sh keywords   # Phase 3 — keyword linker
./deploy.sh synonyms   # Phase 4 — synonym clustering
./deploy.sh status     # Tail each phase's log
./deploy.sh logs       # Live-tail all logs
./deploy.sh help       # Show all commands

Each phase accepts its own flags (--dry-run, --rebuild, --one PATH, --since, --stats). Pass them after the phase name:

./deploy.sh split --dry-run --since 2024-06-01
./deploy.sh synonyms --stats

Documentation

  • Phases: Deep dive on each of the four phases
  • Models: Supported LLMs + Q4 vs Q8 quantization
  • Configuration: All settings reference
  • Troubleshooting: Common issues & fixes
  • Manual Setup: Step-by-step without scripts
  • Releasing: Maintainer version + PyPI workflow
  • Changelog: Version history (SemVer)

Troubleshooting

  • llama-cpp-python won't build with CUDA: rebuild with CMAKE_ARGS="-DGGML_CUDA=on"; see troubleshooting
  • cuml import fails: install it from the RAPIDS conda channel, not pip
  • Every run re-embeds all notes: Umbra's own writes change file mtimes, so it refreshes its mtime cache after writing; check state_dir/cache/
  • Phase 3 links generic words like "money": these are already filtered; check your STOP_WORDS / is_specific_keyword logic
  • Phase 4 mega-clusters are unusable: lower max_cluster_full_crosslink in the config; hub-and-spoke always kicks in

Full troubleshooting guide →


Requirements

  • Obsidian: 0.16+ (any recent version)
  • Python: 3.10+
  • NVIDIA driver: 525+ for CUDA 12
  • llama-cpp-python: 0.3.0+ (built with GGML_CUDA=on for speed)
  • sentence-transformers: 3.0+ (pulls GTE-large on first run, ~500MB)
  • model2vec: 0.3.0+ (Potion-32M, ~40MB)
  • cuml: RAPIDS release matching your CUDA (HDBSCAN on GPU)

Support

If this saved you from hand-wikilinking a decade of journal entries, you can throw a few bucks my way. No pressure.

Donate via PayPal


License

MIT © 2026


Shadows on the wall. The real forms are your notes.



Download files


Source Distribution

obsidian_umbra-0.1.0.tar.gz (21.2 kB)


Built Distribution


obsidian_umbra-0.1.0-py3-none-any.whl (25.7 kB)


File details

Details for the file obsidian_umbra-0.1.0.tar.gz.

File metadata

  • Download URL: obsidian_umbra-0.1.0.tar.gz
  • Upload date:
  • Size: 21.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for obsidian_umbra-0.1.0.tar.gz
Algorithm Hash digest
SHA256 81e7c82efd095c1927cbb42bba77dd95a96fbba27bea1cd33af9fc7999bd182b
MD5 f502e81c5b098159c097e778498dec0e
BLAKE2b-256 bee6b545c06d8baa8fc55c90d8a84c126900ad9ed9dbe8e87adcd6874e6104c5


Provenance

The following attestation bundles were made for obsidian_umbra-0.1.0.tar.gz:

Publisher: publish.yml on jimnoneill/obsidian-umbra


File details

Details for the file obsidian_umbra-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: obsidian_umbra-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 25.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for obsidian_umbra-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c3d2338edadf7a2b38fcb67a48a58f60bcaaf8d1559a12e8c37c25733545cba0
MD5 b99c804cd653da4a839d8a44cebf3a38
BLAKE2b-256 ab836a619ff501e279aa9cfc71c3cd782e217c4f1fb8a48e9a4ffaab9f7d4958


Provenance

The following attestation bundles were made for obsidian_umbra-0.1.0-py3-none-any.whl:

Publisher: publish.yml on jimnoneill/obsidian-umbra

