Lightweight Obsidian vault parser — extracts wikilinks, embeds, tags, and frontmatter with a single dependency (PyYAML). No Obsidian app required.
obsidian-parse
Extract knowledge graph metadata from Obsidian-style markdown vaults.
A helper library for Obsidian parsing that covers only what general-purpose code can't: Obsidian-specific syntax (wikilinks, embeds, nested tags), vault ignore rules, shortest-path link resolution, and graph output. Filesystem traversal, text search, and other general-purpose tasks are intentionally left to the caller.
Parses .md, .canvas, and .base files to extract wikilinks, embeds, tags, and frontmatter — then converts them into a D3-compatible graph structure for visualization or downstream analysis.
Installation
```shell
uv add obsidian-parse
```
Or with pip:
```shell
pip install obsidian-parse
```
Quick Start
```python
from obsidian_parse import parse, results_to_d3

# Parse an entire vault directory
results = parse(["/path/to/your/vault"])

# Convert to D3 graph format
graph = results_to_d3(results)
# graph = {"nodes": [...], "links": [...]}
```
What It Extracts
| Element | Syntax | Example |
|---|---|---|
| WikiLink | `[[Note]]` | `[[Project Ideas\|alias]]` |
| Embed | `![[file]]` | `![[image.png]]` |
| Tag | `#tagname` | `#topic/subtopic` |
| Frontmatter | YAML header | `tags: [python, tools]` |
Extraction is block-aware: wikilinks and tags inside code fences or HTML blocks are intentionally ignored.
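To illustrate what block-aware means, here is a minimal sketch that tracks fenced code blocks line by line and collects wikilinks and tags only outside them. The regexes and the `extract_outside_fences` helper are hypothetical simplifications, not the library's implementation; HTML blocks, inline code, and Obsidian's full tag rules are not handled.

```python
import re

# Simplified patterns (assumptions, not the library's): a wikilink not preceded by "!",
# and a "#" tag not preceded by a word character or another "#".
WIKILINK_RE = re.compile(r"(?<!!)\[\[([^\[\]]+)\]\]")
TAG_RE = re.compile(r"(?<![\w#])#([\w/-]+)")
FENCE_MARKERS = ("`" * 3, "~" * 3)  # opening/closing fence markers


def extract_outside_fences(text: str) -> tuple[list[str], list[str]]:
    """Collect raw wikilink bodies and tag names, skipping fenced code blocks."""
    links: list[str] = []
    tags: list[str] = []
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE_MARKERS):
            in_fence = not in_fence  # a fence line opens or closes a block
            continue
        if in_fence:
            continue
        links.extend(WIKILINK_RE.findall(line))
        tags.extend(TAG_RE.findall(line))
    return links, tags
```

Embeds are excluded from the wikilink list by the `(?<!!)` lookbehind, so `![[image.png]]` is not reported as a link.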
API
parse(paths)
Accepts a list of file or directory paths. Directories are scanned recursively. Respects .obsidian/app.json ignore rules and skips dotfiles/dotfolders.
Returns a list of ParseResult objects.
Raises:
- `NoPathsProvidedError`: if `paths` is empty
- `PathNotFoundError`: if none of the paths exist
- `NoMarkdownFilesError`: if the paths exist but contain no parseable files
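The discovery and error behavior of `parse()` can be approximated as follows. This is a sketch under assumptions: the exception classes are local stand-ins reusing the documented names, only the dotfile/dotfolder rule is modeled (the `.obsidian/app.json` ignore rules are not), and the real function goes on to parse each file rather than returning paths.

```python
from __future__ import annotations

from pathlib import Path


# Hypothetical stand-ins that reuse the documented exception names.
class NoPathsProvidedError(ValueError):
    pass


class PathNotFoundError(FileNotFoundError):
    pass


class NoMarkdownFilesError(ValueError):
    pass


SUPPORTED_SUFFIXES = {".md", ".canvas", ".base"}


def _is_hidden(path: Path, root: Path) -> bool:
    # Any dot-prefixed component below the root (or the file itself) hides it.
    parts = path.relative_to(root).parts or (path.name,)
    return any(part.startswith(".") for part in parts)


def discover(paths: list[str | Path]) -> list[Path]:
    if not paths:
        raise NoPathsProvidedError("no paths given")
    roots = [Path(p) for p in paths if Path(p).exists()]
    if not roots:
        raise PathNotFoundError("none of the given paths exist")
    found: list[Path] = []
    for root in roots:
        # A file path is taken as-is; a directory is scanned recursively.
        candidates = [root] if root.is_file() else sorted(root.rglob("*"))
        for path in candidates:
            if path.is_file() and path.suffix in SUPPORTED_SUFFIXES and not _is_hidden(path, root):
                found.append(path)
    if not found:
        raise NoMarkdownFilesError("no .md/.canvas/.base files found")
    return found
```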
ParseResult
| Property | Type | Description |
|---|---|---|
| `file_id` | `str` | Filename used as node ID: `.md` extension omitted, `.canvas`/`.base` kept (e.g. `"Note"`, `"Board.canvas"`) |
| `path` | `Path` | Original file path |
| `frontmatter` | `dict` | Parsed YAML frontmatter |
| `wikilinks` | `list[WikiLink]` | Wikilinks with line/col positions |
| `embeds` | `list[Embed]` | Embeds with line/col positions |
| `tags` | `list[TagRef]` | Tags with line/col positions |
| `wikilink_targets` | `list[str]` | Deduplicated link targets (computed) |
| `embed_targets` | `list[str]` | Deduplicated embed targets (computed) |
| `tag_names` | `list[str]` | Merged body and frontmatter tags (computed) |
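The computed properties can be pictured with a stripped-down dataclass. `ParseResultSketch` and its field subset are hypothetical (embeds are left out; `embed_targets` would follow the same pattern as `wikilink_targets`). Order-preserving de-duplication and treating a frontmatter `tags` value that is a single string as one tag are assumptions, not documented behavior.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class WikiLink:  # hypothetical mirror of the documented fields
    target: str
    section: str | None = None
    alias: str | None = None
    line: int | None = None
    col: int | None = None


@dataclass
class TagRef:
    name: str
    line: int | None = None
    col: int | None = None


@dataclass
class ParseResultSketch:
    file_id: str
    path: Path
    frontmatter: dict = field(default_factory=dict)
    wikilinks: list[WikiLink] = field(default_factory=list)
    tags: list[TagRef] = field(default_factory=list)

    @property
    def wikilink_targets(self) -> list[str]:
        # dict.fromkeys drops repeats while keeping first-seen order
        return list(dict.fromkeys(link.target for link in self.wikilinks))

    @property
    def tag_names(self) -> list[str]:
        fm_tags = self.frontmatter.get("tags") or []
        if isinstance(fm_tags, str):
            fm_tags = [fm_tags]  # assumption: a bare string is one tag
        body = [tag.name for tag in self.tags]
        merged = body + [str(t).lstrip("#") for t in fm_tags]
        return list(dict.fromkeys(merged))
```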
parse_file(file_path)
Parses a single file by dispatching to the correct parser based on extension.
Raises UnsupportedFileTypeError for unregistered extensions.
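Extension-based dispatch is commonly a lookup table from suffix to parser. The sketch below assumes such a registry; `PARSERS`, `parse_file_sketch`, and the dict return values are illustrative only. The one detail taken from this README is the `file_id` convention (`.md` dropped, `.canvas`/`.base` kept).

```python
from __future__ import annotations

from collections.abc import Callable
from pathlib import Path


class UnsupportedFileTypeError(ValueError):
    """Hypothetical stand-in reusing the documented exception name."""


# Hypothetical registry: extension -> parser. Real parsers return ParseResult objects;
# these return small dicts so the dispatch logic can run on its own.
PARSERS: dict[str, Callable[[Path], dict]] = {
    ".md": lambda p: {"kind": "markdown", "file_id": p.stem},    # .md dropped from the id
    ".canvas": lambda p: {"kind": "canvas", "file_id": p.name},  # extension kept
    ".base": lambda p: {"kind": "base", "file_id": p.name},      # extension kept
}


def parse_file_sketch(file_path: str | Path) -> dict:
    path = Path(file_path)
    parser = PARSERS.get(path.suffix)
    if parser is None:
        raise UnsupportedFileTypeError(f"no parser registered for {path.suffix!r}")
    return parser(path)
```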
parse_markdown_file(file_path)
Reads and parses a single .md file directly, returning a ParseResult.
parse_markdown_content(content, file_id, path)
Parses raw markdown string content without reading from disk. Useful for testing or in-memory workflows.
WikiLink
| Field | Type | Description |
|---|---|---|
| `target` | `str` | Link target: `.md` extension omitted, other extensions kept (e.g. `"Note"`, `"Board.canvas"`) |
| `section` | `str \| None` | Heading (`#Section`) or block ref (`^id`) |
| `alias` | `str \| None` | Display alias (the text after `\|`) |
| `line` | `int \| None` | Source line number |
| `col` | `int \| None` | Source column number |
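Splitting a wikilink's inner text into `target`, `section`, and `alias` can be sketched with two `str.partition` calls. `split_wikilink` is a hypothetical helper; it assumes the first `|` and the first `#` are the delimiters and keeps a block-ref section as `^id`.

```python
from typing import NamedTuple, Optional


class ParsedLink(NamedTuple):
    target: str
    section: Optional[str]  # heading text or "^blockid"
    alias: Optional[str]


def split_wikilink(inner: str) -> ParsedLink:
    """Split the text between [[ and ]] into target, section, and alias."""
    body, bar, alias = inner.partition("|")       # alias follows the first "|"
    target, hash_, section = body.partition("#")  # section follows the first "#"
    target = target.strip()
    if target.endswith(".md"):
        target = target[: -len(".md")]  # .md dropped; other extensions kept
    return ParsedLink(
        target=target,
        section=section.strip() if hash_ else None,
        alias=alias.strip() if bar else None,
    )
```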
Embed
| Field | Type | Description |
|---|---|---|
| `target` | `str` | Embed target filename: `.md` extension omitted, other extensions kept |
| `section` | `str \| None` | Heading or block ID |
| `line` | `int \| None` | Source line number |
| `col` | `int \| None` | Source column number |
TagRef
| Field | Type | Description |
|---|---|---|
| `name` | `str` | Tag name without the leading `#` |
| `line` | `int \| None` | Source line number |
| `col` | `int \| None` | Source column number |
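Line/column positions can be recorded by scanning line by line with `re.finditer`. This sketch assumes 1-based lines and 1-based columns pointing at the `#`; the README does not state the library's indexing base, and `TAG_RE` is a simplified stand-in for Obsidian's tag rules.

```python
import re
from typing import NamedTuple

# Simplified tag pattern (an assumption; Obsidian's real rules are stricter).
TAG_RE = re.compile(r"(?<![\w#])#([\w/-]+)")


class TagPos(NamedTuple):
    name: str  # tag name without the leading "#"
    line: int  # assumed 1-based
    col: int   # assumed 1-based, pointing at the "#"


def find_tags(text: str) -> list[TagPos]:
    found: list[TagPos] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in TAG_RE.finditer(line):
            found.append(TagPos(match.group(1), lineno, match.start() + 1))
    return found
```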
find_file_by_id(vault_root, file_id, *, known_files=None)
Resolves a file_id to a path relative to vault_root.
- A bare stem or `"stem.md"` → matches `.md` files only
- `"stem.canvas"` / `"stem.base"` → matches that exact extension
When multiple files match, the shallowest path wins (Obsidian's shortest-path behavior). Pass known_files (from discover_files()) to avoid repeated filesystem traversal.
Returns a Path relative to vault_root, or None if no file is found.
```python
from pathlib import Path

from obsidian_parse import find_file_by_id

vault = Path("/path/to/vault")
find_file_by_id(vault, "Note")          # → Path("folder/Note.md") or None
find_file_by_id(vault, "Board.canvas")  # → Path("Board.canvas") or None
```
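Shortest-path resolution over a precomputed file list (the role `known_files` plays) might look like this. `resolve_shortest` is hypothetical and works on vault-relative POSIX strings; breaking ties between equally shallow matches alphabetically is an assumption, not documented Obsidian behavior.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def resolve_shortest(file_id: str, known_files: list[str]) -> PurePosixPath | None:
    """Return the shallowest vault-relative path matching file_id (hypothetical helper)."""
    if file_id.endswith((".canvas", ".base")):
        stem, _, ext = file_id.rpartition(".")
        suffix = "." + ext                      # exact extension required
    else:
        stem = file_id.removesuffix(".md")      # bare stem or "stem.md"
        suffix = ".md"                          # matches .md files only
    candidates = [PurePosixPath(f) for f in known_files]
    matches = [p for p in candidates if p.suffix == suffix and p.stem == stem]
    if not matches:
        return None
    # Fewest path components wins; alphabetical tie-break is an assumption.
    return min(matches, key=lambda p: (len(p.parts), str(p)))
```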
expand_nested_tag(tag)
Expands a nested tag string into all ancestor tags.
```python
from obsidian_parse.utils.tags import expand_nested_tag

expand_nested_tag("a/b/c")     # ["a", "a/b", "a/b/c"]
expand_nested_tag("/foo/bar")  # ["/foo", "/foo/bar"]
expand_nested_tag("a//b/c")    # ["a", "a//b", "a//b/c"]
```
The first / of each consecutive slash run is the hierarchy separator; remaining slashes become part of the next segment's name. A leading slash run is part of the first segment's name, never a separator.
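One way to implement this slash rule is to cut at every `/` that is not preceded by another `/`, starting after the leading slash run. `expand_nested_tag_sketch` is a hypothetical re-implementation checked only against the three documented examples, not the library's code.

```python
def expand_nested_tag_sketch(tag: str) -> list[str]:
    """Return every ancestor of a nested tag, ending with the tag itself."""
    lead = len(tag) - len(tag.lstrip("/"))  # leading slash run is part of the first name
    cuts = [
        i
        for i in range(lead, len(tag))
        # only the first "/" of each run separates levels
        if tag[i] == "/" and tag[i - 1] != "/"
    ]
    return [tag[:i] for i in cuts] + [tag]
```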
results_to_d3(results)
Converts a list of ParseResult into a dict:
```python
{
    "nodes": [
        {"id": "note-a", "type": "file", "label": "note-a"},
        {"id": "#python", "type": "tag", "label": "python"},
    ],
    "links": [
        {"source": "note-a", "target": "note-b", "relation": "wikilink"},
        {"source": "note-a", "target": "#python", "relation": "tag"},
        {"source": "#python/tools", "target": "#python", "relation": "parent"},
    ],
}
```
Link relations: wikilink, embed, tag, parent (tag hierarchy).
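A simplified version of the conversion, taking per-file target lists instead of `ParseResult` objects, shows how the four relations arise. `to_d3_sketch` and its input shape are hypothetical. It creates a node for every link target whether or not the file exists, labels tags with their full path, and splits tag hierarchies on plain `/` rather than the slash-run rule used by `expand_nested_tag`; all of these are assumptions.

```python
def to_d3_sketch(files: dict[str, dict[str, list[str]]]) -> dict[str, list[dict]]:
    """Build a D3-style {"nodes", "links"} dict from per-file references (hypothetical input)."""
    nodes: dict[str, dict] = {}
    links: list[dict] = []
    seen_tags: set[str] = set()

    def add_node(node_id: str, kind: str, label: str) -> None:
        nodes.setdefault(node_id, {"id": node_id, "type": kind, "label": label})

    for file_id, refs in files.items():
        add_node(file_id, "file", file_id)
        for relation in ("wikilink", "embed"):
            for target in refs.get(relation, []):
                add_node(target, "file", target)
                links.append({"source": file_id, "target": target, "relation": relation})
        for tag in refs.get("tag", []):
            add_node("#" + tag, "tag", tag)
            links.append({"source": file_id, "target": "#" + tag, "relation": "tag"})
            seen_tags.add(tag)

    # Tag hierarchy: one "parent" link per (child, parent) pair; plain "/" split is a simplification.
    parent_pairs: set[tuple[str, str]] = set()
    for tag in seen_tags:
        parts = tag.split("/")
        for depth in range(1, len(parts)):
            parent_pairs.add(("/".join(parts[: depth + 1]), "/".join(parts[:depth])))
    for child, parent in sorted(parent_pairs):
        add_node("#" + child, "tag", child)
        add_node("#" + parent, "tag", parent)
        links.append({"source": "#" + child, "target": "#" + parent, "relation": "parent"})

    return {"nodes": list(nodes.values()), "links": links}
```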
Supported File Types
- `.md`: Markdown with YAML frontmatter
- `.canvas`: Obsidian canvas JSON; wikilinks are extracted from file-type nodes, and all supported elements from text nodes
- `.base`: Obsidian Bases files; recorded as graph nodes only (filename/path, no link extraction)
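Canvas files follow the JSON Canvas format, in which each node has a `type` such as `"file"` (with a `file` path) or `"text"` (with a `text` body). This sketch of link extraction treats a file node as a link to its basename and pulls wikilink targets out of text node bodies; the basename mapping, the simplified regex, and `canvas_targets` itself are assumptions, not the library's code.

```python
import json
import re

# Captures only the target part of a wikilink (stops at "|", "#", or "]"); embeds are skipped.
WIKILINK_TARGET_RE = re.compile(r"(?<!!)\[\[([^\[\]|#]+)")


def canvas_targets(raw: str) -> list[str]:
    data = json.loads(raw)
    targets: list[str] = []
    for node in data.get("nodes", []):
        if node.get("type") == "file" and "file" in node:
            name = node["file"].rsplit("/", 1)[-1]  # assumption: link to the basename
            targets.append(name[:-3] if name.endswith(".md") else name)
        elif node.get("type") == "text":
            targets.extend(t.strip() for t in WIKILINK_TARGET_RE.findall(node.get("text", "")))
    return targets
```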
Development
```shell
# Install with dev dependencies
uv sync --group dev

# Lint
ruff check src/

# Type check
mypy src/
```
License
MIT
File details
Details for the file obsidian_parse-0.1.1.tar.gz.
File metadata
- Download URL: obsidian_parse-0.1.1.tar.gz
- Upload date:
- Size: 13.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `3fee471365c70814f71764bc445ea72b9183a2e3373587c8f5ceb40cfa18d87a` |
| MD5 | `b8ad2efc49ca4782c38b615e611862e6` |
| BLAKE2b-256 | `c3a3d5f94c5ae658096424f9b9814356cc3a580ab625ee6380c469c7135266b6` |
File details
Details for the file obsidian_parse-0.1.1-py3-none-any.whl.
File metadata
- Download URL: obsidian_parse-0.1.1-py3-none-any.whl
- Upload date:
- Size: 19.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `af8d237d40fb700077bebe1d9b9ad1da9d73a9bc85ddc576a71f7cb3e6c2b5c0` |
| MD5 | `be30f142e334bd7cf7a4b8763e4ffa09` |
| BLAKE2b-256 | `872732d5f46f2f97e82ee3b485e327bceebc361a1aedac407ed3e3525aaa2f51` |