Skip to main content

Lightweight Obsidian vault parser — extracts wikilinks, embeds, tags, and frontmatter with a single dependency (PyYAML). No Obsidian app required.

Project description

obsidian-parse

Extract knowledge graph metadata from Obsidian-style markdown vaults.

A helper library for Obsidian parsing — covering only what standard code can't: Obsidian-specific syntax (wikilinks, embeds, nested tags), vault ignore rules, shortest-path link resolution, and graph output. Filesystem traversal, text search, and other general-purpose tasks are intentionally left to the caller.

Parses .md, .canvas, and .base files to extract wikilinks, embeds, tags, and frontmatter — then converts them into a D3-compatible graph structure for visualization or downstream analysis.

Installation

uv add obsidian-parse

Or with pip:

pip install obsidian-parse

Quick Start

from obsidian_parse import parse, results_to_d3

# Parse an entire vault directory
results = parse(["/path/to/your/vault"])

# Convert to D3 graph format
graph = results_to_d3(results)
# graph = {"nodes": [...], "links": [...]}

What It Extracts

Element Syntax Example
WikiLink [[Note]] [[Project Ideas|alias]]
Embed ![[file]] ![[image.png]]
Tag #tagname #topic/subtopic
Frontmatter YAML header tags: [python, tools]

Extraction is block-aware: wikilinks and tags inside code fences or HTML blocks are intentionally ignored.

API

parse(paths)

Accepts a list of file or directory paths. Directories are scanned recursively. Respects .obsidian/app.json ignore rules and skips dotfiles/dotfolders.

Returns a list of ParseResult objects.

Raises:

  • NoPathsProvidedError — if paths is empty
  • PathNotFoundError — if none of the paths exist
  • NoMarkdownFilesError — if paths exist but contain no parseable files

ParseResult

Property Type Description
file_id str Filename used as node ID — .md extension omitted, .canvas/.base kept (e.g. "Note", "Board.canvas")
path Path Original file path
frontmatter dict Parsed YAML frontmatter
wikilinks list[WikiLink] Wikilinks with line/col positions
embeds list[Embed] Embeds with line/col positions
tags list[TagRef] Tags with line/col positions
wikilink_targets list[str] Deduplicated link targets (computed)
embed_targets list[str] Deduplicated embed targets (computed)
tag_names list[str] Merged body + frontmatter tags (computed)

parse_file(file_path)

Parses a single file by dispatching to the correct parser based on extension.

Raises UnsupportedFileTypeError for unregistered extensions.

parse_markdown_file(file_path)

Reads and parses a single .md file directly, returning a ParseResult.

parse_markdown_content(content, file_id, path)

Parses raw markdown string content without reading from disk. Useful for testing or in-memory workflows.

WikiLink

Field Type Description
target str Link target — .md extension omitted, other extensions kept (e.g. "Note", "Board.canvas")
section str | None Heading (#Section) or block ref (^id)
alias str | None Display alias after |
line int | None Source line number
col int | None Source column number

Embed

Field Type Description
target str Embed target filename — .md extension omitted, other extensions kept
section str | None Heading or block id
line int | None Source line number
col int | None Source column number

TagRef

Field Type Description
name str Tag name without leading #
line int | None Source line number
col int | None Source column number

find_file_by_id(vault_root, file_id, *, known_files=None)

Resolves a file_id to a path relative to vault_root.

  • Bare stem or "stem.md" → matches .md files only
  • "stem.canvas" / "stem.base" → matches that exact extension

When multiple files match, the shallowest path wins (Obsidian's shortest-path behavior). Pass known_files (from discover_files()) to avoid repeated filesystem traversal.

Returns a Path relative to vault_root, or None if no file is found.

from pathlib import Path
from obsidian_parse import find_file_by_id

vault = Path("/path/to/vault")
find_file_by_id(vault, "Note")          # → Path("folder/Note.md") or None
find_file_by_id(vault, "Board.canvas")  # → Path("Board.canvas") or None

expand_nested_tag(tag)

Expands a nested tag string into all ancestor tags.

from obsidian_parse.utils.tags import expand_nested_tag

expand_nested_tag("a/b/c")    # ["a", "a/b", "a/b/c"]
expand_nested_tag("/foo/bar") # ["/foo", "/foo/bar"]
expand_nested_tag("a//b/c")  # ["a", "a//b", "a//b/c"]

The first / of each consecutive slash run is the hierarchy separator; remaining slashes become part of the next segment's name. A leading slash run is part of the first segment's name, never a separator.

results_to_d3(results)

Converts a list of ParseResult into a dict:

{
    "nodes": [
        {"id": "note-a", "type": "file", "label": "note-a"},
        {"id": "#python", "type": "tag", "label": "python"},
    ],
    "links": [
        {"source": "note-a", "target": "note-b", "relation": "wikilink"},
        {"source": "note-a", "target": "#python", "relation": "tag"},
        {"source": "#python/tools", "target": "#python", "relation": "parent"},
    ]
}

Link relations: wikilink, embed, tag, parent (tag hierarchy).

Supported File Types

  • .md — Markdown with YAML frontmatter
  • .canvas — Obsidian canvas JSON; extracts wikilinks from file-type nodes and all elements from text nodes
  • .base — Obsidian base files; recorded as graph nodes (filename/path only, no link extraction)

Development

# Install with dev dependencies
uv sync --group dev

# Lint
ruff check src/

# Type check
mypy src/

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

obsidian_parse-0.1.1.tar.gz (13.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

obsidian_parse-0.1.1-py3-none-any.whl (19.4 kB view details)

Uploaded Python 3

File details

Details for the file obsidian_parse-0.1.1.tar.gz.

File metadata

  • Download URL: obsidian_parse-0.1.1.tar.gz
  • Upload date:
  • Size: 13.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.7 {"installer":{"name":"uv","version":"0.11.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for obsidian_parse-0.1.1.tar.gz
Algorithm Hash digest
SHA256 3fee471365c70814f71764bc445ea72b9183a2e3373587c8f5ceb40cfa18d87a
MD5 b8ad2efc49ca4782c38b615e611862e6
BLAKE2b-256 c3a3d5f94c5ae658096424f9b9814356cc3a580ab625ee6380c469c7135266b6

See more details on using hashes here.

File details

Details for the file obsidian_parse-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: obsidian_parse-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 19.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.7 {"installer":{"name":"uv","version":"0.11.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for obsidian_parse-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 af8d237d40fb700077bebe1d9b9ad1da9d73a9bc85ddc576a71f7cb3e6c2b5c0
MD5 be30f142e334bd7cf7a4b8763e4ffa09
BLAKE2b-256 872732d5f46f2f97e82ee3b485e327bceebc361a1aedac407ed3e3525aaa2f51

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page