Lightweight Obsidian vault parser — extracts wikilinks, embeds, tags, and frontmatter with a single dependency (PyYAML). No Obsidian app required.

obsidian-parse

Extract knowledge graph metadata from Obsidian-style markdown vaults.

A helper library for Obsidian parsing. It covers only what general-purpose code does not: Obsidian-specific syntax (wikilinks, embeds, nested tags), vault ignore rules, shortest-path link resolution, and graph output. Filesystem traversal, text search, and other general-purpose tasks are intentionally left to the caller.

Parses .md, .canvas, and .base files to extract wikilinks, embeds, tags, and frontmatter — then converts them into a D3-compatible graph structure for visualization or downstream analysis.

Installation

uv add obsidian-parse

Or with pip:

pip install obsidian-parse

Quick Start

from obsidian_parse import parse, results_to_d3

# Parse an entire vault directory
results = parse(["/path/to/your/vault"])

# Convert to D3 graph format
graph = results_to_d3(results)
# graph = {"nodes": [...], "links": [...]}

What It Extracts

Element      Syntax       Example
WikiLink     [[Note]]     [[Project Ideas|alias]]
Embed        ![[file]]    ![[image.png]]
Tag          #tagname     #topic/subtopic
Frontmatter  YAML header  tags: [python, tools]

Extraction is block-aware: wikilinks and tags inside code fences or HTML blocks are intentionally ignored.
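To make "block-aware" concrete, here is a minimal sketch of a scanner that skips fenced code while collecting wikilinks and embeds. It is not the library's implementation: extract_links and both regular expressions are hypothetical, HTML blocks are not handled, and a fence is closed by any later fence that uses the same character.

```python
import re

# Matches [[inner]] and ![[inner]]; group 1 is "!" for embeds.
WIKILINK_RE = re.compile(r"(!?)\[\[([^\[\]\n]+)\]\]")
# Matches a line that opens or closes a fenced code block.
FENCE_RE = re.compile(r"^\s*(`{3,}|~{3,})")


def extract_links(text):
    """Return (kind, inner_text, line_number) for links outside code fences."""
    found = []
    open_fence = None  # fence character of the block we are inside, if any
    for line_no, line in enumerate(text.splitlines(), start=1):
        fence = FENCE_RE.match(line)
        if fence:
            char = fence.group(1)[0]
            if open_fence is None:
                open_fence = char      # entering a fenced block
            elif char == open_fence:
                open_fence = None      # leaving the fenced block
            continue
        if open_fence is not None:
            continue                   # inside a fence: ignore everything
        for link in WIKILINK_RE.finditer(line):
            kind = "embed" if link.group(1) else "wikilink"
            found.append((kind, link.group(2), line_no))
    return found


ticks = "`" * 3  # built at runtime so this example contains no nested fence
sample = "\n".join([
    "See [[Note A]] and ![[image.png]].",
    ticks + "python",
    "ignored = '[[Not A Link]]'",
    ticks,
    "Back to [[Note B|alias]].",
])
links = extract_links(sample)
print(links)
```

The link on line 3 of the sample sits inside a fence, so only the links on lines 1 and 5 are reported.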

API

parse(paths)

Accepts a list of file or directory paths. Directories are scanned recursively. The scan respects the ignore rules in .obsidian/app.json and skips dotfiles and dotfolders.

Returns a list of ParseResult objects.

Raises:

  • NoPathsProvidedError — if paths is empty
  • PathNotFoundError — if none of the paths exist
  • NoMarkdownFilesError — if paths exist but contain no parseable files
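The directory scan described for parse(paths) can be pictured roughly as follows. This is a sketch under stated assumptions, not the library's code: scan_vault and load_ignore_filters are hypothetical names, the ignore rules are assumed to live under the userIgnoreFilters key of .obsidian/app.json, and each rule is treated as a plain path prefix.

```python
import json
import tempfile
from pathlib import Path

SUPPORTED = {".md", ".canvas", ".base"}


def load_ignore_filters(vault: Path) -> list[str]:
    """Read ignore entries from .obsidian/app.json (assumed key: userIgnoreFilters)."""
    config = vault / ".obsidian" / "app.json"
    if not config.is_file():
        return []
    try:
        data = json.loads(config.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    return list(data.get("userIgnoreFilters", []))


def scan_vault(vault: Path) -> list[Path]:
    """Return vault-relative paths of supported files, minus ignored ones."""
    filters = load_ignore_filters(vault)
    found = []
    for path in sorted(vault.rglob("*")):
        rel = path.relative_to(vault)
        if any(part.startswith(".") for part in rel.parts):
            continue  # dotfile, or anything inside a dotfolder
        if path.suffix not in SUPPORTED or not path.is_file():
            continue
        if any(rel.as_posix().startswith(prefix) for prefix in filters):
            continue  # matched an ignore rule (prefix match is an assumption)
        found.append(rel)
    return found


# Build a throwaway vault to exercise the scan.
with tempfile.TemporaryDirectory() as tmp:
    vault = Path(tmp)
    for rel in ["a.md", "archive/old.md", ".trash/gone.md",
                "notes/b.canvas", "notes/c.txt"]:
        target = vault / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("", encoding="utf-8")
    (vault / ".obsidian").mkdir()
    (vault / ".obsidian" / "app.json").write_text(
        json.dumps({"userIgnoreFilters": ["archive/"]}), encoding="utf-8"
    )
    scanned = [p.as_posix() for p in scan_vault(vault)]

print(scanned)
```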

ParseResult

Property          Type            Description
file_id           str             Filename stem, used as node ID
path              Path            Original file path
frontmatter       dict            Parsed YAML frontmatter
wikilinks         list[WikiLink]  Wikilinks with line/col positions
embeds            list[Embed]     Embeds with line/col positions
tags              list[TagRef]    Tags with line/col positions
wikilink_targets  list[str]       Deduplicated link targets (computed)
embed_targets     list[str]       Deduplicated embed targets (computed)
tag_names         list[str]       Merged body + frontmatter tags (computed)
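The three computed properties can be thought of along these lines. The helpers below are hypothetical and only illustrate the idea; in particular, keeping first-seen order and listing body tags before frontmatter tags are assumptions, not documented behavior.

```python
def dedupe(items):
    """Drop repeats while keeping first-seen order."""
    return list(dict.fromkeys(items))


def merge_tag_names(body_tags, frontmatter):
    """Combine inline tag names with the frontmatter `tags` entry."""
    raw = frontmatter.get("tags") or []
    if isinstance(raw, str):
        raw = [raw]  # a single tag written as a scalar instead of a list
    return dedupe([*body_tags, *(str(tag) for tag in raw)])


targets = dedupe(["Note B", "Note A", "Note B"])
names = merge_tag_names(["topic/subtopic", "python"], {"tags": ["python", "tools"]})
print(targets)
print(names)
```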

parse_file(file_path)

Parses a single file by dispatching to the correct parser based on extension.

Raises UnsupportedFileTypeError for unregistered extensions.

parse_markdown_file(file_path)

Reads and parses a single .md file directly, returning a ParseResult.

parse_markdown_content(content, file_id, path)

Parses raw markdown string content without reading from disk. Useful for testing or in-memory workflows.

WikiLink

Field    Type        Description
target   str         Link target (note name)
section  str | None  Heading (#Section) or block ref (^id)
alias    str | None  Display alias after |
line     int | None  Source line number
col      int | None  Source column number
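As an illustration of how the target, section, and alias fields relate to the raw link text, here is a small sketch that splits the text between [[ and ]]. LinkParts and split_wikilink are hypothetical stand-ins rather than the library's types, and storing the section without its leading # is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkParts:
    """Hypothetical stand-in mirroring three of the WikiLink fields."""
    target: str
    section: Optional[str] = None
    alias: Optional[str] = None


def split_wikilink(inner: str) -> LinkParts:
    """Split 'target#section|alias' into its parts."""
    body, pipe, alias = inner.partition("|")      # alias follows the first |
    target, hash_sep, section = body.partition("#")  # section follows the first #
    return LinkParts(
        target=target.strip(),
        section=section.strip() if hash_sep else None,
        alias=alias.strip() if pipe else None,
    )


print(split_wikilink("Project Ideas|alias"))
print(split_wikilink("Note#Section"))
print(split_wikilink("Note#^abc123|see this block"))
```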

Embed

Field    Type        Description
target   str         Embed target filename
section  str | None  Heading or block id
line     int | None  Source line number
col      int | None  Source column number

TagRef

Field  Type        Description
name   str         Tag name without leading #
line   int | None  Source line number
col    int | None  Source column number

expand_nested_tag(tag)

Expands a nested tag string into all ancestor tags.

from obsidian_parse.utils.tags import expand_nested_tag

expand_nested_tag("a/b/c")    # ["a", "a/b", "a/b/c"]
expand_nested_tag("/foo/bar") # ["/foo", "/foo/bar"]
expand_nested_tag("a//b/c")  # ["a", "a//b", "a//b/c"]

The first / of each consecutive slash run is the hierarchy separator; remaining slashes become part of the next segment's name. A leading slash run is part of the first segment's name, never a separator.
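The separator rule can be restated in a few lines of code. This is an independent sketch of the rule as described, not the library's source; expand_tag_sketch is a hypothetical name, and it is only checked against the three documented examples.

```python
def expand_tag_sketch(tag: str) -> list[str]:
    """Return every ancestor of a nested tag, ending with the tag itself."""
    ancestors = []
    for i, ch in enumerate(tag):
        # A slash is a separator only when it starts a slash run and that
        # run is not at the very beginning of the tag.
        if ch == "/" and i > 0 and tag[i - 1] != "/":
            ancestors.append(tag[:i])
    ancestors.append(tag)
    return ancestors


for example in ("a/b/c", "/foo/bar", "a//b/c"):
    print(example, expand_tag_sketch(example))
```

In "a//b/c" the second slash follows another slash, so it is not a separator and stays in the segment name, which is why "a//b" appears as a single ancestor.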

results_to_d3(results)

Converts a list of ParseResult objects into a dict with nodes and links:

{
    "nodes": [
        {"id": "note-a", "type": "file", "label": "note-a"},
        {"id": "#python", "type": "tag", "label": "python"},
    ],
    "links": [
        {"source": "note-a", "target": "note-b", "relation": "wikilink"},
        {"source": "note-a", "target": "#python", "relation": "tag"},
        {"source": "#python/tools", "target": "#python", "relation": "parent"},
    ]
}

Link relations: wikilink, embed, tag, parent (tag hierarchy).
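Because the result is a plain dict, it can be inspected and serialized with the standard library alone. The sketch below reuses the abbreviated sample from this section instead of calling the library; the variable names are illustrative. It counts links per relation and lists link endpoints that have no matching node, which may be worth checking before handing the data to a D3 force layout.

```python
import json
from collections import Counter

# The abbreviated sample shown above, copied verbatim.
graph = {
    "nodes": [
        {"id": "note-a", "type": "file", "label": "note-a"},
        {"id": "#python", "type": "tag", "label": "python"},
    ],
    "links": [
        {"source": "note-a", "target": "note-b", "relation": "wikilink"},
        {"source": "note-a", "target": "#python", "relation": "tag"},
        {"source": "#python/tools", "target": "#python", "relation": "parent"},
    ],
}

# How many links of each relation type are present.
relation_counts = Counter(link["relation"] for link in graph["links"])

# Link endpoints that do not appear in the node list.
node_ids = {node["id"] for node in graph["nodes"]}
endpoints = {link[key] for link in graph["links"] for key in ("source", "target")}
missing = sorted(endpoints - node_ids)

# JSON text that a D3 page could load.
payload = json.dumps(graph, indent=2)

print(dict(relation_counts))
print(missing)
```

The sample is abbreviated, so note-b and #python/tools show up as missing here; that says nothing about what results_to_d3 emits for a real vault.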

Supported File Types

  • .md — Markdown with YAML frontmatter
  • .canvas — Obsidian canvas JSON; extracts wikilinks from file-type nodes and all elements from text nodes
  • .base — Obsidian base files; recorded as graph nodes (filename/path only, no link extraction)
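For .canvas files, the split between file-type nodes and text nodes can be sketched with the standard json module. This is an illustration, not the library's parser: canvas_sources is a hypothetical helper, and it assumes the usual canvas layout in which a file node carries a file field and a text node carries a text field.

```python
import json


def canvas_sources(canvas_json: str):
    """Return (file_targets, text_bodies) found in a canvas document."""
    data = json.loads(canvas_json)
    file_targets, text_bodies = [], []
    for node in data.get("nodes", []):
        kind = node.get("type")
        if kind == "file" and "file" in node:
            file_targets.append(node["file"])          # link to a vault file
        elif kind == "text":
            text_bodies.append(node.get("text", ""))   # markdown to scan further
    return file_targets, text_bodies


sample_canvas = json.dumps({
    "nodes": [
        {"id": "1", "type": "file", "file": "notes/note-a.md",
         "x": 0, "y": 0, "width": 400, "height": 300},
        {"id": "2", "type": "text", "text": "See [[note-b]] #python",
         "x": 450, "y": 0, "width": 250, "height": 100},
        {"id": "3", "type": "group", "label": "Ideas",
         "x": -20, "y": -20, "width": 800, "height": 400},
    ],
    "edges": [],
})

files, texts = canvas_sources(sample_canvas)
print(files)
print(texts)
```

Each text body could then be passed through the same extraction used for markdown, for example via parse_markdown_content.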

Development

# Install with dev dependencies
uv sync --group dev

# Lint
ruff check src/

# Type check
mypy src/

License

MIT
