A Python library for parsing Obsidian Markdown (.md) files and vaults.

obsidianmd-parser

A Python package for parsing Obsidian Markdown vaults and notes, with support for Obsidian's built-in markdown format and Dataview queries.

Features

  • Complete Vault Parsing: Load and parse entire Obsidian vaults
  • Note Object Model: Work with notes as Python objects with attributes and methods
  • Obsidian Markdown Support:
    • Wikilinks ([[links]] and [[links|aliases]])
    • Tags (#tag, #nested/tag)
    • Task lists with status tracking
    • Obsidian callouts
  • Relationship Tracking: Analyze backlinks and relationships between notes
  • Dataview Support:
    • Parse Dataview queries from notes
    • Evaluate Dataview queries programmatically
  • Search Capabilities:
    • Exact search for notes
    • Similarity search using various algorithms
  • Code Block Handling: Correctly excludes parsing within code blocks
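To make the syntax listed above concrete, here is a minimal stdlib-only sketch of extracting wikilinks and tags while skipping fenced code blocks and inline code. It is not the package's implementation: the function name `extract_links_and_tags` and the regular expressions are assumptions chosen for illustration, and the real parser may treat edge cases differently.

```python
import re

# Hypothetical patterns for illustration; the package's own rules may differ.
FENCE = re.compile(r"^(```|~~~)")
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")
# A tag must start the line or follow whitespace, so URL fragments are ignored.
TAG = re.compile(r"(?:^|(?<=\s))#([A-Za-z][\w/-]*)")
INLINE_CODE = re.compile(r"`[^`]*`")


def extract_links_and_tags(markdown: str):
    """Return ([(target, alias_or_None), ...], [tag, ...]) outside code."""
    links, tags = [], []
    in_fence = False
    for line in markdown.splitlines():
        if FENCE.match(line.strip()):
            in_fence = not in_fence  # toggle on opening and closing fences
            continue
        if in_fence:
            continue
        line = INLINE_CODE.sub("", line)  # drop inline code spans
        links += [(m.group(1), m.group(2)) for m in WIKILINK.finditer(line)]
        tags += TAG.findall(line)
    return links, tags
```

Requiring whitespace before `#` is one simple way to keep URL fragments such as `https://example.org/#frag` from being read as tags.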

Installation

pip install obsidianmd-parser

Quick Start

from obsidian_parser import Vault

# Load a vault
vault = Vault("path/to/your/obsidian/vault")

# Find notes by exact name
note = vault.get_note("My Note")

# Search notes by similarity
similar_notes = vault.find_notes("machine learning", case_sensitive=False)

# Access note properties
print(note.title)
print(note.tags)
print(note.wikilinks)
print(note.tasks)

# Work with relationships
backlinks = note.get_backlinks(vault=vault)
related = note.get_forward_links(vault=vault)
most_linked = note.get_most_linked()

Core API

Vault

The Vault class represents an entire Obsidian vault:

# lazy_load = notes are parsed only when accessed (default: True)
vault = Vault("path/to/vault", lazy_load=True)

# Search and retrieval
note = vault.get_note("Note Title")
notes = vault.find_similar_notes("search query", threshold=0.5)

# Vault analysis
note_graph = vault.get_note_graph()                 # Produces a note graph tuple object
dataview_usage = vault.analyze_dataview_usage()     # Get vault statistics for dataview queries
broken_links = vault.find_broken_links()            # Finds all broken links in the vault
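As a rough illustration of what a broken-link check involves, the sketch below works on a plain dictionary of note titles and bodies rather than on a Vault. The function name `broken_wikilinks`, the input shape, and the return format are all assumptions; the value returned by `vault.find_broken_links()` is not documented here and may look different.

```python
import re

# Capture the link target only, stopping at an alias (|) or heading (#) part.
WIKILINK_TARGET = re.compile(r"\[\[([^\]|#]+)")


def broken_wikilinks(notes: dict[str, str]) -> dict[str, list[str]]:
    """Map each note title to the wikilink targets that match no note title."""
    known = {title.lower() for title in notes}
    broken = {}
    for title, body in notes.items():
        missing = [
            target.strip()
            for target in WIKILINK_TARGET.findall(body)
            if target.strip().lower() not in known
        ]
        if missing:
            broken[title] = missing
    return broken


# Hypothetical two-note vault: "C" is linked from "A" but does not exist.
example = {"A": "[[B]] and [[C|see C]]", "B": "[[A#Heading]]"}
print(broken_wikilinks(example))
```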

Note

The Note class represents an individual note:

# Access note metadata
note.title         # Note title
note.path          # File path
note.content       # Raw markdown content
note.frontmatter   # Parsed YAML frontmatter

# Access parsed elements
note.tags          # List of tags in the note
note.wikilinks     # List of wikilinks (forward)
note.tasks         # List of tasks
note.callouts      # List of callouts

# Access raw frontmatter
raw = note.frontmatter  # Dict-like object with raw values

# Get cleaned frontmatter (removes wikilinks, formats dates)
cleaned = note.frontmatter.clean()

# Custom date formatting
cleaned = note.frontmatter.clean(date_format='DD-MM-YYYY')
cleaned = note.frontmatter.clean(date_format='%B %d, %Y')  # "March 24, 2025"

# Relationships
vault = Vault("path/to/vault")
note.get_backlinks(vault)       # Notes that link to this note
note.get_forward_links(vault)   # Notes this note links to
note.get_related_notes()        # Related notes by various metrics
note.get_link_context("Target") # Get the context around a piece of text in the note
note.get_link_context(          # E.g. the context around a wikilink
    target=note.wikilinks[0].display_text,
    context_chars=40,
)
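The frontmatter cleaning described above (removing wikilink syntax and formatting dates) can be approximated with the standard library. This is only a sketch under assumptions: `clean_frontmatter` and `clean_value` are hypothetical helpers, the choice to keep a link's alias when one is present is a guess, and only strftime-style format strings are handled.

```python
import re
from datetime import date, datetime

WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def clean_value(value, date_format="%Y-%m-%d"):
    """Strip wikilink brackets from strings and format date values."""
    if isinstance(value, (date, datetime)):
        return value.strftime(date_format)
    if isinstance(value, str):
        # Assumption: prefer the alias, fall back to the link target.
        return WIKILINK.sub(lambda m: m.group(2) or m.group(1), value)
    if isinstance(value, list):
        return [clean_value(item, date_format) for item in value]
    return value  # numbers, booleans, None are passed through unchanged


def clean_frontmatter(frontmatter: dict, date_format="%Y-%m-%d") -> dict:
    return {key: clean_value(val, date_format) for key, val in frontmatter.items()}


raw = {
    "created": date(2025, 3, 24),
    "author": "[[Jane Doe|Jane]]",
    "related": ["[[Note A]]", "plain text"],
}
print(clean_frontmatter(raw, date_format="%B %d, %Y"))
```

With the default C locale, the `created` value above is formatted as "March 24, 2025".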

Sections

for section in note.sections:
    print(f"Section: {section.heading}")
    print(f"  Full path: {section.full_path}")
    print(f"  Parent headings [(level, heading)]: {section.parent_headings}")
    print(f"  Heading list: {section.breadcrumb}")
    print(f"  Heading hierarchy: {section.full_path}")
    print(f"  Has parent: {section.parent is not None}")
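For readers curious how parent headings can be derived, the following sketch tracks a stack of open headings while scanning a note. It is an illustration only: `heading_paths` is a hypothetical function, it handles ATX (`#`) headings only, and the package's Section objects may be built differently.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def heading_paths(markdown: str):
    """Return [(heading, [(level, parent_heading), ...]), ...] in order."""
    stack = []   # currently open headings as (level, text)
    result = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # ignore '#' lines inside code fences
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if not match:
            continue
        level, text = len(match.group(1)), match.group(2)
        # Close every heading at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        result.append((text, list(stack)))
        stack.append((level, text))
    return result


doc = "# Project\n## Goals\n### Q1\n## Risks\n"
for heading, parents in heading_paths(doc):
    print(" > ".join([p for _, p in parents] + [heading]))
```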

Dataview Support

Parse and evaluate Dataview queries:

# Parse Dataview queries from a note
queries = note.dataview_queries

query = queries[0]
query.evaluate(vault, note)

# Evaluate a Dataview query in notes or sections
print(note.get_evaluated_view(vault))

note_section = note.sections[10]

print(note_section.get_evaluated_view(vault))
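Before a query can be evaluated it has to be located in the note. A minimal sketch of that first step, assuming queries live in fenced blocks whose info string is `dataview`, might look like this. The function name `extract_dataview_queries` and the returned dictionary shape are hypothetical, and inline queries are not covered.

```python
import re

# Match a fenced block opened with "dataview" and capture its body lazily.
DATAVIEW_BLOCK = re.compile(
    r"^```dataview[ \t]*\n(.*?)^```[ \t]*$", re.DOTALL | re.MULTILINE
)


def extract_dataview_queries(markdown: str) -> list[dict]:
    """Return the source and leading keyword of each dataview block."""
    queries = []
    for match in DATAVIEW_BLOCK.finditer(markdown):
        source = match.group(1).strip()
        # The first word is the query type, e.g. TABLE, LIST, or TASK.
        query_type = source.split(None, 1)[0].upper() if source else ""
        queries.append({"type": query_type, "source": source})
    return queries
```

Taking only the first word as the query type keeps the sketch indifferent to whether the FROM clause sits on the same line or the next one.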

Advanced Usage

Custom Search

# Configure similarity search
results = vault.search(
    query="machine learning",
    limit=10,
    threshold=0.6
)
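To show how a `limit` and a `threshold` typically interact in a similarity search, here is a sketch that scores note titles with `difflib.SequenceMatcher` from the standard library. The algorithms the package actually uses are not stated here, so the scoring function and the name `search_titles` are assumptions.

```python
from difflib import SequenceMatcher


def search_titles(titles, query, limit=10, threshold=0.6):
    """Return up to `limit` (title, score) pairs scoring at least `threshold`."""
    scored = []
    for title in titles:
        # ratio() is 1.0 for identical strings and approaches 0.0 otherwise.
        score = SequenceMatcher(None, query.lower(), title.lower()).ratio()
        if score >= threshold:
            scored.append((title, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)  # best match first
    return scored[:limit]


titles = ["Machine Learning", "Machine Learning Basics", "Grocery List"]
for title, score in search_titles(titles, "machine learning"):
    print(f"{score:.2f}  {title}")
```

The threshold filters first and the limit truncates afterwards, so a high threshold can return fewer than `limit` results.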

Vault Analysis

# Build a note index DataFrame of the vault
vault_index = vault.build_index()

# Build and analyze vault graph
graph = vault.get_note_graph()

# Find broken links
broken_links = vault.find_broken_links()

# Relationship analysis
relationship_stats = vault.analyze_relationships()          # Builds a Relationship Analyzer object
stats_report = relationship_stats.build_statistics_report()
df = relationship_stats.export_to_dataframe()               # Pandas dataframe object
relationship_stats.find_hub_notes(                          # Find notes with many connections (min_connections defaults to 10)
    min_connections=50
)
orphaned_notes = relationship_stats.find_orphaned_notes()   # Find orphaned notes (no backlinks)
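The hub and orphan concepts above can be sketched on a plain link graph. In this illustration the graph is a dictionary mapping each note title to the set of titles it links to, a note is orphaned when nothing links to it, and a note's connection count is taken as outgoing plus incoming links. That definition of "connections", like the helper names, is an assumption and may not match the analyzer's own.

```python
def build_backlinks(forward_links: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert the graph: for each note, collect the notes linking to it."""
    backlinks = {title: set() for title in forward_links}
    for source, targets in forward_links.items():
        for target in targets:
            if target in backlinks and target != source:
                backlinks[target].add(source)
    return backlinks


def orphaned(forward_links):
    """Notes that no other note links to."""
    backlinks = build_backlinks(forward_links)
    return sorted(title for title, sources in backlinks.items() if not sources)


def hubs(forward_links, min_connections=10):
    """Notes whose outgoing plus incoming link count meets the minimum."""
    backlinks = build_backlinks(forward_links)
    found = {}
    for title, targets in forward_links.items():
        outgoing = {t for t in targets if t in forward_links and t != title}
        total = len(outgoing) + len(backlinks[title])
        if total >= min_connections:
            found[title] = total
    return found


graph = {
    "Index": {"A", "B", "C"},
    "A": {"Index"},
    "B": {"Index", "A"},
    "C": set(),
    "Lonely": {"Index"},
}
print(orphaned(graph))
print(hubs(graph, min_connections=4))
```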

Working with Parsed Elements

# Access specific elements
for link in note.wikilinks:
    print(f"Link to: {link.target}, alias: {link.alias}")

for task in note.tasks:
    if task.status == " ":
        print(f"TODO: {task.text}")

for tag in note.tags:
    print(f"Tag: #{tag.name}")
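The status compared above is the single character between the square brackets of a task line. A small sketch of reading it from raw Markdown follows; `parse_tasks` and the dictionary keys are hypothetical and simply mirror the `status` and `text` attributes used in the example above.

```python
import re

# A list marker, then [<one character>], then the task text.
TASK = re.compile(r"^\s*[-*+]\s+\[(.)\]\s+(.*)$")


def parse_tasks(markdown: str) -> list[dict]:
    """Return the status character and text for every task list item."""
    tasks = []
    for line in markdown.splitlines():
        match = TASK.match(line)
        if match:
            tasks.append({"status": match.group(1), "text": match.group(2).strip()})
    return tasks


note_body = "- [ ] Write draft\n- [x] Outline\n  - [/] Review\nNot a task\n"
for task in parse_tasks(note_body):
    if task["status"] == " ":
        print(f"TODO: {task['text']}")
```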

Requirements

  • Python 3.12+ (earlier versions may work but have not been tested)
  • Dependencies are automatically installed with pip

Contributing

Contributions are welcome! The project is hosted on Codeberg:

https://codeberg.org/paddyd/obsidian-parser

Please feel free to submit issues and pull requests.

License

MIT

Changelog

0.4.0 (2026-01-23)

  • Added functionality to fetch and store all file names in an index in the vault.

0.3.2 (2025-11-01)

  • Added fix to the DataviewParser to handle queries where TABLE/LIST/TASK and FROM clauses appear on the same line

0.3.1 (2025-09-07)

  • Added fix to prevent '#'s in URLs being parsed as tags.
  • Added further unit tests for tag parsing.

0.3.0 (2025-06-14)

  • Added parent heading parsing for Sections.
  • Sections now capture heading hierarchy for the whole note.

0.2.0 (2025-06-07)

  • Added Frontmatter.clean() method for cleaning frontmatter values
  • Frontmatter now returns a dict-like object instead of plain dict
  • Improved wikilink parsing in frontmatter values

0.1.0 (Initial Release)

  • Core vault and note parsing functionality
  • Obsidian markdown format support
  • Dataview query parsing and evaluation
  • Search capabilities (exact and similarity)
  • Relationship tracking and graph building
