A Python library for parsing Obsidian Markdown (.md) files and vaults.

obsidianmd-parser

A Python package for parsing Obsidian Markdown vaults and notes, with support for Obsidian's built-in markdown format and Dataview queries.

Features

  • Complete Vault Parsing: Load and parse entire Obsidian vaults
  • Note Object Model: Work with notes as Python objects with attributes and methods
  • Obsidian Markdown Support:
    • Wikilinks ([[links]] and [[links|aliases]])
    • Tags (#tag, #nested/tag)
    • Task lists with status tracking
    • Obsidian callouts
  • Relationship Tracking: Analyze backlinks and relationships between notes
  • Dataview Support:
    • Parse Dataview queries from notes
    • Evaluate Dataview queries programmatically
  • Search Capabilities:
    • Exact search for notes
    • Similarity search using various algorithms
  • Code Block Handling: Correctly excludes parsing within code blocks
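
To make the syntax in the list above concrete, here is a minimal standard-library sketch of extracting wikilinks and tags while skipping code. It does not use obsidianmd-parser and is not its implementation: the regular expressions and the helper names (`strip_code`, `extract_wikilinks`, `extract_tags`) are assumptions chosen for illustration.

```python
import re
from typing import List, Optional, Tuple

# Illustrative patterns only; the package's own parsing rules may differ.
FENCE = "`" * 3  # built at runtime so this example contains no literal fence
FENCE_RE = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)
INLINE_CODE_RE = re.compile(r"`[^`\n]*`")
WIKILINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")
TAG_RE = re.compile(r"(?<![\w/])#([A-Za-z][\w/-]*)")


def strip_code(text: str) -> str:
    """Drop fenced blocks first, then inline code spans."""
    text = FENCE_RE.sub("", text)
    return INLINE_CODE_RE.sub("", text)


def extract_wikilinks(text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (target, alias) pairs; alias is None for plain [[links]]."""
    pairs = []
    for match in WIKILINK_RE.finditer(strip_code(text)):
        alias = match.group(2)
        pairs.append((match.group(1).strip(), alias.strip() if alias else None))
    return pairs


def extract_tags(text: str) -> List[str]:
    """Return tag names without the leading '#', nested tags included."""
    return TAG_RE.findall(strip_code(text))


sample = (
    "See [[Project Plan|the plan]] and [[Inbox]]. #work #area/research\n"
    + FENCE + "\n[[not a link]] #not-a-tag\n" + FENCE + "\n"
)
print(extract_wikilinks(sample))
print(extract_tags(sample))
```

Stripping code before matching is one simple way to get the "excludes parsing within code blocks" behaviour; a real parser would more likely track positions so that line numbers survive.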

Installation

pip install obsidianmd-parser

Quick Start

from obsidian_parser import Vault

# Load a vault
vault = Vault("path/to/your/obsidian/vault")

# Find notes by exact name
note = vault.get_note("My Note")

# Search notes by similarity
similar_notes = vault.find_notes("machine learning", case_sensitive=False)

# Access note properties
print(note.title)
print(note.tags)
print(note.wikilinks)
print(note.tasks)

# Work with relationships
backlinks = note.get_backlinks(vault=vault)
related = note.get_forward_links(vault=vault)
most_linked = note.get_most_linked()

Core API

Vault

The Vault class represents an entire Obsidian vault:

# lazy_load = notes are parsed only when accessed (default: True)
vault = Vault("path/to/vault", lazy_load=True)

# Search and retrieval
note = vault.get_note("Note Title")
notes = vault.find_similar_notes("search query", threshold=0.5)

# Vault analysis
note_graph = vault.get_note_graph()                 # Produces a note graph tuple object
dataview_usage = vault.analyze_dataview_usage()     # Get vault statistics for dataview queries
broken_links = vault.find_broken_links()            # Finds all broken links in the vault
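
The lazy_load option can be pictured with a small standalone sketch. The classes below (`LazyNote`, `LazyVault`) are hypothetical stand-ins written for this example, not the package's Vault and Note; they only show the general idea of indexing files up front and reading each one on first access.

```python
import tempfile
from functools import cached_property
from pathlib import Path
from typing import Optional


class LazyNote:
    """Hypothetical note that reads its file only when content is requested."""

    def __init__(self, path: Path) -> None:
        self.path = path

    @property
    def title(self) -> str:
        return self.path.stem

    @cached_property
    def content(self) -> str:
        # Runs once; the result is then stored on the instance.
        return self.path.read_text(encoding="utf-8")


class LazyVault:
    """Hypothetical vault that indexes .md files without reading them."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self._notes = {p.stem: LazyNote(p) for p in sorted(self.root.rglob("*.md"))}

    def get_note(self, title: str) -> Optional[LazyNote]:
        return self._notes.get(title)


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "My Note.md").write_text("# My Note\nHello", encoding="utf-8")
    vault = LazyVault(tmp)
    note = vault.get_note("My Note")
    loaded_before = "content" in vars(note)  # file not read yet
    text = note.content                      # first access triggers the read
    loaded_after = "content" in vars(note)   # value is now cached

print(loaded_before, loaded_after)
```

The trade-off is the usual one: opening a large vault is quick, and the cost of reading and parsing is paid per note the first time it is touched.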

Note

The Note class represents an individual note:

# Access note metadata
note.title          # Note title
note.path          # File path
note.content       # Raw markdown content
note.frontmatter   # Parsed YAML frontmatter

# Access parsed elements
note.tags          # List of tags in the note
note.wikilinks     # List of wikilinks (forward)
note.tasks         # List of tasks
note.callouts      # List of callouts

# Relationships
vault = Vault("path/to/vault")
note.get_backlinks(vault)        # Notes that link to this note
note.get_forward_links(vault)    # Notes this note links to
note.get_related_notes()         # Related notes by various metrics
note.get_link_context("Target")  # Context surrounding a piece of text in the note
note.get_link_context(           # E.g. the context around a wikilink
    target=note.wikilinks[0].display_text,
    context_chars=40,
)
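
As a rough picture of what a context lookup with a context_chars window might return, the sketch below slices a fixed number of characters on either side of each occurrence of a target string. The function `link_context` is an assumption made for this example; the return type and matching rules of the package's get_link_context may differ.

```python
from typing import List


def link_context(content: str, target: str, context_chars: int = 40) -> List[str]:
    """Return a snippet around every occurrence of target in content."""
    if not target:
        return []
    snippets = []
    start = content.find(target)
    while start != -1:
        end = start + len(target)
        lo = max(0, start - context_chars)           # clamp at start of text
        hi = min(len(content), end + context_chars)  # clamp at end of text
        snippets.append(content[lo:hi])
        start = content.find(target, end)
    return snippets


text = "Notes on [[Graph Theory]] and how it relates to [[Networks]]."
print(link_context(text, "Graph Theory", context_chars=10))
```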

Dataview Support

Parse and evaluate Dataview queries:

# Parse Dataview queries from a note
queries = note.dataview_queries

query = queries[0]
query.evaluate(vault, note)

# Evaluate a Dataview query in notes or sections
print(note.get_evaluated_view(vault))

note_section = note.sections[10]

print(note_section.get_evaluated_view(vault))
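
In Obsidian, Dataview queries are written inside fenced code blocks labelled dataview. The sketch below shows one way such blocks could be located in raw note text with the standard library. `extract_dataview_queries` is a hypothetical helper, not part of the package, and it only extracts the query text; it does not evaluate anything.

```python
import re
from typing import List

FENCE = "`" * 3  # built at runtime so this example contains no literal fence
DATAVIEW_RE = re.compile(
    re.escape(FENCE) + r"dataview[ \t]*\n(.*?)\n" + re.escape(FENCE),
    re.DOTALL,
)


def extract_dataview_queries(text: str) -> List[str]:
    """Return the body of each fenced dataview block, in document order."""
    return [match.group(1).strip() for match in DATAVIEW_RE.finditer(text)]


note_text = (
    "# Reading\n\n"
    + FENCE + "dataview\nLIST FROM #book\nSORT file.name ASC\n" + FENCE + "\n"
)
queries = extract_dataview_queries(note_text)
query_type = queries[0].split()[0]  # first keyword, e.g. LIST or TABLE
print(query_type)
```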

Advanced Usage

Custom Search

# Configure similarity search
results = vault.search(
    query="machine learning",
    limit=10,
    threshold=0.6,
)
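
To show how a limit and a threshold typically interact in a similarity search, here is a standalone sketch built on difflib.SequenceMatcher. The README does not say which algorithms the package uses, so the scoring method here is an assumption, as is the function name `similarity_search`.

```python
from difflib import SequenceMatcher
from typing import Iterable, List, Tuple


def similarity_search(
    titles: Iterable[str],
    query: str,
    limit: int = 10,
    threshold: float = 0.6,
    case_sensitive: bool = False,
) -> List[Tuple[str, float]]:
    """Score each title against the query, filter by threshold, keep the top hits."""

    def norm(value: str) -> str:
        return value if case_sensitive else value.lower()

    scored = []
    for title in titles:
        score = SequenceMatcher(None, norm(query), norm(title)).ratio()
        if score >= threshold:
            scored.append((title, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)  # best match first
    return scored[:limit]


titles = ["Machine Learning", "Machine Learning Basics", "Grocery List"]
results = similarity_search(titles, "machine learning")
for title, score in results:
    print(f"{score:.2f}  {title}")
```

The threshold is applied before the limit, so a strict threshold can return fewer than `limit` results.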

Vault Analysis

# Build a note index dataframe of the vault
vault_index = vault.build_index()

# Build and analyze vault graph
graph = vault.get_note_graph()

# Find broken links
broken_links = vault.find_broken_links()

# Relationship analysis
relationship_stats = vault.analyze_relationships()          # Builds a Relationship Analyzer object
stats_report = relationship_stats.build_statistics_report()
df = relationship_stats.export_to_dataframe()               # Pandas dataframe object
hub_notes = relationship_stats.find_hub_notes(              # Find notes with many connections (default = 10)
    min_connections=50
)
orphaned_notes = relationship_stats.find_orphaned_notes()   # Find orphaned notes (no backlinks)
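
The analyses above can all be derived from a map of forward links. The sketch below is independent of the package and uses hypothetical helper names. It treats a note's connection count as outgoing links plus backlinks, which is an assumption; the package may count connections differently.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

LinkMap = Dict[str, Set[str]]  # note title -> titles it links to


def build_backlinks(forward: LinkMap) -> Dict[str, Set[str]]:
    """Invert the forward-link map: target -> notes that link to it."""
    backlinks: Dict[str, Set[str]] = defaultdict(set)
    for source, targets in forward.items():
        for target in targets:
            backlinks[target].add(source)
    return backlinks


def find_broken_links(forward: LinkMap) -> List[Tuple[str, str]]:
    """Links whose target is not a note in the vault."""
    return sorted((s, t) for s, ts in forward.items() for t in ts if t not in forward)


def find_orphaned(forward: LinkMap) -> List[str]:
    """Notes that no other note links to."""
    backlinks = build_backlinks(forward)
    return sorted(n for n in forward if not backlinks.get(n))


def find_hubs(forward: LinkMap, min_connections: int = 10) -> List[str]:
    """Notes whose outgoing links plus backlinks reach min_connections."""
    backlinks = build_backlinks(forward)
    return sorted(
        n for n in forward
        if len(forward[n]) + len(backlinks.get(n, ())) >= min_connections
    )


forward = {
    "Index": {"Python", "Rust", "Missing Note"},
    "Python": {"Index"},
    "Rust": {"Python"},
    "Scratchpad": set(),
}
print(find_broken_links(forward))
print(find_orphaned(forward))
print(find_hubs(forward, min_connections=3))
```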

Working with Parsed Elements

# Access specific elements
for link in note.wikilinks:
    print(f"Link to: {link.target}, alias: {link.alias}")

for task in note.tasks:
    if task.status == " ":
        print(f"TODO: {task.text}")

for tag in note.tags:
    print(f"Tag: #{tag.name}")
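
The check `task.status == " "` above relies on the character between the square brackets of a Markdown task item. The following sketch, which is not the package's parser, shows how that status character can be read from raw text; `parse_tasks` and its regular expression are assumptions made for illustration.

```python
import re
from typing import List, Tuple

# A list marker, then [<one status character>], then the task text.
TASK_RE = re.compile(r"^[ \t]*[-*+] \[(.)\] (.*)$", re.MULTILINE)


def parse_tasks(text: str) -> List[Tuple[str, str]]:
    """Return (status, text) pairs; a status of ' ' means not done."""
    return [(m.group(1), m.group(2).strip()) for m in TASK_RE.finditer(text)]


note_text = (
    "- [ ] Write report\n"
    "- [x] Send email\n"
    "  - [/] Draft slides\n"
    "Not a task\n"
)
tasks = parse_tasks(note_text)
todo = [text for status, text in tasks if status == " "]
print(tasks)
print(todo)
```

Keeping the raw status character, rather than a done/not-done boolean, leaves room for custom statuses such as `/` that some Obsidian themes and plugins use.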

Requirements

  • Python 3.12+ (earlier versions may work but have not been tested)
  • Dependencies are automatically installed with pip

Contributing

Contributions are welcome! The project is hosted on Codeberg:

https://codeberg.org/paddyd/obsidian-parser

Please feel free to submit issues and pull requests.

License

MIT

Changelog

0.1.0 (Initial Release)

  • Core vault and note parsing functionality
  • Obsidian markdown format support
  • Dataview query parsing and evaluation
  • Search capabilities (exact and similarity)
  • Relationship tracking and graph building
