AI-powered semantic search and chat for Obsidian notes

Obsidian-AI

A command-line AI assistant that chats with your personal knowledge base using OpenAI's GPT models. Search, read, and semantically explore your notes with natural language queries.

Features

  • Smart Search: Keyword and semantic search across your note collection
  • Safe File Access: Read-only operations with directory sandboxing
  • Interactive Chat: Both single-query and REPL modes
  • Local Embeddings: TF-IDF-based semantic search with local caching
  • Rich Output: Beautiful terminal UI with syntax highlighting

Quick Start

Installation

# Clone and install
git clone <repository-url>
cd obsidian-ai
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e .

Configuration

export OPENAI_API_KEY="your-api-key-here"
export OBSIDIAN_AI_BRAIN_DIR="$HOME/brain"              # Optional: defaults to ~/brain
export OBSIDIAN_AI_MODEL="gpt-4o"                       # Optional: defaults to gpt-4o
export OBSIDIAN_AI_IGNORE_PATTERNS="*.tmp,cache/*,30. Areas/Roleplay"  # Optional: comma-separated ignore patterns

Usage

# Single question
obsidian-ai chat "What notes do I have about machine learning?"

# Interactive chat
obsidian-ai repl

# Direct search
obsidian-ai search "project ideas"

# Read specific file
obsidian-ai read "projects/ai-assistant.md"

# Ignore specific patterns for this session
obsidian-ai --ignore "temp/*" --ignore "*.draft" chat "What are my project ideas?"

How It Works

Obsidian-AI provides your chosen GPT model with three powerful tools to explore your notes:

  1. search(query) - Keyword search across filenames and content
  2. read_file(path) - Safe file reading with byte-range support
  3. semantic_search(query) - Similarity search using local TF-IDF embeddings

The assistant uses these tools to ground its responses in your actual notes, providing specific file citations and relevant excerpts.
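As a rough illustration of how the three tools might be wired to the model's tool calls, here is a minimal dispatch sketch. The function bodies are hypothetical stand-ins (the real implementations live in tools.py and may look quite different), and the JSON result shape is an assumption made for this example.

```python
import json
from typing import Any, Callable, Dict, List


# Hypothetical stand-ins for the three tools; they only echo their inputs.
def search(query: str) -> List[str]:
    return [f"keyword hit for {query!r}"]


def read_file(path: str, start: int = 0, max_bytes: int = 1024) -> str:
    return f"<up to {max_bytes} bytes of {path} from offset {start}>"


def semantic_search(query: str) -> List[str]:
    return [f"similar note for {query!r}"]


# Map the tool name sent by the model to the Python callable that serves it.
TOOLS: Dict[str, Callable[..., Any]] = {
    "search": search,
    "read_file": read_file,
    "semantic_search": semantic_search,
}


def dispatch(name: str, arguments_json: str) -> str:
    """Run one tool call and return a JSON string to hand back to the model."""
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments_json or "{}")
        return json.dumps({"result": TOOLS[name](**kwargs)})
    except (TypeError, ValueError) as exc:
        # Malformed JSON or wrong arguments: report it instead of crashing the chat.
        return json.dumps({"error": str(exc)})
```

Returning errors as data rather than raising lets the model see what went wrong and retry with corrected arguments.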

Architecture

src/obsidian_ai/
├── cli.py          # Command-line interface
├── chat.py         # OpenAI chat integration with tool calling
├── config.py       # Environment configuration
├── tools.py        # Tool definitions and dispatch
├── search.py       # Keyword search implementation
├── semsearch.py    # Semantic search with local embeddings
├── local_embed.py  # TF-IDF vectorizer implementation
└── fs.py           # File system utilities

Supported File Types

  • Markdown (.md)
  • Text files (.txt)
  • Org-mode (.org)
  • reStructuredText (.rst)
  • Code files (.py, .js, .ts, .java, .go)
  • Data files (.csv, .json, .yaml, .yml)
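The extension list above could drive file discovery roughly as follows. This is a sketch only: the name iter_supported_files and the case-insensitive suffix comparison are assumptions, not the behavior of fs.py.

```python
from pathlib import Path
from typing import Iterator

# Extensions taken from the list above.
SUPPORTED_EXTENSIONS = {
    ".md", ".txt", ".org", ".rst",
    ".py", ".js", ".ts", ".java", ".go",
    ".csv", ".json", ".yaml", ".yml",
}


def iter_supported_files(root: Path) -> Iterator[Path]:
    """Yield files under root whose suffix is in the supported set."""
    for path in sorted(root.rglob("*")):
        # Assumption: suffixes are compared case-insensitively.
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
            yield path
```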

Safety & Security

  • Read-only: No file modification capabilities
  • Directory sandboxing: File access restricted to configured brain directory
  • No secrets in code: API keys only via environment variables
  • Size limits: Files over 2 MB are skipped to prevent abuse
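One common way to implement directory sandboxing and a size cap is sketched below. The helper names are hypothetical, and reading "2 MB" as 2 * 1024 * 1024 bytes is an assumption; the project's own checks may differ.

```python
from pathlib import Path

# Assumption: the 2 MB limit is interpreted as 2 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 2 * 1024 * 1024


def resolve_in_sandbox(brain_dir: Path, relative_path: str) -> Path:
    """Resolve relative_path under brain_dir, rejecting anything that escapes it."""
    root = brain_dir.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / relative_path).resolve()
    if not candidate.is_relative_to(root):  # requires Python 3.9+
        raise PermissionError(f"path escapes the brain directory: {relative_path}")
    return candidate


def is_within_size_limit(path: Path) -> bool:
    """Return True when the file is small enough to be read."""
    return path.stat().st_size <= MAX_FILE_BYTES
```

Resolving both the root and the candidate before comparing them means that "../" traversal, absolute paths, and symlinks pointing outside the directory are all rejected by the same check.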

Configuration Options

Environment Variable          Default            Description
OBSIDIAN_AI_BRAIN_DIR         ~/brain            Directory containing your notes
OBSIDIAN_AI_MODEL             gpt-4o             OpenAI model to use
OBSIDIAN_AI_MAX_TOOL_CALLS    5                  Maximum tool calls per query
OBSIDIAN_AI_IGNORE_PATTERNS   Built-in defaults  Comma-separated patterns to ignore
OPENAI_API_KEY                required           Your OpenAI API key
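The options above could be read from the environment along these lines. The Settings class and load_settings function are hypothetical names for this sketch, not the contents of config.py. Whether user-supplied ignore patterns replace or extend the built-in defaults is not stated here, so the sketch only parses them.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Tuple


@dataclass(frozen=True)
class Settings:
    api_key: str
    brain_dir: Path
    model: str
    max_tool_calls: int
    ignore_patterns: Tuple[str, ...]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables, applying the documented defaults."""
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required")
    raw_patterns = env.get("OBSIDIAN_AI_IGNORE_PATTERNS", "")
    # Split on commas and drop empty entries; spaces inside a pattern are kept.
    patterns = tuple(p.strip() for p in raw_patterns.split(",") if p.strip())
    return Settings(
        api_key=api_key,
        brain_dir=Path(env.get("OBSIDIAN_AI_BRAIN_DIR", "~/brain")).expanduser(),
        model=env.get("OBSIDIAN_AI_MODEL", "gpt-4o"),
        max_tool_calls=int(env.get("OBSIDIAN_AI_MAX_TOOL_CALLS", "5")),
        ignore_patterns=patterns,
    )
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.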

Advanced Usage

Ignore Patterns

Control which directories and files are excluded from search:

# Environment variable (persistent)
export OBSIDIAN_AI_IGNORE_PATTERNS="30. Areas/Roleplay,temp/*,*.draft,private/*"

# Command-line flags (session-only)
obsidian-ai --ignore "temp/*" --ignore "*.draft" search "project ideas"
obsidian-ai --ignore "30. Areas/Roleplay" chat "Tell me about my notes"

Built-in ignore patterns:

  • .git, .obsidian, .obsidian_ai_cache
  • node_modules, __pycache__
  • .DS_Store, Thumbs.db

Pattern matching:

  • * matches any characters: temp/* ignores anything in temp directories
  • *.ext matches files with specific extensions
  • dirname matches exact directory names anywhere in the path
  • path/to/dir matches specific paths relative to brain directory
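The four matching rules above can be approximated with the standard fnmatch module. This is one possible reading, not the project's matcher: the function name is_ignored is hypothetical, and the sketch assumes that patterns containing a slash are anchored at the brain directory root.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable


def is_ignored(relative_path: str, patterns: Iterable[str]) -> bool:
    """Return True if a path (relative to the brain directory) matches any pattern."""
    path = PurePosixPath(relative_path)
    text = path.as_posix()
    for pattern in patterns:
        pattern = pattern.strip().rstrip("/")
        if not pattern:
            continue
        if "/" not in pattern:
            # Bare name or glob (e.g. "node_modules", "*.draft"):
            # compare against every component of the path.
            if any(fnmatchcase(part, pattern) for part in path.parts):
                return True
        elif fnmatchcase(text, pattern) or text.startswith(pattern + "/"):
            # Path pattern (e.g. "temp/*", "30. Areas/Roleplay"):
            # match the whole relative path, or treat it as a directory prefix.
            return True
    return False
```

For example, is_ignored("30. Areas/Roleplay/session.md", ["30. Areas/Roleplay"]) is true because the pattern is treated as a directory prefix, while is_ignored("projects/ai.md", ["temp/*", "*.draft"]) is false.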

Semantic Search Caching

The semantic search builds a local TF-IDF index cached in .obsidian_ai_cache/. The index automatically rebuilds when files change.
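To make the idea concrete, the sketch below builds a tiny TF-IDF index, ranks notes by cosine similarity, and computes a fingerprint that could be used to decide when a cached index is stale. All names are hypothetical, the smoothed IDF formula is one common choice, and the path/mtime/size fingerprint is an assumption about how change detection might work. None of this is taken from local_embed.py or semsearch.py.

```python
import hashlib
import math
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Vector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(tokens: List[str], idf: Dict[str, float]) -> Vector:
    """Turn tokens into an L2-normalised TF-IDF vector (unknown terms are dropped)."""
    counts = Counter(t for t in tokens if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def build_index(docs: Dict[str, str]) -> Tuple[Dict[str, float], Dict[str, Vector]]:
    """Return (idf table, per-document vectors) for a name -> text mapping."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    doc_freq = Counter(t for tokens in tokenized.values() for t in set(tokens))
    n = len(docs)
    # Smoothed IDF so that terms present in every document keep a positive weight.
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return idf, {name: vectorize(tokens, idf) for name, tokens in tokenized.items()}


def rank_notes(query: str, idf: Dict[str, float], vectors: Dict[str, Vector],
               top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank documents by cosine similarity to the query, dropping zero scores."""
    q = vectorize(tokenize(query), idf)
    scored = [(sum(w * vec.get(t, 0.0) for t, w in q.items()), name)
              for name, vec in vectors.items()]
    ranked = sorted((pair for pair in scored if pair[0] > 0), reverse=True)
    return [(name, score) for score, name in ranked[:top_k]]


def fingerprint(file_stats: Iterable[Tuple[str, float, int]]) -> str:
    """Hash (path, mtime, size) triples; a changed digest means the cache is stale."""
    digest = hashlib.sha256()
    for path, mtime, size in sorted(file_stats):
        digest.update(f"{path}|{mtime}|{size}\n".encode("utf-8"))
    return digest.hexdigest()
```

A cache built this way would store the fingerprint next to the index and rebuild whenever the fingerprint computed from the current files differs from the stored one.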

File Reading with Ranges

# Read the first 1 KB of a large file
obsidian-ai read "large-document.md" --start 0 --max-bytes 1024

# Read up to 2048 bytes starting at byte offset 1024
obsidian-ai read "large-document.md" --start 1024 --max-bytes 2048
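The --start and --max-bytes options correspond to a seek followed by a bounded read. A minimal sketch of that idea, assuming UTF-8 notes and a hypothetical read_range helper (not the project's read_file tool), might look like this:

```python
from pathlib import Path


def read_range(path: Path, start: int = 0, max_bytes: int = 1024) -> str:
    """Return at most max_bytes of the file, beginning at byte offset start."""
    if start < 0 or max_bytes < 0:
        raise ValueError("start and max_bytes must be non-negative")
    with path.open("rb") as handle:
        handle.seek(start)
        chunk = handle.read(max_bytes)
    # A byte range can split a multi-byte character, so decode leniently.
    return chunk.decode("utf-8", errors="replace")
```

Reading in binary mode keeps the offsets exact; decoding with errors="replace" avoids an exception when the range boundary falls in the middle of a multi-byte character.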

Verbose Logging

# Enable debug logging
obsidian-ai -v chat "your question"
obsidian-ai -vv repl  # Even more verbose

Development

Testing

# Run tests
uv run pytest tests/

# Run specific test
uv run pytest tests/test_search.py -v

Project Structure

The codebase follows bacterial coding principles - small, modular, self-contained functions that could easily be copied and reused. Each module has a single clear purpose:

  • fs.py - File system iteration
  • search.py - Text search logic
  • local_embed.py - Embedding vectorization
  • semsearch.py - Semantic search coordination
  • tools.py - OpenAI tool integration
  • chat.py - Conversation management

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

License

MIT License - see LICENSE for details.

Author

Created by Sumuk Shashidhar


Download files

Source Distribution

obsidian_ai-0.1.2.tar.gz (30.0 MB)

Uploaded Source

Built Distribution

obsidian_ai-0.1.2-py3-none-any.whl (29.3 kB)

Uploaded Python 3

File details

Details for the file obsidian_ai-0.1.2.tar.gz.

File metadata

  • Download URL: obsidian_ai-0.1.2.tar.gz
  • Upload date:
  • Size: 30.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for obsidian_ai-0.1.2.tar.gz
Algorithm     Hash digest
SHA256        93618a3993d029155603152feb84a9c5b4b4d3c88920a3a31cdcac2193d17578
MD5           a0080c08b5e3240f9565794d39f28148
BLAKE2b-256   18e4ad51ef25db34ce7fce9d8e3fc6c6421a2ccfbe47872cee0ee495abc6b66a

Provenance

The following attestation bundles were made for obsidian_ai-0.1.2.tar.gz:

Publisher: publish.yml on sumukshashidhar/obsidian-ai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file obsidian_ai-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: obsidian_ai-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 29.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for obsidian_ai-0.1.2-py3-none-any.whl
Algorithm     Hash digest
SHA256        a7caee6f721fd5ab0b222048b15123d68622a1da2ee1e299429e254a9aa00e32
MD5           56fd3a6c6866ac33e80ef65abf5235fd
BLAKE2b-256   e033557064a650e4e9acdd5d428b32354cbfd12beded08fdadcc57c954f2741c

Provenance

The following attestation bundles were made for obsidian_ai-0.1.2-py3-none-any.whl:

Publisher: publish.yml on sumukshashidhar/obsidian-ai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
