A Japanese-enhanced semantic search system for your local documents.

Oboyu (覚ゆ)

Lightning-fast semantic search for your local documents with best-in-class Japanese support.

What is Oboyu?

Oboyu (覚ゆ, "to remember" in classical Japanese) is a local semantic search engine that finds information in your documents from natural language queries. Unlike traditional keyword search, Oboyu matches on the meaning of a query rather than its exact words, so it can surface relevant content even when you don't know the terms a document uses.

Why Oboyu?

  • 🚀 Fast: Indexes thousands of documents in seconds, searches in milliseconds
  • 🎯 Accurate: Semantic search finds what you mean, not just what you type
  • 🇯🇵 Japanese Excellence: First-class support with automatic encoding detection
  • 🔒 Private: Everything runs locally - your documents never leave your machine
  • 🤖 AI-Ready: Built-in MCP server for Claude, Cursor, and other AI assistants

Quick Start

Get up and running in under 5 minutes:

# Install Oboyu
pip install oboyu

# Index your documents
oboyu index ~/Documents

# Search interactively
oboyu query --interactive

That's it! See our Documentation for complete guides and examples.

Key Features

🔍 Advanced Search Capabilities

  • Hybrid Search: Combines semantic understanding with keyword matching for best results
  • Multiple Modes: Switch between semantic, keyword, or hybrid search modes
  • Smart Reranking: Built-in AI reranker improves result accuracy
  • Interactive Mode: Real-time search with command history and auto-suggestions
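To make the idea of hybrid search concrete, here is a minimal sketch of one common way to merge a semantic ranking with a keyword ranking: reciprocal rank fusion (RRF). It is illustrative only and is not taken from Oboyu's source; the function name, the constant `k=60`, and the sample file paths are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Hypothetical helper, not part of Oboyu's API. Each document earns
    1 / (k + rank) for every list it appears in (rank starts at 1).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up result lists standing in for the two search modes.
semantic_hits = ["notes/ml.md", "notes/optim.md", "notes/misc.md"]
keyword_hits = ["notes/optim.md", "notes/sgd.md", "notes/ml.md"]
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
```

Documents that rank well in both lists rise to the top, while a document found by only one mode still appears in the fused result.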

📚 Document Support

  • Text File Support: Plain text (.txt), Markdown (.md), HTML (.html), and source code files (.py, .java, etc.) with automatic encoding detection
  • Incremental Indexing: Only process new or changed files for lightning-fast updates
  • Smart Chunking: Intelligent document splitting for optimal search results
  • Automatic Encoding: Handles various text encodings seamlessly (UTF-8, Shift-JIS, EUC-JP, and more)
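Incremental indexing generally comes down to remembering a fingerprint for each file and reprocessing only the files whose fingerprint changed. The sketch below shows that idea with SHA-256 content digests. It is an assumption about how such a check could work, not a description of Oboyu's internals, and both helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def files_to_reindex(paths, known_digests):
    """Return (changed_paths, updated_digests).

    `known_digests` maps a path string to the digest recorded on the
    previous run. Hypothetical helper; Oboyu's real change detection
    may use different signals (for example, modification times).
    """
    changed = []
    updated = {}
    for path in paths:
        key = str(path)
        digest = file_digest(path)
        updated[key] = digest
        # New files have no recorded digest, so they also count as changed.
        if known_digests.get(key) != digest:
            changed.append(key)
    return changed, updated
```

On the first run every file is reported as changed; on later runs only new or edited files are returned, which is what keeps updates fast.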

🇯🇵 Japanese Language Excellence

  • Native Support: Purpose-built for Japanese text processing
  • Automatic Detection: Detects and handles Shift-JIS, EUC-JP, and UTF-8
  • Specialized Models: Optimized embedding models for Japanese content
  • Mixed Language: Seamlessly handles Japanese and English in the same document
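As a rough illustration of automatic encoding detection, the sketch below tries a short list of candidate encodings and keeps the first one that decodes without errors. This is a simple heuristic written for this example, not Oboyu's actual detector; the candidate order and the function name are assumptions, and trial decoding can misidentify short or ambiguous inputs.

```python
# Candidate encodings, tried in order. The order is an assumption of this
# sketch: strict UTF-8 first, then the two legacy Japanese encodings.
CANDIDATE_ENCODINGS = ("utf-8", "euc_jp", "shift_jis")


def detect_and_decode(data, candidates=CANDIDATE_ENCODINGS):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Last resort: decode with replacement characters instead of failing.
    return data.decode("utf-8", errors="replace"), "utf-8"


# Bytes as they might be read from a legacy Shift-JIS text file.
sample = "日本語のメモ".encode("shift_jis")
text, encoding = detect_and_decode(sample)
```

Production-grade detectors usually add statistical checks on top of trial decoding, since some byte sequences are valid in more than one encoding.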

🚀 Performance & Integration

  • ONNX Acceleration: 2-4x faster with automatic model optimization
  • MCP Server: Direct integration with Claude Desktop and AI coding assistants
  • Rich CLI: Beautiful terminal interface with progress tracking
  • Low Memory: Efficient processing even on modest hardware

Installation

Using UV (Recommended)

uv tool install oboyu

Using pip

pip install oboyu

From Source

git clone https://github.com/sonesuke/oboyu.git
cd oboyu
pip install -e .

System Requirements

  • Python: 3.13 or higher
  • OS: macOS, Linux (Windows via WSL)
  • Memory: 2GB RAM minimum
  • Storage: 1GB for models and index

Note: Models are automatically downloaded on first use (~90MB).

Usage Examples

Basic Usage

# Index a directory
oboyu index ~/Documents/notes

# Search your documents
oboyu query "machine learning optimization techniques"

# Interactive mode (recommended!)
oboyu query --interactive

Advanced Examples

# Index only specific file types
oboyu index ~/projects --include "*.md,*.txt"

# Search with filters
oboyu query "API design" --filter "docs/"

# Use semantic search mode
oboyu query "concepts similar to dependency injection" --mode semantic

# Enable reranking for better accuracy
oboyu query "complex technical topic" --rerank
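For readers unfamiliar with glob patterns such as the `"*.md,*.txt"` value passed to `--include` above, the following sketch shows how a comma-separated pattern list can select files by name. It assumes the patterns are matched against the file name only; Oboyu's own matching rules may differ, and `matches_include` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def matches_include(path, include):
    """True if the file name matches any comma-separated glob in `include`."""
    patterns = [p.strip() for p in include.split(",") if p.strip()]
    name = PurePosixPath(path).name
    # fnmatchcase compares case-sensitively on every platform.
    return any(fnmatchcase(name, pattern) for pattern in patterns)


candidates = ["docs/readme.md", "src/app.py", "notes/todo.txt"]
selected = [p for p in candidates if matches_include(p, "*.md,*.txt")]
```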

MCP Server for AI Assistants

# Start MCP server
oboyu mcp

# Or configure in Claude Desktop's settings

See our MCP Integration Guide for detailed setup instructions.

Documentation

  • 🚀 Getting Started
  • 💼 Real-world Usage
  • ⚙️ Configuration & Optimization
  • 🔗 Integration & Reference

Common Use Cases

📚 Academic Research

Index and search through research notes and references:

oboyu index ~/research --include "*.md,*.txt"
oboyu query "transformer architecture improvements"

💻 Code Documentation

Search through project documentation and code comments:

oboyu index ~/projects/myapp --include "*.md,*.py"
oboyu query "authentication implementation"

📝 Personal Knowledge Base

Organize and search your notes and documents:

oboyu index ~/Documents/notes
oboyu query "meeting notes from last week"

🌏 Multilingual Documents

Perfect for mixed Japanese and English content:

oboyu index ~/Documents/bilingual
oboyu query "プロジェクト管理 best practices"

Testing

Unit and Integration Tests

# Run fast tests (recommended for development)
uv run pytest -m "not slow"

# Run all tests with coverage
uv run pytest --cov=src

E2E Display Testing

Oboyu includes end-to-end (E2E) display tests that use the Claude Code SDK:

# Run all E2E display tests
python e2e/run_tests.py

# Run specific test category
python e2e/run_tests.py --test search

See our Full Documentation for more details.

Contributing

We welcome contributions! See our Contributing Guidelines for details.

# Quick start for contributors
git clone https://github.com/YOUR_USERNAME/oboyu.git
cd oboyu
uv sync
uv run pytest -m "not slow"

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Acknowledgments

  • The name "Oboyu" (覚ゆ) comes from classical Japanese, meaning "to remember"
  • Built with ❤️ for the Japanese NLP community
  • Inspired by the goal of making knowledge accessible across languages

Made with 🇯🇵 by sonesuke

Download files

Source Distribution

oboyu-0.1.0a1.tar.gz (7.8 MB), uploaded as Source

Built Distribution

oboyu-0.1.0a1-py3-none-any.whl (5.8 kB), uploaded as Python 3

File details

Details for the file oboyu-0.1.0a1.tar.gz.

File metadata

  • Download URL: oboyu-0.1.0a1.tar.gz
  • Size: 7.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for oboyu-0.1.0a1.tar.gz
  • SHA256: 90c3ed53fc82bc597e0a2e3833fc5a5eb76ab9a651117bf3423b877933c2551e
  • MD5: ff704b365dffd83fe090144e2e34ffb7
  • BLAKE2b-256: 43bd9c42efcee3e6fba24edf1aca604505e5aab4c91beb3e02a4770f91388965
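
If you download the source distribution manually, you can check it against the SHA256 digest listed above before installing. The sketch below uses only Python's standard `hashlib` module; the helper names are illustrative, and the file path you pass in is whatever location you saved the archive to.

```python
import hashlib

# SHA256 digest published above for oboyu-0.1.0a1.tar.gz
EXPECTED_SHA256 = "90c3ed53fc82bc597e0a2e3833fc5a5eb76ab9a651117bf3423b877933c2551e"


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """True if the file's SHA-256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_published_hash("oboyu-0.1.0a1.tar.gz")` should return `True` for an intact download of that file.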

Provenance

The following attestation bundles were made for oboyu-0.1.0a1.tar.gz:

Publisher: prerelease.yml on sonesuke/oboyu

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file oboyu-0.1.0a1-py3-none-any.whl.

File metadata

  • Download URL: oboyu-0.1.0a1-py3-none-any.whl
  • Size: 5.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for oboyu-0.1.0a1-py3-none-any.whl
  • SHA256: cab94c3f7c98d5e0c33786ad91503545055dbdef19fc8634bc2502e9233f6951
  • MD5: ab259728bca7dc575532639eed948bcc
  • BLAKE2b-256: 18cbd063d2e06a3fac4277403a3b6b7aea7879c9e9de82be3db8c6ec6a5ff45e

Provenance

The following attestation bundles were made for oboyu-0.1.0a1-py3-none-any.whl:

Publisher: prerelease.yml on sonesuke/oboyu

