A Japanese-enhanced semantic search system for your local documents.
Oboyu (覚ゆ)
Lightning-fast semantic search for your local documents with best-in-class Japanese support.
What is Oboyu?
Oboyu (覚ゆ - "to remember" in ancient Japanese) is a powerful local semantic search engine that helps you instantly find information in your documents using natural language queries. Unlike traditional keyword search, Oboyu understands the meaning behind your questions, making it perfect for finding relevant content even when you don't know the exact terms.
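As a rough illustration of what "understanding meaning" involves, semantic search engines typically turn text into embedding vectors and rank documents by how closely their vectors point in the same direction as the query's. The sketch below uses made-up three-dimensional vectors and hypothetical file names; it is not Oboyu's implementation, and real embedding models produce vectors with hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank(query_vec: list[float], docs: dict[str, list[float]]) -> list[str]:
    """Return document names ordered from most to least similar to the query."""
    return sorted(
        docs,
        key=lambda name: cosine_similarity(query_vec, docs[name]),
        reverse=True,
    )


# Toy "embeddings" (invented numbers, not real model output).
DOCS = {
    "notes/car_maintenance.md": [0.9, 0.1, 0.0],
    "notes/pasta_recipe.md": [0.0, 0.2, 0.9],
}
# Stands in for the embedding of a query such as "how to fix my vehicle".
QUERY = [0.8, 0.2, 0.1]

print(rank(QUERY, DOCS))
```

The query shares no keywords with the file name it matches; the ranking comes entirely from vector direction, which is why semantic search can surface content phrased in different terms.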
Why Oboyu?
- 🚀 Fast: Indexes thousands of documents in seconds, searches in milliseconds
- 🎯 Accurate: Semantic search finds what you mean, not just what you type
- 🇯🇵 Japanese Excellence: First-class support with automatic encoding detection
- 🔒 Private: Everything runs locally - your documents never leave your machine
- 🤖 AI-Ready: Built-in MCP server for Claude, Cursor, and other AI assistants
Quick Start
Prerequisites
- Python 3.11 or higher
- pip (latest version recommended)
- Operating System: Linux, macOS, or Windows with WSL
- For building from source:
  - C++ compiler (build-essential on Linux, Xcode on macOS)
  - CMake (for sentencepiece)
Installation
Get up and running in under 5 minutes:
# Install Oboyu
pip install oboyu
# Index your documents
oboyu index ~/Documents
# Search interactively
oboyu query --interactive
That's it! See our Documentation for complete guides and examples.
Key Features
🔍 Advanced Search Capabilities
- Hybrid Search: Combines semantic understanding with keyword matching for best results
- Multiple Modes: Switch between semantic, keyword, or hybrid search modes
- Smart Reranking: Built-in AI reranker improves result accuracy
- Interactive Mode: Real-time search with command history and auto-suggestions
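To make the hybrid search idea concrete, here is a minimal sketch of one common way to merge a semantic ranking with a keyword ranking: reciprocal rank fusion (RRF). This README does not say which fusion method Oboyu uses, so treat RRF, the function name, and the constant `k = 60` as assumptions chosen for illustration.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists; each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from two retrievers for the same query.
semantic = ["a.md", "b.md", "c.md"]
keyword = ["b.md", "d.md", "a.md"]

fused = reciprocal_rank_fusion([semantic, keyword])
print(fused)
```

Documents that rank well in both lists rise to the top, while documents found by only one retriever still appear further down.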
📚 Document Support
- Text File Support: Plain text (.txt), Markdown (.md), HTML (.html), and source code files (.py, .java, etc.) with automatic encoding detection
- Incremental Indexing: Only process new or changed files for lightning-fast updates
- Smart Chunking: Intelligent document splitting for optimal search results
- Automatic Encoding: Handles various text encodings seamlessly (UTF-8, Shift-JIS, EUC-JP, and more)
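Incremental indexing can be pictured as comparing each file against a record of what was seen on the previous run. The sketch below tracks a SHA-256 content hash per file and reports only new or modified files. The helper names and the choice of hashing (rather than, say, modification times) are assumptions for illustration, not a description of Oboyu's internals.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_changed(root: Path, seen: dict[str, str]) -> list[Path]:
    """Return files under root that are new or modified, and record their digests in `seen`."""
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        digest = file_digest(path)
        if seen.get(str(path)) != digest:
            changed.append(path)
            seen[str(path)] = digest
    return changed


# Demo: the second pass only reports the file that was edited.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("first", encoding="utf-8")
    (root / "b.txt").write_text("second", encoding="utf-8")
    seen: dict[str, str] = {}
    first_pass = [p.name for p in find_changed(root, seen)]
    (root / "b.txt").write_text("second, edited", encoding="utf-8")
    second_pass = [p.name for p in find_changed(root, seen)]

print(first_pass, second_pass)
```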
🇯🇵 Japanese Language Excellence
- Native Support: Purpose-built for Japanese text processing
- Automatic Detection: Detects and handles Shift-JIS, EUC-JP, and UTF-8
- Specialized Models: Optimized embedding models for Japanese content
- Mixed Language: Seamlessly handles Japanese and English in the same document
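One simple way to approach the encoding detection described above is to try strict decoding with each candidate encoding in turn. The sketch below uses only Python's standard codecs; the function name and the trial order are assumptions, and Oboyu's actual detection logic is not documented in this README. The order matters: EUC-JP is tried before Shift-JIS because EUC-JP bytes often also decode "successfully" as Shift-JIS half-width katakana, producing garbage.

```python
# Candidate encodings, tried in order with strict error handling.
CANDIDATES = ("utf-8", "euc_jp", "shift_jis")


def decode_japanese(raw: bytes) -> tuple[str, str]:
    """Return (text, encoding) using the first candidate that decodes cleanly."""
    for encoding in CANDIDATES:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Nothing matched: fall back to UTF-8 and mark undecodable bytes.
    return raw.decode("utf-8", errors="replace"), "utf-8"


sample = "日本語"
for encoding in CANDIDATES:
    text, detected = decode_japanese(sample.encode(encoding))
    print(encoding, "->", detected, text)
```

Trial decoding is a heuristic and can misidentify short or unusual inputs; statistical detectors are more robust when the candidate set is larger.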
🚀 Performance & Integration
- ONNX Acceleration: 2-4x faster with automatic model optimization
- MCP Server: Direct integration with Claude Desktop and AI coding assistants
- Rich CLI: Beautiful terminal interface with progress tracking
- Low Memory: Efficient processing even on modest hardware
Installation
Using UV (Recommended)
uv tool install oboyu
Using pip
pip install oboyu
From Source
git clone https://github.com/sonesuke/oboyu.git
cd oboyu
pip install -e .
System Requirements
- Python: 3.13 or higher
- OS: macOS, Linux (Windows via WSL)
- Memory: 2GB RAM minimum
- Storage: 1GB for models and index
Note: Models are automatically downloaded on first use (~90MB).
Usage Examples
Basic Usage
# Index a directory
oboyu index ~/Documents/notes
# Search your documents
oboyu query "machine learning optimization techniques"
# Interactive mode (recommended!)
oboyu query --interactive
Advanced Examples
# Index only specific file types
oboyu index ~/projects --include "*.md,*.txt"
# Search with filters
oboyu query "API design" --filter "docs/"
# Use semantic search mode
oboyu query "concepts similar to dependency injection" --mode semantic
# Enable reranking for better accuracy
oboyu query "complex technical topic" --rerank
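The `--include "*.md,*.txt"` value in the indexing example reads as a comma-separated list of glob patterns. The sketch below shows how such a value could be interpreted, assuming each pattern is matched against the file name only. The function is hypothetical, and Oboyu's real matching rules may differ (for example, it may match against full paths).

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def matches_include(path: str, include: str) -> bool:
    """True if the file name matches any comma-separated glob pattern in `include`."""
    name = PurePosixPath(path).name
    patterns = [p.strip() for p in include.split(",") if p.strip()]
    # fnmatchcase keeps matching case-sensitive on every platform.
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for candidate in ["docs/readme.md", "notes/todo.txt", "src/main.py"]:
    print(candidate, matches_include(candidate, "*.md,*.txt"))
```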
MCP Server for AI Assistants
# Start MCP server
oboyu mcp
# Or configure in Claude Desktop's settings
See our MCP Integration Guide for detailed setup instructions.
Documentation
🚀 Getting Started
- Installation - Install and verify setup
- Your First Index - Create your first searchable index
- Your First Search - Learn to search effectively
💼 Real-world Usage
- Daily Workflows - Essential daily patterns
- Technical Documentation - Code and API docs
- Meeting Notes - Track decisions and actions
- Research Papers - Academic content search
⚙️ Configuration & Optimization
- Configuration Guide - Customize for your needs
- Performance Tuning - Optimize speed and quality
- Japanese Support - Japanese language features
🔗 Integration & Reference
- Claude MCP Integration - AI-powered search
- CLI Reference - All commands and options
- Troubleshooting - Solutions to common issues
Common Use Cases
📚 Academic Research
Index and search through research notes and references:
oboyu index ~/research --include "*.md,*.txt"
oboyu query "transformer architecture improvements"
💻 Code Documentation
Search through project documentation and code comments:
oboyu index ~/projects/myapp --include "*.md,*.py"
oboyu query "authentication implementation"
📝 Personal Knowledge Base
Organize and search your notes and documents:
oboyu index ~/Documents/notes
oboyu query "meeting notes from last week"
🌏 Multilingual Documents
Perfect for mixed Japanese and English content:
oboyu index ~/Documents/bilingual
oboyu query "プロジェクト管理 best practices"
Testing
Unit and Integration Tests
# Run fast tests (recommended for development)
uv run pytest -m "not slow"
# Run all tests with coverage
uv run pytest --cov=src
E2E Display Testing
Oboyu includes comprehensive E2E display testing using the Claude Code SDK:
# Run all E2E display tests
python e2e/run_tests.py
# Run specific test category
python e2e/run_tests.py --test search
See our Full Documentation for more details.
Contributing
We welcome contributions! See our Contributing Guidelines for details.
# Quick start for contributors
git clone https://github.com/YOUR_USERNAME/oboyu.git
cd oboyu
uv sync
uv run pytest -m "not slow"
Support
- 📋 GitHub Issues - Report bugs or request features
- 📖 Documentation - Comprehensive guides and references
- 💬 Discussions - Ask questions and share ideas
License
This project is licensed under the MIT License - see the LICENSE.md file for details.
Acknowledgments
- The name "Oboyu" (覚ゆ) comes from ancient Japanese, meaning "to remember"
- Built with ❤️ for the Japanese NLP community
- Inspired by the goal of making knowledge accessible across languages
Made with 🇯🇵 by sonesuke