A Japanese-enhanced semantic search system for your local documents.

Oboyu (覚ゆ)

License: MIT

Lightning-fast semantic search for your local documents with best-in-class Japanese support.

What is Oboyu?

Oboyu (覚ゆ - "to remember" in ancient Japanese) is a local semantic search engine that finds information in your documents from natural language queries. Unlike traditional keyword search, it matches on the meaning of a query rather than its exact wording, so it can surface relevant content even when you don't know the exact terms.

Why Oboyu?

  • 🚀 Fast: Indexes thousands of documents in seconds, searches in milliseconds
  • 🎯 Accurate: Semantic search finds what you mean, not just what you type
  • 🇯🇵 Japanese Excellence: First-class support with automatic encoding detection
  • 🔒 Private: Everything runs locally - your documents never leave your machine
  • 🤖 AI-Ready: Built-in MCP server for Claude, Cursor, and other AI assistants

Quick Start

Prerequisites

  • Python 3.11 or higher
  • pip (latest version recommended)
  • Operating System: Linux, macOS, or Windows with WSL
  • For building from source:
    • C++ compiler (build-essential on Linux, Xcode on macOS)
    • CMake (for sentencepiece)

Installation

Get up and running in under 5 minutes:

# Install Oboyu
pip install oboyu

# Index your documents
oboyu index ~/Documents

# Search interactively
oboyu query --interactive

That's it! See our Documentation for complete guides and examples.

Key Features

🔍 Advanced Search Capabilities

  • Hybrid Search: Combines semantic understanding with keyword matching for best results
  • Multiple Modes: Switch between semantic, keyword, or hybrid search modes
  • Smart Reranking: Built-in AI reranker improves result accuracy
  • Interactive Mode: Real-time search with command history and auto-suggestions
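
This README does not say how Oboyu merges semantic and keyword results. One common way to combine two ranked lists is reciprocal rank fusion (RRF). The sketch below illustrates that general technique with made-up document names; the function name and the constant `k=60` are assumptions for illustration, not Oboyu's actual implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default for RRF.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a semantic and a keyword retriever.
semantic_hits = ["notes.md", "design.md", "api.md"]
keyword_hits = ["api.md", "notes.md", "changelog.md"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

Documents that appear near the top of both lists rise to the front, while documents found by only one retriever still make it into the fused ranking.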

📚 Document Support

  • Text File Support: Plain text (.txt), Markdown (.md), HTML (.html), and source code files (.py, .java, etc.) with automatic encoding detection
  • Incremental Indexing: Only process new or changed files for lightning-fast updates
  • Smart Chunking: Intelligent document splitting for optimal search results
  • Automatic Encoding: Handles various text encodings seamlessly (UTF-8, Shift-JIS, EUC-JP, and more)
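
To make "only process new or changed files" concrete, here is a minimal sketch of one way incremental indexing can be done: compare each file's SHA-256 digest against a manifest saved from the previous run. The helper names and the manifest layout are hypothetical and are not taken from Oboyu's source.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_changed_files(root, manifest):
    """Return files under root whose digest differs from the manifest.

    manifest maps relative path -> digest from the previous run and is
    updated in place, so a second call reports only later changes.
    """
    changed = []
    root = Path(root)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        key = path.relative_to(root).as_posix()
        digest = file_digest(path)
        if manifest.get(key) != digest:
            manifest[key] = digest
            changed.append(key)
    return changed


# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.md").write_text("first note", encoding="utf-8")
    (Path(tmp) / "b.txt").write_text("second note", encoding="utf-8")
    manifest = {}
    first_run = find_changed_files(tmp, manifest)   # both files are new
    (Path(tmp) / "b.txt").write_text("second note, edited", encoding="utf-8")
    second_run = find_changed_files(tmp, manifest)  # only b.txt changed
    print(first_run, second_run)
```

A real indexer would also need to drop manifest entries for deleted files; that step is left out here to keep the sketch short.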

🇯🇵 Japanese Language Excellence

  • Native Support: Purpose-built for Japanese text processing
  • Automatic Detection: Detects and handles Shift-JIS, EUC-JP, and UTF-8
  • Specialized Models: Optimized embedding models for Japanese content
  • Mixed Language: Seamlessly handles Japanese and English in the same document
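
The README does not describe how encoding detection is implemented. As a rough illustration of the idea, the sketch below tries a fixed list of codecs in order and keeps the first one that decodes without error. This ordered trial decoding is a deliberately simple stand-in; dedicated detection libraries use statistical methods, and the function name and candidate order here are assumptions.

```python
# Order matters: UTF-8 is the strictest, so it is tried first.
CANDIDATE_ENCODINGS = ("utf-8", "euc_jp", "shift_jis")


def decode_with_fallback(raw, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding) using the first codec that decodes cleanly."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Last resort: keep going with replacement characters rather than fail.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


sample = "日本語"
results = [decode_with_fallback(sample.encode(enc)) for enc in CANDIDATE_ENCODINGS]
for text, encoding in results:
    print(encoding, text)
```

Short or ambiguous byte sequences can decode cleanly under more than one codec, so trial decoding is best treated as a first pass rather than a guarantee.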

🚀 Performance & Integration

  • ONNX Acceleration: 2-4x faster with automatic model optimization
  • MCP Server: Direct integration with Claude Desktop and AI coding assistants
  • Rich CLI: Beautiful terminal interface with progress tracking
  • Low Memory: Efficient processing even on modest hardware

Installation

Using UV (Recommended)

uv tool install oboyu

Using pip

pip install oboyu

From Source

git clone https://github.com/sonesuke/oboyu.git
cd oboyu
pip install -e .

System Requirements

  • Python: 3.13 or higher
  • OS: macOS, Linux (Windows via WSL)
  • Memory: 2GB RAM minimum
  • Storage: 1GB for models and index

Note: Models are automatically downloaded on first use (~90MB).

Usage Examples

Basic Usage

# Index a directory
oboyu index ~/Documents/notes

# Search your documents
oboyu query "machine learning optimization techniques"

# Interactive mode (recommended!)
oboyu query --interactive

Advanced Examples

# Index only specific file types
oboyu index ~/projects --include "*.md,*.txt"

# Search with filters
oboyu query "API design" --filter "docs/"

# Use semantic search mode
oboyu query "concepts similar to dependency injection" --mode semantic

# Enable reranking for better accuracy
oboyu query "complex technical topic" --rerank

MCP Server for AI Assistants

# Start MCP server
oboyu mcp

# Or configure in Claude Desktop's settings

See our MCP Integration Guide for detailed setup instructions.
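
Claude Desktop reads MCP server definitions from the `mcpServers` object in its `claude_desktop_config.json` file. The sketch below prints an entry that launches the `oboyu mcp` command. The server label "oboyu" is arbitrary, and the entry assumes the `oboyu` executable is on your PATH and needs no further arguments, which this README does not confirm; check the MCP Integration Guide for the supported options.

```python
import json

# "oboyu" is an arbitrary label; the command assumes the oboyu executable is
# on PATH and that `oboyu mcp` needs no further arguments.
config = {
    "mcpServers": {
        "oboyu": {
            "command": "oboyu",
            "args": ["mcp"],
        }
    }
}

snippet = json.dumps(config, indent=2)
print(snippet)
```

Merge the printed object into any existing `mcpServers` entries instead of overwriting the file, then restart Claude Desktop so it picks up the change.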

Documentation

🚀 Getting Started

💼 Real-world Usage

⚙️ Configuration & Optimization

🔗 Integration & Reference

📖 View Full Documentation →

Common Use Cases

📚 Academic Research

Index and search through research notes and references:

oboyu index ~/research --include "*.md,*.txt"
oboyu query "transformer architecture improvements"

💻 Code Documentation

Search through project documentation and code comments:

oboyu index ~/projects/myapp --include "*.md,*.py"
oboyu query "authentication implementation"

📝 Personal Knowledge Base

Organize and search your notes and documents:

oboyu index ~/Documents/notes
oboyu query "meeting notes from last week"

🌏 Multilingual Documents

Perfect for mixed Japanese and English content:

oboyu index ~/Documents/bilingual
oboyu query "プロジェクト管理 best practices"

Testing

Unit and Integration Tests

# Run fast tests (recommended for development)
uv run pytest -m "not slow"

# Run all tests with coverage
uv run pytest --cov=src

E2E Display Testing

Oboyu includes E2E display tests built on the Claude Code SDK:

# Run all E2E display tests
python e2e/run_tests.py

# Run specific test category
python e2e/run_tests.py --test search

See our Full Documentation for more details.

Contributing

We welcome contributions! See our Contributing Guidelines for details.

# Quick start for contributors
git clone https://github.com/YOUR_USERNAME/oboyu.git
cd oboyu
uv sync
uv run pytest -m "not slow"

Support

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Acknowledgments

  • The name "Oboyu" (覚ゆ) comes from ancient Japanese, meaning "to remember"
  • Built with ❤️ for the Japanese NLP community
  • Inspired by the goal of making knowledge accessible across languages

Made with 🇯🇵 by sonesuke


Download files

Source Distribution

oboyu-0.1.0a2.tar.gz (226.2 kB view details)

Uploaded Source

Built Distribution

oboyu-0.1.0a2-py3-none-any.whl (335.0 kB view details)

Uploaded Python 3

File details

Details for the file oboyu-0.1.0a2.tar.gz.

File metadata

  • Download URL: oboyu-0.1.0a2.tar.gz
  • Upload date:
  • Size: 226.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for oboyu-0.1.0a2.tar.gz
Algorithm Hash digest
SHA256 0cc8c63d61c871faba13aea89fd0a4abf0c744a4dcdf6921e6a4d7044a66ed52
MD5 a29203dbfbe23cbae1f58a1e64a42102
BLAKE2b-256 69b94b8f419b1e7da6bcb00f01d13808b7f02ef3d9d37c6424c3625f86cd391c
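
A downloaded file can be checked against a published SHA256 value with Python's standard `hashlib` module. The following is a minimal sketch; the helper names are hypothetical, and the demonstration hashes a temporary file instead of the real archive so that it runs anywhere.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large archives need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA-256 equals the published hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


# Demonstration with a stand-in file; for a real check, pass the path of the
# downloaded archive and the SHA256 value listed for it.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "example.tar.gz"
    payload = b"stand-in archive contents"
    archive.write_bytes(payload)
    expected = hashlib.sha256(payload).hexdigest()
    ok = matches_published_hash(archive, expected)
    mismatch = matches_published_hash(archive, "0" * 64)
    print(ok, mismatch)
```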

Provenance

The following attestation bundles were made for oboyu-0.1.0a2.tar.gz:

Publisher: prerelease.yml on sonesuke/oboyu

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file oboyu-0.1.0a2-py3-none-any.whl.

File metadata

  • Download URL: oboyu-0.1.0a2-py3-none-any.whl
  • Upload date:
  • Size: 335.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for oboyu-0.1.0a2-py3-none-any.whl
Algorithm Hash digest
SHA256 f21342b3e573b4e75dd88980a9b575db5106167c4c958b75dd3249a9608d746e
MD5 558af575ebe3cf024a2a1e19705312e7
BLAKE2b-256 c55a17515a2e31f2f4b871c55b65b8ec0040f960731669537c3bd16db76ecd00

Provenance

The following attestation bundles were made for oboyu-0.1.0a2-py3-none-any.whl:

Publisher: prerelease.yml on sonesuke/oboyu

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
