A Japanese-enhanced semantic search system for your local documents.

Oboyu (覚ゆ)

License: MIT

Lightning-fast semantic search for your local documents with best-in-class Japanese support.

What is Oboyu?

Oboyu (覚ゆ, "to remember" in ancient Japanese) is a local semantic search engine that finds information in your documents from natural language queries. Unlike traditional keyword search, Oboyu matches on the meaning of a query, so it can surface relevant content even when you don't know the exact terms a document uses.

Why Oboyu?

  • 🚀 Fast: Indexes thousands of documents in seconds, searches in milliseconds
  • 🎯 Accurate: Semantic search finds what you mean, not just what you type
  • 🇯🇵 Japanese Excellence: First-class support with automatic encoding detection
  • 🔒 Private: Everything runs locally - your documents never leave your machine
  • 🤖 AI-Ready: Built-in MCP server for Claude, Cursor, and other AI assistants

Quick Start

Prerequisites

  • Python 3.11 or higher
  • pip (latest version recommended)
  • Operating System: Linux, macOS, or Windows with WSL
  • For building from source:
    • C++ compiler (build-essential on Linux, Xcode on macOS)
    • CMake (for sentencepiece)

Installation

Get up and running in under 5 minutes:

# Install Oboyu
pip install oboyu

# Index your documents
oboyu index ~/Documents

# Search interactively
oboyu query --interactive

That's it! See our Documentation for complete guides and examples.

Key Features

🔍 Advanced Search Capabilities

  • Hybrid Search: Combines semantic understanding with keyword matching for best results
  • Multiple Modes: Switch between semantic, keyword, or hybrid search modes
  • Smart Reranking: Built-in AI reranker improves result accuracy
  • Interactive Mode: Real-time search with command history and auto-suggestions
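
As an illustration of how a hybrid mode can merge a semantic ranking with a keyword ranking, here is a minimal sketch of reciprocal rank fusion (RRF). This README does not say which fusion method Oboyu uses, so treat RRF as a stand-in technique; the function name, the document IDs, and the constant `k=60` are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, not an
    Oboyu setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers for the same query.
semantic_hits = ["notes.md", "design.md", "faq.md"]
keyword_hits = ["design.md", "changelog.md", "notes.md"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

Documents that rank well in both lists (`design.md`, `notes.md`) rise above documents that appear in only one, which is the behavior a hybrid mode is after.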

📚 Document Support

  • Rich Format Support: PDF documents, plain text (.txt), Markdown (.md), HTML (.html), and source code files (.py, .java, etc.)
  • PDF Processing: Full text extraction with metadata preservation from PDF documents
  • Incremental Indexing: Only process new or changed files for lightning-fast updates
  • Smart Chunking: Intelligent document splitting for optimal search results
  • Automatic Encoding: Handles various text encodings seamlessly (UTF-8, Shift-JIS, EUC-JP, and more)
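
One common way to implement incremental indexing is to keep a manifest of content hashes and reprocess only files whose hash has changed. The sketch below shows that idea in isolation; it is not Oboyu's implementation, and `file_digest`, `changed_files`, and the manifest layout are hypothetical names chosen for the example.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_files(root, manifest):
    """Compare files under root against a {relative_path: digest} manifest.

    Returns (to_index, new_manifest): the relative paths that are new or
    modified since the manifest was written, and the refreshed manifest.
    """
    root = Path(root)
    new_manifest = {}
    to_index = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        key = path.relative_to(root).as_posix()
        digest = file_digest(path)
        new_manifest[key] = digest
        # A missing key (new file) or a different digest (edited file)
        # both mean the file needs to be processed again.
        if manifest.get(key) != digest:
            to_index.append(key)
    return to_index, new_manifest
```

On the first run the manifest is empty, so every file is returned; on later runs only edited or added files come back. Files deleted since the last run are the keys present in the old manifest but absent from the new one.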

🇯🇵 Japanese Language Excellence

  • Native Support: Purpose-built for Japanese text processing
  • Automatic Detection: Detects and handles Shift-JIS, EUC-JP, and UTF-8
  • Specialized Models: Optimized embedding models for Japanese content
  • Mixed Language: Seamlessly handles Japanese and English in the same document
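
To make the encoding handling concrete, here is a minimal sketch of detection by trial decoding using only Python's built-in codecs. How Oboyu actually detects encodings is not described in this README, so this is a simplified stand-in; the function name `decode_japanese` and the candidate order are assumptions.

```python
def decode_japanese(data, encodings=("utf-8", "euc_jp", "shift_jis")):
    """Decode bytes by trying candidate encodings in order.

    Returns (text, encoding_used). UTF-8 and EUC-JP are tried before
    Shift-JIS because they are stricter about which byte sequences are
    valid, so they are less likely to accept the wrong input by accident.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Nothing matched cleanly: fall back to a lossy UTF-8 decode.
    return data.decode("utf-8", errors="replace"), "utf-8 (lossy)"


sample = "日本語のテキスト"
for source_encoding in ("utf-8", "shift_jis", "euc_jp"):
    text, detected = decode_japanese(sample.encode(source_encoding))
    print(source_encoding, "->", detected, text == sample)
```

Trial decoding is a heuristic: short or unusual byte sequences can be valid in more than one encoding, which is why production tools often add statistical detection on top.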

🚀 Performance & Integration

  • ONNX Acceleration: 2-4x faster with automatic model optimization
  • MCP Server: Direct integration with Claude Desktop and AI coding assistants
  • Rich CLI: Beautiful terminal interface with progress tracking
  • Low Memory: Efficient processing even on modest hardware

Installation

Using UV (Recommended)

uv tool install oboyu

Using pip

pip install oboyu

From Source

git clone https://github.com/sonesuke/oboyu.git
cd oboyu
pip install -e .

System Requirements

  • Python: 3.13 or higher
  • OS: macOS, Linux (Windows via WSL)
  • Memory: 2GB RAM minimum
  • Storage: 1GB for models and index

Note: Models are automatically downloaded on first use (~90MB).

Usage Examples

Basic Usage

# Index a directory
oboyu index ~/Documents/notes

# Search your documents
oboyu query "machine learning optimization techniques"

# Interactive mode (recommended!)
oboyu query --interactive

Advanced Examples

# Index only specific file types
oboyu index ~/projects --include "*.md,*.txt"

# Search with filters
oboyu query "API design" --filter "docs/"

# Use semantic search mode
oboyu query "concepts similar to dependency injection" --mode semantic

# Enable reranking for better accuracy
oboyu query "complex technical topic" --rerank
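
If you want to drive these commands from a script, a small wrapper can assemble the argument list. The sketch below uses only the flags shown above (`--mode`, `--filter`, `--rerank`); the helper name `build_query_command` is hypothetical, and whether the flags can all be combined in one call is an assumption to check against `oboyu query --help`.

```python
import shlex


def build_query_command(query, mode=None, path_filter=None, rerank=False):
    """Build the argv list for an `oboyu query` invocation."""
    cmd = ["oboyu", "query", query]
    if mode is not None:
        cmd += ["--mode", mode]
    if path_filter is not None:
        cmd += ["--filter", path_filter]
    if rerank:
        cmd.append("--rerank")
    return cmd


cmd = build_query_command(
    "API design", mode="semantic", path_filter="docs/", rerank=True
)
# Print the shell-quoted form for inspection.
print(shlex.join(cmd))
# To execute it (requires oboyu on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell-quoting problems with queries that contain spaces or Japanese text.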

MCP Server for AI Assistants

# Start MCP server
oboyu mcp

# Or configure in Claude Desktop's settings

See our MCP Integration Guide for detailed setup instructions.
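
For orientation, Claude Desktop registers MCP servers under an `mcpServers` key in its JSON configuration file. The sketch below generates such an entry for the `oboyu mcp` command shown above. The server label `"oboyu"` is an arbitrary choice, and any extra arguments Oboyu may need (for example, a database path) are not covered here; the MCP Integration Guide remains the authoritative source.

```python
import json

# Assumed minimal entry: launch `oboyu mcp` with no extra arguments.
config = {
    "mcpServers": {
        "oboyu": {
            "command": "oboyu",
            "args": ["mcp"],
        }
    }
}

# Print the JSON to merge into Claude Desktop's configuration file.
print(json.dumps(config, indent=2))
```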

Documentation

🚀 Getting Started

💼 Real-world Usage

⚙️ Configuration & Optimization

🔗 Integration & Reference

📖 View Full Documentation →

Common Use Cases

📚 Academic Research

Index and search through research notes and references:

oboyu index ~/research --include "*.md,*.txt"
oboyu query "transformer architecture improvements"

💻 Code Documentation

Search through project documentation and code comments:

oboyu index ~/projects/myapp --include "*.md,*.py"
oboyu query "authentication implementation"

📝 Personal Knowledge Base

Organize and search your notes and documents:

oboyu index ~/Documents/notes
oboyu query "meeting notes from last week"

🌏 Multilingual Documents

Perfect for mixed Japanese and English content:

oboyu index ~/Documents/bilingual
oboyu query "プロジェクト管理 best practices"

Testing

Unit and Integration Tests

# Run fast tests (recommended for development)
uv run pytest -m "not slow"

# Run all tests with coverage
uv run pytest --cov=src

E2E Display Testing

Oboyu includes comprehensive E2E display testing using the Claude Code SDK:

# Run all E2E display tests
python e2e/run_tests.py

# Run specific test category
python e2e/run_tests.py --test search

See our Full Documentation for more details.

Contributing

We welcome contributions! See our Contributing Guidelines for details.

# Quick start for contributors
git clone https://github.com/YOUR_USERNAME/oboyu.git
cd oboyu
uv sync
uv run pytest -m "not slow"

Support

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Acknowledgments

  • The name "Oboyu" (覚ゆ) comes from ancient Japanese, meaning "to remember"
  • Built with ❤️ for the Japanese NLP community
  • Inspired by the goal of making knowledge accessible across languages

Made with 🇯🇵 by sonesuke

Download files

Source Distribution

oboyu-0.1.0a3.tar.gz (231.1 kB view details)

Uploaded Source

Built Distribution

oboyu-0.1.0a3-py3-none-any.whl (341.1 kB view details)

Uploaded Python 3

File details

Details for the file oboyu-0.1.0a3.tar.gz.

File metadata

  • Download URL: oboyu-0.1.0a3.tar.gz
  • Upload date:
  • Size: 231.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for oboyu-0.1.0a3.tar.gz
Algorithm Hash digest
SHA256 5654c50265349d8c72909d7e1452cc7501f910ca720d4097b91602424f39a3c7
MD5 ed719bb179e4035207b0fd2cef932aad
BLAKE2b-256 afde45b279ce87c6fe5fc5e885157bba9b6875f3c2be4bf5cea194b20e76427b
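
A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib`. This is a minimal sketch; the helper names `sha256_of` and `verify` are chosen for the example, and the local file path is whatever you saved the download as.

```python
import hashlib

# SHA256 digest listed above for oboyu-0.1.0a3.tar.gz.
EXPECTED_SHA256 = "5654c50265349d8c72909d7e1452cc7501f910ca720d4097b91602424f39a3c7"


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA-256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()


# Example (path is wherever the sdist was saved):
#   verify("oboyu-0.1.0a3.tar.gz")
```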

Provenance

The following attestation bundles were made for oboyu-0.1.0a3.tar.gz:

Publisher: prerelease.yml on sonesuke/oboyu

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file oboyu-0.1.0a3-py3-none-any.whl.

File metadata

  • Download URL: oboyu-0.1.0a3-py3-none-any.whl
  • Upload date:
  • Size: 341.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for oboyu-0.1.0a3-py3-none-any.whl
Algorithm Hash digest
SHA256 d5ec11e843410b9a2755058adf87f72c5c8ba9754a281008ab8be0caa7658613
MD5 12e3295324c18761f8b2cff5f1d3ac4c
BLAKE2b-256 f8a575dbace23d4d04820f23a964465f7eacecc024792957939ee4ece68bcc96

Provenance

The following attestation bundles were made for oboyu-0.1.0a3-py3-none-any.whl:

Publisher: prerelease.yml on sonesuke/oboyu

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
