A Japanese-enhanced semantic search system for your local documents.

Oboyu (覚ゆ)

Lightning-fast semantic search for your local documents with best-in-class Japanese support.

What is Oboyu?

Oboyu (覚ゆ, "to remember" in ancient Japanese) is a local semantic search engine that finds information in your documents from natural language queries. Unlike traditional keyword search, Oboyu matches on the meaning behind your question, so it can surface relevant content even when you don't know the exact terms.

Why Oboyu?

  • 🚀 Fast: Indexes thousands of documents in seconds, searches in milliseconds
  • 🎯 Accurate: Semantic search finds what you mean, not just what you type
  • 🇯🇵 Japanese Excellence: First-class support with automatic encoding detection
  • 🔒 Private: Everything runs locally - your documents never leave your machine
  • 🤖 AI-Ready: Built-in MCP server for Claude, Cursor, and other AI assistants

Quick Start

Prerequisites

  • Python 3.11 or higher (3.13 recommended)
  • pip (latest version recommended)
  • Operating System: Linux, macOS, or Windows with WSL
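
A quick way to check an interpreter against the two Python versions named in the prerequisites is sketched below. The function name `check_python` and the three labels are illustrative and not part of Oboyu.

```python
import sys

MINIMUM = (3, 11)      # lowest version the prerequisites list as supported
RECOMMENDED = (3, 13)  # version the prerequisites list as the primary target


def check_python(version=None):
    """Classify a (major, minor) version as unsupported, supported, or recommended."""
    if version is None:
        version = sys.version_info[:2]
    version = tuple(version)
    if version < MINIMUM:
        return "unsupported"
    if version < RECOMMENDED:
        return "supported"
    return "recommended"


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {check_python()}")
```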

System Dependencies (for building from source)

Linux (Ubuntu/Debian):

sudo apt-get install -y \
    git \
    curl \
    build-essential \
    cmake \
    pkg-config \
    libfreetype6-dev \
    libfontconfig1-dev \
    libjpeg-dev \
    libpng-dev \
    zlib1g-dev \
    libssl-dev

Linux (CentOS/RHEL):

sudo yum install -y \
    git \
    curl \
    gcc-c++ \
    cmake \
    pkg-config \
    freetype-devel \
    fontconfig-devel \
    libjpeg-devel \
    libpng-devel \
    zlib-devel \
    openssl-devel

macOS:

# Install Xcode Command Line Tools
xcode-select --install

# Install additional dependencies via Homebrew
brew install cmake pkg-config

Installation

Get up and running in under 5 minutes:

# Install Oboyu
pip install oboyu

# Index your documents
oboyu index ~/Documents

# Search your documents
oboyu search "your search term"

That's it! See our Documentation for complete guides and examples.

Key Features

🔍 Advanced Search Capabilities

  • Hybrid Search: Combines semantic understanding with keyword matching for best results
  • Multiple Modes: Switch between semantic, keyword, or hybrid search modes
  • Smart Reranking: Built-in AI reranker improves result accuracy
  • Flexible Querying: Command-line search with various output formats
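
To illustrate what combining semantic and keyword results can look like, here is a minimal sketch of reciprocal rank fusion, one common way to merge two ranked lists. This README does not say which fusion method Oboyu uses, so treat the technique, the function name, and the constant `k=60` as assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one list.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both searches rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document id for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    semantic_hits = ["a", "b", "c"]  # hypothetical ids from a semantic search
    keyword_hits = ["b", "d", "a"]   # hypothetical ids from a keyword search
    print(reciprocal_rank_fusion([semantic_hits, keyword_hits]))
```

Documents found by both searches ("a" and "b" here) outrank documents found by only one, which is the intuition behind a hybrid mode.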

📚 Document Support

  • Rich Format Support: PDF documents, plain text (.txt), Markdown (.md), HTML (.html), and source code files (.py, .java, etc.)
  • PDF Processing: Full text extraction with metadata preservation from PDF documents
  • Incremental Indexing: Processes only new or changed files for lightning-fast updates
  • Smart Chunking: Intelligent document splitting for optimal search results
  • Automatic Encoding: Handles various text encodings seamlessly (UTF-8, Shift-JIS, EUC-JP, and more)
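
The idea behind incremental indexing can be sketched with content hashes: remember a digest per file and reprocess only the files whose digest differs from the previous run. This is an illustration of the general technique, not Oboyu's actual implementation. The function names and the in-memory `previous` mapping are assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def changed_files(root, previous):
    """Compare files under root with a previous {relative_path: digest} mapping.

    Returns (to_process, current): the new or modified paths, and the
    mapping to keep for the next run.
    """
    current = {}
    to_process = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            key = path.relative_to(root).as_posix()
            current[key] = file_digest(path)
            if previous.get(key) != current[key]:
                to_process.append(key)
    return to_process, current
```

On a first run `previous` is empty and every file is returned. On later runs only files that were added or edited come back, which is what keeps updates fast.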

🇯🇵 Japanese Language Excellence

  • Native Support: Purpose-built for Japanese text processing
  • Automatic Detection: Detects and handles Shift-JIS, EUC-JP, and UTF-8
  • Specialized Models: Optimized embedding models for Japanese content
  • Mixed Language: Seamlessly handles Japanese and English in the same document
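
As a rough illustration of encoding detection for the three encodings listed above, the sketch below tries strict decoding with each candidate in turn and keeps the first that succeeds. How Oboyu detects encodings internally is not described here. The trial order and the function name are assumptions, and real detectors usually add statistical checks because short byte strings can be valid in more than one encoding.

```python
# Candidates in trial order. UTF-8 goes first because it rejects most
# non-UTF-8 input; EUC-JP is tried before Shift-JIS in this sketch.
CANDIDATE_ENCODINGS = ("utf-8", "euc_jp", "shift_jis")


def decode_with_detection(data, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding_name) for the first encoding that decodes data strictly."""
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the input")


if __name__ == "__main__":
    sample = "日本語".encode("shift_jis")
    text, encoding = decode_with_detection(sample)
    print(text, encoding)
```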

🚀 Performance & Integration

  • ONNX Acceleration: 2-4x faster with automatic model optimization
  • MCP Server: Direct integration with Claude Desktop and AI coding assistants
  • Rich CLI: Beautiful terminal interface with progress tracking
  • Low Memory: Efficient processing even on modest hardware

Installation

Using UV (Recommended)

uv tool install oboyu

Using pip

pip install oboyu

From Source

git clone https://github.com/sonesuke/oboyu.git
cd oboyu
pip install -e .

System Requirements

  • Python: 3.11 or higher (3.13 recommended)
  • OS: macOS, Linux (Windows via WSL)
  • Memory: 2GB RAM minimum (4GB recommended)
  • Storage: 1GB for models and index
  • Build Tools: See system dependencies above if building from source

Note: Models are automatically downloaded on first use (~90MB). For installation from PyPI, most system dependencies are not required as we provide pre-built wheels.

Usage Examples

Basic Usage

# Index a directory
oboyu index ~/Documents/notes

# Search your documents
oboyu search "machine learning optimization techniques"

# Get results in JSON format for processing
oboyu search "machine learning" --format json
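
For processing results in a script, the JSON output can be read from Python. The sketch below only assumes that `oboyu` is on your PATH and that the command prints a single JSON document to standard output. The structure of that JSON is not documented in this README, so the code returns the parsed value without assuming any field names. The helper names are illustrative.

```python
import json
import subprocess


def build_search_command(query, output_format="json"):
    """Build the argument list for the search command shown above."""
    return ["oboyu", "search", query, "--format", output_format]


def search(query):
    """Run the search and return the parsed JSON output, whatever its shape."""
    completed = subprocess.run(
        build_search_command(query),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    return json.loads(completed.stdout)
```

Passing the query as a list element rather than building a shell string avoids quoting problems with spaces or Japanese text in the query.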

Advanced Examples

# Index only specific file types
oboyu index ~/projects --include-patterns "*.md,*.txt"

# Search with different modes
oboyu search "API design" --mode vector

# Use semantic search mode
oboyu search "concepts similar to dependency injection" --mode semantic

# Enable reranking for better accuracy
oboyu search "complex technical topic" --rerank

MCP Server for AI Assistants

# Start MCP server
oboyu mcp

# Or configure in Claude Desktop's settings

See our MCP Integration Guide for detailed setup instructions.
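
As a starting point, a Claude Desktop configuration entry for the command above might look like the output of the sketch below. It assumes the usual `mcpServers` layout of Claude Desktop's configuration file, that `oboyu` is on the PATH Claude Desktop sees, and that no extra arguments are needed. The entry name `oboyu` is arbitrary. The MCP Integration Guide remains the authoritative reference.

```python
import json


def claude_desktop_entry(server_name="oboyu"):
    """Build a configuration fragment that launches `oboyu mcp` as an MCP server."""
    return {
        "mcpServers": {
            server_name: {
                "command": "oboyu",  # assumes the executable is on PATH
                "args": ["mcp"],     # the subcommand shown above
            }
        }
    }


print(json.dumps(claude_desktop_entry(), indent=2))
```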

Documentation

📖 View Full Documentation →

Common Use Cases

📚 Academic Research

Index and search through research notes and references:

oboyu index ~/research --include "*.md,*.txt"
oboyu search "transformer architecture improvements"

💻 Code Documentation

Search through project documentation and code comments:

oboyu index ~/projects/myapp --include "*.md,*.py"
oboyu search "authentication implementation"

📝 Personal Knowledge Base

Organize and search your notes and documents:

oboyu index ~/Documents/notes
oboyu search "meeting notes from last week"

🌏 Multilingual Documents

Perfect for mixed Japanese and English content:

oboyu index ~/Documents/bilingual
oboyu search "プロジェクト管理 best practices"

Testing

Unit and Integration Tests

# Run fast tests (recommended for development)
uv run pytest -m "not slow"

# Run all tests with coverage
uv run pytest --cov=src

E2E Display Testing

Oboyu includes comprehensive E2E display testing using the Claude Code SDK:

# Run all E2E display tests
python e2e/run_tests.py

# Run specific test category
python e2e/run_tests.py --test search

See our Full Documentation for more details.

Contributing

We welcome contributions! See our Contributing Guidelines for details.

# Quick start for contributors
git clone https://github.com/YOUR_USERNAME/oboyu.git
cd oboyu
uv sync
uv run pytest -m "not slow"

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Acknowledgments

  • The name "Oboyu" (覚ゆ) comes from ancient Japanese, meaning "to remember"
  • Built with ❤️ for the Japanese NLP community
  • Inspired by the goal of making knowledge accessible across languages

Made with 🇯🇵 by sonesuke

Source Distribution

oboyu-0.1.0a4.tar.gz (270.3 kB view details)

Uploaded Source

Built Distribution

oboyu-0.1.0a4-py3-none-any.whl (396.5 kB view details)

Uploaded Python 3

File details

Details for the file oboyu-0.1.0a4.tar.gz.

File metadata

  • Download URL: oboyu-0.1.0a4.tar.gz
  • Upload date:
  • Size: 270.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for oboyu-0.1.0a4.tar.gz
Algorithm Hash digest
SHA256 94586e6ed834a33b700adaad840ada3bf0aeafb96a145fcabc4bfa16c76344a1
MD5 5154017cbcc632f88de88a1120bd7b28
BLAKE2b-256 129d4b62881915df4655dfe9e71f96b84b4e64d563f08dfe36e2ca39f898ab54
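
A downloaded file can be checked against the published SHA256 digest with the Python standard library. This is a minimal sketch; the function names are illustrative and the local file path is whatever you saved the download as.

```python
import hashlib

# SHA256 digest published above for oboyu-0.1.0a4.tar.gz
EXPECTED_SDIST_SHA256 = "94586e6ed834a33b700adaad840ada3bf0aeafb96a145fcabc4bfa16c76344a1"


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Return True when the file's digest equals the published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published("oboyu-0.1.0a4.tar.gz", EXPECTED_SDIST_SHA256)` returns True only if the local file is byte-for-byte identical to the published one.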

Provenance

The following attestation bundles were made for oboyu-0.1.0a4.tar.gz:

Publisher: prerelease.yml on sonesuke/oboyu

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file oboyu-0.1.0a4-py3-none-any.whl.

File metadata

  • Download URL: oboyu-0.1.0a4-py3-none-any.whl
  • Upload date:
  • Size: 396.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for oboyu-0.1.0a4-py3-none-any.whl
Algorithm Hash digest
SHA256 0155463d1f827ac14587f9d9e08eb21f3479ec89ce81212ad282f287150de085
MD5 b1a13923e0a6a7d75ebf7c1fe3ccb8ae
BLAKE2b-256 bd2084a2132875d67688a14eb0813281e151189e1fc1cd6b337b24a02d42597f

Provenance

The following attestation bundles were made for oboyu-0.1.0a4-py3-none-any.whl:

Publisher: prerelease.yml on sonesuke/oboyu

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
