Smart full-text and semantic search for your local documents

ownsearch

Smart local search with full-text search (SQLite FTS5) and semantic search (embeddings via ollama). Zero external dependencies — Python stdlib only.

Installation

pipx install /path/to/ownsearch
# or from the project directory:
pipx install .

Initial setup

# Configure ollama (if not running on localhost:11434)
ownsearch config set ollama_url http://your-ollama-host:11434

# Configure embedding model (default: bge-m3)
ownsearch config set embed_model bge-m3

# Configure database path (default: ~/.ownsearch.db)
ownsearch config set db_path /custom/path.db

# Add directories to index
ownsearch add-dir ~/Documents/notes
ownsearch add-dir ~/workspace/project

# Show current configuration
ownsearch config show

Configuration is stored in ~/.config/ownsearch/config.json.
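
As a rough illustration of how another tool might read this file, the sketch below overlays the JSON on top of the defaults mentioned in this README. It assumes the file uses the same key names as `ownsearch config set` (`ollama_url`, `embed_model`, `db_path`, plus `extensions`) and that the default URL uses plain `http`; `load_config` is a hypothetical helper, not part of ownsearch.

```python
import json
from pathlib import Path

# Defaults as described in this README. The key names inside the JSON file
# and the "http://" scheme are assumptions made for this sketch.
DEFAULTS = {
    "ollama_url": "http://localhost:11434",
    "embed_model": "bge-m3",
    "db_path": "~/.ownsearch.db",
    "extensions": [".md", ".txt", ".org", ".rst"],
}

CONFIG_PATH = Path("~/.config/ownsearch/config.json").expanduser()


def load_config(path: Path = CONFIG_PATH) -> dict:
    """Return the defaults overlaid with whatever the JSON file defines."""
    config = dict(DEFAULTS)
    if path.exists():
        with path.open(encoding="utf-8") as handle:
            config.update(json.load(handle))
    return config
```

Calling `load_config()` on a machine without a config file simply returns the defaults, which mirrors the "works out of the box" behavior described above.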

Usage

# Index (incremental — only new/modified/deleted files)
ownsearch index

# Force full re-index
ownsearch index --full

# Full-text search (fast, literal)
ownsearch search "kubernetes cilium"

# Semantic search (finds related content even with different wording)
ownsearch search --semantic "network security"

# Combined search (FTS + semantic, deduplicated)
ownsearch search --both "migration strategy"

# Filter results by directory
ownsearch search --dir ~/workspace/project "deploy"

# JSON output (for integration with other tools/agents)
ownsearch search --json "query"

# Limit results
ownsearch search --limit 5 "query"

# Show status
ownsearch status
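
For callers that want to drive the CLI from a script, a minimal sketch might look like the following. It uses only the flags listed above; `build_search_command` and `run_search` are hypothetical helper names, combining `--dir` and `--limit` with the mode flags is an assumption, and the shape of the JSON output is not specified in this README, so the result is returned as decoded without further interpretation.

```python
import json
import subprocess


def build_search_command(query, mode="fts", directory=None, limit=None):
    """Assemble an argv list using only the flags documented above."""
    command = ["ownsearch", "search", "--json"]
    if mode == "semantic":
        command.append("--semantic")
    elif mode == "both":
        command.append("--both")
    elif mode != "fts":
        raise ValueError(f"unknown mode: {mode!r}")
    if directory is not None:
        command += ["--dir", str(directory)]
    if limit is not None:
        command += ["--limit", str(limit)]
    command.append(query)
    return command


def run_search(query, **options):
    """Run the CLI and decode its JSON output (schema is not assumed here)."""
    result = subprocess.run(
        build_search_command(query, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

Passing the arguments as a list avoids shell quoting problems when the query contains spaces or quotes.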

Directory management

ownsearch add-dir PATH      # Add a directory to the index
ownsearch remove-dir PATH   # Remove a directory and its data from the index
ownsearch list-dirs         # List indexed directories

Smart behavior

Auto-pull models: If ollama is reachable but the embedding model is missing, ownsearch pulls it automatically during indexing.
Incremental indexing: By default, only processes files whose mtime/size changed since the last run. Deleted files are cleaned up automatically.
Graceful degradation: If ollama is unavailable, FTS5 search still works (semantic search is skipped).
Smart chunking: Splits by markdown headings. Large files are partitioned into ~4000 char chunks while preserving heading context.
Retry with backoff: Embedding requests retry on failure with exponential backoff to handle transient server issues.
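
To make the chunking idea concrete, here is a minimal sketch of heading-aware splitting. It is not ownsearch's actual implementation: the function names, the exact heading pattern, and the plain character slicing are all assumptions, and a real chunker would likely avoid cutting mid-word and would ignore `#` lines inside code fences.

```python
import re

MAX_CHARS = 4000  # approximate target size quoted above

# ATX-style markdown heading: one to six '#' followed by whitespace and text.
HEADING = re.compile(r"^#{1,6}\s+\S")


def split_by_headings(text):
    """Yield (heading, body) pairs; text before the first heading gets ''."""
    heading, lines = "", []
    for line in text.splitlines():
        if HEADING.match(line):
            if lines:
                yield heading, "\n".join(lines)
            heading, lines = line.strip(), []
        else:
            lines.append(line)
    if lines:
        yield heading, "\n".join(lines)


def chunk_markdown(text, max_chars=MAX_CHARS):
    """Split each section body into pieces of at most max_chars characters,
    prefixing every piece with its heading so the context is preserved."""
    chunks = []
    for heading, body in split_by_headings(text):
        body = body.strip()
        if not body:
            continue  # a heading with no content produces no chunk
        for start in range(0, len(body), max_chars):
            piece = body[start:start + max_chars]
            chunks.append(f"{heading}\n{piece}" if heading else piece)
    return chunks
```

Repeating the heading on every piece is what lets a fragment from the middle of a long section still match a query about that section's topic.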

Supported file types

Default: .md, .txt, .org, .rst

Configurable in ~/.config/ownsearch/config.json (extensions field).

Requirements

Python >= 3.10 (stdlib only, no external packages)
ollama (optional, for semantic search)

Why bge-m3?

The default embedding model is bge-m3 (~1.2GB). It was chosen after benchmarking against nomic-embed-text, mxbai-embed-large, and snowflake-arctic-embed2 on a real multilingual corpus (Spanish/English mixed documents). Results:

nomic-embed-text: Essentially useless for non-English content — returned random results for Spanish queries.
mxbai-embed-large: Good scores but introduced noise on technical queries (e.g., kubernetes results mixed with unrelated content).
snowflake-arctic-embed2: Precise results but lower overall scores.
bge-m3: Best balance — top results were consistently correct for both Spanish and English queries, with clean ranking and no noise.

You can change the model with ownsearch config set embed_model <model>. Embeddings are automatically invalidated and regenerated on the next index run when the model changes.

Using ownsearch from AI coding agents (skills)

ownsearch is the retrieval half of a RAG pipeline. Instead of building a separate vector-DB stack, you expose this CLI to your coding agent as a skill, so the agent knows when to search your indexed docs (instead of grepping blindly) and how to call the tool. The --json output is designed for exactly this.

Claude Code, opencode, and Pi all support the Agent Skills standard: a SKILL.md Markdown file with name + description frontmatter. The same skill works in all three — only the install location and invocation differ.

The skill file

Create ownsearch/SKILL.md:

---
name: ownsearch
description: Search the user's locally indexed documentation with hybrid full-text + semantic search. Use this BEFORE grepping or guessing when a question is likely answered in the indexed docs — how something is deployed, configured or operated, infra details, runbooks, past decisions.
---

# ownsearch — local hybrid documentation search

`ownsearch` (already in PATH) searches the user's indexed docs with FTS5 (lexical)
+ semantic embeddings. Reach for it when an answer probably lives in the corpus.

## How to search

Prefer hybrid search with JSON output so you can parse hits programmatically:

    ownsearch search --json --both "your query here"

- `--both`     combine lexical + semantic, deduplicated (best default)
- `--semantic` semantic only (related content with different wording)
- (no flag)    fast literal FTS5 only
- `--dir PATH` scope to one indexed directory
- `--limit N`  cap results
- `--json`     machine-readable hits (file path + chunk); always use from a tool flow

Each JSON hit gives the source file path and the matching chunk. Open the file to
get full context before answering — this is retrieval only; reason over the results
yourself, don't treat a single chunk as the whole answer.

## Keeping the index fresh

If results look stale or a recently edited doc is missing:

    ownsearch index     # incremental
    ownsearch status    # DB size, indexed dirs, chunk/embedding counts, ollama health

## Discover what's indexed

    ownsearch list-dirs

Where to put it, per agent

Agent	Location (user-level)	Project-level	Invocation
Claude Code	`~/.claude/skills/ownsearch/SKILL.md`	`.claude/skills/ownsearch/SKILL.md`	auto-discovered; or `/ownsearch`
opencode	`~/.config/opencode/skills/ownsearch/SKILL.md`	`.opencode/skills/ownsearch/SKILL.md`	auto-discovered
Pi	`~/.pi/agent/skills/ownsearch/SKILL.md`	—	`/skill:ownsearch`, or auto-discovered

Claude Code also accepts a flat ~/.claude/skills/ownsearch.md (no subdirectory). The ownsearch/SKILL.md directory form is the portable one that works across all three agents.

To avoid permission prompts on every call, allowlist the read-only commands in your agent's settings — e.g. for Claude Code add Bash(ownsearch search:*) and Bash(ownsearch status:*) to permissions.allow.
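
A small sketch of merging those two rules into an existing settings structure is shown below. The rule strings and the `permissions.allow` key come from the paragraph above; `add_allow_rules` is a hypothetical helper, and where the settings file lives (commonly `~/.claude/settings.json`) is an assumption you should check against your agent's documentation.

```python
import json

# Rule strings quoted from the paragraph above.
OWNSEARCH_RULES = ["Bash(ownsearch search:*)", "Bash(ownsearch status:*)"]


def add_allow_rules(settings, rules=OWNSEARCH_RULES):
    """Return a copy of settings with rules appended to permissions.allow."""
    merged = json.loads(json.dumps(settings))  # cheap deep copy of JSON data
    allow = merged.setdefault("permissions", {}).setdefault("allow", [])
    for rule in rules:
        if rule not in allow:  # keep existing entries, avoid duplicates
            allow.append(rule)
    return merged


# Show the fragment that would result from an empty settings file.
print(json.dumps(add_allow_rules({}), indent=2))
```

Only the read-only commands are allowlisted here; `ownsearch index` is deliberately left out so that writes to the index still prompt.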

opencode/Pi alternative: a slash command

If you prefer an explicit command over an auto-discovered skill, both opencode (~/.config/opencode/commands/ownsearch.md) and Claude Code support command-style Markdown where the filename becomes /ownsearch. A skill is usually better here because the agent invokes it on its own when a question matches the description.

Troubleshooting

`HTTP Error 500` / some chunks never get embeddings

A 500 during ownsearch index usually comes from the ollama embedding server, not ownsearch. Two distinct causes:

Transient (server busy, model briefly evicted from VRAM, OOM): ownsearch retries with backoff, and any file whose embeddings failed is automatically re-indexed on the next ownsearch index run (it is not marked as up-to-date).
Permanent / content-specific: some embedding models (notably bge-m3 under ollama) emit NaN for certain token sequences, and ollama then returns `failed to encode response: json: unsupported value: NaN` (HTTP 500). Retrying never helps, so ownsearch skips just that chunk (logged as "Skipping unembeddable chunk") and leaves it searchable via FTS but not via semantic search. The rest of the file is unaffected.
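
The two cases above can be sketched as a retry loop that backs off on transient errors and gives up immediately on permanent ones. This is an illustration under stated assumptions, not ownsearch's code: `PermanentEmbedError`, `embed_with_retry`, the attempt count, and the delays are all hypothetical, and `embed` stands for any callable that talks to the embedding server.

```python
import time


class PermanentEmbedError(Exception):
    """Hypothetical marker for failures that retrying cannot fix (e.g. NaN)."""


def embed_with_retry(embed, chunk, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Return an embedding, or None when the chunk should be skipped.

    `embed` is any callable that returns a vector, raises PermanentEmbedError
    for content-specific failures, and raises OSError for transient ones
    (urllib's HTTPError and URLError are both OSError subclasses).
    """
    for attempt in range(attempts):
        try:
            return embed(chunk)
        except PermanentEmbedError:
            return None  # skip this chunk; it stays searchable via FTS
        except OSError:
            if attempt == attempts - 1:
                raise  # caller can leave the file marked as not up-to-date
            sleep(base_delay * 2 ** attempt)  # delays double: 1s, 2s, 4s, ...
```

Separating the two error types is the important design point: backing off on a NaN failure only wastes time, while skipping a chunk on a busy server would silently lose its embedding.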

Chunks that are missing an embedding (excluding short ones, which are skipped by design) stay searchable via plain FTS5, so this is rarely worth chasing. If a specific important chunk is affected, lightly rewording it (e.g. changing its punctuation) usually sidesteps the model's NaN.

License

This project is licensed under the GNU General Public License v3.0 — see LICENSE for details.

Project details

Maintainers

millaguie

Release history

This version

0.1.0

Jun 26, 2026

Download files

Source Distribution

ownsearch-0.1.0.tar.gz (26.4 kB)

Uploaded Jun 26, 2026 Source

Built Distribution

ownsearch-0.1.0-py3-none-any.whl (26.8 kB)

Uploaded Jun 26, 2026 Python 3

File details

Details for the file ownsearch-0.1.0.tar.gz.

File metadata

Download URL: ownsearch-0.1.0.tar.gz
Upload date: Jun 26, 2026
Size: 26.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ownsearch-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`73f83fd11768656af0dab7b2cab0d420f7c26afccc45638508474c4a10c7b10c`
MD5	`b24c5fd6fce5a100ee60e082959788ec`
BLAKE2b-256	`9edb8abf6722d6e99e2850abcb24d4c99e3cf70ccce8cdbbfb81e1721da43472`
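
If you want to check a downloaded archive against the digest above, a minimal sketch with the standard library could look like this. The helper names `sha256_of` and `verify` are hypothetical, and the local file path is whatever you saved the download as.

```python
import hashlib

# SHA256 digest listed above for ownsearch-0.1.0.tar.gz.
EXPECTED_SHA256 = "73f83fd11768656af0dab7b2cab0d420f7c26afccc45638508474c4a10c7b10c"


def sha256_of(path, block_size=1 << 16):
    """Hash a file in fixed-size blocks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 matches the published digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("ownsearch-0.1.0.tar.gz")` should return True for an unmodified copy of the source distribution.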

Provenance

The following attestation bundles were made for ownsearch-0.1.0.tar.gz:

Publisher: publish.yml on millaguie/ownsearch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ownsearch-0.1.0.tar.gz
- Subject digest: 73f83fd11768656af0dab7b2cab0d420f7c26afccc45638508474c4a10c7b10c
- Sigstore transparency entry: 1968759535
- Sigstore integration time: Jun 26, 2026
Source repository:
- Permalink: millaguie/ownsearch@456a4b5d86f6374bda684f96af7469b4c6947926
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/millaguie
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@456a4b5d86f6374bda684f96af7469b4c6947926
- Trigger Event: release

File details

Details for the file ownsearch-0.1.0-py3-none-any.whl.

File metadata

Download URL: ownsearch-0.1.0-py3-none-any.whl
Upload date: Jun 26, 2026
Size: 26.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ownsearch-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`640a9ece4eaf8d6d24dbda43f3990b0c328f51b6fd44fc44c091a3f12794cd7e`
MD5	`97ff10efd9193223396444c9853b1bfa`
BLAKE2b-256	`f52e2719b943cc89f8c5ad7399dc8965f03ac44ff8d3681c358af5912d5bd712`

Provenance

The following attestation bundles were made for ownsearch-0.1.0-py3-none-any.whl:

Publisher: publish.yml on millaguie/ownsearch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ownsearch-0.1.0-py3-none-any.whl
- Subject digest: 640a9ece4eaf8d6d24dbda43f3990b0c328f51b6fd44fc44c091a3f12794cd7e
- Sigstore transparency entry: 1968759612
- Sigstore integration time: Jun 26, 2026
Source repository:
- Permalink: millaguie/ownsearch@456a4b5d86f6374bda684f96af7469b4c6947926
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/millaguie
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@456a4b5d86f6374bda684f96af7469b4c6947926
- Trigger Event: release
