Smart full-text and semantic search for your local documents
ownsearch
Smart local search with full-text search (SQLite FTS5) and semantic search (embeddings via ollama). Zero external dependencies — Python stdlib only.
Installation
pipx install /path/to/ownsearch
# or from the project directory:
pipx install .
Initial setup
# Configure ollama (if not running on localhost:11434)
ownsearch config set ollama_url http://your-ollama-host:11434
# Configure embedding model (default: bge-m3)
ownsearch config set embed_model bge-m3
# Configure database path (default: ~/.ownsearch.db)
ownsearch config set db_path /custom/path.db
# Add directories to index
ownsearch add-dir ~/Documents/notes
ownsearch add-dir ~/workspace/project
# Show current configuration
ownsearch config show
Configuration is stored in ~/.config/ownsearch/config.json.
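As an illustration of how these settings fit together, here is a minimal sketch that reads the config file and falls back to the defaults listed above. The key names match the ones used with ownsearch config set and the extensions field; the flat JSON layout and the helper name load_config are assumptions, not ownsearch's actual code.

```python
import json
from pathlib import Path

# Defaults as documented in this README. The flat key/value layout of the
# file is an assumption made for this sketch.
DEFAULTS = {
    "ollama_url": "http://localhost:11434",
    "embed_model": "bge-m3",
    "db_path": "~/.ownsearch.db",
    "extensions": [".md", ".txt", ".org", ".rst"],
}


def load_config(path="~/.config/ownsearch/config.json"):
    """Return the defaults overlaid with whatever the config file defines."""
    config = dict(DEFAULTS)
    config_path = Path(path).expanduser()
    if config_path.is_file():
        # Keys present in the file win over the defaults.
        config.update(json.loads(config_path.read_text(encoding="utf-8")))
    return config
```

Calling load_config() with no arguments reads the default location; a missing file simply yields the defaults.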
Usage
# Index (incremental — only new/modified/deleted files)
ownsearch index
# Force full re-index
ownsearch index --full
# Full-text search (fast, literal)
ownsearch search "kubernetes cilium"
# Semantic search (finds related content even with different wording)
ownsearch search --semantic "network security"
# Combined search (FTS + semantic, deduplicated)
ownsearch search --both "migration strategy"
# Filter results by directory
ownsearch search --dir ~/workspace/project "deploy"
# JSON output (for integration with other tools/agents)
ownsearch search --json "query"
# Limit results
ownsearch search --limit 5 "query"
# Show status
ownsearch status
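For integration with other tools, the commands above can be wrapped in a small helper. The sketch below uses only the flags documented in this section; the function names and the mode argument are hypothetical, and the JSON payload is returned as-is because its exact schema is not described here.

```python
import json
import subprocess


def build_search_command(query, mode="fts", limit=None, directory=None):
    """Build the argv for `ownsearch search --json` from documented flags."""
    command = ["ownsearch", "search", "--json"]
    if mode == "semantic":
        command.append("--semantic")
    elif mode == "both":
        command.append("--both")
    elif mode != "fts":  # "fts" means no extra flag: plain full-text search
        raise ValueError(f"unknown mode: {mode!r}")
    if limit is not None:
        command += ["--limit", str(limit)]
    if directory is not None:
        command += ["--dir", str(directory)]
    command.append(query)
    return command


def search(query, **options):
    """Run the search and return the decoded JSON payload unchanged."""
    result = subprocess.run(
        build_search_command(query, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

For example, search("migration strategy", mode="both", limit=5) runs the same command as ownsearch search --json --both --limit 5 "migration strategy".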
Directory management
ownsearch add-dir PATH # Add a directory to the index
ownsearch remove-dir PATH # Remove a directory and its data from the index
ownsearch list-dirs # List indexed directories
Smart behavior
- Auto-pull models: If ollama is reachable but the embedding model is missing, it pulls it automatically during indexing.
- Incremental indexing: By default, only processes files whose mtime/size changed since the last run. Deleted files are cleaned up automatically.
- Graceful degradation: If ollama is unavailable, FTS5 search still works (semantic search is skipped).
- Smart chunking: Splits by markdown headings. Large files are partitioned into ~4000 char chunks while preserving heading context.
- Retry with backoff: Embedding requests retry on failure with exponential backoff to handle transient server issues.
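To make the chunking idea concrete, here is a rough sketch of heading-based splitting with a size cap. It is not ownsearch's implementation: the function names, the paragraph-based packing, and the exact handling of the 4000-character limit are assumptions chosen for illustration.

```python
import re

# ATX-style markdown heading: one to six '#' characters followed by text.
HEADING = re.compile(r"^#{1,6}\s+\S")


def split_sections(text):
    """Yield (heading, body) pairs; text before the first heading gets ''."""
    heading, body = "", []
    for line in text.splitlines():
        if HEADING.match(line):
            if heading or any(part.strip() for part in body):
                yield heading, "\n".join(body).strip()
            heading, body = line.strip(), []
        else:
            body.append(line)
    if heading or any(part.strip() for part in body):
        yield heading, "\n".join(body).strip()


def chunk_markdown(text, max_chars=4000):
    """Return chunks of roughly max_chars, each prefixed with its heading."""
    chunks = []
    for heading, body in split_sections(text):
        prefix = heading + "\n" if heading else ""
        if len(body) <= max_chars:
            chunks.append(prefix + body)
            continue
        # Oversized section: pack whole paragraphs greedily into pieces.
        # A single paragraph longer than max_chars is kept intact.
        piece = ""
        for paragraph in body.split("\n\n"):
            if piece and len(piece) + len(paragraph) + 2 > max_chars:
                chunks.append(prefix + piece)
                piece = ""
            piece = f"{piece}\n\n{paragraph}" if piece else paragraph
        if piece:
            chunks.append(prefix + piece)
    return chunks
```

Repeating the heading at the top of every piece is what keeps the heading context attached when a long section is partitioned. This sketch does not special-case '#' lines inside fenced code blocks, which a real implementation would likely need to handle.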
Supported file types
Default: .md, .txt, .org, .rst
Configurable in ~/.config/ownsearch/config.json (extensions field).
Requirements
- Python >= 3.10 (stdlib only, no external packages)
- ollama (optional, for semantic search)
Why bge-m3?
The default embedding model is bge-m3 (~1.2GB). It was chosen after benchmarking against nomic-embed-text, mxbai-embed-large, and snowflake-arctic-embed2 on a real multilingual corpus (Spanish/English mixed documents). Results:
- nomic-embed-text: Essentially useless for non-English content — returned random results for Spanish queries.
- mxbai-embed-large: Good scores but introduced noise on technical queries (e.g., kubernetes results mixed with unrelated content).
- snowflake-arctic-embed2: Precise results but lower overall scores.
- bge-m3: Best balance — top results were consistently correct for both Spanish and English queries, with clean ranking and no noise.
You can change the model with ownsearch config set embed_model <model>. Embeddings are automatically invalidated and regenerated on the next index run when the model changes.
Using ownsearch from AI coding agents (skills)
ownsearch is the retrieval half of a RAG pipeline: instead of building a separate vector-DB stack, you expose this CLI to your coding agent as a skill, so the agent knows when to search your indexed docs (instead of grepping blindly) and how to call the tool. The --json output is designed for exactly this.
Claude Code, opencode, and Pi all support the Agent Skills standard: a SKILL.md Markdown file with name + description frontmatter. The same skill works in all three — only the install location and invocation differ.
The skill file
Create ownsearch/SKILL.md:
---
name: ownsearch
description: Search the user's locally indexed documentation with hybrid full-text + semantic search. Use this BEFORE grepping or guessing when a question is likely answered in the indexed docs — how something is deployed, configured or operated, infra details, runbooks, past decisions.
---
# ownsearch — local hybrid documentation search
`ownsearch` (already in PATH) searches the user's indexed docs with FTS5 (lexical)
+ semantic embeddings. Reach for it when an answer probably lives in the corpus.
## How to search
Prefer hybrid search with JSON output so you can parse hits programmatically:
ownsearch search --json --both "your query here"
- `--both` combine lexical + semantic, deduplicated (best default)
- `--semantic` semantic only (related content with different wording)
- (no flag) fast literal FTS5 only
- `--dir PATH` scope to one indexed directory
- `--limit N` cap results
- `--json` machine-readable hits (file path + chunk); always use from a tool flow
Each JSON hit gives the source file path and the matching chunk. Open the file to
get full context before answering — this is retrieval only; reason over the results
yourself, don't treat a single chunk as the whole answer.
## Keeping the index fresh
If results look stale or a recently edited doc is missing:
ownsearch index # incremental
ownsearch status # DB size, indexed dirs, chunk/embedding counts, ollama health
## Discover what's indexed
ownsearch list-dirs
Where to put it, per agent
| Agent | Location (user-level) | Project-level | Invocation |
|---|---|---|---|
| Claude Code | ~/.claude/skills/ownsearch/SKILL.md | .claude/skills/ownsearch/SKILL.md | auto-discovered; or /ownsearch |
| opencode | ~/.config/opencode/skills/ownsearch/SKILL.md | .opencode/skills/ownsearch/SKILL.md | auto-discovered |
| Pi | ~/.pi/agent/skills/ownsearch/SKILL.md | n/a | /skill:ownsearch, or auto-discovered |
Claude Code also accepts a flat ~/.claude/skills/ownsearch.md (no subdirectory). The ownsearch/SKILL.md directory form is the portable one that works across all three agents.
To avoid permission prompts on every call, allowlist the read-only commands in your
agent's settings — e.g. for Claude Code add Bash(ownsearch search:*) and
Bash(ownsearch status:*) to permissions.allow.
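If you would rather script that change than edit the file by hand, the following sketch appends the two rules to permissions.allow without duplicating existing entries. The helper names are hypothetical, and the settings path used in the usage note (~/.claude/settings.json) is an assumption about your setup; check where your agent actually keeps its settings before running it.

```python
import json
from pathlib import Path

# The two read-only rules suggested above.
RULES = ["Bash(ownsearch search:*)", "Bash(ownsearch status:*)"]


def allow_rules(settings, rules=RULES):
    """Return a copy of settings with rules added to permissions.allow."""
    updated = json.loads(json.dumps(settings))  # deep copy of JSON data
    allow = updated.setdefault("permissions", {}).setdefault("allow", [])
    for rule in rules:
        if rule not in allow:  # keep the operation idempotent
            allow.append(rule)
    return updated


def update_settings_file(path):
    """Read, update, and rewrite a JSON settings file at the given path."""
    settings_path = Path(path).expanduser()
    settings = {}
    if settings_path.is_file():
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(
        json.dumps(allow_rules(settings), indent=2) + "\n", encoding="utf-8"
    )
```

Under the path assumption stated above, update_settings_file("~/.claude/settings.json") applies the change; running it twice leaves the file unchanged the second time.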
opencode/Pi alternative: a slash command
If you prefer an explicit command over an auto-discovered skill, both opencode
(~/.config/opencode/commands/ownsearch.md) and Claude Code support command-style
Markdown where the filename becomes /ownsearch. A skill is usually better here
because the agent invokes it on its own when a question matches the description.
Troubleshooting
HTTP Error 500 / some chunks never get embeddings
A 500 during ownsearch index usually comes from the ollama embedding server, not
ownsearch. Two distinct causes:
- Transient (server busy, model briefly evicted from VRAM, OOM): ownsearch retries with backoff, and any file whose embeddings failed is automatically re-indexed on the next `ownsearch index` run (it is not marked as up-to-date).
- Permanent / content-specific: some embedding models (notably bge-m3 under ollama) emit `NaN` for certain token sequences, and ollama then returns `failed to encode response: json: unsupported value: NaN` (HTTP 500). Retrying never helps, so ownsearch skips just that chunk (logged as "Skipping unembeddable chunk") and leaves it FTS-searchable but not semantic. The rest of the file is unaffected.
Chunks that are missing an embedding (excluding short ones, which are skipped by design) stay searchable via plain FTS5, so tracking them down is rarely worth the effort. If a specific important chunk is affected, lightly rewording it (e.g. changing its punctuation) usually sidesteps the model's NaN.
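The difference between the two failure modes can be sketched as follows. This is an illustration rather than ownsearch's code: the function name, the UnembeddableChunk exception, the attempt count, and the delays are all assumptions. Only the NaN error text comes from the description above.

```python
import time
import urllib.error


class UnembeddableChunk(Exception):
    """The server reported a content-specific failure; retrying cannot help."""


def embed_with_retry(embed, text, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call embed(text), retrying transient failures with exponential backoff.

    `embed` is any callable that sends the text to the embedding server and
    raises urllib.error.HTTPError / URLError on failure.
    """
    for attempt in range(attempts):
        try:
            return embed(text)
        except urllib.error.HTTPError as error:
            detail = error.read().decode("utf-8", "replace")
            if "unsupported value: NaN" in detail:
                # Permanent: the model produced NaN for this content.
                raise UnembeddableChunk(text[:60]) from error
            if attempt == attempts - 1:
                raise  # transient error, but out of attempts
        except urllib.error.URLError:
            if attempt == attempts - 1:
                raise
        # Wait 1x, 2x, 4x, ... base_delay before the next attempt.
        sleep(base_delay * 2 ** attempt)
```

A caller would catch UnembeddableChunk, log the skip, and continue with the next chunk, while any other exception marks the file as failed so that it is picked up again on the next index run.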
License
This project is licensed under the GNU General Public License v3.0 — see LICENSE for details.