Semantic grep for codebases - local-first, SQLite-backed, with local or cloud embeddings
ogrep
Semantic grep for codebases — local-first, SQLite-backed, and built for Claude Code.
ogrep lets you search your codebase by meaning, not just keywords.
It builds a tiny local index (.ogrep/index.sqlite by default) and uses embeddings to answer questions like:
- "where is authentication handled?"
- "how are API errors mapped to exceptions?"
- "where do we open DB connections and run queries?"
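As a rough illustration of what searching by meaning involves, the sketch below ranks code chunks by cosine similarity between a query vector and each chunk vector. The vectors are made-up three-dimensional toys, and `cosine` and `rank` are hypothetical helper names. This is not ogrep's actual code, and real embeddings have hundreds of dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, chunks, top_n=10):
    """Return chunk names sorted by descending similarity to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in chunks.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]


# Toy vectors (assumed values, for illustration only).
chunks = {
    "auth/login.py:1-30": [0.9, 0.1, 0.0],
    "db/pool.py:1-30": [0.0, 0.2, 0.9],
    "api/errors.py:1-30": [0.1, 0.9, 0.1],
}
query = [0.8, 0.2, 0.1]  # stand-in for the embedding of an auth question
print(rank(query, chunks, top_n=2))
```

The authentication chunk ranks first because its vector points in nearly the same direction as the query, even though no keyword comparison takes place.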
Embedding Providers
Choose your embedding source:
| Provider | Cost | Privacy | Setup |
|---|---|---|---|
| OpenAI API | $0.02/M tokens | Cloud | Just add OPENAI_API_KEY |
| LM Studio (local) | Free | 100% local | Run lms server start |
# OpenAI (cloud)
export OPENAI_API_KEY="sk-..."
ogrep index . -m small
# LM Studio (local, free, offline)
export OGREP_BASE_URL=http://localhost:1234/v1
ogrep index . -m nomic
Both work identically — same CLI, same index format, same queries.
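Both setups point at an OpenAI-style `/v1` base URL, which is presumably what lets one CLI serve both. A minimal sketch of building an embeddings request against either endpoint, assuming the standard OpenAI embeddings request shape. `build_embedding_request` is a hypothetical helper, not part of ogrep, the model identifier your local server expects may differ, and nothing is sent over the network here.

```python
import json
import urllib.request

OPENAI_BASE_URL = "https://api.openai.com/v1"


def build_embedding_request(texts, model, env):
    """Build (but do not send) an OpenAI-compatible embeddings request."""
    base_url = env.get("OGREP_BASE_URL", OPENAI_BASE_URL).rstrip("/")
    headers = {"Content-Type": "application/json"}
    api_key = env.get("OPENAI_API_KEY")
    if api_key:  # local servers typically need no key
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embeddings", data=body, headers=headers, method="POST"
    )


local = build_embedding_request(
    ["def connect(): ..."],
    "nomic-embed-text-v1.5",  # assumed identifier, taken from the model table
    {"OGREP_BASE_URL": "http://localhost:1234/v1"},
)
print(local.full_url)
```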
Why ogrep?
Local-first & simple
- Index lives in one SQLite file (per repo, or per profile)
- Designed to be fast to start and easy to reset
- No external services required (with local models)
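To make the single-file idea concrete, here is a minimal sketch of keeping chunks and their embeddings in SQLite. The `chunks` table and its columns are assumptions for illustration. ogrep's real schema may well differ.

```python
import sqlite3
from array import array


def open_index(path=":memory:"):
    """Open (or create) a toy index database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS chunks (
               path TEXT NOT NULL,
               start_line INTEGER NOT NULL,
               end_line INTEGER NOT NULL,
               embedding BLOB NOT NULL,
               PRIMARY KEY (path, start_line)
           )"""
    )
    return conn


def put_chunk(conn, path, start, end, vector):
    """Store one chunk; the vector is packed as 32-bit floats."""
    blob = array("f", vector).tobytes()
    conn.execute(
        "INSERT OR REPLACE INTO chunks VALUES (?, ?, ?, ?)",
        (path, start, end, blob),
    )


def get_vector(conn, path, start):
    """Load a chunk's vector back as a list of floats."""
    row = conn.execute(
        "SELECT embedding FROM chunks WHERE path = ? AND start_line = ?",
        (path, start),
    ).fetchone()
    vec = array("f")
    vec.frombytes(row[0])
    return list(vec)


conn = open_index()
put_chunk(conn, "app/db.py", 1, 30, [0.5, 0.25, 0.125])
print(get_vector(conn, "app/db.py", 1))
```

With everything in one file, resetting an index of this shape amounts to deleting that file.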
Built for real dev workflows
- Smart embedding reuse: unchanged files skipped; only changed chunks re-embedded
- Source-only defaults: reduces noise, avoids indexing junk
- Auto-tuning: finds optimal chunk size for your codebase
Two ways to use it
| Method | Best For |
|---|---|
| CLI (pip/pipx) | Terminal users, CI/CD, scripts |
| Claude Code Plugin | If you live in Claude Code (recommended) |
Note: This repo is primarily a Claude Code Skill + Marketplace plugin integration — not an MCP server. If you want MCP for other clients, see Optional Extras.
Installation
Option A: pip / pipx (CLI users)
# Install with pipx (isolated environment)
pipx install ogrep
# Or with pip
pip install ogrep
Option B: Claude Code Marketplace + Plugin
# Add the marketplace
/plugin marketplace add gplv2/ogrep-marketplace
# Install the plugin
/plugin install ogrep@ogrep-marketplace
Optional Extras
pip install "ogrep[speed]" # Faster scoring with numpy
pip install "ogrep[mcp]" # MCP server support
Quick Start
With OpenAI
export OPENAI_API_KEY="sk-..."
ogrep index . # Index current directory
ogrep query "where is auth handled?" -n 10 # Semantic search
ogrep status # Check index stats
With LM Studio (Local, Free)
# 1. Install LM Studio from https://lmstudio.ai
# 2. Download and load a model
lms get nomic-embed-text-v1.5 -y
lms load nomic-ai/nomic-embed-text-v1.5-GGUF -y
lms server start
# 3. Point ogrep to local server
export OGREP_BASE_URL=http://localhost:1234/v1
# 4. Index and query
ogrep index . -m nomic
ogrep query "database connection handling" -m nomic
See LOCAL_EMBEDDINGS_GUIDE.md for detailed setup and tuning.
CLI Commands
| Command | Description |
|---|---|
| ogrep index . | Index current directory |
| ogrep query "text" -n 10 | Semantic search |
| ogrep status | Show index statistics |
| ogrep reset -f | Delete index |
| ogrep reindex . | Rebuild from scratch |
| ogrep clean --vacuum | Remove stale entries |
| ogrep models | List available embedding models |
| ogrep tune . | Auto-tune chunk size for your codebase |
| ogrep benchmark . | Compare all models (accuracy, speed, settings) |
Embedding Models
OpenAI Models (Cloud)
| Model | Alias | Dimensions | Price | Best For |
|---|---|---|---|---|
| text-embedding-3-small | small | 1536 | $0.02/M | Most use cases (default) |
| text-embedding-3-large | large | 3072 | $0.13/M | High-accuracy, multi-language |
| text-embedding-ada-002 | ada | 1536 | $0.10/M | Legacy compatibility |
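For a rough sense of cloud cost, the per-million prices in the table above translate to dollars as tokens / 1,000,000 × price. A small sketch of that arithmetic follows. The token count is an assumed figure, not a measurement of any real codebase.

```python
PRICE_PER_MILLION = {  # USD per million tokens, from the table above
    "small": 0.02,
    "large": 0.13,
    "ada": 0.10,
}


def embedding_cost(tokens, alias):
    """Estimated USD cost of embedding `tokens` tokens with a model alias."""
    return tokens / 1_000_000 * PRICE_PER_MILLION[alias]


# Assumed example: a codebase that chunks into about 2.5 million tokens.
for alias in PRICE_PER_MILLION:
    print(alias, round(embedding_cost(2_500_000, alias), 3))
```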
Local Models (via LM Studio)
| Model | Alias | Dimensions | Optimal Chunks | Accuracy | Notes |
|---|---|---|---|---|---|
| all-MiniLM-L6-v2 | minilm | 384 | 30 lines | 96% | Best accuracy, smallest (~25MB) |
| nomic-embed-text-v1.5 | nomic | 768 | 90 lines | 72% | Larger context windows |
| bge-base-en-v1.5 | bge | 768 | 30 lines | 52% | Fallback option |
| bge-m3 | bge-m3 | 1024 | 60 lines | TBD | Multi-lingual (100+ languages) |
# Use model alias (minilm auto-selected when OGREP_BASE_URL is set)
ogrep index . -m minilm
# Or set environment for persistent config
export OGREP_BASE_URL=http://localhost:1234/v1
ogrep index . # Auto-uses minilm
Important: The query model must match the index model. Use ogrep status to check.
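Query and index models need to match because vectors from different models are not comparable, and often do not even have the same length. A hedged sketch of the kind of guard this implies, using the dimensions listed in the model tables. `check_compatible` is a hypothetical name, not an ogrep function.

```python
DIMENSIONS = {
    "small": 1536, "large": 3072, "ada": 1536,
    "minilm": 384, "nomic": 768, "bge": 768, "bge-m3": 1024,
}


def check_compatible(index_model, query_model):
    """Raise ValueError unless the query uses the model the index was built with."""
    if index_model != query_model:
        raise ValueError(
            f"index was built with {index_model!r} "
            f"({DIMENSIONS[index_model]} dims) but query uses {query_model!r} "
            f"({DIMENSIONS[query_model]} dims); reindex or pass the matching model"
        )


check_compatible("nomic", "nomic")  # same model: no error
try:
    check_compatible("nomic", "minilm")
except ValueError as err:
    print(err)
```

The check compares model names and not dimensions, since two models with the same vector length (nomic and bge are both 768) still produce unrelated vector spaces.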
Smart Defaults
ogrep is optimized for source code search out of the box.
Source-Only Indexing
By default, ogrep indexes only source files and excludes:
| Category | Examples |
|---|---|
| Docs | *.md, *.txt, *.rst, docs/* |
| Config | *.json, *.yaml, *.toml, .editorconfig |
| Secrets | .env, secrets.*, credentials.* |
| Build | dist/*, build/*, *.min.js |
| Binary | Images, fonts, media, archives, databases |
| Lock files | package-lock.json, yarn.lock, poetry.lock |
Skipped directories: .git/, node_modules/, .venv/, __pycache__/, .ogrep/
Smart Embedding Reuse
ogrep minimizes API costs with intelligent incremental indexing:
$ ogrep index .
Indexed into .ogrep/index.sqlite
Files: 3 indexed, 42 skipped
Chunks: 12 total (9 reused, ~900 tokens saved)
| Edit Pattern | Without Reuse | With Reuse | Savings |
|---|---|---|---|
| Edit 1 line in 300-line file | 5 embeds | 1 embed | 80% |
| Append function to file | 5 embeds | 1 embed | 80% |
| No changes | 5 embeds | 0 embeds | 100% |
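One way to get savings like those in the table is to key cached embeddings by a hash of each chunk's content. The sketch below assumes fixed 60-line chunks and SHA-256 content hashes. Both are illustrative assumptions and not a description of ogrep's internals, and `plan_embeds` is a hypothetical helper.

```python
import hashlib


def chunk_lines(text, size=60):
    """Split text into consecutive chunks of `size` lines each."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]


def digest(chunk):
    """Content hash used as the cache key for a chunk."""
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def plan_embeds(old_text, new_text, size=60):
    """Return (to_embed, reused) chunk lists for the new version of a file."""
    cached = {digest(c) for c in chunk_lines(old_text, size)}
    to_embed, reused = [], []
    for chunk in chunk_lines(new_text, size):
        (reused if digest(chunk) in cached else to_embed).append(chunk)
    return to_embed, reused


original = "\n".join(f"line {n}" for n in range(1, 301))  # 300 lines, 5 chunks
edited = original.replace("line 150\n", "line 150  # edited\n")
to_embed, reused = plan_embeds(original, edited)
print(len(to_embed), len(reused))
```

Editing one line invalidates only the chunk that contains it, so one of five chunks is re-embedded, which is the 80% saving shown in the first table row.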
Auto-Tuning
Different models and codebases have different optimal chunk sizes. Find yours:
ogrep tune . -m nomic
Testing chunk size 30... accuracy=0.32 (2/5 hits)
Testing chunk size 45... accuracy=0.56 (4/5 hits)
Testing chunk size 60... accuracy=0.36 (3/5 hits)
Testing chunk size 90... accuracy=0.72 (5/5 hits) <-- OPTIMAL
Testing chunk size 120... accuracy=0.68 (5/5 hits)
Recommended chunk size: 90 lines
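The selection step can be sketched as a simple argmax over the measured accuracies. How ogrep measures accuracy and breaks ties is not described here, so the tie rule below (prefer the smaller chunk size) is an assumption, and `best_chunk_size` is a hypothetical name.

```python
# Accuracy per chunk size, copied from the sample output above.
results = {30: 0.32, 45: 0.56, 60: 0.36, 90: 0.72, 120: 0.68}


def best_chunk_size(results):
    """Pick the chunk size with the highest accuracy; ties go to the smaller size."""
    return max(results, key=lambda size: (results[size], -size))


print(best_chunk_size(results))  # 90
```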
Save & Apply Tuning Results
# Just save for later (writes to .env)
ogrep tune . -m nomic --save
# Reindex immediately with optimal settings
ogrep tune . -m nomic --apply
# Both: save AND reindex
ogrep tune . -m nomic --save --apply
The OGREP_CHUNK_LINES environment variable persists your tuned value.
Model Benchmarking
Compare all available models to find the best one for your codebase:
ogrep benchmark . -s 10
RESULTS BY MODEL
--------------------------------------------------------------------------------
Model Dims Chunk/Overlap Accuracy Index Query
--------------------------------------------------------------------------------
minilm 384 30 / 5 0.96 0.89s 0.01s *
nomic 768 90 / 15 0.72 1.87s 0.01s
bge 768 30 / 10 0.52 1.65s 0.01s
large 3072 30 / 15 0.52 3.12s 0.03s
small 1536 45 / 15 0.48 2.34s 0.02s
--------------------------------------------------------------------------------
RECOMMENDATIONS
================================================================================
* BEST OVERALL: minilm
Accuracy: 96% | Speed: 0.89s | Cost: FREE
Optimal: 30-line chunks, 5-line overlap
* BEST CLOUD: large
Accuracy: 52% | Speed: 3.12s | Cost: $0.13/M tokens
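The Chunk/Overlap column pairs a window size with the number of lines shared between neighbouring chunks. A minimal sketch of line-based chunking with overlap, assuming a plain sliding window. `chunk_with_overlap` is a hypothetical helper and ogrep's own chunker may behave differently at file boundaries.

```python
def chunk_with_overlap(lines, size=30, overlap=5):
    """Yield (start_line, end_line, text); consecutive windows share `overlap` lines."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    start = 0
    while start < len(lines):
        end = min(start + size, len(lines))
        yield start + 1, end, "\n".join(lines[start:end])  # 1-based, inclusive
        if end == len(lines):
            break
        start += step


lines = [f"line {n}" for n in range(1, 71)]  # a 70-line file
print([(s, e) for s, e, _ in chunk_with_overlap(lines)])
# [(1, 30), (26, 55), (51, 70)]
```

The overlap means a function that straddles a chunk boundary still appears whole in at least one window, at the price of embedding some lines twice.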
Benchmark Options
ogrep benchmark . --local-only # Only test local models
ogrep benchmark . --cloud-only # Only test OpenAI models
ogrep benchmark . --save # Save optimal settings to .env
ogrep benchmark . --json # Output as JSON
ogrep benchmark . -v # Verbose per-configuration results
File Filtering
Include Normally-Excluded Files
# Include markdown files
ogrep index . -i '*.md'
# Include multiple patterns
ogrep index . -i '*.md' -i '*.json'
Add Extra Exclusions
# Exclude test files
ogrep index . -e 'test_*' -e '*_test.py'
# Exclude specific directories
ogrep index . -e 'fixtures/*' -e 'mocks/*'
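The interplay of default exclusions, -i, and -e can be pictured with glob matching from Python's standard `fnmatch` module. This is only a sketch: the precedence shown (extra excludes win, then includes override the defaults) is an assumption, the default list is abbreviated, and `should_index` is a hypothetical name.

```python
from fnmatch import fnmatch

# Abbreviated subset of the default exclusions listed earlier.
DEFAULT_EXCLUDES = ["*.md", "*.txt", "*.json", "*.yaml", ".env", "*.min.js", "dist/*"]


def should_index(path, include=(), exclude=()):
    """Decide whether a relative path gets indexed.

    Assumed precedence: extra excludes (-e) win, then includes (-i)
    override the defaults, then the default excludes apply.
    """
    name = path.rsplit("/", 1)[-1]

    def matches(patterns):
        # Match against both the full relative path and the bare file name.
        return any(fnmatch(path, p) or fnmatch(name, p) for p in patterns)

    if matches(exclude):
        return False
    if matches(include):
        return True
    return not matches(DEFAULT_EXCLUDES)


print(should_index("src/app.py"))                            # True
print(should_index("README.md"))                             # False
print(should_index("README.md", include=["*.md"]))           # True
print(should_index("tests/test_api.py", exclude=["test_*"]))  # False
```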
Environment Variables
| Variable | Description | Default |
|---|---|---|
| OPENAI_API_KEY | OpenAI API key (required for cloud) | (none) |
| OGREP_BASE_URL | Local server URL (e.g., LM Studio) | (none) |
| OGREP_MODEL | Default embedding model | Smart default* |
| OGREP_CHUNK_LINES | Tuned chunk size | Model default |
| OGREP_DIMENSIONS | Embedding dimensions | Model default |
*Smart Model Default:
- If OGREP_BASE_URL is set → defaults to minilm (local)
- Otherwise → defaults to text-embedding-3-small (OpenAI)
This means you can just set OGREP_BASE_URL and ogrep will automatically use minilm, the local model with the highest accuracy listed above.
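The default-model rule reads naturally as a small function over the environment. A sketch under one stated assumption: that an explicit OGREP_MODEL takes precedence over the smart default. `resolve_model` is a hypothetical name, not ogrep's API.

```python
def resolve_model(env):
    """Return the embedding model to use, given an environment mapping."""
    if env.get("OGREP_MODEL"):       # explicit choice (assumed to win)
        return env["OGREP_MODEL"]
    if env.get("OGREP_BASE_URL"):    # a local server is configured
        return "minilm"
    return "text-embedding-3-small"  # OpenAI default


print(resolve_model({}))
print(resolve_model({"OGREP_BASE_URL": "http://localhost:1234/v1"}))
print(resolve_model({"OGREP_BASE_URL": "http://localhost:1234/v1", "OGREP_MODEL": "nomic"}))
```

In real use the mapping would be `os.environ`. Passing it in as an argument keeps the sketch easy to test.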
Multi-Repo Scope Management
Prevent cross-repo pollution:
| Flag | Description |
|---|---|
| --db PATH | Custom database path |
| --profile NAME | Named profile (.ogrep/<name>/index.sqlite) |
| --global-cache | Use ~/.cache/ogrep/<hash>/index.sqlite |
| --repo-root PATH | Explicit repo root |
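The path patterns in the table suggest how each flag keeps one repo's index away from another's. The sketch below resolves an index location from those options. The flag precedence and the way the `<hash>` component is derived (a truncated SHA-256 of the repo root path) are assumptions made for illustration, and `index_path` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def index_path(repo_root, db=None, profile=None, global_cache=False):
    """Resolve where the index file would live for the given options."""
    repo_root = Path(repo_root)
    if db:  # --db PATH: an explicit path is used as-is
        return Path(db)
    if global_cache:  # --global-cache: keyed by a hash (derivation assumed)
        key = hashlib.sha256(str(repo_root).encode("utf-8")).hexdigest()[:16]
        return Path.home() / ".cache" / "ogrep" / key / "index.sqlite"
    if profile:  # --profile NAME
        return repo_root / ".ogrep" / profile / "index.sqlite"
    return repo_root / ".ogrep" / "index.sqlite"  # default location


print(index_path("/work/api"))
print(index_path("/work/api", profile="docs"))
```

Because the default and profile paths hang off the repo root, two repositories cannot share an index unless they are deliberately pointed at the same --db path.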
Example Queries
# Find implementations
ogrep query "where is user authentication handled?" -n 10
# Find error handling
ogrep query "how are API errors handled?" -n 15
# Find database operations
ogrep query "database connection and queries" -n 10
# Find specific patterns
ogrep query "recursive file scanning" -n 5
Documentation
- LOCAL_EMBEDDINGS_GUIDE.md — Local model setup, tuning, and troubleshooting
- QUICKSTART.md — Quick start guide
- CLAUDE.md — Developer guide for Claude Code
Development
git clone https://github.com/gplv2/ogrep-marketplace.git
cd ogrep-marketplace
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
make test # Run tests (151 tests)
make lint # Run linters
make check # All checks
License
MIT