Semantic grep for codebases - local-first, SQLite-backed, with local or cloud embeddings

ogrep

Semantic grep for codebases — local-first, SQLite-backed, and built for Claude Code.

ogrep lets you search your codebase by meaning, not just keywords.

It builds a tiny local index (.ogrep/index.sqlite by default) and uses embeddings to answer questions like:

  • "where is authentication handled?"
  • "how are API errors mapped to exceptions?"
  • "where do we open DB connections and run queries?"
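Questions like these are typically answered by embedding the query and ranking indexed chunks by vector similarity. The sketch below illustrates that idea with cosine similarity; the chunk IDs and the 3-dimensional vectors are invented for illustration and are not ogrep's actual data or scoring code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)

def rank_chunks(query_vec, chunks, top_n=10):
    """Return the top_n (score, chunk_id) pairs, best match first."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in chunks.items()]
    scored.sort(reverse=True)
    return scored[:top_n]

# Hypothetical toy vectors; real embeddings have hundreds of dimensions.
chunks = {
    "auth/login.py:1-30": [0.9, 0.1, 0.0],
    "db/pool.py:1-30": [0.0, 0.2, 0.9],
    "api/errors.py:1-30": [0.1, 0.9, 0.1],
}
query = [0.8, 0.2, 0.1]  # stand-in for the embedding of "where is authentication handled?"

top = rank_chunks(query, chunks, top_n=2)
for score, chunk_id in top:
    print(f"{score:.3f}  {chunk_id}")
```

The chunk whose vector points in nearly the same direction as the query ranks first, even though no keyword is shared.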

Embedding Providers

Choose your embedding source:

Provider           Cost            Privacy     Setup
-----------------  --------------  ----------  -----------------------
OpenAI API         $0.02/M tokens  Cloud       Just add OPENAI_API_KEY
LM Studio (local)  Free            100% local  Run lms server start

# OpenAI (cloud)
export OPENAI_API_KEY="sk-..."
ogrep index . -m small

# LM Studio (local, free, offline)
export OGREP_BASE_URL=http://localhost:1234/v1
ogrep index . -m nomic

Both work identically — same CLI, same index format, same queries.
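One reason the two providers can be interchangeable is that LM Studio serves an OpenAI-compatible /v1/embeddings endpoint, so only the base URL and credentials differ. The sketch below builds (but does not send) such a request with the standard library. The function name is hypothetical, the model identifier a local server expects may differ from the one shown, and whether ogrep constructs its requests this way is an assumption.

```python
import json
import urllib.request

def build_embeddings_request(base_url, model, texts, api_key=None):
    """Build a POST request for an OpenAI-compatible embeddings endpoint."""
    url = base_url.rstrip("/") + "/embeddings"
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Cloud providers need a bearer token; a local server usually does not.
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

cloud = build_embeddings_request(
    "https://api.openai.com/v1", "text-embedding-3-small",
    ["def login(user): ..."], api_key="sk-...",
)
local = build_embeddings_request(
    "http://localhost:1234/v1", "nomic-embed-text-v1.5",
    ["def login(user): ..."],
)
print(cloud.full_url)
print(local.full_url)
```

Passing either request to urllib.request.urlopen would perform the call; that step is left out so the sketch runs without a key or a running server.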


Why ogrep?

Local-first & simple

  • Index lives in one SQLite file (per repo, or per profile)
  • Designed to be fast to start and easy to reset
  • No external services required (with local models)

Built for real dev workflows

  • Smart embedding reuse: unchanged files skipped; only changed chunks re-embedded
  • Source-only defaults: reduces noise, avoids indexing junk
  • Auto-tuning: finds optimal chunk size for your codebase

Two ways to use it

Method              Best for
------------------  ----------------------------------------
CLI (pip/pipx)      Terminal users, CI/CD, scripts
Claude Code Plugin  If you live in Claude Code (recommended)

Note: This repo is primarily a Claude Code Skill + Marketplace plugin integration — not an MCP server. If you want MCP for other clients, see Optional Extras.


Installation

Option A: pip / pipx (CLI users)

# Install with pipx (isolated environment)
pipx install ogrep

# Or with pip
pip install ogrep

Option B: Claude Code Marketplace + Plugin

# Add the marketplace
/plugin marketplace add gplv2/ogrep-marketplace

# Install the plugin
/plugin install ogrep@ogrep-marketplace

Optional Extras

pip install "ogrep[speed]"   # Faster scoring with numpy
pip install "ogrep[mcp]"     # MCP server support

Quick Start

With OpenAI

export OPENAI_API_KEY="sk-..."

ogrep index .                              # Index current directory
ogrep query "where is auth handled?" -n 10 # Semantic search
ogrep status                               # Check index stats

With LM Studio (Local, Free)

# 1. Install LM Studio from https://lmstudio.ai
# 2. Download and load a model
lms get nomic-embed-text-v1.5 -y
lms load nomic-ai/nomic-embed-text-v1.5-GGUF -y
lms server start

# 3. Point ogrep to local server
export OGREP_BASE_URL=http://localhost:1234/v1

# 4. Index and query
ogrep index . -m nomic
ogrep query "database connection handling" -m nomic

See LOCAL_EMBEDDINGS_GUIDE.md for detailed setup and tuning.


CLI Commands

Command                   Description
------------------------  ----------------------------------------------
ogrep index .             Index current directory
ogrep query "text" -n 10  Semantic search
ogrep status              Show index statistics
ogrep reset -f            Delete index
ogrep reindex .           Rebuild from scratch
ogrep clean --vacuum      Remove stale entries
ogrep models              List available embedding models
ogrep tune .              Auto-tune chunk size for your codebase
ogrep benchmark .         Compare all models (accuracy, speed, settings)

Embedding Models

OpenAI Models (Cloud)

Model                   Alias  Dimensions  Price (per 1M tokens)  Best for
----------------------  -----  ----------  ---------------------  -----------------------------
text-embedding-3-small  small  1536        $0.02                  Most use cases (default)
text-embedding-3-large  large  3072        $0.13                  High-accuracy, multi-language
text-embedding-ada-002  ada    1536        $0.10                  Legacy compatibility
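To turn the per-million-token prices into a rough indexing budget, multiply the token count by the price and divide by one million. The sketch below does that arithmetic; the 2.5M-token codebase is a made-up figure, and the function name is illustrative only.

```python
# Prices in USD per 1M tokens, taken from the OpenAI model list in this README.
PRICE_PER_MILLION_USD = {
    "small": 0.02,
    "large": 0.13,
    "ada": 0.10,
}

def estimate_cost_usd(total_tokens, alias):
    """Estimated one-off cost of embedding total_tokens with the given model alias."""
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_USD[alias]

tokens = 2_500_000  # hypothetical size of a codebase, in tokens
for alias in ("small", "large", "ada"):
    print(f"{alias}: ${estimate_cost_usd(tokens, alias):.3f}")
```

This is an upper bound for a first full index; later runs that reuse embeddings for unchanged chunks cost less.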

Local Models (via LM Studio)

Model                  Alias   Dimensions  Optimal chunk size  Accuracy  Notes
---------------------  ------  ----------  ------------------  --------  -------------------------------
all-MiniLM-L6-v2       minilm  384         30 lines            96%       Best accuracy, smallest (~25MB)
nomic-embed-text-v1.5  nomic   768         90 lines            72%       Larger context windows
bge-base-en-v1.5       bge     768         30 lines            52%       Fallback option
bge-m3                 bge-m3  1024        60 lines            TBD       Multi-lingual (100+ languages)

# Use model alias (minilm auto-selected when OGREP_BASE_URL is set)
ogrep index . -m minilm

# Or set environment for persistent config
export OGREP_BASE_URL=http://localhost:1234/v1
ogrep index .   # Auto-uses minilm

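Important: The query model must match the model the index was built with, because vectors produced by different models (or with different dimensions) cannot be compared. Use ogrep status to check.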
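A common way to enforce this rule is to record the model name in the index and compare it before each query. The sketch below shows the idea with an in-memory SQLite database; the meta table, its keys, and both function names are hypothetical and do not describe ogrep's real schema.

```python
import sqlite3

def open_index(path, model, dimensions):
    """Open (or create) an index and record which model produced its vectors."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)")
    # INSERT OR IGNORE keeps the values written when the index was first built.
    conn.execute("INSERT OR IGNORE INTO meta VALUES ('model', ?)", (model,))
    conn.execute("INSERT OR IGNORE INTO meta VALUES ('dimensions', ?)", (str(dimensions),))
    conn.commit()
    return conn

def check_query_model(conn, query_model):
    """Raise if the query model differs from the model stored in the index."""
    index_model = conn.execute(
        "SELECT value FROM meta WHERE key = 'model'"
    ).fetchone()[0]
    if index_model != query_model:
        raise ValueError(
            f"index was built with {index_model!r} but the query uses "
            f"{query_model!r}; reindex or query with the matching model"
        )

conn = open_index(":memory:", "nomic", 768)
check_query_model(conn, "nomic")       # same model: passes silently
try:
    check_query_model(conn, "minilm")  # different model: rejected
except ValueError as exc:
    print(exc)
```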


Smart Defaults

ogrep is optimized for source code search out of the box.

Source-Only Indexing

By default, ogrep indexes only source files and excludes:

Category    Examples
----------  -----------------------------------------------
Docs        *.md, *.txt, *.rst, docs/*
Config      *.json, *.yaml, *.toml, .editorconfig
Secrets     .env, secrets.*, credentials.*
Build       dist/*, build/*, *.min.js
Binary      Images, fonts, media, archives, databases
Lock files  package-lock.json, yarn.lock, poetry.lock

Skipped directories: .git/, node_modules/, .venv/, __pycache__/, .ogrep/
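The exclusion rules above can be sketched as glob matching against each relative path. The pattern lists below are copied from the table and the skipped-directory line; the function name, the matching order, and the rule that an include pattern overrides a default exclusion are assumptions made for illustration, not ogrep's documented behavior. Binary-file detection is left out.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

EXCLUDE_PATTERNS = [
    "*.md", "*.txt", "*.rst", "docs/*",                  # docs
    "*.json", "*.yaml", "*.toml", ".editorconfig",       # config
    ".env", "secrets.*", "credentials.*",                # secrets
    "dist/*", "build/*", "*.min.js",                     # build output
    "package-lock.json", "yarn.lock", "poetry.lock",     # lock files
]
SKIPPED_DIRS = {".git", "node_modules", ".venv", "__pycache__", ".ogrep"}

def is_indexable(rel_path, include=()):
    """Decide whether a repo-relative POSIX path should be indexed."""
    path = PurePosixPath(rel_path)
    # Anything under a skipped directory is ignored outright.
    if any(part in SKIPPED_DIRS for part in path.parts[:-1]):
        return False
    candidates = (rel_path, path.name)  # match the full path or the file name
    # Assumption: an explicit include pattern wins over the default exclusions.
    if any(fnmatchcase(c, pat) for pat in include for c in candidates):
        return True
    return not any(fnmatchcase(c, pat) for pat in EXCLUDE_PATTERNS for c in candidates)

for p in ["src/app.py", "README.md", "node_modules/pkg/index.js", "dist/bundle.js"]:
    print(p, is_indexable(p))
print("README.md", is_indexable("README.md", include=["*.md"]))
```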

Smart Embedding Reuse

ogrep keeps API costs down by indexing incrementally: unchanged files are skipped, and within a changed file only the chunks that changed are re-embedded.

$ ogrep index .
Indexed into .ogrep/index.sqlite
  Files: 3 indexed, 42 skipped
  Chunks: 12 total (9 reused, ~900 tokens saved)

Edit pattern                  Without reuse  With reuse  Savings
----------------------------  -------------  ----------  -------
Edit 1 line in 300-line file  5 embeds       1 embed     80%
Append function to file       5 embeds       1 embed     80%
No changes                    5 embeds       0 embeds    100%
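One way to get this kind of reuse is to key each chunk's embedding by a hash of the chunk text, so only chunks with unseen content are sent to the embedding model. The sketch below reproduces the "edit 1 line in a 300-line file" case with 60-line chunks; the chunking scheme, the in-memory cache, and the fake embedding function are all assumptions, not ogrep's implementation.

```python
import hashlib

def chunk_lines(lines, size):
    """Split a list of lines into consecutive fixed-size chunks (no overlap)."""
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]

def index_file(lines, size, cache, embed):
    """Embed only chunks whose content hash is missing from cache.

    Returns (embedded, reused) counts for this run.
    """
    embedded = reused = 0
    for chunk in chunk_lines(lines, size):
        digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if digest in cache:
            reused += 1
        else:
            cache[digest] = embed(chunk)  # the only place a (paid) embed call happens
            embedded += 1
    return embedded, reused

def fake_embed(text):
    """Stand-in for a real embedding call."""
    return [float(len(text))]

cache = {}
source = [f"line {n}" for n in range(300)]

first = index_file(source, 60, cache, fake_embed)
print(first)   # (5, 0): every chunk is new

source[150] = "line 150  # edited"
second = index_file(source, 60, cache, fake_embed)
print(second)  # (1, 4): only the chunk containing line 150 is re-embedded
```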

Auto-Tuning

Different models and codebases have different optimal chunk sizes. Find yours:

ogrep tune . -m nomic
Testing chunk size 30... accuracy=0.32 (2/5 hits)
Testing chunk size 45... accuracy=0.56 (4/5 hits)
Testing chunk size 60... accuracy=0.36 (3/5 hits)
Testing chunk size 90... accuracy=0.72 (5/5 hits)  <-- OPTIMAL
Testing chunk size 120... accuracy=0.68 (5/5 hits)

Recommended chunk size: 90 lines
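Conceptually, tuning is a small grid search: chunk the code at each candidate size, measure retrieval accuracy, and keep the best size. The sketch below shows a line-window chunker with overlap and the selection step, using the accuracy values from the sample output above. The function names and the tie-break toward smaller chunks are assumptions, and the accuracy measurement itself is not reproduced here.

```python
def chunk_with_overlap(lines, size, overlap):
    """Split lines into windows of `size` lines that share `overlap` lines."""
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    chunks = []
    for start in range(0, len(lines), step):
        chunks.append(lines[start:start + size])
        if start + size >= len(lines):
            break  # this window already reaches the end of the file
    return chunks

def pick_best(results):
    """Pick the chunk size with the highest accuracy.

    results maps chunk size -> accuracy. Ties go to the smaller chunk size
    (an arbitrary choice made for this sketch).
    """
    return max(results, key=lambda size: (results[size], -size))

# Accuracy per chunk size, copied from the sample tune output.
results = {30: 0.32, 45: 0.56, 60: 0.36, 90: 0.72, 120: 0.68}
print(pick_best(results))  # 90

windows = chunk_with_overlap(list(range(100)), size=30, overlap=5)
print(len(windows), windows[1][0])  # 4 windows; the second starts at line 25
```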

Save & Apply Tuning Results

# Just save for later (writes to .env)
ogrep tune . -m nomic --save

# Reindex immediately with optimal settings
ogrep tune . -m nomic --apply

# Both: save AND reindex
ogrep tune . -m nomic --save --apply

The tuned value is stored in the OGREP_CHUNK_LINES environment variable, which --save writes to .env.


Model Benchmarking

Compare all available models to find the best one for your codebase:

ogrep benchmark . -s 10
RESULTS BY MODEL
--------------------------------------------------------------------------------
Model                   Dims  Chunk/Overlap  Accuracy  Index    Query
--------------------------------------------------------------------------------
minilm                   384       30 / 5       0.96    0.89s   0.01s  *
nomic                    768       90 / 15      0.72    1.87s   0.01s
bge                      768       30 / 10      0.52    1.65s   0.01s
large                   3072       30 / 15      0.52    3.12s   0.03s
small                   1536       45 / 15      0.48    2.34s   0.02s
--------------------------------------------------------------------------------

RECOMMENDATIONS
================================================================================
* BEST OVERALL: minilm
  Accuracy: 96% | Speed: 0.89s | Cost: FREE
  Optimal: 30-line chunks, 5-line overlap

* BEST CLOUD: large
  Accuracy: 52% | Speed: 3.12s | Cost: $0.13/M tokens

Benchmark Options

ogrep benchmark . --local-only     # Only test local models
ogrep benchmark . --cloud-only     # Only test OpenAI models
ogrep benchmark . --save           # Save optimal settings to .env
ogrep benchmark . --json           # Output as JSON
ogrep benchmark . -v               # Verbose per-configuration results

File Filtering

Include Normally-Excluded Files

# Include markdown files
ogrep index . -i '*.md'

# Include multiple patterns
ogrep index . -i '*.md' -i '*.json'

Add Extra Exclusions

# Exclude test files
ogrep index . -e 'test_*' -e '*_test.py'

# Exclude specific directories
ogrep index . -e 'fixtures/*' -e 'mocks/*'

Environment Variables

Variable           Description                          Default
-----------------  -----------------------------------  --------------
OPENAI_API_KEY     OpenAI API key (required for cloud)
OGREP_BASE_URL     Local server URL (e.g., LM Studio)
OGREP_MODEL        Default embedding model              Smart default*
OGREP_CHUNK_LINES  Tuned chunk size                     Model default
OGREP_DIMENSIONS   Embedding dimensions                 Model default

Smart Model Default:

  • If OGREP_BASE_URL is set → defaults to minilm (local)
  • Otherwise → defaults to text-embedding-3-small (OpenAI)

This means you can set only OGREP_BASE_URL and ogrep will default to minilm, the local model with the highest accuracy in the benchmark above.
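The default-selection rule can be written as a small lookup over the environment. In the sketch below, the function name is hypothetical, and the assumption that an explicit OGREP_MODEL takes precedence over the smart default follows from the variable table but is not confirmed by the source; command-line flags such as -m are not modeled.

```python
import os

def resolve_model(env):
    """Pick the embedding model from environment-style settings."""
    explicit = env.get("OGREP_MODEL")
    if explicit:
        return explicit                  # assumed to override the smart default
    if env.get("OGREP_BASE_URL"):
        return "minilm"                  # local server configured
    return "text-embedding-3-small"      # OpenAI default

print(resolve_model({}))
print(resolve_model({"OGREP_BASE_URL": "http://localhost:1234/v1"}))
print(resolve_model({"OGREP_BASE_URL": "http://localhost:1234/v1", "OGREP_MODEL": "nomic"}))
print(resolve_model(os.environ))  # whatever the current shell would select
```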


Multi-Repo Scope Management

Keep each repository's index separate so that results from one repo do not show up in another:

Flag              Description
----------------  ------------------------------------------
--db PATH         Custom database path
--profile NAME    Named profile (.ogrep/<name>/index.sqlite)
--global-cache    Use ~/.cache/ogrep/<hash>/index.sqlite
--repo-root PATH  Explicit repo root
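The flags above map to different index locations. The sketch below shows one plausible resolution order. The function name, the precedence between flags, and the way the hash is derived (a truncated SHA-256 of the repo root path) are all assumptions; the source only gives the path templates.

```python
import hashlib
from pathlib import Path

def resolve_db_path(repo_root, db=None, profile=None, global_cache=False, home=None):
    """Return the index path implied by the scope flags (hypothetical precedence)."""
    root = Path(repo_root)
    if db:
        return Path(db)                                   # --db PATH
    if global_cache:
        # Assumption: the <hash> component identifies the repo root.
        digest = hashlib.sha256(str(root).encode("utf-8")).hexdigest()[:16]
        base = Path(home) if home else Path.home()
        return base / ".cache" / "ogrep" / digest / "index.sqlite"
    if profile:
        return root / ".ogrep" / profile / "index.sqlite"  # --profile NAME
    return root / ".ogrep" / "index.sqlite"                # default location

print(resolve_db_path("/work/app"))
print(resolve_db_path("/work/app", profile="backend"))
print(resolve_db_path("/work/app", db="/tmp/custom.sqlite"))
print(resolve_db_path("/work/app", global_cache=True, home="/home/dev"))
```

Because the cache path depends on the repo root, two checkouts in different directories would get separate indexes under this scheme.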

Example Queries

# Find implementations
ogrep query "where is user authentication handled?" -n 10

# Find error handling
ogrep query "how are API errors handled?" -n 15

# Find database operations
ogrep query "database connection and queries" -n 10

# Find specific patterns
ogrep query "recursive file scanning" -n 5

Development

git clone https://github.com/gplv2/ogrep-marketplace.git
cd ogrep-marketplace
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

make test    # Run tests (151 tests)
make lint    # Run linters
make check   # All checks

License

MIT

Release files

ogrep-0.4.2.tar.gz (source distribution, 80.9 kB)

  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.8
  • SHA256: 588787ca066649cf494e0d0a7d2a153689eaab4b2e80e3f171db1f99012df310
  • MD5: 0337e8b8f79f39de7a445ab9cbff0e9e
  • BLAKE2b-256: 97336a3708c5594c77db0feeb71df885d20a5f4ad3558039ffbedbcca821a0ae

ogrep-0.4.2-py3-none-any.whl (built distribution, Python 3, 43.9 kB)

  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.8
  • SHA256: 98d409ae576c6dab068f02236ef494ce8f7ddf7eddf9ba7fca72f0470f665ed4
  • MD5: 84b688f336a8d0bcf63dbd1e63ecfd4b
  • BLAKE2b-256: b08170d7292e570a0f639179714ad5be88fc170c50f490b887ff1a4614068e4f
