
lgrep

A local-first semantic search CLI tool for code and text files. Search your codebase using natural language queries powered by AI embeddings.

Installation

From PyPI

# Using pip
pip install lgrep-cli

# Using uv
uv pip install lgrep-cli

# With OpenAI support (optional)
pip install "lgrep-cli[openai]"

From Source

Using uv (recommended)

uv is a fast Python package manager written in Rust.

# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create virtual environment and install
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e .

# With OpenAI support (optional)
uv pip install -e ".[openai]"

# With development tools
uv pip install -e ".[dev]"

Or use uv run to execute commands without activating the venv:

uv run lgrep index .
uv run lgrep search "your query"

Using pip

# Install in editable mode
pip install -e .

# With OpenAI support (optional)
pip install -e ".[openai]"

# With development tools
pip install -e ".[dev]"

Using Docker

# Build the image
docker build -t lgrep .

# Run lgrep on your project (mount your code to /workspace)
docker run -v $(pwd):/workspace lgrep index .
docker run -v $(pwd):/workspace lgrep search "your query"

# Persist the index between runs
docker run -v $(pwd):/workspace -v lgrep-data:/workspace/.lgrep lgrep index .
docker run -v $(pwd):/workspace -v lgrep-data:/workspace/.lgrep lgrep search "database queries"

# Using OpenAI embeddings
docker run -v $(pwd):/workspace -e OPENAI_API_KEY=$OPENAI_API_KEY lgrep search "your query" --provider openai

Quick Start

# 1. Index your project
lgrep index .

# 2. Search semantically
lgrep search "database connection handling"

Multi-Repository Search

Search across multiple repositories (local folders or GitHub repos):

# Add repositories to the global index
lgrep repo add https://github.com/anthropics/anthropic-cookbook
lgrep repo add ~/Projects/my-project

# Search across all repositories
lgrep search "embeddings" --global

# Search a specific repository
lgrep search "authentication" --repo anthropic-cookbook

# Search only agent files (README.md, AGENTS.md, CLAUDE.md)
lgrep search "usage instructions" --agent-files

Commands

Index

Index a directory for semantic search:

# Index current directory
lgrep index

# Index a specific directory
lgrep index /path/to/project

# Clear existing index and reindex
lgrep index --clear

# Quiet mode (no progress output)
lgrep index --quiet

Search

Search for semantically similar content:

# Basic search
lgrep search "error handling in API routes"

# Limit results
lgrep search "authentication" --limit 5

# Set minimum similarity score (0-1)
lgrep search "logging" --min-score 0.7

# Show context lines before/after matches
lgrep search "database queries" --context 3

# List only matching files (no content)
lgrep search "config parsing" --files

# Filter by file pattern
lgrep search "tests" --file "test_*.py"

# Hide content snippets
lgrep search "imports" --no-content

Watch

Watch a directory and automatically reindex on changes:

# Watch current directory
lgrep watch

# Watch a specific directory
lgrep watch /path/to/project

# Press Ctrl+C to stop

Status

Show index statistics:

lgrep status

Config

Manage configuration:

# Initialize a new config file
lgrep config init

# Show current configuration
lgrep config show

# Show config file path
lgrep config path

Repo (Multi-Repository Management)

Manage repositories for global cross-repository search:

# Add a GitHub repository (clones and indexes automatically)
lgrep repo add https://github.com/owner/repo

# Add a local folder
lgrep repo add /path/to/project

# Add a local folder with GitHub URL metadata
lgrep repo add /path/to/project --url https://github.com/owner/repo

# Specify a branch for GitHub repos
lgrep repo add https://github.com/owner/repo --branch develop

# List all registered repositories
lgrep repo list

# Show detailed info about a repository (including agent files)
lgrep repo info my-repo

# Sync a repository (git pull for remote, re-index for all)
lgrep repo sync my-repo

# Sync all repositories
lgrep repo sync

# Remove a repository from the index
lgrep repo remove my-repo

Global Search Options

When searching across multiple repositories:

# Search all registered repositories
lgrep search "query" --global

# Search a specific repository by name or ID
lgrep search "query" --repo my-repo

# Search only agent files (README.md, AGENTS.md, CLAUDE.md)
lgrep search "query" --agent-files

# Combine filters
lgrep search "authentication" --repo my-repo --agent-files

Agent files are automatically detected and flagged during indexing. These are files commonly used by AI agents to understand a project:

  • README.md - Project documentation
  • AGENTS.md - Agent-specific instructions
  • CLAUDE.md - Claude-specific instructions

Configuration

Configuration is stored in .lgrep/config.toml. Create one with lgrep config init or manually:

[embedding]
provider = "local"  # "local" (fastembed), "sentence-transformers", or "openai"
model = "BAAI/bge-small-en-v1.5"  # Default fastembed model

[embedding.openai]
api_key = "${OPENAI_API_KEY}"
model = "text-embedding-3-small"

[index]
chunk_size = 512
chunk_overlap = 50
include = ["**/*.py", "**/*.ts", "**/*.js", "**/*.md", "**/*.txt"]
exclude = ["node_modules", ".git", "__pycache__", ".venv", "venv"]

[search]
default_limit = 10
min_score = 0.5
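The chunk_size and chunk_overlap settings control how files are split before embedding. As an illustration only (lgrep's actual chunker also tracks line numbers and may split on different boundaries), overlapping character-based chunks can be produced like this:

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks.

    Illustrative sketch: lgrep's real implementation may chunk on
    token or line boundaries rather than raw characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    # Each chunk starts `step` characters after the previous one,
    # so consecutive chunks share `chunk_overlap` characters.
    return [text[i:i + chunk_size] for i in range(0, max(len(text), 1), step)]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which keeps boundary content searchable.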

Embedding Providers

Local (default) - FastEmbed

Uses FastEmbed with the ONNX runtime. Lightweight (~50MB model), with no PyTorch or CUDA dependencies.

# Available models
BAAI/bge-small-en-v1.5    # 384 dims, ~50MB (default)
BAAI/bge-base-en-v1.5     # 768 dims, ~100MB
sentence-transformers/all-MiniLM-L6-v2  # 384 dims

Sentence-Transformers (optional)

For GPU acceleration or different models. Requires PyTorch.

# Install with sentence-transformers support
pip install -e ".[sentence-transformers]"

# Use it
lgrep index --provider sentence-transformers
lgrep search "query" --provider sentence-transformers

OpenAI API

For cloud-based embeddings:

# Set your API key
export OPENAI_API_KEY=your-key-here

# Use OpenAI provider
lgrep index --provider openai
lgrep search "your query" --provider openai

Or configure in .lgrep/config.toml:

[embedding]
provider = "openai"

Supported File Types

  • Python: .py
  • JavaScript/TypeScript: .js, .ts, .tsx, .jsx
  • Go: .go
  • Rust: .rs
  • Java: .java
  • C/C++: .c, .cpp, .h, .hpp
  • Markdown: .md
  • Text: .txt
  • Config: .json, .yaml, .yml, .toml
  • And more...

How It Works

  1. Indexing: Files are read, split into chunks (preserving line numbers), and converted to embeddings using FastEmbed (default), sentence-transformers, or OpenAI
  2. Storage: Embeddings are stored locally using ChromaDB in .lgrep/index/
  3. Search: Your query is converted to an embedding and compared against stored embeddings using cosine similarity
  4. Results: Matching chunks are returned with file paths, line numbers, and similarity scores
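Step 3 boils down to cosine similarity between the query vector and each stored chunk vector. lgrep delegates this to ChromaDB, but the underlying math is simple; a minimal pure-Python sketch:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, chunk_vecs, k=10, min_score=0.5):
    """Rank chunks by similarity, keep those above min_score, return (score, index) pairs."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    return sorted((s, i) for s, i in scored if s >= min_score)[::-1][:k]
```

This is also why --min-score takes a value between 0 and 1: scores near 1 mean the query and chunk embeddings point in nearly the same direction.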

Data Storage

Local Project Storage

Per-project data is stored in the .lgrep/ directory within your project:

your-project/
└── .lgrep/
    ├── config.toml    # Configuration
    ├── index/         # ChromaDB vector store
    └── cache/         # Embedding cache

Global Multi-Repository Storage

Multi-repository data is stored in ~/.lgrep/:

~/.lgrep/
├── repos.toml     # Repository registry
├── repos/         # Cloned GitHub repositories
│   ├── a1b2c3d4/  # Repo ID (hash of URL)
│   └── ...
└── index/         # Global ChromaDB index (all repos)
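The repo ID shown above is a hash of the repository URL. The exact scheme is an internal detail; a plausible sketch, assuming a truncated SHA-256 hex digest:

```python
import hashlib

def repo_id(url: str, length: int = 8) -> str:
    """Derive a short, stable directory name from a repository URL.

    Hypothetical scheme: lgrep's actual ID format may differ.
    """
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:length]

repo_id("https://github.com/owner/repo")  # stable 8-hex-char ID
```

Hashing the URL makes the directory name deterministic, so re-adding the same repository maps to the same clone location.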

Running Tests

# Using uv
uv pip install -e ".[dev]"
uv run pytest tests/

# Using pip
pip install -e ".[dev]"
pytest tests/

License

Apache 2.0
