Local semantic code search - hybrid SQLite/LanceDB, works offline with Ollama

semgrepll

Local semantic code search for AI agents and developers

Search your codebase using natural language — no API keys, no cloud, 100% offline.

What is semgrepll?

semgrepll (pronounced "sem-grep-ell") is a local semantic code search tool that lets you search your codebase with natural language queries. Traditional grep matches text patterns; semgrepll instead ranks code by how close its meaning is to your query, so you can ask questions like:

  • "Where is authentication configured?"
  • "How does the user login flow work?"
  • "Find the payment processing logic"

Why semgrepll?

For Developers

  • Offline & Private — Your code never leaves your machine
  • No API Keys — Works without OpenAI, Anthropic, or any cloud service
  • Fast — Local Ollama embeddings, SQLite/LanceDB storage
  • Universal — Works with any programming language

For AI Agents

  • Understand Codebases — Semantic search helps agents navigate unfamiliar code
  • Reduce Token Usage — Instead of reading entire files, find exact locations
  • Faster Context — Get relevant code sections in milliseconds

Comparison

Tool           | Type     | Requires API | Offline | Best For
semgrepll      | Semantic | ❌ No        | ✅ Yes  | Local AI dev
GitHub Copilot | Semantic | ✅ Yes       | ❌ No   | Cloud IDEs
ripgrep (rg)   | Exact    | ❌ No        | ✅ Yes  | Known patterns
Sourcegraph    | Semantic | ✅ Yes       | ❌ No   | Enterprise

Installation

# Basic (SQLite backend - works out of the box)
pip install semgrepll

# With LanceDB (recommended for large projects)
# (quoted so that shells such as zsh do not expand the brackets)
pip install "semgrepll[lance]"

Requirements

  • Python 3.10+
  • Ollama running locally with the mxbai-embed-large model

# With Ollama installed and running, pull the embedding model
ollama pull mxbai-embed-large
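
Before indexing, it can help to confirm that Ollama is reachable and that the model has been pulled. The following is a minimal Python sketch, assuming Ollama's /api/tags endpoint on the default port. The helper names has_model and ollama_ready are illustrative only and are not part of semgrepll.

```python
import json
import urllib.request


def has_model(tags: dict, model: str) -> bool:
    """Return True if `model` appears in an Ollama /api/tags response body."""
    names = [m.get("name", "") for m in tags.get("models", [])]
    # Ollama reports names with a tag suffix, e.g. "mxbai-embed-large:latest".
    return any(n == model or n.split(":")[0] == model for n in names)


def ollama_ready(base_url: str = "http://127.0.0.1:11434",
                 model: str = "mxbai-embed-large") -> bool:
    """Ask the local Ollama server for its models; False if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            return has_model(json.load(resp), model)
    except (OSError, ValueError):
        # Connection errors and malformed JSON both count as "not ready".
        return False
```

Calling ollama_ready() before semgrep index gives a clearer failure than a mid-index embedding error would.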

Quick Start

# 1. Index your project (one-time)
semgrep index /path/to/your/project

# 2. Search semantically
semgrep search "how does authentication work"

# 3. List indexed projects
semgrep ls

# 4. Remove a project
semgrep rm project-name

Usage

CLI Commands

semgrep index <path>           # Index a project for search
semgrep search <query>         # Search indexed code
semgrep ls                     # List all indexed projects
semgrep rm <project>           # Remove a project index

Options

semgrep search "query"         # Search all indexed projects
semgrep search "query" -p myproject  # Search specific project
semgrep search "query" -e "pattern"  # Fallback to ripgrep
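
When calling the CLI from a script, the options above can be assembled into an argument list instead of a shell string, which avoids quoting problems in queries. This is a small sketch in Python; search_command is a hypothetical helper, and it uses only the -p and -e flags listed above.

```python
from typing import List, Optional


def search_command(query: str,
                   project: Optional[str] = None,
                   pattern: Optional[str] = None) -> List[str]:
    """Build an argv list for `semgrep search` from the documented options."""
    argv = ["semgrep", "search", query]
    if project:
        argv += ["-p", project]   # restrict the search to one indexed project
    if pattern:
        argv += ["-e", pattern]   # ripgrep fallback pattern
    return argv
```

The resulting list can be passed directly to subprocess.run without shell=True.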

Configuration

Environment Variables

# Ollama endpoint (default: http://127.0.0.1:11434)
export OLLAMA_URL="http://localhost:11434/api/embeddings"

# Embedding model (default: mxbai-embed-large)
export EMBED_MODEL="mxbai-embed-large"

# Storage backend (auto | sqlite | lance)
# - auto: SQLite for small projects, LanceDB for large
# - sqlite: Force SQLite (no extra deps)
# - lance: Force LanceDB (needs lancedb installed)
export SEMGREP_BACKEND=auto

# Database path (default: ~/.semgrepll/db)
export SEMGREP_DB_PATH="/path/to/db"
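
For scripts that need to mirror these settings, the variables and their documented defaults can be read as follows. This is an illustrative Python sketch, not semgrepll's own loader: the Settings class and load_settings function are assumed names, and rejecting unknown backend values is an assumption about reasonable behaviour.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    ollama_url: str
    embed_model: str
    backend: str
    db_path: Path


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented environment variables, applying their defaults."""
    backend = env.get("SEMGREP_BACKEND", "auto")
    if backend not in {"auto", "sqlite", "lance"}:
        raise ValueError(f"unknown SEMGREP_BACKEND: {backend!r}")
    db_path = os.path.expanduser(env.get("SEMGREP_DB_PATH", "~/.semgrepll/db"))
    return Settings(
        ollama_url=env.get("OLLAMA_URL", "http://127.0.0.1:11434"),
        embed_model=env.get("EMBED_MODEL", "mxbai-embed-large"),
        backend=backend,
        db_path=Path(db_path),
    )
```

Passing a plain dict as env makes the function easy to exercise without touching the real environment.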

When to Use Which Backend

Project Size        | Recommended Backend | Why
Small (< 100 files) | SQLite              | Zero deps, fast enough
Large (100+ files)  | LanceDB             | Better vector indexing
Mixed               | auto                | Automatic selection
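
The selection rule in this table could be expressed roughly as below. This is a sketch under stated assumptions: the 100-file threshold comes from the table, but the function choose_backend, its parameters, and the fallback to SQLite when lancedb is not installed are hypothetical and may differ from what semgrepll actually does.

```python
def choose_backend(setting: str, file_count: int, lance_available: bool,
                   threshold: int = 100) -> str:
    """Pick a storage backend from the SEMGREP_BACKEND value and project size."""
    if setting == "sqlite":
        return "sqlite"
    if setting == "lance":
        if not lance_available:
            raise RuntimeError("lance backend requested but lancedb is not installed")
        return "lance"
    if setting != "auto":
        raise ValueError(f"unknown backend setting: {setting!r}")
    # auto: LanceDB for large projects, SQLite otherwise (assumed fallback
    # to SQLite when lancedb is missing).
    if file_count >= threshold and lance_available:
        return "lance"
    return "sqlite"
```

Whether lancedb is importable can be checked with importlib.util.find_spec("lancedb") is not None.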

How It Works

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Your      │────▶│  semgrepll   │────▶│   Ollama    │
│   Query     │     │   (embed)    │     │ (mxbai)     │
└─────────────┘     └──────────────┘     └─────────────┘
                                                │
                    ┌──────────────┐            │
                    │  SQLite or   │◀───────────┘
                    │   LanceDB    │
                    │ (similarity) │
                    └──────────────┘

  1. Index — Your code files are chunked and embedded using Ollama
  2. Search — Your query is embedded, then compared against indexed code
  3. Results — Most similar code sections returned with relevance scores
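
The three steps above can be sketched in a few lines of Python. This is a toy illustration, not semgrepll's implementation: the fixed-size line chunking is an assumed rule, and toy_embed is a word-count vector standing in for the real Ollama embedding so that the example runs without a server. Only the ranking by cosine similarity reflects the general technique.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk_lines(text: str, size: int = 3) -> List[str]:
    """Step 1 (index): split a file into chunks of `size` lines."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]


def toy_embed(text: str) -> Counter:
    """Stand-in for an Ollama embedding: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, chunks: List[str], top_k: int = 2) -> List[Tuple[float, str]]:
    """Steps 2 and 3: embed the query, score every chunk, return the best."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]
```

With a real embedding model the vectors are dense and capture meaning rather than shared words, but the scoring and sorting step has the same shape.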

Use Cases

Developer Onboarding

# New to the codebase? Ask questions!
semgrep search "how do I add a new API route"
semgrep search "where is error handling"

AI Agent Integration

# In your AI agent (Python)
import subprocess
result = subprocess.run(["semgrep", "search", "-p", "myproject", "auth configuration"],
                        capture_output=True, text=True)
# result.stdout contains file paths + code snippets + relevance scores

Code Review

# Find all places that touch payments
semgrep search "payment processing"

Example Output

🔍 Searching: how does authentication work

📄 auth.ts (score: 0.85)
   // Authentication module
   export class AuthService {
     async signIn(email: string, password: string) {
       return this.client.auth.signInWithPassword({...});
     }
   }

📄 middleware.ts (score: 0.72)
   export function authMiddleware(request: NextRequest) {
     const token = request.headers.get('authorization');
     ...
   }
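
An agent that consumes this output usually wants structured results rather than raw text. Below is a minimal Python parsing sketch, assuming the real output follows the example above exactly (a 📄 header line with the path and score, followed by indented snippet lines). The parse_results function and its dictionary keys are illustrative, and the format may differ between versions.

```python
import re
from typing import Dict, List

# Matches header lines such as: "📄 auth.ts (score: 0.85)"
HEADER = re.compile(r"^📄\s+(?P<path>.+?)\s+\(score:\s*(?P<score>[0-9.]+)\)\s*$")


def parse_results(output: str) -> List[Dict]:
    """Turn search output into a list of {path, score, snippet} dicts."""
    results: List[Dict] = []
    for line in output.splitlines():
        match = HEADER.match(line)
        if match:
            results.append({"path": match["path"],
                            "score": float(match["score"]),
                            "snippet": []})
        elif results and line.strip():
            # Non-empty lines after a header belong to that result's snippet.
            results[-1]["snippet"].append(line.strip())
    return [dict(r, snippet="\n".join(r["snippet"])) for r in results]
```

Lines before the first header, such as the "Searching" banner, are ignored.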

Contributing

# Clone and develop
git clone https://github.com/rizperdana/semgrepll
cd semgrepll
pip install -e ".[all]"
pip install pytest black mypy

# Run tests
pytest

# Format
black semgrepll/

License

MIT License — see LICENSE for details.


TL;DR: pip install semgrepll && semgrep index ./src && semgrep search "how does X work"
