Intelligent file indexing and search system

FileSift

A fast, open-source utility that helps AI coding agents intelligently search and understand codebases.

FileSift lets your AI coding agent search across a codebase based on what code does, rather than what it looks like. Instead of sifting through entire files after a grep, your agent can jump straight to the most relevant code using natural language queries like "authentication middleware" or "database connection pooling". Everything runs locally on your machine — your code never leaves your environment.

Key benefits:

  • Smarter search — hybrid keyword + semantic search finds code by intent, not just string matching
  • Less context wasted — agents get pointed to the right files immediately, saving token budget on exploration

Installation

pip install filesift

Usage

There are three ways to use FileSift, depending on your workflow:

1. CLI

The most straightforward approach. Good for testing queries, managing indexes, and configuring settings.

# Index a project
filesift index /path/to/your/project

# Search for files by what they do
filesift find "authentication and session handling"

# Search in a specific directory
filesift find "retry logic for API calls" --path /path/to/project
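
For scripting, the same commands can be driven from Python. The sketch below only assembles and runs the `filesift find` invocation shown above. The helper names `build_find_command` and `run_find` are hypothetical, and nothing is assumed about the output format: the raw stdout is returned as-is.

```python
import subprocess
from typing import List, Optional


def build_find_command(query: str, path: Optional[str] = None) -> List[str]:
    """Assemble the argv for `filesift find`, mirroring the CLI examples above."""
    cmd = ["filesift", "find", query]
    if path is not None:
        # --path points the search at a specific project directory
        cmd += ["--path", path]
    return cmd


def run_find(query: str, path: Optional[str] = None) -> str:
    """Run the search and return raw stdout (its format is not assumed here)."""
    result = subprocess.run(
        build_find_command(query, path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the command as a list avoids shell quoting problems with multi-word queries.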

2. MCP Server

Installing FileSift also provides a filesift-mcp command, a lightweight MCP server that exposes indexing and search as tools over STDIO. It works with most popular coding agents, including Claude Code, Cursor, and Copilot.

Add it to your agent's MCP configuration:

{
  "mcpServers": {
    "filesift": {
      "command": "filesift-mcp"
    }
  }
}
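
If your agent already has an MCP configuration, the entry above needs to be merged into it rather than pasted over it. A minimal Python sketch, assuming the config is a JSON object with a top-level `mcpServers` key as shown. The helper name `add_filesift_server` and the `other-tool` entry are hypothetical, and the location of the config file varies by agent.

```python
import json
from copy import deepcopy


def add_filesift_server(config: dict) -> dict:
    """Return a copy of an MCP config with the filesift entry added."""
    merged = deepcopy(config)  # leave the caller's dict untouched
    servers = merged.setdefault("mcpServers", {})
    servers["filesift"] = {"command": "filesift-mcp"}
    return merged


# Hypothetical existing config with one unrelated server already registered
existing = {"mcpServers": {"other-tool": {"command": "other-tool-mcp"}}}
print(json.dumps(add_filesift_server(existing), indent=2))
```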

The MCP server exposes four tools:

  • filesift_search — search an indexed codebase by natural language query
  • filesift_find_related — find files related to a given file via imports and semantic similarity
  • filesift_index — index a directory to enable searching
  • filesift_status — check indexing status of a directory

3. Skills

FileSift ships with a search-codebase skill that can be installed directly into your coding agent's skill directory. This lets the agent interact with the FileSift CLI through bash, without requiring MCP support.

# Install for Claude Code (default)
filesift skill install

# Install for other agents
filesift skill install --agent cursor
filesift skill install --agent copilot
filesift skill install --agent codex

Supported agents: claude, codex, cursor, copilot, gemini, roo, windsurf.

How It Works

FileSift uses a daemonized embedding model to keep searches fast. At its core, it generates embeddings from code descriptions and performs searches against small vector stores called indexes.

  1. Indexing: running filesift index first builds a fast keyword/structural index (completes in seconds), then triggers background semantic indexing that generates embeddings for each file.

  2. Daemon — A background daemon loads indexes into memory and automatically shuts down after a configurable period of inactivity. After the first cold-start search, subsequent searches are near-instant.

  3. Search — Queries are matched using both keyword (BM25) and semantic (FAISS) search, then combined via Reciprocal Rank Fusion for the best of both approaches.
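
Reciprocal Rank Fusion scores each file by summing 1 / (k + rank) over every ranked list it appears in, so files ranked well by both searches rise to the top. The following is a minimal sketch of that fusion step, not FileSift's actual implementation. The constant k = 60 is a commonly used default and is an assumption here, as are the function name and the sample file names.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists (best first) into one ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by name for determinism
    return sorted(scores, key=lambda item: (-scores[item], item))


keyword_hits = ["auth.py", "session.py", "utils.py"]        # e.g. keyword (BM25) order
semantic_hits = ["session.py", "middleware.py", "auth.py"]  # e.g. semantic order
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
```

With these sample lists, session.py comes first because it sits near the top of both rankings, followed by auth.py, middleware.py, and utils.py. Because only rank positions are used, the keyword and semantic scores never need to be put on a common scale.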

Indexes are stored in a .filesift directory within each indexed project.
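
The daemon's inactivity shutdown can be pictured as a timer that every search resets. A minimal sketch, assuming the timeout is expressed in seconds; the `IdleTimer` class is hypothetical and is not taken from FileSift's code.

```python
import time
from typing import Callable


class IdleTimer:
    """Tracks time since the last request to decide when a daemon may exit."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_activity = clock()

    def touch(self) -> None:
        """Record activity, e.g. an incoming search request."""
        self._last_activity = self._clock()

    def expired(self) -> bool:
        """True once no activity has been seen for at least timeout_s seconds."""
        return self._clock() - self._last_activity >= self.timeout_s
```

A daemon loop would call `touch()` on each request and exit once `expired()` returns true, which frees the memory held by the loaded indexes and model.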

Configuration

FileSift uses a TOML configuration file, manageable via the CLI:

# View all settings
filesift config list --all

# Set a value
filesift config set search.MAX_RESULTS 20
filesift config set daemon.INACTIVITY_TIMEOUT 600

# Manage ignore patterns
filesift config add-ignore "node_modules" ".venv"
filesift config list-ignore

Configuration sections: search, indexing, daemon, models, paths.

Contributing

Contributions are welcome! To get started:

git clone https://github.com/roshunsunder/filesift.git
cd filesift
pip install -e .

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/my-feature)
  3. Commit your changes and open a pull request

License

Apache 2.0 — see LICENSE for details.


