Semantic file search powered by Google Gemini embeddings
EmbeddedFinder
Semantic file search for your local filesystem.
Ask questions in plain English — find what you need across code, documents, images, audio, and video.
Powered by Google Gemini Embedding 2 and ChromaDB.
❯ efind
╭─ ◆ EmbeddedFinder v0.1.0 ─────────────────────────────────────╮
│ Semantic file search powered by Gemini Embedding 2 │
│ ● 142 files (387 chunks) │ .embeddedfinder/db │
╰──────────────────────────────────────────────────────────────────╯
Type a query to search, or /help for commands.
❯ functions that validate user authentication tokens
5 results (0.3s) │ "functions that validate user authentication tokens"
1 95% PY auth.py 4K
src/auth/auth.py
▸ def validate_token(token: str) -> bool: ...
2 87% PY middleware.py 2K
src/middleware/middleware.py
▸ class AuthMiddleware: def process_request(self, req)...
Why EmbeddedFinder?
Traditional file search (grep, find, ag) matches exact text. EmbeddedFinder understands meaning. Search for "error handling in payments" and find files about exception catching in billing code — even if those exact words never appear.
It works on everything: source code, config files, PDFs, Word documents, images, audio, and video — all in one index.
Features
- Natural language search — describe what you're looking for, not keywords
- Multimodal indexing — code, text, PDFs, DOCX, images, audio, and video files
- Interactive TUI — rich terminal UI with slash commands, progress bars, and color-coded results
- First-run setup wizard — guided onboarding with API key validation
- Incremental indexing — content-hashed, only re-processes changed files
- Batch embedding — groups chunks into minimal API calls for fast indexing
- File watching — auto-reindex when files change on disk
- One-shot CLI — scriptable commands for CI/automation
- Smart ranking — filename matching, file type relevance, and content-aware scoring
Quick start
Install
pip install embedded-finder
Or from source:
git clone https://github.com/vladmarian20005/EmbeddedFinder.git
cd EmbeddedFinder
pip install .
Run
efind
On first launch, a setup wizard walks you through:
- Enter your Google AI API key (free tier available)
- The key is validated and saved securely to ~/.config/embeddedfinder/config.json
- Optionally index a directory right away
That's it — start searching.
Already have a key?
# Option A: environment variable
export GOOGLE_API_KEY=your-key-here
# Option B: .env file in your project root
echo "GOOGLE_API_KEY=your-key-here" > .env
# Option C: set it interactively
efind
# then type: /key set
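The three options above imply that the key can come from more than one place. A minimal sketch of one plausible lookup order (environment first, then .env, then the saved config file) is shown below. The order itself, the helper names `parse_dotenv` and `resolve_api_key`, and the `api_key` JSON field are assumptions for illustration, not EmbeddedFinder's actual code.

```python
import json
from pathlib import Path
from typing import Dict, Mapping, Optional


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_api_key(
    env: Mapping[str, str],
    dotenv_path: Path,
    config_path: Path,
) -> Optional[str]:
    """Return the first key found: environment, then .env, then saved config."""
    if env.get("GOOGLE_API_KEY"):
        return env["GOOGLE_API_KEY"]
    if dotenv_path.is_file():
        key = parse_dotenv(dotenv_path.read_text()).get("GOOGLE_API_KEY")
        if key:
            return key
    if config_path.is_file():
        # "api_key" is a hypothetical field name for the saved config.
        return json.loads(config_path.read_text()).get("api_key")
    return None
```

Passing `os.environ`, `Path(".env")`, and `Path.home() / ".config/embeddedfinder/config.json"` would mirror the three locations mentioned above.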
Usage
Interactive mode (default)
efind
Type natural language queries at the ❯ prompt:
❯ database migration scripts
❯ files that handle image resizing
❯ error handling in the payment module
❯ screenshots of the dashboard
❯ audio files with speech
Results show similarity scores, file types, paths, and content snippets — color-coded by relevance.
Slash commands
| Command | Description |
|---|---|
| /index <path> | Index a directory |
| /reindex <path> | Re-index only changed files |
| /status | Show index statistics |
| /clear | Clear the entire index |
| /watch <path> | Watch a directory and auto-reindex |
| /key | Show current API key info |
| /key set | Set or change your API key |
| /key delete | Remove saved API key |
| /key show | Reveal the full API key |
| /help | Show available commands |
| /quit or Ctrl+C | Exit |
CLI commands
For scripting and one-off use:
# Index a directory
efind index ./src
# Index specific file types only
efind index ./src -e .py -e .ts
# Search
efind search "authentication middleware"
# Search with options
efind search "config parsing" --top 5 --min-score 0.7
# Plain text output (no colors, good for piping)
efind search "database models" --plain
# Re-index changed files only
efind reindex ./src
# Watch for changes
efind watch ./src
# Show index stats
efind status
# Clear the index
efind clear
# Check version
efind --version
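For automation from Python, the one-shot commands can be wrapped with `subprocess`. The sketch below uses only the flags shown above (`--top`, `--min-score`, `--plain`). The helper names `build_search_command` and `run_search` are hypothetical, and the sketch assumes `efind` exits with a non-zero status on failure, which the commands above do not confirm.

```python
import subprocess
from typing import List, Optional


def build_search_command(
    query: str,
    top: Optional[int] = None,
    min_score: Optional[float] = None,
    plain: bool = True,
) -> List[str]:
    """Build the argument list for `efind search` from the documented flags."""
    cmd = ["efind", "search", query]
    if top is not None:
        cmd += ["--top", str(top)]
    if min_score is not None:
        cmd += ["--min-score", str(min_score)]
    if plain:
        cmd.append("--plain")
    return cmd


def run_search(query: str, **options) -> str:
    """Run the search and return raw stdout; the output format is not parsed here."""
    result = subprocess.run(
        build_search_command(query, **options),
        capture_output=True,
        text=True,
        check=True,  # assumption: failures are reported via a non-zero exit code
    )
    return result.stdout
```

Passing the query as a single list element avoids shell quoting issues with multi-word queries.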
Supported file types
| Category | Extensions |
|---|---|
| Code | .py .js .ts .jsx .tsx .java .c .cpp .h .hpp .go .rs .rb .php .swift .kt .scala .sh .bash .zsh .lua .pl .ex .exs .r .m .sql |
| Markup | .html .css .scss .less .xml .svg |
| Config | .json .yaml .yml .toml .ini .cfg .conf |
| Text | .txt .md .rst .csv |
| Documents | .pdf .docx |
| Images | .png .jpg .jpeg .gif .webp .bmp |
| Audio | .mp3 .wav .ogg .flac .m4a |
| Video | .mp4 .mov .avi .mkv .webm |
Images, audio, and video are embedded natively using Gemini's multimodal capabilities — no transcription or OCR needed.
PDFs with 6 or fewer pages are embedded natively; larger PDFs use text extraction for efficiency.
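The routing rule described above can be sketched as a small function. This is an illustration under assumptions, not the project's extractor: the function name `embedding_route`, the "native"/"text" labels, and the choice to treat .docx and PDFs of unknown length as text are all hypothetical.

```python
from pathlib import Path
from typing import Optional

# Extension sets copied from the table above.
NATIVE_MEDIA = {
    ".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp",  # images
    ".mp3", ".wav", ".ogg", ".flac", ".m4a",           # audio
    ".mp4", ".mov", ".avi", ".mkv", ".webm",           # video
}
MAX_NATIVE_PDF_PAGES = 6


def embedding_route(path: str, pdf_pages: Optional[int] = None) -> str:
    """Return "native" (embed raw bytes) or "text" (extract text, then chunk)."""
    ext = Path(path).suffix.lower()
    if ext in NATIVE_MEDIA:
        return "native"
    if ext == ".pdf":
        # Assumption: a PDF with an unknown page count falls back to text.
        if pdf_pages is not None and pdf_pages <= MAX_NATIVE_PDF_PAGES:
            return "native"
        return "text"
    return "text"
```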
How it works
Directory EmbeddedFinder ChromaDB
───────── ───────────────────────── ─────────────────────
files/ ──→ 1. Crawl (skip .git, etc.)
──→ 2. Extract (text / bytes)
──→ 3. Chunk (~2000 tokens)
──→ 4. Hash (SHA-256 dedup)
──→ 5. Embed (Gemini API) ──→ Store vectors
query ──→ 6. Embed query ──→ Nearest-neighbor
──→ 7. Deduplicate by file search
──→ 8. Re-rank & boost ──→ Results
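Steps 6 to 8 of the diagram can be approximated in a few lines of plain Python. The sketch below scores chunks by brute-force cosine similarity, keeps the best chunk per file, and adds a small bonus when a query word appears in the file name. Cosine as the metric, the 0.05 bonus, and the function names are assumptions for illustration; the real tool delegates the nearest-neighbor step to ChromaDB.

```python
import math
from typing import Dict, List, Sequence, Tuple

Chunk = Tuple[str, Sequence[float]]  # (file path, embedding vector)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_vec: Sequence[float],
    chunks: List[Chunk],
    query_text: str,
    top: int = 5,
) -> List[Tuple[str, float]]:
    # Steps 6 and 7: score every chunk, keeping only the best score per file.
    best: Dict[str, float] = {}
    for path, vec in chunks:
        score = cosine_similarity(query_vec, vec)
        if path not in best or score > best[path]:
            best[path] = score

    # Step 8: hypothetical boost when a query word appears in the file name.
    words = query_text.lower().split()
    boosted = []
    for path, score in best.items():
        name = path.rsplit("/", 1)[-1].lower()
        bonus = 0.05 if any(word in name for word in words) else 0.0
        boosted.append((path, score + bonus))

    boosted.sort(key=lambda item: item[1], reverse=True)
    return boosted[:top]
```

A vector store replaces the linear scan with an index, but the dedup and re-rank stages operate on its output in the same way.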
- Content hashing — files are fingerprinted with SHA-256; re-indexing skips anything unchanged
- Batch embedding — text chunks are grouped into batches (up to 100 per API call) for throughput
- Rate limiting — built-in token bucket limiter respects Gemini API quotas
- Parallel processing — multi-threaded extraction and embedding with up to 4 workers
- Smart ranking — results are boosted by filename match, file type relevance to query, content overlap, and path depth
- Directory filtering — hidden directories (starting with .) and common non-content directories (node_modules, __pycache__, .venv, dist, build, etc.) are automatically skipped during crawling
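The crawling, hashing, and batching behaviour listed above can be sketched with the standard library alone. The names `crawl`, `fingerprint`, `changed_files`, and `batches` are hypothetical, the skip list contains only the directories named above, and the in-memory `known_hashes` dict stands in for whatever the real index persists.

```python
import hashlib
import os
from typing import Dict, Iterable, Iterator, List, Sequence

SKIP_DIRS = {"node_modules", "__pycache__", ".venv", "dist", "build"}
BATCH_SIZE = 100  # upper bound per API call, per the list above


def crawl(root: str) -> Iterator[str]:
    """Yield file paths under root, pruning hidden and non-content directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [
            d for d in dirnames if not d.startswith(".") and d not in SKIP_DIRS
        ]
        for name in filenames:
            yield os.path.join(dirpath, name)


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest used as the content fingerprint."""
    return hashlib.sha256(data).hexdigest()


def changed_files(paths: Iterable[str], known_hashes: Dict[str, str]) -> List[str]:
    """Return paths whose content hash differs from the recorded one, updating the record."""
    changed = []
    for path in paths:
        with open(path, "rb") as fh:
            digest = fingerprint(fh.read())
        if known_hashes.get(path) != digest:
            known_hashes[path] = digest
            changed.append(path)
    return changed


def batches(items: Sequence, size: int = BATCH_SIZE) -> Iterator[Sequence]:
    """Split items into consecutive groups of at most `size`."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Running `changed_files` twice over an unmodified tree returns an empty list the second time, which is the property that makes re-indexing cheap.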
Configuration
| Variable | Default | Description |
|---|---|---|
| GOOGLE_API_KEY | — | Google AI API key (required) |
| EMBEDDEDFINDER_DB_DIR | .embeddedfinder/db | Path to the ChromaDB database |
The API key can also be stored via the setup wizard or /key set, which saves it to ~/.config/embeddedfinder/config.json with owner-only permissions.
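As an illustration of the two settings above, the sketch below reads the database path with its documented default and writes a config file that only the owner can read or write. The helper names `db_dir` and `save_config` and the `api_key` field are assumptions, and the permission handling shown is POSIX-specific.

```python
import json
import os
from pathlib import Path
from typing import Mapping


def db_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Resolve the database directory, falling back to the documented default."""
    return Path(env.get("EMBEDDEDFINDER_DB_DIR", ".embeddedfinder/db"))


def save_config(path: Path, data: dict) -> None:
    """Write JSON config readable and writable by the owner only (mode 0o600)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Creating the file with 0o600 avoids a window where it is world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(data, fh)
    # Also tighten permissions if the file already existed with a looser mode.
    os.chmod(path, 0o600)
```

For example, `save_config(Path.home() / ".config/embeddedfinder/config.json", {"api_key": "your-key-here"})` would write to the location named above.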
Project structure
embedded_finder/
├── cli.py # Click CLI — subcommands + TUI launcher
├── tui.py # Interactive Rich-based REPL
├── config.py # Settings, supported extensions, env vars
├── config_store.py # Persistent config file management
├── crawler.py # Recursive file discovery
├── extractor.py # Text extraction, chunking, MIME detection
├── embedder.py # Gemini Embedding API client + batching
├── store.py # ChromaDB vector store
├── indexer.py # Orchestrates crawl → extract → embed → store
├── search.py # Query embedding + nearest-neighbor search
├── ranker.py # Result ranking, dedup, and formatting
├── rate_limiter.py # Token bucket rate limiter
└── watcher.py # Filesystem watcher (watchdog)
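For readers unfamiliar with the token bucket technique named next to rate_limiter.py, here is a minimal generic sketch. It is not the module's actual code: the class name, constructor parameters, and the non-blocking `try_acquire` method are assumptions chosen for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity` while refilling at `rate` tokens per second."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.clock = clock
        self.updated = clock()

    def _refill(self) -> None:
        """Add tokens for the time elapsed since the last update, capped at capacity."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return False without waiting otherwise."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A caller would check `try_acquire()` before each API request and sleep briefly when it returns False. The injectable `clock` keeps the class testable without real delays.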
Development
# Clone and install with dev dependencies
git clone https://github.com/vladmarian20005/EmbeddedFinder.git
cd EmbeddedFinder
pip install -e ".[dev]"
# Run tests
pytest
# Run tests with coverage
pytest --cov=embedded_finder
Contributing
Contributions are welcome! Please open an issue or submit a pull request.
- Fork the repository
- Create your feature branch (git checkout -b feature/my-feature)
- Commit your changes (git commit -m 'Add my feature')
- Push to the branch (git push origin feature/my-feature)
- Open a pull request
License
MIT
File details
Details for the file embedded_finder-0.2.2.tar.gz.
File metadata
- Download URL: embedded_finder-0.2.2.tar.gz
- Upload date:
- Size: 52.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7934ced6a4b38ea0db9eb3759898829e1dbd5ce1b18fce62a4a818d2a8647f54 |
| MD5 | 47454ef047a68e0ff4037541ace323ab |
| BLAKE2b-256 | 1756729de950aa47b89de8e7b3329858e459e141934c8cbd58d7ad908058b4ae |
File details
Details for the file embedded_finder-0.2.2-py3-none-any.whl.
File metadata
- Download URL: embedded_finder-0.2.2-py3-none-any.whl
- Upload date:
- Size: 40.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 293764a9bf28b845d83403f71226769cdafa28f60edd2b45955168d55784d1bb |
| MD5 | 5b389c1d5701d2200a5379086acc0fcf |
| BLAKE2b-256 | 9f58ce6be0fcdad2cbec5624a99cf8f34fa4495af1571432bb024755a94f11f5 |