FLAMEHAVEN FileSearch - Open source semantic document search powered by Google Gemini
FLAMEHAVEN FileSearch
Open Source Semantic Document Search
What is FLAMEHAVEN FileSearch?
FLAMEHAVEN FileSearch is a practical, developer-friendly RAG (Retrieval-Augmented Generation) solution for semantic document search. It is built for rapid deployment, customization, and experimentation by startups, researchers, and SaaS builders.
This project is proof that powerful AI search can be fast, simple, and open. Solo builders can run advanced semantic file search in minutes, without corporate barriers and with full transparency and flexibility.
Key Features
Python & FastAPI Based
Deploy and start searching files in under 10 minutes. Production-ready REST API with interactive documentation.
Multi-Format Support
Handles PDF, DOCX, TXT, and MD files, with a simple 50 MB per-file upload cap for MVP environments.
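These limits can be mirrored on the client so that an unsupported or oversized file is rejected before any upload is attempted. The sketch below is a hypothetical helper (`validate_upload` is not part of the package); it assumes the supported extensions are exactly `.pdf`, `.docx`, `.txt` and `.md`, and that "50 MB" means 50 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Assumed limits, mirroring the feature list above (not read from the package).
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md"}
MAX_FILE_SIZE_MB = 50


def validate_upload(filename: str, size_bytes: int, max_mb: int = MAX_FILE_SIZE_MB) -> list:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    suffix = Path(filename).suffix.lower()  # case-insensitive extension check
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {suffix or '(none)'}")
    # Assumption: MB means mebibytes (1024 * 1024 bytes).
    limit = max_mb * 1024 * 1024
    if size_bytes > limit:
        problems.append(f"file too large: {size_bytes} bytes > {limit} bytes")
    return problems


if __name__ == "__main__":
    print(validate_upload("report.PDF", 2_000_000))  # []
    print(validate_upload("slides.pptx", 60 * 1024 * 1024))
```

Checking on the client is only a convenience; the server-side limit (`MAX_FILE_SIZE_MB`) remains the authority.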
Integrated Google Gemini Embedding
Semantic search is backed by Google Gemini File Search (embedding, chunking, and grounding), with answers generated by gemini-2.5-flash.
Source Citations
Every answer is traceable: sources are cited with precise titles and URIs so each answer can be verified. A maximum of 5 sources is returned per answer in the Lite tier.
Open Source for Real Collaboration
Built for rapid prototyping and true community-driven growth. MIT licensed.
Lightweight, Open Architecture
- Fast DIY deployments
- Transparent control and easy extensibility
- Instant setup without cloud vendor lock-in
- Code visibility, forkability, and rapid iteration
- Perfect for solo developers and startups
How Does It Differ from the Google Gemini API File Search Tool?
| Feature | Google Gemini File Search | FLAMEHAVEN FileSearch |
|---|---|---|
| Infrastructure | Fully managed, enterprise-grade | Self-hosted, lightweight |
| Scaling | Unlimited, automated | MVP-focused (50 MB per-file cap) |
| Control | Black box | Full code transparency |
| Deployment | Cloud-only | Docker, on-premise, anywhere (requires access to the Gemini API) |
| Setup Time | Variable | Under 10 minutes |
| Cost | Pay-per-use | Free & open source (Gemini API usage billed separately) |
| Customization | Limited | Fully extensible |
| Vendor Lock-in | Yes (Google Cloud) | No lock-in at the application layer (retrieval still uses the Gemini API) |
| Use Case | Enterprise, scale | Startups, DIY, prototyping |
Google Gemini API File Search Tool
Offers fully managed, enterprise-grade RAG with robust infrastructure, unlimited scaling, automated chunking, and seamless context injection at scale. Ideal for organizations seeking highly scalable, cost-effective, and hands-off document grounding.
FLAMEHAVEN FileSearch
Provides a lightweight, open architecture for fast DIY deployments, with transparent control, easy extensibility, instant setup without complex onboarding, and full code visibility. It is well suited to solo developers and startups.
Quick Start (3 Steps, 2 Minutes!)
Installation
# Core library only
pip install flamehaven-filesearch
# With API server (recommended)
pip install "flamehaven-filesearch[api]"
Set API Key
export GEMINI_API_KEY="your-gemini-api-key-here"
Get your API key at: https://ai.google.dev/
Start Searching!
Option 1: Python Library (3 lines of code!)
from flamehaven_filesearch import FlamehavenFileSearch
searcher = FlamehavenFileSearch()
searcher.upload_file("document.pdf")
result = searcher.search("What are the key findings?")
print(result['answer'])
print(f"Sources: {result['sources']}")
Option 2: API Server
# Start server
uvicorn flamehaven_filesearch.api:app --reload
# Upload file
curl -X POST "http://localhost:8000/upload" \
-F "file=@document.pdf"
# Search
curl "http://localhost:8000/search?q=key+findings"
Interactive API docs: http://localhost:8000/docs
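The `q` parameter in the GET request must be URL-encoded (the `+` in `key+findings` stands for a space). A standard-library client could build and send the same request as sketched below; the optional `store` parameter matches the API examples further down, the base URL assumes the default local server, and both function names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # default local server address


def build_search_url(query: str, store: Optional[str] = None, base_url: str = BASE_URL) -> str:
    """Build the GET /search URL with a properly encoded query string."""
    params = {"q": query}
    if store is not None:
        params["store"] = store
    return f"{base_url}/search?{urlencode(params)}"  # spaces become '+'


def search(query: str, store: Optional[str] = None, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body (requires a running server)."""
    with urlopen(build_search_url(query, store), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(build_search_url("key findings"))
# http://localhost:8000/search?q=key+findings
```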
Table of Contents
- Installation
- Basic Usage
- API Server
- Docker Deployment
- Configuration
- Architecture
- Examples
- Testing
- Contributing
- License
Installation Options
Option 1: PyPI (Recommended)
# Minimal installation
pip install flamehaven-filesearch
# With API server support
pip install "flamehaven-filesearch[api]"
# With development tools
pip install "flamehaven-filesearch[dev]"
# Everything
pip install "flamehaven-filesearch[all]"
Option 2: From Source
git clone https://github.com/flamehaven01/Flamehaven-Filesearch.git
cd Flamehaven-Filesearch
pip install -e ".[api]"
Option 3: Docker
docker pull flamehaven/filesearch:latest
# OR build locally
docker build -t flamehaven-filesearch .
Basic Usage
Simple Example (Library)
from flamehaven_filesearch import FlamehavenFileSearch
import os
# Initialize
searcher = FlamehavenFileSearch(api_key=os.getenv("GEMINI_API_KEY"))
# Upload a file
result = searcher.upload_file("research_paper.pdf")
print(f"Uploaded: {result['status']}")
# Search
answer = searcher.search("What methodology did they use?")
print(f"\nAnswer: {answer['answer']}")
print("\nSources:")
for i, source in enumerate(answer['sources'], 1):
    print(f"  {i}. {source['title']}")
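Sources can also be rendered together with their URIs. This is a sketch assuming each entry in `answer['sources']` is a dict with a `title` key (as used above) and possibly a `uri` key; the key name `uri` is an assumption based on the feature list, and `format_citations` is a hypothetical helper. The `max_sources=5` default mirrors the Lite-tier limit.

```python
def format_citations(sources: list, max_sources: int = 5) -> str:
    """Render up to max_sources citations as numbered 'title <uri>' lines."""
    lines = []
    for i, source in enumerate(sources[:max_sources], start=1):
        title = source.get("title", "(untitled)")
        uri = source.get("uri")  # assumed key name; may be absent
        lines.append(f"{i}. {title}" + (f" <{uri}>" if uri else ""))
    return "\n".join(lines)


# Illustrative data only; real URIs come from the search response.
example_sources = [
    {"title": "paper.pdf", "uri": "files/abc123"},
    {"title": "notes.md"},
]
print(format_citations(example_sources))
```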
Multiple Stores (Organize by Project)
# Create separate stores
searcher.create_store("research")
searcher.create_store("legal")
searcher.create_store("business")
# Upload to specific stores
searcher.upload_file("paper.pdf", store_name="research")
searcher.upload_file("contract.pdf", store_name="legal")
searcher.upload_file("plan.docx", store_name="business")
# Search in specific context
research_answer = searcher.search("methodology", store_name="research")
legal_answer = searcher.search("termination clause", store_name="legal")
Batch Upload
files = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
result = searcher.upload_files(files, store_name="project-alpha")
print(f"Uploaded {result['success']}/{result['total']} files")
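The benchmark table below credits batch uploads with parallel processing, but whether `upload_files` parallelises internally is not documented here. The sketch below shows one way a caller could do it with a thread pool while reporting the same `success`/`total` counts. `upload_in_parallel` is hypothetical; it assumes that a failed upload raises an exception and that the searcher object is safe to share across threads, neither of which the README states. The worker count is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def upload_in_parallel(paths: Iterable[str], upload_one: Callable[[str], object],
                       max_workers: int = 4) -> dict:
    """Upload files concurrently; return counts plus per-file error messages."""
    paths = list(paths)
    errors = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(upload_one, path): path for path in paths}
        for future in as_completed(futures):
            try:
                future.result()  # re-raises any exception from the worker
            except Exception as exc:  # record the failure and keep going
                errors[futures[future]] = str(exc)
    return {"success": len(paths) - len(errors), "total": len(paths), "errors": errors}


# Usage with the library (store_name as in the examples above):
# result = upload_in_parallel(files, lambda p: searcher.upload_file(p, store_name="project-alpha"))
```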
API Server
Start Server
# Method 1: Using uvicorn directly
export GEMINI_API_KEY="your-key"
uvicorn flamehaven_filesearch.api:app --reload
# Method 2: Using provided script
./scripts/start_server.sh
# Method 3: Using Makefile
make server
# Production mode (4 workers)
make server-prod
Server starts on: http://localhost:8000
Interactive docs: http://localhost:8000/docs
API Endpoints
Upload Files
# Single file
curl -X POST "http://localhost:8000/upload" \
-F "file=@document.pdf" \
-F "store=default"
# Multiple files
curl -X POST "http://localhost:8000/upload-multiple" \
-F "files=@doc1.pdf" \
-F "files=@doc2.pdf" \
-F "store=research"
Search
# GET (simple)
curl "http://localhost:8000/search?q=key+findings&store=default"
# POST (advanced)
curl -X POST "http://localhost:8000/search" \
-H "Content-Type: application/json" \
-d '{
"query": "What are the main conclusions?",
"store_name": "default",
"temperature": 0.7,
"max_tokens": 512
}'
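The POST body is plain JSON, so a small client-side builder can catch obviously invalid values before a request is sent. The field names come from the curl example above and the defaults (0.5 temperature, 1024 tokens) from the configuration table below; treating the table's 0.0-1.0 temperature range as a hard limit is an assumption, and `build_search_payload` is hypothetical.

```python
import json


def build_search_payload(query: str, store_name: str = "default",
                         temperature: float = 0.5, max_tokens: int = 1024) -> str:
    """Return the JSON body for POST /search, validating values first."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if not 0.0 <= temperature <= 1.0:  # assumed limit, from the config table
        raise ValueError("temperature must be between 0.0 and 1.0")
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    payload = {
        "query": query,
        "store_name": store_name,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)


body = build_search_payload("What are the main conclusions?", temperature=0.7, max_tokens=512)
print(body)
```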
Manage Stores
# List all stores
curl "http://localhost:8000/stores"
# Create store
curl -X POST "http://localhost:8000/stores" \
-H "Content-Type: application/json" \
-d '{"name": "my-project"}'
# Delete store
curl -X DELETE "http://localhost:8000/stores/my-project"
Health & Metrics
# Health check
curl "http://localhost:8000/health"
# Metrics
curl "http://localhost:8000/metrics"
Python API Client
import requests
class FlamehavenAPIClient:
    def __init__(self, base_url="http://localhost:8000", timeout=60):
        self.base_url = base_url
        self.timeout = timeout  # seconds; uploads can take a while
    def upload(self, file_path, store="default"):
        with open(file_path, "rb") as f:
            files = {"file": f}
            data = {"store": store}
            response = requests.post(f"{self.base_url}/upload",
                                     files=files, data=data, timeout=self.timeout)
        response.raise_for_status()  # surface HTTP errors instead of parsing an error body
        return response.json()
    def search(self, query, store="default"):
        response = requests.get(f"{self.base_url}/search",
                                params={"q": query, "store": store},
                                timeout=self.timeout)
        response.raise_for_status()
        return response.json()
# Usage
client = FlamehavenAPIClient()
client.upload("document.pdf")
result = client.search("summary")
print(result['answer'])
Docker Deployment
Quick Start
# Run with environment variable (uses the locally built image; use flamehaven/filesearch:latest if you pulled it instead)
docker run -d \
-p 8000:8000 \
-e GEMINI_API_KEY="your-key" \
--name flamehaven-api \
flamehaven-filesearch
Docker Compose (Recommended for Production)
# docker-compose.yml
version: '3.8'
services:
  flamehaven-api:
    image: flamehaven-filesearch:latest
    ports:
      - "8000:8000"
    environment:
      - GEMINI_API_KEY=${GEMINI_API_KEY}
      - MAX_FILE_SIZE_MB=50
      - WORKERS=4
    volumes:
      - ./uploads:/tmp/uploads
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
# Start
docker-compose up -d
# Stop
docker-compose down
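The Compose healthcheck above probes `/health` every 30 seconds and marks the container unhealthy after 3 consecutive failures. A deploy script may want to block until the same endpoint responds before routing traffic. This standard-library sketch treats HTTP 200 as healthy and ignores the response body, whose shape the README does not document; `wait_until_healthy` and its injectable `fetch` parameter (there so the logic can be tested without a server) are hypothetical.

```python
import time
from typing import Callable
from urllib.request import urlopen


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request to url."""
    with urlopen(url, timeout=timeout) as response:
        return response.status


def wait_until_healthy(url: str = "http://localhost:8000/health",
                       retries: int = 3, interval_s: float = 30.0,
                       fetch: Callable[[str], int] = http_status) -> bool:
    """Try up to `retries` times; return True on the first HTTP 200."""
    for attempt in range(1, retries + 1):
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            pass  # connection refused, timeout, or HTTPError (a URLError/OSError subclass)
        if attempt < retries:
            time.sleep(interval_s)
    return False
```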
Build Custom Image
# Build
docker build -t my-flamehaven-filesearch .
# Run
docker run -d -p 8000:8000 \
-e GEMINI_API_KEY="your-key" \
my-flamehaven-filesearch
Configuration
Environment Variables
| Variable | Description | Default | Required |
|---|---|---|---|
| GEMINI_API_KEY | Google Gemini API key | - | Yes |
| MAX_FILE_SIZE_MB | Maximum file size (MB) | 50 | No |
| UPLOAD_TIMEOUT_SEC | Upload timeout (seconds) | 60 | No |
| DEFAULT_MODEL | Gemini model to use | gemini-2.5-flash | No |
| MAX_OUTPUT_TOKENS | Max response tokens | 1024 | No |
| TEMPERATURE | Model temperature (0.0-1.0) | 0.5 | No |
| MAX_SOURCES | Max citation sources | 5 | No |
| HOST | API server host | 0.0.0.0 | No |
| PORT | API server port | 8000 | No |
| WORKERS | Uvicorn workers | 1 | No |
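The table maps naturally onto a typed settings object. The sketch below is a hypothetical stand-in, not the package's own `Config` class: it reads the variables listed above from an environment mapping, applies the documented defaults, and rejects a missing API key or an out-of-range temperature.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object mirroring the environment variable table."""
    gemini_api_key: str
    max_file_size_mb: int = 50
    upload_timeout_sec: int = 60
    default_model: str = "gemini-2.5-flash"
    max_output_tokens: int = 1024
    temperature: float = 0.5
    max_sources: int = 5
    host: str = "0.0.0.0"
    port: int = 8000
    workers: int = 1

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        env = os.environ if env is None else env
        api_key = env.get("GEMINI_API_KEY")
        if not api_key:
            raise ValueError("GEMINI_API_KEY is required")
        settings = cls(
            gemini_api_key=api_key,
            max_file_size_mb=int(env.get("MAX_FILE_SIZE_MB", 50)),
            upload_timeout_sec=int(env.get("UPLOAD_TIMEOUT_SEC", 60)),
            default_model=env.get("DEFAULT_MODEL", "gemini-2.5-flash"),
            max_output_tokens=int(env.get("MAX_OUTPUT_TOKENS", 1024)),
            temperature=float(env.get("TEMPERATURE", 0.5)),
            max_sources=int(env.get("MAX_SOURCES", 5)),
            host=env.get("HOST", "0.0.0.0"),
            port=int(env.get("PORT", 8000)),
            workers=int(env.get("WORKERS", 1)),
        )
        if not 0.0 <= settings.temperature <= 1.0:
            raise ValueError("TEMPERATURE must be between 0.0 and 1.0")
        return settings
```

Passing an explicit mapping (rather than always reading `os.environ`) keeps the loader easy to test.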
.env File
# Copy example
cp .env.example .env
# Edit .env
nano .env
# .env
GEMINI_API_KEY=your-api-key-here
MAX_FILE_SIZE_MB=50
DEFAULT_MODEL=gemini-2.5-flash
TEMPERATURE=0.5
MAX_SOURCES=5
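Libraries such as python-dotenv usually load this file; whether this package uses one is not stated here. To show what the format amounts to, here is a deliberately minimal, hypothetical parser: one `KEY=VALUE` per line, blank lines and `#` comments skipped, one pair of matching quotes stripped. It does not handle escapes, `export` prefixes or multi-line values.

```python
def parse_env_file(text: str) -> dict:
    """Parse simple KEY=VALUE lines into a dict (minimal .env subset)."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines without '='
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in {'"', "'"}:
            value = value[1:-1]
        values[key.strip()] = value
    return values


sample = """
# .env
GEMINI_API_KEY=your-api-key-here
MAX_FILE_SIZE_MB=50
DEFAULT_MODEL="gemini-2.5-flash"
"""
print(parse_env_file(sample))
```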
Programmatic Configuration
from flamehaven_filesearch import FlamehavenFileSearch, Config
# Custom configuration
config = Config(
    api_key="your-key",
    max_file_size_mb=100,
    default_model="gemini-2.5-flash",
    temperature=0.7,
    max_sources=10,
)
searcher = FlamehavenFileSearch(config=config)
Architecture
FLAMEHAVEN File Search Tool (FLAMEHAVEN FileSearch v1.0.0)

  [ FastAPI REST API ] ---> [ FlamehavenFileSearch Core ]
           |                            |
           +-------------+--------------+
                         |
                         v
      [ Google Gemini File Search (gemini-2.5-flash) ]
        - Semantic embedding
        - Document chunking
        - Grounding & citations
Components
- Core Library (flamehaven_filesearch/core.py)
  - FlamehavenFileSearch class: main interface
  - File upload with validation
  - Store management
  - Search with automatic grounding
- API Server (flamehaven_filesearch/api.py)
  - FastAPI application
  - RESTful endpoints
  - OpenAPI/Swagger documentation
  - Error handling & logging
- Configuration (flamehaven_filesearch/config.py)
  - Environment-based config
  - Validation & defaults
  - Driftlock settings
Data Flow
1. Upload: File → Validation → Google File Search Store
2. Search: Query → Gemini 2.5 Flash → Grounded Answer + Citations
3. Result: Answer + Sources (titles, URIs) → User
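To make the three-step flow concrete without calling Gemini, the toy below substitutes plain word-overlap scoring for semantic embeddings and returns the best-matching passage instead of a generated answer. Everything in it (`ToyStore`, the 50-word chunk size, the scoring) is illustrative and says nothing about how Google's File Search actually chunks or ranks documents; only the result shape (an answer plus titled sources) follows this README.

```python
import re
from dataclasses import dataclass, field


def tokenize(text: str) -> set:
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class ToyStore:
    chunk_words: int = 50  # illustrative chunk size
    chunks: list = field(default_factory=list)  # (title, text) pairs

    def upload(self, title: str, text: str) -> None:
        """Step 1: split the document into fixed-size word chunks and store them."""
        words = text.split()
        for start in range(0, len(words), self.chunk_words):
            self.chunks.append((title, " ".join(words[start:start + self.chunk_words])))

    def search(self, query: str, max_sources: int = 5) -> dict:
        """Steps 2-3: rank chunks by word overlap and return answer + sources."""
        q = tokenize(query)
        scored = [(len(q & tokenize(text)), title, text) for title, text in self.chunks]
        hits = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
        hits = hits[:max_sources]
        if not hits:
            return {"answer": "No relevant passage found.", "sources": []}
        titles = list(dict.fromkeys(title for _, title, _ in hits))  # dedupe, keep order
        return {"answer": hits[0][2], "sources": [{"title": t} for t in titles]}


store = ToyStore()
store.upload("guide.md", "Authenticate with the API by sending your key in a header.")
store.upload("faq.md", "Uploads are limited to 50 MB per file.")
result = store.search("How do I authenticate with the API?")
print(result["answer"])
print(result["sources"])
```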
Examples
Example 1: Document Q&A
from flamehaven_filesearch import FlamehavenFileSearch
searcher = FlamehavenFileSearch()
# Upload technical documentation
searcher.upload_file("api_docs.pdf", store_name="docs")
searcher.upload_file("user_guide.pdf", store_name="docs")
# Ask questions
answer = searcher.search(
"How do I authenticate with the API?",
store_name="docs",
temperature=0.3 # Lower for factual queries
)
print(answer['answer'])
Example 2: Research Paper Analysis
# Upload multiple papers
papers = [
"paper1_methodology.pdf",
"paper2_results.pdf",
"paper3_discussion.pdf"
]
result = searcher.upload_files(papers, store_name="research")
# Analyze across papers
answer = searcher.search(
"Compare the methodologies used in these papers",
store_name="research",
max_tokens=2048 # Longer response
)
for i, source in enumerate(answer['sources'], 1):
    print(f"{i}. {source['title']}")
Example 3: Legal Document Search
# Upload contracts
searcher.upload_file("contract_v1.pdf", store_name="legal")
searcher.upload_file("terms_of_service.pdf", store_name="legal")
# Search for specific clauses
answer = searcher.search(
"What are the termination and renewal clauses?",
store_name="legal",
temperature=0.1 # Very factual
)
More examples are available in the examples/ directory.
Testing
Run Tests
# All unit tests
pytest
# With coverage
pytest --cov=flamehaven_filesearch --cov-report=html
# Specific test file
pytest tests/test_core.py -v
# Integration tests (requires API key)
pytest -m integration
Using Makefile
make test # Run unit tests
make test-cov # With coverage report
make test-integration # Integration tests
Test Coverage
Current coverage: >85%
View HTML report: htmlcov/index.html
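Unit tests can avoid network calls entirely by replacing the searcher with a stand-in. The sketch below uses `unittest.mock.Mock` with a `spec` listing only the methods it needs, so a misspelled method name fails loudly. The code under test, `answer_with_titles`, is a hypothetical helper, and the return-value shape mirrors the examples in this README rather than a documented contract.

```python
from unittest.mock import Mock


def answer_with_titles(searcher, query: str, store_name: str = "default") -> str:
    """Hypothetical code under test: an answer followed by its source titles."""
    result = searcher.search(query, store_name=store_name)
    titles = ", ".join(source["title"] for source in result["sources"])
    return f"{result['answer']} (sources: {titles})"


def test_answer_with_titles_formats_sources():
    fake = Mock(spec=["search", "upload_file"])  # no real Gemini calls
    fake.search.return_value = {
        "answer": "They used a randomized trial.",
        "sources": [{"title": "paper.pdf"}, {"title": "appendix.pdf"}],
    }
    text = answer_with_titles(fake, "What methodology did they use?", store_name="research")
    assert text == "They used a randomized trial. (sources: paper.pdf, appendix.pdf)"
    fake.search.assert_called_once_with("What methodology did they use?", store_name="research")


if __name__ == "__main__":
    test_answer_with_titles_formats_sources()
    print("ok")
```

Saved under `tests/`, pytest collects the `test_` function automatically, and it needs no API key.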
Development
Setup Development Environment
# Clone repository
git clone https://github.com/flamehaven01/Flamehaven-Filesearch.git
cd Flamehaven-Filesearch
# Install with dev dependencies
pip install -e ".[dev,api]"
# Copy environment file
cp .env.example .env
# Edit .env and add your GEMINI_API_KEY
Code Quality Tools
# Format code
make format
# OR
black flamehaven_filesearch/ tests/ examples/
isort flamehaven_filesearch/ tests/ examples/
# Run linters
make lint
# OR
flake8 flamehaven_filesearch/
mypy flamehaven_filesearch/
Build Package
# Build distribution
make build
# Test on TestPyPI
make publish-test
# Publish to PyPI (when ready)
make publish
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
Ways to Contribute
- Report bugs via Issues
- Suggest features via Discussions
- Improve documentation
- Submit pull requests
- Star the repository to show support!
Quick Contribution Guide
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Run tests (pytest)
- Commit (git commit -m 'Add amazing feature')
- Push (git push origin feature/amazing-feature)
- Open a Pull Request
Performance Benchmarks
| Operation | Time | Notes |
|---|---|---|
| File Upload (10MB) | ~5s | Including validation |
| Search Query | ~2s | With 5 sources |
| Store Creation | ~1s | One-time operation |
| Batch Upload (3 files) | ~12s | Parallel processing |
Benchmarks measured on a standard VM (2 CPUs, 4 GB RAM).
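Upload and search times include round trips to the Gemini API, so they vary with network conditions and are worth measuring on your own setup. A minimal harness using `time.perf_counter` that reports the median of several timed runs might look like this; `median_seconds` is hypothetical, and `operation` is any zero-argument callable, such as a lambda wrapping `searcher.search(...)`.

```python
import statistics
import time
from typing import Callable


def median_seconds(operation: Callable[[], object], runs: int = 5, warmup: int = 1) -> float:
    """Call operation warmup + runs times; return the median time of the timed runs."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        operation()  # untimed, e.g. to establish connections first
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Example (requires an API key and an uploaded document):
# print(median_seconds(lambda: searcher.search("key findings"), runs=5))
```

The median is used rather than the mean so that a single slow network round trip does not dominate the result.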
Roadmap
v1.1.0 (Planned)
- Caching layer for repeated queries
- Rate limiting and authentication
- Batch search operations
- WebSocket support for streaming
- Enhanced file type support
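Until the planned caching layer is available, a caller can memoise repeated queries itself. This sketch (`QueryCache` is hypothetical, not part of the package) keys on the query text, store name and temperature, evicts the least recently used entry when full, and expires entries after a TTL so newly uploaded documents are eventually reflected. The 300-second TTL and 128-entry limit are arbitrary choices.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class QueryCache:
    """Hypothetical LRU cache with a time-to-live, wrapping any search callable."""

    def __init__(self, search_fn: Callable[..., dict], ttl_s: float = 300.0,
                 max_entries: int = 128, clock: Callable[[], float] = time.monotonic):
        self._search_fn = search_fn
        self._ttl_s = ttl_s
        self._max_entries = max_entries
        self._clock = clock
        self._entries = OrderedDict()  # (query, store, temperature) -> (timestamp, result)

    def search(self, query: str, store_name: str = "default",
               temperature: Optional[float] = None) -> dict:
        key = (query, store_name, temperature)
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl_s:
            self._entries.move_to_end(key)  # mark as most recently used
            return cached[1]
        kwargs = {"store_name": store_name}
        if temperature is not None:
            kwargs["temperature"] = temperature
        result = self._search_fn(query, **kwargs)
        self._entries[key] = (now, result)
        self._entries.move_to_end(key)
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return result


# Usage (assumes searcher.search accepts store_name and temperature, as in the examples):
# cached = QueryCache(searcher.search)
# cached.search("methodology", store_name="research")
```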
v2.0.0 (Future)
- Standard tier with advanced features
- Custom model fine-tuning
- Multi-language support
- Admin dashboard
- Analytics and insights
See CHANGELOG.md for version history.
License
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License - Copyright (c) 2025 SovDef Team
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction...
Acknowledgments
- Built on Google Gemini API
- Powered by FastAPI
- Inspired by Google File Search
Support & Community
- GitHub Issues: Report bugs
- Discussions: Ask questions
- Email: dev@sovdef.ai
- Documentation: GitHub Wiki
Why Choose FLAMEHAVEN FileSearch?
For Solo Developers
- No corporate barriers: get started in minutes
- Full code access: understand and modify everything
- Zero vendor lock-in: deploy anywhere
- Free & open source: no license fees (Gemini API usage billed separately)
For Startups
- Rapid prototyping: MVP in under 10 minutes
- Production-ready: FastAPI, Docker, CI/CD included
- Scalable architecture: upgrade path to Standard tier
- Community support: growing ecosystem
For Researchers
- Transparent algorithms: know how it works
- Extensible design: easy to customize
- Academic-friendly: MIT license for research
- Reproducible results: consistent API
Get Started Now!
# Install
pip install "flamehaven-filesearch[api]"
# Set API key
export GEMINI_API_KEY="your-key"
# Start searching!
python -c "
from flamehaven_filesearch import FlamehavenFileSearch
s = FlamehavenFileSearch()
s.upload_file('doc.pdf')
print(s.search('summary')['answer'])
"
Join the community and help redefine open AI search!
Made with ❤️ by the SovDef Team
Tags: #opensource #filesearch #AI #RAG #GeminiAPI #startup #searchtools #python #fastapi #docker
File details
Details for the file flamehaven_filesearch-1.0.0.tar.gz.
File metadata
- Download URL: flamehaven_filesearch-1.0.0.tar.gz
- Upload date:
- Size: 24.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dab2cc91de2709ee433187754b8fcd1706719c3c6fee0f36fbf34da531762924 |
| MD5 | 459e590fbc4ad605318e339ef806d714 |
| BLAKE2b-256 | 0b3888129826e67d17069f10db33d0a5a20cad6d2841b4e80f25be66f7f63a0b |
Provenance
The following attestation bundles were made for flamehaven_filesearch-1.0.0.tar.gz:
Publisher: publish.yml on flamehaven01/Flamehaven-Filesearch
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: flamehaven_filesearch-1.0.0.tar.gz
- Subject digest: dab2cc91de2709ee433187754b8fcd1706719c3c6fee0f36fbf34da531762924
- Sigstore transparency entry: 695967705
- Sigstore integration time:
- Permalink: flamehaven01/Flamehaven-Filesearch@aa1aa54e2e9df83f0dda0f390aed105d1716ea75
- Branch / Tag: refs/heads/main
- Owner: https://github.com/flamehaven01
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@aa1aa54e2e9df83f0dda0f390aed105d1716ea75
- Trigger Event: workflow_dispatch
File details
Details for the file flamehaven_filesearch-1.0.0-py3-none-any.whl.
File metadata
- Download URL: flamehaven_filesearch-1.0.0-py3-none-any.whl
- Upload date:
- Size: 18.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f769a1b37ca23517cd444729c6ca49a79174cccea2167c1a276813123e95791d |
| MD5 | bbec3a153bae8a26c290a1fdf2d4448c |
| BLAKE2b-256 | 9e99a94c3672231b86f1358f24c8d9abc39c0921de061e79ebdc8f46682ad2ca |
Provenance
The following attestation bundles were made for flamehaven_filesearch-1.0.0-py3-none-any.whl:
Publisher: publish.yml on flamehaven01/Flamehaven-Filesearch
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: flamehaven_filesearch-1.0.0-py3-none-any.whl
- Subject digest: f769a1b37ca23517cd444729c6ca49a79174cccea2167c1a276813123e95791d
- Sigstore transparency entry: 695967716
- Sigstore integration time:
- Permalink: flamehaven01/Flamehaven-Filesearch@aa1aa54e2e9df83f0dda0f390aed105d1716ea75
- Branch / Tag: refs/heads/main
- Owner: https://github.com/flamehaven01
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@aa1aa54e2e9df83f0dda0f390aed105d1716ea75
- Trigger Event: workflow_dispatch