FLAMEHAVEN FileSearch - Open source semantic document search powered by Google Gemini

🔥 FLAMEHAVEN FileSearch

Open Source Semantic Document Search


🎯 What is FLAMEHAVEN FileSearch?

FLAMEHAVEN FileSearch is a practical, developer-friendly RAG (Retrieval-Augmented Generation) solution for semantic document search. It is built for rapid deployment, customization, and experimentation by startups, researchers, and SaaS builders.

This project is proof that powerful AI search can be fast, simple, and open. Solo builders can run advanced semantic file search in minutes, with no corporate barriers and with full transparency and flexibility.


✨ Key Features

🔺 Python & FastAPI Based

Deploy and start searching files in under 10 minutes. Production-ready REST API with interactive documentation.

🔺 Multi-Format Support

Handles PDF, DOCX, TXT, and MD files, with a simple 50MB upload cap for MVP environments.

🔺 Integrated Google Gemini Embedding

Delivers accurate semantic search aligned with state-of-the-art LLM capabilities (gemini-2.5-flash).

🔺 Source Citations

Every answer is traceable: precise titles and URIs ensure verifiability. Maximum 5 sources in Lite tier.

🔺 Open Source for Real Collaboration

Built for rapid prototyping and true community-driven growth. MIT licensed.

🔺 Lightweight, Open Architecture

  • Fast DIY deployments
  • Transparent control and easy extensibility
  • Instant setup without cloud vendor lock-in
  • Code visibility, forkability, and rapid iteration
  • Perfect for solo developers and startups
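The format and size limits listed under Multi-Format Support can be checked on the client before an upload is attempted. The following is a minimal sketch, assuming the four extensions and the 50MB cap named in this README; `check_upload` and its constants are hypothetical helpers for illustration, not part of the flamehaven_filesearch API.

```python
from pathlib import Path

# Assumed from the feature list in this README; not read from the library itself.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".txt", ".md"}
MAX_FILE_SIZE_MB = 50


def check_upload(path, max_mb=MAX_FILE_SIZE_MB):
    """Return a list of problems; an empty list means the file looks uploadable."""
    p = Path(path)
    if not p.is_file():
        return [f"{p} does not exist or is not a file"]
    problems = []
    if p.suffix.lower() not in SUPPORTED_SUFFIXES:
        problems.append(f"unsupported file type: {p.suffix or '(none)'}")
    size_mb = p.stat().st_size / (1024 * 1024)
    if size_mb > max_mb:
        problems.append(f"file is {size_mb:.1f} MB, over the {max_mb} MB cap")
    return problems
```

Running a check like this first gives a clear local error message instead of a failed request; the server-side validation remains the source of truth.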

🆚 How Does It Differ from Google Gemini API File Search Tool?

| Feature | Google Gemini File Search | FLAMEHAVEN FileSearch |
|---|---|---|
| Infrastructure | Fully managed, enterprise-grade | Self-hosted, lightweight |
| Scaling | Unlimited, automated | MVP-focused (50MB cap) |
| Control | Black box | Full code transparency |
| Deployment | Cloud-only | Docker, on-premise, anywhere |
| Setup Time | Variable | Under 10 minutes |
| Cost | Pay-per-use | Free & open source |
| Customization | Limited | Fully extensible |
| Vendor Lock-in | Yes (Google Cloud) | No lock-in |
| Use Case | Enterprise, scale | Startups, DIY, prototyping |

Google Gemini API File Search Tool

Offers fully managed, enterprise-grade RAG with robust infrastructure, unlimited scaling, automated chunking, and seamless context injection at scale. Ideal for organizations seeking highly scalable, cost-effective, and hands-off document grounding.

FLAMEHAVEN FileSearch

Provides a lightweight, open architecture for fast DIY deployments with transparent control, easy extensibility, instant setup without complex onboarding, and code visibility. It is a good fit for solo developers and startups.


🚀 Quick Start (3 Steps, 2 Minutes!)

Installation

# Core library only
pip install flamehaven-filesearch

# With API server (recommended); the quotes keep shells such as zsh from expanding the brackets
pip install "flamehaven-filesearch[api]"

Set API Key

export GEMINI_API_KEY="your-gemini-api-key-here"

Get your API key at: https://ai.google.dev/

Start Searching!

Option 1: Python Library (3 lines of code!)

from flamehaven_filesearch import FlamehavenFileSearch

searcher = FlamehavenFileSearch()
searcher.upload_file("document.pdf")
result = searcher.search("What are the key findings?")

print(result['answer'])
print(f"Sources: {result['sources']}")

Option 2: API Server

# Start server
uvicorn flamehaven_filesearch.api:app --reload

# Upload file
curl -X POST "http://localhost:8000/upload" \
  -F "file=@document.pdf"

# Search
curl "http://localhost:8000/search?q=key+findings"

Interactive API docs: http://localhost:8000/docs



📦 Installation Options

Option 1: PyPI (Recommended)

# Minimal installation
pip install flamehaven-filesearch

# With API server support (quote the extras so the shell does not expand the brackets)
pip install "flamehaven-filesearch[api]"

# With development tools
pip install "flamehaven-filesearch[dev]"

# Everything
pip install "flamehaven-filesearch[all]"

Option 2: From Source

git clone https://github.com/flamehaven01/Flamehaven-Filesearch.git
cd Flamehaven-Filesearch
pip install -e ".[api]"

Option 3: Docker

docker pull flamehaven/filesearch:latest
# OR build locally
docker build -t flamehaven-filesearch .

💡 Basic Usage

Simple Example (Library)

from flamehaven_filesearch import FlamehavenFileSearch
import os

# Initialize
searcher = FlamehavenFileSearch(api_key=os.getenv("GEMINI_API_KEY"))

# Upload a file
result = searcher.upload_file("research_paper.pdf")
print(f"✓ Uploaded: {result['status']}")

# Search
answer = searcher.search("What methodology did they use?")
print(f"\nAnswer: {answer['answer']}")
print("\nSources:")
for i, source in enumerate(answer['sources'], 1):
    print(f"  {i}. {source['title']}")

Multiple Stores (Organize by Project)

# Create separate stores
searcher.create_store("research")
searcher.create_store("legal")
searcher.create_store("business")

# Upload to specific stores
searcher.upload_file("paper.pdf", store_name="research")
searcher.upload_file("contract.pdf", store_name="legal")
searcher.upload_file("plan.docx", store_name="business")

# Search in specific context
research_answer = searcher.search("methodology", store_name="research")
legal_answer = searcher.search("termination clause", store_name="legal")

Batch Upload

files = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
result = searcher.upload_files(files, store_name="project-alpha")
print(f"✓ Uploaded {result['success']}/{result['total']} files")
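
For larger batches, the file list can be gathered from a directory instead of being typed by hand. This is a rough sketch, assuming the four supported extensions named in this README; `collect_documents` is a hypothetical helper and not part of the library.

```python
from pathlib import Path

# Assumed from the supported formats listed in this README.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".txt", ".md"}


def collect_documents(root):
    """Return sorted paths (as strings) of supported files under root, recursively."""
    return sorted(
        str(p)
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )
```

The returned list can then be passed to searcher.upload_files(...) exactly as in the batch example above.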

📡 API Server

Start Server

# Method 1: Using uvicorn directly
export GEMINI_API_KEY="your-key"
uvicorn flamehaven_filesearch.api:app --reload

# Method 2: Using provided script
./scripts/start_server.sh

# Method 3: Using Makefile
make server

# Production mode (4 workers)
make server-prod

Server starts on: http://localhost:8000

Interactive docs: http://localhost:8000/docs

API Endpoints

📤 Upload Files

# Single file
curl -X POST "http://localhost:8000/upload" \
  -F "file=@document.pdf" \
  -F "store=default"

# Multiple files
curl -X POST "http://localhost:8000/upload-multiple" \
  -F "files=@doc1.pdf" \
  -F "files=@doc2.pdf" \
  -F "store=research"

🔍 Search

# GET (simple)
curl "http://localhost:8000/search?q=key+findings&store=default"

# POST (advanced)
curl -X POST "http://localhost:8000/search" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the main conclusions?",
    "store_name": "default",
    "temperature": 0.7,
    "max_tokens": 512
  }'
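
The advanced POST search can also be issued from Python using only the standard library. The sketch below mirrors the JSON body of the curl call above; `build_search_request` and `post_search` are hypothetical helper names, and the shape of the response is assumed to match the `answer`/`sources` dictionary used elsewhere in this README.

```python
import json
import urllib.request


def build_search_request(query, store_name="default", temperature=0.7,
                         max_tokens=512, base_url="http://localhost:8000"):
    """Build (but do not send) a POST /search request with a JSON body."""
    payload = {
        "query": query,
        "store_name": store_name,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_search(query, **kwargs):
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_search_request(query, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.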

🗂️ Manage Stores

# List all stores
curl "http://localhost:8000/stores"

# Create store
curl -X POST "http://localhost:8000/stores" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-project"}'

# Delete store
curl -X DELETE "http://localhost:8000/stores/my-project"

📊 Health & Metrics

# Health check
curl "http://localhost:8000/health"

# Metrics
curl "http://localhost:8000/metrics"
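
Scripts that start the server and then upload files immediately may want to wait until the health endpoint responds. A minimal sketch, assuming only that /health returns HTTP 200 when the server is ready (the response body is not inspected); `wait_for_health` and its `opener` parameter are hypothetical and exist here to keep the function easy to test.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url="http://localhost:8000/health", attempts=10, delay=1.0,
                    opener=urllib.request.urlopen):
    """Poll the health endpoint until it answers with HTTP 200 or attempts run out."""
    for attempt in range(attempts):
        try:
            with opener(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not reachable yet; retry after a pause
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```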

Python API Client

import requests

class FlamehavenAPIClient:
    """Thin wrapper around the REST endpoints shown above."""

    def __init__(self, base_url="http://localhost:8000", timeout=60):
        self.base_url = base_url
        self.timeout = timeout

    def upload(self, file_path, store="default"):
        # Send the file as multipart form data, with the target store as a form field
        with open(file_path, "rb") as f:
            response = requests.post(f"{self.base_url}/upload",
                                     files={"file": f}, data={"store": store},
                                     timeout=self.timeout)
        response.raise_for_status()  # surface HTTP errors instead of parsing an error body
        return response.json()

    def search(self, query, store="default"):
        response = requests.get(f"{self.base_url}/search",
                                params={"q": query, "store": store},
                                timeout=self.timeout)
        response.raise_for_status()
        return response.json()

# Usage
client = FlamehavenAPIClient()
client.upload("document.pdf")
result = client.search("summary")
print(result['answer'])

🐳 Docker Deployment

Quick Start

# Run with environment variable
docker run -d \
  -p 8000:8000 \
  -e GEMINI_API_KEY="your-key" \
  --name flamehaven-api \
  flamehaven-filesearch

Docker Compose (Recommended for Production)

# docker-compose.yml
version: '3.8'

services:
  flamehaven-api:
    image: flamehaven-filesearch:latest
    ports:
      - "8000:8000"
    environment:
      - GEMINI_API_KEY=${GEMINI_API_KEY}
      - MAX_FILE_SIZE_MB=50
      - WORKERS=4
    volumes:
      - ./uploads:/tmp/uploads
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

# Start
docker-compose up -d

# Stop
docker-compose down

Build Custom Image

# Build
docker build -t my-sovdef-lite .

# Run
docker run -d -p 8000:8000 \
  -e GEMINI_API_KEY="your-key" \
  my-sovdef-lite

⚙️ Configuration

Environment Variables

| Variable | Description | Default | Required |
|---|---|---|---|
| GEMINI_API_KEY | Google Gemini API key | - | ✅ Yes |
| MAX_FILE_SIZE_MB | Maximum file size (MB) | 50 | No |
| UPLOAD_TIMEOUT_SEC | Upload timeout (seconds) | 60 | No |
| DEFAULT_MODEL | Gemini model to use | gemini-2.5-flash | No |
| MAX_OUTPUT_TOKENS | Max response tokens | 1024 | No |
| TEMPERATURE | Model temperature (0.0-1.0) | 0.5 | No |
| MAX_SOURCES | Max citation sources | 5 | No |
| HOST | API server host | 0.0.0.0 | No |
| PORT | API server port | 8000 | No |
| WORKERS | Uvicorn workers | 1 | No |
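
To illustrate how these variables and defaults fit together, here is a small sketch that reads them from the environment. The `Settings` class is a hypothetical stand-in written for this example; it is not the library's own `Config` class, whose internals are not shown in this README.

```python
import os
from dataclasses import dataclass, fields


@dataclass
class Settings:
    """Hypothetical settings holder mirroring the table above (not the library's Config)."""
    api_key: str
    max_file_size_mb: int = 50
    upload_timeout_sec: int = 60
    default_model: str = "gemini-2.5-flash"
    max_output_tokens: int = 1024
    temperature: float = 0.5
    max_sources: int = 5
    host: str = "0.0.0.0"
    port: int = 8000
    workers: int = 1

    @classmethod
    def from_env(cls, env=None):
        """Build settings from a mapping of environment variables (default: os.environ)."""
        env = os.environ if env is None else env
        if not env.get("GEMINI_API_KEY"):
            raise ValueError("GEMINI_API_KEY is required")
        values = {"api_key": env["GEMINI_API_KEY"]}
        for f in fields(cls):
            raw = env.get(f.name.upper())
            if f.name != "api_key" and raw is not None:
                values[f.name] = type(f.default)(raw)  # cast using the default's type
        if not 0.0 <= values.get("temperature", cls.temperature) <= 1.0:
            raise ValueError("TEMPERATURE must be between 0.0 and 1.0")
        return cls(**values)
```

Passing a plain dictionary as `env` makes the parsing logic easy to exercise without touching the real process environment.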

.env File

# Copy example
cp .env.example .env

# Edit .env
nano .env

# .env
GEMINI_API_KEY=your-api-key-here
MAX_FILE_SIZE_MB=50
DEFAULT_MODEL=gemini-2.5-flash
TEMPERATURE=0.5
MAX_SOURCES=5

Programmatic Configuration

from flamehaven_filesearch import FlamehavenFileSearch, Config

# Custom configuration
config = Config(
    api_key="your-key",
    max_file_size_mb=100,
    default_model="gemini-2.5-flash",
    temperature=0.7,
    max_sources=10
)

searcher = FlamehavenFileSearch(config=config)

🏗️ Architecture

┌─────────────────────────────────────────────────────────────┐
│                 FLAMEHAVEN File Search Tool                 │
│               (FLAMEHAVEN FileSearch v1.0.0)                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌───────────────┐         ┌────────────────────────┐       │
│  │   FastAPI     │────────▶│  FlamehavenFileSearch  │       │
│  │   REST API    │         │          Core          │       │
│  └───────┬───────┘         └───────────┬────────────┘       │
│          │                             │                    │
│          │  ┌──────────────────────────┘                    │
│          │  │                                               │
│          ▼  ▼                                               │
│  ┌──────────────────────────────────────────┐               │
│  │        Google Gemini File Search         │               │
│  │            (gemini-2.5-flash)            │               │
│  │                                          │               │
│  │  • Semantic embedding                    │               │
│  │  • Document chunking                     │               │
│  │  • Grounding & citations                 │               │
│  └──────────────────────────────────────────┘               │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Components

  1. Core Library (flamehaven_filesearch/core.py)

    • FlamehavenFileSearch class - Main interface
    • File upload with validation
    • Store management
    • Search with automatic grounding

  2. API Server (flamehaven_filesearch/api.py)

    • FastAPI application
    • RESTful endpoints
    • OpenAPI/Swagger documentation
    • Error handling & logging

  3. Configuration (flamehaven_filesearch/config.py)

    • Environment-based config
    • Validation & defaults
    • Driftlock settings

Data Flow

1. Upload: File → Validation → Google File Search Store
2. Search: Query → Gemini 2.5 Flash → Grounded Answer + Citations
3. Result: Answer + Sources (titles, URIs) → User
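
The final step of the flow, handing an answer and its sources back to the user, can be sketched as a small formatter. This assumes the result dictionary has the `answer` and `sources` keys used in the examples of this README; the `uri` key on each source is an assumption based on the "titles, URIs" wording, and `format_result` is a hypothetical helper.

```python
def format_result(result, max_sources=5):
    """Render an answer and its numbered sources as plain text."""
    lines = [result.get("answer", "")]
    sources = result.get("sources", [])[:max_sources]
    if sources:
        lines.append("")
        lines.append("Sources:")
    for i, source in enumerate(sources, 1):
        title = source.get("title", "(untitled)")
        uri = source.get("uri")  # assumed key; the README examples only show 'title'
        lines.append(f"  {i}. {title}" + (f" <{uri}>" if uri else ""))
    return "\n".join(lines)
```

The `max_sources` default of 5 follows the citation limit stated for the Lite tier.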

📚 Examples

Example 1: Document Q&A

from flamehaven_filesearch import FlamehavenFileSearch

searcher = FlamehavenFileSearch()

# Upload technical documentation
searcher.upload_file("api_docs.pdf", store_name="docs")
searcher.upload_file("user_guide.pdf", store_name="docs")

# Ask questions
answer = searcher.search(
    "How do I authenticate with the API?",
    store_name="docs",
    temperature=0.3  # Lower for factual queries
)

print(answer['answer'])

Example 2: Research Paper Analysis

# Upload multiple papers
papers = [
    "paper1_methodology.pdf",
    "paper2_results.pdf",
    "paper3_discussion.pdf"
]
result = searcher.upload_files(papers, store_name="research")

# Analyze across papers
answer = searcher.search(
    "Compare the methodologies used in these papers",
    store_name="research",
    max_tokens=2048  # Longer response
)

for i, source in enumerate(answer['sources'], 1):
    print(f"{i}. {source['title']}")

Example 3: Legal Document Search

# Upload contracts
searcher.upload_file("contract_v1.pdf", store_name="legal")
searcher.upload_file("terms_of_service.pdf", store_name="legal")

# Search for specific clauses
answer = searcher.search(
    "What are the termination and renewal clauses?",
    store_name="legal",
    temperature=0.1  # Very factual
)

More examples in examples/ directory.


🧪 Testing

Run Tests

# All unit tests
pytest

# With coverage
pytest --cov=flamehaven_filesearch --cov-report=html

# Specific test file
pytest tests/test_core.py -v

# Integration tests (requires API key)
pytest -m integration

Using Makefile

make test           # Run unit tests
make test-cov       # With coverage report
make test-integration  # Integration tests

Test Coverage

Current coverage: >85%

View HTML report: htmlcov/index.html


🛠️ Development

Setup Development Environment

# Clone repository
git clone https://github.com/flamehaven01/Flamehaven-Filesearch.git
cd Flamehaven-Filesearch

# Install with dev dependencies
pip install -e ".[dev,api]"

# Copy environment file
cp .env.example .env
# Edit .env and add your GEMINI_API_KEY

Code Quality Tools

# Format code
make format
# OR
black flamehaven_filesearch/ tests/ examples/
isort flamehaven_filesearch/ tests/ examples/

# Run linters
make lint
# OR
flake8 flamehaven_filesearch/
mypy flamehaven_filesearch/

Build Package

# Build distribution
make build

# Test on TestPyPI
make publish-test

# Publish to PyPI (when ready)
make publish

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Ways to Contribute

  • 🐛 Report bugs via Issues
  • 💡 Suggest features via Discussions
  • 📝 Improve documentation
  • 🔧 Submit pull requests
  • ⭐ Star the repository to show support!

Quick Contribution Guide

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Run tests (pytest)
  5. Commit (git commit -m 'Add amazing feature')
  6. Push (git push origin feature/amazing-feature)
  7. Open a Pull Request

📈 Performance Benchmarks

| Operation | Time | Notes |
|---|---|---|
| File Upload (10MB) | ~5s | Including validation |
| Search Query | ~2s | With 5 sources |
| Store Creation | ~1s | One-time operation |
| Batch Upload (3 files) | ~12s | Parallel processing |

Benchmarks on standard VM (2 CPU, 4GB RAM)
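
Timings like these depend heavily on network conditions and hardware, so it can be useful to measure them in your own environment. Below is a minimal sketch of a timing helper; `time_call` is a hypothetical name, and it simply reports the median wall-clock time over a few repeats of any callable.

```python
import statistics
import time


def time_call(fn, *args, repeats=3, **kwargs):
    """Call fn repeatedly and return (median_seconds, last_result)."""
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return statistics.median(durations), result
```

For example, the search method of a FlamehavenFileSearch instance could be passed as `fn` along with a query string. For uploads, `repeats=1` avoids sending the same file several times.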


🗺️ Roadmap

v1.1.0 (Planned)

  • Caching layer for repeated queries
  • Rate limiting and authentication
  • Batch search operations
  • WebSocket support for streaming
  • Enhanced file type support

v2.0.0 (Future)

  • Standard tier with advanced features
  • Custom model fine-tuning
  • Multi-language support
  • Admin dashboard
  • Analytics and insights

See CHANGELOG.md for version history.


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

MIT License - Copyright (c) 2025 SovDef Team

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction...

🙏 Acknowledgments


📞 Support & Community


🌟 Why Choose FLAMEHAVEN FileSearch?

For Solo Developers

  • ✅ No corporate barriers - Get started in minutes
  • ✅ Full code access - Understand and modify everything
  • ✅ Zero vendor lock-in - Deploy anywhere
  • ✅ Free & open source - No hidden costs

For Startups

  • ✅ Rapid prototyping - MVP in under 10 minutes
  • ✅ Production-ready - FastAPI, Docker, CI/CD included
  • ✅ Scalable architecture - Upgrade path to Standard tier
  • ✅ Community support - Growing ecosystem

For Researchers

  • ✅ Transparent algorithms - Know how it works
  • ✅ Extensible design - Easy to customize
  • ✅ Academic-friendly - MIT license for research
  • ✅ Reproducible results - Consistent API


🔥 Get Started Now!

# Install (quote the extras so the shell does not expand the brackets)
pip install "flamehaven-filesearch[api]"

# Set API key
export GEMINI_API_KEY="your-key"

# Start searching!
python -c "
from flamehaven_filesearch import FlamehavenFileSearch
s = FlamehavenFileSearch()
s.upload_file('doc.pdf')
print(s.search('summary')['answer'])
"

Join the community and help redefine open AI search!


Made with ❤️ by the SovDef Team



Tags: #opensource #filesearch #AI #RAG #GeminiAPI #startup #searchtools #python #fastapi #docker

