Skip to main content

FLAMEHAVEN FileSearch - Open source semantic document search with API authentication powered by Google Gemini

Project description

FLAMEHAVEN FileSearch

Your documents. Searchable in minutes. No infrastructure needed.

Search your local documents with RAG instantly

CI/CD Latest Version Python 3.8+ MIT License

Quick Start (3 min)DocumentationRoadmapContributing


Problem: Your Situation

✗ You have PDF, Word, and text documents scattered across your local drive
✗ You want to search them intelligently, not keyword-by-keyword
✗ You don't want to upload your data to external services (Pinecone, Cloudflare, etc.)
✗ You need production-ready security without complex setup
✗ You want zero infrastructure costs for prototype phase

Solution: FLAMEHAVEN FileSearch

✓ Local RAG search engine in 5 minutes
✓ 100% self-hosted (your data stays yours)
✓ Single Docker command deployment
✓ Free tier Google Gemini (up to 1500 queries/month)
✓ v1.2.0: Enterprise-grade authentication & multi-user support
✓ Batch search API for 1-100 queries per request
✓ Optional Redis for distributed caching across workers

[>] 3-Minute Quick Start

1. Docker (No Setup)

# Start with one command
docker run -d \
  -e GEMINI_API_KEY="your_gemini_api_key" \
  -p 8000:8000 \
  -v $(pwd)/data:/app/data \
  flamehaven-filesearch:1.2.0

# Available at http://localhost:8000 in 3 seconds

2. Your First Search (cURL)

# Step 1: Generate API key (v1.2.0 requirement)
curl -X POST http://localhost:8000/api/admin/keys \
  -H "X-Admin-Key: your_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My First Key",
    "permissions": ["upload", "search"]
  }'

# Response: {"key_id": "...", "plain_key": "sk_live_abc123..."}
# SAVE this key - it won't be shown again!

# Step 2: Upload a document
curl -X POST http://localhost:8000/api/upload/single \
  -H "Authorization: Bearer sk_live_abc123..." \
  -F "file=@sample.pdf" \
  -F "store=documents"

# Step 3: Search
curl -X POST http://localhost:8000/api/search \
  -H "Authorization: Bearer sk_live_abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the main points in this document?",
    "store": "documents"
  }'

# Response:
# {
#   "answer": "The main points are...",
#   "sources": [
#     {
#       "file": "sample.pdf",
#       "page": 3,
#       "excerpt": "..."
#     }
#   ]
# }

3. Python Code Example

from flamehaven_filesearch import FlamehavenFileSearch, FileSearchConfig

# Configuration
config = FileSearchConfig(
    google_api_key="your_gemini_api_key",
    environment="offline"  # or "remote" for online
)

# Initialize
searcher = FlamehavenFileSearch(config)

# Create document store
searcher.create_store("my_documents")

# Upload files
searcher.upload_file("path/to/document.pdf", "my_documents")
searcher.upload_file("path/to/document.docx", "my_documents")

# Search with RAG
result = searcher.search(
    "Summarize the key findings",
    store="my_documents"
)

print(f"Answer: {result['answer']}")
print(f"Sources: {result['sources']}")

Features

Core Features (v1.1.0+)

Feature Description Benefit
Multi-format Support PDF, DOCX, MD, TXT (up to 50MB) Works with all document types
Semantic Search AI-powered natural language queries Far more accurate than keyword search
Source Attribution Original documents linked to answers Transparency & trustworthiness
Store Management Organize documents into collections Structured search across categories
REST API + Python SDK Two ways to integrate Flexible deployment options
LRU Cache 1-hour TTL, 1000 items <10ms response on cache hits
Prometheus Metrics 17 monitoring metrics Operations visibility
Security Headers OWASP-compliant Enterprise-ready

New in v1.2.0 (Enterprise-Grade)

Feature Description Use Case
[*] API Key Authentication Bearer token OAuth2-compatible Multi-user environments
[*] Key Management API Programmatic key creation/revocation Automated user provisioning
[*] Audit Logging Complete request audit trail Compliance & security
[*] Per-Key Rate Limiting Customizable limits per API key Fair resource allocation
[*] Admin Dashboard Web UI for key management User-friendly operations
[&] Batch Search API Process 1-100 queries per request High-throughput scenarios
[&] Redis Cache Backend Distributed caching Multi-worker deployments

Installation

Option 1: Pip (Recommended for Development)

# Core functionality
pip install flamehaven-filesearch

# With API server
pip install flamehaven-filesearch[api]

# With Redis support
pip install flamehaven-filesearch[api,redis]

# Everything (for development)
pip install flamehaven-filesearch[all]

Option 2: Docker (Recommended for Production)

# Build image
docker build -t flamehaven-filesearch:1.2.0 .

# Run standalone
docker run \
  -e GEMINI_API_KEY="your_key" \
  -e FLAMEHAVEN_ADMIN_KEY="your_admin_key" \
  -p 8000:8000 \
  flamehaven-filesearch:1.2.0

# With Docker Compose (includes Redis)
docker-compose up -d

Option 3: Kubernetes (For High Availability)

See DEPLOYMENT_GUIDE_v1.2.0.md for complete Kubernetes manifests with StatefulSets, ConfigMaps, and Secrets.


Configuration

Required Environment Variables

# Google Gemini API key (get free tier at ai.google.dev)
export GEMINI_API_KEY="your_api_key"

# Admin key for creating API keys in v1.2.0
export FLAMEHAVEN_ADMIN_KEY="your_secure_admin_password"

Optional Environment Variables

# Server settings
export HOST="0.0.0.0"              # Default: 127.0.0.1
export PORT="8000"                  # Default: 8000
export ENVIRONMENT="production"     # Default: development

# Database location
export FLAMEHAVEN_API_KEYS_DB="/path/to/api_keys.db"

# Redis (for multi-worker deployments)
export REDIS_HOST="localhost"       # Default: localhost
export REDIS_PORT="6379"            # Default: 6379
export REDIS_PASSWORD="password"    # Optional

# API limits
export MAX_FILE_SIZE_MB="50"        # Default: 50

API Key Management (v1.2.0)

Generate API Key

curl -X POST http://localhost:8000/api/admin/keys \
  -H "X-Admin-Key: $FLAMEHAVEN_ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production API Key",
    "permissions": ["upload", "search", "stores", "delete"],
    "rate_limit_per_minute": 100
  }'

List Your Keys

curl http://localhost:8000/api/admin/keys \
  -H "Authorization: Bearer sk_live_your_key..."

Revoke a Key

curl -X DELETE http://localhost:8000/api/admin/keys/{key_id} \
  -H "Authorization: Bearer sk_live_your_key..."

View Usage Statistics

curl "http://localhost:8000/api/admin/usage?days=7" \
  -H "Authorization: Bearer sk_live_your_key..."

Performance

Operation Response Time Throughput
Health check <10ms -
Search (cache hit) <10ms >2 searches/sec
Search (cache miss) 500ms - 3s Depends on Gemini API
Batch search (10 queries) 2-5s >1 batch/sec
File upload 1-5s >1 file/sec

Resource Usage

  • Memory: ~200MB baseline + 50MB per 1000 cached queries
  • CPU: <5% idle, <50% under sustained load
  • Disk: 10MB per 100,000 cached queries

Security Features (v1.2.0)

  • [#] API Key Encryption: SHA256 hashing (plain keys never stored)
  • [#] Rate Limiting: Per-API-key customizable limits (default 100/min)
  • [#] Permission Control: Fine-grained access (upload, search, stores, delete)
  • [#] Audit Logging: Complete request trail with timestamps
  • [#] OWASP Headers: Security headers on all responses
  • [#] Request Tracking: Unique request IDs for debugging
  • [#] Key Expiration: Optional time-limited API keys

Roadmap

v1.2.1 (Q4 2025)

  • Improved admin authentication (IAM integration)
  • Redis UI configuration in dashboard
  • Encryption at rest for sensitive data
  • Fix deprecated FastAPI on_event decorators

v1.3.0 (Q1 2026)

  • OAuth2/OIDC integration
  • API key rotation
  • Billing/metering system
  • Advanced analytics dashboard

v2.0.0 (Q2 2026)

  • Multi-language support
  • Enhanced file type support (XLSX, PPTX, RTF)
  • Export search results (JSON, CSV, PDF)
  • WebSocket streaming support

Good First Issues (Contributions Welcome!)

Looking to contribute? Start with these issues:

  1. Add XLSX Support (Difficulty: Medium)

    • Extend file upload to handle Excel spreadsheets
    • Reference: flamehaven_filesearch/loaders.py
    • Estimated time: 2-3 hours
  2. Implement Search Result Caching UI (Difficulty: Easy)

    • Display cache hit rate in admin dashboard
    • Reference: flamehaven_filesearch/dashboard.py
    • Estimated time: 1-2 hours
  3. Add Batch Search Progress Tracking (Difficulty: Medium)

    • Implement WebSocket endpoint for real-time batch progress
    • Reference: flamehaven_filesearch/batch_routes.py
    • Estimated time: 3-4 hours
  4. Create Integration Tests (Difficulty: Easy)

    • Write end-to-end tests for common workflows
    • Reference: tests/test_api_integration.py
    • Estimated time: 2-3 hours
  5. Add Dark Mode to Admin Dashboard (Difficulty: Easy)

    • Extend dashboard.py with CSS dark theme toggle
    • Reference: flamehaven_filesearch/dashboard.py
    • Estimated time: 1-2 hours

See CONTRIBUTING.md for complete contribution guidelines.


Troubleshooting

Issue: 401 Unauthorized on API requests

Solution:

  • Verify FLAMEHAVEN_ADMIN_KEY environment variable is set
  • Check Authorization header format: Authorization: Bearer sk_live_your_key
  • Ensure API key hasn't expired

Issue: "Redis connection refused"

Solution:

  • Verify Redis is running: redis-cli ping
  • Check REDIS_HOST and REDIS_PORT environment variables
  • Remove Redis if not needed (system falls back to LRU cache)

Issue: High memory usage

Solution:

  • Configure Redis eviction policy: maxmemory-policy allkeys-lru
  • Reduce cache TTL in configuration
  • Monitor cache metrics: curl http://localhost:8000/prometheus | grep cache

Issue: Slow searches

Solution:

  • Check if caching is working (view cache_hits_total metric)
  • Verify Gemini API is responsive
  • Check network latency to Redis instance

Migration from v1.1.0 to v1.2.0

Breaking Changes

All protected endpoints now require API key authentication.

Migration Steps

  1. Update dependencies:

    pip install -U flamehaven-filesearch[api]
    
  2. Set admin key:

    export FLAMEHAVEN_ADMIN_KEY="your_secure_admin_key"
    
  3. Generate first API key:

    curl -X POST http://localhost:8000/api/admin/keys \
      -H "X-Admin-Key: your_admin_key" \
      -H "Content-Type: application/json" \
      -d '{"name":"Production","permissions":["upload","search","stores","delete"]}'
    
  4. Update application code:

    # Old (v1.1.0)
    requests.post("http://localhost:8000/api/search", json={"query": "test"})
    
    # New (v1.2.0)
    api_key = "sk_live_your_key"
    requests.post(
        "http://localhost:8000/api/search",
        json={"query": "test"},
        headers={"Authorization": f"Bearer {api_key}"}
    )
    
  5. Test before production deployment

Rollback Plan

If issues occur, v1.1.0 remains available. No data loss on downgrade.


Support & Community


Architecture Overview

[Your Documents]
       |
       v
[File Upload Endpoint] ---> [File Parser] ---> [Store Manager]
       |                         |                    |
       +---- (REST API) --------+---- (SQLite DB)----+
       |
[Search Endpoint] ---> [Semantic Search] ---> [Gemini API]
       |                   |
       +--- (Cache) -------+
       |
[Prometheus Metrics] <--- [Audit Log]

Performance Metrics

Recent v1.2.0 benchmark (Docker on M1 Mac):

Health Check:           8ms
Search (cache hit):     9ms
Search (cache miss):    1250ms
Batch Search (10 queries, parallel): 2500ms
Upload (50MB file):     3200ms

License

FLAMEHAVEN FileSearch is released under the MIT License. See LICENSE file for details.


Acknowledgments

Built with:

  • FastAPI - Modern Python web framework
  • Google Gemini API - Semantic understanding
  • SQLite - Lightweight database
  • Redis (optional) - Distributed caching

Questions? Open an issue or email support@flamehaven.space

Last Updated: November 16, 2025 (v1.2.0)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

flamehaven_filesearch-1.2.0.tar.gz (54.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

flamehaven_filesearch-1.2.0-py3-none-any.whl (57.6 kB view details)

Uploaded Python 3

File details

Details for the file flamehaven_filesearch-1.2.0.tar.gz.

File metadata

  • Download URL: flamehaven_filesearch-1.2.0.tar.gz
  • Upload date:
  • Size: 54.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for flamehaven_filesearch-1.2.0.tar.gz
Algorithm Hash digest
SHA256 ac2a74eb429d5ec6749b8d43369436e3fb4a91984e0127864b212fe0f8930a98
MD5 bc67f127d77a2be1a217234e4466159e
BLAKE2b-256 f61ff789178f42b00ce242f1bc16ed413e932d0ab509b40bbb45c5ed71507bf7

See more details on using hashes here.

Provenance

The following attestation bundles were made for flamehaven_filesearch-1.2.0.tar.gz:

Publisher: publish.yml on flamehaven01/Flamehaven-Filesearch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file flamehaven_filesearch-1.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for flamehaven_filesearch-1.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 14e2526168f15ac0e5cda42e11b9b2015e92efc84045ea90ff8f055903f9febf
MD5 8dfc719d513765300bbb7de55f2f5269
BLAKE2b-256 f2615cfcb8305ec7efdbbdd289017fd2cc3f26572a2355589b83952d50e56121

See more details on using hashes here.

Provenance

The following attestation bundles were made for flamehaven_filesearch-1.2.0-py3-none-any.whl:

Publisher: publish.yml on flamehaven01/Flamehaven-Filesearch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page