FLAMEHAVEN FileSearch - Open source semantic document search with API authentication powered by Google Gemini

Maintainers

flamehaven

Project description

FLAMEHAVEN FileSearch

Self-hosted RAG search engine. Production-ready in 3 minutes.

Quick Start • Features • Documentation • API Reference • Contributing

🎯 Why FLAMEHAVEN?

Stop sending your sensitive documents to third-party services. Get enterprise-grade semantic search running locally in minutes, not days.

# One command. Three minutes. Done.
docker run -d -p 8000:8000 -e GEMINI_API_KEY="your_key" flamehaven-filesearch:1.5.0

🚀 Fast

Production deployment in 3 minutes
Vector generation in <1ms
Zero ML dependencies

🔒 Private

100% self-hosted
Your data never leaves your infrastructure
Enterprise-grade security

💰 Cost-Effective

Free tier: 1,500 queries/month
No infrastructure costs
Open source & MIT licensed

Features ✨

Core Capabilities

🔍 Smart Search Modes - Keyword, semantic, and hybrid search with automatic typo correction
📄 Multi-Format Support - PDF, DOCX, TXT, MD, and common image formats
⚡ Ultra-Fast Vectors - DSP v2.0 algorithm generates embeddings in <1ms without ML frameworks
🎯 Source Attribution - Every answer includes links back to source documents
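How the typo-correction step works internally is not described here. As a rough illustration only, the sketch below snaps each query term to its closest vocabulary word using Python's standard difflib module. The function name correct_query, the vocabulary, and the 0.8 cutoff are hypothetical and are not part of flamehaven_filesearch.

```python
from difflib import get_close_matches
from typing import List


def correct_query(query: str, vocabulary: List[str], cutoff: float = 0.8) -> str:
    """Replace each query term with its closest vocabulary word, if one is close enough."""
    corrected = []
    for term in query.lower().split():
        if term in vocabulary:
            corrected.append(term)
            continue
        # get_close_matches returns at most n candidates with similarity >= cutoff
        matches = get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else term)
    return " ".join(corrected)


# Hypothetical vocabulary, e.g. terms collected from indexed documents
vocab = ["remote", "work", "policy", "vacation"]
fixed = correct_query("remot wrk polcy", vocab)
unknown = correct_query("zzz", vocab)  # no close match: term is left unchanged
```

A corrected query like this could then be passed to keyword search, while the original string is still used for semantic search.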

What's New in v1.5.2 (patch)

Parse Cache (engine/parse_cache.py) — mtime-based file parse cache; extract_text(use_cache=True) skips re-parsing unchanged files
ContextExtractor (engine/context_extractor.py) — sliding-window chunk context enrichment for RAG pipelines (enrich_chunks())
Backend Plugin Architecture (engine/format_backends.py) — 11 AbstractFormatBackend subclasses + BackendRegistry; new formats plug in without touching the dispatcher
file_parser.py refactored to 75 lines (was 340); cyclomatic complexity 13 → 3
83 new tests; AI-Slop-Detector critical deficits: 0
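To make the parse-cache idea concrete, here is a minimal sketch of an mtime-based cache. It is not the code in engine/parse_cache.py. The ParseCache class, its get() method, and the read_text stand-in parser are assumptions made for illustration.

```python
import os
import tempfile
from typing import Callable, Dict, Tuple


class ParseCache:
    """Cache parsed text per path; an entry is stale once the file's mtime changes."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[int, str]] = {}

    def get(self, path: str, parser: Callable[[str], str]) -> str:
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # unchanged file: skip re-parsing
        text = parser(path)
        self._entries[path] = (mtime, text)
        return text


def read_text(path: str) -> str:
    """Stand-in parser that reads the file and counts how often it runs."""
    read_text.calls += 1
    with open(path, encoding="utf-8") as handle:
        return handle.read()


read_text.calls = 0

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "notes.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("hello")

    cache = ParseCache()
    first = cache.get(path, read_text)
    second = cache.get(path, read_text)  # served from the cache
    calls_before_edit = read_text.calls

    with open(path, "w", encoding="utf-8") as handle:
        handle.write("world")
    stat = os.stat(path)
    # Bump mtime explicitly so the demo does not rely on timestamp resolution.
    os.utime(path, ns=(stat.st_atime_ns, stat.st_mtime_ns + 1_000_000_000))
    third = cache.get(path, read_text)  # mtime changed: parsed again
```

Keying on mtime keeps the check cheap (one stat call), at the cost of missing edits that preserve the timestamp.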

What's New in v1.5.1 (patch)

Dead code removed — embedding_generator_legacy.py deleted (306-line duplicate, unused)
Code quality — Critical nested_complexity eliminated in 6 files; avg slop-detector score 13.46 → 11.25
Test suite expanded — 360 tests (was 331); AI-Slop-Detector critical deficits 7 → 0

What's New in v1.5.0

Universal Document Parser — 34 file formats, zero external document-AI dependency
- PDF (pymupdf/pypdf), DOCX/DOC, XLSX, PPTX, RTF via optional [parsers] extra
- HTML, WebVTT, LaTeX, CSV — stdlib only, no extra install needed
- Image OCR via [vision] extra (pytesseract)
Content-Based RAG Embeddings — File content extracted and embedded (not filename); semantic search now works correctly in local mode
Internal Text Chunker — Structure-aware + token-aware RAG chunking with zero ML deps (chunk_text())
Framework Integrations — LangChain, LlamaIndex, Haystack, CrewAI adapters (flamehaven_filesearch.integrations)
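The signature of chunk_text() is not shown here, so the following is only a sketch of what structure-aware plus token-aware chunking can look like without ML dependencies. Paragraphs (split on blank lines) are packed into chunks up to a token budget, and whitespace-separated words stand in for tokens. The name sketch_chunk_text and the max_tokens parameter are hypothetical.

```python
from typing import List


def sketch_chunk_text(text: str, max_tokens: int = 50) -> List[str]:
    """Pack paragraphs into chunks of at most max_tokens whitespace tokens."""
    chunks: List[str] = []
    current: List[str] = []  # paragraphs in the chunk being built
    count = 0                # tokens in the chunk being built

    def flush() -> None:
        nonlocal current, count
        if current:
            chunks.append("\n\n".join(current))
            current, count = [], 0

    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        tokens = para.split()
        if len(tokens) > max_tokens:
            # Oversized paragraph: emit what we have, then split on token boundaries.
            flush()
            for start in range(0, len(tokens), max_tokens):
                chunks.append(" ".join(tokens[start:start + max_tokens]))
            continue
        if count + len(tokens) > max_tokens:
            flush()
        current.append(para)
        count += len(tokens)

    flush()
    return chunks


sample = "Intro line one.\n\nSecond paragraph here.\n\n" + " ".join(["word"] * 12)
chunks = sketch_chunk_text(sample, max_tokens=6)
```

With a budget of 6 tokens, the two short paragraphs share one chunk and the 12-word paragraph is split into two.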

Features in v1.4.2

Performance fix - Vector generation now < 1 ms for ASCII text (ASCII shortcut skips detect_language)
Windows compatibility - MAX_FILENAME_LENGTH reduced to 200 to prevent MAX_PATH overflow
Code quality - ABC + @abstractmethod for VectorStore, MetadataStore, IAMProvider
CI/CD - Replaced flake8 with ruff; lint and test pipelines fully green

Features in v1.4.1

Usage Tracking & Quotas - Per-API-key request/token tracking with daily/monthly limits
Admin Usage APIs - Detailed usage stats, quota management, and alert monitoring
pgvector Maintenance - HNSW reindexing, VACUUM ANALYZE, and index statistics
Circuit Breaker - Automatic failure recovery for database connections
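For readers unfamiliar with the circuit-breaker pattern, a minimal generic sketch follows. It is not FLAMEHAVEN's implementation. The CircuitBreaker class, its thresholds, and the injected clock are assumptions chosen to keep the example self-contained.

```python
import time


class CircuitBreaker:
    """Stop calling a failing dependency for a cool-down period, then retry once."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise RuntimeError("circuit open: skipping call")
            self._opened_at = None  # half-open: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        return result


# Demo with a fake clock so no real waiting is needed.
now = [0.0]
breaker = CircuitBreaker(max_failures=2, reset_after=10.0, clock=lambda: now[0])


def flaky():
    raise ConnectionError("database unreachable")


outcomes = []
for _ in range(3):
    try:
        breaker.call(flaky)
    except ConnectionError:
        outcomes.append("failed")
    except RuntimeError:
        outcomes.append("rejected")

now[0] = 11.0  # past the cool-down window
recovered = breaker.call(lambda: "ok")
```

The third call is rejected without touching the dependency, which is what protects a struggling database from retry storms.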

Production Features (v1.4.0+)

Multimodal Search - Text + image search endpoint (optional)
HNSW Vector Index - High-performance similarity search with pgvector
OAuth2/OIDC Support - JWT validation alongside API keys
PostgreSQL Backend - Enterprise-grade persistence and vector store
Vision Processing - Image metadata extraction with size limits and timeouts

Enterprise Features (v1.2.2+)

🔐 API Key Authentication - Fine-grained permission system
⚡ Rate Limiting - Configurable per-user quotas
📊 Audit Logging - Complete request history
📦 Batch Processing - Process 1-100 queries per request
📈 Admin Dashboard - Real-time metrics and management

Quick Start 🚀

Option 1: Docker (Recommended)

The fastest path to production:

docker run -d \
  -p 8000:8000 \
  -e GEMINI_API_KEY="your_gemini_api_key" \
  -e FLAMEHAVEN_ADMIN_KEY="secure_admin_password" \
  -v "$(pwd)/data:/app/data" \
  flamehaven-filesearch:1.5.0

✅ Server running at http://localhost:8000

Option 2: Python SDK

Perfect for integrating into existing applications:

from flamehaven_filesearch import FlamehavenFileSearch, FileSearchConfig

# Initialize
config = FileSearchConfig(google_api_key="your_gemini_key")
fs = FlamehavenFileSearch(config)

# Upload and search
fs.upload_file("company_handbook.pdf", store="docs")
result = fs.search("What is our remote work policy?", store="docs")

print(result['answer'])
# Output: "Employees can work remotely up to 3 days per week..."

Option 3: REST API

For language-agnostic integration:

# 1. Generate API key
curl -X POST http://localhost:8000/api/admin/keys \
  -H "X-Admin-Key: your_admin_key" \
  -H "Content-Type: application/json" \
  -d '{"name":"production","permissions":["upload","search"]}'

# 2. Upload document
curl -X POST http://localhost:8000/api/upload/single \
  -H "Authorization: Bearer sk_live_abc123..." \
  -F "file=@document.pdf" \
  -F "store=my_docs"

# 3. Search
curl -X POST http://localhost:8000/api/search \
  -H "Authorization: Bearer sk_live_abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the main findings?",
    "store": "my_docs",
    "search_mode": "hybrid"
  }'
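The search call above can also be issued from Python without extra dependencies. The sketch below only builds the request with the standard library. The endpoint, headers, and payload fields mirror the curl example, while the helper name build_search_request and the YOUR_API_KEY placeholder are hypothetical. No assumption is made about the shape of the response beyond it being JSON.

```python
import json
import urllib.request


def build_search_request(base_url, api_key, query, store, search_mode="hybrid"):
    """Build a POST request for /api/search matching the curl example."""
    payload = {"query": query, "store": store, "search_mode": search_mode}
    return urllib.request.Request(
        url=f"{base_url}/api/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_search_request(
    "http://localhost:8000",
    "YOUR_API_KEY",  # placeholder: use a key generated in step 1
    "What are the main findings?",
    "my_docs",
)

# To send it against a running server:
# with urllib.request.urlopen(request, timeout=30) as response:
#     result = json.load(response)
```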

📦 Installation

# Core package (HTML, CSV, LaTeX, WebVTT, plain-text parsing included — zero extra deps)
pip install flamehaven-filesearch

# + Document parsers: PDF (pymupdf/pypdf), DOCX, XLSX, PPTX, RTF
pip install flamehaven-filesearch[parsers]

# + Image OCR (Pillow + pytesseract; requires Tesseract system binary)
pip install flamehaven-filesearch[vision]

# + Google Gemini API
pip install flamehaven-filesearch[google]

# + REST API server (FastAPI + uvicorn)
pip install flamehaven-filesearch[api]

# + HNSW vector index
pip install flamehaven-filesearch[vector]

# + PostgreSQL backend
pip install flamehaven-filesearch[postgres]

# Everything
pip install flamehaven-filesearch[all]

# Build from source
git clone https://github.com/flamehaven01/Flamehaven-Filesearch.git
cd Flamehaven-Filesearch
docker build -t flamehaven-filesearch:1.5.0 .

Framework Integrations

Framework SDKs (LangChain, LlamaIndex, etc.) are imported lazily — install only what you need:

# LangChain  (pip install langchain-core)
from flamehaven_filesearch.integrations import FlamehavenLangChainLoader
docs = FlamehavenLangChainLoader("report.pdf", chunk=True).load()

# LlamaIndex  (pip install llama-index-core)
from flamehaven_filesearch.integrations import FlamehavenLlamaIndexReader
nodes = FlamehavenLlamaIndexReader(chunk=True).load_data(["report.pdf", "slides.pptx"])

# Haystack  (pip install haystack-ai)
from flamehaven_filesearch.integrations import FlamehavenHaystackConverter
result = FlamehavenHaystackConverter().run(sources=["report.pdf"])

# CrewAI  (pip install crewai)
from flamehaven_filesearch.integrations import FlamehavenCrewAITool
tool = FlamehavenCrewAITool()           # pass to your agent's tools list

Configuration ⚙️

Required Environment Variables

export GEMINI_API_KEY="your_google_gemini_api_key"
export FLAMEHAVEN_ADMIN_KEY="your_secure_admin_password"

Optional Configuration

export HOST="0.0.0.0"              # Bind address
export PORT="8000"                  # Server port
export REDIS_HOST="localhost"       # Distributed caching
export REDIS_PORT="6379"            # Redis port
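If you embed the server in your own launcher script, the same variables can be read in one place. This is a sketch under assumptions: the ServerSettings class and load_settings function are hypothetical, and the fallback values simply reuse the example values above rather than documented defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerSettings:
    host: str
    port: int
    redis_host: str
    redis_port: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> ServerSettings:
    """Read the optional variables, falling back to the example values (assumed)."""
    env = os.environ if env is None else env
    return ServerSettings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        redis_host=env.get("REDIS_HOST", "localhost"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
    )


# Passing a mapping instead of os.environ makes the function easy to test.
settings = load_settings({"PORT": "9000"})
```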

Advanced Configuration

Create a config.yaml for fine-tuned control:

vector_store:
  quantization: int8
  compression: gravitas_pack
  
search:
  default_mode: hybrid
  typo_correction: true
  max_results: 10
  
security:
  rate_limit: 100  # requests per minute
  max_file_size: 52428800  # 50MB

📊 Performance

Metric	Value	Notes
Vector Generation	`<1ms`	DSP v2.0, zero ML dependencies
Memory Footprint	`75% reduced`	Int8 quantization vs float32
Metadata Size	`90% smaller`	Gravitas-Pack compression
Test Suite	`360 tests`	All passing (pytest)
Cold Start	`3 seconds`	Docker container ready
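The 75% figure follows from storage width alone: a float32 value takes 4 bytes and an int8 value takes 1. The sketch below checks that arithmetic with a simple symmetric quantizer. The quantization scheme FLAMEHAVEN actually uses is not described here, so the scaling rule and the function names are assumptions.

```python
from array import array


def quantize_int8(vector):
    """Symmetric int8 quantization: the largest magnitude maps to 127."""
    peak = max(abs(x) for x in vector) or 1.0  # avoid dividing by zero
    scale = peak / 127.0
    codes = array("b", (round(x / scale) for x in vector))
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]


vector = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(vector)

float32_bytes = array("f", vector).itemsize * len(vector)  # 4 bytes per value
int8_bytes = codes.itemsize * len(codes)                   # 1 byte per value
saving = 1 - int8_bytes / float32_bytes
restored = dequantize(codes, scale)
```

Here saving evaluates to 0.75, and each restored value differs from the original by at most one quantization step.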

Real-World Benchmarks

Environment: Docker on Apple M1 Mac, 16GB RAM
Document Set: 500 PDFs, ~2GB total

Health Check:           8ms
Search (cache hit):     9ms
Search (cache miss):    1,250ms  (includes Gemini API call)
Batch Search (10):      2,500ms  (parallel processing)
Upload (50MB file):     3,200ms  (with indexing)

Architecture 🏗️

┌─────────────────┐
│  Your Documents │
└────────┬────────┘
         │
         ▼
┌─────────────────────────────────────────────────────┐
│                  REST API Layer                      │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────┐ │
│  │   Upload     │  │    Search    │  │   Admin   │ │
│  │   Endpoint   │  │   Endpoint   │  │ Dashboard │ │
│  └──────┬───────┘  └──────┬───────┘  └─────┬─────┘ │
└─────────┼──────────────────┼─────────────────┼──────┘
          │                  │                 │
          ▼                  ▼                 ▼
┌──────────────────┐  ┌──────────────────┐  ┌──────────┐
│  File Parser     │  │ Semantic Search  │  │  Metrics │
│  (PDF/DOCX/TXT)  │  │  DSP v2.0        │  │  Logger  │
└────────┬─────────┘  └────────┬─────────┘  └──────────┘
         │                     │
         ▼                     ▼
┌──────────────────┐  ┌──────────────────┐
│  Store Manager   │  │  Gemini API      │
│  (SQLite + Vec)  │  │  (Reasoning)     │
└────────┬─────────┘  └──────────────────┘
         │
         ▼
┌──────────────────┐
│  Redis Cache     │
│  (Optional)      │
└──────────────────┘

Security 🔒

FLAMEHAVEN takes security seriously:

✅ API Key Hashing - SHA256 with salt
✅ Rate Limiting - Per-key quotas (default: 100/min)
✅ Permission System - Granular access control
✅ Audit Logging - Complete request history
✅ OWASP Headers - Security headers enabled by default
✅ Input Validation - Strict file type and size checks

Security Best Practices

# Use strong admin keys
export FLAMEHAVEN_ADMIN_KEY=$(openssl rand -base64 32)

# Enable HTTPS in production
# (use nginx/traefik as reverse proxy)

# Rotate API keys regularly
curl -X DELETE http://localhost:8000/api/admin/keys/old_key_id \
  -H "X-Admin-Key: $FLAMEHAVEN_ADMIN_KEY"

Roadmap 🗺️

Full roadmap lives in ROADMAP.md. Summary below:

v1.4.x (Completed)

Multimodal search (image + text)
HNSW vector indexing for faster search
OAuth2/OIDC integration
PostgreSQL backend option (metadata + vector store)
Usage-budget controls and reporting (v1.4.1)
pgvector tuning and reliability hardening (v1.4.1)
Code quality audit + CI/CD ruff integration (v1.4.2)

v2.0.0 (Q3 2026)

XLSX, PPTX, RTF format support (shipped in v1.4.x)
WebSocket streaming for real-time results (shipped in v1.4.x)
Multi-language support (15+ languages) — multilingual stopwords + jieba partial
Kubernetes Helm charts
Distributed indexing

Community Requests

See ROADMAP.md for backlog curation and request intake.

Troubleshooting 🐛

❌ 401 Unauthorized Error

Problem: API returns 401 when making requests.

Solutions:

Verify FLAMEHAVEN_ADMIN_KEY environment variable is set
Check Authorization: Bearer sk_live_... header format
Ensure API key hasn't expired (check admin dashboard)

# Debug: Check if admin key is set
echo $FLAMEHAVEN_ADMIN_KEY

# Regenerate API key
curl -X POST http://localhost:8000/api/admin/keys \
  -H "X-Admin-Key: $FLAMEHAVEN_ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name":"debug","permissions":["search"]}'

🐌 Slow Search Performance

Problem: Searches taking >5 seconds.

Solutions:

Check cache hit rate: start the server with FLAMEHAVEN_METRICS_ENABLED=1, then run curl http://localhost:8000/metrics
Enable Redis for distributed caching
Verify Gemini API latency (should be <1.5s)

# Enable Redis caching (publish the port so the server can reach it on localhost)
docker run -d --name redis -p 6379:6379 redis:7-alpine
export REDIS_HOST=localhost

💾 High Memory Usage

Problem: Container using >2GB RAM.

Solutions:

Enable Redis with LRU eviction policy
Reduce max file size in config
Monitor with Prometheus endpoint

# Configure Redis memory limit
docker run -d \
  -p 6379:6379 \
  redis:7-alpine \
  --maxmemory 512mb \
  --maxmemory-policy allkeys-lru

More solutions in our Wiki Troubleshooting Guide.

Documentation 📚

Documentation Hub

Use the links below to jump to the most relevant guide.

Topic	Description
Document Parsing	Supported formats, internal parsers, RAG chunking
Framework Integrations	LangChain, LlamaIndex, Haystack, CrewAI adapters
API Reference	REST endpoints, payloads, rate limits
Architecture	How all layers fit together (v1.5.2)
Configuration Reference	Full list of environment variables and config fields
Production Deployment	Docker, systemd, reverse proxy, scaling tips
Troubleshooting	Step-by-step debugging playbook
Benchmarks	Performance measurements and methodology

These Markdown files live inside the repository so they stay versioned alongside the code. Feel free to contribute improvements via pull requests.

Additional Resources

Interactive API Docs - OpenAPI/Swagger interface (when server is running)
CHANGELOG - Version history and breaking changes
CONTRIBUTING - How to contribute code
Examples - Sample integrations and use cases

Contributing 🤝

We love contributions! FLAMEHAVEN is better because of developers like you.

Good First Issues

🟢 [Easy] Add dark mode to admin dashboard (1-2 hours)
🟡 [Medium] PostgreSQL backend for usage tracker (multi-instance deployments)
🔴 [Advanced] Kubernetes Helm charts for production deployment

See CONTRIBUTING.md for development setup and guidelines.

Community & Support 💬

💬 Discussions: GitHub Discussions
🐛 Bug Reports: GitHub Issues
🔒 Security: security@flamehaven.space
📧 General: info@flamehaven.space

License 📄

Distributed under the MIT License. See LICENSE for more information.

🙏 Acknowledgments

Built with amazing open source tools:

FastAPI - Modern Python web framework
Google Gemini - Semantic understanding and reasoning
SQLite - Lightweight, embedded database
Redis - In-memory caching (optional)

⭐ Star us on GitHub • 📖 Read the Docs • 🚀 Deploy Now

Built with 🔥 by the Flamehaven Core Team

Last updated: April 16, 2026 • Version 1.4.2

Release history

1.6.2 - Apr 23, 2026
1.6.1 - Apr 19, 2026
1.5.3 - Apr 19, 2026
1.5.2 - Apr 19, 2026 (this version)
1.5.1 - Apr 19, 2026
1.5.0 - Apr 19, 2026
1.4.1 - Apr 16, 2026
1.4.0 - Dec 28, 2025
1.3.1 - Dec 16, 2025
1.2.2 - Dec 9, 2025
1.2.1 - Nov 28, 2025
1.2.0 - Nov 16, 2025
1.1.0 - Nov 13, 2025
1.0.0 - Nov 12, 2025

Download files

Source Distribution

flamehaven_filesearch-1.5.2.tar.gz (115.7 kB)

Uploaded Apr 19, 2026 Source

Built Distribution

flamehaven_filesearch-1.5.2-py3-none-any.whl (123.6 kB)

Uploaded Apr 19, 2026 Python 3

File details

Details for the file flamehaven_filesearch-1.5.2.tar.gz.

File metadata

Download URL: flamehaven_filesearch-1.5.2.tar.gz
Upload date: Apr 19, 2026
Size: 115.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flamehaven_filesearch-1.5.2.tar.gz
Algorithm	Hash digest
SHA256	`3998f4de0e4728b8ed893022700caf94598df7b740876ebf1e0438b31c9f9cc7`
MD5	`3e3e03bd8778e33999c176b3f43eb524`
BLAKE2b-256	`345534dc4dcecb4e6c5e0e290e5a1ab7510da5c74396cf287aa31a10c16fa212`
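A downloaded archive can be checked against the SHA256 value above with a few lines of standard-library Python. The helper names below are hypothetical, and the local file name in the commented usage line assumes the archive was saved under its published name.

```python
import hashlib

# SHA256 digest published above for flamehaven_filesearch-1.5.2.tar.gz
EXPECTED_SHA256 = "3998f4de0e4728b8ed893022700caf94598df7b740876ebf1e0438b31c9f9cc7"


def read_blocks(path, chunk_size=1 << 20):
    """Yield the file in 1 MiB blocks so large archives are not loaded at once."""
    with open(path, "rb") as handle:
        while True:
            block = handle.read(chunk_size)
            if not block:
                break
            yield block


def sha256_hex(blocks):
    """Hash an iterable of byte blocks and return the hex digest."""
    digest = hashlib.sha256()
    for block in blocks:
        digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 equals the published digest."""
    return sha256_hex(read_blocks(path)) == expected


# Usage, once the archive is in the current directory:
# print(matches_published_hash("flamehaven_filesearch-1.5.2.tar.gz"))
```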

Provenance

The following attestation bundles were made for flamehaven_filesearch-1.5.2.tar.gz:

Publisher: publish.yml on flamehaven01/Flamehaven-Filesearch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: flamehaven_filesearch-1.5.2.tar.gz
- Subject digest: 3998f4de0e4728b8ed893022700caf94598df7b740876ebf1e0438b31c9f9cc7
- Sigstore transparency entry: 1340167010
- Sigstore integration time: Apr 19, 2026
Source repository:
- Permalink: flamehaven01/Flamehaven-Filesearch@6c34c59534ce48dba1beea4f99dcb14524a95366
- Branch / Tag: refs/tags/v1.5.2
- Owner: https://github.com/flamehaven01
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6c34c59534ce48dba1beea4f99dcb14524a95366
- Trigger Event: release

File details

Details for the file flamehaven_filesearch-1.5.2-py3-none-any.whl.

File metadata

Download URL: flamehaven_filesearch-1.5.2-py3-none-any.whl
Upload date: Apr 19, 2026
Size: 123.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flamehaven_filesearch-1.5.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`29a90384c4af475bfa3242cf12fc6f998871b070fd677fa4206eb20522263cee`
MD5	`186f72be04f8ad3a5447ce2fcd766bb4`
BLAKE2b-256	`fbb557b2b449d50b51eb881a25b7a5c7927d52c26aace43aba586d23b46eebad`

Provenance

The following attestation bundles were made for flamehaven_filesearch-1.5.2-py3-none-any.whl:

Publisher: publish.yml on flamehaven01/Flamehaven-Filesearch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: flamehaven_filesearch-1.5.2-py3-none-any.whl
- Subject digest: 29a90384c4af475bfa3242cf12fc6f998871b070fd677fa4206eb20522263cee
- Sigstore transparency entry: 1340167012
- Sigstore integration time: Apr 19, 2026
Source repository:
- Permalink: flamehaven01/Flamehaven-Filesearch@6c34c59534ce48dba1beea4f99dcb14524a95366
- Branch / Tag: refs/tags/v1.5.2
- Owner: https://github.com/flamehaven01
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6c34c59534ce48dba1beea4f99dcb14524a95366
- Trigger Event: release
