Open-source RAG infrastructure toolkit with pluggable drivers for storage, vector databases, embeddings, and document processing

FLTR - Open-Source RAG Infrastructure Toolkit

Python 3.10+ | License: MIT

FLTR is a production-ready, open-source toolkit for building Retrieval-Augmented Generation (RAG) applications with pluggable drivers for storage, vector databases, embeddings, and document processing.

🎯 Why FLTR?

  • 🔌 Pluggable Architecture: Swap storage backends, vector databases, and embedding providers without changing your application code
  • ⚡ Production-Tested: Extracted from the production code behind tryfltr.com
  • 🧪 Tested: 79+ tests with 53%+ coverage
  • 📦 Zero Lock-In: Abstract drivers mean you're never locked into a single provider
  • 🚀 Async-First: Built with modern async/await patterns for high performance
  • 🎨 Type-Safe: Full type hints and Pydantic validation

🚀 Quick Start

Installation

# Install core library
pip install fltr

# Install with specific providers (quote the extras so shells such as zsh don't expand the brackets)
pip install "fltr[storage-s3,vectorstore-milvus,embeddings-openai]"

# Install everything
pip install "fltr[all]"

Basic Usage

import asyncio
from fltr.drivers.storage import LocalFileDriver
from fltr.drivers.vectorstore import MilvusDriver
from fltr.drivers.embeddings import OpenAIEmbeddingProvider

async def main():
    # Initialize drivers
    storage = LocalFileDriver(base_path="./data")
    vectorstore = MilvusDriver(uri="./milvus.db")  # Milvus Lite for local dev
    embeddings = OpenAIEmbeddingProvider(api_key="sk-...")

    # Upload a document
    await storage.upload(
        key="documents/readme.txt",
        data=b"FLTR is awesome!",
        content_type="text/plain"
    )

    # Generate embeddings
    text = "What is FLTR?"
    embedding = await embeddings.embed_text(text)

    # Create collection and insert
    await vectorstore.create_collection("docs", dimension=1536)
    await vectorstore.insert(
        collection_name="docs",
        vectors=[embedding],
        texts=[text],
        metadata=[{"source": "readme.txt"}]
    )

    # Search
    results = await vectorstore.search(
        collection_name="docs",
        query_vector=embedding,
        limit=5
    )

    print(results)

asyncio.run(main())
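
To make the search step concrete without running Milvus, here is a minimal sketch of what a cosine-similarity top-k lookup does. `InMemoryVectorIndex` and `cosine_similarity` are hypothetical names written for this illustration; they are not part of FLTR, and a real vector database uses an index rather than the linear scan shown here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorIndex:
    """Hypothetical brute-force index: keeps (vector, text, metadata) rows in a list."""

    def __init__(self, dimension: int) -> None:
        self.dimension = dimension
        self._rows: list[tuple[list[float], str, dict]] = []

    def insert(self, vectors, texts, metadata) -> None:
        for vector, text, meta in zip(vectors, texts, metadata):
            if len(vector) != self.dimension:
                raise ValueError(f"expected dimension {self.dimension}, got {len(vector)}")
            self._rows.append((vector, text, meta))

    def search(self, query_vector, limit=5):
        # Score every stored row, then keep the `limit` most similar ones.
        scored = [
            {"text": text, "metadata": meta, "score": cosine_similarity(query_vector, vector)}
            for vector, text, meta in self._rows
        ]
        scored.sort(key=lambda hit: hit["score"], reverse=True)
        return scored[:limit]


index = InMemoryVectorIndex(dimension=3)
index.insert(
    vectors=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]],
    texts=["about storage", "about embeddings", "about both"],
    metadata=[{"source": "a.txt"}, {"source": "b.txt"}, {"source": "c.txt"}],
)
hits = index.search(query_vector=[1.0, 0.1, 0.0], limit=2)
print([hit["text"] for hit in hits])
```

A stand-in like this can also be handy in unit tests, where starting a vector database would be overkill.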

📚 Available Drivers

Storage Drivers

Driver          | Install                      | Use Case
LocalFileDriver | pip install fltr             | Local development, testing
S3StorageDriver | pip install fltr[storage-s3] | AWS S3, MinIO, DigitalOcean Spaces
R2StorageDriver | pip install fltr[storage-r2] | Cloudflare R2 (S3-compatible)

Vector Store Drivers

Driver       | Install                              | Use Case
MilvusDriver | pip install fltr[vectorstore-milvus] | Milvus Lite (local) or Milvus Cloud/Zilliz

Embedding Providers

Provider  | Install                               | Models
OpenAI    | pip install fltr[embeddings-openai]   | text-embedding-3-small, text-embedding-3-large
Cohere    | pip install fltr[embeddings-cohere]   | embed-english-v3.0, embed-multilingual-v3.0
Voyage AI | pip install fltr[embeddings-voyageai] | voyage-3, voyage-code-2, voyage-law-2

🔧 Configuration

All drivers support both direct initialization and environment-based configuration:

Environment Variables

# Storage (S3/R2)
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
export S3_BUCKET="my-bucket"
export S3_REGION="us-east-1"

# Vector Store (Milvus)
export MILVUS_URI="https://your-milvus-instance.com"
export MILVUS_TOKEN="your-token"
export VECTOR_METRIC_TYPE="COSINE"

# Embeddings (OpenAI)
export OPENAI_API_KEY="sk-..."
export OPENAI_EMBEDDING_MODEL="text-embedding-3-small"
export OPENAI_BATCH_SIZE="100"

Factory Methods

# Create from environment variables
storage = S3StorageDriver.from_env()
vectorstore = MilvusDriver.from_env()
embeddings = OpenAIEmbeddingProvider.from_env()
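
The internals of `from_env()` are not shown here, so the following is only a rough sketch of how such a factory might map the Milvus variables listed above onto a settings object. `MilvusSettings` is a hypothetical class, and the default metric and the "URI is required" rule are assumptions made for the example, not documented FLTR behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MilvusSettings:
    """Hypothetical settings object; FLTR's real configuration schema may differ."""

    uri: str
    token: Optional[str] = None
    metric_type: str = "COSINE"

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "MilvusSettings":
        # Accept an explicit mapping so the factory can be tested without touching os.environ.
        env = os.environ if env is None else env
        uri = env.get("MILVUS_URI")
        if not uri:
            raise ValueError("MILVUS_URI is required")
        return cls(
            uri=uri,
            token=env.get("MILVUS_TOKEN") or None,  # treat an empty string as "no token"
            metric_type=env.get("VECTOR_METRIC_TYPE", "COSINE").upper(),
        )


settings = MilvusSettings.from_env({"MILVUS_URI": "./milvus.db", "VECTOR_METRIC_TYPE": "cosine"})
print(settings)
```

Failing fast on a missing required variable keeps misconfiguration errors at startup instead of at the first query.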

🎨 Advanced Features

Retry Logic

All embedding providers include automatic retry logic with exponential backoff:

embeddings = OpenAIEmbeddingProvider(
    api_key="sk-...",
    max_retries=5,
    retry_min_wait=1,
    retry_max_wait=60
)
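
The parameter names above suggest a capped exponential backoff. As an illustration of that technique (not FLTR's actual implementation), here is a standard-library sketch. `retry_async` is a hypothetical helper. It assumes that `max_retries` counts retries after the first attempt and that the wait doubles each time. It catches every exception for brevity, where a real provider would likely retry only transient errors.

```python
import asyncio


async def retry_async(func, *, max_retries=5, retry_min_wait=1.0, retry_max_wait=60.0,
                      sleep=asyncio.sleep):
    """Await `func()` until it succeeds, allowing up to `max_retries` retries."""
    attempt = 0
    while True:
        try:
            return await func()
        except Exception:
            if attempt >= max_retries:
                raise  # out of retries: surface the last error
            # Wait doubles on each retry (min, 2*min, 4*min, ...) and is capped at retry_max_wait.
            wait = min(retry_max_wait, retry_min_wait * (2 ** attempt))
            await sleep(wait)
            attempt += 1


async def demo():
    calls = {"count": 0}
    waits = []

    async def flaky():
        # Fails three times, then succeeds.
        calls["count"] += 1
        if calls["count"] < 4:
            raise ConnectionError("transient failure")
        return "ok"

    async def fake_sleep(seconds):
        waits.append(seconds)  # record the delay instead of actually sleeping

    result = await retry_async(flaky, retry_min_wait=1, retry_max_wait=60, sleep=fake_sleep)
    return result, waits


result, waits = asyncio.run(demo())
print(result, waits)
```

Injecting the `sleep` function keeps the example fast and makes the backoff schedule easy to assert on in tests.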

Batch Processing

Efficient batching for embedding large datasets:

texts = [f"document {i}" for i in range(1000)]  # thousands of texts in practice
embeddings_list = await embeddings.embed_batch(
    texts=texts,
    batch_size=100  # Process 100 at a time
)
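
Conceptually, batching means slicing the input into fixed-size chunks and issuing one request per chunk. The sketch below shows that idea with a fake embedding call. `chunked`, `embed_in_batches`, and `fake_embed_call` are hypothetical names for this example only, and the real `embed_batch` may also run requests concurrently or apply rate limits.

```python
import asyncio
from typing import Iterator


def chunked(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


async def embed_in_batches(texts, embed_call, batch_size=100):
    """Call `embed_call` once per batch and concatenate the results in input order."""
    vectors = []
    for batch in chunked(texts, batch_size):
        vectors.extend(await embed_call(batch))
    return vectors


async def fake_embed_call(batch):
    # Stand-in for a provider request: returns one 1-dimensional vector per text.
    return [[float(len(text))] for text in batch]


texts = [f"document {i}" for i in range(250)]
vectors = asyncio.run(embed_in_batches(texts, fake_embed_call, batch_size=100))
print(len(vectors), [len(batch) for batch in chunked(texts, 100)])
```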

Metadata Filtering

Restrict a vector search to matching records with a metadata filter expression:

results = await vectorstore.search(
    collection_name="docs",
    query_vector=embedding,
    filter_expr='dataset_id == "my-dataset" && chunk_type == "text"',
    limit=10
)
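
Hand-writing filter strings gets error-prone once values come from user input. The helper below is a hypothetical sketch (not an FLTR function) that builds an equality-only expression in the same `field == value && ...` shape as the example above. It assumes the backend accepts double-quoted string literals with backslash escapes, and it deliberately supports only strings and numbers.

```python
import json
import re

# Conservative field-name check so arbitrary text cannot be injected as a field.
_FIELD_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_filter_expr(conditions: dict) -> str:
    """Join `field == literal` clauses with && (equality filters only)."""
    parts = []
    for field, value in conditions.items():
        if not _FIELD_NAME.match(field):
            raise ValueError(f"invalid field name: {field!r}")
        # bool is a subclass of int, so reject it explicitly rather than guess a literal syntax.
        if isinstance(value, bool) or not isinstance(value, (int, float, str)):
            raise TypeError(f"unsupported value type for {field!r}: {type(value).__name__}")
        if isinstance(value, str):
            literal = json.dumps(value, ensure_ascii=False)  # adds quotes and escapes
        else:
            literal = repr(value)
        parts.append(f"{field} == {literal}")
    return " && ".join(parts)


expr = build_filter_expr({"dataset_id": "my-dataset", "chunk_type": "text"})
print(expr)
```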

Input Types (Cohere & Voyage AI)

Optimize embeddings for different use cases:

# Cohere
cohere_embeddings = CohereEmbeddingProvider(api_key="...")

# For indexing documents
doc_embeddings = await cohere_embeddings.embed_batch(
    texts=documents,
    input_type="search_document"
)

# For search queries
query_embedding = await cohere_embeddings.embed_text(
    text="What is RAG?",
    input_type="search_query"
)

# Voyage AI - Domain-specific models
voyage_embeddings = VoyageAIEmbeddingProvider(
    api_key="...",
    model="voyage-law-2"  # Optimized for legal text
)

๐Ÿ—๏ธ Architecture

FLTR follows a clean architecture with abstract base classes:

fltr/
├── drivers/
│   ├── storage/
│   │   ├── base.py          # StorageDriver abstract class
│   │   ├── local.py         # Local filesystem
│   │   ├── s3.py            # AWS S3
│   │   └── r2.py            # Cloudflare R2
│   ├── vectorstore/
│   │   ├── base.py          # VectorStoreDriver abstract class
│   │   └── milvus.py        # Milvus implementation
│   └── embeddings/
│       ├── base.py          # EmbeddingProvider abstract class
│       ├── openai.py        # OpenAI
│       ├── cohere.py        # Cohere
│       └── voyageai.py      # Voyage AI
├── config/
│   └── schema.py            # Pydantic configuration schemas
└── parsers/
    ├── base.py              # DocumentParser abstract class
    └── registry.py          # Parser discovery system
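
The point of the abstract base classes is that application code depends on an interface rather than on a concrete backend. The sketch below illustrates the pattern with local stand-ins: `StorageDriverSketch` and `InMemoryStorage` are hypothetical, and the `download` method is an assumption, since only `upload` appears in the examples above.

```python
import asyncio
from abc import ABC, abstractmethod


class StorageDriverSketch(ABC):
    """Hypothetical stand-in for the abstract class in drivers/storage/base.py."""

    @abstractmethod
    async def upload(self, key: str, data: bytes, content_type: str) -> None:
        ...

    @abstractmethod
    async def download(self, key: str) -> bytes:
        ...


class InMemoryStorage(StorageDriverSketch):
    """Toy backend that keeps objects in a dict; handy for tests."""

    def __init__(self) -> None:
        self._objects: dict[str, tuple[bytes, str]] = {}

    async def upload(self, key: str, data: bytes, content_type: str) -> None:
        self._objects[key] = (data, content_type)

    async def download(self, key: str) -> bytes:
        return self._objects[key][0]


async def archive_note(storage: StorageDriverSketch, note: str) -> bytes:
    """Application code: works with any driver that implements the interface."""
    await storage.upload(
        key="notes/latest.txt",
        data=note.encode("utf-8"),
        content_type="text/plain",
    )
    return await storage.download("notes/latest.txt")


stored = asyncio.run(archive_note(InMemoryStorage(), "FLTR is awesome!"))
print(stored)
```

Swapping backends then means passing a different driver instance to `archive_note`; the function itself does not change.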

🧪 Testing

FLTR ships with a pytest test suite:

# Run all tests
pytest

# Run with coverage
pytest --cov=fltr --cov-report=html

# Run specific test file
pytest tests/test_embeddings.py -v

๐Ÿค Contributing

We welcome contributions! FLTR is extracted from production code at tryfltr.com.

Adding a New Driver

  1. Inherit from the appropriate base class
  2. Implement all abstract methods
  3. Add comprehensive tests
  4. Register the driver in the pyproject.toml entry points
  5. Submit a PR

Example:

from fltr.drivers.embeddings.base import EmbeddingProvider

class MyEmbeddingProvider(EmbeddingProvider):
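    async def embed_text(self, text: str) -> list[float]:
        # Your implementation: return one vector of length get_dimension()
        raise NotImplementedError

    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
        # Your implementation: return one vector per input text
        raise NotImplementedError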

    def get_dimension(self) -> int:
        return 1536
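
For the "add comprehensive tests" step, it can help to have a provider that needs no API key. The following is a self-contained sketch: `EmbeddingProviderSketch` is a local stand-in that mirrors the three methods in the example above (the real base class may define more), and `HashEmbeddingProvider` is a hypothetical toy whose vectors are deterministic but carry no semantic meaning.

```python
import asyncio
import hashlib
from abc import ABC, abstractmethod


class EmbeddingProviderSketch(ABC):
    """Local stand-in mirroring the methods shown in the example above."""

    @abstractmethod
    async def embed_text(self, text: str) -> list[float]:
        ...

    @abstractmethod
    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
        ...

    @abstractmethod
    def get_dimension(self) -> int:
        ...


class HashEmbeddingProvider(EmbeddingProviderSketch):
    """Deterministic toy embeddings derived from SHA-256; for tests only."""

    def __init__(self, dimension: int = 8) -> None:
        if not 1 <= dimension <= 32:
            raise ValueError("dimension must be between 1 and 32 (SHA-256 yields 32 bytes)")
        self._dimension = dimension

    async def embed_text(self, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Scale each of the first `dimension` bytes into the range [0.0, 1.0].
        return [byte / 255 for byte in digest[: self._dimension]]

    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
        return [await self.embed_text(text) for text in texts]

    def get_dimension(self) -> int:
        return self._dimension


async def check_provider(provider: EmbeddingProviderSketch) -> int:
    """Contract checks that any provider implementation should pass."""
    single = await provider.embed_text("What is FLTR?")
    batch = await provider.embed_batch(["What is FLTR?", "another text"])
    assert len(single) == provider.get_dimension()
    assert batch[0] == single  # batch and single-text paths agree
    assert all(len(vector) == provider.get_dimension() for vector in batch)
    return len(batch)


checked = asyncio.run(check_provider(HashEmbeddingProvider(dimension=8)))
print(checked)
```

The same `check_provider` function could be run against every new provider so that all drivers are held to one contract.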

📖 Documentation

🔒 License

MIT License - see LICENSE file for details.

๐Ÿ™ Acknowledgments

FLTR is built and maintained by the team at tryfltr.com. We're on a mission to make RAG infrastructure accessible to everyone.

Special thanks to:

  • Milvus for the excellent vector database
  • OpenAI, Cohere, and Voyage AI for embedding APIs
  • The Python community for amazing tools and libraries

🔗 Links

Built with ❤️ by the FLTR team
