
Open-source RAG infrastructure toolkit with pluggable drivers for storage, vector databases, embeddings, and document processing


FLTR - Open-Source RAG Infrastructure Toolkit

Python 3.10+ | License: MIT

FLTR is a production-ready, open-source toolkit for building Retrieval-Augmented Generation (RAG) applications with pluggable drivers for storage, vector databases, embeddings, and document processing.

🎯 Why FLTR?

  • 🔌 Pluggable Architecture: Swap storage backends, vector databases, and embedding providers without changing your application code
  • ⚡ Production-Tested: Extracted from tryfltr.com, where it runs in production
  • 🧪 Tested: 79+ tests with 53%+ code coverage
  • 📦 Zero Lock-In: Abstract driver interfaces mean you're never tied to a single provider
  • 🚀 Async-First: Built on async/await for high-throughput, concurrent I/O
  • 🎨 Type-Safe: Full type hints and Pydantic validation
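The "pluggable" and "zero lock-in" points come down to application code depending on an interface rather than on a concrete provider. The sketch below illustrates that idea with typing.Protocol and two toy providers. Embedder, VowelCountEmbedder, LengthEmbedder, and embed_corpus are hypothetical names for illustration only, not part of FLTR's API (FLTR itself uses abstract base classes such as EmbeddingProvider).

```python
import asyncio
from typing import Protocol


class Embedder(Protocol):
    """Hypothetical interface: any object with these methods can be used."""

    async def embed_batch(self, texts: list[str]) -> list[list[float]]: ...

    def get_dimension(self) -> int: ...


class VowelCountEmbedder:
    """Toy provider #1: counts each vowel (stands in for a real API provider)."""

    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
        return [[float(t.lower().count(v)) for v in "aeiou"] for t in texts]

    def get_dimension(self) -> int:
        return 5


class LengthEmbedder:
    """Toy provider #2: word and character counts, with a different dimension."""

    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t.split())), float(len(t))] for t in texts]

    def get_dimension(self) -> int:
        return 2


async def embed_corpus(provider: Embedder, texts: list[str]) -> list[list[float]]:
    """Application code: depends only on the interface, never on a vendor SDK."""
    vectors = await provider.embed_batch(texts)
    if any(len(v) != provider.get_dimension() for v in vectors):
        raise ValueError("provider returned vectors of the wrong size")
    return vectors


if __name__ == "__main__":
    docs = ["FLTR is awesome!", "What is RAG?"]
    # Swapping providers is a one-line change; embed_corpus itself is untouched
    for provider in (VowelCountEmbedder(), LengthEmbedder()):
        print(type(provider).__name__, asyncio.run(embed_corpus(provider, docs)))
```

Because embed_corpus only calls methods declared on the interface, adding a third provider never requires editing it.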

🚀 Quick Start

Installation

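# Install core library (the PyPI distribution is fltr-core; the import package is fltr)
pip install fltr-core

# Install with specific providers (quotes stop shells such as zsh from expanding the brackets)
pip install "fltr-core[storage-s3,vectorstore-milvus,embeddings-openai]"

# Install everything
pip install "fltr-core[all]"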

Basic Usage

import asyncio
import os

from fltr.drivers.storage import LocalFileDriver
from fltr.drivers.vectorstore import MilvusDriver
from fltr.drivers.embeddings import OpenAIEmbeddingProvider

async def main():
    # Initialize drivers
    storage = LocalFileDriver(base_path="./data")
    vectorstore = MilvusDriver(uri="./milvus.db")  # Milvus Lite for local dev
    embeddings = OpenAIEmbeddingProvider(api_key=os.environ["OPENAI_API_KEY"])

    # Upload a document
    document = "FLTR is awesome!"
    await storage.upload(
        key="documents/readme.txt",
        data=document.encode("utf-8"),
        content_type="text/plain"
    )

    # Embed the document and index it; the collection dimension must match the model
    doc_embedding = await embeddings.embed_text(document)
    await vectorstore.create_collection("docs", dimension=embeddings.get_dimension())
    await vectorstore.insert(
        collection_name="docs",
        vectors=[doc_embedding],
        texts=[document],
        metadata=[{"source": "readme.txt"}]
    )

    # Embed the question and search for the closest documents
    query_embedding = await embeddings.embed_text("What is FLTR?")
    results = await vectorstore.search(
        collection_name="docs",
        query_vector=query_embedding,
        limit=5
    )

    print(results)

asyncio.run(main())

📚 Available Drivers

Storage Drivers

| Driver | Install | Use Case |
|---|---|---|
| LocalFileDriver | pip install fltr-core | Local development, testing |
| S3StorageDriver | pip install "fltr-core[storage-s3]" | AWS S3, MinIO, DigitalOcean Spaces |
| R2StorageDriver | pip install "fltr-core[storage-r2]" | Cloudflare R2 (S3-compatible) |

Vector Store Drivers

| Driver | Install | Use Case |
|---|---|---|
| MilvusDriver | pip install "fltr-core[vectorstore-milvus]" | Milvus Lite (local) or Milvus Cloud/Zilliz |

Embedding Providers

| Provider | Install | Models |
|---|---|---|
| OpenAI | pip install "fltr-core[embeddings-openai]" | text-embedding-3-small, text-embedding-3-large |
| Cohere | pip install "fltr-core[embeddings-cohere]" | embed-english-v3.0, embed-multilingual-v3.0 |
| Voyage AI | pip install "fltr-core[embeddings-voyageai]" | voyage-3, voyage-code-2, voyage-law-2 |
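A collection's dimension has to match the embedding model that fills it, and the models above do not all produce the same size. The lookup below records the default output sizes published by the providers at the time of writing; verify them against current provider documentation, since some models (for example OpenAI's text-embedding-3 family) can also return shortened vectors. DEFAULT_DIMENSIONS and collection_dimension are hypothetical helpers, not FLTR APIs.

```python
# Default embedding sizes for the models listed above, as published by the
# providers at the time of writing (verify against current provider docs).
DEFAULT_DIMENSIONS: dict[str, int] = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "embed-english-v3.0": 1024,
    "embed-multilingual-v3.0": 1024,
    "voyage-3": 1024,
    "voyage-code-2": 1536,
    "voyage-law-2": 1024,
}


def collection_dimension(model: str) -> int:
    """Return the vector size a collection needs for `model` (hypothetical helper)."""
    try:
        return DEFAULT_DIMENSIONS[model]
    except KeyError:
        raise ValueError(f"unknown embedding model: {model!r}") from None
```

When a provider object is available, asking it directly via get_dimension() is preferable; a static table like this is only a fallback for planning collections ahead of time.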

🔧 Configuration

All drivers support both direct initialization and environment-based configuration:

Environment Variables

# Storage (S3/R2)
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
export S3_BUCKET="my-bucket"
export S3_REGION="us-east-1"

# Vector Store (Milvus)
export MILVUS_URI="https://your-milvus-instance.com"
export MILVUS_TOKEN="your-token"
export VECTOR_METRIC_TYPE="COSINE"

# Embeddings (OpenAI)
export OPENAI_API_KEY="sk-..."
export OPENAI_EMBEDDING_MODEL="text-embedding-3-small"
export OPENAI_BATCH_SIZE="100"

Factory Methods

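from fltr.drivers.storage import S3StorageDriver
from fltr.drivers.vectorstore import MilvusDriver
from fltr.drivers.embeddings import OpenAIEmbeddingProvider

# Create from environment variables
storage = S3StorageDriver.from_env()
vectorstore = MilvusDriver.from_env()
embeddings = OpenAIEmbeddingProvider.from_env()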

🎨 Advanced Features

Retry Logic

All embedding providers include automatic retry logic with exponential backoff:

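from fltr.drivers.embeddings import OpenAIEmbeddingProvider

embeddings = OpenAIEmbeddingProvider(
    api_key="sk-...",
    max_retries=5,       # give up after 5 retries
    retry_min_wait=1,    # shortest backoff wait
    retry_max_wait=60    # longest backoff wait
)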
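Exponential backoff doubles the wait after each failed attempt, up to a ceiling. The sketch below shows one common way to implement it; retry_async and backoff_delays are hypothetical helpers, and whether FLTR computes its waits exactly like this (or treats retry_min_wait and retry_max_wait as seconds, as this sketch does) is an assumption.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


def backoff_delays(max_retries: int, min_wait: float, max_wait: float) -> list[float]:
    """Wait before retry i (0-based): min_wait * 2**i, capped at max_wait."""
    return [min(max_wait, min_wait * 2**i) for i in range(max_retries)]


async def retry_async(
    call: Callable[[], Awaitable[T]],
    max_retries: int = 5,
    min_wait: float = 1,
    max_wait: float = 60,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
) -> T:
    """Run call(); on a retryable error, sleep with exponential backoff and try again."""
    delays = backoff_delays(max_retries, min_wait, max_wait)
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            await asyncio.sleep(delays[attempt])
    raise AssertionError("unreachable")
```

With the settings shown above (5 retries, 1 to 60), the waits would be 1, 2, 4, 8, and 16 seconds. Production code often adds random jitter to each wait so many clients do not retry in lockstep.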

Batch Processing

Embed large datasets in batches to cut the number of API requests:

texts = [f"document {i}" for i in range(5000)]  # thousands of texts
embeddings_list = await embeddings.embed_batch(
    texts=texts,
    batch_size=100  # Process 100 at a time
)
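Conceptually, batching splits the input into fixed-size slices, sends one request per slice, and concatenates the results in order. The sketch below shows that loop; chunked and embed_in_batches are hypothetical helpers, and embed_one_batch stands in for whatever single-request call a provider makes internally (an assumption, since FLTR's embed_batch already handles this for you).

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterator


def chunked(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


async def embed_in_batches(
    embed_one_batch: Callable[[list[str]], Awaitable[list[list[float]]]],
    texts: list[str],
    batch_size: int = 100,
) -> list[list[float]]:
    vectors: list[list[float]] = []
    for batch in chunked(texts, batch_size):
        # One request per batch, awaited in sequence; output keeps input order
        vectors.extend(await embed_one_batch(batch))
    return vectors
```

Sending batches one after another keeps request rates predictable under provider rate limits; firing them concurrently with asyncio.gather is faster but more likely to trigger throttling.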

Metadata Filtering

Restrict vector search results with a metadata filter written as a Milvus boolean expression:

results = await vectorstore.search(
    collection_name="docs",
    query_vector=embedding,
    filter_expr='dataset_id == "my-dataset" && chunk_type == "text"',
    limit=10
)

Input Types (Cohere & Voyage AI)

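Cohere distinguishes document embeddings from query embeddings through the input_type parameter, and Voyage AI offers domain-specific models:

from fltr.drivers.embeddings import CohereEmbeddingProvider, VoyageAIEmbeddingProvider

# Cohere
cohere_embeddings = CohereEmbeddingProvider(api_key="...")
documents = ["RAG pairs retrieval with generation.", "Milvus stores vectors."]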

# For indexing documents
doc_embeddings = await cohere_embeddings.embed_batch(
    texts=documents,
    input_type="search_document"
)

# For search queries
query_embedding = await cohere_embeddings.embed_text(
    text="What is RAG?",
    input_type="search_query"
)

# Voyage AI - Domain-specific models
voyage_embeddings = VoyageAIEmbeddingProvider(
    api_key="...",
    model="voyage-law-2"  # Optimized for legal text
)

🏗️ Architecture

FLTR is organized around one abstract base class per driver type, with concrete implementations alongside it:

fltr/
├── drivers/
│   ├── storage/
│   │   ├── base.py          # StorageDriver abstract class
│   │   ├── local.py         # Local filesystem
│   │   ├── s3.py            # AWS S3
│   │   └── r2.py            # Cloudflare R2
│   ├── vectorstore/
│   │   ├── base.py          # VectorStoreDriver abstract class
│   │   └── milvus.py        # Milvus implementation
│   └── embeddings/
│       ├── base.py          # EmbeddingProvider abstract class
│       ├── openai.py        # OpenAI
│       ├── cohere.py        # Cohere
│       └── voyageai.py      # Voyage AI
├── config/
│   └── schema.py            # Pydantic configuration schemas
└── parsers/
    ├── base.py              # DocumentParser abstract class
    └── registry.py          # Parser discovery system
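The tree only describes registry.py as a "parser discovery system". One common way to build such a system is a registry keyed by file extension, filled by a class decorator. The sketch below shows that pattern; DocumentParser here is a local stand-in whose parse method and extensions attribute are assumptions, and register, parser_for, and PlainTextParser are hypothetical names, not FLTR's code.

```python
from abc import ABC, abstractmethod
from pathlib import PurePath


class DocumentParser(ABC):
    """Stand-in for the abstract parser class in parsers/base.py (shape assumed)."""

    extensions: tuple[str, ...] = ()

    @abstractmethod
    def parse(self, data: bytes) -> str:
        """Turn raw file bytes into plain text."""


_REGISTRY: dict[str, type[DocumentParser]] = {}


def register(cls: type[DocumentParser]) -> type[DocumentParser]:
    """Class decorator: map each declared extension to the parser class."""
    for ext in cls.extensions:
        _REGISTRY[ext.lower()] = cls
    return cls


def parser_for(filename: str) -> DocumentParser:
    """Instantiate the parser registered for the file's extension."""
    ext = PurePath(filename).suffix.lower()
    try:
        return _REGISTRY[ext]()
    except KeyError:
        raise ValueError(f"no parser registered for {ext or filename!r}") from None


@register
class PlainTextParser(DocumentParser):
    extensions = (".txt", ".md")

    def parse(self, data: bytes) -> str:
        return data.decode("utf-8")
```

New formats are added by defining and decorating another subclass; nothing that calls parser_for has to change.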

🧪 Testing

FLTR's test suite runs with pytest:

# Run all tests
pytest

# Run with coverage (requires the pytest-cov plugin)
pytest --cov=fltr --cov-report=html

# Run specific test file
pytest tests/test_embeddings.py -v
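A useful test for any new embedding provider is a backend-agnostic contract check, which catches the most common mistake: get_dimension() disagreeing with the vectors actually returned. The sketch below is self-contained so it runs without FLTR installed. EmbeddingProvider here is a local stand-in mirroring the three methods shown in the Contributing section, and HashingEmbeddingProvider and check_provider_contract are hypothetical.

```python
import asyncio
from abc import ABC, abstractmethod


class EmbeddingProvider(ABC):
    """Local stand-in with the three methods from the Contributing example."""

    @abstractmethod
    async def embed_text(self, text: str) -> list[float]: ...

    @abstractmethod
    async def embed_batch(self, texts: list[str]) -> list[list[float]]: ...

    @abstractmethod
    def get_dimension(self) -> int: ...


class HashingEmbeddingProvider(EmbeddingProvider):
    """Toy provider: counts characters into `dimension` buckets (no network)."""

    def __init__(self, dimension: int = 8) -> None:
        self.dimension = dimension

    async def embed_text(self, text: str) -> list[float]:
        vec = [0.0] * self.dimension
        for ch in text:
            vec[ord(ch) % self.dimension] += 1.0
        return vec

    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
        return [await self.embed_text(t) for t in texts]

    def get_dimension(self) -> int:
        return self.dimension


async def check_provider_contract(provider: EmbeddingProvider) -> None:
    """Backend-agnostic checks every provider should pass."""
    dim = provider.get_dimension()
    single = await provider.embed_text("hello")
    assert len(single) == dim, "embed_text output must match get_dimension()"
    batch = await provider.embed_batch(["a", "b", "c"])
    assert len(batch) == 3, "embed_batch must return one vector per input"
    assert all(len(v) == dim for v in batch), "batch vectors must match get_dimension()"


def test_hashing_provider_contract() -> None:
    asyncio.run(check_provider_contract(HashingEmbeddingProvider(dimension=16)))
```

pytest collects test_hashing_provider_contract automatically from a file named test_*.py. Against the real library you would import the actual base class instead of redefining it, and the pytest-asyncio plugin lets the test function itself be async.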

🤝 Contributing

We welcome contributions! FLTR is extracted from production code at tryfltr.com.

Adding a New Driver

  1. Inherit from the appropriate base class
  2. Implement all abstract methods
  3. Add comprehensive tests
  4. Add to pyproject.toml entry points
  5. Submit a PR

Example:

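from fltr.drivers.embeddings.base import EmbeddingProvider

class MyEmbeddingProvider(EmbeddingProvider):
    async def embed_text(self, text: str) -> list[float]:
        # Call your embedding backend and return one vector
        raise NotImplementedError

    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
        # Return one vector per input text, in the same order
        raise NotImplementedError

    def get_dimension(self) -> int:
        # Must equal the length of every vector this provider returns
        return 1536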

📖 Documentation

🔒 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

FLTR is built and maintained by the team at tryfltr.com. We're on a mission to make RAG infrastructure accessible to everyone.

Special thanks to:

  • Milvus for the excellent vector database
  • OpenAI, Cohere, and Voyage AI for embedding APIs
  • The Python community for amazing tools and libraries

🔗 Links


Built with ❤️ by the FLTR team

Download files

Source Distribution

fltr_core-0.1.0.tar.gz (60.1 kB)

Built Distribution

fltr_core-0.1.0-py3-none-any.whl (72.0 kB)

File details

Details for the file fltr_core-0.1.0.tar.gz.

File metadata

  • Download URL: fltr_core-0.1.0.tar.gz
  • Size: 60.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for fltr_core-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9f16b26a20279a164ed314b990bd67b1ba01bc882163d437eeb3f49d69d2c791 |
| MD5 | d92916934d620311ff1f71d048f9e492 |
| BLAKE2b-256 | f7d7bb9197ccd01f14a36dd46500b5056451e5c969418ce383f3e8c154946084 |


File details

Details for the file fltr_core-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: fltr_core-0.1.0-py3-none-any.whl
  • Size: 72.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for fltr_core-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8ad344a28f3fe56a8112de03a71f8787796217a29be59dd86d7bfe15899e2069 |
| MD5 | a439767295bb69b011c7de501ca13845 |
| BLAKE2b-256 | 03ca368b5c870aabed851f6bbede5ecdc4eba7c0db2fcce0bb7d24eb64fe70e2 |

