Open-source RAG infrastructure toolkit with pluggable drivers for storage, vector databases, embeddings, and document processing
FLTR - Open-Source RAG Infrastructure Toolkit
FLTR is a production-ready, open-source toolkit for building Retrieval-Augmented Generation (RAG) applications with pluggable drivers for storage, vector databases, embeddings, and document processing.
Why FLTR?
- Pluggable Architecture: Swap storage backends, vector databases, and embedding providers without changing your application code
- Production-Tested: Extracted from the production codebase behind tryfltr.com
- Tested: 79+ tests with 53%+ coverage
- Zero Lock-In: Abstract driver interfaces keep you from being tied to a single provider
- Async-First: Built on async/await throughout
- Type-Safe: Full type hints and Pydantic validation
Quick Start
Installation
# Install core library
pip install fltr
# Install with specific providers (quoted so shells such as zsh do not expand the brackets)
pip install "fltr[storage-s3,vectorstore-milvus,embeddings-openai]"
# Install everything
pip install "fltr[all]"
Basic Usage
import asyncio

from fltr.drivers.storage import LocalFileDriver
from fltr.drivers.vectorstore import MilvusDriver
from fltr.drivers.embeddings import OpenAIEmbeddingProvider

async def main():
    # Initialize drivers
    storage = LocalFileDriver(base_path="./data")
    vectorstore = MilvusDriver(uri="./milvus.db")  # Milvus Lite for local dev
    embeddings = OpenAIEmbeddingProvider(api_key="sk-...")

    # Upload a document
    await storage.upload(
        key="documents/readme.txt",
        data=b"FLTR is awesome!",
        content_type="text/plain"
    )

    # Generate embeddings
    text = "What is FLTR?"
    embedding = await embeddings.embed_text(text)

    # Create collection and insert
    await vectorstore.create_collection("docs", dimension=1536)
    await vectorstore.insert(
        collection_name="docs",
        vectors=[embedding],
        texts=[text],
        metadata=[{"source": "readme.txt"}]
    )

    # Search
    results = await vectorstore.search(
        collection_name="docs",
        query_vector=embedding,
        limit=5
    )
    print(results)

asyncio.run(main())
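Conceptually, the search step ranks stored vectors by their similarity to the query vector. The sketch below illustrates that idea with a brute-force scan and cosine similarity (the metric named by `VECTOR_METRIC_TYPE` in the configuration section). It is not how FLTR or Milvus implement search; a real vector database uses indexes rather than scanning every vector, and `cosine_similarity` and `brute_force_search` are hypothetical names introduced here.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def brute_force_search(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    limit: int = 5,
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the `limit` most similar vectors."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
    return scored[:limit]


stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hits = brute_force_search([1.0, 0.0], stored, limit=2)
print(hits)  # the identical vector (index 0) ranks first, then index 2
```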
Available Drivers
Storage Drivers
| Driver | Install | Use Case |
|---|---|---|
| LocalFileDriver | pip install fltr | Local development, testing |
| S3StorageDriver | pip install fltr[storage-s3] | AWS S3, MinIO, DigitalOcean Spaces |
| R2StorageDriver | pip install fltr[storage-r2] | Cloudflare R2 (S3-compatible) |
Vector Store Drivers
| Driver | Install | Use Case |
|---|---|---|
| MilvusDriver | pip install fltr[vectorstore-milvus] | Milvus Lite (local) or Milvus Cloud/Zilliz |
Embedding Providers
| Provider | Install | Models |
|---|---|---|
| OpenAI | pip install fltr[embeddings-openai] | text-embedding-3-small, text-embedding-3-large |
| Cohere | pip install fltr[embeddings-cohere] | embed-english-v3.0, embed-multilingual-v3.0 |
| Voyage AI | pip install fltr[embeddings-voyageai] | voyage-3, voyage-code-2, voyage-law-2 |
Configuration
All drivers support both direct initialization and environment-based configuration:
Environment Variables
# Storage (S3/R2)
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
export S3_BUCKET="my-bucket"
export S3_REGION="us-east-1"
# Vector Store (Milvus)
export MILVUS_URI="https://your-milvus-instance.com"
export MILVUS_TOKEN="your-token"
export VECTOR_METRIC_TYPE="COSINE"
# Embeddings (OpenAI)
export OPENAI_API_KEY="sk-..."
export OPENAI_EMBEDDING_MODEL="text-embedding-3-small"
export OPENAI_BATCH_SIZE="100"
Factory Methods
# Create from environment variables
storage = S3StorageDriver.from_env()
vectorstore = MilvusDriver.from_env()
embeddings = OpenAIEmbeddingProvider.from_env()
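The README does not show how `from_env()` is implemented. As a rough illustration of the pattern, the sketch below reads the Milvus variables listed above into a settings object. `MilvusSettings` is a hypothetical class, not part of FLTR, and the choice of which variables are required and what the defaults are is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MilvusSettings:
    """Hypothetical settings holder; not part of FLTR's API."""

    uri: str
    token: Optional[str] = None
    metric_type: str = "COSINE"  # assumed default

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "MilvusSettings":
        # Accepting a mapping lets tests pass a plain dict instead of mutating os.environ.
        env = os.environ if env is None else env
        uri = env.get("MILVUS_URI")
        if not uri:
            raise ValueError("MILVUS_URI is required")
        return cls(
            uri=uri,
            token=env.get("MILVUS_TOKEN"),
            metric_type=env.get("VECTOR_METRIC_TYPE", "COSINE"),
        )


settings = MilvusSettings.from_env({"MILVUS_URI": "./milvus.db"})
print(settings)
```

Failing fast on a missing required variable keeps configuration errors at startup rather than at the first database call.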
Advanced Features
Retry Logic
All embedding providers include automatic retry logic with exponential backoff:
embeddings = OpenAIEmbeddingProvider(
    api_key="sk-...",
    max_retries=5,
    retry_min_wait=1,
    retry_max_wait=60
)
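For readers unfamiliar with exponential backoff, here is a minimal sketch of the general technique using only `asyncio`. The helper `retry_async` is hypothetical; its parameter names mirror the constructor arguments above, but FLTR's actual retry implementation is not shown in this README and may differ (for example, in which exceptions it retries or whether it adds jitter).

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_async(
    call: Callable[[], Awaitable[T]],
    max_retries: int = 5,
    retry_min_wait: float = 1.0,
    retry_max_wait: float = 60.0,
) -> T:
    """Hypothetical helper: retry `call`, doubling the wait after each failure."""
    wait = retry_min_wait
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            await asyncio.sleep(min(wait, retry_max_wait))  # cap the delay
            wait *= 2
    raise AssertionError("unreachable")


async def demo() -> str:
    attempts = 0

    async def flaky() -> str:
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise ConnectionError("transient failure")
        return f"succeeded on attempt {attempts}"

    return await retry_async(flaky, retry_min_wait=0.01, retry_max_wait=0.05)


print(asyncio.run(demo()))  # succeeded on attempt 3
```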
Batch Processing
Efficient batching for embedding large datasets:
texts = ["document 1", "document 2", ...] # 1000s of texts
embeddings_list = await embeddings_provider.embed_batch(
    texts=texts,
    batch_size=100  # Process 100 at a time
)
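The effect of `batch_size` can be pictured as slicing the input list into consecutive chunks, each sent as one request. A minimal sketch of that slicing, using a hypothetical `chunked` helper (FLTR's internal batching code is not shown here):

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


texts = [f"document {i}" for i in range(250)]
batch_lengths = [len(batch) for batch in chunked(texts, batch_size=100)]
print(batch_lengths)  # [100, 100, 50]
```

The last batch is simply shorter when the input length is not a multiple of the batch size.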
Metadata Filtering
Narrow a vector search with a metadata filter expression:
results = await vectorstore.search(
    collection_name="docs",
    query_vector=embedding,
    filter_expr='dataset_id == "my-dataset" && chunk_type == "text"',
    limit=10
)
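When filter values come from application data, building the expression by hand is error-prone. The sketch below assembles an equality-only expression in the same form as the example above. `build_filter` is a hypothetical helper, not part of FLTR; it does not validate field names, so those should come from trusted code rather than user input.

```python
import json
from typing import Mapping, Union

Scalar = Union[str, int, float, bool]


def build_filter(conditions: Mapping[str, Scalar]) -> str:
    """Hypothetical helper: join equality checks with && as in the example above."""
    # json.dumps wraps strings in double quotes and escapes embedded quotes.
    return " && ".join(
        f"{field} == {json.dumps(value)}" for field, value in conditions.items()
    )


expr = build_filter({"dataset_id": "my-dataset", "chunk_type": "text"})
print(expr)  # dataset_id == "my-dataset" && chunk_type == "text"
```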
Input Types (Cohere & Voyage AI)
Optimize embeddings for different use cases:
# Cohere
cohere_embeddings = CohereEmbeddingProvider(api_key="...")

# For indexing documents
doc_embeddings = await cohere_embeddings.embed_batch(
    texts=documents,
    input_type="search_document"
)

# For search queries
query_embedding = await cohere_embeddings.embed_text(
    text="What is RAG?",
    input_type="search_query"
)

# Voyage AI - Domain-specific models
voyage_embeddings = VoyageAIEmbeddingProvider(
    api_key="...",
    model="voyage-law-2"  # Optimized for legal text
)
Architecture
FLTR follows a clean architecture with abstract base classes:
fltr/
├── drivers/
│   ├── storage/
│   │   ├── base.py          # StorageDriver abstract class
│   │   ├── local.py         # Local filesystem
│   │   ├── s3.py            # AWS S3
│   │   └── r2.py            # Cloudflare R2
│   ├── vectorstore/
│   │   ├── base.py          # VectorStoreDriver abstract class
│   │   └── milvus.py        # Milvus implementation
│   └── embeddings/
│       ├── base.py          # EmbeddingProvider abstract class
│       ├── openai.py        # OpenAI
│       ├── cohere.py        # Cohere
│       └── voyageai.py      # Voyage AI
├── config/
│   └── schema.py            # Pydantic configuration schemas
└── parsers/
    ├── base.py              # DocumentParser abstract class
    └── registry.py          # Parser discovery system
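To show why abstract base classes make backends swappable, here is a self-contained sketch of the pattern. The `StorageDriver` class below is a local stand-in, not FLTR's real base class: `upload` follows the call shown in Basic Usage, while `download` and the `InMemoryStorageDriver` backend are assumptions made for illustration.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Dict


class StorageDriver(ABC):
    """Stand-in for an abstract storage interface; FLTR's real base class may differ."""

    @abstractmethod
    async def upload(self, key: str, data: bytes, content_type: str) -> None: ...

    @abstractmethod
    async def download(self, key: str) -> bytes: ...  # assumed method, not confirmed by this README


class InMemoryStorageDriver(StorageDriver):
    """Hypothetical backend that keeps objects in a dict."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    async def upload(self, key: str, data: bytes, content_type: str) -> None:
        self._objects[key] = data

    async def download(self, key: str) -> bytes:
        return self._objects[key]


async def round_trip(storage: StorageDriver) -> bytes:
    # Application code depends only on the abstract interface,
    # so any backend that implements it can be passed in.
    await storage.upload("documents/readme.txt", b"FLTR is awesome!", "text/plain")
    return await storage.download("documents/readme.txt")


print(asyncio.run(round_trip(InMemoryStorageDriver())))  # b'FLTR is awesome!'
```

Because `round_trip` is typed against the base class, replacing the in-memory backend with another implementation would require no change to the calling code.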
Testing
Run the test suite with pytest:
# Run all tests
pytest
# Run with coverage
pytest --cov=fltr --cov-report=html
# Run specific test file
pytest tests/test_embeddings.py -v
Contributing
We welcome contributions! FLTR is extracted from production code at tryfltr.com.
Adding a New Driver
- Inherit from the appropriate base class
- Implement all abstract methods
- Add comprehensive tests
- Add to pyproject.toml entry points
- Submit a PR
Example:
from fltr.drivers.embeddings.base import EmbeddingProvider

class MyEmbeddingProvider(EmbeddingProvider):
    async def embed_text(self, text: str) -> list[float]:
        # Your implementation
        pass

    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
        # Your implementation
        pass

    def get_dimension(self) -> int:
        return 1536
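A filled-in version of the skeleton above can serve as a test double. The sketch below is self-contained, so it defines a local stand-in for the base class with the same three methods rather than importing FLTR. `HashEmbeddingProvider` is hypothetical: it derives deterministic vectors from SHA-256, which is handy for tests but carries no semantic meaning.

```python
import asyncio
import hashlib
from abc import ABC, abstractmethod
from typing import List


class EmbeddingProvider(ABC):
    """Local stand-in mirroring the three methods in the example above."""

    @abstractmethod
    async def embed_text(self, text: str) -> List[float]: ...

    @abstractmethod
    async def embed_batch(self, texts: List[str]) -> List[List[float]]: ...

    @abstractmethod
    def get_dimension(self) -> int: ...


class HashEmbeddingProvider(EmbeddingProvider):
    """Hypothetical test double: deterministic vectors derived from SHA-256."""

    def __init__(self, dimension: int = 8) -> None:
        if not 1 <= dimension <= 32:  # a SHA-256 digest has 32 bytes
            raise ValueError("dimension must be between 1 and 32")
        self._dimension = dimension

    async def embed_text(self, text: str) -> List[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Scale each byte into [0, 1]; the values are not semantically meaningful.
        return [byte / 255 for byte in digest[: self._dimension]]

    async def embed_batch(self, texts: List[str]) -> List[List[float]]:
        return [await self.embed_text(text) for text in texts]

    def get_dimension(self) -> int:
        return self._dimension


provider = HashEmbeddingProvider(dimension=4)
vectors = asyncio.run(provider.embed_batch(["What is FLTR?", "What is RAG?"]))
print(len(vectors), len(vectors[0]))  # 2 4
```

Keeping `get_dimension()` in sync with the vectors actually returned matters because the collection is created with a fixed dimension.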
Documentation
License
MIT License - see LICENSE file for details.
Acknowledgments
FLTR is built and maintained by the team at tryfltr.com. We're on a mission to make RAG infrastructure accessible to everyone.
Special thanks to:
- Milvus for the excellent vector database
- OpenAI, Cohere, and Voyage AI for embedding APIs
- The Python community for amazing tools and libraries
Links
- Website: tryfltr.com
- Documentation: docs.tryfltr.com
- GitHub: github.com/tryfltr/fltr
- PyPI: pypi.org/project/fltr
- Discord: Join our community
Built with ❤️ by the FLTR team