A minimal Python library for Retrieval-Augmented Generation with multiple vector store backends
Project description
TinyRag 🚀
A minimal, powerful Python library for Retrieval-Augmented Generation (RAG) with support for multiple document formats and vector storage backends.
🌟 Features
- 🔌 Multiple Vector Stores: Faiss, ChromaDB, In-Memory, Pickle-based
- 📄 Document Support: PDF, DOCX, TXT, and raw text
- 🧠 Flexible Embeddings: Local Sentence Transformers or API-based
- 🔍 Query Without LLM: Direct similarity search functionality
- 💬 LLM Integration: Chat completion with retrieved context
- ⚡ Minimal Dependencies: Core functionality with optional extras
- 🎯 Easy to Use: Simple API with powerful features
🚀 Quick Start
Installation
# Basic installation
pip install tinyrag
# With all optional dependencies
pip install tinyrag[all]
# Specific vector stores
pip install tinyrag[faiss] # High performance
pip install tinyrag[chroma] # Persistent storage
pip install tinyrag[docs] # Document processing
Usage Example
from tinyrag import Provider, TinyRag
provider = Provider(
api_key="sk-xxxxxx",
model="gpt-4",
embedding_model="default",
base_url="https://api.openai.com/v1"
)
rag = TinyRag(provider=provider, vector_store="faiss")
rag.add_documents("path/to/docs_or_raw_text")
response = rag.chat("Summarize the documents.")
print(response)
📖 Documentation
Core Components
Provider Class
Handles API interactions and embeddings:
from tinyrag import Provider
# Local embeddings only (no API key needed)
provider = Provider(embedding_model="default")
# With OpenAI API
provider = Provider(
api_key="sk-your-key",
model="gpt-4",
embedding_model="text-embedding-ada-002",
base_url="https://api.openai.com/v1"
)
TinyRag Class
Main interface for RAG operations:
from tinyrag import TinyRag
# Initialize with different vector stores
rag = TinyRag(provider, vector_store="memory") # No dependencies
rag = TinyRag(provider, vector_store="faiss") # High performance
rag = TinyRag(provider, vector_store="chroma") # Persistent
rag = TinyRag(provider, vector_store="pickle") # Simple file-based
Vector Store Comparison
| Store | Performance | Persistence | Memory | Dependencies | Best For |
|---|---|---|---|---|---|
| Memory | Good | Manual | High | None | Development, small datasets |
| Faiss | Excellent | Manual | Low | faiss-cpu | Large-scale, performance-critical |
| ChromaDB | Good | Automatic | Medium | chromadb | Production, automatic persistence |
| Pickle | Fair | Manual | Medium | scikit-learn | Simple file-based storage |
API Reference
Core Methods
# Document Management
rag.add_documents(data) # Add documents/text
rag.get_chunk_count() # Get number of chunks
rag.get_all_chunks() # Get all text chunks
rag.clear_documents() # Clear all data
# Querying (No LLM)
rag.query(query, k=5, return_scores=True) # Basic similarity search
rag.search_documents(query, k=5, min_score=0.0) # With score filtering
rag.get_similar_chunks(text, k=5) # Find similar to given text
# LLM Integration
rag.chat(query, k=3) # Generate response with context
# Persistence
rag.save_vector_store(filepath) # Save to disk
rag.load_vector_store(filepath) # Load from disk
🔧 Configuration Options
Vector Store Configuration
# Faiss with custom settings
rag = TinyRag(
provider=provider,
vector_store="faiss",
chunk_size=1000, # Larger chunks
vector_store_config={}
)
# ChromaDB with persistence
rag = TinyRag(
provider=provider,
vector_store="chroma",
vector_store_config={
"collection_name": "my_collection",
"persist_directory": "./chroma_db"
}
)
# Memory store (no config needed)
rag = TinyRag(provider=provider, vector_store="memory")
# Pickle store with scikit-learn
rag = TinyRag(provider=provider, vector_store="pickle")
Provider Configuration
# Local embeddings only
provider = Provider(embedding_model="default")
# OpenAI with custom settings
provider = Provider(
api_key="sk-your-key",
model="gpt-3.5-turbo",
embedding_model="text-embedding-ada-002",
base_url="https://api.openai.com/v1"
)
# Custom API endpoint
provider = Provider(
api_key="your-key",
model="custom-model",
base_url="https://your-custom-api.com/v1"
)
📦 Installation Options
# Minimal installation
pip install tinyrag
# With specific vector stores
pip install tinyrag[faiss] # For high-performance similarity search
pip install tinyrag[chroma] # For persistent vector database
pip install tinyrag[pickle] # For simple file-based storage
# With document processing
pip install tinyrag[docs] # PDF and DOCX support
# Everything included
pip install tinyrag[all] # All optional dependencies
🛠️ Development
Requirements
- Python 3.7+
- sentence-transformers (core)
- requests (core)
- numpy (core)
Optional Dependencies
faiss-cpu: High-performance vector searchchromadb: Persistent vector databasescikit-learn: Pickle vector store similarityPyPDF2: PDF document processingpython-docx: Word document processing
Contributing
- Fork the repository: https://github.com/Kenosis01/TinyRag.git
- Create a feature branch:
git checkout -b feature-name - Make your changes and add tests
- Submit a pull request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🤝 Support
- GitHub Issues: Report bugs or request features
- Documentation: Full documentation
- Examples: Check the
examples/directory in the repository
🎯 Use Cases
- Document Q&A: Query your documents without LLM costs
- Knowledge Base: Build searchable knowledge repositories
- Content Discovery: Find similar content in large document collections
- RAG Applications: Full retrieval-augmented generation workflows
- Research Tools: Semantic search through research papers
- Customer Support: Query company documentation and policies
TinyRag - Making RAG simple, powerful, and accessible! 🚀
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file tinyrag-0.1.0.tar.gz.
File metadata
- Download URL: tinyrag-0.1.0.tar.gz
- Upload date:
- Size: 17.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.10
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fe49b5532ab2c1eb130cb51f00839f98bdaa54deb285ffc979c88d78ba37e5c7
|
|
| MD5 |
3729495a7aea50f487d3cbd1e7694429
|
|
| BLAKE2b-256 |
93db36e0aeb297f3b91a1fd73708dc60552a7706d6a9c336bb7674a2037b0bde
|
File details
Details for the file tinyrag-0.1.0-py3-none-any.whl.
File metadata
- Download URL: tinyrag-0.1.0-py3-none-any.whl
- Upload date:
- Size: 15.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.10
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1c136381de71cdb809bf90ce91f35bbe2e48dbc0e56614615b9edd7e633e481b
|
|
| MD5 |
0f117c30872d0857129ded6324fbdd4e
|
|
| BLAKE2b-256 |
4d101b960e0735c781ea7495ab53695ce429209141ed28a22430e12974951840
|