Skip to main content

A minimal Python library for Retrieval-Augmented Generation with codebase indexing and multiple vector store backends

Project description

Tinyrag Logo

TinyRag ๐Ÿš€

PyPI version Python 3.7+ License: MIT Documentation PyPI Downloads

A lightweight, powerful Python library for Retrieval-Augmented Generation (RAG) that works locally without API keys. Features advanced codebase indexing, multiple document formats, and flexible vector storage backends.

๐ŸŽฏ Perfect for developers who need RAG capabilities without complexity or mandatory cloud dependencies.

๐ŸŒŸ Key Features

๐Ÿš€ Works Locally - No API Keys Required

  • ๐Ÿง  Local Embeddings: Uses all-MiniLM-L6-v2 by default
  • ๐Ÿ” Direct Search: Query documents without LLM costs
  • โšก Zero Setup: Works immediately after installation

๐Ÿ“š Advanced Document Processing

  • ๐Ÿ“„ Multi-Format: PDF, DOCX, CSV, TXT, and raw text
  • ๐Ÿ’ป Code Intelligence: Function-level indexing for 7+ programming languages
  • ๐Ÿงต Multithreading: Parallel processing for faster indexing
  • ๐Ÿ“Š Chunking Strategies: Smart text segmentation

๐Ÿ—„๏ธ Flexible Storage Options

  • ๐Ÿ”Œ Multiple Backends: Memory, Pickle, Faiss, ChromaDB
  • ๐Ÿ’พ Persistence: Automatic or manual data saving
  • โšก Performance: Choose speed vs. memory trade-offs
  • ๐Ÿ”ง Configuration: Customizable for any use case

๐Ÿ’ฌ Optional AI Integration

  • ๐Ÿค– Custom System Prompts: Tailor AI behavior for your domain
  • ๐Ÿ”— Provider Support: OpenAI, Azure, Anthropic, local models
  • ๐Ÿ’ฐ Cost Control: Use only when needed
  • ๐ŸŽฏ RAG-Powered Chat: Contextual AI responses

๐Ÿš€ Quick Start

๐Ÿ’ก New to TinyRag? Check out our comprehensive ๐Ÿ“– Documentation with step-by-step guides!

Installation

# Basic installation
pip install tinyrag

# With all optional dependencies
pip install tinyrag[all]

# Specific vector stores
pip install tinyrag[faiss]    # High performance
pip install tinyrag[chroma]   # Persistent storage
pip install tinyrag[docs]     # Document processing

Usage Examples

๐Ÿƒโ€โ™‚๏ธ 30-Second Example (No API Key Required)

from tinyrag import TinyRag

# 1. Create TinyRag instance
rag = TinyRag()

# 2. Add your content  
rag.add_documents([
    "TinyRag makes RAG simple and powerful.",
    "docs/user_guide.pdf",
    "research_papers/"
])

# 3. Search your content
results = rag.query("How does TinyRag work?", k=3)
for text, score in results:
    print(f"Score: {score:.2f} - {text[:100]}...")

Output:

Score: 0.89 - TinyRag makes RAG simple and powerful.
Score: 0.76 - TinyRag is a lightweight Python library for...
Score: 0.72 - The system processes documents using semantic...

๐Ÿค– AI-Powered Chat (Optional)

from tinyrag import Provider, TinyRag

# Set up AI provider
provider = Provider(
    api_key="sk-your-openai-key",
    model="gpt-4"
)

# Create smart assistant
rag = TinyRag(
    provider=provider,
    system_prompt="You are a helpful technical assistant."
)

# Add knowledge base
rag.add_documents(["technical_docs/", "api_guides/"])
rag.add_codebase("src/")  # Index your codebase

# Get intelligent answers
response = rag.chat("How do I implement user authentication?")
print(response)
# AI response based on your specific docs and code!

๐Ÿ“– Complete Documentation

๐Ÿ“š Full Documentation - Comprehensive guides from beginner to expert

๐Ÿš€ Getting Started

๐Ÿ”ง Core Features

๐Ÿค– AI Integration


๐Ÿ”ง Core API Reference

Provider Class

from tinyrag import Provider

# ๐Ÿ†“ No API key needed - works locally
provider = Provider(embedding_model="default")

# ๐Ÿค– With AI capabilities
provider = Provider(
    api_key="sk-your-key",
    model="gpt-4",                           # GPT-4, GPT-3.5, local models
    embedding_model="text-embedding-ada-002", # or "default" for local
    base_url="https://api.openai.com/v1"     # OpenAI, Azure, custom
)

TinyRag Class

from tinyrag import TinyRag

# ๐ŸŽ›๏ธ Choose your vector store
rag = TinyRag(
    provider=provider,               # Optional: for AI chat
    vector_store="faiss",           # memory, pickle, faiss, chromadb
    chunk_size=500,                 # Text chunk size
    max_workers=4,                  # Parallel processing
    system_prompt="Custom prompt"   # AI behavior
)

๐Ÿ—„๏ธ Vector Store Comparison

Store Performance Persistence Memory Dependencies Best For
Memory โšก Fast โŒ None ๐Ÿ“ˆ High โœ… None Development, testing
Pickle ๐ŸŒ Fair ๐Ÿ’พ Manual ๐Ÿ“Š Medium โœ… Minimal Simple projects
Faiss ๐Ÿš€ Excellent ๐Ÿ’พ Manual ๐Ÿ“‰ Low ๐Ÿ“ฆ faiss-cpu Large datasets, speed
ChromaDB โšก Good ๐Ÿ”„ Auto ๐Ÿ“Š Medium ๐Ÿ“ฆ chromadb Production, features

๐Ÿ’ก Recommendation: Start with memory for development, use faiss for production performance.

๐Ÿ”ง Essential Methods

# ๐Ÿ“„ Document Management
rag.add_documents(["file.pdf", "text"])   # Add any documents
rag.add_codebase("src/")                   # Index code functions
rag.clear_documents()                      # Reset everything

# ๐Ÿ” Search & Query (No AI needed)
results = rag.query("search term", k=5)   # Find similar content
code = rag.query("auth function")          # Search code too

# ๐Ÿค– AI Chat (Optional)
response = rag.chat("Explain this code")   # Get AI answers
rag.set_system_prompt("Be helpful")        # Customize AI

# ๐Ÿ’พ Persistence
rag.save_vector_store("my_data.pkl")       # Save your work
rag.load_vector_store("my_data.pkl")       # Load it back

๐Ÿ“– Complete API Reference - Full method documentation

๐Ÿ’ป Code Intelligence

TinyRag indexes your codebase at the function level for intelligent code search:

๐ŸŒ Supported Languages

Language Extensions Detection
Python .py def function_name
JavaScript .js, .ts function name(), const name =
Java .java public/private type name()
C/C++ .c, .cpp, .h return_type function_name()
Go .go func functionName()
Rust .rs fn function_name()
PHP .php function functionName()

๐Ÿ” Code Search Examples

# Index your entire project
rag.add_codebase("my_app/")

# Find authentication code
auth_code = rag.query("user authentication login")

# Database functions
db_code = rag.query("database query SELECT")

# API endpoints
api_code = rag.query("REST API endpoint")

# Get AI explanations (with API key)
response = rag.chat("How does user authentication work?")
# AI analyzes your actual code and explains it!

๐Ÿ’ก Learn More - Advanced code search techniques

โš™๏ธ Configuration Examples

๐Ÿš€ Performance Optimized

# Large datasets, maximum speed
rag = TinyRag(
    vector_store="faiss",
    chunk_size=800,
    max_workers=8  # Parallel processing
)

๐Ÿ’พ Production Setup

# Persistent, multi-user ready
rag = TinyRag(
    provider=provider,
    vector_store="chromadb",
    vector_store_config={
        "collection_name": "company_docs",
        "persist_directory": "/data/vectors/"
    }
)

๐Ÿค– Custom AI Assistant

# Domain-specific AI behavior
rag = TinyRag(
    provider=provider,
    system_prompt="""You are a senior software engineer.
    Provide detailed technical explanations with code examples."""
)

๐Ÿ”ง Full Configuration Guide - All options explained

๐Ÿ“ฆ Installation

๐ŸŽฏ Choose Your Setup

# ๐Ÿš€ Quick start (works immediately)
pip install tinyrag

# โšก High performance (recommended)
pip install tinyrag[faiss]

# ๐Ÿ“„ Document processing (PDF, DOCX)
pip install tinyrag[docs]

# ๐Ÿ—„๏ธ Production database
pip install tinyrag[chroma]

# ๐ŸŽ Everything included
pip install tinyrag[all]

๐Ÿ”ง What Each Option Includes

Option Includes Use Case
Base Memory store, local embeddings Development, testing
[faiss] + High-performance search Large datasets
[docs] + PDF/DOCX processing Document analysis
[chroma] + Persistent database Production apps
[all] + Everything Full features

๐Ÿ’ก Installation Guide - Detailed setup instructions

๐ŸŽฏ Real-World Use Cases

๐Ÿข Business Applications

  • ๐Ÿ“‹ Customer Support: Query company docs and policies
  • ๐Ÿ“š Knowledge Management: Searchable internal documentation
  • ๐Ÿ” Research Tools: Semantic search through research papers
  • ๐Ÿ“Š Report Analysis: Find insights across business reports

๐Ÿ‘จโ€๐Ÿ’ป Developer Tools

  • ๐Ÿ”ง Code Documentation: Auto-generate code explanations
  • ๐Ÿ” Legacy Code Explorer: Understand large codebases
  • ๐Ÿ“– API Assistant: Query technical documentation
  • ๐Ÿงช Testing Helper: Find relevant test patterns

๐ŸŽ“ Educational & Research

  • ๐Ÿ“š Study Assistant: Query textbooks and notes
  • ๐Ÿ“ Writing Helper: Research paper analysis
  • ๐Ÿง  Learning Companion: Personalized explanations
  • ๐Ÿ“Š Data Analysis: Explore datasets semantically

๐Ÿ’ก See Complete Examples - Production-ready applications


๐Ÿ› ๏ธ Contributing

We welcome contributions! Here's how to get started:

# 1. Fork and clone
git clone https://github.com/Kenosis01/TinyRag.git
cd TinyRag

# 2. Install development dependencies  
pip install -e ".[all,dev]"

# 3. Run tests
python -m pytest

# 4. Make your changes and submit a PR!

๐Ÿ“‹ Development Setup

  • Python 3.7+ required
  • Core dependencies: sentence-transformers, requests, numpy
  • Optional: faiss-cpu, chromadb, PyPDF2, python-docx

๐Ÿ”ง Development Guide - Detailed contributor guidelines

๐Ÿค Community & Support

๐Ÿ“ž Get Help

๐ŸŽ‰ Show Your Support

  • โญ Star this repo if TinyRag helps you!
  • ๐Ÿฆ Share on Twitter - spread the word
  • โ˜• Buy me a coffee - support development
  • ๐Ÿค Contribute - help make TinyRag better

๐Ÿ“„ License

MIT License - see LICENSE for details.


๐Ÿš€ TinyRag - Making RAG Simple, Powerful, and Accessible! ๐Ÿš€

Build intelligent search and Q&A systems in minutes, not hours

GitHub stars PyPI downloads GitHub last commit

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tinyrag-0.3.5.tar.gz (102.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tinyrag-0.3.5-py3-none-any.whl (33.3 kB view details)

Uploaded Python 3

File details

Details for the file tinyrag-0.3.5.tar.gz.

File metadata

  • Download URL: tinyrag-0.3.5.tar.gz
  • Upload date:
  • Size: 102.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.10

File hashes

Hashes for tinyrag-0.3.5.tar.gz
Algorithm Hash digest
SHA256 c110ad632982c834f10eaffaddae9e52e9ecc60f875429f9fe00f186be240326
MD5 756aa7de32c1f4fcace3acbdf3d022b6
BLAKE2b-256 385ae514d4e3dd7737e5e09adbb3b2a8cfc04ead1768bf2d2a5becebd93e4444

See more details on using hashes here.

File details

Details for the file tinyrag-0.3.5-py3-none-any.whl.

File metadata

  • Download URL: tinyrag-0.3.5-py3-none-any.whl
  • Upload date:
  • Size: 33.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.10

File hashes

Hashes for tinyrag-0.3.5-py3-none-any.whl
Algorithm Hash digest
SHA256 d1db964d40b5d73fa400c0dfd51dee5b4fa3e8d770d3eb5e8fea01d808173297
MD5 dfb1d0b39b076867cf0b1a9fd3eaee94
BLAKE2b-256 e1364819a40f378b9d63708c339e04b452b1e690dc4812a03fa6edd64afa3de6

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page