India's First Multi-Language Voice RAG Framework
Project description
๐ OmniRAG v2.0 - Multi-Language Voice RAG
India's First Multi-Language Voice RAG Framework ๐ฎ๐ณ
Intelligent RAG combining Liquid + Agentic + Chain architectures with unique features:
- ๐ Smart Multi-Language Translation - 27+ languages including Tamil, Hindi
- ๐ค Voice Input & Output - Speak questions, hear answers
- ๐ง Adaptive RAG - Automatically adjusts to user expertise level
๐ What's New in v2.0?
โจ Feature 1: Smart Post-Retrieval Translation
Revolutionary architecture: Documents stay in original language, translation happens AFTER retrieval!
Why this is better:
- โ Better embeddings (preserve semantic meaning)
- โ No storage duplication
- โ One document โ Many output languages
- โ 70% more efficient than traditional approaches
from omnirag import OmniRAG
# Documents in English
rag = OmniRAG(output_language="Tamil")
rag.add_documents(["AI helps solve complex problems."])
# Query in English, get Tamil answer!
result = rag.query("What is AI?")
print(result['answer'])
# Output: "เฎเฏเฎฏเฎฑเฏเฎเฏ เฎจเฏเฎฃเฏเฎฃเฎฑเฎฟเฎตเฏ เฎเฎฟเฎเฏเฎเฎฒเฎพเฎฉ เฎเฎฟเฎเฏเฎเฎฒเฏเฎเฎณเฏเฎคเฏ เฎคเฏเฎฐเฏเฎเฏเฎ เฎเฎคเฎตเฏเฎเฎฟเฎฑเฎคเฏ."
Supported Languages (27+):
- Indian: Tamil, Hindi, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Bengali
- European: Spanish, French, German, Italian, Portuguese, Russian, Polish, Dutch, Turkish
- Asian: Chinese, Japanese, Korean, Vietnamese, Thai, Indonesian, Malay
- Other: Arabic, English
โจ Feature 2: Voice Input & Output
First open-source RAG with built-in voice support!
# Voice output (text-to-speech)
rag = OmniRAG(enable_voice=True, output_language="Tamil")
result = rag.query("What is Python?", speak_answer=True)
# Hears answer in Tamil! ๐
# Voice input (speech-to-text) - requires microphone
result = rag.voice_query()
# Speak your question, hear the answer!
๐ฏ What is OmniRAG?
OmniRAG is an advanced Retrieval-Augmented Generation system that combines three powerful RAG techniques:
๐ Liquid RAG
Automatically adapts answers to user expertise level:
- Beginner: Simple explanations with examples
- Intermediate: Balanced technical content
- Expert: Deep technical details
๐ค Agentic RAG
Intelligently chooses the best information source:
- VectorDB: For local documents
- Web Search: For current information
โ๏ธ Chain RAG
Handles complex multi-part questions:
- Breaks down complex queries
- Answers each part separately
- Synthesizes coherent final answer
โจ All Features
v2.0 NEW:
- ๐ Multi-Language Translation (27+ languages)
- ๐ค Voice Input & Output (speak & hear)
- ๐ค Full Language Names ("Tamil" not "ta")
- ๐ง UTF-8 Support (perfect Tamil/Hindi display)
v1.0 CORE:
- โ PDF Support - Load PDF files directly
- โ Multiple LLM Models - Qwen, Flan-T5, Mistral, Phi-2
- โ FAISS Vector DB - Fast similarity search
- โ Web Search - DuckDuckGo integration (free!)
- โ Smart User Detection - Auto expertise level detection
- โ Query Decomposition - Handles complex questions
- โ Fast Caching - 3x speedup on repeated queries
- โ 100% FREE - No API costs!
- โ Works on CPU - No GPU required
๐ฆ Installation
pip install omnirag
With Voice Input (Optional)
Windows:
pip install pipwin
pipwin install pyaudio
pip install omnirag[voice-input]
Mac:
brew install portaudio
pip install omnirag[voice-input]
Linux:
sudo apt-get install portaudio19-dev
pip install omnirag[voice-input]
From Source
git clone https://github.com/Giri530/omnirag.git
cd omnirag
pip install -e .
๐ Quick Start
Basic Usage
from omnirag import OmniRAG
# Initialize
rag = OmniRAG(model_name="google/flan-t5-small")
# Add documents
rag.add_documents([
"Python is a programming language.",
"It is used for AI and data science."
])
# Query
result = rag.query("What is Python?")
print(result['answer'])
Multi-Language Example
from omnirag import OmniRAG
# Initialize with Spanish output
rag = OmniRAG(
model_name="google/flan-t5-small",
output_language="Spanish" # or "Tamil", "Hindi", etc.
)
# Add English documents
rag.add_documents([
"AI helps solve complex problems.",
"Machine Learning is a subset of AI."
])
# Query in English, get Spanish answer!
result = rag.query("What is AI?")
print(result['answer'])
# Output: "La IA ayuda a resolver problemas complejos."
Voice Example
from omnirag import OmniRAG
# Initialize with voice
rag = OmniRAG(
enable_voice=True,
output_language="Tamil"
)
rag.add_documents(["Python is great for AI."])
# Text input, voice output
result = rag.query("What is Python?", speak_answer=True)
# Hears answer in Tamil! ๐
# Voice input, voice output (requires microphone)
result = rag.voice_query()
# Speak question, hear answer!
๐ก Usage Examples
Load Different File Types
# PDF files
rag.load_from_file("research_paper.pdf")
# Text files
rag.load_from_file("notes.txt")
# JSON data
rag.load_from_file("data.json")
# Entire folder
rag.load_from_folder("./documents")
# With chunking for large files
rag.load_from_file("big_file.pdf", chunk_size=500)
# Direct text
rag.add_documents([
"Python is great for ML.",
"Qwen is a powerful language model."
])
Different Output Languages
# Default: Spanish
rag = OmniRAG(output_language="Spanish")
# Query 1: Spanish (default)
result1 = rag.query("What is AI?")
# Query 2: Override to Tamil
result2 = rag.query("What is ML?", output_language="Tamil")
# Query 3: Override to French
result3 = rag.query("What is DL?", output_language="French")
Full Language Names
# All these work!
rag = OmniRAG(output_language="Spanish") # โ
rag = OmniRAG(output_language="spanish") # โ
rag = OmniRAG(output_language="es") # โ
# Same for all languages
rag = OmniRAG(output_language="Tamil") # โ
rag = OmniRAG(output_language="Hindi") # โ
Complex Queries
# OmniRAG automatically breaks down and answers
result = rag.query("""
Compare Python vs Java for machine learning.
Which is better for beginners?
What are the performance differences?
""")
print(result['answer'])
Enable Web Search
rag = OmniRAG(
model_name="google/flan-t5-small",
enable_web_search=True # Free DuckDuckGo search
)
# Queries about "latest" or "recent" automatically use web
result = rag.query("Latest AI developments in 2025")
๐จ Supported Models
Qwen Models (Recommended!)
# Fast & Efficient
rag = OmniRAG(model_name="Qwen/Qwen2.5-0.5B-Instruct")
# Balanced (Best Choice!)
rag = OmniRAG(model_name="Qwen/Qwen2.5-1.5B-Instruct")
# High Quality
rag = OmniRAG(model_name="Qwen/Qwen2.5-3B-Instruct")
Flan-T5 Models
# Small & Fast
rag = OmniRAG(model_name="google/flan-t5-small") # 80M params
# Medium
rag = OmniRAG(model_name="google/flan-t5-base") # 250M params
# Larger & Better
rag = OmniRAG(model_name="google/flan-t5-large") # 780M params
๐๏ธ Architecture
User Query
โ
๐ LIQUID RAG: Detect expertise level
โ
โ๏ธ CHAIN RAG: Break into sub-queries (if complex)
โ
FOR EACH SUB-QUERY:
โ
๐ค AGENTIC RAG: Choose tool (VectorDB or Web)
โ
Retrieve relevant chunks (ORIGINAL language)
โ
๐ LIQUID RAG: Transform to user level
โ
Generate sub-answer
โ
โ๏ธ CHAIN RAG: Synthesize all sub-answers
โ
๐ TRANSLATION: Convert to target language (NEW!)
โ
๐ VOICE: Speak answer (if enabled) (NEW!)
โ
โจ Perfect Answer!
๐ Performance
| Model | Size | RAM | Speed | Quality |
|---|---|---|---|---|
| flan-t5-small | 80M | 0.5GB | โกโกโก | โญโญ |
| flan-t5-base | 250M | 1GB | โกโกโก | โญโญโญ |
| Qwen-0.5B | 0.5B | 1GB | โกโก | โญโญโญ |
| Qwen-1.5B | 1.5B | 2GB | โกโก | โญโญโญโญ |
| Qwen-3B | 3B | 4GB | โก | โญโญโญโญโญ |
Recommended:
- For testing:
flan-t5-small(fast!) - For production:
flan-t5-baseorQwen-0.5B(balanced) - For quality:
Qwen-1.5B(best!)
๐ง Configuration
rag = OmniRAG(
# LLM Model
model_name="google/flan-t5-small",
# Embedding Model
embedding_model="all-MiniLM-L6-v2",
# Web Search
enable_web_search=True,
# NEW: Output Language
output_language="Tamil", # or "auto" for no translation
# NEW: Voice I/O
enable_voice=True,
# Verbose Output
verbose=True
)
๐ API Reference
OmniRAG Class
__init__(model_name, embedding_model, enable_web_search, verbose, output_language, enable_voice)
Initialize OmniRAG system.
New Parameters:
output_language(str): Target language ("Tamil", "Spanish", "auto", etc.)enable_voice(bool): Enable voice input/output
query(user_query, output_language=None, speak_answer=False)
Query the system and get answer.
New Parameters:
output_language(str): Override default language for this queryspeak_answer(bool): Speak the answer aloud
Returns:
{
'answer': str, # Generated answer
'sources': list, # Retrieved sources
'user_level': str, # Detected expertise level
'output_language': str, # Output language code
'spoken': bool, # Whether answer was spoken
}
voice_query(output_language=None) NEW!
Voice-to-voice query (requires microphone).
save_to_file(result, filename) NEW!
Save result to file with UTF-8 encoding.
Other Methods (from v1.0)
load_from_file(file_path, chunk_size=None)load_from_folder(folder_path, file_extensions=None)add_documents(documents)get_stats()clear_cache()
๐ Use Cases
Customer Support (Multi-Language)
rag = OmniRAG(output_language="Hindi", enable_voice=True)
rag.load_from_file("product_manual.pdf")
# Hindi-speaking customer
result = rag.query("How do I reset my device?", speak_answer=True)
# Answer in Hindi + spoken aloud!
Educational Platform (Tamil)
rag = OmniRAG(output_language="Tamil")
rag.load_from_file("class10_science.pdf")
# Student query
result = rag.query("What is photosynthesis?")
# Answer in Tamil!
Accessibility Tool
# For visually impaired users
rag = OmniRAG(enable_voice=True)
rag.load_from_folder("./personal_docs")
# Completely hands-free
while True:
result = rag.voice_query()
if "exit" in result.get('answer', '').lower():
break
๐ Why OmniRAG?
| Feature | LangChain | LlamaIndex | OmniRAG |
|---|---|---|---|
| Post-Retrieval Translation | โ No | โ No | โ YES |
| Built-in Voice I/O | โ No | โ No | โ YES |
| Indian Language Support | โ ๏ธ Basic | โ ๏ธ Basic | โ Native |
| Full Language Names | โ No | โ No | โ YES |
| Beginner Friendly | โ ๏ธ Complex | โ ๏ธ Complex | โ Simple |
| 100% Free | โ Yes | โ Yes | โ Yes |
๐ ๏ธ Development
Install for Development
git clone https://github.com/Giri530/omnirag.git
cd omnirag
pip install -e ".[dev]"
Project Structure
omnirag/
โโโ omnirag/
โ โโโ __init__.py
โ โโโ omnirag.py # Main class
โ โโโ smart_translator.py # NEW: Translation
โ โโโ voice_processor.py # NEW: Voice I/O
โ โโโ liquid_analyzer.py # User level detection
โ โโโ chain_decomposer.py # Query decomposition
โ โโโ agentic_planner.py # Tool selection
โ โโโ content_transformer.py # Content adaptation
โ โโโ vectordb_tool.py # FAISS database
โ โโโ web_search_tool.py # Web search
โ โโโ llm_client.py # LLM wrapper
โ โโโ cache.py # Caching
โโโ examples/
โ โโโ quickstart.py
โโโ setup.py
โโโ pyproject.toml
โโโ requirements.txt
โโโ README.md
๐ Requirements
- Python 3.8+
- 1-4GB RAM (depends on model)
- CPU or GPU (GPU recommended for speed)
Core Dependencies:
- transformers, torch, sentence-transformers
- faiss-cpu, PyPDF2, duckduckgo-search
New Dependencies (v2.0):
- deep-translator, langdetect (translation)
- pyttsx3 (voice output)
- SpeechRecognition, pyaudio (voice input - optional)
๐ค Contributing
Contributions welcome! Please:
- Fork the repository
- Create feature branch (
git checkout -b feature/amazing) - Commit changes (
git commit -m 'Add amazing feature') - Push to branch (
git push origin feature/amazing) - Open Pull Request
๐ License
MIT License - Free for commercial and personal use!
See LICENSE for details.
๐ Acknowledgments
- HuggingFace for transformers library
- Qwen Team for excellent models
- FAISS for fast vector search
- Sentence Transformers for embeddings
- Deep Translator for translation API
- pyttsx3 for text-to-speech
๐ง Contact
- GitHub: @Giri530
- Email: girinathv48@gmail.com
- Issues: Report bugs or request features
๐ Star History
If you find OmniRAG useful, please โญ star the repo!
๐ Citation
@software{omnirag2025,
title={OmniRAG: Multi-Language Voice RAG Framework},
author={Girinath V},
year={2025},
version={2.0.0},
url={https://github.com/Giri530/omnirag}
}
๐ฏ Roadmap
v2.0 (Current):
- โ Multi-language translation (27+ languages)
- โ Voice input and output
- โ UTF-8 encoding support
v2.1 (Planned):
- More file formats (DOCX, XLSX)
- Custom translation models
- Voice language selection
- GUI interface
v3.0 (Future):
- Real-time translation
- Multi-modal RAG (images)
- Cloud deployment
- API server
Made with โค๏ธ in India ๐ฎ๐ณ
100% FREE Forever!
Happy RAG-ing! ๐
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file omnirag-2.0.1.tar.gz.
File metadata
- Download URL: omnirag-2.0.1.tar.gz
- Upload date:
- Size: 8.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
db051f1c95e251ac60c8cb5abba1a7575bedc5a8d508bab15aed4421fd1a9fb1
|
|
| MD5 |
cc8d1e893d2386c0a7bc5eb046d729e8
|
|
| BLAKE2b-256 |
71f7c2407e9d1ca890c89a2c09011669d0f039fac2678fa9ade320c576d80068
|
File details
Details for the file omnirag-2.0.1-py3-none-any.whl.
File metadata
- Download URL: omnirag-2.0.1-py3-none-any.whl
- Upload date:
- Size: 7.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c995d258459af719db3b4ef42589df672b76396dc84a9c4f6b40406b9bc99c79
|
|
| MD5 |
222c77e07a777ce4e7c201c7c617d7a5
|
|
| BLAKE2b-256 |
272e49869e531e9ffc827718672edf988686ee19dfa1a54d5cc390fc114a4e28
|