India's First Multi-Language Voice RAG Framework

🚀 OmniRAG v2.0 - Multi-Language Voice RAG

India's First Multi-Language Voice RAG Framework 🇮🇳

Intelligent RAG that combines Liquid, Agentic, and Chain architectures and adds three distinctive features:

  • 🌍 Smart Multi-Language Translation - 27+ languages, including Tamil and Hindi
  • 🎤 Voice Input & Output - Speak questions, hear answers
  • 🧠 Adaptive RAG - Automatically adjusts to the user's expertise level

🆕 What's New in v2.0?

✨ Feature 1: Smart Post-Retrieval Translation

Revolutionary architecture: documents stay in their original language, and translation happens AFTER retrieval!

Why this is better:

  • ✅ Better embeddings (semantic meaning is preserved)
  • ✅ No storage duplication
  • ✅ One document → Many output languages
  • ✅ 70% more efficient than traditional approaches

from omnirag import OmniRAG

# Documents in English
rag = OmniRAG(output_language="Tamil")
rag.add_documents(["AI helps solve complex problems."])

# Query in English, get Tamil answer!
result = rag.query("What is AI?")
print(result['answer'])
# Output: "செயற்கை நுண்ணறிவு சிக்கலான சிக்கல்களைத் தீர்க்க உதவுகிறது."
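
The retrieve-then-translate order described above can be illustrated with a minimal, standard-library sketch. It is not OmniRAG's implementation: `tokenize`, `retrieve`, `answer_in_language`, and `fake_translate` are hypothetical names, word overlap stands in for vector search, and the translator is a stub where a real system would call a translation service.

```python
import re
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, documents: List[str], top_k: int = 1) -> List[str]:
    """Rank documents by word overlap with the query (stand-in for vector search)."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def answer_in_language(
    query: str,
    documents: List[str],
    translate: Callable[[str], str],
) -> str:
    """Retrieve in the documents' original language, then translate only the answer."""
    best = retrieve(query, documents)
    answer = best[0] if best else ""
    return translate(answer)


def fake_translate(text: str) -> str:
    """Hypothetical stub translator that only tags the text."""
    return f"[ta] {text}"


docs = ["AI helps solve complex problems.", "Bananas are yellow."]
print(answer_in_language("What is AI?", docs, fake_translate))
```

Because the documents are indexed once in their source language, switching the output language only means passing a different `translate` callable.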

Supported Languages (27+):

  • Indian: Tamil, Hindi, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Bengali
  • European: Spanish, French, German, Italian, Portuguese, Russian, Polish, Dutch, Turkish
  • Asian: Chinese, Japanese, Korean, Vietnamese, Thai, Indonesian, Malay
  • Other: Arabic, English

✨ Feature 2: Voice Input & Output

First open-source RAG with built-in voice support!

from omnirag import OmniRAG

# Voice output (text-to-speech)
rag = OmniRAG(enable_voice=True, output_language="Tamil")
result = rag.query("What is Python?", speak_answer=True)
# The answer is spoken aloud in Tamil 🔊

# Voice input (speech-to-text) - requires a microphone
result = rag.voice_query()
# Speak your question, hear the answer!

🎯 What is OmniRAG?

OmniRAG is an advanced Retrieval-Augmented Generation system that combines three powerful RAG techniques:

🌊 Liquid RAG

Automatically adapts answers to user expertise level:

  • Beginner: Simple explanations with examples
  • Intermediate: Balanced technical content
  • Expert: Deep technical details
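
One simple way to picture the level detection is a keyword heuristic. The sketch below is an assumption for illustration only: the README does not say how OmniRAG detects expertise, and `detect_level` and both cue sets are hypothetical.

```python
import re

# Hypothetical cue words; a real detector would likely use richer signals.
BEGINNER_CUES = {"simple", "basics", "beginner", "eli5", "explain"}
EXPERT_CUES = {"backpropagation", "gradient", "complexity", "latency", "throughput"}


def detect_level(query: str) -> str:
    """Classify a query as beginner, intermediate, or expert from its wording."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    if words & EXPERT_CUES:
        return "expert"
    if words & BEGINNER_CUES:
        return "beginner"
    return "intermediate"


print(detect_level("Explain the basics of Python"))
print(detect_level("How does backpropagation compute the gradient?"))
print(detect_level("What is Python used for?"))
```

Expert cues are checked first so that a query such as "explain gradient clipping" is not downgraded by the word "explain".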

🤖 Agentic RAG

Intelligently chooses the best information source:

  • VectorDB: For local documents
  • Web Search: For current information
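
The choice between the two sources can be sketched as a small keyword router. This is a hedged illustration, not the library's planner: `choose_tool` and `WEB_CUES` are hypothetical, and only "latest" and "recent" are mentioned in this README as web triggers; the other cue words are assumptions.

```python
import re

# "latest" and "recent" come from the README; the remaining cues are assumed.
WEB_CUES = {"latest", "recent", "today", "news", "current"}


def choose_tool(query: str, web_search_enabled: bool = True) -> str:
    """Return "web_search" for time-sensitive queries, otherwise "vectordb"."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    if web_search_enabled and words & WEB_CUES:
        return "web_search"
    return "vectordb"


print(choose_tool("Latest AI developments in 2025"))
print(choose_tool("What is Python?"))
```

When web search is disabled, every query falls back to the local vector store.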

⛓️ Chain RAG

Handles complex multi-part questions:

  • Breaks down complex queries
  • Answers each part separately
  • Synthesizes coherent final answer
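
The three steps above can be sketched with a sentence-boundary splitter. This is a simplification under stated assumptions: `decompose` and `synthesize` are hypothetical helpers, and a real decomposer would probably use the LLM rather than punctuation.

```python
import re
from typing import List


def decompose(query: str) -> List[str]:
    """Split a multi-part query into sub-queries at sentence boundaries."""
    parts = re.split(r"(?<=[.?!])\s+", query.strip())
    return [part.strip() for part in parts if part.strip()]


def synthesize(sub_answers: List[str]) -> str:
    """Join the non-empty sub-answers into one final answer."""
    return " ".join(answer.strip() for answer in sub_answers if answer.strip())


question = """
Compare Python vs Java for machine learning.
Which is better for beginners?
What are the performance differences?
"""
sub_queries = decompose(question)
for sub_query in sub_queries:
    print(sub_query)
```

Each sub-query would then be answered on its own and the answers passed to `synthesize`.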

✨ All Features

v2.0 NEW:

  • 🌍 Multi-Language Translation (27+ languages)
  • 🎤 Voice Input & Output (speak & hear)
  • 🔤 Full Language Names ("Tamil" not "ta")
  • 🔧 UTF-8 Support (perfect Tamil/Hindi display)

v1.0 CORE:

  • ✅ PDF Support - Load PDF files directly
  • ✅ Multiple LLM Models - Qwen, Flan-T5, Mistral, Phi-2
  • ✅ FAISS Vector DB - Fast similarity search
  • ✅ Web Search - DuckDuckGo integration (free!)
  • ✅ Smart User Detection - Auto expertise level detection
  • ✅ Query Decomposition - Handles complex questions
  • ✅ Fast Caching - 3x speedup on repeated queries
  • ✅ 100% FREE - No API costs!
  • ✅ Works on CPU - No GPU required
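
The caching feature listed above amounts to remembering answers for repeated queries. The sketch below shows one plausible shape for that, assuming a plain in-memory dictionary keyed by the normalized query text; `QueryCache` is a hypothetical class, not the contents of OmniRAG's `cache.py`.

```python
from typing import Callable, Dict


class QueryCache:
    """Memoize answers so repeated queries skip the expensive pipeline."""

    def __init__(self, answer_fn: Callable[[str], str]) -> None:
        self._answer_fn = answer_fn
        self._store: Dict[str, str] = {}
        self.hits = 0

    def query(self, text: str) -> str:
        # Normalize case and whitespace so trivial variations share one entry.
        key = " ".join(text.lower().split())
        if key in self._store:
            self.hits += 1
            return self._store[key]
        answer = self._answer_fn(text)
        self._store[key] = answer
        return answer

    def clear(self) -> None:
        """Drop all cached answers."""
        self._store.clear()


cache = QueryCache(lambda question: f"answer to {question}")
print(cache.query("What is Python?"))
print(cache.query("what is   python?"))  # served from the cache
print(cache.hits)
```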

📦 Installation

pip install omnirag

With Voice Input (Optional)

Windows:

pip install pipwin
pipwin install pyaudio
pip install omnirag[voice-input]

Mac:

brew install portaudio
pip install omnirag[voice-input]

Linux:

sudo apt-get install portaudio19-dev
pip install omnirag[voice-input]

From Source

git clone https://github.com/Giri530/omnirag.git
cd omnirag
pip install -e .

🚀 Quick Start

Basic Usage

from omnirag import OmniRAG

# Initialize
rag = OmniRAG(model_name="google/flan-t5-small")

# Add documents
rag.add_documents([
    "Python is a programming language.",
    "It is used for AI and data science."
])

# Query
result = rag.query("What is Python?")
print(result['answer'])

Multi-Language Example

from omnirag import OmniRAG

# Initialize with Spanish output
rag = OmniRAG(
    model_name="google/flan-t5-small",
    output_language="Spanish"  # or "Tamil", "Hindi", etc.
)

# Add English documents
rag.add_documents([
    "AI helps solve complex problems.",
    "Machine Learning is a subset of AI."
])

# Query in English, get Spanish answer!
result = rag.query("What is AI?")
print(result['answer'])
# Output: "La IA ayuda a resolver problemas complejos."

Voice Example

from omnirag import OmniRAG

# Initialize with voice
rag = OmniRAG(
    enable_voice=True,
    output_language="Tamil"
)

rag.add_documents(["Python is great for AI."])

# Text input, voice output
result = rag.query("What is Python?", speak_answer=True)
# The answer is spoken aloud in Tamil 🔊

# Voice input, voice output (requires microphone)
result = rag.voice_query()
# Speak question, hear answer!

💡 Usage Examples

Load Different File Types

# PDF files
rag.load_from_file("research_paper.pdf")

# Text files
rag.load_from_file("notes.txt")

# JSON data
rag.load_from_file("data.json")

# Entire folder
rag.load_from_folder("./documents")

# With chunking for large files
rag.load_from_file("big_file.pdf", chunk_size=500)

# Direct text
rag.add_documents([
    "Python is great for ML.",
    "Qwen is a powerful language model."
])
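
To make the `chunk_size` option more concrete, here is a minimal sketch of whitespace-aware chunking. It assumes `chunk_size` counts characters, which this README does not state, and `chunk_text` is a hypothetical helper rather than OmniRAG's loader.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500) -> List[str]:
    """Split text into chunks of up to chunk_size characters, breaking on whitespace.

    A single word longer than chunk_size is kept whole as its own chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("aa bb cc dd", chunk_size=5))
```

Breaking on whitespace keeps words intact, which tends to give cleaner embeddings than cutting at a fixed character offset.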

Different Output Languages

# Default: Spanish
rag = OmniRAG(output_language="Spanish")

# Query 1: Spanish (default)
result1 = rag.query("What is AI?")

# Query 2: Override to Tamil
result2 = rag.query("What is ML?", output_language="Tamil")

# Query 3: Override to French
result3 = rag.query("What is DL?", output_language="French")

Full Language Names

# All these work!
rag = OmniRAG(output_language="Spanish")  # ✅
rag = OmniRAG(output_language="spanish")  # ✅
rag = OmniRAG(output_language="es")       # ✅

# Same for all languages
rag = OmniRAG(output_language="Tamil")    # ✅
rag = OmniRAG(output_language="Hindi")    # ✅
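
Accepting "Spanish", "spanish", and "es" alike suggests a normalization step from names to language codes. The sketch below shows one way that could work; `LANGUAGE_CODES` and `normalize_language` are hypothetical, and only four of the supported languages are included to keep it short.

```python
# Hypothetical subset of a name-to-ISO-639-1 lookup table.
LANGUAGE_CODES = {
    "tamil": "ta",
    "hindi": "hi",
    "spanish": "es",
    "french": "fr",
}


def normalize_language(name: str) -> str:
    """Map a language name or code, in any letter case, to its two-letter code."""
    key = name.strip().lower()
    if key in LANGUAGE_CODES:
        return LANGUAGE_CODES[key]
    if key in LANGUAGE_CODES.values():
        return key
    raise ValueError(f"Unsupported language: {name!r}")


for value in ("Spanish", "spanish", "es", "Tamil"):
    print(value, "->", normalize_language(value))
```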

Complex Queries

# OmniRAG automatically breaks down and answers
result = rag.query("""
Compare Python vs Java for machine learning.
Which is better for beginners?
What are the performance differences?
""")

print(result['answer'])

Enable Web Search

rag = OmniRAG(
    model_name="google/flan-t5-small",
    enable_web_search=True  # Free DuckDuckGo search
)

# Queries about "latest" or "recent" automatically use web
result = rag.query("Latest AI developments in 2025")

🎨 Supported Models

Qwen Models (Recommended!)

# Fast & Efficient
rag = OmniRAG(model_name="Qwen/Qwen2.5-0.5B-Instruct")

# Balanced (Best Choice!)
rag = OmniRAG(model_name="Qwen/Qwen2.5-1.5B-Instruct")

# High Quality
rag = OmniRAG(model_name="Qwen/Qwen2.5-3B-Instruct")

Flan-T5 Models

# Small & Fast
rag = OmniRAG(model_name="google/flan-t5-small")   # 80M params

# Medium
rag = OmniRAG(model_name="google/flan-t5-base")    # 250M params

# Larger & Better
rag = OmniRAG(model_name="google/flan-t5-large")   # 780M params

🏗️ Architecture

User Query
    ↓
🌊 LIQUID RAG: Detect expertise level
    ↓
⛓️ CHAIN RAG: Break into sub-queries (if complex)
    ↓
FOR EACH SUB-QUERY:
    ↓
🤖 AGENTIC RAG: Choose tool (VectorDB or Web)
    ↓
    Retrieve relevant chunks (ORIGINAL language)
    ↓
🌊 LIQUID RAG: Transform to user level
    ↓
    Generate sub-answer
    ↓
⛓️ CHAIN RAG: Synthesize all sub-answers
    ↓
🌍 TRANSLATION: Convert to target language (NEW!)
    ↓
🔊 VOICE: Speak answer (if enabled) (NEW!)
    ↓
✨ Perfect Answer!
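
The flow above can be read as a chain of pluggable stages. The sketch below wires stub stages together in the same order; it is an illustration under stated assumptions, not OmniRAG's internals. `run_pipeline` and every lambda are hypothetical, and the voice step is left out because it needs audio hardware.

```python
from typing import Callable, Dict, List


def run_pipeline(
    query: str,
    detect_level: Callable[[str], str],
    decompose: Callable[[str], List[str]],
    choose_tool: Callable[[str], str],
    tools: Dict[str, Callable[[str], List[str]]],
    generate: Callable[[str, List[str], str], str],
    translate: Callable[[str], str],
) -> Dict[str, object]:
    """Run the stages in the order shown in the architecture diagram."""
    level = detect_level(query)                # Liquid: detect expertise level
    sub_answers: List[str] = []
    sources: List[str] = []
    for sub_query in decompose(query):         # Chain: break into sub-queries
        tool_name = choose_tool(sub_query)     # Agentic: choose a source
        chunks = tools[tool_name](sub_query)   # retrieve in the original language
        sources.extend(chunks)
        sub_answers.append(generate(sub_query, chunks, level))
    combined = " ".join(sub_answers)           # Chain: synthesize sub-answers
    return {
        "answer": translate(combined),         # translation happens last
        "sources": sources,
        "user_level": level,
    }


# Stub stages, for illustration only.
result = run_pipeline(
    "What is AI? What is ML?",
    detect_level=lambda q: "beginner",
    decompose=lambda q: [part.strip() + "?" for part in q.split("?") if part.strip()],
    choose_tool=lambda q: "vectordb",
    tools={"vectordb": lambda q: [f"note on: {q}"]},
    generate=lambda q, chunks, level: f"({level}) {chunks[0]}",
    translate=lambda text: f"[es] {text}",
)
print(result["answer"])
```

Passing the stages in as callables keeps each one testable in isolation and makes the post-retrieval position of translation explicit.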

📊 Performance

| Model         | Size | RAM   | Speed | Quality |
|---------------|------|-------|-------|---------|
| flan-t5-small | 80M  | 0.5GB | ⚡⚡⚡   | ⭐⭐      |
| flan-t5-base  | 250M | 1GB   | ⚡⚡⚡   | ⭐⭐⭐     |
| Qwen-0.5B     | 0.5B | 1GB   | ⚡⚡    | ⭐⭐⭐     |
| Qwen-1.5B     | 1.5B | 2GB   | ⚡⚡    | ⭐⭐⭐⭐    |
| Qwen-3B       | 3B   | 4GB   | ⚡     | ⭐⭐⭐⭐⭐   |

Recommended:

  • For testing: flan-t5-small (fast!)
  • For production: flan-t5-base or Qwen-0.5B (balanced)
  • For quality: Qwen-1.5B (best!)
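
As a rough aid for choosing a model from the table, the sketch below picks the last listed model whose stated RAM figure fits a given budget. `largest_model_that_fits` is a hypothetical helper, the RAM values are copied from the table, and real memory use will vary with the workload, so treat the result as a starting point.

```python
from typing import List, Optional, Tuple

# (model name, approximate RAM in GB), in the same order as the table.
MODEL_RAM_GB: List[Tuple[str, float]] = [
    ("google/flan-t5-small", 0.5),
    ("google/flan-t5-base", 1.0),
    ("Qwen/Qwen2.5-0.5B-Instruct", 1.0),
    ("Qwen/Qwen2.5-1.5B-Instruct", 2.0),
    ("Qwen/Qwen2.5-3B-Instruct", 4.0),
]


def largest_model_that_fits(available_ram_gb: float) -> Optional[str]:
    """Return the last listed model whose RAM figure fits, or None if none do."""
    choice = None
    for name, ram_gb in MODEL_RAM_GB:
        if ram_gb <= available_ram_gb:
            choice = name
    return choice


print(largest_model_that_fits(2.0))
```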

🔧 Configuration

rag = OmniRAG(
    # LLM Model
    model_name="google/flan-t5-small",
    
    # Embedding Model
    embedding_model="all-MiniLM-L6-v2",
    
    # Web Search
    enable_web_search=True,
    
    # NEW: Output Language
    output_language="Tamil",  # or "auto" for no translation
    
    # NEW: Voice I/O
    enable_voice=True,
    
    # Verbose Output
    verbose=True
)

📖 API Reference

OmniRAG Class

__init__(model_name, embedding_model, enable_web_search, verbose, output_language, enable_voice)

Initialize OmniRAG system.

New Parameters:

  • output_language (str): Target language ("Tamil", "Spanish", "auto", etc.)
  • enable_voice (bool): Enable voice input/output

query(user_query, output_language=None, speak_answer=False)

Query the system and get answer.

New Parameters:

  • output_language (str): Override default language for this query
  • speak_answer (bool): Speak the answer aloud

Returns:

{
    'answer': str,              # Generated answer
    'sources': list,            # Retrieved sources
    'user_level': str,          # Detected expertise level
    'output_language': str,     # Output language code
    'spoken': bool,             # Whether answer was spoken
}
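
For callers who want type hints on the returned dictionary, its shape can be written as a `TypedDict`. This is a sketch based only on the keys listed above: `QueryResult` and `summarize` are hypothetical names and are not exported by the package.

```python
from typing import TypedDict


class QueryResult(TypedDict):
    """Shape of the dictionary returned by query(), per the list above."""

    answer: str
    sources: list
    user_level: str
    output_language: str
    spoken: bool


def summarize(result: QueryResult) -> str:
    """Build a one-line summary of a query result."""
    spoken = "spoken" if result["spoken"] else "not spoken"
    count = len(result["sources"])
    return (
        f"[{result['output_language']}/{result['user_level']}] "
        f"{result['answer']} ({count} sources, {spoken})"
    )


sample: QueryResult = {
    "answer": "Python is a programming language.",
    "sources": ["doc-1"],
    "user_level": "beginner",
    "output_language": "en",
    "spoken": False,
}
print(summarize(sample))
```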

voice_query(output_language=None) NEW!

Voice-to-voice query (requires microphone).

save_to_file(result, filename) NEW!

Save result to file with UTF-8 encoding.
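
Explicit UTF-8 encoding matters because a platform's default encoding may not be able to represent Tamil or Hindi text. The sketch below shows the general technique with the standard library; `save_result` is a hypothetical stand-in, and the JSON format is an assumption, since this README does not say what format `save_to_file` writes.

```python
import json
import tempfile
from pathlib import Path


def save_result(result: dict, filename: str) -> None:
    """Write a result dictionary as JSON, keeping non-ASCII text readable."""
    with open(filename, "w", encoding="utf-8") as handle:
        json.dump(result, handle, ensure_ascii=False, indent=2)


# Round trip through a temporary file to show the text survives intact.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "result.json"
    save_result({"answer": "தமிழ்", "output_language": "ta"}, str(path))
    raw_text = path.read_text(encoding="utf-8")
    loaded = json.loads(raw_text)

print(loaded["answer"])
```

Setting `ensure_ascii=False` keeps the script readable in the file instead of writing `\u` escape sequences.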

Other Methods (from v1.0)

  • load_from_file(file_path, chunk_size=None)
  • load_from_folder(folder_path, file_extensions=None)
  • add_documents(documents)
  • get_stats()
  • clear_cache()

🌍 Use Cases

Customer Support (Multi-Language)

rag = OmniRAG(output_language="Hindi", enable_voice=True)
rag.load_from_file("product_manual.pdf")

# Hindi-speaking customer
result = rag.query("How do I reset my device?", speak_answer=True)
# Answer in Hindi + spoken aloud!

Educational Platform (Tamil)

rag = OmniRAG(output_language="Tamil")
rag.load_from_file("class10_science.pdf")

# Student query
result = rag.query("What is photosynthesis?")
# Answer in Tamil!

Accessibility Tool

# For visually impaired users
rag = OmniRAG(enable_voice=True)
rag.load_from_folder("./personal_docs")

# Completely hands-free
while True:
    result = rag.voice_query()
    if "exit" in result.get('answer', '').lower():
        break

🌟 Why OmniRAG?

| Feature                    | LangChain  | LlamaIndex | OmniRAG   |
|----------------------------|------------|------------|-----------|
| Post-Retrieval Translation | ❌ No      | ❌ No      | ✅ YES    |
| Built-in Voice I/O         | ❌ No      | ❌ No      | ✅ YES    |
| Indian Language Support    | ⚠️ Basic   | ⚠️ Basic   | ✅ Native |
| Full Language Names        | ❌ No      | ❌ No      | ✅ YES    |
| Beginner Friendly          | ⚠️ Complex | ⚠️ Complex | ✅ Simple |
| 100% Free                  | ✅ Yes     | ✅ Yes     | ✅ Yes    |

🛠️ Development

Install for Development

git clone https://github.com/Giri530/omnirag.git
cd omnirag
pip install -e ".[dev]"

Project Structure

omnirag/
├── omnirag/
│   ├── __init__.py
│   ├── omnirag.py              # Main class
│   ├── smart_translator.py     # NEW: Translation
│   ├── voice_processor.py      # NEW: Voice I/O
│   ├── liquid_analyzer.py      # User level detection
│   ├── chain_decomposer.py     # Query decomposition
│   ├── agentic_planner.py      # Tool selection
│   ├── content_transformer.py  # Content adaptation
│   ├── vectordb_tool.py        # FAISS database
│   ├── web_search_tool.py      # Web search
│   ├── llm_client.py           # LLM wrapper
│   └── cache.py                # Caching
├── examples/
│   └── quickstart.py
├── setup.py
├── pyproject.toml
├── requirements.txt
└── README.md

📝 Requirements

  • Python 3.8+
  • 1-4GB RAM (depends on model)
  • CPU or GPU (GPU recommended for speed)

Core Dependencies:

  • transformers, torch, sentence-transformers
  • faiss-cpu, PyPDF2, duckduckgo-search

New Dependencies (v2.0):

  • deep-translator, langdetect (translation)
  • pyttsx3 (voice output)
  • SpeechRecognition, pyaudio (voice input - optional)

🤝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/amazing)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing)
  5. Open Pull Request

📄 License

MIT License - Free for commercial and personal use!

See LICENSE for details.


🙏 Acknowledgments

  • HuggingFace for transformers library
  • Qwen Team for excellent models
  • FAISS for fast vector search
  • Sentence Transformers for embeddings
  • Deep Translator for translation API
  • pyttsx3 for text-to-speech

📧 Contact


🌟 Star History

If you find OmniRAG useful, please ⭐ star the repo!


📚 Citation

@software{omnirag2025,
  title={OmniRAG: Multi-Language Voice RAG Framework},
  author={Girinath V},
  year={2025},
  version={2.0.0},
  url={https://github.com/Giri530/omnirag}
}

🎯 Roadmap

v2.0 (Current):

  • ✅ Multi-language translation (27+ languages)
  • ✅ Voice input and output
  • ✅ UTF-8 encoding support

v2.1 (Planned):

  • More file formats (DOCX, XLSX)
  • Custom translation models
  • Voice language selection
  • GUI interface

v3.0 (Future):

  • Real-time translation
  • Multi-modal RAG (images)
  • Cloud deployment
  • API server

Made with ❤️ in India 🇮🇳

100% FREE Forever!

Happy RAG-ing! 🚀
