An AI-powered conversational assistant with voice support and document intelligence
Samvaad: Facilitating Dialogue-Based Learning
Note
- Voice queries are now fully supported with Kokoro TTS for high-quality speech synthesis in English and Hindi
- Frontend/UI is under development - currently CLI-only
- Voice chat feature includes automatic markdown processing for clean display and natural pronunciation
Please see the issues for ideas or to report bugs.
Recent Updates
- Kokoro TTS: Neural TTS engine with high-quality speech synthesis
- Voice Queries: Ask questions or query documents in your preferred language (Hindi, English, etc.)
- GPU Acceleration: Automatic GPU detection for faster processing
- Performance Monitoring: Timing instrumentation for all pipeline steps
- OS Compatibility: Cross-platform path resolution
- Separate Requirements: CPU and GPU-specific dependency files
- Interactive CLI: Improved user interface for all operations
The modular design makes it easy to add new features. The backend/ and frontend/ folders are separate, so you can build the UI and connect to the backend API.
About The Project
Samvaad (Sanskrit for "dialogue") is open-source software that combines Retrieval-Augmented Generation (RAG) with end-to-end voice capabilities. Users add their documents, Samvaad indexes and stores them, and users can then hold a text or voice conversation with those documents that delivers accurate, context-aware answers. Built with a modular backend and a modern frontend (in the works), Samvaad makes it easy to learn new topics, clear up confusion, and keep learning, all while feeling like a friend.
Getting Started
Prerequisites
- Python 3.11: This project is optimized for Python 3.11. Some dependencies (like sounddevice for voice features) provide wheels primarily for this version. Ensure you're using 3.11:
python --version # Should show Python 3.11.x
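If you prefer to check the interpreter from inside Python (for example in a setup script), a minimal sketch could look like the following. The function name `is_supported_python` is hypothetical and not part of Samvaad.

```python
import sys


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True when the interpreter is Python 3.11.x."""
    return (version_info[0], version_info[1]) == (3, 11)


if __name__ == "__main__":
    status = "OK" if is_supported_python() else "unsupported"
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]}: {status}")
```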
Follow these steps to set up and run Samvaad locally:
1. Clone the Repository
git clone https://github.com/HapoSeiz/samvaad.git
cd samvaad
2. Set Up a Virtual Environment
Install uv (if not already installed):
curl -LsSf https://astral.sh/uv/install.sh | sh
Windows:
uv venv
.venv\Scripts\activate
macOS/Linux:
uv venv
source .venv/bin/activate
Note: uv creates a .venv directory by default (with a dot). The activation command uses .venv/bin/activate on Unix systems.
3. Install Samvaad
Option 1: Install from PyPI (recommended):
For CPU-only systems:
pip install samvaad[cpu]
For GPU systems (CUDA 12.1):
pip install samvaad[gpu]
Option 2: Install from source:
For CPU-only systems:
git clone https://github.com/atharva-again/samvaad.git
cd samvaad
pip install -e .[cpu]
For GPU systems:
git clone https://github.com/atharva-again/samvaad.git
cd samvaad
pip install -e .[gpu]
Note: Always ensure your virtual environment is activated before installing packages. If you encounter PyTorch installation issues, visit https://pytorch.org/get-started/locally/ for manual installation instructions.
Important: Install Required Llama-cpp-python Fork
Samvaad requires a special fork of llama-cpp-python for Gemma model support. You must install this manually before installing Samvaad:
pip install git+https://github.com/inference-sh/llama-cpp-python.git
Then install Samvaad as usual:
pip install samvaad[cpu]
# or
pip install samvaad[gpu]
4. Add Your Documents
Place your documents inside the data/documents/ folder. Supported file types include:
- PDF files (.pdf)
- Microsoft Office documents (.docx, .pptx, .xlsx)
- Text files (.txt, .md)
- Web pages (.html, .htm)
- Images (.png, .jpg, .jpeg, .tiff, .bmp) - with OCR support
- Other formats supported by Docling (e.g., .rtf, .epub)
These will be used as the chatbot's knowledge base.
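To see which files in the folder would qualify before ingesting them, a small sketch like this can help. It is not Samvaad's own code: the helper `find_ingestible` is hypothetical, and the extension set is simply copied from the list above (Docling may accept more).

```python
from pathlib import Path

# Extensions copied from the list above; Docling may accept others.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".pptx", ".xlsx", ".txt", ".md",
    ".html", ".htm", ".png", ".jpg", ".jpeg", ".tiff", ".bmp",
    ".rtf", ".epub",
}


def find_ingestible(folder: str) -> list[Path]:
    """Return files under `folder` whose suffix is in SUPPORTED_EXTENSIONS."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


if __name__ == "__main__":
    for path in find_ingestible("data/documents"):
        print(path)
```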
5. Configure Environment
Create a .env file in the root directory and add your API keys:
# Copy and edit the following into .env
GEMINI_API_KEY=your_gemini_api_key_here
You can get your Gemini_API_Key here.
Note: The system works without API keys but will only show retrieved context without AI-generated answers.
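The behaviour in the note above can be pictured with a short sketch. This is an illustration under assumptions, not Samvaad's implementation: `load_env_file` and `answer_mode` are hypothetical helpers, and only the variable name `GEMINI_API_KEY` comes from this README.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def answer_mode(env: dict[str, str]) -> str:
    """Hypothetical helper: choose between full answers and context only."""
    key = env.get("GEMINI_API_KEY") or os.environ.get("GEMINI_API_KEY", "")
    return "generate" if key else "retrieval-only"
```

With no key in the file or the process environment, `answer_mode` returns `"retrieval-only"`, mirroring the fallback described in the note.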
6. Process Your Documents
Run the interactive CLI to ingest documents:
samvaad
Then use commands like:
- i document.pdf to ingest a file
- q What is the main topic? to query
7. Query Your Knowledge Base
Use the interactive CLI for querying:
samvaad
Inside the CLI:
- q What are the main findings? - Basic query
Voice Queries
Samvaad supports multilingual voice queries, allowing you to ask questions in Hindi, English, Hinglish, or other languages. The system transcribes your speech and responds in the same language/style.
# Start interactive mode
samvaad
# Inside CLI:
v
# This starts voice recording mode. Speak your question in any supported language.
# The system will transcribe, process, and respond accordingly.
Supported Languages: Hindi, Hinglish (code-mixed), English, and auto-detection for other languages.
TTS Engine Options:
- Kokoro TTS: Neural TTS engine with high-quality voices (English & Hindi)
# Voice query with Kokoro TTS
v
Features:
- Automatic silence detection (2 seconds of silence stops recording)
- Markdown-aware responses (clean text for both display and speech)
- Audio responses saved to data/audio_responses/ with engine-specific filenames
- Real-time language detection and appropriate voice selection
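The silence-detection rule in the list above can be sketched as follows. This is a simplified illustration, not the recorder Samvaad ships: the RMS threshold of 0.01, the frame representation (lists of float samples), and the function names are all assumptions; only the 2-second limit comes from this README.

```python
import math


def rms(frame: list[float]) -> float:
    """Root-mean-square amplitude of one audio frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def frames_until_silence(frames, frame_seconds: float,
                         threshold: float = 0.01,
                         max_silence_seconds: float = 2.0) -> int:
    """Return how many frames are consumed before recording would stop.

    Recording stops once consecutive quiet frames add up to
    `max_silence_seconds`. The threshold value is an assumption.
    """
    silent_for = 0.0
    count = 0
    for frame in frames:
        count += 1
        if rms(frame) < threshold:
            silent_for += frame_seconds
            if silent_for >= max_silence_seconds:
                break
        else:
            silent_for = 0.0  # any loud frame resets the silence timer
    return count
```

Resetting the timer on every loud frame means short pauses between words do not end the recording; only an uninterrupted quiet stretch does.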
API Endpoints
Samvaad provides a REST API for programmatic access:
TTS Endpoint:
POST /tts
Content-Type: application/json
{
"text": "Your text here",
"language": "en"
}
Supported TTS Engine:
- kokoro - Neural TTS (higher quality, English & Hindi)
Response:
{
"audio_base64": "base64_encoded_wav_data",
"sample_rate": 24000,
"format": "wav"
}
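A client for this endpoint might look like the sketch below, using only the standard library. The request and response fields are taken from the examples above; the helper names and the base URL you pass in (for instance a local server address) are assumptions, since this README does not state the host or port.

```python
import base64
import json
import urllib.request


def build_tts_request(base_url: str, text: str,
                      language: str = "en") -> urllib.request.Request:
    """Build the POST /tts request described above."""
    body = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_tts_response(payload: dict) -> bytes:
    """Return the raw WAV bytes from a response shaped like the one above."""
    if payload.get("format") != "wav":
        raise ValueError(f"unexpected format: {payload.get('format')!r}")
    return base64.b64decode(payload["audio_base64"])


def synthesize(base_url: str, text: str, language: str = "en") -> bytes:
    """Send the request and decode the audio (needs a running server)."""
    request = build_tts_request(base_url, text, language)
    with urllib.request.urlopen(request) as resp:
        return decode_tts_response(json.load(resp))
```

The bytes returned by `synthesize` can be written straight to a `.wav` file.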
Direct Voice Query Usage
For direct voice queries without the interactive CLI:
# Voice query with Kokoro TTS
python -m backend.pipeline.retrieval.query_voice
# Voice query with specific Gemini model
python -m backend.pipeline.retrieval.query_voice --model gemini-2.5-flash
Usage Examples
Interactive CLI
Samvaad now uses an interactive command-line interface for all operations:
samvaad
Available commands:
- i <file> or ingest <file> - Process and ingest a file
- q <text> or query <text> - Query the knowledge base
- v or voice - Start voice query mode (supports multiple languages like Hindi, English, Hinglish)
- r <file> or remove <file> - Remove a file and its embeddings
- h or help - Show help
- e or exit - Exit the CLI
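One way to picture how short and long command names map to the same action is the sketch below. It only illustrates the alias table above; `parse_command` is a hypothetical function, not the CLI's actual dispatcher.

```python
# Short and long forms from the command list above.
ALIASES = {
    "i": "ingest", "ingest": "ingest",
    "q": "query", "query": "query",
    "v": "voice", "voice": "voice",
    "r": "remove", "remove": "remove",
    "h": "help", "help": "help",
    "e": "exit", "exit": "exit",
}


def parse_command(line: str) -> tuple[str, str]:
    """Split a CLI line into (canonical command, argument text)."""
    name, _, rest = line.strip().partition(" ")
    command = ALIASES.get(name.lower())
    if command is None:
        raise ValueError(f"unknown command: {name!r}")
    return command, rest.strip()
```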
Document Processing
# Start interactive mode
samvaad
# Inside CLI:
i documents/research_paper.pdf
# Output includes timing: ⏱️ Parsing time: 0.1234 seconds, etc.
# Remove a document
r documents/old_file.pdf
# Output: ⏱️ Deletion time: 0.0567 seconds
Querying Your Knowledge Base
# Start interactive mode
samvaad
# Inside CLI:
q "What are the main findings?"
# Output includes total query time and sources
q "Explain the methodology" -k 8
# Retrieve more context chunks
q "What are the implications?" -m gemini-2.5-flash
# Use Gemini model for answers
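The `-k` and `-m` options shown above could be parsed along these lines. This is a sketch, not the CLI's real parser: the default of 3 chunks is an assumption, and `parse_query` and its attribute names are hypothetical.

```python
import argparse
import shlex


def parse_query(arg_text: str) -> argparse.Namespace:
    """Parse the text after `q`, e.g. '"Explain the methodology" -k 8'."""
    parser = argparse.ArgumentParser(prog="q", add_help=False)
    parser.add_argument("words", nargs="+")
    # Number of context chunks to retrieve; the default here is an assumption.
    parser.add_argument("-k", dest="top_k", type=int, default=3)
    # Model name such as gemini-2.5-flash; None means "use the default".
    parser.add_argument("-m", dest="model", default=None)
    args = parser.parse_args(shlex.split(arg_text))
    args.text = " ".join(args.words)
    return args
```

Using `shlex.split` keeps a quoted question together as one argument while still accepting unquoted questions, which are rejoined into `args.text`.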
Performance Monitoring
The CLI now shows timing for each step:
⏱️ Parsing time: 0.1234 seconds
⏱️ Chunking time: 0.0567 seconds
⏱️ Embedding time: 1.2345 seconds
⏱️ Storage time: 0.0890 seconds
⏱️ Total query time: 2.3456 seconds
⏱️ Deletion time: 0.0123 seconds
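Timing lines in this format can be produced with a small context manager. The sketch below is an illustration of the idea rather than Samvaad's instrumentation; the name `timed` and the `sink` parameter are assumptions.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(step: str, sink=print):
    """Measure the wrapped block and report it in the format shown above."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        sink(f"⏱️ {step} time: {elapsed:.4f} seconds")


if __name__ == "__main__":
    with timed("Parsing"):
        sum(range(100_000))  # stand-in for real work
```

Passing a different `sink` (for example `list.append`) collects the lines instead of printing them, which is convenient in tests.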
GPU Acceleration
If a CUDA-compatible GPU is detected, operations will automatically use GPU acceleration for:
- Document parsing (Docling)
- Text embeddings (SentenceTransformer)
- Cross-encoder reranking
- LLM inference (if supported)
Check GPU usage with nvidia-smi during processing.
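A common way to implement this kind of automatic detection with PyTorch is sketched below. It is an assumption that Samvaad's GPU utility works this way; `pick_device` is a hypothetical name.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # optional here; may be absent on minimal installs
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Using device: {pick_device()}")
```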
Example Output
🔍 Processing query: 'What is the theory of Ballism?'
============================================================
⏱️ Total query time: 2.3456 seconds
📝 QUERY: What is the theory of Ballism?
🤖 ANSWER:
The theory of Ballism, formally known as the Principle of Spherical Convergence, posits that all matter and energy in the universe is subject to a fundamental force that compels it to assume a perfect spherical shape over infinitely long periods...
📚 SOURCES (3 chunks retrieved):
1. ballism.txt (Similarity: 0.847)
Preview: The theory of Ballism, formally known as the Principle of Spherical Convergence...
2. ballism.txt (Similarity: 0.723)
Preview: Dr. Finch's initial "Finches' Folly" experiment...
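The similarity scores in the output above rank retrieved chunks against the query. The README does not say which metric is used; assuming cosine similarity over embedding vectors, the ranking step could be sketched like this (the function names and the toy vectors are illustrative only).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k: int = 3):
    """chunks is a list of (source, vector); return [(source, score)], best first."""
    scored = [(src, cosine_similarity(query_vec, vec)) for src, vec in chunks]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]
```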
Project Structure
samvaad/
├── samvaad/ # Python code for the RAG pipeline and API
│ ├── pipeline/ # Core RAG components
│ │ ├── generation/ # LLM integration and TTS engine (Kokoro)
│ │ ├── ingestion/ # Document processing and chunking
│ │ ├── retrieval/ # Query processing and voice recognition
│ │ ├── vectorstore/ # Vector database operations
│ │ └── deletion/ # Document removal utilities
│ ├── utils/ # Utilities (hashing, DB, GPU detection)
│ ├── interfaces/ # CLI and API interfaces
│ │ ├── api.py # FastAPI server with TTS API
│ │ └── cli.py # Interactive CLI for testing and usage
├── data/ # Raw documents and audio responses
│ ├── documents/ # Source documents for knowledge base
│ └── audio_responses/ # Saved TTS audio files
├── tests/ # Unit and integration tests
├── requirements.txt # Dependencies
└── README.md # Project documentation
Directory Overview:
- samvaad/: Modular RAG pipeline, Kokoro TTS engine, API, and CLI (Python)
- samvaad/pipeline/generation/: LLM integration (Gemini) and TTS engine (Kokoro)
- samvaad/pipeline/retrieval/: Query processing, voice recognition, and markdown handling
- data/documents/: Your source documents (PDFs, Office docs, text, images, etc.)
- data/audio_responses/: Automatically saved TTS audio files with engine-specific names
- tests/: Comprehensive test suite for reliability
Features
- Kokoro TTS: Neural TTS engine with high-quality speech synthesis
- Smart Markdown Processing: Automatic stripping of markdown formatting for clean terminal display and natural speech synthesis
- Multilingual Voice Support: Voice queries and responses in Hindi, English, Hinglish, and auto-detection for other languages
- Retrieval-Augmented Generation (RAG): Combines LLMs with your own documents for accurate, context-aware answers.
- Complete Query Pipeline: Ask natural language questions and get AI-powered answers with source citations.
- GPU Acceleration: Automatic GPU detection and usage for faster embeddings, parsing, and inference (when available).
- Performance Monitoring: Built-in timing instrumentation for ingestion, retrieval, and deletion steps.
- OS-Agnostic Paths: Cross-platform compatibility (Windows, macOS, Linux) with dynamic path resolution.
- Modular Backend: Easily extend or swap components in the RAG pipeline.
- Modern Frontend (Coming Soon): React + Next.js interface for a seamless chat experience.
- Interactive CLI: Full document processing and querying via an interactive command-line interface.
- Multiple LLM Support: Works with OpenAI GPT models and Google Gemini, with graceful fallback.
- Easy Setup: Simple installation with manual PyTorch selection for CPU or GPU.
- Private & Secure: Your data stays on your machine.
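The markdown stripping mentioned in the list above can be approximated with a few regular expressions. This is a rough sketch, not Samvaad's processor: `strip_markdown` is a hypothetical name, and the patterns cover only common constructs (for example, underscores inside identifiers such as snake_case names would be treated as emphasis).

```python
import re


def strip_markdown(text: str) -> str:
    """Remove common markdown syntax so text reads cleanly aloud or in a terminal."""
    text = re.sub(r"```.*?```", "", text, flags=re.DOTALL)         # drop fenced code
    text = re.sub(r"`([^`]*)`", r"\1", text)                       # inline code
    text = re.sub(r"!\[([^\]]*)\]\([^)]*\)", r"\1", text)          # images -> alt text
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)           # links -> label
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]+", "", text, flags=re.MULTILINE)  # headings
    text = re.sub(r"^[ \t]*[-*+][ \t]+", "", text, flags=re.MULTILINE)       # bullets
    text = re.sub(r"(\*\*|__)(.+?)\1", r"\2", text)                # bold
    text = re.sub(r"(\*|_)(.+?)\1", r"\2", text)                   # italics
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```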
Testing
Samvaad includes comprehensive unit and integration tests to ensure reliability.
Test Structure
tests/
├── unit/ # Unit tests for individual components
│ ├── test_utils.py # Utils (hashing, DB, GPU)
│ ├── test_preprocessing.py
│ ├── test_ingestion.py
│ ├── test_embedding.py
│ ├── test_vectorstore.py
│ ├── test_query.py
│ └── test_deletion.py
├── integration/ # Integration tests for full pipeline
│ └── test_full_pipeline.py
└── pytest.ini # Test configuration
Running Tests
Run all tests:
pytest
Run unit tests only:
pytest tests/unit/
Run integration tests only:
pytest tests/integration/
Run specific test file:
pytest tests/unit/test_utils.py -v
Test Coverage
- Unit Tests: Test individual functions and classes in isolation
- Integration Tests: Test the complete RAG pipeline end-to-end
- Mocking: External dependencies (APIs, databases, ML models) are mocked for reliable testing
- CI/CD Ready: Tests are designed to run in automated environments
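As an illustration of the mocking approach described above, the sketch below replaces a retriever and an LLM with `unittest.mock.Mock` objects. The function `answer_question` is a hypothetical stand-in written for this example, not a function from the Samvaad codebase.

```python
from unittest.mock import Mock


def answer_question(question: str, retriever, llm) -> dict:
    """Hypothetical pipeline function used only for this illustration."""
    chunks = retriever(question)
    if llm is None:
        # No model available: return the retrieved context only.
        return {"answer": None, "sources": chunks}
    context = "\n".join(chunks)
    return {"answer": llm(question, context), "sources": chunks}


def test_answer_uses_retrieved_context():
    retriever = Mock(return_value=["chunk one", "chunk two"])
    llm = Mock(return_value="mocked answer")
    result = answer_question("What is X?", retriever, llm)
    assert result == {
        "answer": "mocked answer",
        "sources": ["chunk one", "chunk two"],
    }
    # The LLM must receive the joined context, and no real API is called.
    llm.assert_called_once_with("What is X?", "chunk one\nchunk two")
```

Because both collaborators are mocks, the test needs no API key, network access, or model download.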
About Test Warnings
Some warnings may appear during test runs from external dependencies (e.g., docling-core, google-genai). These warnings are not from Samvaad code but from upstream libraries that have known deprecation issues in Pydantic v2.12+.
To reduce or eliminate warnings:
- Keep dependencies updated:
uv pip install --upgrade docling google-genai pydantic setuptools
- These are deprecation notices that will be fixed in future releases of the upstream libraries
- The warnings do not affect functionality - all 175+ tests pass successfully
Current state (as of Oct 2025):
- docling-core 2.49.0: Pending upstream fix for Pydantic validator pattern
- google-genai 1.45.0: Pending upstream fix for Pydantic validator pattern
- setuptools 80.9.0: pkg_resources deprecation warning (expected to be removed in setuptools 81+)
These warnings will disappear once the upstream libraries update their code to use instance methods instead of classmethods for Pydantic validators (required by Pydantic v2.12+).
Continuous Integration
Automated test runs execute through GitHub Actions. The workflow runs CPU tests on all pushes and pull requests to main. GPU tests run only on pushes to main to avoid the overhead of installing large PyTorch GPU wheels on every PR. Both configurations exercise the full pytest suite. No additional secrets are required for the suite to pass because external services are mocked in the tests. You can monitor the latest builds from the Actions tab on GitHub.
Contributing
Contributions are welcome! To get started:
- Fork this repository
- Create a new branch (git checkout -b feature/your-feature)
- Make your changes and add tests
- Commit and push (git commit -am 'Add new feature')
- Open a pull request
Please see the issues page for ideas or to report bugs.
Future Development
The modular design of this project makes it easy to add new features. The backend/ and frontend/ folders are completely separate, so you can build out the user interface and connect it to the backend's API when you're ready.