Realtime AI Voice Assistant with STM memory
Project description
VoiceAgentArnab
Realtime AI Voice Assistant with:
- 🎙 Speech-to-Text (STT)
- 🧠 Conversational Memory (STM)
- 🤖 OpenAI LLM Integration
- 🔊 Text-to-Speech (TTS)
- ♻ Continuous Conversation Loop
- 🛑 Stop/Exit Voice Commands
Built using:
- Python
- OpenAI API
- SpeechRecognition
- OpenAI TTS
Features
✅ Realtime Voice Conversations
✅ Short-Term Memory (STM)
✅ Context-Aware Responses
✅ Continuous Listening Loop
✅ User-Controlled API Key
✅ AI Voice Responses
✅ Installable Python Package
✅ CLI Support
Installation
Install directly from PyPI:
pip install VoiceAgentArnab
Requirements
- Python 3.9+
- Microphone
- Speaker/Headphones
- OpenAI API Key
Setup
Create a .env file in your project directory.
Example:
OPENAI_API_KEY=your_openai_api_key
Quick Start
Python Usage
Create a file named test.py
import asyncio
from voiceagentarnab import VoiceAgent
agent = VoiceAgent()
asyncio.run(agent.pipeline())
Run:
python test.py
CLI Usage
You can also run directly from terminal:
python -m voiceagentarnab.main
Or if CLI PATH is configured correctly:
voiceagentarnab
Example Conversation
Voice Agent Started
Say 'stop' to exit.
Listening...
User: Hello
Assistant: Hello! How can I help you today?
User: My name is Arnab
Assistant: Nice to meet you Arnab.
User: What is my name?
Assistant: Your name is Arnab.
User: stop
Assistant: Goodbye!
Short-Term Memory (STM)
This package uses STM (Short-Term Memory).
Conversation history is stored only while the program is running.
When the user says:
- stop
- exit
- quit
- bye
the application terminates and all memory is cleared automatically.
No long-term storage is used by default.
Architecture
User Speech
↓
Speech To Text
↓
Conversation Memory (STM)
↓
OpenAI LLM
↓
Assistant Response
↓
Text To Speech
Package Structure
voiceagentarnab/
│
├── audio/
│ ├── stt.py
│ └── tts.py
│
├── llm/
│ └── chat.py
│
├── memory/
│ └── stm.py
│
├── agent.py
├── main.py
└── __init__.py
Voice Commands
Stop Commands
The assistant stops when user says:
- stop
- exit
- quit
- bye
Customization
Custom Model
agent = VoiceAgent(
model="gpt-4.1-mini"
)
Custom Voice
agent = VoiceAgent(
voice="coral"
)
Available voices depend on OpenAI TTS support.
Passing API Key Manually
agent = VoiceAgent(
api_key="your_api_key"
)
Dependencies
Main dependencies:
openai
SpeechRecognition
PyAudio
python-dotenv
Troubleshooting
PyAudio Installation Error (Windows)
Try:
pip install pipwin
pipwin install pyaudio
Microphone Not Detected
Test available microphones:
import speech_recognition as sr
print(sr.Microphone.list_microphone_names())
OpenAI API Key Missing
Error:
OPENAI_API_KEY missing
Fix:
- create
.env - add valid API key
Example:
OPENAI_API_KEY=your_key_here
Current Limitations
- Uses Google SpeechRecognition backend for STT
- Uses STM only (memory resets after exit)
- Requires internet connection
- Not optimized for ultra-low latency realtime conversations
Future Improvements
Planned features:
- Wake Word Support
- Long-Term Memory (LTM)
- Realtime Streaming
- OpenAI Whisper Integration
- Voice Customization
- Desktop App
- Multi-Agent Support
Development
Clone repository:
git clone <your_repo_url>
Install dependencies:
pip install -r requirements.txt
Run locally:
python test.py
Build Package
python -m build
Publish Package
twine upload dist/*
License
MIT License
Author
Arnab
PyPI
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file voiceagentarnab-0.0.2.tar.gz.
File metadata
- Download URL: voiceagentarnab-0.0.2.tar.gz
- Upload date:
- Size: 14.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c157a1c07900be785782f048c93c35407f2f1f0b68532cf7ce0fdc516bbc47b8
|
|
| MD5 |
17641d6f23c001e0fb0bf779a8d9d870
|
|
| BLAKE2b-256 |
adc4ac04fac63d1453c104e814effa10762da6f0f6e89e2d45e259231b09fd09
|
File details
Details for the file voiceagentarnab-0.0.2-py3-none-any.whl.
File metadata
- Download URL: voiceagentarnab-0.0.2-py3-none-any.whl
- Upload date:
- Size: 18.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e3bb6a3ec6ea07d7327a61a1a084f2cb824f8ed1adf88ef14e21d303aaa75493
|
|
| MD5 |
92d19f66226db74a42c5a8a3e3503b45
|
|
| BLAKE2b-256 |
ad5dafae1fca619c856aac49f88149f1ec242b696fb549af16362291c996f03d
|