Realtime AI Voice Assistant with STM memory

VoiceAgentArnab

Realtime AI Voice Assistant with:

  • 🎙 Speech-to-Text (STT)
  • 🧠 Conversational Memory (STM)
  • 🤖 OpenAI LLM Integration
  • 🔊 Text-to-Speech (TTS)
  • ♻ Continuous Conversation Loop
  • 🛑 Stop/Exit Voice Commands

Built using:

  • Python
  • OpenAI API
  • SpeechRecognition
  • OpenAI TTS

Features

✅ Realtime Voice Conversations
✅ Short-Term Memory (STM)
✅ Context-Aware Responses
✅ Continuous Listening Loop
✅ User-Controlled API Key
✅ AI Voice Responses
✅ Installable Python Package
✅ CLI Support


Installation

Install directly from PyPI:

pip install VoiceAgentArnab

Requirements

  • Python 3.9+
  • Microphone
  • Speaker/Headphones
  • OpenAI API Key

Setup

Create a .env file in your project directory.

Example:

OPENAI_API_KEY=your_openai_api_key
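
To confirm the key is visible before starting the agent, a small stdlib-only check can help. This is a minimal sketch that assumes the simple KEY=value format shown above. The helpers `read_env_file` and `resolve_api_key` are hypothetical names and are not part of VoiceAgentArnab; the package lists python-dotenv among its dependencies and presumably loads the file that way.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY") or read_env_file(path).get("OPENAI_API_KEY")
    if not key:
        # Same wording as the error described under Troubleshooting.
        raise RuntimeError("OPENAI_API_KEY missing")
    return key
```

Calling `resolve_api_key()` from the project directory either returns the key or raises, which narrows down whether a failure comes from configuration or from the audio stack.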

Quick Start

Python Usage

Create a file named test.py:

import asyncio

from voiceagentarnab import VoiceAgent


agent = VoiceAgent()

asyncio.run(agent.pipeline())

Run:

python test.py

CLI Usage

You can also run it directly from the terminal:

python -m voiceagentarnab.main

Or, if the installed command is on your PATH:

voiceagentarnab

Example Conversation

Voice Agent Started
Say 'stop' to exit.

Listening...

User: Hello

Assistant: Hello! How can I help you today?

User: My name is Arnab

Assistant: Nice to meet you Arnab.

User: What is my name?

Assistant: Your name is Arnab.

User: stop

Assistant: Goodbye!

Short-Term Memory (STM)

This package uses STM (Short-Term Memory).

Conversation history is stored only while the program is running.

When the user says:

  • stop
  • exit
  • quit
  • bye

the application terminates and all memory is cleared automatically.

No long-term storage is used by default.
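
The idea of short-term memory can be sketched as a plain in-process list. This is an illustration only, not the contents of memory/stm.py: the class name `ShortTermMemory`, the `max_messages` cap, and the method names are assumptions. The role/content dictionaries follow the message format used by OpenAI chat models.

```python
class ShortTermMemory:
    """In-process conversation history; nothing is written to disk."""

    def __init__(self, max_messages=20):
        if max_messages < 1:
            raise ValueError("max_messages must be at least 1")
        self.max_messages = max_messages  # hypothetical cap, not from the package
        self.messages = []

    def add(self, role, content):
        """Append one message and trim the oldest ones past the cap."""
        self.messages.append({"role": role, "content": content})
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]

    def clear(self):
        """Forget everything, as happens when the program exits."""
        self.messages.clear()
```

Because the history lives only in a Python object, it disappears when the process ends, which matches the behaviour described above.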


Architecture

User Speech
   ↓
Speech To Text
   ↓
Conversation Memory (STM)
   ↓
OpenAI LLM
   ↓
Assistant Response
   ↓
Text To Speech
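
One pass through the flow above can be sketched with the three stages passed in as plain callables. This is a synchronous illustration under stated assumptions: `run_turn`, `listen`, `think`, and `speak` are hypothetical names, while the package's own `agent.pipeline()` is asynchronous and its internals are not shown here.

```python
def run_turn(listen, think, speak, history):
    """Run one pass through the pipeline and return the assistant's reply."""
    user_text = listen()  # Speech To Text
    history.append({"role": "user", "content": user_text})  # Conversation Memory
    reply = think(history)  # LLM call, given the full history for context
    history.append({"role": "assistant", "content": reply})
    speak(reply)  # Text To Speech
    return reply


if __name__ == "__main__":
    # Stand-ins for the real audio and LLM stages.
    history = []
    run_turn(
        listen=lambda: "My name is Arnab",
        think=lambda msgs: "Nice to meet you.",
        speak=print,
        history=history,
    )
```

Passing the stages as arguments keeps the loop testable without a microphone, speakers, or network access.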

Package Structure

voiceagentarnab/
│
├── audio/
│   ├── stt.py
│   └── tts.py
│
├── llm/
│   └── chat.py
│
├── memory/
│   └── stm.py
│
├── agent.py
├── main.py
└── __init__.py

Voice Commands

Stop Commands

The assistant stops when the user says any of:

  • stop
  • exit
  • quit
  • bye
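
Recognized speech often arrives with capitals or trailing punctuation, so a stop check usually normalizes the transcript first. The sketch below assumes the whole utterance must equal a stop word; whether the package matches this way or looks for the word inside a longer sentence is not documented here. `STOP_COMMANDS` and `is_stop_command` are hypothetical names.

```python
import string

STOP_COMMANDS = {"stop", "exit", "quit", "bye"}


def is_stop_command(transcript):
    """Return True if the whole utterance is one of the stop words."""
    # Lowercase, then trim whitespace and punctuation from both ends.
    cleaned = transcript.lower().strip(string.punctuation + string.whitespace)
    return cleaned in STOP_COMMANDS
```

With whole-utterance matching, a sentence such as "don't stop now" does not end the session by accident.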

Customization

Custom Model

agent = VoiceAgent(
    model="gpt-4.1-mini"
)

Custom Voice

agent = VoiceAgent(
    voice="coral"
)

Available voices depend on OpenAI TTS support.


Passing API Key Manually

agent = VoiceAgent(
    api_key="your_api_key"
)

Dependencies

Main dependencies:

openai
SpeechRecognition
PyAudio
python-dotenv

Troubleshooting

PyAudio Installation Error (Windows)

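Recent PyAudio releases publish prebuilt Windows wheels on PyPI, so upgrading pip and reinstalling PyAudio may be enough. If that fails, the older pipwin workaround is: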

pip install pipwin
pipwin install pyaudio

Microphone Not Detected

Test available microphones:

import speech_recognition as sr

print(sr.Microphone.list_microphone_names())

OpenAI API Key Missing

Error:

OPENAI_API_KEY missing

Fix:

  • Create a .env file in your project directory
  • Add a valid API key

Example:

OPENAI_API_KEY=your_key_here

Current Limitations

  • Uses the Google backend of the SpeechRecognition library for STT
  • Uses STM only (memory resets after exit)
  • Requires internet connection
  • Not optimized for ultra-low latency realtime conversations

Future Improvements

Planned features:

  • Wake Word Support
  • Long-Term Memory (LTM)
  • Realtime Streaming
  • OpenAI Whisper Integration
  • Voice Customization
  • Desktop App
  • Multi-Agent Support

Development

Clone the repository:

git clone <your_repo_url>

Install dependencies:

pip install -r requirements.txt

Run locally:

python test.py

Build Package

python -m build

Publish Package

twine upload dist/*

License

MIT License


Author

Arnab


PyPI

https://pypi.org/project/VoiceAgentArnab/
