Easy speech-to-text transcription from audio files or live microphone input using Whisper.
Voice Assistant Transcriber
A simple Python-based voice assistant that captures speech from your microphone, stops recording when it detects silence, and transcribes the speech to text using OpenAI Whisper. It is designed to be easy to extend, for example by passing transcripts to LLM tooling such as Ollama or Gemma.
Features
- Real-time microphone audio capture
- Automatic silence detection and recording stop
- Speech-to-text transcription using Whisper
- Comprehensive transcription logging with detailed metrics
- Easy integration with other AI models
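As an illustration of the integration mentioned above, transcribed text could be passed to a locally running Ollama server over its HTTP API. This is only a sketch and not part of this project: the helper names `build_ollama_request` and `ask_ollama`, the prompt wording, and the model name `gemma3` are assumptions, and it presumes Ollama is listening on its default port 11434.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_ollama_request(transcript, model="gemma3"):
    """Build the JSON payload for a single, non-streaming completion.

    The model name is an assumption; use whichever model you have pulled.
    """
    return {
        "model": model,
        "prompt": f"Reply briefly to this spoken request: {transcript}",
        "stream": False,  # ask for one JSON object instead of a stream
    }


def ask_ollama(transcript, model="gemma3", url=OLLAMA_URL, timeout=60):
    """Send the transcript to Ollama and return the generated text."""
    payload = json.dumps(build_ollama_request(transcript, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]
```

With a server running, `ask_ollama(text)` would take the string returned by the transcriber and return the model's reply.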
Installation
- Clone the repository:
git clone https://github.com/akhshyganesh/voice-assistant-transcriber.git
cd voice-assistant-transcriber
- Install dependencies:
pip install -r requirements.txt
Usage
Run the main script:
python main.py
Speak into your microphone. The assistant will automatically stop recording after a few seconds of silence and transcribe your speech.
easytranscribe
Easy speech-to-text transcription from audio files or live microphone input using OpenAI's Whisper.
✨ Features
- 🎤 Live microphone transcription with automatic silence detection
- 📁 Audio file transcription supporting multiple formats
- 📊 Automatic logging with timestamps and performance metrics
- 🔧 Simple CLI interface for quick usage
- 🐍 Easy Python API for integration into your projects
- 📈 Log analysis tools to view transcription history and statistics
🚀 Quick Start
Installation
pip install easytranscribe
Python API
Live microphone transcription:
from easytranscribe import capture_and_transcribe
# Start live transcription (speak, then pause; recording stops after silence)
text = capture_and_transcribe(model_name="base")
print(f"You said: {text}")
Audio file transcription:
from easytranscribe import transcribe_audio_file
# Transcribe an audio file
text = transcribe_audio_file("path/to/audio.wav", model_name="base")
print(f"Transcription: {text}")
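For more than one file, the same call can be wrapped in a small loop. The sketch below is not part of the package: `transcribe_folder` and the extension set are hypothetical, and the transcription function is passed in as a callable so the helper makes no assumptions about the library beyond the call shown above.

```python
from pathlib import Path

# Extensions are an assumption; the README only says "multiple formats".
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac"}


def transcribe_folder(folder, transcribe, extensions=AUDIO_EXTENSIONS):
    """Transcribe every matching audio file directly inside `folder`.

    `transcribe` is any callable that takes a path string and returns text,
    for example:
        lambda p: transcribe_audio_file(p, model_name="base")
    Returns a dict mapping file name to transcription.
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        # Compare suffixes case-insensitively so "TALK.WAV" is picked up too.
        if path.is_file() and path.suffix.lower() in extensions:
            results[path.name] = transcribe(str(path))
    return results
```

Passing the callable in also makes the helper easy to exercise with a stub instead of loading a Whisper model.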
View transcription logs:
from easytranscribe import view_logs
# View today's logs with statistics
logs = view_logs(date="today", stats=True)
print(f"Total entries: {logs['total_count']}")
Command Line Interface
Live transcription:
easytranscribe live --model base
File transcription:
easytranscribe file path/to/audio.wav --model base
View logs:
# View today's logs
easytranscribe logs --date today --stats
# View last 10 entries
easytranscribe logs --tail 10
# List available log dates
easytranscribe logs --list-dates
📋 Available Whisper Models
| Model | Parameters | Speed | Accuracy | Use Case |
|---|---|---|---|---|
| tiny | 39 M | Fastest | Good | Real-time, low resource |
| base | 74 M | Fast | Better | Balanced performance |
| small | 244 M | Medium | Good | Higher accuracy |
| medium | 769 M | Slow | Very Good | Professional use |
| large | 1550 M | Slowest | Best | Maximum accuracy |
| turbo | 809 M | Fast | Excellent | Best balance (default) |
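If the model has to be chosen at runtime, the per-model size figures listed in the table above can drive a simple lookup. This is a sketch under assumptions: `pick_model` is a hypothetical helper, and "largest figure that fits the budget" is only one possible heuristic (it ignores the speed and accuracy columns).

```python
# Size figures copied from the table above.
MODEL_SIZES = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "turbo": 809,
    "large": 1550,
}


def pick_model(max_size):
    """Return the listed model with the largest size figure <= max_size."""
    candidates = [
        (size, name) for name, size in MODEL_SIZES.items() if size <= max_size
    ]
    if not candidates:
        raise ValueError(f"no model fits within {max_size}")
    # Tuples compare by size first, so max() picks the biggest fitting model.
    return max(candidates)[1]
```

The result can be passed straight to the `model_name` argument, for example `capture_and_transcribe(model_name=pick_model(300))`, which selects `small` with these figures.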
🔧 Configuration
Audio Settings (Live Recording)
The package automatically handles:
- ✅ Silence detection (3 seconds of silence stops recording)
- ✅ Minimum recording time (2 seconds)
- ✅ Audio level monitoring
- ✅ Automatic microphone input
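The stop rule described by the settings above can be sketched as follows. This is not the package's actual implementation: the function names, the RMS threshold of 0.01, and the chunk-based layout are assumptions; only the 3-second silence window and the 2-second minimum come from the list.

```python
import math


def rms(chunk):
    """Root-mean-square level of one chunk of float samples in [-1.0, 1.0]."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def should_stop(chunks, chunk_seconds, threshold=0.01,
                silence_seconds=3.0, min_seconds=2.0):
    """Decide whether recording should stop after the chunks seen so far.

    Stops only when at least `min_seconds` have been recorded and the
    trailing run of quiet chunks lasts at least `silence_seconds`.
    `threshold` is an assumed RMS level below which a chunk counts as silent.
    """
    recorded = len(chunks) * chunk_seconds
    if recorded < min_seconds:
        return False
    quiet = 0
    # Walk backwards from the newest chunk until something audible appears.
    for chunk in reversed(chunks):
        if rms(chunk) >= threshold:
            break
        quiet += 1
    return quiet * chunk_seconds >= silence_seconds
```

In a live setup, each chunk would come from the microphone callback and `should_stop` would be checked after every new chunk arrives.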
Logging
Transcriptions are automatically logged to logs/transcription_YYYY-MM-DD.log with:
- 📅 Timestamp
- 🤖 Model used
- ⏱️ Processing time
- 🎵 Audio duration (for live recording)
- 📝 Transcribed text
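To make the log layout concrete, here is a sketch of how one entry per transcription might be written to a per-day file. The file naming follows the pattern stated above; everything else (the helper names and the pipe-separated line format) is an assumption and may differ from what the package actually writes.

```python
from datetime import datetime
from pathlib import Path


def log_path(when, log_dir="logs"):
    """Return the per-day log file path, logs/transcription_YYYY-MM-DD.log."""
    return Path(log_dir) / f"transcription_{when:%Y-%m-%d}.log"


def format_entry(when, model, processing_seconds, text, audio_seconds=None):
    """Format one log line; the pipe-separated layout is an assumption."""
    fields = [
        when.isoformat(timespec="seconds"),
        f"model={model}",
        f"processing={processing_seconds:.2f}s",
    ]
    if audio_seconds is not None:  # only known for live recordings
        fields.append(f"audio={audio_seconds:.2f}s")
    fields.append(f"text={text}")
    return " | ".join(fields)


def append_entry(entry, path):
    """Append one entry to the log file, creating the directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(entry + "\n")
```

Keeping the path and formatting logic separate from the file write makes each piece easy to check without touching the disk.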
🛠️ Development
Install from Source
git clone https://github.com/akhshyganesh/voice-assistant-transcriber.git
cd voice-assistant-transcriber
pip install -e .
Run Tests
python test/test_integration.py
📄 Requirements
- Python 3.8+
- OpenAI Whisper
- sounddevice (for microphone input)
- numpy
🤝 Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
📜 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- OpenAI Whisper for the amazing speech recognition model
- sounddevice for microphone input handling