Easy speech-to-text transcription from audio files or live microphone input using Whisper.

Voice Assistant Transcriber

A simple Python-based voice assistant that captures speech from your microphone, detects silence, and transcribes the recording to text using OpenAI Whisper. It is designed to be extended, for example by passing the transcript to an LLM such as Gemma served locally through Ollama.

Features

  • Real-time microphone audio capture
  • Automatic silence detection and recording stop
  • Speech-to-text transcription using Whisper
  • Comprehensive transcription logging with detailed metrics
  • Easy integration with other AI models

Installation

  1. Clone the repository:

    git clone https://github.com/akhshyganesh/voice-assistant-transcriber.git
    cd voice-assistant-transcriber
    
  2. Install dependencies:

    pip install -r requirements.txt
    

Usage

Run the main script:

python main.py

Speak into your microphone. The assistant will automatically stop recording after a few seconds of silence and transcribe your speech.
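
To illustrate the LLM integration mentioned in the description, here is a minimal sketch that forwards a transcript to a local Ollama server. It is not part of this project. It assumes Ollama is running on its default port (11434) and that a model named "gemma" has already been pulled; the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: a local Ollama server listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_ollama_request(transcript, model="gemma"):
    """Build (but do not send) a POST request that uses the transcript as the prompt."""
    # "stream": False asks Ollama for a single JSON object instead of a stream.
    payload = {"model": model, "prompt": transcript, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(transcript, model="gemma", timeout=60):
    """Send the transcript to Ollama and return the generated text."""
    request = build_ollama_request(transcript, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]
```

Calling `ask_ollama(text)` with the transcribed text returns the model's reply as a string. Only `ask_ollama` touches the network, so the request-building step can be checked without a running server.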

easytranscribe

Easy speech-to-text transcription from audio files or live microphone input using OpenAI's Whisper.

✨ Features

  • 🎤 Live microphone transcription with automatic silence detection
  • 📁 Audio file transcription supporting multiple formats
  • 📊 Automatic logging with timestamps and performance metrics
  • 🔧 Simple CLI interface for quick usage
  • 🐍 Easy Python API for integration into your projects
  • 📈 Log analysis tools to view transcription history and statistics

🚀 Quick Start

Installation

pip install easytranscribe

Python API

Live microphone transcription:

from easytranscribe import capture_and_transcribe

# Start live transcription (records until silence is detected)
text = capture_and_transcribe(model_name="base")
print(f"You said: {text}")

Audio file transcription:

from easytranscribe import transcribe_audio_file

# Transcribe an audio file
text = transcribe_audio_file("path/to/audio.wav", model_name="base")
print(f"Transcription: {text}")
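
If you have a whole folder of recordings, the single-file call can be wrapped in a small loop. The helper below is a hypothetical sketch, not part of easytranscribe: it takes the transcription function as an argument, and the extension list is an assumption, since the formats the package accepts are not listed here.

```python
from pathlib import Path


def transcribe_folder(folder, transcribe, extensions=(".wav", ".mp3", ".m4a", ".flac")):
    """Return {file name: transcript} for every matching audio file in `folder`.

    `transcribe` is any callable that takes a file path string and returns text.
    """
    results = {}
    # Sort for a stable, repeatable processing order.
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in extensions:
            results[path.name] = transcribe(str(path))
    return results
```

With easytranscribe installed, you could call it as `transcribe_folder("recordings", lambda p: transcribe_audio_file(p, model_name="base"))`.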

View transcription logs:

from easytranscribe import view_logs

# View today's logs with statistics
logs = view_logs(date="today", stats=True)
print(f"Total entries: {logs['total_count']}")

Command Line Interface

Live transcription:

easytranscribe live --model base

File transcription:

easytranscribe file path/to/audio.wav --model base

View logs:

# View today's logs
easytranscribe logs --date today --stats

# View last 10 entries
easytranscribe logs --tail 10

# List available log dates
easytranscribe logs --list-dates

📋 Available Whisper Models

Model    Parameters   Speed     Accuracy    Use Case
tiny     39 M         Fastest   Good        Real-time, low resource
base     74 M         Fast      Better      Balanced performance
small    244 M        Medium    Good        Higher accuracy
medium   769 M        Slow      Very Good   Professional use
large    1550 M       Slowest   Best        Maximum accuracy
turbo    809 M        Fast      Excellent   Best balance (default)
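
One simple way to use this table in code is to pick the largest model that fits a size budget. The helper below is a hypothetical sketch, not part of easytranscribe. The numbers are copied from the size column above, and size is only a rough proxy for accuracy.

```python
# Size figures copied from the table above.
MODEL_SIZES = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "turbo": 809,
    "large": 1550,
}


def largest_model_within(budget):
    """Return the name of the largest model whose size fits the budget, or None."""
    candidates = [(size, name) for name, size in MODEL_SIZES.items() if size <= budget]
    # max() compares the (size, name) tuples by size first.
    return max(candidates)[1] if candidates else None
```

The returned name can be passed as `model_name` in the Python API or as `--model` on the command line.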

🔧 Configuration

Audio Settings (Live Recording)

The package automatically handles:

  • ✅ Silence detection (3 seconds of silence stops recording)
  • ✅ Minimum recording time (2 seconds)
  • ✅ Audio level monitoring
  • ✅ Automatic microphone input
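
The stop rule described above can be sketched in a few lines. This is an illustration of the idea, not the package's actual implementation. The RMS threshold, the chunk-based design, and the function names are assumptions; only the 3-second and 2-second values come from the list above.

```python
import math

SILENCE_SECONDS = 3.0     # from the list above
MIN_RECORD_SECONDS = 2.0  # from the list above
RMS_THRESHOLD = 0.01      # assumption: float samples in the range [-1, 1]


def rms(chunk):
    """Root-mean-square level of one chunk of float samples."""
    if len(chunk) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def should_stop(chunks, chunk_seconds, threshold=RMS_THRESHOLD):
    """Return True once the recording is long enough and ends in enough silence."""
    if len(chunks) * chunk_seconds < MIN_RECORD_SECONDS:
        return False
    # Count consecutive quiet chunks, walking back from the newest one.
    silent_chunks = 0
    for chunk in reversed(chunks):
        if rms(chunk) >= threshold:
            break
        silent_chunks += 1
    return silent_chunks * chunk_seconds >= SILENCE_SECONDS
```

In a live recorder, each chunk would come from the microphone callback, and `should_stop` would be checked after every new chunk is appended.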

Logging

Transcriptions are automatically logged to logs/transcription_YYYY-MM-DD.log with:

  • 📅 Timestamp
  • 🤖 Model used
  • ⏱️ Processing time
  • 🎵 Audio duration (for live recording)
  • 📝 Transcribed text
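
For reference, a per-day log with these fields could be written as shown below. This is a hypothetical sketch: the package's real on-disk format is not documented here, so the JSON-lines layout, the field names, and the function name are all assumptions. Only the file name pattern comes from the text above.

```python
import json
from datetime import datetime
from pathlib import Path


def append_log_entry(text, model, processing_seconds, audio_seconds=None,
                     log_dir="logs", now=None):
    """Append one JSON line to <log_dir>/transcription_YYYY-MM-DD.log; return the path."""
    now = now or datetime.now()
    entry = {
        "timestamp": now.isoformat(timespec="seconds"),
        "model": model,
        "processing_seconds": round(processing_seconds, 3),
        "audio_seconds": audio_seconds,  # None when transcribing a file
        "text": text,
    }
    path = Path(log_dir) / f"transcription_{now:%Y-%m-%d}.log"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier entries from the same day.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return path
```

One JSON object per line keeps the file easy to append to and easy to parse later, which suits the kind of per-day statistics the log viewer reports.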

🛠️ Development

Install from Source

git clone https://github.com/akhshyganesh/voice-assistant-transcriber.git
cd voice-assistant-transcriber
pip install -e .

Run Tests

python test/test_integration.py

📄 Requirements

  • Python 3.8+
  • OpenAI Whisper
  • sounddevice (for microphone input)
  • numpy

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments
