Local Voice Transcription System - Privacy-first, model-agnostic speech-to-text

🎤 Locivox

Local Voice Transcription System - Privacy-first, model-agnostic speech-to-text powered by AI

Locivox (Latin: loci = local, vox = voice) is an open-source STT system designed to run entirely on your machine with no cloud dependencies. Start with Whisper, expand to any model.


✨ Features (Phase 1 - MVP)

  • ✅ Real-time microphone capture with configurable settings
  • ✅ Multiple STT engines: Faster-Whisper (recommended) and OpenAI-Whisper
  • ✅ CPU-optimized for laptops without GPU
  • ✅ Model-agnostic architecture - easily add new engines
  • ✅ Multiple output formats: TXT, JSON, SRT subtitles
  • ✅ Automatic language detection or manual selection
  • ✅ Self-contained virtual environment - no global dependencies
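
To make the SRT output format listed above concrete, here is a minimal sketch of how timed segments can be rendered as SubRip subtitles. The function names and the `(start, end, text)` tuple shape are assumptions made for illustration; they are not taken from the Locivox source.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render (start, end, text) tuples as an SRT document (hypothetical helper)."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Subtitle blocks are separated by one blank line.
    return "\n".join(blocks)


print(segments_to_srt([(0.0, 2.5, "Hello"), (2.5, 4.0, "world")]))
```

Each block is a running index, a time range, and the text, which is all an SRT-aware player needs.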

🚀 Quick Start

Prerequisites

  • Python 3.9 or higher
  • FFmpeg (required for audio processing)

Install FFmpeg:

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg

# Windows (use Chocolatey)
choco install ffmpeg

Installation

  1. Clone or download the project, then enter its directory:
cd locivox
  2. Create a virtual environment:
python -m venv venv

# Activate it:
# macOS/Linux:
source venv/bin/activate

# Windows:
venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

Model weights are not part of the install step; they are downloaded the first time you run a transcription (~140MB for the base model).


💻 Usage

Interactive Recording Mode

Record from your microphone and transcribe:

python src/cli.py

Workflow:

  1. Select your microphone device
  2. Press ENTER to start recording
  3. Speak into your microphone
  4. Press ENTER to stop
  5. Transcription appears in console and saves to output/ folder

Transcribe Existing Audio File

python src/cli.py --file path/to/audio.wav
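
For readers curious what file transcription with the Faster-Whisper engine might look like under the hood, here is a rough sketch. It uses the `WhisperModel` class from the `faster_whisper` package; the function names, the `int8` compute type, and the way segments are joined are assumptions for illustration, not Locivox's actual implementation.

```python
def join_segments(segments):
    """Concatenate segment texts into one transcript string (hypothetical helper)."""
    return " ".join(seg.text.strip() for seg in segments)


def transcribe_file(path, model_size="base", language=None):
    """Transcribe an audio file on CPU; language=None lets the model detect it."""
    # Imported lazily so join_segments works without the package installed.
    from faster_whisper import WhisperModel

    # int8 quantization is an assumed CPU-friendly setting, not a Locivox default.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(path, language=language)
    # `segments` is a lazy generator; joining it is what runs the decoding.
    return info.language, join_segments(segments)
```

Calling `transcribe_file("path/to/audio.wav")` would return the detected language code together with the transcript text.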

Advanced Options

# Use a different model size
python src/cli.py --model small

# Force a specific language (skip auto-detection)
python src/cli.py --language es

# Change output format
python src/cli.py --output-format json

# Use custom config file
python src/cli.py --config my_config.yaml

# Combine options
python src/cli.py --file audio.mp3 --model medium --output-format srt
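
The flags above could be declared with `argparse` roughly as follows. This is a sketch for orientation only: the flag names come from the examples above, while the defaults, help strings, and the `build_parser` name are assumptions and may differ from the real `src/cli.py`.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (a sketch, not the real cli.py)."""
    parser = argparse.ArgumentParser(prog="cli.py", description="Local speech-to-text")
    parser.add_argument("--file", help="audio file to transcribe; omit to record from the microphone")
    parser.add_argument("--model", default="base",
                        choices=["tiny", "base", "small", "medium", "large"])
    parser.add_argument("--language", default=None,
                        help="language code such as 'es'; omit for auto-detection")
    parser.add_argument("--output-format", default="txt", choices=["txt", "json", "srt"])
    parser.add_argument("--config", default="config.yaml", help="path to a YAML config file")
    return parser


# Same combination as the "Combine options" example.
args = build_parser().parse_args(
    ["--file", "audio.mp3", "--model", "medium", "--output-format", "srt"]
)
print(args.file, args.model, args.output_format)
```

Note that `argparse` exposes `--output-format` as `args.output_format`, with the hyphen turned into an underscore.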

⚙️ Configuration

Edit config.yaml to customize behavior:

model:
  engine: "faster-whisper"  # or "openai-whisper"
  size: "base"              # tiny, base, small, medium, large
  language: "en"            # or "auto" for detection

audio:
  sample_rate: 16000        # Whisper expects 16kHz
  chunk_duration: 5         # Seconds per chunk

output:
  format: "txt"             # txt, json, srt
  timestamp: true           # Include timestamp in filename
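
A typo in config.yaml is easier to catch before a model is loaded. The sketch below checks a config dictionary (as you would get from parsing the YAML above) against the values listed in the comments. The `validate_config` function and the `ALLOWED` table are hypothetical and not part of Locivox; the `language` key is deliberately left unchecked because the full list of accepted codes is not documented here.

```python
ALLOWED = {
    ("model", "engine"): {"faster-whisper", "openai-whisper"},
    ("model", "size"): {"tiny", "base", "small", "medium", "large"},
    ("output", "format"): {"txt", "json", "srt"},
}


def validate_config(config):
    """Return a list of problems; an empty list means the config looks valid."""
    problems = []
    for (section, key), allowed in ALLOWED.items():
        value = config.get(section, {}).get(key)
        if value not in allowed:
            problems.append(f"{section}.{key}={value!r} is not one of {sorted(allowed)}")
    audio = config.get("audio", {})
    for key in ("sample_rate", "chunk_duration"):
        value = audio.get(key)
        # bool is a subclass of int, so exclude it explicitly.
        if not isinstance(value, (int, float)) or isinstance(value, bool) or value <= 0:
            problems.append(f"audio.{key}={value!r} must be a positive number")
    return problems


config = {
    "model": {"engine": "faster-whisper", "size": "base", "language": "en"},
    "audio": {"sample_rate": 16000, "chunk_duration": 5},
    "output": {"format": "txt", "timestamp": True},
}
print(validate_config(config) or "config looks valid")
```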

Model Sizes & Performance

Model    Parameters   Speed (CPU)   Quality   Memory
tiny     39M          ~10x RT       Basic     <1GB
base     74M          ~5x RT        Good      ~1GB
small    244M         ~3x RT        Better    ~2GB
medium   769M         ~1x RT        Great     ~5GB
large    1550M        ~0.5x RT      Best      ~10GB

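RT = real-time factor. 1x means audio is transcribed at speaking speed; ~5x means about five times faster (1 minute of audio in roughly 12 seconds); values below 1x are slower than real time.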

Recommendation: Start with base for best speed/quality balance on CPU.
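
As a quick worked example of the speed column, processing time is roughly the audio length divided by the real-time factor. The factors below are the approximate figures from the table, so the results are ballpark estimates that will vary with hardware; `estimated_seconds` is a name chosen for illustration.

```python
# Approximate real-time factors copied from the table above.
SPEED_FACTORS = {"tiny": 10.0, "base": 5.0, "small": 3.0, "medium": 1.0, "large": 0.5}


def estimated_seconds(audio_seconds, model_size):
    """Estimate processing time: audio length divided by the real-time factor."""
    return audio_seconds / SPEED_FACTORS[model_size]


for size in SPEED_FACTORS:
    print(f"{size:>6}: ~{estimated_seconds(600, size):.0f} s for 10 minutes of audio")
```

By this estimate a 10-minute recording takes about 2 minutes with base but about 20 minutes with large.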


📁 Project Structure

locivox/
├── venv/                   # Virtual environment (created on setup)
├── src/
│   ├── __init__.py         # Package init
│   ├── cli.py              # Main CLI entry point
│   ├── audio_capture.py    # Microphone recording
│   ├── transcriber.py      # STT engine wrappers
│   └── utils.py            # Helper functions
├── output/                 # Generated transcripts
├── logs/                   # Application logs
├── models/                 # Downloaded models (auto-created)
├── config.yaml             # User configuration
├── requirements.txt        # Python dependencies
└── README.md               # This file

🛠️ Troubleshooting

"No audio devices found"

# List available devices
python -c "import sounddevice; print(sounddevice.query_devices())"

"FFmpeg not found"

Ensure FFmpeg is installed and in your PATH:

ffmpeg -version
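
The same check can be made from Python, which helps when the shell and the interpreter see different PATH values. This is a small standard-library sketch; `ffmpeg_version` is a hypothetical helper name.

```python
import shutil
import subprocess


def ffmpeg_version():
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is not on PATH."""
    executable = shutil.which("ffmpeg")
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


print(ffmpeg_version() or "FFmpeg not found on PATH")
```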

Slow transcription on CPU

  • Use faster-whisper engine (2-4x faster than openai-whisper)
  • Use smaller models (tiny/base)
  • Reduce chunk duration in config
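
To see what the chunk duration setting controls, note that one chunk holds sample_rate × chunk_duration samples, so the defaults give 16000 × 5 = 80000 samples per chunk. The sketch below splits a sample sequence this way. It assumes simple fixed-size, non-overlapping chunks, which may not match how Locivox actually segments audio.

```python
def split_into_chunks(samples, sample_rate=16000, chunk_duration=5):
    """Split a sequence of samples into consecutive chunks of chunk_duration seconds."""
    chunk_size = int(sample_rate * chunk_duration)
    if chunk_size <= 0:
        raise ValueError("sample_rate * chunk_duration must be positive")
    # The final chunk may be shorter than chunk_size.
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]


twelve_seconds = [0] * (16000 * 12)  # 12 s of silence at 16 kHz
chunks = split_into_chunks(twelve_seconds)
print([len(chunk) for chunk in chunks])
```

Smaller chunks mean each unit of work finishes sooner, at the cost of more chunk boundaries falling inside words.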

Import errors

Make sure the virtual environment is activated:

# Check if venv is active (should show venv path)
which python  # macOS/Linux
where python  # Windows
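
A platform-independent alternative is to ask the interpreter itself. Inside a venv, `sys.prefix` points at the environment while `sys.base_prefix` points at the base installation. The helper name `in_virtualenv` is an assumption for this sketch.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv (prefix differs from the base install)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("venv active:", in_virtualenv(), "| interpreter:", sys.executable)
```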

🗺️ Roadmap

  • Phase 1: MVP CLI (You are here!)
  • Phase 2: Real-time streaming with chunked processing
  • Phase 3: Enhanced CLI with speaker diarization, multiple formats
  • Phase 4: GUI Desktop App with Electron/PyQt
  • Phase 5: Advanced features (translation, punctuation, custom vocabulary)
  • Phase 6: Multi-platform distribution with installers

See ROADMAP.md for detailed timeline.


🤝 Contributing

Contributions welcome! This is an open-source project.

Areas to contribute:

  • New STT engine integrations (Vosk, Coqui, wav2vec2)
  • Performance optimizations
  • GUI development
  • Documentation improvements
  • Bug fixes and testing
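
For contributors thinking about a new engine integration, one common way to keep an architecture model-agnostic is a shared interface plus a name-based registry. The sketch below illustrates that pattern only; `SttEngine`, `register_engine`, `create_engine`, and the toy `EchoEngine` are hypothetical names, and the real wrappers in src/transcriber.py may be organized differently.

```python
from abc import ABC, abstractmethod

ENGINES = {}


def register_engine(name):
    """Class decorator that records an engine under a config-friendly name."""
    def decorator(cls):
        ENGINES[name] = cls
        return cls
    return decorator


class SttEngine(ABC):
    """Hypothetical common interface every engine wrapper would implement."""

    @abstractmethod
    def transcribe(self, audio_path, language=None):
        """Return the transcript of audio_path as a string."""


@register_engine("echo")
class EchoEngine(SttEngine):
    """Toy engine used only to show the plug-in mechanism."""

    def transcribe(self, audio_path, language=None):
        return f"[transcript of {audio_path}]"


def create_engine(name):
    """Look up an engine by name, as a config value like model.engine might."""
    try:
        return ENGINES[name]()
    except KeyError:
        raise ValueError(f"unknown engine {name!r}; available: {sorted(ENGINES)}") from None


print(create_engine("echo").transcribe("audio.wav"))
```

With this shape, adding an engine means writing one class and one decorator line, without touching the code that selects engines.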

📄 License

MIT License - See LICENSE file


🙏 Acknowledgments

  • OpenAI Whisper - State-of-the-art STT model
  • Faster-Whisper - Optimized inference engine
  • sounddevice - Python audio library

📞 Support

  • Issues: Open an issue on GitHub
  • Discussions: Start a discussion for features/ideas
  • Logs: Check logs/locivox.log for debugging

Built with ❤️ for privacy-conscious developers

