
🎤 Locivox

Local Voice Transcription System - Privacy-first, model-agnostic speech-to-text powered by AI

Locivox (from Latin loci, "of the place", and vox, "voice") is an open-source speech-to-text (STT) system designed to run entirely on your machine, with no cloud dependencies. It starts with Whisper and is built so that other models can be added.


✨ Features (Phase 1 - MVP)

  • ✅ Real-time microphone capture with configurable settings
  • ✅ Multiple STT engines: Faster-Whisper (recommended) and OpenAI-Whisper
  • ✅ CPU-optimized for laptops without a GPU
  • ✅ Model-agnostic architecture - easily add new engines
  • ✅ Multiple output formats: TXT, JSON, SRT subtitles
  • ✅ Automatic language detection or manual selection
  • ✅ Self-contained virtual environment - no global Python dependencies (FFmpeg must still be installed system-wide)
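The model-agnostic bullet implies a pluggable engine layer. The sketch below shows one common way to build one, assuming every backend exposes a single `transcribe(audio_path)` method. `Engine`, `register_engine`, `get_engine`, and the toy `EchoEngine` are hypothetical names, not the actual contents of `src/transcriber.py`.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class Engine(ABC):
    """Hypothetical common interface that every STT backend implements."""

    def __init__(self, model_size: str = "base") -> None:
        self.model_size = model_size

    @abstractmethod
    def transcribe(self, audio_path: str) -> str:
        """Return the transcript text for an audio file."""


# Registry mapping config names (e.g. "faster-whisper") to engine classes.
_ENGINES: Dict[str, Type[Engine]] = {}


def register_engine(name: str) -> Callable[[Type[Engine]], Type[Engine]]:
    """Class decorator that makes an engine selectable by name."""
    def decorator(cls: Type[Engine]) -> Type[Engine]:
        _ENGINES[name] = cls
        return cls
    return decorator


def get_engine(name: str, model_size: str = "base") -> Engine:
    """Instantiate the engine named by a config value such as model.engine."""
    cls = _ENGINES.get(name)
    if cls is None:
        raise ValueError(f"Unknown engine {name!r}; available: {sorted(_ENGINES)}")
    return cls(model_size)


@register_engine("echo")
class EchoEngine(Engine):
    """Toy engine for testing the plumbing without loading a real model."""

    def transcribe(self, audio_path: str) -> str:
        return f"[{self.model_size}] {audio_path}"


engine = get_engine("echo", model_size="small")
print(engine.transcribe("audio.wav"))  # [small] audio.wav
```

With this layout, adding a backend such as Vosk would mean writing one new decorated class; the rest of the CLI only ever calls `get_engine(...)`.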

🚀 Quick Start

Prerequisites

  • Python 3.9 or higher
  • FFmpeg (required for audio processing)

Install FFmpeg:

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg

# Windows (use Chocolatey)
choco install ffmpeg
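Before installing, you can confirm both prerequisites from Python itself. This is a small standalone sketch using only the standard library (`shutil.which` and `subprocess`); `check_prerequisites` is a hypothetical helper, not part of Locivox.

```python
import shutil
import subprocess
import sys


def check_prerequisites() -> dict:
    """Report whether the Python version and FFmpeg requirements are met."""
    python_ok = sys.version_info >= (3, 9)
    ffmpeg_path = shutil.which("ffmpeg")  # None if ffmpeg is not on PATH
    ffmpeg_version = None
    if ffmpeg_path:
        # `ffmpeg -version` prints its version string on the first line of stdout
        result = subprocess.run([ffmpeg_path, "-version"], capture_output=True, text=True)
        lines = result.stdout.splitlines()
        ffmpeg_version = lines[0] if lines else None
    return {"python_ok": python_ok, "ffmpeg_path": ffmpeg_path, "ffmpeg_version": ffmpeg_version}


if __name__ == "__main__":
    status = check_prerequisites()
    print("Python >= 3.9:", status["python_ok"])
    print("FFmpeg:", status["ffmpeg_version"] or "NOT FOUND on PATH")
```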

Installation

  1. Clone or download the project, then enter its directory:
cd locivox
  2. Create and activate a virtual environment:
python -m venv venv

# Activate it:
# macOS/Linux:
source venv/bin/activate

# Windows:
venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

Model weights are not installed by this step: the selected model is downloaded automatically the first time you run Locivox (~140MB for the base model).


💻 Usage

Interactive Recording Mode

Record from your microphone and transcribe:

python src/cli.py

Workflow:

  1. Select your microphone device
  2. Press ENTER to start recording
  3. Speak into your microphone
  4. Press ENTER to stop
  5. The transcription is printed to the console and saved to the output/ folder
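The ENTER-to-start / ENTER-to-stop flow maps naturally onto a callback-driven input stream. Below is a minimal sketch, assuming the `sounddevice` library (credited in the Acknowledgments) and the standard-library `wave` module. `record_until_enter` and `save_wav` are hypothetical helpers, not the code in `src/audio_capture.py`, and 16 kHz mono 16-bit PCM is assumed to match `audio.sample_rate` in config.yaml.

```python
import queue
import wave

SAMPLE_RATE = 16000  # assumed to match audio.sample_rate in config.yaml
CHANNELS = 1
SAMPLE_WIDTH = 2  # bytes per sample for 16-bit PCM


def save_wav(path: str, pcm: bytes, sample_rate: int = SAMPLE_RATE) -> None:
    """Write raw 16-bit mono PCM bytes to a WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)


def record_until_enter(device=None, sample_rate: int = SAMPLE_RATE) -> bytes:
    """Record between two ENTER presses and return the raw PCM bytes."""
    import sounddevice as sd  # lazy import so save_wav works without PortAudio

    chunks: "queue.Queue[bytes]" = queue.Queue()

    def callback(indata, frames, time, status):
        if status:
            print(status)  # e.g. input overflow warnings
        chunks.put(bytes(indata))  # copy the buffer; it is reused after the callback

    input("Press ENTER to start recording...")
    with sd.RawInputStream(samplerate=sample_rate, channels=CHANNELS, dtype="int16",
                           device=device, callback=callback):
        input("Recording... press ENTER to stop.")

    # The stream is closed, so no more callbacks arrive: drain what was captured.
    parts = []
    while not chunks.empty():
        parts.append(chunks.get_nowait())
    return b"".join(parts)
```

For example, `save_wav("output/recording.wav", record_until_enter())` records one take and writes it to disk (the output/ folder must already exist).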

Transcribe Existing Audio File

python src/cli.py --file path/to/audio.wav
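Under the hood, a file transcription with the Faster-Whisper engine could look like the sketch below. `WhisperModel` and its `transcribe` method come from the `faster-whisper` package; the wrappers `transcribe_file` and `normalize_language` are hypothetical, and mapping `"auto"` to `None` (which makes Whisper detect the language) is an assumption about how the language setting is interpreted.

```python
from typing import List, Optional, Tuple


def normalize_language(language: str) -> Optional[str]:
    """Map the config value "auto" to None, which asks Whisper to detect the language."""
    return None if language.lower() == "auto" else language


def transcribe_file(path: str, model_size: str = "base",
                    language: str = "auto") -> Tuple[str, List[Tuple[float, float, str]]]:
    """Return (detected_or_forced_language, [(start, end, text), ...])."""
    from faster_whisper import WhisperModel  # lazy import: pip install faster-whisper

    # int8 quantization reduces memory use and speeds up inference on CPU-only laptops
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(path, language=normalize_language(language))
    # segments is a generator: iterating it is what actually runs the transcription
    result = [(seg.start, seg.end, seg.text.strip()) for seg in segments]
    return info.language, result
```

Because `segments` is lazy, the list comprehension is where the model does its work; returning the generator directly would defer all processing to the caller.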

Advanced Options

# Use a different model size
python src/cli.py --model small

# Force a specific language (skip auto-detection)
python src/cli.py --language es

# Change output format
python src/cli.py --output-format json

# Use custom config file
python src/cli.py --config my_config.yaml

# Combine options
python src/cli.py --file audio.mp3 --model medium --output-format srt
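The options above could be declared with the standard-library `argparse` module, as sketched below. The choices mirror the values listed in config.yaml; leaving `--model`, `--language`, and `--output-format` at `None` so that config.yaml values apply unless overridden is an assumption, not confirmed behavior of `src/cli.py`.

```python
import argparse

MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]
OUTPUT_FORMATS = ["txt", "json", "srt"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="locivox", description="Local speech-to-text")
    parser.add_argument("--file", help="Transcribe this audio file instead of recording")
    parser.add_argument("--model", choices=MODEL_SIZES, help="Whisper model size")
    parser.add_argument("--language", help='Language code such as "es", or "auto"')
    parser.add_argument("--output-format", choices=OUTPUT_FORMATS, help="Transcript format")
    parser.add_argument("--config", default="config.yaml", help="Path to a YAML config file")
    return parser


# Equivalent of: python src/cli.py --file audio.mp3 --model medium --output-format srt
args = build_parser().parse_args(
    ["--file", "audio.mp3", "--model", "medium", "--output-format", "srt"]
)
print(args.file, args.model, args.output_format)  # audio.mp3 medium srt
```

Using `choices` means a typo such as `--model huge` is rejected by argparse with a usage message before any model is loaded.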

⚙️ Configuration

Edit config.yaml to customize behavior:

model:
  engine: "faster-whisper"  # or "openai-whisper"
  size: "base"              # tiny, base, small, medium, large
  language: "en"            # or "auto" for detection

audio:
  sample_rate: 16000        # Whisper expects 16kHz
  chunk_duration: 5         # Seconds per chunk

output:
  format: "txt"             # txt, json, srt
  timestamp: true           # Include timestamp in filename
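The three output formats can all be produced from the same list of timed segments, as in this sketch. The SRT layout (numbered cues, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, a blank line between cues) is the standard subtitle format; the JSON shape, the `transcript_` filename prefix, and the timestamp pattern are hypothetical choices illustrating `output.timestamp`.

```python
import json
from datetime import datetime
from typing import List, Optional, Tuple

Segment = Tuple[float, float, str]  # (start_seconds, end_seconds, text)


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> '00:00:03,500'."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def render(segments: List[Segment], fmt: str) -> str:
    """Render segments as txt, json, or srt text."""
    if fmt == "txt":
        return " ".join(text for _, _, text in segments) + "\n"
    if fmt == "json":
        data = {"segments": [{"start": s, "end": e, "text": t} for s, e, t in segments]}
        return json.dumps(data, indent=2)
    if fmt == "srt":
        cues = [f"{i}\n{srt_timestamp(s)} --> {srt_timestamp(e)}\n{t}\n"
                for i, (s, e, t) in enumerate(segments, start=1)]
        return "\n".join(cues)  # blank line between cues
    raise ValueError(f"Unsupported format: {fmt}")


def output_filename(fmt: str, timestamp: bool = True, now: Optional[datetime] = None) -> str:
    """Build a name like transcript_20240101_120000.srt (hypothetical naming scheme)."""
    if not timestamp:
        return f"transcript.{fmt}"
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"transcript_{stamp}.{fmt}"
```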

Model Sizes & Performance

Model    Parameters  Speed (CPU)  Quality  Memory
tiny     39M         ~10x RT      Basic    <1GB
base     74M         ~5x RT       Good     ~1GB
small    244M        ~3x RT       Better   ~2GB
medium   769M        ~1x RT       Great    ~5GB
large    1.5B        ~0.5x RT     Best     ~10GB

RT = real-time factor: ~5x RT means 60 seconds of audio takes about 12 seconds to transcribe; 1x keeps pace with speech, and 0.5x takes twice as long as the audio itself.

Recommendation: Start with base for the best speed/quality balance on CPU.
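Since the speed column is a real-time factor, the expected processing time is simply the audio length divided by that factor. A small sketch using the approximate CPU figures from the table; `RT_FACTOR` and `estimated_seconds` are illustrative names, and real speed depends on your hardware.

```python
# Approximate CPU real-time factors from the table above (hardware-dependent).
RT_FACTOR = {"tiny": 10.0, "base": 5.0, "small": 3.0, "medium": 1.0, "large": 0.5}


def estimated_seconds(audio_seconds: float, model: str) -> float:
    """Estimated wall-clock transcription time: audio length / RT factor."""
    return audio_seconds / RT_FACTOR[model]


for name in RT_FACTOR:
    print(f"{name:>6}: 60 s of audio -> ~{estimated_seconds(60, name):.0f} s")
```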


📁 Project Structure

locivox/
├── venv/                   # Virtual environment (created on setup)
├── src/
│   ├── __init__.py         # Package init
│   ├── cli.py              # Main CLI entry point
│   ├── audio_capture.py    # Microphone recording
│   ├── transcriber.py      # STT engine wrappers
│   └── utils.py            # Helper functions
├── output/                 # Generated transcripts
├── logs/                   # Application logs
├── models/                 # Downloaded models (auto-created)
├── config.yaml             # User configuration
├── requirements.txt        # Python dependencies
└── README.md               # This file

🛠️ Troubleshooting

"No audio devices found"

# List available devices
python -c "import sounddevice; print(sounddevice.query_devices())"
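`query_devices()` also lists output-only devices, so a microphone picker would normally keep only entries with input channels. A sketch, assuming the `sounddevice` device dictionaries (which include `name` and `max_input_channels` keys); `input_devices` and `list_microphones` are hypothetical helpers.

```python
from typing import Iterable, List, Mapping, Tuple


def input_devices(devices: Iterable[Mapping]) -> List[Tuple[int, str]]:
    """Return (index, name) for devices that can record (max_input_channels > 0)."""
    return [(i, d["name"]) for i, d in enumerate(devices)
            if d.get("max_input_channels", 0) > 0]


def list_microphones() -> List[Tuple[int, str]]:
    """Query PortAudio through sounddevice and keep only capture devices."""
    import sounddevice as sd  # lazy import; requires the PortAudio library
    return input_devices(sd.query_devices())
```

If `list_microphones()` returns an empty list, PortAudio is not seeing any capture device, which often points to an OS-level permission or driver issue rather than a Locivox setting.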

"FFmpeg not found"

Ensure FFmpeg is installed and in your PATH:

ffmpeg -version

Slow transcription on CPU

  • Use faster-whisper engine (2-4x faster than openai-whisper)
  • Use smaller models (tiny/base)
  • Reduce chunk duration in config
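The `chunk_duration` setting determines how much audio is handed over at a time: at 16 kHz, 5 seconds is 16000 × 5 = 80,000 samples. The sketch below only illustrates that arithmetic; `chunk_samples` is a hypothetical helper, and how Locivox consumes chunks internally is not documented here.

```python
from typing import Iterator, Sequence


def chunk_samples(samples: Sequence[float], sample_rate: int = 16000,
                  chunk_duration: float = 5) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of chunk_duration seconds; the last may be shorter."""
    size = int(sample_rate * chunk_duration)  # 16000 * 5 = 80000 samples
    if size <= 0:
        raise ValueError("chunk_duration must be positive")
    for start in range(0, len(samples), size):
        yield samples[start:start + size]
```

Shorter chunks mean each call processes less audio, at the cost of giving the model less surrounding context per call.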

Import errors

Make sure the virtual environment is activated:

# Check if venv is active (should show venv path)
which python  # macOS/Linux
where python  # Windows
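A platform-independent alternative to `which`/`where` is to ask the interpreter directly: inside a venv, `sys.prefix` points at the environment while `sys.base_prefix` still points at the base Python installation. `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv (sys.prefix differs from sys.base_prefix)."""
    return sys.prefix != sys.base_prefix


print("venv active:", in_virtualenv(), "| interpreter:", sys.executable)
```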

🗺️ Roadmap

  • Phase 1: MVP CLI (You are here!)
  • Phase 2: Real-time streaming with chunked processing
  • Phase 3: Enhanced CLI with speaker diarization, multiple formats
  • Phase 4: GUI Desktop App with Electron/PyQt
  • Phase 5: Advanced features (translation, punctuation, custom vocabulary)
  • Phase 6: Multi-platform distribution with installers

See ROADMAP.md for the detailed timeline.


🤝 Contributing

Contributions are welcome! This is an open-source project.

Areas to contribute:

  • New STT engine integrations (Vosk, Coqui, wav2vec2)
  • Performance optimizations
  • GUI development
  • Documentation improvements
  • Bug fixes and testing

📄 License

MIT License. See the LICENSE file.


🙏 Acknowledgments

  • OpenAI Whisper - State-of-the-art STT model
  • Faster-Whisper - Optimized inference engine
  • sounddevice - Python audio library

📞 Support

  • Issues: Open an issue on GitHub
  • Discussions: Start a discussion for features/ideas
  • Logs: Check logs/locivox.log for debugging
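A log file like logs/locivox.log is typically set up with the standard-library `logging` module. The sketch below is an assumption about how such a file could be configured; `setup_logging`, the logger name `"locivox"`, and the format string are hypothetical, and only the file path comes from this README.

```python
import logging
from pathlib import Path


def setup_logging(log_dir: str = "logs", level: int = logging.INFO) -> logging.Logger:
    """Send log records to <log_dir>/locivox.log, creating the folder if needed."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("locivox")
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(Path(log_dir) / "locivox.log", encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
        logger.addHandler(handler)
    return logger
```

Setting `level=logging.DEBUG` would capture more detail when reporting an issue.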

Built with ❤️ for privacy-conscious developers



Download files


Source Distribution

locivox-0.4.0.tar.gz (77.7 kB)

Built Distribution


locivox-0.4.0-py3-none-any.whl (70.8 kB)

File details

Details for the file locivox-0.4.0.tar.gz.

File metadata

  • Download URL: locivox-0.4.0.tar.gz
  • Size: 77.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for locivox-0.4.0.tar.gz
Algorithm Hash digest
SHA256 bf9b7502d9bdc58a50a814ab5af354ae7a882efd47c2b825a6b0a754842f2bc7
MD5 ad2920ec4d080e15d5d8dc4aca177c03
BLAKE2b-256 97453203d1be4fe02279fbd1a45673558c845b8d3199b81746fbb342e87e4704


Provenance

The following attestation bundles were made for locivox-0.4.0.tar.gz:

Publisher: release.yml on mudaye/locivox

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file locivox-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: locivox-0.4.0-py3-none-any.whl
  • Size: 70.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for locivox-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a87aa57be2fda14c143d101b053db4495f0cafbac42c837af48206116be4d2dc
MD5 14af7a54f20a679f9033f91879bc4b5c
BLAKE2b-256 f2e11e0b1750f3199f5d8780467227f3d4ac45821749346006d98c43386819e2


Provenance

The following attestation bundles were made for locivox-0.4.0-py3-none-any.whl:

Publisher: release.yml on mudaye/locivox

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
