🎤 Locivox
Local Voice Transcription System - Privacy-first, model-agnostic speech-to-text powered by AI
Locivox (Latin: loci = local, vox = voice) is an open-source STT system designed to run entirely on your machine with no cloud dependencies. Start with Whisper, expand to any model.
✨ Features (Phase 1 - MVP)
- ✅ Real-time microphone capture with configurable settings
- ✅ Multiple STT engines: Faster-Whisper (recommended) and OpenAI-Whisper
- ✅ CPU-optimized for laptops without GPU
- ✅ Model-agnostic architecture - easily add new engines
- ✅ Multiple output formats: TXT, JSON, SRT subtitles
- ✅ Automatic language detection or manual selection
- ✅ Self-contained virtual environment - no global dependencies
🚀 Quick Start
Prerequisites
- Python 3.9 or higher
- FFmpeg (required for audio processing)
Install FFmpeg:
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg
# Windows (use Chocolatey)
choco install ffmpeg
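If you want to verify the FFmpeg prerequisite from Python (for example in a setup script), a minimal check with the standard library's `shutil.which` could look like this. The `check_ffmpeg` helper is a hypothetical illustration, not part of Locivox itself:

```python
import shutil

def check_ffmpeg() -> bool:
    """Return True if an ffmpeg executable is found on PATH."""
    return shutil.which("ffmpeg") is not None

if __name__ == "__main__":
    if check_ffmpeg():
        print("FFmpeg found")
    else:
        print("FFmpeg missing - install it before transcribing")
```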
Installation
- Clone or download the project:
cd locivox
- Create virtual environment:
python -m venv venv
# Activate it:
# macOS/Linux:
source venv/bin/activate
# Windows:
venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
Model weights (~140 MB for the base model) are downloaded automatically on first run.
💻 Usage
Interactive Recording Mode
Record from your microphone and transcribe:
python src/cli.py
Workflow:
- Select your microphone device
- Press ENTER to start recording
- Speak into your microphone
- Press ENTER to stop
- Transcription appears in the console and is saved to the output/ folder
Transcribe Existing Audio File
python src/cli.py --file path/to/audio.wav
Advanced Options
# Use a different model size
python src/cli.py --model small
# Force a specific language (skip auto-detection)
python src/cli.py --language es
# Change output format
python src/cli.py --output-format json
# Use custom config file
python src/cli.py --config my_config.yaml
# Combine options
python src/cli.py --file audio.mp3 --model medium --output-format srt
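To illustrate what the SRT output format involves, here is a sketch of turning timed segments into an SRT string. The `(start, end, text)` tuples and the `srt_timestamp`/`segments_to_srt` helpers are hypothetical, not Locivox's actual internals:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> 00:00:03,500."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render (start, end, text) tuples as an SRT document."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)

print(segments_to_srt([(0.0, 2.5, "Hello world."), (2.5, 5.0, "This runs locally.")]))
```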
⚙️ Configuration
Edit config.yaml to customize behavior:
model:
engine: "faster-whisper" # or "openai-whisper"
size: "base" # tiny, base, small, medium, large
language: "en" # or "auto" for detection
audio:
sample_rate: 16000 # Whisper expects 16kHz
chunk_duration: 5 # Seconds per chunk
output:
format: "txt" # txt, json, srt
timestamp: true # Include timestamp in filename
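As a sketch of how such a file could be loaded with PyYAML, falling back to defaults for any missing keys, consider the snippet below. The `DEFAULTS` dict and `load_config` helper are illustrative assumptions, not the project's actual code:

```python
import yaml  # PyYAML

DEFAULTS = {
    "model": {"engine": "faster-whisper", "size": "base", "language": "auto"},
    "audio": {"sample_rate": 16000, "chunk_duration": 5},
    "output": {"format": "txt", "timestamp": True},
}

def load_config(text: str) -> dict:
    """Merge a YAML config string over the defaults (one section deep)."""
    user = yaml.safe_load(text) or {}
    merged = {}
    for section, defaults in DEFAULTS.items():
        merged[section] = {**defaults, **user.get(section, {})}
    return merged

cfg = load_config("model:\n  size: small\n")
print(cfg["model"]["size"], cfg["audio"]["sample_rate"])  # -> small 16000
```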
Model Sizes & Performance
| Model | Size | Speed (CPU) | Quality | Memory |
|---|---|---|---|---|
| tiny | 39M | ~10x RT | Basic | <1GB |
| base | 74M | ~5x RT | Good | ~1GB |
| small | 244M | ~3x RT | Better | ~2GB |
| medium | 769M | ~1x RT | Great | ~5GB |
| large | 1.5G | ~0.5x RT | Best | ~10GB |
RT = Real-time (1x means transcribes at speaking speed)
Recommendation: Start with base for best speed/quality balance on CPU.
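The RT factors above translate directly into wall-clock estimates: at ~5x RT, a 10-minute recording takes about 2 minutes to transcribe. A small illustrative calculator (the `estimate_seconds` helper is hypothetical):

```python
def estimate_seconds(audio_seconds: float, rt_factor: float) -> float:
    """Wall-clock transcription time for a clip at a given real-time factor."""
    return audio_seconds / rt_factor

# 10 minutes of audio with the base model (~5x RT) -> about 120 s
print(estimate_seconds(600, 5.0))   # -> 120.0
# Same clip with large (~0.5x RT) -> about 1200 s, twice the clip length
print(estimate_seconds(600, 0.5))   # -> 1200.0
```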
📁 Project Structure
locivox/
├── venv/                # Virtual environment (created on setup)
├── src/
│   ├── __init__.py      # Package init
│   ├── cli.py           # Main CLI entry point
│   ├── audio_capture.py # Microphone recording
│   ├── transcriber.py   # STT engine wrappers
│   └── utils.py         # Helper functions
├── output/              # Generated transcripts
├── logs/                # Application logs
├── models/              # Downloaded models (auto-created)
├── config.yaml          # User configuration
├── requirements.txt     # Python dependencies
└── README.md            # This file
🛠️ Troubleshooting
"No audio devices found"
# List available devices
python -c "import sounddevice; print(sounddevice.query_devices())"
"FFmpeg not found"
Ensure FFmpeg is installed and in your PATH:
ffmpeg -version
Slow transcription on CPU
- Use the faster-whisper engine (2-4x faster than openai-whisper)
- Use smaller models (tiny/base)
- Reduce chunk duration in config
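To make the chunk-duration tradeoff concrete, here is a sketch of splitting a sample buffer into fixed-length chunks at 16 kHz; shorter chunks mean less memory and lower latency per chunk, at the cost of more model invocations. The `split_chunks` helper is an illustrative assumption, not Locivox's actual implementation:

```python
def split_chunks(samples, sample_rate=16000, chunk_duration=5):
    """Split a flat list of audio samples into chunks of chunk_duration seconds."""
    size = sample_rate * chunk_duration
    return [samples[i:i + size] for i in range(0, len(samples), size)]

# 12 s of (silent) audio at 16 kHz split into 5 s chunks -> 5 s + 5 s + 2 s
chunks = split_chunks([0.0] * (16000 * 12))
print([len(c) / 16000 for c in chunks])  # -> [5.0, 5.0, 2.0]
```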
Import errors
Make sure virtual environment is activated:
# Check if venv is active (should show venv path)
which python # macOS/Linux
where python # Windows
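Alternatively, you can check from Python itself: inside a venv, `sys.prefix` differs from `sys.base_prefix`. This standard-library snippet (the `in_virtualenv` helper is illustrative) reports which interpreter you are in:

```python
import sys

def in_virtualenv() -> bool:
    """True when running inside a venv/virtualenv-style environment."""
    return sys.prefix != sys.base_prefix

print("venv active" if in_virtualenv() else "venv NOT active")
```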
🗺️ Roadmap
- Phase 1: MVP CLI (You are here!)
- Phase 2: Real-time streaming with chunked processing
- Phase 3: Enhanced CLI with speaker diarization, multiple formats
- Phase 4: GUI Desktop App with Electron/PyQt
- Phase 5: Advanced features (translation, punctuation, custom vocabulary)
- Phase 6: Multi-platform distribution with installers
See ROADMAP.md for detailed timeline.
🤝 Contributing
Contributions welcome! This is an open-source project.
Areas to contribute:
- New STT engine integrations (Vosk, Coqui, wav2vec2)
- Performance optimizations
- GUI development
- Documentation improvements
- Bug fixes and testing
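For engine integrations, the "model-agnostic architecture" mentioned above can be pictured as a common interface plus a registry. The `STTEngine` base class and `register` decorator below are a hedged sketch of that pattern, not the actual classes in src/transcriber.py:

```python
from abc import ABC, abstractmethod

ENGINES: dict = {}

def register(name: str):
    """Class decorator that adds an engine to the global registry."""
    def wrap(cls):
        ENGINES[name] = cls
        return cls
    return wrap

class STTEngine(ABC):
    @abstractmethod
    def transcribe(self, audio_path: str) -> str:
        """Return the transcript of the given audio file."""

@register("echo")
class EchoEngine(STTEngine):
    """Toy engine for testing: 'transcribes' the file name."""
    def transcribe(self, audio_path: str) -> str:
        return f"(transcript of {audio_path})"

engine = ENGINES["echo"]()
print(engine.transcribe("audio.wav"))  # -> (transcript of audio.wav)
```

A new backend (e.g. a Vosk wrapper) would subclass `STTEngine`, implement `transcribe`, and register under its own name, leaving the CLI untouched.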
📄 License
MIT License - See LICENSE file
🙏 Acknowledgments
- OpenAI Whisper - State-of-the-art STT model
- Faster-Whisper - Optimized inference engine
- sounddevice - Python audio library
💬 Support
- Issues: Open an issue on GitHub
- Discussions: Start a discussion for features/ideas
- Logs: Check logs/locivox.log for debugging
Built with ❤️ for privacy-conscious developers
File details
Details for the file locivox-0.4.1.tar.gz.
File metadata
- Download URL: locivox-0.4.1.tar.gz
- Size: 77.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a42dd8a583dcd6beea8299b91ac1ff2bfeb1544bbbb8c0096f7afca7c588a258 |
| MD5 | dfd01bcaef2a69366da98a52174d8d45 |
| BLAKE2b-256 | 3cfa1462e77ab9ddea54c2ab411e7686f560cda91057c98bdf6614dfb4462763 |
Provenance
The following attestation bundle was made for locivox-0.4.1.tar.gz:
- Publisher: release.yml on mudaye/locivox
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: locivox-0.4.1.tar.gz
- Subject digest: a42dd8a583dcd6beea8299b91ac1ff2bfeb1544bbbb8c0096f7afca7c588a258
- Sigstore transparency entry: 956315855
- Permalink: mudaye/locivox@fb6ef03f2a22640abd472e457ea5e93f92755db5
- Branch / Tag: refs/tags/v0.4.1
- Owner: https://github.com/mudaye
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@fb6ef03f2a22640abd472e457ea5e93f92755db5
- Trigger Event: push
File details
Details for the file locivox-0.4.1-py3-none-any.whl.
File metadata
- Download URL: locivox-0.4.1-py3-none-any.whl
- Size: 70.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3f7f2129d8639a1b94619ea0ec208692124adfa5b959eeca94a688cc9681f958 |
| MD5 | 8d6754521e2c2691343ea91d9e710beb |
| BLAKE2b-256 | f9f8b9dec5a750c30e36db3fca9342e7be9eab3d976cd532fd348d0c783f3de6 |
Provenance
The following attestation bundle was made for locivox-0.4.1-py3-none-any.whl:
- Publisher: release.yml on mudaye/locivox
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: locivox-0.4.1-py3-none-any.whl
- Subject digest: 3f7f2129d8639a1b94619ea0ec208692124adfa5b959eeca94a688cc9681f958
- Sigstore transparency entry: 956315858
- Permalink: mudaye/locivox@fb6ef03f2a22640abd472e457ea5e93f92755db5
- Branch / Tag: refs/tags/v0.4.1
- Owner: https://github.com/mudaye
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@fb6ef03f2a22640abd472e457ea5e93f92755db5
- Trigger Event: push