GOOBITS STT - Pure speech-to-text engine with multiple operation modes
Goobits STT
Goobits STT is a pure speech-to-text engine with multiple operation modes and advanced text formatting. It offers real-time transcription, a WebSocket server, and text processing with internationalization support. It is built on Whisper models for accurate transcription across languages and use cases.
Related Projects
- Matilda - AI assistant
- Goobits STT - Speech-to-Text engine (this project)
- Goobits TTS - Text-to-Speech engine
- Goobits TTT - Text-to-Text processing
Table of Contents
- Installation
- Basic Usage
- Configuration
- Operation Modes
- Performance Optimization
- Text Formatting Features
- Server Deployment
- Testing & Development
- Model Comparison
- Audio Features
- Tech Stack
Installation
# Install globally with pipx (recommended)
pipx install . # Isolated environment
pipx install ".[dev]" # With development dependencies (quoted so zsh does not expand the brackets)
# Or with pip for development
pip install -e ".[dev]" # Editable install with dev dependencies
stt --version # Verify installation
stt --listen-once # Test basic functionality
Basic Usage
stt --listen-once # Single utterance with VAD
stt --conversation # Always listening mode
stt --tap-to-talk=f8 # Tap F8 to start/stop recording
stt --hold-to-talk=space # Hold spacebar to record
stt --server --port=8769 # Run WebSocket server
Configuration
# Edit main configuration
nano config.json
# Configure Whisper model
stt --model large-v3-turbo --language en
# Audio settings
stt --device "USB Audio" --sample-rate 16000
# Output formats
stt --format json | jq -r '.text'
stt --format text --no-formatting
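The `jq -r '.text'` example above suggests that JSON output carries the transcription in a `text` field. A minimal Python sketch of reading that field follows. It assumes that `--listen-once` and `--format json` can be combined and that stdout holds a single JSON object; neither is confirmed here, and `extract_text` and `listen_once_json` are illustrative names, not part of Goobits STT.

```python
import json
import subprocess


def extract_text(raw: str) -> str:
    """Return the "text" field from one JSON document printed by stt."""
    payload = json.loads(raw)
    # Only the "text" key is implied by the jq example; other keys are unknown.
    return str(payload.get("text", ""))


def listen_once_json() -> str:
    """Run stt for one utterance and return the transcribed text.

    Assumption: the two flags combine and stdout is a single JSON object.
    """
    result = subprocess.run(
        ["stt", "--listen-once", "--format", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return extract_text(result.stdout)


if __name__ == "__main__":
    # Parsing demo on a hand-written sample, so no microphone is needed.
    print(extract_text('{"text": "hello world"}'))
```

Keeping the parsing separate from the subprocess call makes the parsing testable without audio hardware.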
Operation Modes
# Quick transcription
stt --listen-once | llm-process
# Interactive conversation
stt --conversation | tts-speak
# Hotkey control
stt --tap-to-talk=f8 # Toggle recording with F8
stt --hold-to-talk=ctrl+space # Push-to-talk mode
# Server mode for remote clients
stt --server --host 0.0.0.0 --port 8769
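The pipelines above send transcriptions to a downstream program on stdin. A rough Python sketch of such a consumer is shown below. It assumes that `stt` writes one utterance per line to stdout, which is not confirmed here; `utterances` is an illustrative helper, and `llm-process` and `tts-speak` in the examples are treated as placeholders for your own tools.

```python
import io
from typing import Iterable, Iterator


def utterances(stream: Iterable[str]) -> Iterator[str]:
    """Yield non-empty, stripped lines from a text stream."""
    for line in stream:
        text = line.strip()
        if text:
            yield text


# Demo on an in-memory stream; in a real pipeline pass sys.stdin instead.
demo = io.StringIO("turn on the lights\n\nwhat time is it\n")
for text in utterances(demo):
    # Replace this print with a call into an LLM, a TTS engine, or a logger.
    print(f"heard: {text}")
```

In conversation mode the stream never ends, so iterating line by line (rather than reading all input first) lets each utterance be handled as soon as it arrives.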
Performance Optimization
# GPU acceleration (if available)
stt --model base --device cuda
# CPU optimization
stt --model tiny --device cpu
# Model selection by speed/quality
stt --model tiny # Fastest, lower quality
stt --model base # Balanced (default)
stt --model large-v3-turbo # Best quality
Text Formatting Features
# Advanced entity detection
stt --listen-once # "Call me at 555-123-4567" → "Call me at (555) 123-4567"
stt --listen-once # "Go to github dot com" → "Go to github.com"
stt --listen-once # "Three point one four" → "3.14"
# Multilingual support
stt --language es # Spanish formatting rules
stt --language en # English formatting (default)
# Disable formatting
stt --no-formatting # Raw transcription output
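To make the idea of entity formatting concrete, here is a toy Python sketch that handles two of the examples above (spoken decimals and spoken domains) with regular expressions. It is not how Goobits STT implements formatting; the project lists spaCy and pyparsing for that. The function name `format_entities` and the patterns are illustrative assumptions, and the rules cover only single-digit words.

```python
import re

DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
_DIGIT = "|".join(DIGITS)
# A spoken decimal: one digit word, "point", then one or more digit words.
DECIMAL = re.compile(
    rf"\b({_DIGIT})\s+point((?:\s+(?:{_DIGIT}))+)\b", re.IGNORECASE
)
# A spoken domain: <name> "dot" <tld>, for example "github dot com".
DOMAIN = re.compile(r"\b([a-z0-9-]+)\s+dot\s+([a-z]{2,})\b", re.IGNORECASE)


def _decimal(match: "re.Match[str]") -> str:
    """Turn the matched digit words into a numeric string."""
    whole = DIGITS[match.group(1).lower()]
    fraction = "".join(DIGITS[w.lower()] for w in match.group(2).split())
    return f"{whole}.{fraction}"


def format_entities(text: str) -> str:
    """Apply the decimal rule first, then the domain rule."""
    text = DECIMAL.sub(_decimal, text)
    return DOMAIN.sub(lambda m: f"{m.group(1)}.{m.group(2)}", text)


print(format_entities("Three point one four"))
print(format_entities("Go to github dot com"))
```

A rule-per-entity design like this is easy to test in isolation, which is one reason formatting rules are often kept separate from the transcription step.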
Server Deployment
# Basic server
stt --server
# Production with SSL (port 443 on all interfaces; no certificate options are passed on this command line)
stt --server --port 443 --host 0.0.0.0
# Docker deployment
docker run -p 8080:8080 -p 8769:8769 sttservice/transcribe
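After starting the server or the container, a quick way to confirm that the port is accepting connections is a plain TCP check. The Python sketch below uses only the standard library. It verifies TCP reachability only and does not perform a WebSocket handshake; `port_is_open` is an illustrative helper, and the host and port in the demo are taken from the examples above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # 8769 is the WebSocket port used in the server examples.
    print("stt server reachable:", port_is_open("127.0.0.1", 8769))
```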
Testing & Development
# Run test suite
pytest # All tests
pytest tests/text_formatting/ # Specific module
pytest -v -n auto # Parallel with verbose output
# Code quality
ruff check src/ tests/ # Linting
black src/ tests/ stt.py # Formatting
mypy src/ stt.py # Type checking
# Test with real audio
pytest tests/__fixtures__/audio/
Model Comparison
| Model | Speed | Quality | Memory | Best For |
|---|---|---|---|---|
| tiny | Fastest | Basic | 39MB | Real-time, low resources |
| base | Fast | Good | 74MB | General use (default) |
| small | Quick | Better | 244MB | Accuracy balance |
| medium | Moderate | Great | 769MB | High accuracy |
| large-v3-turbo | Fast | Best | 1550MB | Production quality |
Choose based on your speed/accuracy requirements and available system resources.
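The selection rule can be sketched as a small lookup: pick the largest model whose listed size fits a budget. The Python sketch below copies the Memory column from the table at face value; actual runtime memory depends on the backend and precision, and `largest_model_within` is an illustrative helper, not part of Goobits STT.

```python
from typing import Optional

# (model name, size in MB) copied from the comparison table, smallest first.
MODELS = [
    ("tiny", 39),
    ("base", 74),
    ("small", 244),
    ("medium", 769),
    ("large-v3-turbo", 1550),
]


def largest_model_within(budget_mb: int) -> Optional[str]:
    """Return the largest listed model whose table size fits budget_mb."""
    fitting = [name for name, size_mb in MODELS if size_mb <= budget_mb]
    # MODELS is sorted by size, so the last fitting entry is the largest.
    return fitting[-1] if fitting else None


print(largest_model_within(100))
```

The returned name could then be passed to `stt --model`.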
Audio Features
- Real-time streaming: Opus audio encoding for efficient transmission
- Voice Activity Detection: Automatic speech detection and silence handling
- Multiple input devices: Support for various microphones and audio interfaces
- Hotkey integration: System-wide keyboard shortcuts for hands-free operation
- Background operation: Run as daemon with minimal resource usage
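As an illustration of what voice activity detection does, the Python sketch below marks fixed-size frames as speech or silence by their RMS energy. This is a deliberately simple energy-threshold technique, not the detector Goobits STT uses, which is not described here. The frame size, the threshold, and the function names are assumptions chosen for the example.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def speech_flags(
    samples: Sequence[float], frame_size: int = 320, threshold: float = 0.02
) -> List[bool]:
    """Mark each full frame as speech (True) or silence (False).

    320 samples is 20 ms at a 16000 Hz sample rate.
    """
    flags = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        flags.append(rms(frame) >= threshold)
    return flags


# Demo: one silent frame, one frame of a 440 Hz tone, one silent frame.
silence = [0.0] * 320
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(320)]
print(speech_flags(silence + tone + silence))
```

Energy thresholds are cheap but easily fooled by background noise, which is why production detectors usually rely on trained models instead.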
Tech Stack
Core Technologies
- AI/ML: OpenAI Whisper (faster-whisper), CTranslate2, PyTorch
- Audio: OpusLib, NumPy, custom pipe-based audio capture
- System: pynput for global hotkeys, cross-platform support
Text Processing
- NLP: spaCy, deepmultilingualpunctuation
- i18n: Multi-language entity detection and formatting
- Parsing: pyparsing for complex text transformations
- Output: JSON/text formatting with rich entity support
Development & Testing
- Testing: pytest with asyncio, xdist, custom plugins
- Quality: ruff (linting), black (formatting), mypy (typing)
- Security: bandit for security analysis
- Build: setuptools, pyproject.toml configuration
Deployment
- Containerization: Docker with CUDA 12.1 support
- Interface: FastAPI admin dashboard (Docker), responsive web UI
- Security: JWT authentication, RSA+AES encryption (Docker)
- Monitoring: Structured logging, health checks
- Cloud: Ready for production deployment with SSL/TLS
File details
Details for the file goobits_stt-1.0.0.tar.gz.
File metadata
- Download URL: goobits_stt-1.0.0.tar.gz
- Upload date:
- Size: 143.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 67756f1282532f9a2e29208e210dbbe3f8f3756b3f53f8bccc4c745179031cb4 |
| MD5 | b66f3663e9f0d81fa9fac4a6269897cc |
| BLAKE2b-256 | 5eb5d3529f95e6fbc3d74c8f730e1bec77e43de7a022fb4ebe83c63ab36d5a8d |
File details
Details for the file goobits_stt-1.0.0-py3-none-any.whl.
File metadata
- Download URL: goobits_stt-1.0.0-py3-none-any.whl
- Upload date:
- Size: 161.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fcfbe0da5e8fc548ab9e3bc4139ac35eea210e09c3a413b4a274f6cc6f7ca857 |
| MD5 | 9e0513543b069630e1d8d4cdf02982ec |
| BLAKE2b-256 | 9954d1aa492e45fa3952e0746bb14a9469a584bf60c6270df8cade1e3d11e625 |