Type with your voice using hotkey-activated speech recognition
voiceType - type with your voice
Features
- Press a hotkey (default: Pause/Break key) to start recording audio.
- Release the hotkey to stop recording.
- The recorded audio is transcribed to text (e.g., using OpenAI's Whisper model).
- The transcribed text is typed into the currently active application.
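The press, record, release, transcribe, type cycle described above can be sketched as a small state machine. This is an illustrative sketch only, not VoiceType's actual code: the `HoldToRecord` class, its method names, and the `transcribe` and `type_text` callables are hypothetical stand-ins for the real hotkey listener, speech-to-text call, and keystroke injection.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class HoldToRecord:
    """Hypothetical controller: record while a hotkey is held, then transcribe and type."""

    transcribe: Callable[[List[bytes]], str]  # stand-in for a speech-to-text call
    type_text: Callable[[str], None]          # stand-in for keystroke injection
    recording: bool = False
    chunks: List[bytes] = field(default_factory=list)

    def on_press(self) -> None:
        # Key auto-repeat can deliver many press events; only the first starts a recording.
        if not self.recording:
            self.recording = True
            self.chunks = []

    def on_audio(self, chunk: bytes) -> None:
        # Audio that arrives while the key is up is ignored.
        if self.recording:
            self.chunks.append(chunk)

    def on_release(self) -> None:
        # Releasing the key ends the recording and hands the audio to the transcriber.
        if not self.recording:
            return
        self.recording = False
        text = self.transcribe(self.chunks)
        if text:
            self.type_text(text)


def demo() -> List[str]:
    """Drive the controller with fake events and return what was 'typed'."""
    typed: List[str] = []
    controller = HoldToRecord(
        transcribe=lambda chunks: f"{len(chunks)} chunks heard",
        type_text=typed.append,
    )
    controller.on_audio(b"ignored")  # key not held yet
    controller.on_press()
    controller.on_press()            # auto-repeat, no effect
    controller.on_audio(b"a")
    controller.on_audio(b"b")
    controller.on_release()
    return typed
```

Passing the transcriber and the typing function in as callables keeps the hotkey logic testable without a microphone or a keyboard hook.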
Prerequisites
- Python 3.8+
- pip (Python package installer)
- For Linux installation: systemd (common in most modern Linux distributions).
- An OpenAI API Key (if using OpenAI for transcription).
Installation
Option 1: Install from PyPI
pip install voicetype2
Option 2: Install from Source
- Clone the repository (including submodules):
  git clone --recurse-submodules https://github.com/Adam-D-Lewis/voicetype.git
  cd voicetype
  If you already cloned without --recurse-submodules, initialize the submodules:
  git submodule update --init --recursive
- Set up a Python virtual environment (recommended):
  python3 -m venv .venv
  source .venv/bin/activate # On Windows, use `.venv\Scripts\activate`
- Install the package and its dependencies: This project uses pyproject.toml with setuptools. Install the voicetype package and its dependencies using pip:
  pip install .
  This command reads pyproject.toml, installs all necessary dependencies, and makes the voicetype script available (callable as python -m voicetype).
- Run the installation script (for Linux with systemd): If you are on Linux and want to run VoiceType as a systemd user service (recommended for background operation and auto-start on login), use the CLI entrypoint installed with the package. Ensure you're in the environment where you installed dependencies.
  voicetype install
  During install you'll be prompted to choose a provider [litellm, local]. If you choose litellm you'll then be prompted for your OPENAI_API_KEY. Values are stored in ~/.config/voicetype/.env with restricted permissions.
  The script will:
  - Create a systemd service file at ~/.config/systemd/user/voicetype.service.
  - Store your OpenAI API key in ~/.config/voicetype/.env (with restricted permissions).
  - Reload the systemd user daemon, enable the voicetype.service to start on login, and start it immediately.
  For other operating systems, or if you prefer not to use the systemd service on Linux, you can run the application directly after installation (see Usage).
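To illustrate what "restricted permissions" on the environment file can mean in practice, here is a minimal sketch that writes a key to a file only its owner can read or write. It is not the installer's actual code: the `write_env_file` and `file_mode` helpers, the `KEY=value` file format, and the `0o600` mode are assumptions made for the example.

```python
import os
import stat
import tempfile
from pathlib import Path


def write_env_file(path: Path, api_key: str) -> None:
    """Write OPENAI_API_KEY to `path`, readable and writable by the owner only."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with mode 0o600 from the start so the key is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(f"OPENAI_API_KEY={api_key}\n")
    # A file that already existed keeps its old mode, so tighten it explicitly.
    os.chmod(path, 0o600)


def file_mode(path: Path) -> int:
    """Return the permission bits of `path` (for example 0o600)."""
    return stat.S_IMODE(path.stat().st_mode)


if __name__ == "__main__":
    # Demo in a throwaway directory so the real ~/.config is never touched.
    with tempfile.TemporaryDirectory() as tmp:
        env_path = Path(tmp) / "voicetype" / ".env"
        write_env_file(env_path, "sk-example")
        print(oct(file_mode(env_path)))
```

On POSIX systems you can confirm the result with `ls -l ~/.config/voicetype/.env`; permission bits behave differently on Windows.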
Configuration
VoiceType can be configured using a settings.toml file. The application looks for configuration files in the following locations (in priority order):
- ./settings.toml - Current directory
- ~/.config/voicetype/settings.toml - User config directory
- /etc/voicetype/settings.toml - System-wide config
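The priority order can be sketched as a first-match search over the three locations. This is an assumption about the behavior, not a description of VoiceType's loader: the sketch assumes the first existing file wins and that files are not merged, and `candidate_paths` and `find_settings` are hypothetical helper names.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Optional


def candidate_paths(cwd: Path, home: Path) -> List[Path]:
    """Settings locations from the list above, highest priority first."""
    return [
        cwd / "settings.toml",
        home / ".config" / "voicetype" / "settings.toml",
        Path("/etc/voicetype/settings.toml"),
    ]


def find_settings(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that exists as a file, or None if none does."""
    for path in candidates:
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    # Shows which file would be picked on this machine under the first-match assumption.
    print(find_settings(candidate_paths(Path.cwd(), Path.home())))
```

Under this assumption, a settings.toml in the current directory shadows the user-level and system-wide files entirely.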
Available Settings
VoiceType uses a pipeline-based configuration system. See settings.example.toml for a complete, documented example configuration including:
- Stage definitions (RecordAudio, Transcribe, CorrectTypos, TypeText, LLMAgent)
- Local and cloud transcription options with fallback support
- Pipeline configuration with hotkey bindings
- Telemetry and logging settings
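As a rough mental model of a stage pipeline, the sketch below runs named stages in order and feeds each stage the previous stage's output. The stage names come from the list above, but everything else is hypothetical: `run_pipeline`, the `Stage` tuple shape, and the stand-in lambdas are illustrations, and the real stage interface is defined by the project (see settings.example.toml and docs/ARCHITECTURE.md).

```python
from typing import Any, Callable, List, Tuple

# A stage is modeled here as a (name, function) pair; this shape is an assumption.
Stage = Tuple[str, Callable[[Any], Any]]


def run_pipeline(stages: List[Stage], initial: Any = None) -> Any:
    """Run stages in order, passing each stage the previous stage's result."""
    value = initial
    for _name, func in stages:
        value = func(value)
    return value


def demo() -> str:
    """Wire up stand-in stages and return the text that reaches the last stage."""
    typed: List[str] = []

    def type_text(text: str) -> str:
        typed.append(text)  # a real stage would inject keystrokes instead
        return text

    stages: List[Stage] = [
        ("RecordAudio", lambda _: b"fake-audio"),
        ("Transcribe", lambda audio: "helo world"),
        ("CorrectTypos", lambda text: text.replace("helo", "hello")),
        ("TypeText", type_text),
    ]
    return run_pipeline(stages)
```

Calling `demo()` walks the four stand-in stages once; swapping a stage means replacing one entry in the list.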
Note: If you used voicetype install and configured litellm during installation, your API key is stored separately in ~/.config/voicetype/.env.
Monitoring Pipeline Performance with OpenTelemetry
VoiceType includes built-in OpenTelemetry instrumentation to track pipeline execution and stage performance. When enabled, traces are exported to a local file for offline analysis.
Enabling Telemetry
Telemetry is disabled by default. To enable it, add to your settings.toml:
[telemetry]
enabled = true
Trace File Location
Traces are automatically saved to:
- Linux: ~/.config/voicetype/traces.jsonl
- macOS: ~/Library/Application Support/voicetype/traces.jsonl
- Windows: %APPDATA%/voicetype/traces.jsonl
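If you want to locate the trace file from a script, the per-platform defaults above can be expressed as a small helper. This is a sketch, not VoiceType's own path logic: `default_trace_file` is a hypothetical name, and the fallback used when `APPDATA` is unset is an assumption.

```python
import os
import sys
from pathlib import Path


def default_trace_file(platform: str = sys.platform) -> Path:
    """Return the default traces.jsonl location for the given sys.platform value."""
    home = Path.home()
    if platform.startswith("win"):
        appdata = os.environ.get("APPDATA")
        # Assumption: fall back to the usual roaming profile folder if APPDATA is unset.
        base = Path(appdata) if appdata else home / "AppData" / "Roaming"
        return base / "voicetype" / "traces.jsonl"
    if platform == "darwin":
        return home / "Library" / "Application Support" / "voicetype" / "traces.jsonl"
    # Linux and other POSIX systems.
    return home / ".config" / "voicetype" / "traces.jsonl"


if __name__ == "__main__":
    print(default_trace_file())
```

A custom trace_file value in settings.toml would override these defaults, so treat the helper as a starting point only.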
What You Can See
Each pipeline execution creates a trace with:
- Overall pipeline duration - Total time from start to finish
- Individual stage timings - How long each stage (RecordAudio, Transcribe, etc.) took
- Pipeline metadata - Pipeline name, ID, stage count
- Error tracking - Any exceptions or failures with stack traces
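The kind of data listed above can be sketched without OpenTelemetry at all: time each stage, time the whole run, and keep the stack trace when a stage fails. This is a simplified illustration rather than the project's instrumentation. The `pipeline.name` and `pipeline.duration_ms` keys match the example trace in this document; `run_traced`, the `stage.` span names, `stage.duration_ms`, `pipeline.stage_count`, and `error.stack` are hypothetical.

```python
import json
import time
import traceback
from typing import Any, Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def run_traced(pipeline_name: str, stages: List[Stage]) -> List[Dict[str, Any]]:
    """Run stages in order; return one span dict per executed stage plus one for the pipeline."""
    spans: List[Dict[str, Any]] = []
    value: Any = None
    pipeline_start = time.perf_counter()
    for stage_name, func in stages:
        stage_start = time.perf_counter()
        attributes: Dict[str, Any] = {"pipeline.name": pipeline_name}
        try:
            value = func(value)
        except Exception:
            # Keep the stack trace on the span; the pipeline stops after this stage.
            attributes["error.stack"] = traceback.format_exc()
        attributes["stage.duration_ms"] = (time.perf_counter() - stage_start) * 1000
        spans.append({"name": f"stage.{stage_name}", "attributes": attributes})
        if "error.stack" in attributes:
            break
    spans.append({
        "name": f"pipeline.{pipeline_name}",
        "attributes": {
            "pipeline.name": pipeline_name,
            "pipeline.stage_count": len(stages),
            "pipeline.duration_ms": (time.perf_counter() - pipeline_start) * 1000,
        },
    })
    return spans


if __name__ == "__main__":
    demo_stages: List[Stage] = [
        ("RecordAudio", lambda _: b"audio"),
        ("Transcribe", lambda audio: "hello"),
    ]
    for span in run_traced("default", demo_stages):
        print(json.dumps(span))  # one JSON object per line, like a .jsonl trace file
```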
Example Trace
Each span is written as a JSON line:
{
"name": "pipeline.default",
"context": {...},
"start_time": 1234567890,
"end_time": 1234567895,
"attributes": {
"pipeline.id": "abc-123",
"pipeline.name": "default",
"pipeline.duration_ms": 5200
}
}
Managing Trace Files
Automatic rotation:
Trace files are automatically rotated when they reach 10 MB. Rotated files are timestamped (e.g., traces.20250117_143022.jsonl) and kept indefinitely.
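Size-based rotation with a timestamped name can be sketched as follows. This is not VoiceType's rotation code: `rotate_if_needed` is a hypothetical helper, and treating 1 MB as 1024 * 1024 bytes is an assumption. The timestamp format follows the example file name above.

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def rotate_if_needed(
    trace_file: Path,
    max_size_mb: float = 10,
    now: Optional[datetime] = None,
) -> Optional[Path]:
    """Rename trace_file to a timestamped name once it reaches max_size_mb.

    Returns the rotated path, or None if no rotation happened.
    """
    limit_bytes = max_size_mb * 1024 * 1024  # assumption: binary megabytes
    if not trace_file.exists() or trace_file.stat().st_size < limit_bytes:
        return None
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    # traces.jsonl becomes traces.<stamp>.jsonl in the same directory.
    rotated = trace_file.with_name(f"{trace_file.stem}.{stamp}{trace_file.suffix}")
    trace_file.rename(rotated)
    return rotated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        trace = Path(tmp) / "traces.jsonl"
        trace.write_bytes(b"x" * 2048)
        # A tiny limit forces a rotation in this demo.
        print(rotate_if_needed(trace, max_size_mb=0.001))
```

Because rotated files are kept indefinitely, a periodic cleanup of old `traces.*.jsonl` files is left to you.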
View traces:
# Pretty-print the current trace file
jq . ~/.config/voicetype/traces.jsonl
# View all trace files (including rotated)
jq . ~/.config/voicetype/traces*.jsonl
# Or print the raw file (or open it in any text editor)
cat ~/.config/voicetype/traces.jsonl
Clear old traces:
# Delete all trace files
rm ~/.config/voicetype/traces*.jsonl
Analyze with grep:
# Find spans that took 1000 ms or longer in the current file
# (the optional space after the colon matches both compact and spaced JSON)
grep -E "duration_ms\": ?[0-9]{4,}" ~/.config/voicetype/traces.jsonl
# Search across all trace files
grep -E "duration_ms\": ?[0-9]{4,}" ~/.config/voicetype/traces*.jsonl
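When grep becomes too blunt, the same search can be done by parsing each JSON line. The sketch below is an illustration under assumptions: `slow_spans` is a hypothetical helper, and it relies only on the span shape shown in the example trace (a `name` plus an `attributes` object with a key ending in `duration_ms`).

```python
import json
from typing import Iterable, List, Tuple


def slow_spans(lines: Iterable[str], threshold_ms: float = 1000.0) -> List[Tuple[str, float]]:
    """Return (span name, duration in ms) for spans at or above threshold_ms."""
    found: List[Tuple[str, float]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            span = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial lines, e.g. a write that is still in progress
        if not isinstance(span, dict):
            continue
        attributes = span.get("attributes") or {}
        for key, value in attributes.items():
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if key.endswith("duration_ms") and is_number and value >= threshold_ms:
                found.append((str(span.get("name", "?")), float(value)))
    return found


def demo() -> List[Tuple[str, float]]:
    """Run the filter over three sample lines, one of them malformed."""
    sample = [
        '{"name": "pipeline.default", "attributes": {"pipeline.duration_ms": 5200}}',
        '{"name": "pipeline.default", "attributes": {"pipeline.duration_ms": 300}}',
        "not json",
    ]
    return slow_spans(sample)
```

To run it on a real file, open the trace file and pass the file object directly, since iterating a text file yields one line at a time.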
Configuration
Custom trace file location:
[telemetry]
enabled = true
trace_file = "~/my-custom-traces.jsonl"
Adjust rotation size or disable rotation:
[telemetry]
enabled = true
rotation_max_size_mb = 50 # Rotate at 50 MB instead of 10 MB
# Or disable rotation entirely
# rotation_enabled = false
Export to OTLP endpoint only (disable file export):
[telemetry]
enabled = true
export_to_file = false
otlp_endpoint = "http://localhost:4317"
Usage
- If using the Linux systemd service: The service will start automatically on login. VoiceType will be listening for the hotkey in the background.
- To run manually (e.g., for testing or on non-Linux systems):
Activate your virtual environment and run:
python -m voicetype
Using the Hotkey:
- Press and hold the configured hotkey (default is Pause/Break).
- Speak clearly.
- Release the hotkey to stop recording.
- The transcribed text should then be typed into your currently active application.
Managing the Service (Linux with systemd)
If you used voicetype install:
- Check service status:
  voicetype status
  Alternatively:
  systemctl --user status voicetype.service
- View service logs:
  journalctl --user -u voicetype.service -f
- Restart the service (e.g., after changing the OPENAI_API_KEY in ~/.config/voicetype/.env):
  systemctl --user restart voicetype.service
- Stop the service:
  systemctl --user stop voicetype.service
- Start the service manually (if not enabled to start on login):
  systemctl --user start voicetype.service
- Disable auto-start on login:
  systemctl --user disable voicetype.service
- Enable auto-start on login (if previously disabled):
  systemctl --user enable voicetype.service
Uninstallation (Linux with systemd)
To stop the service, disable auto-start, and remove the systemd service file and associated configuration:
voicetype uninstall
This will:
- Stop and disable the voicetype.service.
- Remove the service file (~/.config/systemd/user/voicetype.service).
- Remove the environment file (~/.config/voicetype/.env containing your API key).
- Attempt to remove the application configuration directory (~/.config/voicetype) if it's empty.
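If you installed the package using pip install ., you can uninstall it from your Python environment with:
pip uninstall voicetype
If you installed from PyPI with pip install voicetype2, uninstall it under that name instead:
pip uninstall voicetype2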
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Architecture
VoiceType uses a pipeline-based architecture with resource-based concurrency control. See docs/ARCHITECTURE.md for:
- Complete system architecture diagram (Mermaid UML)
- Component descriptions and responsibilities
- Execution flow and lifecycle
- Design principles and extension points
Vendored Dependencies
VoiceType includes a vendored version of pynput located in voicetype/_vendor/pynput/. This vendored version includes a not-yet-merged bug fix and allows for better control over keyboard and mouse input handling across different platforms.
Development
Preferred workflow: Pixi
- Pixi is the preferred way to create and manage the development environment for this project. It ensures reproducible, cross-platform setups using the definitions in pyproject.toml.
Setup Pixi
- Install Pixi:
  - Linux/macOS (official installer):
    curl -fsSL https://pixi.sh/install.sh | bash
  - macOS (Homebrew):
    brew install prefix-dev/pixi/pixi
  - Verify:
    pixi --version
Development Environments
Available Pixi environments:
- local: Standard development environment (default)
pixi install -e local && pixi shell -e local
- dev: Development with testing tools
pixi install -e dev && pixi shell -e dev
- cpu: CPU-only (no CUDA dependencies)
pixi install -e cpu && pixi shell -e cpu
- windows-build: Build Windows installers (PyInstaller + dependencies)
pixi install -e windows-build && pixi shell -e windows-build
Run the application
- pixi run voicetype
  - Equivalent to: python -m voicetype
Run tests
- If a test task is defined:
  - pixi run test
- Otherwise (pytest directly):
  - pixi run python -m pytest
Lint and format
- If tasks are defined:
  - pixi run lint
  - pixi run fmt
- Or run tools directly:
  - pixi run ruff format
  - pixi run ruff check .
Pre-commit hooks (recommended)
- Install hooks:
  - pixi run pre-commit install
- Run on all files:
  - pixi run pre-commit run --all-files
Building Windows Installers (Windows only)
Using Pixi:
- Set up the build environment:
  pixi install -e windows-build
  pixi shell -e windows-build
- Install NSIS (one-time):
  - Download from https://nsis.sourceforge.io/Download
  - Or via Chocolatey:
    choco install nsis
- Build installer:
  pixi run -e windows-build build-windows
  - Output: dist/VoiceType-Setup.exe
Or build executable only (no installer):
pixi run -e windows-build build-exe
- Output: dist/voicetype/voicetype.exe
Clean build artifacts:
pixi run -e windows-build clean-build
See docs/BUILDING.md for detailed build instructions.
Alternative: Python venv (fallback)
- Ensure Python 3.11+ is installed.
- Create and activate a venv:
  - python -m venv .venv
  - source .venv/bin/activate
- Editable install with dev dependencies:
  - pip install -U pip
  - pip install -e ".[dev]"
- Run the app:
  - python -m voicetype
Notes
- Dependency definitions live in pyproject.toml
- After changing dependencies, update pyproject.toml then run:
- pixi install
License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.