Offline speech to text using VOSK. Based on nerd-dictation.
Pytater
Offline Speech to Text for Linux.
[!IMPORTANT] This project is a fork of ideasman42's Nerd Dictation. Where the original is a script meant for easy hacking, this is a full-fledged Python package, meant to provide much simpler setup and a Python API on top of the original CLI.
See demo video (from ideasman42).
This is a utility that provides simple access to speech to text on Linux without being tied to a desktop environment, using the excellent VOSK-API.
Simple to set up
Pytater can be installed with a single command from PyPI (coming soon).
Configurable
Configure pytater using config files, environment variables, or the Python API (partially complete).
Zero Overhead
As pytater is activated manually, there are no background processes.
Usage
It is suggested to bind begin/end/cancel to shortcut keys.
pytater begin
pytater end
For details on how this can be used, see:
pytater --help and pytater begin --help.
Features
Specific features include:
Numbers as Digits
Optional conversion from numbers to digits.
So Three million five hundred and sixty second becomes 3,000,562nd.
A series of numbers (such as reciting a phone number) is also supported.
So Two four six eight becomes 2,468.
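The conversion described above can be illustrated with a minimal sketch. This is not pytater's implementation: the function names are hypothetical, and only cardinal numbers and runs of single digits are handled (ordinals such as "second" are left out to keep it short).

```python
# Hypothetical sketch of spoken-number conversion; not pytater's own code.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}


def words_to_number(phrase):
    """Convert a spoken cardinal, e.g. 'three million five hundred and sixty two'."""
    total = 0    # groups already multiplied by thousand/million/billion
    current = 0  # the group currently being built (0..999)
    for word in phrase.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


def digit_series(phrase):
    """Read a run of single digits ('two four six eight') as one number."""
    digits = [UNITS[word] for word in phrase.lower().split()]
    if any(digit > 9 for digit in digits):
        raise ValueError("expected single digits only")
    return int("".join(str(digit) for digit in digits))


def with_separators(number):
    """Format with thousands separators, matching the examples above."""
    return f"{number:,}"
```

For example, `with_separators(words_to_number("Three million five hundred and sixty two"))` gives `3,000,562`, and `with_separators(digit_series("Two four six eight"))` gives `2,468`.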
Time Out
Optionally end speech to text early when no speech is detected for a given number of seconds, without the explicit call to end that is otherwise required.
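The idea of a silence timeout can be sketched as follows. This is an illustration under assumptions, not how pytater implements the feature: the class and method names are hypothetical, and the clock is injectable only so the logic can be exercised without waiting.

```python
import time


class SilenceTimeout:
    """Hypothetical helper: reports when no speech was heard for `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_speech = clock()  # treat the start as the last activity

    def heard_speech(self):
        # Call whenever the recognizer produces new text; restarts the countdown.
        self.last_speech = self.clock()

    def expired(self):
        # A timeout of zero or less is treated as "no timeout" in this sketch.
        if self.timeout <= 0:
            return False
        return self.clock() - self.last_speech >= self.timeout
```

A recording loop would call `heard_speech()` on each recognized chunk and stop once `expired()` returns true.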
Output Type
Output can simulate keystroke events (default) or simply print to the standard output.
User configuration
TODO: fill in this section
Suspend/Resume
Initial load time can be an issue for users on slower systems or with some of the larger language models. In this case, suspend/resume can be useful. While suspended, all data is kept in memory and the process is stopped. Audio recording is stopped and restarted on resume.
See pytater begin --help for details on how to access these options.
Dependencies
- Python 3.6.2 (or newer).
- An audio recording utility (parec by default).
- An input simulation utility (xdotool by default). This is not necessary if all you're doing is printing dictated words to the terminal.
Audio Recording Utilities
You may select one of the following tools.
- parec command for recording from pulse-audio.
- sox command as alternative, see the guide: Using sox with pytater.
- pw-cat command for recording from pipewire.
Input Simulation Utilities
You may select one of the following input simulation utilities.
- xdotool command to simulate input in X11.
- ydotool command to simulate input anywhere (X11/Wayland/TTYs). See the setup guide: Using ydotool with pytater.
- dotool command to simulate input anywhere (X11/Wayland/TTYs).
- wtype command to simulate input in Wayland.
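To check which of the audio recording and input simulation utilities listed above are present on a system, a small helper along these lines may be useful. It is a sketch that relies only on the standard library; the function name is hypothetical and pytater itself may select tools differently.

```python
import shutil

# Tool names taken from the lists above, in the order they are mentioned.
AUDIO_TOOLS = ("parec", "sox", "pw-cat")
INPUT_TOOLS = ("xdotool", "ydotool", "dotool", "wtype")


def first_available(candidates, which=shutil.which):
    """Return the first candidate found on PATH, or None if none is installed."""
    for name in candidates:
        if which(name) is not None:
            return name
    return None
```

Calling `first_available(AUDIO_TOOLS)` and `first_available(INPUT_TOOLS)` returns a tool name or `None`, which indicates what still needs to be installed.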
Install
With pip (not recommended, as this will install it globally):
pip3 install git+https://github.com/paul-c-hartman/pytater.git
Or alternatively, using uv or pipx:
uv tool install --from git+https://github.com/paul-c-hartman/pytater.git pytater
# or:
pipx install git+https://github.com/paul-c-hartman/pytater.git
# This will add a `pytater` command to your PATH
Then download a model. The complete list of models is available here. To do this:
pytater download # to download the default model, or:
pytater download --model large
# Or by URL:
pytater download --model "https://alphacephei.com/path/to/model"
To test dictation:
pytater begin &
# Start speaking.
pytater end
- Reminder that it's up to you to bind begin/end/cancel to actions you can easily access (typically key shortcuts).
Details
- Typing in results will never press enter/return.
- Recording and speech to text are performed in parallel.
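Running recording and recognition in parallel is commonly done with a producer/consumer queue. The sketch below shows that general pattern only; it is not pytater's code, all names are hypothetical, and the "recognizer" is a stand-in that lower-cases bytes instead of calling VOSK.

```python
import queue
import threading


def record(chunks, audio_queue):
    """Producer: push audio chunks as they arrive, then a None sentinel."""
    for chunk in chunks:
        audio_queue.put(chunk)
    audio_queue.put(None)  # signals that recording has ended


def recognize(audio_queue, results):
    """Consumer: process chunks as soon as they are available."""
    while True:
        chunk = audio_queue.get()
        if chunk is None:
            break
        # Stand-in for a real speech-to-text call.
        results.append(chunk.decode("ascii").lower())


def run_pipeline(chunks):
    audio_queue = queue.Queue()
    results = []
    consumer = threading.Thread(target=recognize, args=(audio_queue, results))
    producer = threading.Thread(target=record, args=(chunks, audio_queue))
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return results
```

Because the consumer starts reading while the producer is still writing, recognition does not have to wait for recording to finish.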
Examples
Store the result of speech to text as a variable in the shell:
SPEECH="$(pytater begin --timeout=1.0 --output=STDOUT)"
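The same capture can be done from Python by invoking the CLI as a subprocess. This sketch uses only the `--timeout` and `--output=STDOUT` options shown in the shell example; the helper names are hypothetical, and it does not use pytater's Python API, which is not documented here.

```python
import subprocess


def build_begin_command(timeout=1.0):
    """Build the argument list for the shell example above."""
    return ["pytater", "begin", f"--timeout={timeout}", "--output=STDOUT"]


def dictate(timeout=1.0):
    """Run pytater and return the recognized text (requires pytater on PATH)."""
    result = subprocess.run(
        build_begin_command(timeout),
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
        check=True,               # raise if pytater exits with an error
    )
    # Shell command substitution drops trailing newlines; do the same here.
    return result.stdout.rstrip("\n")
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` surfaces a failed run as an exception instead of an empty string.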
Limitations
- Text from VOSK is all lower-case. While the user configuration can be used to set the case of common words like I, this isn't very convenient.
- For some users, the delay at startup may be noticeable on systems with slower hard disks, especially when running for the first time (a cold start). This is a consequence of the choice not to use a service that runs in the background. Recording begins before any of the speech-to-text components are loaded to mitigate this problem.
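As an illustration of the lower-case limitation, text captured with STDOUT output could be post-processed with a small word list. This is a standalone sketch with a hypothetical function name; it is independent of pytater's user configuration, whose format is not described in this document.

```python
import re


def fix_case(text, words=("I",)):
    """Restore the preferred casing of whole words in lower-case text."""
    for word in words:
        # \b keeps the match to whole words, so the "i" in "think" is untouched.
        pattern = r"\b" + re.escape(word.lower()) + r"\b"
        text = re.sub(pattern, word, text)
    return text
```

For example, `fix_case("i think i'm using linux", words=("I", "Linux"))` returns `I think I'm using Linux`. Maintaining such a list by hand is exactly the inconvenience noted above.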
Roadmap
- Complete and documented API (partially complete)
- Proper extension support using entry points
- Reimplement certain features as post-processors
- General solution to capitalize words (proper nouns for example)
- Proper logging system
- Processing of audio files in addition to live audio
- Support Windows & macOS
File details
Details for the file pytater-0.1.0.tar.gz.
File metadata
- Download URL: pytater-0.1.0.tar.gz
- Upload date:
- Size: 40.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | da9a6f92f921286f1377d1b47f995fd98376ee5655e555bff031c1da1e296e64 |
| MD5 | f1679d7e30b3bdca3a0b563585ddcede |
| BLAKE2b-256 | df1aad4411ae0eb6eed70448c7c1f68e3a9beecc3bc7982e907959884be39256 |
File details
Details for the file pytater-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pytater-0.1.0-py3-none-any.whl
- Upload date:
- Size: 43.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 98c2cef02e33fa6b2889e5e31acea9274a2ac2573d33e0dd59909edb66eca379 |
| MD5 | ba208153fe3e139c104e94615e70339a |
| BLAKE2b-256 | b99d30587d5cdfb7c30448b25100a1ae00b9f96232936b013002f67914856dde |