Offline speech to text using VOSK. Based on nerd-dictation.

Pytater

Offline Speech to Text for Linux.

Important: This project is a fork of ideasman42's Nerd Dictation. Where the original is a script meant for easy hacking, this is a full-fledged Python package, meant to provide much simpler setup and a Python API on top of the original CLI.

See demo video (from ideasman42).

This utility provides simple access to speech to text on Linux without being tied to a desktop environment, using the excellent VOSK-API.

Simple to set up
   Pytater can be installed with a single command from PyPI (coming soon).
Configurable
   Configure pytater using config files, environment variables, or the Python API (partially complete).
Zero Overhead
   As pytater is activated manually, there are no background processes.

Usage

It is suggested to bind begin/end/cancel to shortcut keys.

pytater begin
pytater end

For details on how this can be used, see: pytater --help and pytater begin --help.

Features

Specific features include:

Numbers as Digits

Optional conversion from numbers to digits.

So "Three million five hundred and sixty second" becomes "3,000,562nd".

A series of numbers (such as reciting a phone number) is also supported.

So "Two four six eight" becomes "2,468".
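
To make the conversion rules above concrete, here is a minimal Python sketch that reproduces both examples. It is an illustration only, not pytater's actual implementation: the function name `words_to_digits` and the word tables are assumptions, and only a small vocabulary is covered.

```python
# Illustrative sketch only; this is NOT pytater's implementation.
UNITS = {word: value for value, word in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {word: (index + 2) * 10 for index, word in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split())}
SCALES = {"thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}
ORDINALS = {
    "first": (1, "st"), "second": (2, "nd"), "third": (3, "rd"),
    "fourth": (4, "th"), "fifth": (5, "th"), "sixth": (6, "th"),
    "seventh": (7, "th"), "eighth": (8, "th"), "ninth": (9, "th"),
}


def words_to_digits(phrase):
    """Convert a spoken number (or a series of single digits) to digits."""
    words = phrase.lower().replace("-", " ").split()

    # A series of single digits ("two four six eight") is concatenated.
    if len(words) > 1 and all(UNITS.get(w, 10) < 10 for w in words):
        return f"{int(''.join(str(UNITS[w]) for w in words)):,}"

    total = 0     # finished groups, e.g. "three million"
    current = 0   # group being built, e.g. "five hundred sixty"
    suffix = ""   # ordinal suffix such as "nd", if any
    for word in words:
        if word == "and":
            continue
        if word in ORDINALS:
            value, suffix = ORDINALS[word]
            current += value
        elif word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return f"{total + current:,}{suffix}"


print(words_to_digits("Three million five hundred and sixty second"))  # 3,000,562nd
print(words_to_digits("Two four six eight"))  # 2,468
```

The sketch keeps a running group (`current`) that is folded into `total` whenever a scale word such as "million" appears, which is a common way to parse spoken numbers.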

Time Out

Optionally end speech to text early when no speech is detected for a given number of seconds, without the explicit call to end that is otherwise required.
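
The idea behind a silence time-out can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not pytater's code: the function `collect_until_silence` is hypothetical, and the input is assumed to be a stream of `(timestamp_seconds, text)` pairs where an empty string means no speech was detected in that chunk.

```python
def collect_until_silence(events, timeout):
    """Gather recognized text until `timeout` seconds pass without speech.

    `events` is an iterable of (timestamp_seconds, text) pairs; an empty
    `text` means the chunk contained no speech. (Hypothetical input format.)
    """
    words = []
    last_speech = None
    for timestamp, text in events:
        if last_speech is None:
            # Start the silence clock at the first chunk.
            last_speech = timestamp
        if text:
            words.append(text)
            last_speech = timestamp
        elif timestamp - last_speech >= timeout:
            # Silence lasted long enough: stop without an explicit "end".
            break
    return " ".join(words)


events = [(0.0, "hello"), (0.5, ""), (1.0, "world"),
          (1.5, ""), (2.0, ""), (2.5, "ignored")]
print(collect_until_silence(events, timeout=1.0))  # hello world
```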

Output Type

Output can simulate keystroke events (default) or simply print to the standard output.

User configuration

TODO: fill in this section

Suspend/Resume

Initial load time can be an issue for users on slower systems or with some of the larger language models. In this case, suspend/resume can be useful. While suspended, all data is kept in memory and the process is stopped. Audio recording is stopped and restarted on resume.

See pytater begin --help for details on how to access these options.

Dependencies

  • Python 3.6.2 (or newer).
  • An audio recording utility (parec by default).
  • An input simulation utility (xdotool by default). (This is not necessary if all you're doing is printing dictated words to the terminal.)

Audio Recording Utilities

You may select one of the following tools.

  • parec command for recording from PulseAudio.
  • sox command as an alternative; see the guide: Using sox with pytater.
  • pw-cat command for recording from PipeWire.

Input Simulation Utilities

You may select one of the following input simulation utilities.

Install

With pip (not recommended, as this will install it globally):

pip3 install git+https://github.com/paul-c-hartman/pytater.git

Or alternatively, using uv or pipx:

uv tool install --from git+https://github.com/paul-c-hartman/pytater.git pytater
# or:
pipx install git+https://github.com/paul-c-hartman/pytater.git
# This will add a `pytater` command to your PATH

Then download a model. The complete list of models is available on the VOSK models page. To download one:

pytater download # to download the default model, or:
pytater download --model large
# Or by URL:
pytater download --model "https://alphacephei.com/path/to/model"

To test dictation:

pytater begin &
# Start speaking.
pytater end
  • Reminder that it's up to you to bind begin/end/cancel to actions you can easily access (typically key shortcuts).

Details

  • Typing in results will never press enter/return.
  • Recording and speech to text are performed in parallel.
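
As a rough picture of what running recording and speech to text in parallel means, the following Python sketch uses a producer thread and a queue. It is a generic producer/consumer pattern, not pytater's internals: `record`, `transcribe`, and `run_pipeline` are hypothetical names, and upper-casing each chunk stands in for real recognition.

```python
import queue
import threading


def record(chunks, audio_queue):
    """Producer: stands in for the audio recording utility."""
    for chunk in chunks:
        audio_queue.put(chunk)
    audio_queue.put(None)  # sentinel: recording has ended


def transcribe(audio_queue):
    """Consumer: stands in for the speech-to-text engine."""
    results = []
    while True:
        chunk = audio_queue.get()
        if chunk is None:
            break
        results.append(chunk.upper())  # placeholder for real recognition
    return results


def run_pipeline(chunks):
    """Record on a background thread while transcribing on this one."""
    audio_queue = queue.Queue()
    recorder = threading.Thread(target=record, args=(chunks, audio_queue))
    recorder.start()
    results = transcribe(audio_queue)
    recorder.join()
    return results


print(run_pipeline(["hello", "world"]))  # ['HELLO', 'WORLD']
```

Because the queue preserves order, chunks are processed in the order they were recorded even though both sides run concurrently.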

Examples

Store the result of speech to text as a variable in the shell:

SPEECH="$(pytater begin --timeout=1.0 --output=STDOUT)"
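
The same capture can be done from Python by running the CLI as a subprocess. This sketch uses only the flags shown in the shell example above and does not rely on pytater's Python API, which is still being documented; the helper names `build_begin_command` and `dictate` are assumptions made for illustration.

```python
import subprocess


def build_begin_command(timeout=1.0):
    """Build the argument list for the documented `pytater begin` call."""
    return ["pytater", "begin", f"--timeout={timeout}", "--output=STDOUT"]


def dictate(timeout=1.0):
    """Run pytater and return the dictated text (requires pytater on PATH)."""
    result = subprocess.run(
        build_begin_command(timeout),
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        check=True,               # raise CalledProcessError on failure
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(dictate(timeout=1.0))
```

Passing the command as a list avoids shell quoting issues, and `check=True` surfaces a non-zero exit status as an exception rather than an empty string.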

Limitations

  • Text from VOSK is all lower-case. While the user configuration can be used to set the case of common words like I, this isn't very convenient.
  • For some users the delay in start up may be noticeable on systems with slower hard disks, especially when running for the first time (a cold start). This is a limitation of the choice not to use a service that runs in the background. Recording begins before any of the speech-to-text components are loaded to mitigate this problem.
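
To illustrate the kind of case fix-up mentioned in the first limitation, here is a minimal Python sketch that restores the case of a few known words in lower-case text. It is not pytater's configuration format (that section is still to be written); the `CASE_FIXES` table and the `fix_case` function are hypothetical.

```python
import re

# Hypothetical replacement table; extend it with your own proper nouns.
CASE_FIXES = {"i": "I", "i'm": "I'm", "linux": "Linux"}


def fix_case(text, fixes=CASE_FIXES):
    """Replace whole lower-case words that appear in `fixes`."""
    def replace(match):
        word = match.group(0)
        return fixes.get(word, word)

    # Match runs of lower-case letters and apostrophes as whole words.
    return re.sub(r"[a-z']+", replace, text)


print(fix_case("i think i'm using linux"))  # I think I'm using Linux
```

A lookup table like this only covers words listed in advance, which is why a general capitalization solution remains on the roadmap.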

Roadmap

  • Complete and documented API (partially complete)
  • Proper extension support using entry points
  • Reimplement certain features as post-processors
    • General solution to capitalize words (proper nouns for example)
  • Proper logging system
  • Processing of audio files in addition to live audio
  • Support Windows & macOS

Download files

Source Distribution

pytater-0.1.0.tar.gz (40.9 kB)

Uploaded Source

Built Distribution

pytater-0.1.0-py3-none-any.whl (43.3 kB)

Uploaded Python 3

File details

Details for the file pytater-0.1.0.tar.gz.

File metadata

  • Download URL: pytater-0.1.0.tar.gz
  • Size: 40.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for pytater-0.1.0.tar.gz
Algorithm Hash digest
SHA256 da9a6f92f921286f1377d1b47f995fd98376ee5655e555bff031c1da1e296e64
MD5 f1679d7e30b3bdca3a0b563585ddcede
BLAKE2b-256 df1aad4411ae0eb6eed70448c7c1f68e3a9beecc3bc7982e907959884be39256

File details

Details for the file pytater-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pytater-0.1.0-py3-none-any.whl
  • Size: 43.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for pytater-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 98c2cef02e33fa6b2889e5e31acea9274a2ac2573d33e0dd59909edb66eca379
MD5 ba208153fe3e139c104e94615e70339a
BLAKE2b-256 b99d30587d5cdfb7c30448b25100a1ae00b9f96232936b013002f67914856dde
