speech2caret
Use your speech to write to the current caret position!
Goals
- ✅ Simple: A minimalist tool that does one thing well.
- ✅ Local: Runs entirely on your machine (uses Hugging Face models for speech recognition).
- ✅ Efficient: Optimised for low CPU and memory usage, thanks to an event-driven architecture that responds instantly to key presses without wasting resources.
Note: Tested only on Linux (Ubuntu). Other operating systems are currently unsupported.
Installation
1. System Dependencies
First, install the required system libraries:
sudo apt update
sudo apt install libportaudio2 ffmpeg
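If you want to confirm that both libraries are visible before continuing, a quick check along these lines may help. It is a minimal sketch using only the Python standard library; the helper name `check_system_dependencies` is made up for this example and is not part of speech2caret.

```python
import ctypes.util
import shutil


def check_system_dependencies() -> dict[str, bool]:
    """Report whether ffmpeg and the PortAudio shared library can be located."""
    return {
        # shutil.which returns the executable's path, or None if it is not on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # find_library returns a library file name, or None if the lookup fails.
        "portaudio": ctypes.util.find_library("portaudio") is not None,
    }


if __name__ == "__main__":
    for name, found in check_system_dependencies().items():
        print(f"{name}: {'found' if found else 'missing'}")
```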
2. Grant Permissions
To read keyboard events and simulate key presses, evdev needs access to your keyboard input device. Add your user to the input group to grant the necessary permissions:
sudo usermod -aG input $USER
newgrp input # or log out and back in
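To verify that the group change is active in your current session, something like the following sketch can be used. It relies only on the standard `grp` and `os` modules; the function name `in_group` is an illustrative choice, not something speech2caret provides.

```python
import grp
import os


def in_group(group_name: str) -> bool:
    """Return True if the current process belongs to the named group."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        # The group does not exist on this system.
        return False
    # os.getgroups() lists supplementary groups; the effective group is checked separately.
    return gid in os.getgroups() or gid == os.getegid()


if __name__ == "__main__":
    print("input group active:", in_group("input"))
```

If this prints False right after running usermod, the shell has most likely not picked up the new group yet, which is what newgrp or logging out and back in addresses.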
3. Install and Run
You can install and run speech2caret using pip or uv:
# Install the package
uv add speech2caret # or pip install speech2caret
# Run the application
speech2caret
Alternatively, you can run it directly without installation using uvx (the --index pytorch-cpu=... flag ensures only CPU packages are downloaded, avoiding GPU-related dependencies):
uvx --index pytorch-cpu=https://download.pytorch.org/whl/cpu --from speech2caret speech2caret
Configuration
The first time you run speech2caret, it creates a config file at ~/.config/speech2caret/config.ini.
You’ll need to manually edit it with the following values:
keyboard_device_path
This is the path to your keyboard input device. You can find the path either by following this, or by running the command below and looking for an entry that ends with -event-kbd.
ls /dev/input/by-path/
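The same lookup can be done from Python if you prefer to filter the entries automatically. This is a small sketch with a hypothetical helper, `find_keyboard_devices`; it only matches on the -event-kbd suffix described above.

```python
from pathlib import Path


def find_keyboard_devices(by_path_dir: str = "/dev/input/by-path") -> list[str]:
    """Return entries in by_path_dir whose names end with -event-kbd, sorted."""
    directory = Path(by_path_dir)
    if not directory.is_dir():
        return []
    return sorted(str(path) for path in directory.glob("*-event-kbd"))


if __name__ == "__main__":
    for device in find_keyboard_devices():
        print(device)
```

If more than one entry is printed, you may need to try each one with the key-name script to see which device reports your key presses.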
start_stop_key and resume_pause_key
These are the keys you'll use to control the app.
To find the correct name for a key, you can use the provided Python script below. First, ensure you have your keyboard_device_path from the step above, then run this command:
uvx --from evdev python -c '
keyboard_device_path = "PASTE_YOUR_KEYBOARD_DEVICE_PATH_HERE"
from evdev import InputDevice, categorize, ecodes, KeyEvent
dev = InputDevice(keyboard_device_path)
print(f"Listening for key presses on {dev.name}...")
for event in dev.read_loop():
    if event.type == ecodes.EV_KEY:
        key_event = categorize(event)
        if key_event.keystate == KeyEvent.key_down:
            print(f" {key_event.keycode}")
'
Press the keys you wish to use, and their names will be printed to the terminal. For a full list of available key names, see here.
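Once you have edited the file, a check like the one below can tell you whether any of the three values is still unset. It is a sketch under assumptions: the section layout of config.ini is not described here, so the hypothetical helper `missing_config_keys` simply searches every section for each key.

```python
import configparser
from pathlib import Path

REQUIRED_KEYS = ("keyboard_device_path", "start_stop_key", "resume_pause_key")


def missing_config_keys(config_path: Path) -> list[str]:
    """Return the required keys that are absent or empty in every section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_path)  # a file that does not exist is silently skipped
    missing = []
    for key in REQUIRED_KEYS:
        # Assumption: the key may live in any section, so look in all of them.
        values = [parser[name].get(key, "").strip() for name in parser.sections()]
        if not any(values):
            missing.append(key)
    return missing


if __name__ == "__main__":
    path = Path.home() / ".config" / "speech2caret" / "config.ini"
    print("Still to fill in:", missing_config_keys(path) or "nothing")
```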
Additional (Optional) Configuration
You can configure audio cues to notify when recording has started, stopped, paused, or resumed. To do this, update
the start_recording_audio_path, stop_recording_audio_path, resume_recording_audio_path, and pause_recording_audio_path
config variables in ~/.config/speech2caret/config.ini with the absolute paths to your choice of audio files.
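Because these values need to be absolute paths, a small sanity check can catch a relative path or a typo before you start recording. The helper below, `audio_path_problems`, is a hypothetical sketch that takes the four values as a plain dictionary and does not read the config file itself.

```python
from pathlib import Path


def audio_path_problems(paths: dict[str, str]) -> list[str]:
    """Return a message for each configured path that is relative or missing."""
    problems = []
    for name, value in paths.items():
        path = Path(value)
        if not path.is_absolute():
            problems.append(f"{name}: not an absolute path")
        elif not path.is_file():
            problems.append(f"{name}: file not found")
    return problems
```

Note that a value starting with ~ counts as relative in this check, since pathlib does not expand it on its own.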
Word Replacement
You can define custom word or phrase replacements in the [word_replacement] section of the ~/.config/speech2caret/config.ini file.
This allows you to automatically substitute specific spoken words with desired text.
For example, to replace "new line" with a newline character or " underscore " with _, you can configure it as follows:
[word_replacement]
"new line" = "\n"
" underscore " = "_"
How to Use
- Run the speech2caret command in your terminal.
- Press your configured start_stop_key to begin recording.
- Press the resume_pause_key to toggle between pausing and resuming.
- When you are finished, press the start_stop_key again.
- The recorded audio will be transcribed and typed at your current caret position.
File details
Details for the file speech2caret-0.3.0.tar.gz.
File metadata
- Download URL: speech2caret-0.3.0.tar.gz
- Upload date:
- Size: 7.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 92457870e7d26c4e99fc38ae097a380df32902a658333d84e1dc6d8a754fa021 |
| MD5 | 039ba127270cdcb38d85627c9304c08e |
| BLAKE2b-256 | 4626f365493f6177587880d1ffe7d5c74fd2c6f1cf917acc5ccb10bfd51c2a83 |
Provenance
The following attestation bundles were made for speech2caret-0.3.0.tar.gz:
Publisher: publish.yaml on asmith26/speech2caret
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: speech2caret-0.3.0.tar.gz
- Subject digest: 92457870e7d26c4e99fc38ae097a380df32902a658333d84e1dc6d8a754fa021
- Sigstore transparency entry: 701427801
- Sigstore integration time:
- Permalink: asmith26/speech2caret@0653039f3a8dcdebe8916e2fa1f63a34a948c729
- Branch / Tag: refs/heads/main
- Owner: https://github.com/asmith26
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@0653039f3a8dcdebe8916e2fa1f63a34a948c729
- Trigger Event: workflow_dispatch
File details
Details for the file speech2caret-0.3.0-py3-none-any.whl.
File metadata
- Download URL: speech2caret-0.3.0-py3-none-any.whl
- Upload date:
- Size: 9.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e797f364bb04a491a27bae32f26efa2f54fa986d44416d7c694f033058768224 |
| MD5 | dc46b54da2e195962ef0c3bb42eaa2d2 |
| BLAKE2b-256 | fef8f285e31edcbf0365337da4cffdcffe7f729133c701fe17637af09bbaa810 |
Provenance
The following attestation bundles were made for speech2caret-0.3.0-py3-none-any.whl:
Publisher: publish.yaml on asmith26/speech2caret
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: speech2caret-0.3.0-py3-none-any.whl
- Subject digest: e797f364bb04a491a27bae32f26efa2f54fa986d44416d7c694f033058768224
- Sigstore transparency entry: 701427815
- Sigstore integration time:
- Permalink: asmith26/speech2caret@0653039f3a8dcdebe8916e2fa1f63a34a948c729
- Branch / Tag: refs/heads/main
- Owner: https://github.com/asmith26
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@0653039f3a8dcdebe8916e2fa1f63a34a948c729
- Trigger Event: workflow_dispatch