Speak Now

A locally-hosted, low-latency speech-to-text solution with AI formatting capabilities.

Overview

Speak Now captures your speech in real time, transcribes it, and lets you paste the text directly into any application with minimal latency. What sets it apart is its integration with Google's Gemini, which can format your dictated text before it is pasted, without interrupting your workflow.

Features

Minimal UI: Small interface that can be hidden completely to avoid disrupting your workflow
Real-time Speech Recognition: Captures your speech continuously with low latency
AI-Powered Formatting: Uses Gemini 1.5 Flash to turn raw transcription into polished text
Multiple Formatting Styles: Choose between Natural, Formal, Concise, or custom formatting styles
Hotkey Controls: Control every function of the application with keyboard shortcuts
History Tracking: Access your recent transcriptions for easy reuse
Recording Toggle: Pause and resume speech recognition as needed
Customizable Configuration: Adjust settings via a TOML configuration file

Setup

Install Speak Now via pip:

pip install speak-now

For optimal performance with GPU acceleration, see the RealtimeSTT documentation.

Launch the application with:

speak-now -c <config>

The application will use default settings if no configuration file is specified.

To start in hidden mode (UI remains hidden until manually toggled):

speak-now -c <config> --hidden

Alternatively, set start_hidden = true in your configuration file.
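
As an illustration of how the --hidden flag and the start_hidden setting could interact, here is a minimal sketch using argparse. It is not Speak Now's actual command-line code: the long option name --config and the helper resolve_start_hidden are assumptions made for this example, and the rule that the flag takes precedence over the file is a guess at the intended behavior.

```python
import argparse


def parse_args(argv=None):
    """Parse the two options shown above: -c <config> and --hidden."""
    parser = argparse.ArgumentParser(prog="speak-now")
    # "--config" as a long form of -c is an assumption for this sketch.
    parser.add_argument("-c", "--config", default=None,
                        help="path to a TOML configuration file")
    parser.add_argument("--hidden", action="store_true",
                        help="start with the UI hidden")
    return parser.parse_args(argv)


def resolve_start_hidden(args, config):
    """Hypothetical helper: the flag wins, else use [ui] start_hidden."""
    return args.hidden or config.get("ui", {}).get("start_hidden", False)
```

With this sketch, resolve_start_hidden(parse_args(["--hidden"]), {}) is True even when the configuration file does not set start_hidden.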

Hotkeys

Action	Default Hotkey	Description
Paste Raw	Ctrl+`	Paste unformatted transcription text
Format & Paste	Alt+`	Format transcription with Gemini and paste
Toggle Recording	Ctrl+Alt+Space	Start/pause speech recognition
Toggle Window	Ctrl+Alt+V	Show/hide the application window
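
Hotkey strings such as ctrl+alt+space are easy to mistype when you customize them. The sketch below shows one way to split such a string into modifiers and a key and to detect two actions bound to the same combination. It is illustrative only: parse_hotkey, find_conflicts, and the MODIFIERS set are hypothetical names, not part of Speak Now, and the application may parse hotkeys differently.

```python
# Modifier names accepted by this sketch (an assumption, not Speak Now's list).
MODIFIERS = {"ctrl", "alt", "shift"}


def parse_hotkey(spec):
    """Split a spec such as "ctrl+alt+space" into (modifiers, key)."""
    parts = [p.strip().lower() for p in spec.split("+")]
    if not all(parts):
        raise ValueError(f"empty component in hotkey: {spec!r}")
    *mods, key = parts
    unknown = [m for m in mods if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifier(s) {unknown} in hotkey: {spec!r}")
    # A frozenset makes "ctrl+alt+v" and "alt+ctrl+v" compare equal.
    return frozenset(mods), key


def find_conflicts(hotkeys):
    """Return pairs of action names bound to the same key combination."""
    seen = {}
    conflicts = []
    for action, spec in hotkeys.items():
        combo = parse_hotkey(spec)
        if combo in seen:
            conflicts.append((seen[combo], action))
        else:
            seen[combo] = action
    return conflicts
```

Run against the four default bindings in the table above, find_conflicts returns an empty list.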

Formatting Options

Natural: Improves flow and fixes grammar while maintaining your voice
Formal: Transforms text into professional, business-appropriate language
Concise: Condenses text while preserving important information
Catgirl: Fun transformation to sound like a cute catgirl (example of custom style)
None: No formatting, equivalent to "Paste Raw"
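
To make the relationship between a style and the text sent to the model concrete, here is a minimal sketch. It assumes each style maps to a prompt prefix that is prepended to the transcription, as the [formatting_prompts] table in the configuration file suggests, and that an empty prompt means the raw text is pasted. The function build_prompt is a hypothetical name, and how Speak Now actually assembles its request is not documented here.

```python
def build_prompt(style, transcription, prompts):
    """Return the text to send to the model, or None if no formatting applies.

    `prompts` maps a style name to its prompt prefix (assumed layout).
    """
    if style not in prompts:
        raise KeyError(f"unknown formatting style: {style!r}")
    prefix = prompts[style]
    if not prefix:
        # Empty prompt (e.g. the "None" style): skip the model, paste raw text.
        return None
    return prefix + transcription.strip()
```

For example, with prompts = {"Concise": "Make this concise: ", "None": ""}, build_prompt("Concise", " hello world ", prompts) gives "Make this concise: hello world", and build_prompt("None", "hello", prompts) gives None.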

Configuration

Speak Now uses a TOML configuration file (stt_config.toml). Key settings include:

[api]
gemini_api_key = ""  # Set your Gemini API key here, or leave empty and use an environment variable
model = "gemini-1.5-flash"  # Choose Gemini model to use

[stt]
model = "large-v2"  # Speech recognition model
timeout = 1.0  # Recognition timeout

[hotkeys]
paste_raw = "ctrl+`"
paste_formatted = "alt+`"
toggle_recording = "ctrl+alt+space"
toggle_window = "ctrl+alt+v"

[ui]
opacity = 0.90
max_history_items = 10
default_format = "Concise"
start_hidden = false  # Set to true to start with the UI hidden

[formatting_prompts]
# Customize these prompts to change formatting behavior
Natural = "Reformat this transcription to sound more natural and fix any grammar issues: "
Formal = "Reformat this transcription into formal, professional language: "
Concise = "Reformat this transcription to be more concise while preserving all important information: "
Catgirl = "Reformat this transcription to sound like a cute catgirl talking: "
None = ""  # No formatting

Current Status

This project is a work in progress. While the core functionality works well, you may encounter occasional bugs or limitations as development continues. The focus is on maintaining low latency and seamless integration with your existing workflow.

Key Benefits

Minimal Disruption: Can operate completely in the background
Low Latency: Designed for real-time use with minimal delay
Integration: Works with any application that accepts text input
Customizable Experience: Tailor the tool to your specific needs
Privacy-Focused: Speech recognition runs locally

Building from Source

To build the source distribution and wheel manually, check them, and upload them to PyPI, run the following commands:

python -m pip install build twine
python -m build
twine check dist/*
twine upload dist/*

License

This project is released under the MIT License. See LICENSE for details.

Project details

Release history

0.1.3 (this version): Mar 1, 2025
0.1.2: Mar 1, 2025
0.1.1: Mar 1, 2025
0.1.0: Mar 1, 2025

Download files

Source Distribution

speak_now-0.1.3.tar.gz (177.1 kB)

Uploaded Mar 1, 2025 Source

Built Distribution

speak_now-0.1.3-py3-none-any.whl (19.1 kB)

Uploaded Mar 1, 2025 Python 3

File details

Details for the file speak_now-0.1.3.tar.gz.

File metadata

Download URL: speak_now-0.1.3.tar.gz
Upload date: Mar 1, 2025
Size: 177.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.0

File hashes

Hashes for speak_now-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`cfef7398075ec265dd25684df9233750d1772e36b49416a671fd9fa068eeb684`
MD5	`7debd23482117692b6780d8bba80d859`
BLAKE2b-256	`fe1d4d2a9cdc62383b7b5cb538d32df4e0cc6e6004ff36683778face18a46c47`

File details

Details for the file speak_now-0.1.3-py3-none-any.whl.

File metadata

Download URL: speak_now-0.1.3-py3-none-any.whl
Upload date: Mar 1, 2025
Size: 19.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.0

File hashes

Hashes for speak_now-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e680b0c7fe7c78536d77c5c2728fb3ab0692b18e8f88862b768e99a3e092f88e`
MD5	`3bb2c5976513c1ac805c5f590e665cc6`
BLAKE2b-256	`7544bfa649d4c05fae45cd379f08be39ba36210b4d08f91c59d11f790a8b8977`
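
The SHA256 digests listed above can be checked against a downloaded file with Python's hashlib. This is a minimal sketch: sha256_of and matches are hypothetical helper names, and the file is read in chunks so large downloads do not need to fit in memory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's digest with a published one, ignoring case and spaces."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, the call would be matches("speak_now-0.1.3.tar.gz", "cfef7398075ec265dd25684df9233750d1772e36b49416a671fd9fa068eeb684").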
