Speak Now

A locally-hosted, low-latency speech-to-text solution with LLM integration.

Overview

Speak Now captures your speech in real time and lets you paste it as text, with optional LLM-based formatting. It is designed to be lightweight and efficient for everyday use.

Features

  • Real-time transcription using local speech recognition
  • Keyboard shortcuts for quick actions:
    • Toggle recording: Ctrl+Alt+Space
    • Paste raw text: Ctrl+
    • Format and paste: Alt+
  • Text formatting via Google's Gemini API
  • Format options include:
    • Natural - smooths out transcription
    • Formal - professional language
    • Concise - preserves key information while reducing length
    • Catgirl - adds a playful style (example custom format)
    • None - no formatting
  • Simple GUI for monitoring status and selecting format options
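One way to picture the format options is as a table of instruction strings that get prepended to the transcript before it is sent to the LLM. The sketch below is illustrative only: the `FORMAT_PROMPTS` table, the prompt wording, and the `build_prompt` helper are assumptions, not the project's actual code.

```python
from typing import Optional

# Hypothetical prompt templates; the real wording lives in the project's config.
FORMAT_PROMPTS: dict[str, Optional[str]] = {
    "Natural": "Smooth out this transcription without changing its meaning.",
    "Formal": "Rewrite this transcription in professional language.",
    "Concise": "Shorten this transcription while preserving key information.",
    "None": None,  # no formatting: the raw transcript is pasted as-is
}


def build_prompt(format_name: str, transcript: str) -> Optional[str]:
    """Return the LLM prompt for a format, or None when no formatting applies."""
    if format_name not in FORMAT_PROMPTS:
        raise KeyError(f"unknown format: {format_name!r}")
    instruction = FORMAT_PROMPTS[format_name]
    if instruction is None:
        return None
    # Instruction first, then a blank line, then the transcript itself.
    return f"{instruction}\n\n{transcript}"
```

Under this layout, a custom format such as Catgirl would be one more entry in the table, and a `None` result tells the caller to skip the API call entirely.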

Setup

  1. Clone this repository
  2. Install dependencies: pip install -e .
  3. Set up your Gemini API key in stt_config.toml or as an environment variable
  4. Run the application: python stt_cache_v2.py
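Step 3 offers two places for the API key. A minimal sketch of how such a lookup could work is shown below. The environment variable name `GEMINI_API_KEY`, the `[api]` section, the `gemini_key` field, and the precedence (config file first, then environment) are all assumptions for illustration; check the generated stt_config.toml for the real names.

```python
import os
from collections.abc import Mapping
from typing import Optional

ENV_VAR = "GEMINI_API_KEY"  # assumed variable name, not confirmed by the project


def resolve_api_key(
    config: Mapping[str, object],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the key from the parsed config, else from the environment, else None."""
    api_section = config.get("api", {})  # hypothetical [api] table
    key = api_section.get("gemini_key") if isinstance(api_section, Mapping) else None
    if isinstance(key, str) and key.strip():
        return key.strip()
    # Fall back to the environment; an empty or blank value counts as missing.
    env_key = env.get(ENV_VAR, "").strip()
    return env_key or None
```

Passing `env` as a parameter keeps the function easy to test without touching the real process environment.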

Configuration

A default config file will be generated on first run. You can customize:

  • API settings (Gemini key, model)
  • Speech-to-text model and options
  • Keyboard shortcuts
  • UI settings
  • Formatting prompts
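The "generated on first run" behaviour can be sketched as a write-if-missing helper. Everything in `DEFAULT_CONFIG` below (section names, keys, and values other than the Ctrl+Alt+Space toggle listed under Features) is hypothetical and only mirrors the categories in the list above; the real generated file may look quite different.

```python
from pathlib import Path

# Hypothetical default contents, one table per customizable category.
DEFAULT_CONFIG = """\
[api]
gemini_key = ""
model = "your-gemini-model"

[stt]
model = "your-stt-model"

[shortcuts]
toggle_recording = "ctrl+alt+space"

[ui]
always_on_top = false

[formats]
Formal = "Rewrite this transcription in professional language."
"""


def ensure_config(path: Path, default_text: str = DEFAULT_CONFIG) -> bool:
    """Write the default config if none exists; return True when a file was created."""
    if path.exists():
        return False  # never overwrite the user's customizations
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(default_text, encoding="utf-8")
    return True
```

Calling `ensure_config(Path("stt_config.toml"))` at startup would create the file once and leave later edits untouched.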

Current Status

This project is a work in progress. Basic functionality is implemented, but you may encounter bugs or limitations. Contributions and feedback are welcome!

License

MIT License
