Voice-powered research assistant for physical books and papers
Project description
Klaus
A voice-based research assistant for reading physical papers and books. Place a paper under a document camera (or phone on a tripod), speak a question, and Klaus sees the page and answers aloud in a natural voice.
Stack: Claude Sonnet 4 (vision + tool use) | Moonshine Medium local STT | OpenAI gpt-4o-mini-tts | Tavily web search | PyQt6 desktop UI | SQLite memory
Platforms: Windows and macOS
Quick Setup
Windows:
pip install pipx && pipx ensurepath
Restart your terminal, then:
pipx install klaus-assistant
klaus
macOS:
brew tap bgigurtsis/klaus
brew install klaus
klaus
On first launch, a setup wizard walks you through API keys, camera, mic, and voice model setup.
Updating
Windows: pipx upgrade klaus-assistant
macOS: brew upgrade klaus
Camera Setup
Required: A camera is required for Klaus to ingest what you're currently reading and to use it as context.
A USB document camera (AKA visualiser) is reccomended. Alternatively, a phone on a gooseneck mount (~$10-15) pointed straight down at your reading surface works. Either should gives Klaus a clear, stable view of the full page.
Some reccomended apps to connect your phone to your computer are listed below:
| Setup | App |
|---|---|
| macOS + iPhone | Built-in -- Continuity Camera (iOS 16+, macOS Ventura+, no install needed) |
| macOS + Android | Camo (free, 1080p) -- install on phone + Mac, pair via QR or USB |
| Windows + Android | DroidCam (free) -- install on phone + PC, connect over Wi-Fi or USB |
| Windows + iPhone | Camo (free, 1080p) -- install on phone + PC, pair via QR or USB |
Klaus auto-detects portrait orientation and rotates the image. Override with camera_rotation in ~/.klaus/config.toml if needed.
Other install options
Prerequisites: Python 3.11+, camera, mic, speakers. On Windows, install Visual C++ Build Tools (Desktop development with C++) so webrtcvad can compile. On macOS without Homebrew: brew install python@3.13 portaudio.
From source (development):
git clone https://github.com/bgigurtsis/Klaus.git
cd Klaus
pip install -e .
klaus
API keys: The setup wizard asks for them on first launch, or add to ~/.klaus/config.toml under [api_keys]: Anthropic, OpenAI, Tavily (free tier: 1,000 searches/mo). Optional: OBSIDIAN_VAULT_PATH in .env for Obsidian notes.
Latency & Cost
End-to-end latency from question to first spoken word is 2-3 seconds (STT + Claude + first TTS chunk). TTS streams sentence-by-sentence so playback starts before the full response is generated.
| Usage | Approx. cost |
|---|---|
| 10 questions | ~$0.05 |
| 50 questions | ~$0.25 |
| 100 questions/day | ~$2.50-3.50/day |
Largest cost driver is Claude (vision + context window). STT is free (local). TTS is $0.015/min of generated audio.
Usage
Klaus captures the page image when your question ends and sends it with your transcript to Claude. If Claude is uncertain about a claim, it searches the web via Tavily before answering.
Configuration
Settings live in ~/.klaus/config.toml (created on first run). Edit any line to override defaults:
| Setting | Default | Notes |
|---|---|---|
hotkey |
F2 |
Push-to-talk key, works without app focus |
input_mode |
voice_activation |
Or push_to_talk |
voice |
cedar |
Options: coral, nova, alloy, ash, ballad, echo, fable, onyx, sage, shimmer, verse, cedar, marin |
tts_speed |
1.0 |
0.25 to 4.0 |
camera_index |
0 |
Change if you have multiple cameras |
camera_rotation |
auto |
auto, none, 90, 180, 270 |
camera_width / camera_height |
1920 / 1080 |
Camera resolution |
vad_sensitivity |
3 |
0-3, higher = more aggressive noise filtering |
vad_silence_timeout |
1.5 |
Seconds of silence before voice activation finalizes |
log_level |
INFO |
DEBUG, INFO, WARNING, ERROR |
Architecture
Mic --> WebRTC VAD --> Moonshine Medium (local STT) --\
--> Claude (vision + tools) --> TTS --> Speakers
Camera (live feed) -----------------------------------/ |
+--> Tavily (web search)
+--> Obsidian (notes)
+--> SQLite (memory)
+--> Chat UI
Speech-to-text runs entirely locally via Moonshine Medium (245M params, ~300ms latency, no API cost). Voice activation uses WebRTC VAD with multi-stage filtering (voiced ratio, RMS loudness, contiguous voiced runs) to reject background noise before audio reaches STT.
Module Layout
| Module | Role |
|---|---|
config.py |
Config, API keys, system prompt, voice settings |
camera.py |
OpenCV background thread, frame capture, auto-rotation |
audio.py |
Push-to-talk recorder (sounddevice), VAD recorder, WAV buffer |
stt.py |
Moonshine Voice local speech-to-text |
tts.py |
OpenAI gpt-4o-mini-tts with sentence-level streaming |
brain.py |
Claude vision + tool use, conversation history, tool-use loop |
search.py |
Tavily web search, exposed as a Claude tool |
notes.py |
Obsidian vault note-taking, exposed as Claude tools |
memory.py |
SQLite persistence (sessions, exchanges, knowledge profile) |
ui/ |
PyQt6 GUI (main window, camera, chat, sessions, status, theme, setup wizard, settings) |
main.py |
Wires everything together, hotkey listener, Qt signal bridge |
Data
- Config:
~/.klaus/config.toml - Database:
~/.klaus/klaus.db(sessions, exchanges, knowledge profile) - No images stored, only a short hash of each page capture
- Delete the database to start fresh
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file klaus_assistant-0.3.0.tar.gz.
File metadata
- Download URL: klaus_assistant-0.3.0.tar.gz
- Upload date:
- Size: 1.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b60658add4f80c8401204a1deb31b6be15b64cc9049e9bd8f83d8a7e50a469cb
|
|
| MD5 |
e1cefe10e20ff6b15e6437276185ff2f
|
|
| BLAKE2b-256 |
359aadcaf69f2c6fea451ce4a9472324ff02933532c2e81f0e91ff823784519e
|
File details
Details for the file klaus_assistant-0.3.0-py3-none-any.whl.
File metadata
- Download URL: klaus_assistant-0.3.0-py3-none-any.whl
- Upload date:
- Size: 1.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
757f505db42aea8aafc29f9ad20fb09a380fa1f1315c27c02846506626e9007c
|
|
| MD5 |
f4a7ecc2d5ea0acc73276d141429c912
|
|
| BLAKE2b-256 |
4c6fe46d59c00149f01e8cd1c03402a7ab9cb27cc8daa7bf4b773f39ced57dd9
|