Voice-powered research assistant for physical books and papers

These details have not been verified by PyPI

Project links

Development Status
- 3 - Alpha
Environment
- X11 Applications :: Qt
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- MacOS
- Microsoft :: Windows
Programming Language
Topic
- Scientific/Engineering

Project description

Klaus

A voice-based research assistant for reading physical papers and books. Place a paper under a document camera (or phone on a tripod), speak a question, and Klaus sees the page and answers aloud in a natural voice.

Platforms: Windows and macOS

Install

Requires Python 3.11+, a camera or webcam, a microphone, and speakers.

Windows

Install Python 3.11+ (check "Add to PATH" during install).
Install Visual C++ Build Tools -- needed to compile webrtcvad. Select the "Desktop development with C++" workload.
Clone and install:

git clone https://github.com/bgigurtsis/Klaus.git
cd Klaus
pip install .
klaus

Global hotkeys (F2/F3) work without app focus. No extra permissions required on Windows.

macOS

Install Python 3.11+ via Homebrew:

brew install python@3.13

Install PortAudio (required by sounddevice):

brew install portaudio

Clone and install:

git clone https://github.com/bgigurtsis/Klaus.git
cd Klaus
pip install .
klaus

Or install directly via Homebrew:

brew tap bgigurtsis/klaus
brew install klaus
klaus

On macOS, the system will prompt you to grant Accessibility permission to your terminal (or Klaus) for global hotkeys to work. Go to System Settings > Privacy & Security > Accessibility and enable the app.

First launch

On first launch, a setup wizard walks you through API key entry, camera selection, microphone test, voice model download, and an optional background profile. No manual config file editing required.

API keys

Klaus needs three API keys. The setup wizard will ask for them, or you can add them to ~/.klaus/config.toml under [api_keys]:

Key	Where to get it
Anthropic	console.anthropic.com/settings/keys
OpenAI	platform.openai.com/api-keys
Tavily	app.tavily.com (free tier: 1,000 searches/mo)

Optional: set OBSIDIAN_VAULT_PATH in .env to enable note-saving to your Obsidian vault.

Usage

Action	How
Ask a question (push-to-talk)	Hold F2, speak, release
Ask a question (voice activation)	Just speak (default mode)
Toggle input mode	F3
Switch papers	Session dropdown in the header
New session	+ New Session
Replay an answer	Replay button on any response card
Stop playback	Stop button in the status bar
Save notes	Ask Klaus to save to an Obsidian file by name
Change settings	Gear icon in the header

Klaus captures the page image when your question ends and sends it with your transcript to Claude. If Claude is uncertain about a claim, it searches the web via Tavily before answering.

Using a Phone as Your Camera

You don't need a dedicated document camera. A phone on a cheap tripod pointed down at your desk works well.

macOS -- Continuity Camera works natively. Any iPhone running iOS 16+ paired with a Mac on macOS Ventura+ appears as a webcam automatically. No extra app needed; just select the iPhone in Klaus settings.

Windows -- Install DroidCam (free, Android and iOS) or Camo (free tier, Android and iOS). These create a virtual webcam that Klaus picks up. Connect your phone, then select the virtual camera in Klaus settings.

Mounting -- An adjustable gooseneck phone mount or a small phone tripod aimed straight down at the reading surface gives the best results. These run about $10-15 on Amazon. Make sure the full page is visible in the camera preview.

Klaus auto-detects portrait orientation from phone cameras and rotates the image to landscape. If auto-detection gets it wrong, set camera_rotation in ~/.klaus/config.toml to "none", "90", "180", or "270".

Configuration

Settings live in ~/.klaus/config.toml (created on first run). Edit any line to override defaults:

Setting	Default	Notes
`hotkey`	`F2`	Push-to-talk key, works without app focus
`input_mode`	`voice_activation`	Or `push_to_talk`
`voice`	`cedar`	Options: coral, nova, alloy, ash, ballad, echo, fable, onyx, sage, shimmer, verse, cedar, marin
`tts_speed`	`1.0`	0.25 to 4.0
`camera_index`	`0`	Change if you have multiple cameras
`camera_rotation`	`auto`	`auto`, `none`, `90`, `180`, `270`
`camera_width` / `camera_height`	`1920` / `1080`	Camera resolution
`vad_sensitivity`	`3`	0-3, higher = more aggressive noise filtering
`vad_silence_timeout`	`1.5`	Seconds of silence before voice activation finalizes
`log_level`	`INFO`	DEBUG, INFO, WARNING, ERROR

Architecture

Mic --> WebRTC VAD --> Moonshine Medium (local STT) --\
                                                       --> Claude (vision + tools) --> TTS --> Speakers
Camera (live feed) -----------------------------------/        |
                                                               +--> Tavily (web search)
                                                               +--> Obsidian (notes)
                                                               +--> SQLite (memory)
                                                               +--> Chat UI

Speech-to-text runs entirely locally via Moonshine Medium (245M params, ~300ms latency, no API cost). Voice activation uses WebRTC VAD with multi-stage filtering (voiced ratio, RMS loudness, contiguous voiced runs) to reject background noise before audio reaches STT.

Latency & Cost

End-to-end latency from question to first spoken word is 2-3 seconds (STT + Claude + first TTS chunk). TTS streams sentence-by-sentence so playback starts before the full response is generated.

Usage	Approx. cost
10 questions	~$0.05
50 questions	~$0.25
100 questions/day	~$2.50-3.50/day

Largest cost driver is Claude (vision + context window). STT is free (local). TTS is $0.015/min of generated audio.

Module Layout

Module	Role
`config.py`	Config, API keys, system prompt, voice settings
`camera.py`	OpenCV background thread, frame capture, auto-rotation
`audio.py`	Push-to-talk recorder (sounddevice), VAD recorder, WAV buffer
`stt.py`	Moonshine Voice local speech-to-text
`tts.py`	OpenAI gpt-4o-mini-tts with sentence-level streaming
`brain.py`	Claude vision + tool use, conversation history, tool-use loop
`search.py`	Tavily web search, exposed as a Claude tool
`notes.py`	Obsidian vault note-taking, exposed as Claude tools
`memory.py`	SQLite persistence (sessions, exchanges, knowledge profile)
`ui/`	PyQt6 GUI (main window, camera, chat, sessions, status, theme, setup wizard, settings)
`main.py`	Wires everything together, hotkey listener, Qt signal bridge

Data

Config: ~/.klaus/config.toml
Database: ~/.klaus/klaus.db (sessions, exchanges, knowledge profile)
No images stored, only a short hash of each page capture
Delete the database to start fresh

Project details

These details have not been verified by PyPI

Project links

Development Status
- 3 - Alpha
Environment
- X11 Applications :: Qt
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- MacOS
- Microsoft :: Windows
Programming Language
Topic
- Scientific/Engineering

Release history Release notifications | RSS feed

0.3.3

Mar 2, 2026

0.3.2

Mar 2, 2026

0.3.1

Mar 1, 2026

0.3.0

Mar 1, 2026

This version

0.1.0

Mar 1, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

klaus_assistant-0.1.0.tar.gz (869.3 kB view details)

Uploaded Mar 1, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

klaus_assistant-0.1.0-py3-none-any.whl (883.4 kB view details)

Uploaded Mar 1, 2026 Python 3

File details

Details for the file klaus_assistant-0.1.0.tar.gz.

File metadata

Download URL: klaus_assistant-0.1.0.tar.gz
Upload date: Mar 1, 2026
Size: 869.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for klaus_assistant-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`27502e4893facde3793b5a99e9072a46e092c80a9f17fe664dead0e442eed1c4`
MD5	`6dc98b26307e06f0abbce70676eeeca2`
BLAKE2b-256	`4332e19f761e393f543daac49deee6470f3303fa7460ae575a12c2a8da9e1275`

See more details on using hashes here.

File details

Details for the file klaus_assistant-0.1.0-py3-none-any.whl.

File metadata

Download URL: klaus_assistant-0.1.0-py3-none-any.whl
Upload date: Mar 1, 2026
Size: 883.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for klaus_assistant-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`db2cc684ea74342962783f24dcf35517941e43149c3b4d70425495cd11f04b54`
MD5	`9eb2c892cb7da420c2d0ec5c70bc7730`
BLAKE2b-256	`a474c3251df7b59ca4292c9a636aaaab209ad5a912d1a931cce0ea407076e639`

See more details on using hashes here.

klaus-assistant 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Klaus

Install

Windows

macOS

First launch

API keys

Usage

Using a Phone as Your Camera

Configuration

Architecture

Latency & Cost

Module Layout

Data

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes