Voice-Interactive Reasoning Agent — a voice-controlled AI agent that accesses local folder/markdown context during collaborative sessions.
Project description
VIRA — Voice-Interactive Reasoning Agent
A voice-controlled AI agent that talks to Claude (or a local model via Ollama),
loads local folder/markdown context and long-term Obsidian-vault memory into the
conversation, and runs reusable markdown skills and multi-step workflows. It
holds a hands-free spoken conversation — local Whisper speech-to-text plus
ElevenLabs voices (VIRA / VIRO / Friday) with a macOS say fallback — and you can
switch voice, persona, and even the LLM provider mid-conversation, by voice.
Cost is first-class: per-call usage/cost, a running total, and a spend cap.
Documentation
| Doc | What it covers |
|---|---|
| VIRA_PLAN.md | Master roadmap & vision (ICM layers, phases) |
| ARCHITECTURE.md | As-built architecture (GitNexus-generated) |
| ARCHITECTURE-PLAN.md | Design-intent architecture |
| MEMORY-PLAN.md | Obsidian memory vault — Phase 21 (shipped) |
| UIPLAN.md | Web UI — Phase 18 (browser chat + settings shipped) |
| VOICE-CLONE-PLAN.md | User voice clone — Phase 20 (shipped) |
| BARGE-IN-FIX.md | Voice / barge-in debugging notes |
| docs/DIALOGUE.md | Topic-grounded dialogue (ICM flows, recipes) |
| CHANGELOG.md | Release notes (Keep a Changelog) |
| harness/plans/vira-mvp/ | Phase build plans (incl. SHELLPLAN.md) |
Requirements
- Python ≥ 3.11 (developed on 3.14)
- An Anthropic API key for live agent calls (not needed to run the tests)
Install
From PyPI or pipx (v0.2.1+)
pip install vira
pip install 'vira[voice]' # + local Whisper STT
pip install 'vira[web]' # + `vira web` UI
pip install 'vira[voice,web,mcp,browser]' # all runtime extras
# or: pip install 'vira[all]'
pipx install vira # isolated CLI on PATH
pipx install 'vira[voice]'
Set ANTHROPIC_API_KEY (or use llm_provider: ollama) and copy settings from
config/default.yaml into ~/.vira/config.yaml as needed.
Maintainers: pip install -e '.[release]', then ./scripts/release_check.sh
and twine upload dist/* when publishing a tag.
From source (development)
python3 -m venv .venv
source .venv/bin/activate
pip install -e . # core
# pip install -e '.[voice]' # + local Whisper STT (heavy/native deps)
Configure
cp .env.example .env # then set ANTHROPIC_API_KEY=...
Settings live in config/default.yaml. The agent model is a changeable default resolved with this precedence (lowest → highest):
config/default.yaml (claude-sonnet-4-6) → VIRA_MODEL env var → vira --model <id>
So you can change it permanently (edit the yaml), per-shell
(export VIRA_MODEL=claude-opus-4-8), or per-command (vira chat --model claude-opus-4-8).
Other settings include prompt caching (prompt_cache, on by default — caches the
stable context/skill prefix so multi-turn chat is cheaper/faster), context byte
budgets, and the voice / workflows / sessions directories.
Cost awareness
Cost is a first-class feature. Every chat / skill run / workflow run prints a
one-line usage + estimated-cost summary (toggle with show_usage), and you can
estimate spend before a call with a free token preflight:
vira tokens --file big-prompt.md # input tokens + estimated input cost (free; no completion)
vira skill run refiner "..." # → usage: 26 in, 5 out | ~$0.0001 (1 call) | total: 336 tok, ~$0.0007 (3 calls)
vira usage # show the running total used so far
vira usage --reset # start a fresh tally
A running total (persisted in ~/.vira/usage.json, overridable via VIRA_USAGE_FILE)
is appended to every usage line and accumulates across commands — in vira chat it
updates live after each turn.
Spend cap — set a per-session budget; VIRA warns at 80% and blocks paid calls once the running total reaches it:
vira --spend-cap 0.50 chat # stop once this session has spent ~$0.50
# or set `spend_cap` in config/default.yaml; reset the session with `vira usage --reset`
Keep costs down: prompt caching is on by default, max_tokens defaults low, and for
simple/bulk work use --model claude-haiku-4-5.
Local models (Ollama)
Claude remains the default provider. For offline or privacy-sensitive work, switch to a local model via Ollama — no Anthropic key required.
ollama serve
ollama pull llama3.2:3b
# Per-session
vira --provider ollama --model llama3.2:3b chat
# Or persist in config/default.yaml:
# llm_provider: ollama
# model: llama3.2:3b
# Or via .env: VIRA_LLM_PROVIDER=ollama VIRA_MODEL=llama3.2:3b
Works with vira chat, skill run, workflow run, voice listen, and voice chat.
Spend caps and Anthropic token preflight (vira tokens) are skipped for Ollama (local,
no API cost). Whisper STT is already local; pair with --tts say for fully offline
voice on macOS.
Manual integration check: python scripts/spike_ollama.py
Obsidian memory
VIRA can load long-term memory from an Obsidian-compatible markdown vault and stay
expert about itself via synced docs in Self/.
vira memory init # ./memories Obsidian vault (wikilinks)
vira memory links # show [[wikilinks]] graph
vira memory sync-self # copy README, ARCHITECTURE, etc. → Self/
vira chat # auto-loads vault + remembers multi-turn history
vira --provider ollama --model llama3.2:3b chat # every provider gets vault memory
vira skill run refiner "…" # skills/workflows/voice use memory too
vira memory consolidate session-….jsonl # distill transcript → Memory/sessions/
# Point at your Obsidian vault folder:
# memory_vault: ./memories in config/default.yaml (open in Obsidian)
See MEMORY-PLAN.md for layout, load order, and the self-improvement loop.
Usage
vira chat # interactive chat (Ctrl-D / 'exit' to quit)
vira chat --model claude-opus-4-8 # one-off model override
vira skills list # list available skills
vira skill run refiner "tighten this please"
vira context load ./context # preview a folder without chatting
vira context budget ./context # byte/token limits + skip list (optional --exact)
vira kb list # registered topic knowledge bases
vira --topic vira chat # load primary expertise for this topic
vira --kb ~/Projects/foo/docs skill run refiner "…" # ad-hoc KB on any command
Host access (tools)
VIRA can read files on your machine, optionally write/edit files, and (optionally)
run shell commands through Claude tool use. Read-only by default; every shell
command and every file write requires confirmation each time, and .git, .ssh, and
secret files (.env, *.pem, id_rsa, …) are refused. Anthropic provider only.
vira do "summarise the Python files in vira/core" # one-shot task (read-only)
vira do "show disk usage" --allow-shell # shell, confirmed per command
vira do "add a docstring to vira/core/agent.py" --allow-write # write/edit, confirmed per file
vira chat --tools # multi-turn chat + read_file/list_dir
vira chat --tools --allow-shell # chat + confirm-gated shell
vira chat --tools --allow-write # chat + confirm-gated write_file/edit_file
vira chat --tools --allow-shell --yes # skip confirm (risky)
vira chat --tools --allow-browser # + Playwright browser tools
vira do "open example.com and summarize" --allow-browser # one-shot with browser
vira voice chat --hands-free --tools # voice + read-only tools
vira voice chat --tools --allow-shell # shell with spoken yes/no confirm
Set tools_enabled: true in ~/.vira/config.yaml or config/default.yaml to
default --tools on for vira chat. Shell still needs --allow-shell or
tools_allow_shell: true; writes need --allow-write or tools_allow_write: true.
Tools are confined to tools_root (default: .).
MCP servers (optional pip install 'vira[mcp]'): configure tools_mcp_servers in
config — tools appear as mcp_<server>_<name>. List with vira tools mcp list.
Browser automation (optional pip install 'vira[browser]' then
playwright install chromium): enable with --allow-browser or tools_allow_browser: true.
Tools: browser_navigate, browser_snapshot, browser_screenshot (read-only);
browser_click / browser_fill (confirmed). List with vira tools browser list.
Only http:// and https:// URLs are allowed.
Ollama tool calling — vira chat --tools and vira do work with
llm_provider: ollama when the model supports tools (e.g. llama3.1+, qwen2.5).
Same confirm gating and registry as Claude.
Voice (spoken conversation)
Install the extra, then talk to VIRA. Speech-to-text runs locally via Whisper.
Text-to-speech supports built-in VIRA (warm Irish, default), VIRO (British
butler), and Friday (Irish) profiles via ElevenLabs Voice Design, with macOS
say as an offline fallback.
pip install -e '.[voice]'
# One-time: add ELEVENLABS_API_KEY to .env, then design and store both voices.
vira voice setup --profile all
vira voice chat # VIRA (default); say "goodbye" to stop
vira voice chat --profile viro # switch to VIRO (British butler)
vira voice chat --tts elevenlabs # force ElevenLabs (requires setup)
vira voice chat --seconds 7 --stt-model small
vira voice chat --hands-free # VAD: auto-send when you stop talking
vira voice chat --hands-free --wake-word vira # only answer turns starting with "vira"
# Interrupt a long reply: press Enter (default), or speak clearly over it (--barge-in / auto with --hands-free)
# Tune barge-in sensitivity: `vira web` (slider) or voice_barge_in_margin in config (lower = easier)
# Mute mic (side conversations): press Space to toggle — cancels an active recording too
vira voice listen recording.wav # transcribe one file, answer, speak the reply
vira voice profiles # show profiles, stored IDs, cache + key status (offline)
vira voice preview --profile vira # play design previews without storing them
vira voice preview --profile vira --index 1 --commit # store preview #1 as the voice
vira voice cleanup # delete leftover VIRA-* voices (needs voices delete perm)
# Clone your own voice (ElevenLabs Instant Voice Cloning; own voice only)
vira voice clone record # guided mic capture + upload (~60s+ total)
vira voice clone record --i-consent # skip consent prompt (scripts)
vira voice clone from-files a.wav b.wav # upload existing audio
vira voice clone status # stored clone + subscription hints
vira voice chat --profile user # speak with your cloned voice
Voice settings (profile, TTS engine, Whisper model, record seconds) live in
config/default.yaml. Stored ElevenLabs voice IDs are written
to ~/.vira/voices.json. The first voice run downloads the chosen Whisper model.
Microphone capture needs mic permission for your terminal.
Tuning without code edits — override any built-in profile from config/default.yaml
under voice_profile_overrides (e.g. friday.voice_settings.stability); overrides are
merged at design and speak time. Audio cache — repeated ElevenLabs lines are cached
as MP3s under ~/.vira/audio-cache/ (SHA-256 keyed on voice + model + text + settings),
so they replay without re-synthesizing. The cache covers the ElevenLabs playback path
only (the macOS say/Moira path, e.g. Friday's prefer_say, is not cached). Disable with
voice_audio_cache: false. vira voice preview costs ElevenLabs design credits; it only
stores a voice when you pass --commit.
Hands-free — vira voice chat --hands-free records until you stop talking
(energy-based voice-activity detection; no extra dependency) instead of a fixed
--seconds. Tune voice_vad_threshold / voice_vad_silence_ms (and the
voice_vad_* caps) in config/default.yaml, or default it on with voice_vad: true.
Add an optional wake word (--wake-word vira or voice_wake_word: vira) so VIRA
only answers turns that start with it — matched leniently (after an optional "hey/ok/
okay/yo "), so speech-to-text homophones like "Vera" still wake her.
Switch voice mid-conversation — just say "use Friday", "use VIRA", or
"use VIRO" (also "switch to …" / "talk as …") to change the speaking voice on the
fly, without leaving voice chat. The persona switches with the voice too — Friday
introduces herself as Friday (and may call you "boss"), VIRO as a composed British
butler — while the conversation memory carries over. Press Enter to interrupt a
long reply, or speak over it with barge-in (--barge-in, on by default with
--hands-free). Press Space to mute the mic before or during a capture.
Switch model/provider mid-conversation — say "switch to Claude" / "use Anthropic" to use Claude, or "go local" / "go offline" (or "use Ollama") to switch to the local Ollama model — without restarting. Matching is fuzzy and covers common speech-to-text mishearings ("cloud"→Claude, "llama"→Ollama); "go local" is the most reliable phrase for Ollama since "Ollama" is often mis-transcribed. Memory carries over; the spend cap re-applies once you're back on Claude.
Web UI (browser chat)
A local, dark, single-page chat UI in the browser — React frontend, FastAPI backend — reusing the same agent, knowledge bases, vault memory, sessions, and spend-cap as the CLI.
pip install -e '.[web]'
vira web # opens http://127.0.0.1:8765/
vira web --no-open # headless server only
# Remote access — require a token before binding beyond localhost:
VIRA_WEB_AUTH_TOKEN=$(openssl rand -hex 16) vira web --host 0.0.0.0 --port 9000
# clients then pass it: open http://host:9000/?token=… (or Authorization: Bearer …)
- Streaming chat (Server-Sent Events) with a live token cursor and the same per-turn usage/cost line as the CLI.
- Browser voice (Phase 18.1) — tap the mic for push-to-talk (records → Whisper STT → sends) with a live level ring and a listen/speak status chip; the 🔊 toggle speaks replies back in VIRA's voice (ElevenLabs), with a voice-profile picker in settings.
- Remote control / WebSocket (Phase 19) — an SSE/WS transport toggle in the
header switches chat to
/api/ws/chat. Setweb_auth_token(orVIRA_WEB_AUTH_TOKEN) to require a token on every/apiroute and the WS handshake (Beareror?token=), so binding beyond localhost is safe; empty = localhost trust. - Topic knowledge picker + memory toggle in the header — switching either starts a fresh session with that primary KB / vault memory.
- Voice settings drawer (⚙): barge-in toggle + sensitivity slider
(
voice_barge_in_margin), Space-mute, Enter-interrupt, STT model — saved to~/.vira/config.yaml, applied on the nextvira voice chat. - Session transcripts are written to
~/.vira/sessions/*.jsonl(path shown in the footer).
The UI is built from frontend/ (React + Vite + TypeScript) into a single
self-contained vira/web/static/index.html that FastAPI serves — so CI stays
Python-only. See frontend/README.md to develop or rebuild it.
Security controls (auth token, bind rules, path allowlist, request limits) are documented in SECURITY.md.
Workflows
Chain skills into a multi-step pipeline with a YAML file. Steps interpolate
{{variables}} and prior steps' output ({{step.output}}), can be guarded with
when:, can fan out over a list with foreach: (binding {{item}} per element),
and can save results to a file with save_to:.
vira workflow list
vira workflow run refine_and_summarize.yaml --var text="your draft here" --var out_dir=.
vira workflow run branch_and_parallel.yaml --var mode=fast --var text="hello"
See examples/workflows/refine_and_summarize.yaml.
Conversation memory & transcripts
vira chat and vira voice chat are multi-turn — VIRA remembers the
conversation within a session, and replies stream token-by-token. Ground the conversation in a topic knowledge base (--topic / --kb / --context),
and every session is logged as JSONL under sessions_dir (default ~/.vira/sessions).
vira --topic vira chat # primary expertise from config knowledgebases
vira chat --context ./my-notes # ad-hoc folder for this chat only
vira session list # list recorded transcripts
vira session extract session-20260607-120000.jsonl -o workflow.md # mine a session into a workflow
session extract reads a transcript and asks the agent to distill the repeatable
task into a reusable workflow + decision summary — the first cut of turning
dialogue into reusable structure. See docs/DIALOGUE.md for the
full ICM flow (context → chat → extract → workflow), diagrams, and gaps.
Layout
vira/
config.py # Settings + model/provider precedence
core/
context.py # ContextManager / ContextBundle
knowledge.py # Topic knowledge base registry + primary expertise loading
agent.py # LLMClient protocol, ViraAgent (+ memory, persona, client swap)
llm.py # provider factory (Claude / Ollama)
ollama.py # local Ollama client (OpenAI-compatible, stdlib urllib)
skills.py # Skill, SkillEngine
session.py # SessionLogger (JSONL transcripts)
workflow.py # Workflow, WorkflowEngine, load_workflow
extract.py # session transcript -> reusable workflow draft
usage.py # token/cost tracking, running total, spend cap
voice/ # optional extra (lazy imports)
recorder.py stt.py tts.py pipeline.py
vad.py # energy VAD, wake word, voice/provider voice-commands
profiles.py elevenlabs.py # voice profiles + ElevenLabs Voice Design
memory/ # Obsidian vault: self-knowledge, facts, session distillation
cli/main.py # Typer CLI
examples/skills/refiner/ # sample skill (skill.md + prompts/)
tests/ # unittest suite (no API key / network needed)
config/default.yaml
Develop
Run the test suite from the repo root (no API key or network required — the Anthropic client is dependency-injected and faked in tests):
python3 -m unittest discover -s tests -t .
Lint and coverage (install dev tools first with pip install -e '.[dev]'):
ruff check . # lint + import order
coverage run -m unittest discover -s tests -t . && coverage report
CI runs both — a ruff check lint job and the test suite under a coverage gate
(see .github/workflows/ci.yml).
Roadmap
Delivered: CLI + Claude agent, folder/markdown context, markdown skills,
multi-turn memory, session transcripts, a YAML workflow engine, and a real voice
loop (Whisper STT + macOS say TTS). Post-MVP ideas (tracked in
harness/plans/vira-mvp/EFFORT.md → Improvements & Opportunities): streaming
responses, prompt caching, token-budget-aware context, decision-tree extraction
from sessions, wake-word/VAD, non-macOS TTS, and PyPI/pipx packaging.
License
MIT.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file vira_agent-0.2.1.tar.gz.
File metadata
- Download URL: vira_agent-0.2.1.tar.gz
- Upload date:
- Size: 207.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
71b0394aa26d3d791e663221b56791cc47e6cdc86ed031f612d64952c47b33b6
|
|
| MD5 |
4f623150e0612e8f45e301a3b8c66bbd
|
|
| BLAKE2b-256 |
f56d96b8e221d59328fc036ec31db2fde5d53bb00c4bf01b123315ee96b84c82
|
Provenance
The following attestation bundles were made for vira_agent-0.2.1.tar.gz:
Publisher:
publish.yml on 3rdAI-admin/VIRA
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
vira_agent-0.2.1.tar.gz -
Subject digest:
71b0394aa26d3d791e663221b56791cc47e6cdc86ed031f612d64952c47b33b6 - Sigstore transparency entry: 1756347138
- Sigstore integration time:
-
Permalink:
3rdAI-admin/VIRA@b645e79709e850c9e0292b10eeab75661d3fceea -
Branch / Tag:
refs/tags/v0.2.1 - Owner: https://github.com/3rdAI-admin
-
Access:
private
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@b645e79709e850c9e0292b10eeab75661d3fceea -
Trigger Event:
release
-
Statement type:
File details
Details for the file vira_agent-0.2.1-py3-none-any.whl.
File metadata
- Download URL: vira_agent-0.2.1-py3-none-any.whl
- Upload date:
- Size: 174.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4ddefd2a08e43261f4d7d80ee0c80dd85234a3eece251fc6b8a86e7c2867047f
|
|
| MD5 |
9a004900d92cb8e3640d18818ad70c68
|
|
| BLAKE2b-256 |
bfc13e95a99569f177e371793a954cd1e9f587595c56e0970f65f2be32d63fbd
|
Provenance
The following attestation bundles were made for vira_agent-0.2.1-py3-none-any.whl:
Publisher:
publish.yml on 3rdAI-admin/VIRA
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
vira_agent-0.2.1-py3-none-any.whl -
Subject digest:
4ddefd2a08e43261f4d7d80ee0c80dd85234a3eece251fc6b8a86e7c2867047f - Sigstore transparency entry: 1756347146
- Sigstore integration time:
-
Permalink:
3rdAI-admin/VIRA@b645e79709e850c9e0292b10eeab75661d3fceea -
Branch / Tag:
refs/tags/v0.2.1 - Owner: https://github.com/3rdAI-admin
-
Access:
private
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@b645e79709e850c9e0292b10eeab75661d3fceea -
Trigger Event:
release
-
Statement type: