
Voice-Interactive Reasoning Agent — a voice-controlled AI agent that accesses local folder/markdown context during collaborative sessions.

VIRA — Voice-Interactive Reasoning Agent

A voice-controlled AI agent that talks to Claude (or a local model via Ollama), loads local folder/markdown context and long-term Obsidian-vault memory into the conversation, and runs reusable markdown skills and multi-step workflows. It holds a hands-free spoken conversation — local Whisper speech-to-text plus ElevenLabs voices (VIRA / VIRO / Friday) with a macOS say fallback — and you can switch voice, persona, and even the LLM provider mid-conversation, by voice. Cost is first-class: per-call usage/cost, a running total, and a spend cap.

Documentation

Doc                        What it covers
VIRA_PLAN.md               Master roadmap & vision (ICM layers, phases)
ARCHITECTURE.md            As-built architecture (GitNexus-generated)
ARCHITECTURE-PLAN.md       Design-intent architecture
MEMORY-PLAN.md             Obsidian memory vault: Phase 21 (shipped)
UIPLAN.md                  Web UI: Phase 18 (browser chat + settings shipped)
VOICE-CLONE-PLAN.md        User voice clone: Phase 20 (shipped)
BARGE-IN-FIX.md            Voice / barge-in debugging notes
docs/DIALOGUE.md           Topic-grounded dialogue (ICM flows, recipes)
CHANGELOG.md               Release notes (Keep a Changelog)
harness/plans/vira-mvp/    Phase build plans (incl. SHELLPLAN.md)

Requirements

  • Python ≥ 3.11 (developed on 3.14)
  • An Anthropic API key for live agent calls (not needed to run the tests)

Install

From PyPI or pipx (v0.2.1+)

pip install vira
pip install 'vira[voice]'                        # + local Whisper STT
pip install 'vira[web]'                          # + `vira web` UI
pip install 'vira[voice,web,mcp,browser]'        # all runtime extras
# or: pip install 'vira[all]'

pipx install vira                                # isolated CLI on PATH
pipx install 'vira[voice]'

Set ANTHROPIC_API_KEY (or use llm_provider: ollama) and copy settings from config/default.yaml into ~/.vira/config.yaml as needed.

Maintainers: pip install -e '.[release]', then ./scripts/release_check.sh and twine upload dist/* when publishing a tag.

From source (development)

python3 -m venv .venv
source .venv/bin/activate
pip install -e .            # core
# pip install -e '.[voice]' # + local Whisper STT (heavy/native deps)

Configure

cp .env.example .env        # then set ANTHROPIC_API_KEY=...

Settings live in config/default.yaml. The agent model is a changeable default resolved with this precedence (lowest → highest):

config/default.yaml (claude-sonnet-4-6)  →  VIRA_MODEL env var  →  vira --model <id>

So you can change it permanently (edit the yaml), per-shell (export VIRA_MODEL=claude-opus-4-8), or per-command (vira chat --model claude-opus-4-8).
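The precedence chain above can be sketched in a few lines. This is only an illustration: `resolve_model` and its parameters are hypothetical names, not necessarily what `vira/config.py` exposes. The only details taken from this README are the `VIRA_MODEL` variable, the `--model` flag, and the `claude-sonnet-4-6` default.

```python
import os
from typing import Mapping, Optional

DEFAULT_MODEL = "claude-sonnet-4-6"  # value shown for config/default.yaml above


def resolve_model(
    yaml_model: Optional[str] = None,
    cli_model: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the model id, letting later sources override earlier ones.

    Hypothetical helper: lowest precedence is the yaml value, then the
    VIRA_MODEL environment variable, then the --model command-line flag.
    """
    env = os.environ if env is None else env
    model = yaml_model or DEFAULT_MODEL
    if env.get("VIRA_MODEL"):
        model = env["VIRA_MODEL"]
    if cli_model:
        model = cli_model
    return model
```

Passing `env` explicitly keeps the function easy to test without touching the real environment.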

Other settings include prompt caching (prompt_cache, on by default — caches the stable context/skill prefix so multi-turn chat is cheaper/faster), context byte budgets, and the voice / workflows / sessions directories.

Cost awareness

Cost is a first-class feature. Every chat / skill run / workflow run prints a one-line usage + estimated-cost summary (toggle with show_usage), and you can estimate spend before a call with a free token preflight:

vira tokens --file big-prompt.md          # input tokens + estimated input cost (free; no completion)
vira skill run refiner "..."              # → usage: 26 in, 5 out | ~$0.0001 (1 call) | total: 336 tok, ~$0.0007 (3 calls)
vira usage                                # show the running total used so far
vira usage --reset                        # start a fresh tally

A running total (persisted in ~/.vira/usage.json, overridable via VIRA_USAGE_FILE) is appended to every usage line and accumulates across commands — in vira chat it updates live after each turn.

Spend cap — set a per-session budget; VIRA warns at 80% and blocks paid calls once the running total reaches it:

vira --spend-cap 0.50 chat          # stop once this session has spent ~$0.50
# or set `spend_cap` in config/default.yaml; reset the session with `vira usage --reset`
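As a rough illustration of the warn-then-block behaviour described above, the following sketch tracks a running total against a cap. The `SpendCap` class and its field names are assumptions made for this example; only the 80% warning threshold and the "block once the total reaches the cap" rule come from the text.

```python
from dataclasses import dataclass


@dataclass
class SpendCap:
    """Hypothetical tracker: warn at 80% of the cap, block at 100%."""

    cap_usd: float
    spent_usd: float = 0.0
    warn_fraction: float = 0.8  # the 80% threshold described above

    def record(self, cost_usd: float) -> None:
        """Add the estimated cost of one completed call to the running total."""
        self.spent_usd += cost_usd

    def status(self) -> str:
        """Return 'ok', 'warn', or 'blocked' for the current running total."""
        if self.spent_usd >= self.cap_usd:
            return "blocked"
        if self.spent_usd >= self.cap_usd * self.warn_fraction:
            return "warn"
        return "ok"
```

A caller would check `status()` before each paid call and refuse to send the request once it returns `"blocked"`.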

Keep costs down: prompt caching is on by default, max_tokens defaults low, and for simple/bulk work use --model claude-haiku-4-5.

Local models (Ollama)

Claude remains the default provider. For offline or privacy-sensitive work, switch to a local model via Ollama — no Anthropic key required.

ollama serve
ollama pull llama3.2:3b

# Per-session
vira --provider ollama --model llama3.2:3b chat

# Or persist in config/default.yaml:
#   llm_provider: ollama
#   model: llama3.2:3b
# Or via .env: VIRA_LLM_PROVIDER=ollama  VIRA_MODEL=llama3.2:3b

Works with vira chat, skill run, workflow run, voice listen, and voice chat. Spend caps and Anthropic token preflight (vira tokens) are skipped for Ollama (local, no API cost). Whisper STT is already local; pair with --tts say for fully offline voice on macOS.

Manual integration check: python scripts/spike_ollama.py

Obsidian memory

VIRA can load long-term memory from an Obsidian-compatible markdown vault and stay expert about itself via synced docs in Self/.

vira memory init                      # ./memories Obsidian vault (wikilinks)
vira memory links                     # show [[wikilinks]] graph
vira memory sync-self                 # copy README, ARCHITECTURE, etc. → Self/
vira chat                             # auto-loads vault + remembers multi-turn history
vira --provider ollama --model llama3.2:3b chat   # every provider gets vault memory
vira skill run refiner "…"            # skills/workflows/voice use memory too
vira memory consolidate session-….jsonl   # distill transcript → Memory/sessions/

# Point at your Obsidian vault folder:
# memory_vault: ./memories   in config/default.yaml (open in Obsidian)

See MEMORY-PLAN.md for layout, load order, and the self-improvement loop.

Usage

vira chat                                # interactive chat (Ctrl-D / 'exit' to quit)
vira chat --model claude-opus-4-8        # one-off model override
vira skills list                         # list available skills
vira skill run refiner "tighten this please"
vira context load ./context              # preview a folder without chatting
vira context budget ./context            # byte/token limits + skip list (optional --exact)
vira kb list                             # registered topic knowledge bases
vira --topic vira chat                   # load primary expertise for this topic
vira --kb ~/Projects/foo/docs skill run refiner "…"   # ad-hoc KB on any command

Host access (tools)

VIRA can read files on your machine, optionally write/edit files, and (optionally) run shell commands through Claude tool use. Read-only by default; every shell command and every file write requires confirmation each time, and .git, .ssh, and secret files (.env, *.pem, id_rsa, …) are refused. Works with the Anthropic provider and, per the Ollama tool calling note below, with Ollama models that support tools.

vira do "summarise the Python files in vira/core"              # one-shot task (read-only)
vira do "show disk usage" --allow-shell                        # shell, confirmed per command
vira do "add a docstring to vira/core/agent.py" --allow-write  # write/edit, confirmed per file

vira chat --tools                                              # multi-turn chat + read_file/list_dir
vira chat --tools --allow-shell                                # chat + confirm-gated shell
vira chat --tools --allow-write                                # chat + confirm-gated write_file/edit_file
vira chat --tools --allow-shell --yes                          # skip confirm (risky)
vira chat --tools --allow-browser                              # + Playwright browser tools
vira do "open example.com and summarize" --allow-browser       # one-shot with browser

vira voice chat --hands-free --tools                           # voice + read-only tools
vira voice chat --tools --allow-shell                          # shell with spoken yes/no confirm

Set tools_enabled: true in ~/.vira/config.yaml or config/default.yaml to default --tools on for vira chat. Shell still needs --allow-shell or tools_allow_shell: true; writes need --allow-write or tools_allow_write: true. Tools are confined to tools_root (default: .).
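The confinement and refusal rules above could look roughly like the guard below. It is a sketch under stated assumptions: `is_allowed`, `BLOCKED_DIRS`, and `BLOCKED_FILES` are hypothetical names, and the pattern list only covers the examples this README names (the real deny-list is elided with "…" and may be longer).

```python
from fnmatch import fnmatch
from pathlib import Path

# Patterns taken from the list above; the real deny-list may be longer.
BLOCKED_DIRS = {".git", ".ssh"}
BLOCKED_FILES = (".env", "*.pem", "id_rsa")


def is_allowed(path: str, tools_root: str = ".") -> bool:
    """Hypothetical guard: True only for non-secret paths inside tools_root."""
    root = Path(tools_root).resolve()
    target = (root / path).resolve()
    # Reject anything that escapes the root, e.g. via ".." or an absolute path.
    if target != root and root not in target.parents:
        return False
    relative_parts = target.relative_to(root).parts
    if any(part in BLOCKED_DIRS for part in relative_parts):
        return False
    return not any(fnmatch(target.name, pattern) for pattern in BLOCKED_FILES)
```

Resolving both paths before comparing them means `..` segments and symlinks are normalised before the containment check runs.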

MCP servers (optional pip install 'vira[mcp]'): configure tools_mcp_servers in config — tools appear as mcp_<server>_<name>. List with vira tools mcp list.

Browser automation (optional pip install 'vira[browser]' then playwright install chromium): enable with --allow-browser or tools_allow_browser: true. Tools: browser_navigate, browser_snapshot, browser_screenshot (read-only); browser_click / browser_fill (confirmed). List with vira tools browser list. Only http:// and https:// URLs are allowed.
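The http/https restriction can be illustrated with the standard library alone. This is a minimal sketch, not VIRA's actual check; `is_browsable` is a hypothetical name, and requiring a host in addition to the scheme is an extra assumption.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_browsable(url: str) -> bool:
    """Hypothetical check: accept only http(s) URLs that name a host."""
    parts = urlsplit(url.strip())
    return parts.scheme.lower() in ALLOWED_SCHEMES and bool(parts.netloc)
```

Schemes such as `file:` or `javascript:` fail the check, as does a bare host name with no scheme.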

Ollama tool calling: vira chat --tools and vira do work with llm_provider: ollama when the model supports tools (e.g. llama3.1+, qwen2.5). Same confirm gating and registry as Claude.

Voice (spoken conversation)

Install the extra, then talk to VIRA. Speech-to-text runs locally via Whisper. Text-to-speech supports built-in VIRA (warm Irish, default), VIRO (British butler), and Friday (Irish) profiles via ElevenLabs Voice Design, with macOS say as an offline fallback.

pip install -e '.[voice]'

# One-time: add ELEVENLABS_API_KEY to .env, then design and store both voices.
vira voice setup --profile all

vira voice chat                          # VIRA (default); say "goodbye" to stop
vira voice chat --profile viro           # switch to VIRO (British butler)
vira voice chat --tts elevenlabs         # force ElevenLabs (requires setup)
vira voice chat --seconds 7 --stt-model small
vira voice chat --hands-free             # VAD: auto-send when you stop talking
vira voice chat --hands-free --wake-word vira   # only answer turns starting with "vira"
# Interrupt a long reply: press Enter (default), or speak clearly over it (--barge-in / auto with --hands-free)
# Tune barge-in sensitivity: `vira web` (slider) or voice_barge_in_margin in config (lower = easier)
# Mute mic (side conversations): press Space to toggle — cancels an active recording too
vira voice listen recording.wav          # transcribe one file, answer, speak the reply

vira voice profiles                      # show profiles, stored IDs, cache + key status (offline)
vira voice preview --profile vira        # play design previews without storing them
vira voice preview --profile vira --index 1 --commit   # store preview #1 as the voice
vira voice cleanup                       # delete leftover VIRA-* voices (needs voices delete perm)

# Clone your own voice (ElevenLabs Instant Voice Cloning; own voice only)
vira voice clone record                  # guided mic capture + upload (~60s+ total)
vira voice clone record --i-consent      # skip consent prompt (scripts)
vira voice clone from-files a.wav b.wav    # upload existing audio
vira voice clone status                  # stored clone + subscription hints
vira voice chat --profile user           # speak with your cloned voice

Voice settings (profile, TTS engine, Whisper model, record seconds) live in config/default.yaml. Stored ElevenLabs voice IDs are written to ~/.vira/voices.json. The first voice run downloads the chosen Whisper model. Microphone capture needs mic permission for your terminal.

Tuning without code edits: override any built-in profile from config/default.yaml under voice_profile_overrides (e.g. friday.voice_settings.stability); overrides are merged at design and speak time.

Audio cache: repeated ElevenLabs lines are cached as MP3s under ~/.vira/audio-cache/ (SHA-256 keyed on voice + model + text + settings), so they replay without re-synthesizing. The cache covers the ElevenLabs playback path only (the macOS say/Moira path, e.g. Friday's prefer_say, is not cached). Disable with voice_audio_cache: false.

Preview cost: vira voice preview costs ElevenLabs design credits; it only stores a voice when you pass --commit.
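The audio-cache key described above (SHA-256 over voice + model + text + settings) might be derived along these lines. Treat it as a sketch: `cache_path`, the JSON serialisation, and the file-naming scheme are assumptions, and only the cache directory and the four key ingredients come from this README.

```python
import hashlib
import json
import os
from pathlib import Path
from typing import Optional

CACHE_DIR = Path(os.path.expanduser("~/.vira/audio-cache"))  # location named above


def cache_path(voice_id: str, model: str, text: str, settings: Optional[dict] = None) -> Path:
    """Hypothetical key scheme: SHA-256 over voice + model + text + settings."""
    payload = json.dumps(
        {"voice": voice_id, "model": model, "text": text, "settings": settings or {}},
        sort_keys=True,  # stable key order so equal inputs hash identically
        ensure_ascii=False,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return CACHE_DIR / f"{digest}.mp3"
```

Because the settings are part of the key, changing a value such as stability produces a different file, so stale audio is never replayed for a retuned voice.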

Hands-free: vira voice chat --hands-free records until you stop talking (energy-based voice-activity detection; no extra dependency) instead of for a fixed --seconds. Tune voice_vad_threshold / voice_vad_silence_ms (and the voice_vad_* caps) in config/default.yaml, or default it on with voice_vad: true. Add an optional wake word (--wake-word vira or voice_wake_word: vira) so VIRA only answers turns that start with it. The wake word is matched leniently (after an optional "hey", "ok", "okay", or "yo"), so speech-to-text homophones like "Vera" still wake her.
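To make the two ideas above concrete, here is a small sketch of energy-based end-of-utterance detection and lenient wake-word matching. Everything in it is illustrative: the function names, the default threshold and silence values, the 20 ms frame size, and the homophone list are assumptions, not VIRA's actual `vad.py`.

```python
import math
import re
from typing import Iterable, Optional, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in -1.0..1.0)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def utterance_end(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.02,  # assumed value, stands in for voice_vad_threshold
    silence_ms: int = 800,    # assumed value, stands in for voice_vad_silence_ms
    frame_ms: int = 20,
) -> Optional[int]:
    """Return the index of the frame that ends the utterance, or None.

    Recording stops once speech has been heard and is followed by
    `silence_ms` of consecutive frames below the energy threshold.
    """
    needed = max(1, silence_ms // frame_ms)
    heard_speech, quiet = False, 0
    for index, frame in enumerate(frames):
        if rms(frame) >= threshold:
            heard_speech, quiet = True, 0
        elif heard_speech:
            quiet += 1
            if quiet >= needed:
                return index
    return None


# Optional greeting, then the wake word or a homophone such as "Vera".
WAKE = re.compile(r"^\s*(?:(?:hey|ok|okay|yo)[\s,]+)?(?:vira|vera)\b[\s,.!?:]*", re.IGNORECASE)


def strip_wake_word(transcript: str) -> Optional[str]:
    """Return the text after the wake word, or None if the turn lacks it."""
    match = WAKE.match(transcript)
    return transcript[match.end():] if match else None
```

Leading silence is ignored until the first loud frame, so a pause before speaking does not end the recording early.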

Switch voice mid-conversation — just say "use Friday", "use VIRA", or "use VIRO" (also "switch to …" / "talk as …") to change the speaking voice on the fly, without leaving voice chat. The persona switches with the voice too — Friday introduces herself as Friday (and may call you "boss"), VIRO as a composed British butler — while the conversation memory carries over. Press Enter to interrupt a long reply, or speak over it with barge-in (--barge-in, on by default with --hands-free). Press Space to mute the mic before or during a capture.

Switch model/provider mid-conversation — say "switch to Claude" / "use Anthropic" to use Claude, or "go local" / "go offline" (or "use Ollama") to switch to the local Ollama model — without restarting. Matching is fuzzy and covers common speech-to-text mishearings ("cloud"→Claude, "llama"→Ollama); "go local" is the most reliable phrase for Ollama since "Ollama" is often mis-transcribed. Memory carries over; the spend cap re-applies once you're back on Claude.
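The spoken commands above can be approximated with a small parser. This is a simplified sketch: `parse_voice_command` is a hypothetical name, and an exact alias table stands in for the fuzzy matching the text describes. The phrases and mishearings ("cloud", "llama") are the ones listed in this README.

```python
import re
from typing import Optional, Tuple

# Spoken aliases, including the mishearings mentioned above.
PROVIDER_ALIASES = {
    "claude": "anthropic", "cloud": "anthropic", "anthropic": "anthropic",
    "ollama": "ollama", "llama": "ollama", "local": "ollama", "offline": "ollama",
}
VOICE_PROFILES = {"vira", "viro", "friday"}
COMMAND = re.compile(r"^\s*(?:use|switch to|talk as|go)\s+([a-z]+)\W*$", re.IGNORECASE)


def parse_voice_command(transcript: str) -> Optional[Tuple[str, str]]:
    """Return ('voice', profile) or ('provider', name), else None.

    Hypothetical parser: exact alias lookup stands in for the fuzzy
    matching the real implementation is described as using.
    """
    match = COMMAND.match(transcript)
    if not match:
        return None
    target = match.group(1).lower()
    if target in VOICE_PROFILES:
        return ("voice", target)
    if target in PROVIDER_ALIASES:
        return ("provider", PROVIDER_ALIASES[target])
    return None
```

Anchoring the pattern to the whole utterance keeps ordinary sentences that merely begin with "use" from being treated as commands.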

Web UI (browser chat)

A local, dark, single-page chat UI in the browser — React frontend, FastAPI backend — reusing the same agent, knowledge bases, vault memory, sessions, and spend-cap as the CLI.

pip install -e '.[web]'
vira web                  # opens http://127.0.0.1:8765/
vira web --no-open        # headless server only

# Remote access — require a token before binding beyond localhost:
VIRA_WEB_AUTH_TOKEN=$(openssl rand -hex 16) vira web --host 0.0.0.0 --port 9000
# clients then pass it: open http://host:9000/?token=…  (or Authorization: Bearer …)

  • Streaming chat (Server-Sent Events) with a live token cursor and the same per-turn usage/cost line as the CLI.
  • Browser voice (Phase 18.1) — tap the mic for push-to-talk (records → Whisper STT → sends) with a live level ring and a listen/speak status chip; the 🔊 toggle speaks replies back in VIRA's voice (ElevenLabs), with a voice-profile picker in settings.
  • Remote control / WebSocket (Phase 19) — an SSE/WS transport toggle in the header switches chat to /api/ws/chat. Set web_auth_token (or VIRA_WEB_AUTH_TOKEN) to require a token on every /api route and the WS handshake (Bearer or ?token=), so binding beyond localhost is safe; empty = localhost trust.
  • Topic knowledge picker + memory toggle in the header — switching either starts a fresh session with that primary KB / vault memory.
  • Voice settings drawer (⚙): barge-in toggle + sensitivity slider (voice_barge_in_margin), Space-mute, Enter-interrupt, STT model — saved to ~/.vira/config.yaml, applied on the next vira voice chat.
  • Session transcripts are written to ~/.vira/sessions/*.jsonl (path shown in the footer).
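The token rule from the remote-control bullet above (Bearer header or ?token=, with an empty token meaning localhost trust) can be sketched without any web framework. `is_authorized` is a hypothetical helper, plain dictionaries stand in for request headers and query parameters, and header lookup here is case-sensitive for brevity.

```python
import hmac
from typing import Mapping, Optional


def is_authorized(
    expected_token: str,
    headers: Mapping[str, str],
    query: Mapping[str, str],
) -> bool:
    """Hypothetical check mirroring the rules above.

    An empty expected token means "localhost trust" (no check); otherwise
    the caller must present it as `Authorization: Bearer ...` or `?token=`.
    """
    if not expected_token:
        return True
    presented: Optional[str] = query.get("token")
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        presented = auth[len("Bearer "):]
    if not presented:
        return False
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```

Using a constant-time comparison is a common precaution for token checks; whether VIRA does the same is not stated here.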

The UI is built from frontend/ (React + Vite + TypeScript) into a single self-contained vira/web/static/index.html that FastAPI serves — so CI stays Python-only. See frontend/README.md to develop or rebuild it.

Security controls (auth token, bind rules, path allowlist, request limits) are documented in SECURITY.md.

Workflows

Chain skills into a multi-step pipeline with a YAML file. Steps interpolate {{variables}} and prior steps' output ({{step.output}}), can be guarded with when:, can fan out over a list with foreach: (binding {{item}} per element), and can save results to a file with save_to:.

vira workflow list
vira workflow run refine_and_summarize.yaml --var text="your draft here" --var out_dir=.
vira workflow run branch_and_parallel.yaml --var mode=fast --var text="hello"

See examples/workflows/refine_and_summarize.yaml.
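The placeholder rules described above ({{variables}}, {{step.output}}, and {{item}} under foreach:) can be illustrated with a short interpolation routine. This is a sketch with hypothetical names (`interpolate`, `expand_foreach`) and one explicit assumption: unknown placeholders are left untouched, whereas the real WorkflowEngine may treat them differently.

```python
import re
from typing import Any, Dict, List

PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def interpolate(template: str, scope: Dict[str, Any]) -> str:
    """Replace {{name}} and {{step.output}} using a nested scope dict.

    Unknown placeholders are left untouched (an assumption; the real
    engine may raise instead).
    """
    def lookup(match: "re.Match[str]") -> str:
        value: Any = scope
        for key in match.group(1).split("."):
            if not isinstance(value, dict) or key not in value:
                return match.group(0)
            value = value[key]
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


def expand_foreach(template: str, items: List[Any], scope: Dict[str, Any]) -> List[str]:
    """Render the template once per element, binding {{item}} each time."""
    return [interpolate(template, {**scope, "item": item}) for item in items]
```

Storing each finished step under its own key, for example `scope["refine"] = {"output": ...}`, is what lets a later step refer to `{{refine.output}}`.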

Conversation memory & transcripts

vira chat and vira voice chat are multi-turn — VIRA remembers the conversation within a session, and replies stream token-by-token. Ground the conversation in a topic knowledge base (--topic / --kb / --context), and every session is logged as JSONL under sessions_dir (default ~/.vira/sessions).

vira --topic vira chat                # primary expertise from config knowledgebases
vira chat --context ./my-notes        # ad-hoc folder for this chat only
vira session list                     # list recorded transcripts
vira session extract session-20260607-120000.jsonl -o workflow.md   # mine a session into a workflow

session extract reads a transcript and asks the agent to distill the repeatable task into a reusable workflow + decision summary — the first cut of turning dialogue into reusable structure. See docs/DIALOGUE.md for the full ICM flow (context → chat → extract → workflow), diagrams, and gaps.
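For readers unfamiliar with JSONL transcripts, the sketch below appends one JSON object per line and reads them back. The field names (`ts`, `role`, `content`) and helper names are assumptions for illustration; the actual SessionLogger schema is not documented in this README.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List


def append_turn(path: Path, role: str, content: str) -> None:
    """Append one turn as a single JSON line (field names are assumptions)."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "content": content,
    }
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_turns(path: Path) -> List[Dict[str, str]]:
    """Load every non-empty line of a transcript back into dictionaries."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Demo against a temporary file instead of ~/.vira/sessions.
with tempfile.TemporaryDirectory() as tmp:
    transcript = Path(tmp) / "session-demo.jsonl"
    append_turn(transcript, "user", "tighten this please")
    append_turn(transcript, "assistant", "Here is a tighter version.")
    turns = read_turns(transcript)
```

Appending line by line means a crash mid-session loses at most the turn being written, which is one reason JSONL suits logs like this.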

Layout

vira/
  config.py            # Settings + model/provider precedence
  core/
    context.py         # ContextManager / ContextBundle
    knowledge.py       # Topic knowledge base registry + primary expertise loading
    agent.py           # LLMClient protocol, ViraAgent (+ memory, persona, client swap)
    llm.py             # provider factory (Claude / Ollama)
    ollama.py          # local Ollama client (OpenAI-compatible, stdlib urllib)
    skills.py          # Skill, SkillEngine
    session.py         # SessionLogger (JSONL transcripts)
    workflow.py        # Workflow, WorkflowEngine, load_workflow
    extract.py         # session transcript -> reusable workflow draft
    usage.py           # token/cost tracking, running total, spend cap
  voice/               # optional extra (lazy imports)
    recorder.py stt.py tts.py pipeline.py
    vad.py             # energy VAD, wake word, voice/provider voice-commands
    profiles.py elevenlabs.py   # voice profiles + ElevenLabs Voice Design
  memory/              # Obsidian vault: self-knowledge, facts, session distillation
  cli/main.py          # Typer CLI
examples/skills/refiner/   # sample skill (skill.md + prompts/)
tests/                     # unittest suite (no API key / network needed)
config/default.yaml

Develop

Run the test suite from the repo root (no API key or network required — the Anthropic client is dependency-injected and faked in tests):

python3 -m unittest discover -s tests -t .
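The dependency-injection approach mentioned above is what lets the suite run offline. The sketch below shows the general pattern with unittest; the `complete` method, the `Agent` class, and `FakeClient` are hypothetical stand-ins, not the real ViraAgent or LLMClient signatures.

```python
import unittest
from typing import List, Protocol


class LLMClient(Protocol):
    """Minimal stand-in for the protocol named in vira/core/agent.py."""

    def complete(self, prompt: str) -> str: ...


class Agent:
    """Hypothetical agent that receives its client instead of building one."""

    def __init__(self, client: LLMClient) -> None:
        self.client = client

    def ask(self, question: str) -> str:
        return self.client.complete(question).strip()


class FakeClient:
    """Test double: records prompts and returns a canned reply."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.prompts: List[str] = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.reply


class AgentTest(unittest.TestCase):
    def test_ask_uses_injected_client(self) -> None:
        fake = FakeClient("  hello  ")
        self.assertEqual(Agent(fake).ask("hi"), "hello")
        self.assertEqual(fake.prompts, ["hi"])
```

Because the agent only depends on the protocol, the fake needs no network access, API key, or subclassing.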

Lint and coverage (install dev tools first with pip install -e '.[dev]'):

ruff check .                                   # lint + import order
coverage run -m unittest discover -s tests -t . && coverage report

CI runs both — a ruff check lint job and the test suite under a coverage gate (see .github/workflows/ci.yml).

Roadmap

Delivered in the MVP: CLI + Claude agent, folder/markdown context, markdown skills, multi-turn memory, session transcripts, a YAML workflow engine, and a real voice loop (Whisper STT + macOS say TTS). Post-MVP ideas (tracked in harness/plans/vira-mvp/EFFORT.md → Improvements & Opportunities): streaming responses, prompt caching, token-budget-aware context, decision-tree extraction from sessions, wake-word/VAD, non-macOS TTS, and PyPI/pipx packaging. Several of these have since shipped and are covered in the sections above, including streaming replies, prompt caching, wake-word/VAD, ElevenLabs voices, and PyPI/pipx installs.

License

MIT.

