Autonomous, fully-cited multi-agent research crew for your terminal (TUI + CLI).

vibe-research

An autonomous, fully-cited multi-agent research crew that lives in your terminal. Give it a topic; a planner splits it into sub-questions, parallel researchers answer each with live web search, a panel of adversarial fact-checkers votes on every claim, an editor decides whether coverage is good enough — and if not, the crew re-plans and researches the gaps itself before writing a cited Markdown report. Hands-off.

It also remembers: each run is distilled to long-term memory so later, related topics build on what it already learned instead of starting cold.

vibe-research "impact of EPF rate changes on SME payroll in Sri Lanka"

A split-screen TUI shows the agent working on the left and the report building on the right. Reports are saved to disk automatically.


Honest expectations (read this)

This tool is engineered to be accurate and fully cited, not "100% correct". No LLM-based tool can promise zero mistakes — they can hallucinate. What vibe-research does instead:

  • grounds every claim in a real source (with URLs),
  • runs a dedicated verify pass that re-checks claims and flags weak/conflicting ones,
  • ends every report with a "Confidence & Gaps" section that says where it's unsure.

For anything high-stakes (compliance, legal, financial numbers), read that section and sanity-check the sources. The tool runs fully autonomously by default; "autonomous" isn't the same as "safe to use unread for things that bite you."


Several engines, one tool

vibe-research runs on any of several backends. By default it auto-detects an Anthropic engine (api or subscription); use --mode to force a specific one.

| Mode | Engine | Billing | Best for |
| --- | --- | --- | --- |
| api | Anthropic Messages API | pay-per-token (Console) | products, multi-user, always-on |
| subscription | Claude Agent SDK | draws from your Claude subscription | your own / internal use |
| openai | OpenAI Responses API | pay-per-token (OpenAI) | GPT models / web search |
| gemini | Google Gemini (OpenAI-compat) | pay-per-token (Google) | Gemini models |
| glm | Zhipu GLM (OpenAI-compat) | pay-per-token (Zhipu) | GLM models / built-in web search |
| kimi | Moonshot Kimi (OpenAI-compat) | pay-per-token (Moonshot) | Kimi models |

The gemini/glm/kimi engines all speak the OpenAI API, so they need the same [openai] extra and their own API key (GEMINI_API_KEY / GLM_API_KEY / KIMI_API_KEY). Claude-named model defaults auto-map to each provider's models; override with --planner-model/--worker-model. Live web search is built in for api, openai, and glm; the others answer from model knowledge (the fact-checker scores sourcing accordingly).
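
The key and web-search rules above can be summarised in a small lookup. This is a sketch rather than vibe-research's actual code: `EngineInfo`, `ENGINES`, and `check_engine` are hypothetical names, while the environment variable names and the web-search split are taken from the paragraph above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EngineInfo:
    key_env: Optional[str]  # environment variable holding the API key, if any
    web_search: bool        # True if live web search is built in for this mode


# Hypothetical table: values come from the README text, not from the source code.
ENGINES = {
    "api": EngineInfo("ANTHROPIC_API_KEY", True),
    "subscription": EngineInfo(None, False),  # uses the Claude Code login, no key
    "openai": EngineInfo("OPENAI_API_KEY", True),
    "gemini": EngineInfo("GEMINI_API_KEY", False),
    "glm": EngineInfo("GLM_API_KEY", True),
    "kimi": EngineInfo("KIMI_API_KEY", False),
}


def check_engine(mode: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return human-readable problems for the chosen mode (empty list = ready)."""
    if mode not in ENGINES:
        return [f"unknown mode: {mode!r}"]
    info = ENGINES[mode]
    problems = []
    if info.key_env and not env.get(info.key_env):
        problems.append(f"{info.key_env} is not set")
    return problems
```

For a real environment check, `vibe-research doctor` remains the authoritative tool; the sketch only illustrates which key each mode expects.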

Subscription mode caveat. Anthropic does not permit third-party apps to offer claude.ai login to other users without prior approval, and the subscription-billing path for the Agent SDK is something Anthropic has said it may change. Use subscription mode for your own usage. For a customer-facing product, use api mode.

OpenAI has no subscription API. Unlike Claude's Agent SDK path, OpenAI API usage is always metered per token against your OpenAI account — a ChatGPT Plus/Pro plan does not grant programmatic access. Set OPENAI_API_KEY and run with --mode openai (pip install "vibe-research[openai]"). Claude-named model defaults are auto-mapped to gpt-4o / gpt-4o-mini; override with --planner-model gpt-5 etc.
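
The auto-mapping described above might look roughly like this. It is a sketch under assumptions: the README names gpt-4o and gpt-4o-mini as the OpenAI targets, but which Claude default maps to which is a guess, and `OPENAI_MODEL_MAP` and `resolve_model` are hypothetical names.

```python
# Assumed pairing: the planner default maps to the larger model and the
# worker default to the smaller one. The README does not state this.
OPENAI_MODEL_MAP = {
    "claude-opus-4-8": "gpt-4o",
    "claude-sonnet-4-6": "gpt-4o-mini",
}


def resolve_model(requested: str, mode: str) -> str:
    """Translate a Claude-named default in openai mode; pass anything else through."""
    if mode == "openai":
        return OPENAI_MODEL_MAP.get(requested, requested)
    return requested
```

An explicit override such as `--planner-model gpt-5` is not in the map, so it passes through unchanged.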


Install

As with other terminal tools, pipx gives the cleanest install (isolated, on your PATH):

pipx install .
# or, from the project directory during development:
pip install -e ".[dev]"

That gives you API mode out of the box. For subscription mode, also install the Agent SDK extra and set up the Claude Code login (see below):

pipx install ".[subscription]"

For PDF export (--pdf / Ctrl+P), add the pdf extra — pure-Python, no system libraries, and it uses a system Unicode font so accents/arrows render:

pipx install ".[pdf]"        # or: pip install fpdf2

Check everything at any time:

vibe-research doctor

Set up API mode

export ANTHROPIC_API_KEY=sk-ant-...     # get one at https://console.anthropic.com
vibe-research "your topic"

Set up subscription mode

npm install -g @anthropic-ai/claude-code   # the engine the Agent SDK wraps
claude                                      # then type /login and sign in (Pro/Max)
pip install claude-agent-sdk
unset ANTHROPIC_API_KEY                      # IMPORTANT: a stray key bills you per token
vibe-research --mode subscription "your topic"

After a run, confirm usage showed up on your Claude.ai account (not the API console) to be sure the subscription path is wired.


Usage

vibe-research "your topic"                 # TUI (default)
vibe-research run "your topic" --no-tui    # headless: prints progress + saves report
vibe-research --mode subscription "topic"  # force subscription engine
vibe-research --mode openai "topic"        # use OpenAI (needs OPENAI_API_KEY)
vibe-research --mode gemini "topic"        # Gemini (GEMINI_API_KEY) — also glm, kimi
vibe-research run "topic" --parallel 3 --subquestions 6

# autonomy knobs
vibe-research run "topic" --iterations 3   # up to 3 self-refining (gap-filling) rounds
vibe-research run "topic" --votes 3        # 3 adversarial fact-checkers per finding
vibe-research run "topic" --quality 0.85   # keep refining until 85% confidence
vibe-research run "topic" --no-debate      # single fact-check instead of a vote
vibe-research run "topic" --no-memory      # don't recall or persist memory
vibe-research run "topic" --no-humanize    # skip the human-voice rewrite (raw draft)

# sourcing & depth
vibe-research run "topic" --depth deep          # quick | standard | deep preset
vibe-research run "topic" --since 2022          # prefer recent sources
vibe-research run "topic" --only-domains gov,edu     # restrict to trusted domains
vibe-research run "topic" --block-domains reddit.com # drop specific domains
vibe-research run "topic" --citations plain     # plain source list (default: ranked)

# per-stage models
vibe-research run "topic" --writer-model claude-opus-4-8 --verifier-model claude-sonnet-4-6

# length, style & visuals
vibe-research run "topic" --words 1500          # target length (or --pages 3)
vibe-research run "topic" --style essay         # report | essay | brief
vibe-research run "topic" --no-charts --no-diagrams   # opt out of visuals

# output & UX
vibe-research run "topic" --pdf --html --json --docx   # PDF, HTML, JSON sidecar, Word doc
vibe-research run "topic" --open                # open the report when it's done
vibe-research run "topic" --no-tui --quiet      # print only the saved path(s)
vibe-research run "topic" --no-tui --verbose    # print fact-check + editor detail
vibe-research run "topic" --debug               # write a JSONL trace of every model call

# reliability (retries/timeout/throttle per model call)
vibe-research run "topic" --retries 5 --timeout 240 --concurrency 6

vibe-research doctor                        # environment / deps / pipeline check
vibe-research history                       # list past reports
vibe-research memory                        # list long-term memory records
vibe-research memory --clear                # wipe long-term memory
vibe-research config                        # show config
vibe-research config --set max_iterations=3
vibe-research config --set enable_memory=false
vibe-research config --set export_pdf=true  # always export PDF too
vibe-research --version

In the TUI, Ctrl+P exports the current report to a PDF (saved next to the Markdown file); Ctrl+S copies it to the clipboard.
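
The headless commands above can also be driven from a script. The following is a minimal sketch: `build_command` and `run_headless` are hypothetical helpers, only flags listed above are used, and it assumes that `--quiet` prints one saved path per line on standard output.

```python
import shutil
import subprocess
from typing import Optional


def build_command(topic: str, *, depth: Optional[str] = None,
                  iterations: Optional[int] = None,
                  json_sidecar: bool = False) -> list[str]:
    """Assemble a headless invocation using only flags documented above."""
    cmd = ["vibe-research", "run", topic, "--no-tui", "--quiet"]
    if depth is not None:
        cmd += ["--depth", depth]
    if iterations is not None:
        cmd += ["--iterations", str(iterations)]
    if json_sidecar:
        cmd.append("--json")
    return cmd


def run_headless(topic: str, **options) -> list[str]:
    """Run the CLI and return the non-empty lines it prints."""
    if shutil.which("vibe-research") is None:
        raise RuntimeError("vibe-research is not on PATH")
    result = subprocess.run(build_command(topic, **options),
                            capture_output=True, text=True, check=True)
    # Assumption: under --quiet these lines are the saved report path(s).
    return [line for line in result.stdout.splitlines() if line.strip()]
```

Passing the topic as a separate list element avoids shell quoting problems with topics that contain spaces or quotes.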

Config

Stored at ~/.config/vibe-research/config.json:

| Key | Default | Meaning |
| --- | --- | --- |
| mode | auto | auto, api, subscription, openai, gemini, glm, or kimi |
| planner_model | claude-opus-4-8 | planning, fact-check, editing, write-up |
| worker_model | claude-sonnet-4-6 | the many web-search calls |
| max_parallel | 2 | concurrent research threads |
| subquestions | 5 | how many sub-questions to research |
| max_iterations | 2 | self-refining rounds (re-plan + research gaps) |
| verifier_votes | 2 | adversarial fact-checkers per finding |
| quality_threshold | 0.75 | editor confidence needed to stop refining |
| enable_debate | true | multi-verifier voting vs. a single fact-check |
| enable_memory | true | recall from / persist to long-term memory |
| humanize | true | final pass: rewrite the report in a natural human voice |
| citations | ranked | ranked (by source credibility) or plain list |
| since_year | 0 | prefer sources from this year onward (0 = off) |
| only_domains | "" | comma-separated domain substrings to keep (e.g. gov,edu) |
| block_domains | "" | comma-separated domain substrings to drop (e.g. reddit.com) |
| verifier_model / writer_model / humanizer_model | "" | per-stage model overrides (empty → planner model) |
| prose_style | report | report, essay, or brief writing style |
| words | 0 | target word count (0 = model decides) |
| enable_charts / enable_diagrams / enable_figures | true | allow data charts / mermaid diagrams / figures |
| export_pdf | false | also write a PDF beside every saved report |
| export_html | false | also write a styled HTML page beside every report |
| export_json | false | also write a structured JSON sidecar (findings + verdicts) |
| export_docx | false | also write a Word .docx (needs the [docx] extra) |
| debug | false | write a JSONL trace of every model call |
| open_after | false | open the report when a run finishes |
| max_retries | 3 | exponential-backoff retries per model call (0 = none) |
| call_timeout | 180 | per-call timeout in seconds (0 = none) |
| max_concurrency | 4 | cap on simultaneous model calls (API-pressure smoothing) |
| reports_dir | (default) | where reports are saved |
| memory_dir | (default) | where long-term memory is stored |
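
The only_domains and block_domains settings are described as comma-separated domain substrings. One plausible reading is sketched below; `keep_url` is a hypothetical helper, and checking the block list before the allow list is an assumption, not documented behaviour.

```python
from urllib.parse import urlparse


def _parse(spec: str) -> list[str]:
    """Split a comma-separated setting into lowercase, non-empty substrings."""
    return [part.strip().lower() for part in spec.split(",") if part.strip()]


def keep_url(url: str, only_domains: str = "", block_domains: str = "") -> bool:
    """Decide whether a source URL survives the domain filters."""
    host = (urlparse(url).hostname or "").lower()
    # Assumed precedence: a blocked substring always wins.
    if any(blocked in host for blocked in _parse(block_domains)):
        return False
    allowed = _parse(only_domains)
    # An empty allow list means "keep everything that is not blocked".
    return not allowed or any(a in host for a in allowed)
```

Because the match is on substrings, a value like `gov` would also keep a host such as `govtech.example.com`; a stricter filter would compare dot-separated labels instead.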

Reports default to ~/.local/share/vibe-research/reports/ and memory to ~/.local/share/vibe-research/memory/.
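
Reading the config file from a script could look like the sketch below. It assumes the file is a flat JSON object keyed by the names in the table; `load_config` and `DEFAULTS` are hypothetical, and only a subset of the documented defaults is listed.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "vibe-research" / "config.json"

# A subset of the documented defaults; the real tool holds more keys.
DEFAULTS = {
    "mode": "auto",
    "max_parallel": 2,
    "subquestions": 5,
    "max_iterations": 2,
    "verifier_votes": 2,
    "quality_threshold": 0.75,
    "enable_memory": True,
}


def load_config(path: Path = CONFIG_PATH) -> dict:
    """Overlay the on-disk JSON on the defaults; a missing or broken file yields defaults."""
    try:
        stored = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        stored = {}
    if not isinstance(stored, dict):
        stored = {}
    return {**DEFAULTS, **stored}
```

For changing values, `vibe-research config --set key=value` is the documented route; the sketch is read-only on purpose.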

Every .md report opens with a metadata line (date · mode · confidence · sources · sub-questions). With --json you also get a machine-readable sidecar containing every finding, its confidence, the fact-check verdicts, and the run's token usage.

Reliability

Model calls are wrapped so a flaky network doesn't sink a multi-minute run: each call is retried with exponential backoff on transient errors (429 / 5xx / timeout / connection), bounded by a per-call timeout, and the whole run is throttled to max_concurrency simultaneous calls so it doesn't hammer the API. Anything that still fails degrades gracefully — a dead research thread becomes a zero-confidence finding, a failed fact-check abstains — rather than crashing. Ctrl+C stops a headless run cleanly.
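
The retry, timeout, and throttle behaviour described above can be sketched with asyncio. This is an illustration of the general technique, not the project's code: `TransientError`, `call_with_retries`, and `research_or_degrade` are hypothetical names, and the shape of the zero-confidence finding is assumed.

```python
import asyncio
import random


class TransientError(Exception):
    """Stand-in for 429 / 5xx / connection failures."""


async def call_with_retries(make_call, *, semaphore, max_retries=3,
                            call_timeout=180.0, base_delay=1.0):
    """Await make_call() under a concurrency cap, retrying transient failures."""
    for attempt in range(max_retries + 1):
        try:
            async with semaphore:  # cap on simultaneous model calls
                # A call_timeout of 0 means "no timeout", as in the config table.
                return await asyncio.wait_for(make_call(),
                                              timeout=call_timeout or None)
        except (TransientError, asyncio.TimeoutError):
            if attempt == max_retries:
                raise
            # Exponential backoff (1x, 2x, 4x base_delay) plus up to 10% jitter.
            delay = base_delay * 2 ** attempt
            await asyncio.sleep(delay + random.uniform(0, delay / 10))


async def research_or_degrade(make_call, **kwargs):
    """Turn a research thread that still fails into a zero-confidence finding."""
    try:
        return await call_with_retries(make_call, **kwargs)
    except Exception as exc:
        return {"answer": "", "confidence": 0.0, "error": str(exc)}
```

The semaphore is released before the backoff sleep, so a call that is waiting to retry does not occupy one of the concurrency slots.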


How it works

A crew of single-purpose agents, coordinated in a self-refining loop:

              ┌─────────── memory (recall related past runs) ───────────┐
              ▼                                                          │
topic ─▶ PLANNER ─▶ RESEARCHERS (parallel, web search) ─▶ VERIFIERS (vote) ─▶ EDITOR
              ▲                                                              │
              └──────────── gaps? re-plan & research more ◀─────────────────┘
                                        │ approved / confidence met / out of rounds
                                        ▼
                        SYNTHESIZER ─▶ HUMANIZER ─▶ validated report ─▶ remember
  1. Planner — splits the topic into focused, non-overlapping sub-questions, seeded with anything relevant recalled from long-term memory.
  2. Researchers — answer each sub-question with live web search + citations, in parallel. Every answer is parsed and validated into a typed structure.
  3. Verifiers — several adversarial fact-checkers judge each claim against its sources with different lenses; their votes are aggregated into one confidence.
  4. Editor — decides whether coverage is thorough and well-supported enough. If not, it names the gaps, the planner turns them into new sub-questions, and the crew researches them — looping until it's confident or runs out of rounds.
  5. Synthesizer — writes a cited report, dropping or flagging claims the fact-checkers rejected, and ends with an honest "Confidence & Gaps" section. Conflicting evidence is surfaced in a Disagreements section, and sources are listed ranked by credibility (primary/authoritative → news → blog).
  6. Humanizer — a final pass that rewrites the draft in a natural human voice (varied rhythm, no AI tells), changing only voice and flow. It never alters a fact and is guarded so it can't drop citations — if a rewrite loses too many source URLs, the original draft is kept.
  7. Memory — the run is distilled to disk so future related topics build on it.
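
The loop in steps 1 to 4 can be sketched as follows. Everything here is illustrative: the stage functions are passed in as plain callables, research runs sequentially rather than in parallel, votes are combined with a simple mean, and `max_iterations` is treated as the total number of passes. None of these choices is confirmed by the project.

```python
from statistics import mean
from typing import Sequence


def aggregate_votes(votes: Sequence[float]) -> float:
    """Combine per-verifier scores (0..1) into one confidence; a plain mean is assumed."""
    return mean(votes) if votes else 0.0


def research_loop(topic, plan, research, verify, edit,
                  max_iterations=2, quality_threshold=0.75, verifier_votes=2):
    """Plan, research, verify, then let the editor decide whether to loop on gaps."""
    findings = []
    gaps: list[str] = []
    for _ in range(max_iterations):
        # The planner turns the topic (first pass) or the gaps (later passes)
        # into sub-questions.
        for question in plan(topic, gaps):
            answer = research(question)
            votes = [verify(answer, lens) for lens in range(verifier_votes)]
            findings.append({"question": question, "answer": answer,
                             "confidence": aggregate_votes(votes)})
        confidence, gaps = edit(findings)
        if confidence >= quality_threshold or not gaps:
            break  # approved, or nothing left to fill
    return findings
```

Injecting the stages as callables keeps the control flow testable without any model calls.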

Every value that crosses an agent boundary is run through a strict, stdlib-only data-validation layer (schemas.py): malformed or hallucinated model output is caught at the boundary and either salvaged or safely rejected — the pipeline never trusts raw text.
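
A stdlib-only boundary check in this spirit might look like the sketch below. It does not reproduce schemas.py: the `Finding` fields, the JSON shape, and `parse_finding` are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class Finding:
    question: str
    answer: str
    sources: tuple[str, ...]
    confidence: float


def _is_http_url(value) -> bool:
    """True only for strings that parse as http(s) URLs with a host."""
    if not isinstance(value, str):
        return False
    try:
        parts = urlparse(value)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def parse_finding(raw: str) -> Optional[Finding]:
    """Return a validated Finding, or None if the text cannot be trusted."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # reject: not JSON at all
    if not isinstance(data, dict):
        return None
    question, answer = data.get("question"), data.get("answer")
    if not (isinstance(question, str) and isinstance(answer, str) and answer.strip()):
        return None  # reject: required fields missing or empty
    raw_sources = data.get("sources")
    if not isinstance(raw_sources, list):
        raw_sources = []
    # Salvage: keep well-formed URLs and silently drop the rest.
    sources = tuple(s for s in raw_sources if _is_http_url(s))
    try:
        confidence = float(data.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0
    confidence = min(1.0, max(0.0, confidence))  # clamp into [0, 1]
    return Finding(question, answer, sources, confidence)
```

Returning a frozen dataclass means later stages receive typed, immutable values rather than raw model text.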


Development

make dev      # editable install with dev + subscription extras
make test     # run the offline test-suite (no network needed)
make doctor   # environment check

The test-suite runs the full pipeline against a fake backend, so it validates the orchestration logic without any API calls or network access.
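
The fake-backend idea can be illustrated with a small stand-in. The project's real backend interface is not shown in this README, so `Backend`, `FakeBackend`, and `plan_subquestions` below are hypothetical.

```python
from typing import Protocol


class Backend(Protocol):
    def complete(self, prompt: str) -> str: ...


class FakeBackend:
    """Returns canned replies and records prompts; never touches the network."""

    def __init__(self, replies: dict[str, str], default: str = ""):
        self.replies = replies
        self.default = default
        self.prompts: list[str] = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        # The first canned reply whose key appears in the prompt wins.
        for needle, reply in self.replies.items():
            if needle in prompt:
                return reply
        return self.default


def plan_subquestions(backend: Backend, topic: str) -> list[str]:
    """Toy orchestration step: ask the backend for one sub-question per line."""
    text = backend.complete(f"PLAN: {topic}")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

A test can then assert both on the parsed result and on the prompts the orchestration code sent, all offline.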


Roadmap

Bigger features that need live external services or are larger projects (open to contributions):

  • Local backend: Ollama for fully offline / free runs (OpenAI, Gemini, GLM, and Kimi are already supported).
  • Local document RAG — research over your own PDFs/notes.
  • MCP server mode — expose vibe-research as a tool to other agents.
  • Recursive / multi-hop research — drill deeper into a single finding.
  • Source archival — snapshot cited pages against link rot.
  • Streaming TUI, scheduled watch mode, Obsidian/Notion sync.

See CHANGELOG.md for what's already shipped.


License

MIT. See LICENSE.
