
Agent-driven research knowledge base. Browse, collect, and synthesize web sources into a searchable wiki.

Project description

HYPERRESEARCH

Ultra deep, persistent web research for AI coding agents



Your AI agent searches the web, finds great sources, synthesizes an answer — then the session ends and everything is gone. Next time, it starts from zero.

Hyperresearch makes research persist. Every source your agent finds is fetched with a real headless browser, saved as searchable markdown, and indexed into a knowledge base that compounds across sessions. It can also follow chains of links, digesting whole paths through the web that ordinary agent web search can't reach.

pip install hyperresearch
hyperresearch install

What people use it for

  • In-depth topic research — "Research the latest advances in state space models" → agent fetches 20+ papers, docs, and blog posts, follows citations to primary sources, builds a linked knowledge graph
  • News tracking over time — run research sessions weekly on a topic, each session builds on what's already collected, nothing gets re-fetched
  • State-of-the-art surveys — "What's the current SOTA for speech recognition?" → agent goes down rabbit holes, collects benchmarks, papers, and implementations
  • Competitive analysis — scrape company pages, LinkedIn profiles, product docs, news articles into a persistent, searchable corpus
  • Due diligence — aggregate everything about a person, company, or technology from across the web with authenticated crawling (LinkedIn, Twitter, paywalled sites)

How it works

  1. You ask your agent to research something
  2. Agent searches the web — multiple queries, different angles
  3. Fetches every source with a real headless browser (crawl4ai) — JS rendering, bot detection bypass, login-gated content
  4. Saves each page as a searchable markdown note with tags, summary, and source tracking
  5. Follows links to primary sources — the paper, not the blog post about the paper
  6. Auto-links related notes with [[wiki-links]] across the knowledge graph
  7. Synthesizes findings into a summary note linking all sources
  8. Next session — agent checks the KB before searching the web. Knowledge compounds.

The resulting repo layout:

your-repo/
  .hyperresearch/        # Config + SQLite FTS5 index (rebuildable)
  research/
    notes/               # Markdown notes — the source of truth
    index/               # Auto-generated wiki pages
  CLAUDE.md              # Agent docs (auto-injected)
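
Because the markdown notes are the source of truth and the SQLite FTS5 index is a rebuildable cache, the index can always be reconstructed from the notes directory. A minimal sketch of that idea with stdlib sqlite3 — the schema, note names, and bodies here are illustrative, not hyperresearch's actual internals:

```python
import sqlite3

# Hypothetical stand-ins for files under research/notes/.
notes = {
    "ssm-survey.md": "State space models (SSMs) are an alternative to transformers.",
    "mamba.md": "Mamba is a selective state space model with linear-time inference.",
}

# Build an FTS5 virtual table -- a cache that can be rebuilt from the notes at any time.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE notes USING fts5(path, body)")
db.executemany("INSERT INTO notes VALUES (?, ?)", notes.items())

# BM25 ranking comes for free: lower bm25() means a better match.
rows = db.execute(
    "SELECT path FROM notes WHERE notes MATCH ? ORDER BY bm25(notes)",
    ("state space",),
).fetchall()
print([r[0] for r in rows])
```

Deleting `.hyperresearch/` loses nothing: rerunning this kind of indexing pass over `research/notes/` restores search.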

Works with every major agent

hyperresearch install hooks into your agent in one step:

Platform      Hook                                      Trigger
Claude Code   .claude/settings.json + /research skill   Before WebSearch, WebFetch
Codex         .codex/hooks.json                         Before Bash
Cursor        .cursor/rules/hyperresearch.mdc           Always-apply rule
Gemini CLI    .gemini/settings.json                     Before tool calls

hyperresearch install --platform all    # Hook every platform at once

Key features

  • Real headless browser — crawl4ai runs local Chromium. Handles JavaScript, bypasses bot detection, renders SPAs. Not a simple HTTP fetch.
  • Native PDF extraction — fetches PDFs directly, extracts full text with pymupdf, saves raw files to research/raw/. Academic papers, whitepapers, reports — all first-class citizens.
  • Authenticated crawling — log into LinkedIn, Twitter, paywalled news. Your sessions persist across fetches.
  • Agent-driven curation — fetcher subagents read each source, write real summaries, add meaningful tags, and flag junk. No keyword-matching shortcuts.
  • Junk detection — Cloudflare captchas, error pages, cookie walls, binary garbage, login redirects all caught and rejected before saving.
  • Multi-round deep research — the agent does multiple rounds of search → fetch → follow links, with a source checkpoint that forces breadth before synthesis.
  • Gap analysis + adversarial audit — after drafting, the agent re-reads the original query to find gaps, then spawns two auditor subagents (comprehensiveness + logic) to tear the draft apart. Runs twice.
  • Cheap parallel fetching — ships a Haiku-powered subagent that fetches, summarizes, and quality-checks URLs in parallel for pennies. Spawn 10-20 per round.
  • Scholarly API guidance — agent docs encourage use of arXiv, Semantic Scholar, CrossRef, and PubMed APIs for academic topics.
  • /research skill — scripted deep research workflow. Clarifies ambiguous requests, searches broadly, fetches aggressively, follows rabbit holes, audits, synthesizes.
  • Smart SPA wait — polls DOM stability instead of fixed delays. Fast pages finish instantly, SPAs get up to 10 seconds.
  • FTS5 search — instant full-text search across thousands of notes with BM25 ranking
  • Knowledge graph — [[wiki-links]], backlinks, hub detection, auto-linking
  • MCP server — 13 tools (read + write) for Claude Desktop, Cursor, or any MCP client
  • Note lifecycle — draft → review → evergreen → stale → deprecated → archive
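
The lifecycle above is a simple linear state machine. A hedged sketch of how such transitions could be enforced — the transition map is inferred from the arrow chain in the bullet, not taken from hyperresearch's source:

```python
# Allowed transitions, taken from the draft -> ... -> archive chain above.
TRANSITIONS = {
    "draft": {"review"},
    "review": {"evergreen"},
    "evergreen": {"stale"},
    "stale": {"deprecated"},
    "deprecated": {"archive"},
    "archive": set(),
}

def advance(state: str, target: str) -> str:
    """Move a note to `target`, refusing transitions the lifecycle forbids."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"cannot move {state!r} -> {target!r}")
    return target

state = "draft"
for nxt in ("review", "evergreen", "stale"):
    state = advance(state, nxt)
print(state)  # -> stale
```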

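The knowledge-graph feature ([[wiki-links]], backlinks, hubs) can be sketched with a few lines of stdlib Python. The note names below are invented, and the real linker does much more (auto-linking, for one) — this only shows the core graph bookkeeping:

```python
import re
from collections import defaultdict

WIKI_LINK = re.compile(r"\[\[([^\]]+)\]\]")

# Hypothetical notes: note id -> markdown body with [[wiki-links]].
notes = {
    "mamba": "Builds on [[s4]] and is evaluated against [[transformers]].",
    "s4": "A structured state space model; see [[transformers]] for the baseline.",
    "transformers": "Attention-based architecture.",
}

# Forward links, then invert them into backlinks.
links = {nid: WIKI_LINK.findall(body) for nid, body in notes.items()}
backlinks = defaultdict(list)
for src, targets in links.items():
    for dst in targets:
        backlinks[dst].append(src)

# Hub detection: the most-backlinked note.
hub = max(notes, key=lambda nid: len(backlinks[nid]))
print(hub)  # transformers has the most backlinks
```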
Commands

# Research
hyperresearch fetch <url> --tag t -j           # Fetch a URL into the KB
hyperresearch fetch-batch <urls...> -j         # Fetch many URLs at once
hyperresearch research "topic" --max 5 -j      # Full pipeline: search → fetch → link → synthesize

# Search
hyperresearch search "query" -j                # Full-text search
hyperresearch search "query" --max-tokens 8000 # Stay within context budget
hyperresearch note show <id> -j                # Read a note

# Knowledge graph
hyperresearch link --auto -j                   # Auto-link related notes
hyperresearch graph hubs -j                    # Most-connected notes
hyperresearch graph backlinks <id> -j          # What links to this note

# Manage
hyperresearch sources list -j                  # Every URL ever fetched
hyperresearch lint -j                          # Health check
hyperresearch repair -j                        # Fix links, rebuild indexes

Every command returns {"ok": true, "data": {...}} with -j.
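
Since every command emits the same envelope with -j, a thin wrapper can treat the CLI like an API. A sketch — the subprocess call assumes hyperresearch is on PATH; the unwrap logic is the point:

```python
import json
import subprocess

def unwrap(raw: str) -> dict:
    """Parse the -j envelope; return data, or raise if ok is false."""
    env = json.loads(raw)
    if not env.get("ok"):
        raise RuntimeError(f"command failed: {env}")
    return env["data"]

def run(*args: str) -> dict:
    """Run a hyperresearch subcommand with -j and unwrap the result."""
    out = subprocess.run(
        ["hyperresearch", *args, "-j"],
        capture_output=True, text=True, check=True,
    )
    return unwrap(out.stdout)

# The envelope shape, exercised without invoking the CLI:
print(unwrap('{"ok": true, "data": {"hits": 3}}'))  # -> {'hits': 3}
```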

Authenticated crawling

Fetch from LinkedIn, Twitter, paywalled sites — anything you can log into:

hyperresearch setup       # Browser opens. Log into your sites. Done.
# .hyperresearch/config.toml
[web]
provider = "crawl4ai"
profile = "research"

LinkedIn, Twitter, Facebook, Instagram, and TikTok automatically use a visible browser to avoid session kills.

Philosophy

  • No LLM calls. Hyperresearch stores, indexes, and searches. Your agent is the LLM.
  • Markdown is truth. Notes are plain files. SQLite is a rebuildable cache.
  • Over-collect, then prune. Fetch aggressively. Deprecate what you don't need.
  • Check before you fetch. Hooks kill redundant searches across sessions.
  • Raw content is king. Save the original with formatting, not a rewritten summary.

Requirements

  • Python 3.11+
  • Windows, macOS, Linux

License

MIT


Project details


Download files

Download the file for your platform.

Source Distribution

hyperresearch-0.4.0.tar.gz (142.4 kB)

Built Distribution

hyperresearch-0.4.0-py3-none-any.whl (152.6 kB)

File details

Details for the file hyperresearch-0.4.0.tar.gz.

File metadata

  • Download URL: hyperresearch-0.4.0.tar.gz
  • Size: 142.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hyperresearch-0.4.0.tar.gz
Algorithm Hash digest
SHA256 3364e987211a339d382a7667cfe62175eaca644263ad1a95956ac94f82eb16ae
MD5 f04637a424b50fd68da13507762ce3f5
BLAKE2b-256 d8e246079b54689abec5ff5a7b277522f0fd620040f24d125ab9e3d87bc6bbde


Provenance

The following attestation bundles were made for hyperresearch-0.4.0.tar.gz:

Publisher: publish.yml on jordan-gibbs/hyperresearch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file hyperresearch-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: hyperresearch-0.4.0-py3-none-any.whl
  • Size: 152.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hyperresearch-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a1710921bb21f7b93d52e72e169aab27011718b6a83908850e941e0bf98b2dfa
MD5 49a4bf07c8f032f0ee1b282f2830ff4a
BLAKE2b-256 f5cc7af26b303eb6e7c5c4d6277a41782621f7d8f9f79f300d995c441aa8da0f


Provenance

The following attestation bundles were made for hyperresearch-0.4.0-py3-none-any.whl:

Publisher: publish.yml on jordan-gibbs/hyperresearch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
