Agent-driven research knowledge base. Browse, collect, and synthesize web sources into a searchable wiki.
Ultra-deep, persistent web research for AI coding agents
Your AI agent searches the web, finds great sources, synthesizes an answer — then the session ends and everything is gone. Next time, it starts from zero.
Hyperresearch makes research persist. Every source your agent finds is fetched with a real headless browser, saved as searchable markdown, and indexed into a knowledge base that compounds across sessions. Hyperresearch can also follow chains of links, digesting paths of information that an agent's built-in web search can't reach.
```bash
pip install hyperresearch
hyperresearch install
```
What people use it for
- In-depth topic research — "Research the latest advances in state space models" → agent fetches 20+ papers, docs, and blog posts, follows citations to primary sources, builds a linked knowledge graph
- News tracking over time — run research sessions weekly on a topic, each session builds on what's already collected, nothing gets re-fetched
- State-of-the-art surveys — "What's the current SOTA for speech recognition?" → agent goes down rabbit holes, collects benchmarks, papers, and implementations
- Competitive analysis — scrape company pages, LinkedIn profiles, product docs, news articles into a persistent, searchable corpus
- Due diligence — aggregate everything about a person, company, or technology from across the web with authenticated crawling (LinkedIn, Twitter, paywalled sites)
How it works
- You ask your agent to research something
- Agent searches the web — multiple queries, different angles
- Fetches every source with a real headless browser (crawl4ai) — JS rendering, bot detection bypass, login-gated content
- Saves each page as a searchable markdown note with tags, summary, and source tracking
- Follows links to primary sources — the paper, not the blog post about the paper
- Auto-links related notes with [[wiki-links]] across the knowledge graph
- Synthesizes findings into a summary note linking all sources
- Next session — agent checks the KB before searching the web. Knowledge compounds.
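The check-the-KB-first loop in the last step can be sketched in a few lines. This is a toy simulation with hypothetical names (`research`, `fake_fetch`); the real tool wires this behavior in through agent hooks rather than a Python API:

```python
# Toy sketch of "check the KB before searching the web" — assumed
# function names, not hyperresearch's actual API.
def research(query, kb, fetch):
    hits = [n for n in kb if query.lower() in n["text"].lower()]
    if hits:
        return hits            # knowledge already collected: no re-fetch
    note = {"text": fetch(query)}
    kb.append(note)            # persist for the next session
    return [note]

kb, fetched = [], []

def fake_fetch(q):
    fetched.append(q)          # record each real fetch
    return f"notes on {q}"

research("state space models", kb, fake_fetch)
research("state space models", kb, fake_fetch)  # second session: KB hit
assert len(fetched) == 1                        # nothing re-fetched
```

The second call is served entirely from the knowledge base, which is the "knowledge compounds" property described above.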
```
your-repo/
  .hyperresearch/   # Config + SQLite FTS5 index (rebuildable)
  research/
    notes/          # Markdown notes — the source of truth
    index/          # Auto-generated wiki pages
  CLAUDE.md         # Agent docs (auto-injected)
```
Works with every major agent
`hyperresearch install` hooks into your agent in one step:
| Platform | Hook | Trigger |
|---|---|---|
| Claude Code | .claude/settings.json + /research skill | Before WebSearch, WebFetch |
| Codex | .codex/hooks.json | Before Bash |
| Cursor | .cursor/rules/hyperresearch.mdc | Always-apply rule |
| Gemini CLI | .gemini/settings.json | Before tool calls |
```bash
hyperresearch install --platform all   # Hook every platform at once
```
Key features
- Real headless browser — crawl4ai runs local Chromium. Handles JavaScript, bypasses bot detection, renders SPAs. Not a simple HTTP fetch.
- Native PDF extraction — fetches PDFs directly, extracts full text with pymupdf, and saves raw files to research/raw/. Academic papers, whitepapers, reports — all first-class citizens.
- Authenticated crawling — log into LinkedIn, Twitter, paywalled news. Your sessions persist across fetches.
- Agent-driven curation — fetcher subagents read each source, write real summaries, add meaningful tags, and flag junk. No keyword-matching shortcuts.
- Junk detection — Cloudflare captchas, error pages, cookie walls, binary garbage, login redirects all caught and rejected before saving.
- Multi-round deep research — the agent does multiple rounds of search → fetch → follow links, with a source checkpoint that forces breadth before synthesis.
- Gap analysis + adversarial audit — after drafting, the agent re-reads the original query to find gaps, then spawns two auditor subagents (comprehensiveness + logic) to tear the draft apart. Runs twice.
- Cheap parallel fetching — ships a Haiku-powered subagent that fetches, summarizes, and quality-checks URLs in parallel for pennies. Spawn 10-20 per round.
- Scholarly API guidance — agent docs encourage use of arXiv, Semantic Scholar, CrossRef, and PubMed APIs for academic topics.
- /research skill — scripted deep-research workflow. Clarifies ambiguous requests, searches broadly, fetches aggressively, follows rabbit holes, audits, synthesizes.
- Smart SPA wait — polls DOM stability instead of fixed delays. Fast pages finish instantly; SPAs get up to 10 seconds.
- FTS5 search — instant full-text search across thousands of notes with BM25 ranking.
- Knowledge graph — [[wiki-links]], backlinks, hub detection, auto-linking.
- MCP server — 13 tools (read + write) for Claude Desktop, Cursor, or any MCP client.
- Note lifecycle — draft → review → evergreen → stale → deprecated → archive
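The FTS5 + BM25 layer named in the feature list can be demonstrated with Python's built-in sqlite3 module. The table and column names below are illustrative, not hyperresearch's actual schema:

```python
import sqlite3

# Minimal FTS5 index with BM25-ranked search, as a sketch of the
# feature above (illustrative schema, not the real one).
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE notes USING fts5(title, body)")
db.executemany("INSERT INTO notes VALUES (?, ?)", [
    ("Mamba", "state space models for sequence modeling"),
    ("Whisper", "speech recognition benchmarks"),
])

# MATCH runs a full-text query; bm25() orders results by relevance
# (lower score = more relevant in SQLite's convention).
rows = db.execute(
    "SELECT title FROM notes WHERE notes MATCH ? ORDER BY bm25(notes)",
    ("state space",),
).fetchall()
print(rows)  # -> [('Mamba',)]
```

Because the index is a rebuildable cache over plain markdown files, losing or corrupting it costs nothing — it can always be regenerated from the notes.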
Commands
```bash
# Research
hyperresearch fetch <url> --tag t -j            # Fetch a URL into the KB
hyperresearch fetch-batch <urls...> -j          # Fetch many URLs at once
hyperresearch research "topic" --max 5 -j       # Full pipeline: search → fetch → link → synthesize

# Search
hyperresearch search "query" -j                 # Full-text search
hyperresearch search "query" --max-tokens 8000  # Stay within context budget
hyperresearch note show <id> -j                 # Read a note

# Knowledge graph
hyperresearch link --auto -j                    # Auto-link related notes
hyperresearch graph hubs -j                     # Most-connected notes
hyperresearch graph backlinks <id> -j           # What links to this note

# Manage
hyperresearch sources list -j                   # Every URL ever fetched
hyperresearch lint -j                           # Health check
hyperresearch repair -j                         # Fix links, rebuild indexes
```

Every command returns `{"ok": true, "data": {...}}` with `-j`.
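A small helper for consuming that envelope might look like this. The `ok`/`data` shape is as documented above; the `error` key used for failures is an assumption:

```python
import json

def unwrap(raw: str):
    # Unwrap the {"ok": ..., "data": ...} envelope emitted with -j.
    payload = json.loads(raw)
    if not payload.get("ok"):
        # "error" is an assumed field name for failure payloads
        raise RuntimeError(payload.get("error", "command failed"))
    return payload["data"]

print(unwrap('{"ok": true, "data": {"hits": 3}}'))  # -> {'hits': 3}
```

A uniform envelope like this lets agent code branch on a single `ok` flag instead of parsing per-command output formats.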
Authenticated crawling
Fetch from LinkedIn, Twitter, paywalled sites — anything you can log into:
```bash
hyperresearch setup   # Browser opens. Log into your sites. Done.
```

```toml
# .hyperresearch/config.toml
[web]
provider = "crawl4ai"
profile = "research"
```
LinkedIn, Twitter, Facebook, Instagram, and TikTok automatically use a visible browser to avoid session kills.
Philosophy
- No LLM calls. Hyperresearch stores, indexes, and searches. Your agent is the LLM.
- Markdown is truth. Notes are plain files. SQLite is a rebuildable cache.
- Over-collect, then prune. Fetch aggressively. Deprecate what you don't need.
- Check before you fetch. Hooks kill redundant searches across sessions.
- Raw content is king. Save the original with formatting, not a rewritten summary.
Requirements
- Python 3.11+
- Windows, macOS, Linux
File details
Details for the file hyperresearch-0.4.0.tar.gz.
File metadata
- Download URL: hyperresearch-0.4.0.tar.gz
- Upload date:
- Size: 142.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `3364e987211a339d382a7667cfe62175eaca644263ad1a95956ac94f82eb16ae` |
| MD5 | `f04637a424b50fd68da13507762ce3f5` |
| BLAKE2b-256 | `d8e246079b54689abec5ff5a7b277522f0fd620040f24d125ab9e3d87bc6bbde` |
Provenance
The following attestation bundles were made for hyperresearch-0.4.0.tar.gz:
Publisher: publish.yml on jordan-gibbs/hyperresearch

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hyperresearch-0.4.0.tar.gz
- Subject digest: 3364e987211a339d382a7667cfe62175eaca644263ad1a95956ac94f82eb16ae
- Sigstore transparency entry: 1287083738
- Sigstore integration time:
- Permalink: jordan-gibbs/hyperresearch@e0eba940238a56c3524aa36362fa3e2a2d8751f5
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/jordan-gibbs
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e0eba940238a56c3524aa36362fa3e2a2d8751f5
- Trigger Event: release
File details
Details for the file hyperresearch-0.4.0-py3-none-any.whl.
File metadata
- Download URL: hyperresearch-0.4.0-py3-none-any.whl
- Upload date:
- Size: 152.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a1710921bb21f7b93d52e72e169aab27011718b6a83908850e941e0bf98b2dfa` |
| MD5 | `49a4bf07c8f032f0ee1b282f2830ff4a` |
| BLAKE2b-256 | `f5cc7af26b303eb6e7c5c4d6277a41782621f7d8f9f79f300d995c441aa8da0f` |
Provenance
The following attestation bundles were made for hyperresearch-0.4.0-py3-none-any.whl:
Publisher: publish.yml on jordan-gibbs/hyperresearch

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hyperresearch-0.4.0-py3-none-any.whl
- Subject digest: a1710921bb21f7b93d52e72e169aab27011718b6a83908850e941e0bf98b2dfa
- Sigstore transparency entry: 1287083831
- Sigstore integration time:
- Permalink: jordan-gibbs/hyperresearch@e0eba940238a56c3524aa36362fa3e2a2d8751f5
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/jordan-gibbs
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e0eba940238a56c3524aa36362fa3e2a2d8751f5
- Trigger Event: release