RabbitHole — Deep Research Orchestrator
A recursive, multi-agent research system that performs deep research on any topic and generates comprehensive reports. Features fully sequential execution for constant memory usage.
Table of Contents
- Quick Start
- Example Outputs
- What This Does
- Key Features
- Installation
- Configuration
- Usage
- Understanding Sequential Execution
- Project Structure
- Performance Tuning
- Troubleshooting
- Architecture
- Advanced Topics
Quick Start
1. Install (30 seconds)
cd RabbitHole
python3 -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
pip install -r requirements.txt
2. Configure (Edit .env file)
# Required: Get your key at https://openrouter.ai/keys
OPENROUTER_API_KEY=sk-or-v1-your-key-here
# Research scope
MAX_DEPTH=5 # Recursion depth (2-10)
MAX_CHILDREN=12 # Children per agent (2-20)
SOURCE_COUNT=50 # Sources per topic (10-100)
# Resource control (IMPORTANT!)
MAX_CONCURRENT_TASKS=1 # 1=sequential, 4=parallel
3. Run Research
python -m rabbithole.cli "Your research topic here"
# Keep runtime files for debugging (db, cache, artifacts):
python -m rabbithole.cli "Your topic" --no-cleanup
Output saved to: research_report.md
Runtime files (db, cache) are stored in $TMPDIR/rabbithole and automatically cleaned up after each run. This keeps the project directory clean for packaging (pipx/homebrew).
Example Outputs
See Example_output/ for sample reports. The "AI Ethics" report (2 hours) used deeper recursion with more agents but a lower target word count, while "Uses for LLMs" (6 minutes) was generated to a higher word target with shallower recursion. Both show good depth and quality.
What This Does
RabbitHole creates a tree of specialized research agents that recursively explore your topic:
Your Topic: "History of the Byzantine Empire"
↓
Root Agent: Fetches 50 sources, summarizes, derives subtopics
├─ "Early Byzantine Period (330-610)"
│ ├─ "Constantine's founding of Constantinople"
│ ├─ "Justinian's reconquest campaigns"
│ └─ "Codification of Roman law"
├─ "Byzantine military tactics"
│ ├─ "Greek fire technology"
│ └─ "Theme system organization"
└─ "Byzantine art and architecture"
├─ "Hagia Sophia construction"
└─ "Icon veneration controversies"
Each agent:
- Fetches N sources (default: 50) from the web
- Summarizes each source with LLM
- Derives child topics from summaries
- Spawns child agents (up to MAX_CHILDREN)
- Continues recursively up to MAX_DEPTH levels
Final Output: A comprehensive Markdown report with:
- 200-word executive summary
- Table of contents
- Detailed sections for each researched topic
- Source citations with URLs
- Full provenance appendix
Key Features
🎯 Deep, Recursive Research
- Agents spawn sub-agents up to configurable depth
- Each agent specializes in a subtopic
- Potentially thousands of sources analyzed
🔒 Sequential Execution (Default)
- Only 1 operation at a time when MAX_CONCURRENT_TASKS=1
- Constant memory usage (~500MB) regardless of depth
- Scales by TIME only, not resources
- Safe for limited hardware (4GB RAM)
⚡ Configurable Parallelism
- Increase MAX_CONCURRENT_TASKS for faster results
- Trade memory for speed when you have RAM
- Up to 8× faster with more concurrent tasks
🧹 Clean Runtime
- Runtime files (db, cache) stored in system temp directory ($TMPDIR/rabbithole)
- Automatic cleanup after each run (configurable via AUTO_CLEANUP)
- Project directory stays clean, ready for pipx/homebrew packaging
- Content-hash caching for summaries avoids redundant LLM calls
🌐 Real Web Search
- Multiple search backends: Brave, SerpAPI, Tavily, Exa, Bing, Wikipedia, arXiv
- Configurable fallback chain: define provider order (e.g., brave,serpapi,bing,wikipedia)
- Fetches actual online sources, not simulated data
📊 Progress Tracking
- Real-time logs of pending/in-progress/done tasks
- API call and token usage monitoring
- Detailed task start/completion logs
- Colored console output with category-based formatting
- Configurable via NO_COLOR, FORCE_COLOR, LOG_TIMESTAMPS
💰 Budget Controls
- Optional limits on tokens, calls, time, cost
- Automatic stopping when budgets exceeded
Installation
Prerequisites
- Python 3.9 or higher
- 1GB RAM minimum (2-4GB+ recommended for parallel execution)
- Internet connection
- OpenRouter API key (free tier available)
Steps
1. Clone or navigate to the project:
   cd /path/to/RabbitHole
2. Create a virtual environment:
   python3 -m venv venv
   source venv/bin/activate  # Linux/Mac
   # or venv\Scripts\activate on Windows
3. Install dependencies:
   pip install -r requirements.txt
4. Get an OpenRouter API key:
   - Visit https://openrouter.ai/keys
   - Sign up (free tier available)
   - Generate an API key
   - Add it to the .env file (see Configuration)
Configuration
All configuration is done through the .env file in the project root.
Essential Settings
# OpenRouter API (Required)
OPENROUTER_API_KEY=sk-or-v1-your-key-here
OPENROUTER_API_BASE=https://openrouter.ai/api
OPENROUTER_MODEL=arcee-ai/trinity-large-preview:free
OPENROUTER_LOG=1 # Log API requests (1=on, 0=off)
# Research Scope
MAX_DEPTH=5 # Recursion depth (2-10)
MAX_CHILDREN=12 # Children per agent (2-20)
SOURCE_COUNT=50 # Sources to fetch per topic (10-100)
# Resource Control (CRITICAL!)
MAX_CONCURRENT_TASKS=1 # System-wide parallelism limit
# 1 = Fully sequential (default, safest)
# 2-4 = Limited parallel (requires 8-16GB RAM)
# 8+ = High parallel (requires 32GB+ RAM)
CONCURRENCY=1 # Number of worker coroutines (usually 1)
# Output and Storage
OUTPUT_PATH=research_report.md
DB_PATH=runtime/rabbithole/state.db
# Progress Logging
PROGRESS_LOG=1 # Enable progress logs
PROGRESS_VERBOSE_TASKS=1 # Show detailed task logs
PROGRESS_INTERVAL_SEC=5 # Heartbeat interval
# Console Output Formatting
NO_COLOR=0 # Set to 1 to disable colored output
FORCE_COLOR=0 # Set to 1 to force colors (even in non-TTY)
LOG_TIMESTAMPS=0 # Set to 1 to add timestamps to logs
Popular OpenRouter Models
# Free models
OPENROUTER_MODEL=arcee-ai/trinity-large-preview:free
OPENROUTER_MODEL=meta-llama/llama-3.2-3b-instruct:free
# Paid models (fast & cheap)
OPENROUTER_MODEL=openai/gpt-4o-mini
OPENROUTER_MODEL=anthropic/claude-3-haiku
# Paid models (high quality)
OPENROUTER_MODEL=openai/gpt-4o
OPENROUTER_MODEL=anthropic/claude-3.5-sonnet
See all models: https://openrouter.ai/models
Configuration Presets
Quick Test (5 min, 500MB RAM)
MAX_DEPTH=2
MAX_CHILDREN=3
MAX_CONCURRENT_TASKS=1
SOURCE_COUNT=10
Standard Research (1 hour, 500MB RAM)
MAX_DEPTH=3
MAX_CHILDREN=5
MAX_CONCURRENT_TASKS=1
SOURCE_COUNT=25
Deep Dive (4+ hours, 500MB RAM)
MAX_DEPTH=4
MAX_CHILDREN=8
MAX_CONCURRENT_TASKS=1
SOURCE_COUNT=50
Fast Research (30 min, 2GB RAM)
MAX_DEPTH=3
MAX_CHILDREN=5
MAX_CONCURRENT_TASKS=4
SOURCE_COUNT=25
Multi-Provider Mode (Advanced)
For higher throughput and automatic failover, you can configure multiple LLM providers with per-provider task assignments:
# Multiple OpenRouter keys (load balanced)
OPENROUTER_API_KEYS=sk-or-v1-key1,sk-or-v1-key2,sk-or-v1-key3
OPENROUTER_MODELS=arcee-ai/trinity-large-preview:free
OPENROUTER_TASKS=all # Enable for all task types
# Groq (fast inference, disabled in this example)
GROQ_API_KEYS=gsk_key1,gsk_key2
GROQ_MODELS=llama-3.3-70b-versatile
GROQ_TASKS=none # Keys stored but not used
# Google AI Studio (specific tasks only)
GOOGLE_AI_KEYS=AIza...key1
GOOGLE_AI_MODELS=gemini-2.0-flash-exp
GOOGLE_AI_TASKS=summarization,validation # Only these tasks
# Ollama (local model for high-quality report generation)
OLLAMA_BASE_URLS=http://localhost:11434
OLLAMA_MODELS=llama3.1:70b
OLLAMA_TASKS=report # Only final report synthesis
# Provider fallback order
LLM_FALLBACK_CHAIN=openrouter,groq,google_ai,ollama
Task Types: all, none, summarization, subtopic, validation, report, recommendations, research
Features:
- Load balancing: Calls distributed across all healthy providers for a task type
- Automatic failover: Rate limits trigger instant rerouting to other providers
- Per-provider tasks: Assign different providers to different task types
- Circuit breaker: Failed providers temporarily disabled to prevent cascading failures
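To make the failover behavior concrete, here is a minimal sketch of the circuit-breaker pattern described above. The ProviderState class and call_with_failover helper are illustrative assumptions, not the project's actual internals:

import time

class ProviderState:
    """Tracks one provider's health for a simple circuit breaker (illustrative)."""

    def __init__(self, name: str, cooldown: float = 60.0):
        self.name = name
        self.cooldown = cooldown        # seconds a tripped provider stays disabled
        self.disabled_until = 0.0

    def healthy(self) -> bool:
        return time.time() >= self.disabled_until

    def trip(self):
        # Temporarily disable this provider after a rate limit or repeated failure
        self.disabled_until = time.time() + self.cooldown

def call_with_failover(providers, make_call):
    """Try each healthy provider in order, tripping the breaker on failure."""
    for provider in providers:
        if not provider.healthy():
            continue                    # circuit open: skip this provider
        try:
            return make_call(provider.name)
        except Exception:               # e.g. an HTTP 429 rate-limit error
            provider.trip()
    raise RuntimeError("All providers rate limited or unavailable")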
Web Search Providers
Configure which search backend(s) to use for fetching web sources:
# Primary search provider
SEARCH_PROVIDER=bing # Default (free, no API key)
# Fallback chain (tried in order when primary fails or returns insufficient results)
SEARCH_FALLBACK_CHAIN=wikipedia,arxiv
Available Providers
| Provider | API Key Required | Free Tier | Best For |
|---|---|---|---|
| bing | No | Unlimited | General search (default) |
| brave | BRAVE_API_KEY | $5/month | Quality results, privacy |
| serpapi | SERPAPI_API_KEY | 100/month | Google results |
| tavily | TAVILY_API_KEY | 1000/month | AI-optimized search |
| exa | EXA_API_KEY | $5 credits | Semantic/embeddings search |
| wikipedia | No | Unlimited | Encyclopedia content |
| arxiv | No | Unlimited | Research papers |
Example Configurations
# Free setup (default)
SEARCH_PROVIDER=bing
SEARCH_FALLBACK_CHAIN=wikipedia,arxiv
# Premium setup with multiple fallbacks
SEARCH_PROVIDER=brave
SEARCH_FALLBACK_CHAIN=tavily,serpapi,bing,wikipedia,arxiv
BRAVE_API_KEY=BSA...
TAVILY_API_KEY=tvly-...
SERPAPI_API_KEY=...
# Research-focused (academic papers priority)
SEARCH_PROVIDER=arxiv
SEARCH_FALLBACK_CHAIN=wikipedia,bing
# AI-optimized search
SEARCH_PROVIDER=tavily
SEARCH_FALLBACK_CHAIN=exa,brave,bing,wikipedia
TAVILY_API_KEY=tvly-...
EXA_API_KEY=...
The system tries providers in order until it has enough results. If brave returns 3 results but you need 5, it continues to tavily, then serpapi, etc.
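A minimal sketch of that accumulate-until-enough loop, assuming a searchers mapping from provider name to a search callable (both names are illustrative, not the actual implementation):

def search_with_fallback(query, provider_order, searchers, needed=5):
    """Query providers in order, accumulating results until `needed` is met."""
    results = []
    for name in provider_order:
        try:
            results.extend(searchers[name](query))
        except Exception:
            continue          # a failing provider falls through to the next one
        if len(results) >= needed:
            break             # enough results; stop early
    return results[:needed]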
Usage
Basic Usage
python -m rabbithole.cli "Your research topic here"
The system will:
- Load configuration from .env
- Initialize database and connectors
- Create root agent for your topic
- Recursively spawn and process agents
- Generate final report in research_report.md
Example Topics
# History
python -m rabbithole.cli "What caused the fall of the Roman Empire?"
# Technology comparison
python -m rabbithole.cli "Compare cloud providers AWS, Azure, and GCP for startups"
# Scientific research
python -m rabbithole.cli "Recent advances in quantum computing error correction"
# Product research
python -m rabbithole.cli "Best noise-cancelling headphones under $300 in 2024"
# Philosophy
python -m rabbithole.cli "Effective altruism philosophical arguments"
Monitoring Progress
When PROGRESS_LOG=1 (default), you'll see:
[config] MAX_CONCURRENT_TASKS=1 (controls system-wide parallelism)
[config] connector=web_search
[progress] job=job-abc123 started depth=5 children=12 concurrency=1 max_concurrent_tasks=1
[worker:0] start task=task-xyz depth=1 topic=Machine learning applications
[openrouter] request model=openai/gpt-4o-mini base=https://openrouter.ai/api
[openrouter] response ok model=openai/gpt-4o-mini base=https://openrouter.ai/api tokens=245
[queue] +task=task-abc depth=2 topic=Neural networks in medical imaging
[worker:0] done task=task-xyz docs=50 spawned=12
[progress] [██████████░░░░░░░░░░] 50% tasks: 23/46 (+1 active) llm: 47 calls, 12.5K tokens ETA: 5m 30s
Key metrics:
- Progress bar: Visual indicator with percentage complete
- ETA: Estimated time remaining based on current progress
- pending: Tasks waiting to be processed
- in_progress: Currently running tasks (≤ MAX_CONCURRENT_TASKS)
- done: Completed tasks
- llm_calls: Total API calls made
- llm_tokens: Total tokens used (cost indicator)
Web Search Stats (logged every 5 failures):
[web_stats] searches=100 bing=85% fallback=10% failed=5% | fetches=500 ok=92%
- bing%: Searches that succeeded from Bing directly
- fallback%: Searches that needed Wikipedia/arXiv fallback
- failed%: Searches with no sources found
- fetches ok%: Individual URL content fetch success rate
Output Format
The generated research_report.md contains:
1. Executive Summary (~200 words)
   - LLM-synthesized overview of all findings
2. Table of Contents
   - Links to all researched topics
3. Detailed Sections
   - Each topic gets a section
   - Multiple sources per topic (title, URL)
   - Summarized content for each source
4. Appendix: Provenance
   - Full metadata (topics, depth, sources)
Understanding Sequential Execution
The Problem: Uncontrolled Parallelism
Traditional parallel systems scale resources with tree depth:
Without limits:
├── Agent 1 (+ HTTP + LLM × 50)
├── Agent 2 (+ HTTP + LLM × 50) } All running
├── Agent 3 (+ HTTP + LLM × 50) } simultaneously
└── ...potentially hundreds...
Result: Memory × agents = CRASH with deep trees
The Solution: Global Concurrency Gate
RabbitHole uses MAX_CONCURRENT_TASKS to limit operations:
With MAX_CONCURRENT_TASKS=1:
Agent 1 [Fetch][LLM×50]
Agent 2 [Fetch][LLM×50]
Agent 3...
Result: Memory constant (~500MB) regardless of depth
How It Works
1. ExecutorLimiter (executor_limiter.py)
   - Singleton managing the global thread pool
   - ThreadPoolExecutor(max_workers=MAX_CONCURRENT_TASKS)
   - asyncio.Semaphore(MAX_CONCURRENT_TASKS)
2. All I/O operations go through this gate:
   - Connector.fetch() → acquires semaphore → executes → releases
   - LLM.summarize_async() → acquires semaphore → executes → releases
3. Sequential execution (MAX_CONCURRENT_TASKS=1):
   - Only 1 semaphore slot available
   - Operations queue and wait for the slot
   - No parallel execution possible (see the sketch below)
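The gate itself is small. A condensed sketch of the pattern, with module-level names assumed from the description above rather than copied from executor_limiter.py:

import asyncio
import os
from concurrent.futures import ThreadPoolExecutor

_MAX = int(os.environ.get("MAX_CONCURRENT_TASKS", "1"))
_executor = ThreadPoolExecutor(max_workers=_MAX)   # caps threads doing blocking I/O
_semaphore = asyncio.Semaphore(_MAX)               # caps in-flight async operations

async def run_limited(sync_fn, *args):
    """Run a blocking function through the global gate.

    With MAX_CONCURRENT_TASKS=1 every call waits for the single slot,
    so all fetch/LLM operations execute strictly one at a time.
    """
    loop = asyncio.get_running_loop()
    async with _semaphore:
        return await loop.run_in_executor(_executor, sync_fn, *args)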
Sequential vs Parallel
| MAX_CONCURRENT_TASKS | Behavior | Memory | Speed | Use Case |
|---|---|---|---|---|
| 1 | Fully sequential | 500MB | 1× | <1GB RAM, stability |
| 2 | Limited parallel | 1GB | 1.8× | <2GB RAM, balanced |
| 4 | Limited parallel | 2GB | 3.5× | <4GB RAM, faster |
| 8 | High parallel | 4GB | 6× | <8GB RAM, fastest |
Memory Scaling
Key insight: Memory is constant for a given MAX_CONCURRENT_TASKS, regardless of depth.
depth=2, children=2, MAX_CONCURRENT_TASKS=1
→ ~7 agents, 5 minutes, 500MB
depth=100, children=1000, MAX_CONCURRENT_TASKS=1
→ ~10^300 agents, years, 500MB (still constant!)
Only time increases with depth, not resources.
Scaling Characteristics
| Agents | MAX_CONCURRENT_TASKS=1 | =2 | =4 | =8 |
|---|---|---|---|---|
| 10 | 8 min | 4 min | 2 min | 1 min |
| 40 | 30 min | 15 min | 8 min | 4 min |
| 100 | 75 min | 38 min | 20 min | 10 min |
| 1000 | 12.5 hrs | 6.3 hrs | 3.1 hrs | 1.6 hrs |
Assumes 45 seconds per agent average
Project Structure
RabbitHole/
├── .env # Configuration (YOU EDIT THIS)
├── requirements.txt # Python dependencies
├── README.md # This file
│
├── rabbithole/ # Core package
│ ├── __init__.py # Package marker
│ ├── cli.py # Entry point, loads .env
│ ├── orchestrator.py # Job manager, worker pool
│ ├── agent.py # Agent logic, subtopic derivation
│ ├── llm.py # LLM wrapper (OpenRouter/OpenAI)
│ ├── datastore.py # SQLite persistence
│ ├── report.py # Markdown report generator
│ ├── embeddings.py # Optional vector embeddings
│ ├── executor_limiter.py # Global concurrency control ★
│ ├── logger.py # Colored console logging utility ★
│ ├── runner.py # Standalone single-job runner
│ └── web_search.py # Web search connector
│
├── runtime/ # Runtime data (auto-created)
│ ├── rabbithole/
│ │ ├── state.db # SQLite database (auto-created)
│ │ └── artifacts/ # Cached source documents (SHA-256 filenames)
│ │ └── *.txt
│ └── artifacts/ # Additional artifact storage
│
├── Example_output/ # Example generated reports
│ └── *.md # Sample research reports
│
└── venv/ # Python virtual environment
What Each Folder Contains
rabbithole/ - Core application code
- Main modules for orchestration, agents, LLM, storage
- executor_limiter.py: Controls sequential execution
- logger.py: Colored console logging with category-based formatting
- runner.py: Standalone single-job runner for quick testing
- web_search.py: Web search connector for fetching sources
runtime/ - Runtime data and caches
- state.db: SQLite database with jobs, tasks, agents, results
- artifacts/: Cached raw source documents (named by SHA-256 hash)
- Auto-created on first run
- Safe to delete (will regenerate, but loses history)
Example_output/ - Example research reports
- Sample generated reports for reference
venv/ - Python virtual environment
- Isolated Python packages
- Created with python -m venv venv
- Activate before running
📁 Project Organization
Files You Should Edit
| File | Purpose |
|---|---|
| .env | Your main configuration: API key, research parameters, resource limits |
| Topic argument | When running: python -m rabbithole.cli "Your topic here" |
Files You Might Edit (Advanced)
| File | Purpose |
|---|---|
| rabbithole/*.py | If extending the system |
| .gitignore | If you want to track different files |
Files/Folders You Shouldn't Touch
| Folder | Purpose | Can Delete? |
|---|---|---|
| runtime/ | Runtime data and caches | ✅ Yes (will regenerate) |
| Example_output/ | Example reports | ✅ Yes |
| venv/ | Managed by pip | ✅ Yes (must recreate) |
| __pycache__/ | Python bytecode cache | ✅ Yes |
What is state.db?
SQLite database storing jobs, tasks, agents, and results. Located in runtime/rabbithole/. Created automatically on first run. Safe to delete (will recreate, but loses history).
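Because the schema is plain SQLite (see Database Schema under Advanced Topics), you can inspect a run's progress directly, for example with Python's built-in sqlite3 module; the path shown assumes the DB_PATH default from .env:

import sqlite3

# Count tasks by status to see how far a job has progressed
con = sqlite3.connect("runtime/rabbithole/state.db")
for status, count in con.execute(
    "SELECT status, COUNT(*) FROM tasks GROUP BY status"
):
    print(f"{status:12s} {count}")
con.close()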
Cleaning Up
# Clear all runtime data (reports, cache, database)
rm -rf runtime/
# Clear Python cache
find . -type d -name __pycache__ -exec rm -rf {} +
# Clear virtual environment (must reinstall after)
rm -rf venv/
Do NOT Delete
- .env: Your configuration and API key
- rabbithole/: Core application code
- requirements.txt: Dependency list
- README.md: Documentation
Quick Reference
| Folder/File | Purpose | Edit? | Delete? |
|---|---|---|---|
| rabbithole/ | Application code | Advanced | No |
| runtime/ | Runtime data/caches | No | Yes |
| Example_output/ | Example reports | No | Yes |
| venv/ | Virtual environment | No | Yes* |
| .env | Configuration | Yes | No |
| requirements.txt | Dependencies | Advanced | No |

*Can delete venv/, but you must recreate it and reinstall packages.
Performance Tuning
For Limited RAM (4-8GB)
MAX_CONCURRENT_TASKS=1
MAX_DEPTH=2
MAX_CHILDREN=3
SOURCE_COUNT=10
CONCURRENCY=1
Result: ~13 agents, ~10 minutes, ~400MB RAM
For Faster Results (16GB+)
MAX_CONCURRENT_TASKS=4
MAX_DEPTH=3
MAX_CHILDREN=5
SOURCE_COUNT=20
CONCURRENCY=2
Result: ~156 agents, ~30 minutes, ~2GB RAM
For Comprehensive Research (32GB+)
MAX_CONCURRENT_TASKS=8
MAX_DEPTH=4
MAX_CHILDREN=8
SOURCE_COUNT=50
CONCURRENCY=4
Result: ~4,700 agents, ~7 hours, ~4GB RAM
Agent Count Formula
Total agents ≈ SUM(MAX_CHILDREN^depth for depth in 0..MAX_DEPTH)
Examples:
- depth=2, children=2: 1 + 2 + 4 = 7 agents
- depth=3, children=3: 1 + 3 + 9 + 27 = 40 agents
- depth=3, children=5: 1 + 5 + 25 + 125 = 156 agents
- depth=5, children=12: 1 + 12 + 144 + 1,728 + 20,736 + 248,832 = 271,453 agents
Estimating Runtime
Time = (Total Agents / MAX_CONCURRENT_TASKS) × Average Time Per Agent
Average Time Per Agent ≈ 30-60 seconds (fetch + LLM calls)
Examples:
- 7 agents, MAX_CONCURRENT_TASKS=1: 7 × 45s = ~5 minutes
- 156 agents, MAX_CONCURRENT_TASKS=1: 156 × 45s = ~2 hours
- 156 agents, MAX_CONCURRENT_TASKS=4: 156/4 × 45s = ~30 minutes
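The two formulas combine into a short helper you can run before committing to a configuration; the 45-second default mirrors the per-agent assumption used above:

def estimate(max_depth, max_children, max_concurrent_tasks=1, secs_per_agent=45):
    """Estimate (total agents, runtime in minutes) from the formulas above."""
    total_agents = sum(max_children ** d for d in range(max_depth + 1))
    minutes = total_agents / max_concurrent_tasks * secs_per_agent / 60
    return total_agents, round(minutes, 1)

print(estimate(3, 5))      # (156, 117.0) -> ~2 hours sequential
print(estimate(3, 5, 4))   # (156, 29.2)  -> ~30 minutes with 4 concurrent tasks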
Troubleshooting
No Sources Retrieved
Symptoms: Report says "No online sources were retrieved for this topic"
Solutions:
- Check the OpenRouter API key is valid in .env
- Verify network connectivity: ping openrouter.ai
- Check OpenRouter privacy settings: https://openrouter.ai/settings/privacy
- Ensure data policy allows external requests
- Check console logs for specific error messages
Blank or Very Short Reports
Symptoms: Report has few sections or mostly empty content
Solutions:
- Increase SOURCE_COUNT to 50 or more
- Check console for errors during source fetching
- Verify the LLM is working: llm_calls > 0 in progress logs
- Increase MAX_DEPTH to get more coverage
Out of Memory Errors
Symptoms: Process crashes, system becomes unresponsive
Solutions:
- Ensure MAX_CONCURRENT_TASKS=1 in .env
- Reduce MAX_DEPTH to 2 or 3
- Reduce MAX_CHILDREN to 2 or 3
- Lower SOURCE_COUNT to 10-20
- Close other applications to free RAM
- Consider upgrading RAM if you need deeper research
API Rate Limit Errors
Symptoms: "Rate limit exceeded" or HTTP 429 errors, or "All providers rate limited" warnings
Automatic Handling: The system has coordinated rate limit throttling:
- When one request hits a rate limit, ALL concurrent requests pause together
- This prevents "thundering herd" where multiple requests independently discover limits
- Automatic retry with exponential backoff (up to 60 seconds) before failing
Solutions (if automatic retry fails):
- Set MAX_CONCURRENT_TASKS=1 (slower but stays under limits)
- Use free models with higher limits: arcee-ai/trinity-large-preview:free
- Check the OpenRouter dashboard for your account limits
- Configure multiple API keys to increase effective rate limits
- Upgrade to paid OpenRouter tier for higher limits
Very Slow Progress
Symptoms: Hours passing with minimal completed agents
Solutions:
- Check MAX_DEPTH isn't too high (each level multiplies agents exponentially)
- Monitor the pending count in logs (it should decrease over time)
- Verify the network isn't dropping connections: ping openrouter.ai
- Consider increasing MAX_CONCURRENT_TASKS if you have RAM
- Reduce SOURCE_COUNT to speed up each agent
- Use a faster model: openai/gpt-4o-mini instead of larger models
Repeated Content in Report
Symptoms: Same information appears multiple times
Solutions:
- System has automatic deduplication, but some repetition is expected
- Agents exploring similar topics may find overlapping sources
- This is normal for broad topics
- Consider reducing MAX_CHILDREN to diversify topics more
- Use more specific research topics
in_progress > MAX_CONCURRENT_TASKS
Symptoms: Progress logs show more tasks running than configured
Diagnosis: Bug in concurrency control implementation
Solution: File an issue with full logs and configuration
Progress Stuck at 0
Symptoms: No tasks being processed, in_progress always 0
Solutions:
- Check console for error messages
- Check OpenRouter API key is correct
- Ensure the .env file is being loaded (check startup logs)
Architecture
High-Level Flow
CLI → Load .env → Initialize ExecutorLimiter
↓
ThreadPoolExecutor(max_workers=MAX_CONCURRENT_TASKS)
asyncio.Semaphore(MAX_CONCURRENT_TASKS)
↓
Orchestrator (spawn workers)
↓
Worker(s) claim tasks from database
↓
Agent.run()
/ \
/ \
connector.fetch() llm.summarize_async() × N
↓ ↓
Semaphore Semaphore
(acquire) (acquire)
↓ ↓
Executor Executor
(limited) (limited)
↓ ↓
HTTP Request OpenRouter API
↓ ↓
Release Release
Semaphore Semaphore
Component Responsibilities
ExecutorLimiter (executor_limiter.py)
- Singleton pattern
- Creates ThreadPoolExecutor with max_workers=MAX_CONCURRENT_TASKS
- Creates asyncio.Semaphore with the same limit
- Provides get_executor(), get_semaphore(), get_max_tasks()
Orchestrator (orchestrator.py)
- Spawns CONCURRENCY worker coroutines
- Workers claim tasks from the datastore
- Workers run agents (blocked by semaphore)
- Tracks progress, handles budgets
Agent (agent.py)
- Fetches sources via connector (blocked by semaphore)
- Summarizes each source via LLM (blocked by semaphore)
- Derives subtopics from summaries
- Spawns child agents via orchestrator.enqueue()
LLM (llm.py)
- Wraps OpenRouter/OpenAI API calls
- summarize_async(): async with concurrency control
- summarize_to_200_words_async(): generates the report's executive summary
- Tracks usage: calls, tokens
Web Search (web_search.py)
- Multi-provider search with configurable fallback chain
- Supports: Brave, SerpAPI, Tavily, Exa, Bing, Wikipedia, arXiv
- Uses global executor and semaphore for concurrency control
Datastore (datastore.py)
- SQLite persistence layer
- Tables: jobs, tasks, agents, artifacts, embeddings
- claim_next_task(): race-free task claiming (BEGIN IMMEDIATE); see the sketch below
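A sketch of the BEGIN IMMEDIATE pattern behind claim_next_task(). Column names follow the schema shown under Database Schema; the connection is assumed to be in autocommit mode (isolation_level=None) so the explicit transaction takes effect:

import sqlite3

def claim_next_task(con: sqlite3.Connection):
    """Atomically claim one pending task; returns its id or None."""
    con.execute("BEGIN IMMEDIATE")   # take the write lock up front
    try:
        row = con.execute(
            "SELECT id FROM tasks WHERE status = 'pending' "
            "ORDER BY created_at LIMIT 1"
        ).fetchone()
        if row is None:
            con.execute("COMMIT")
            return None
        con.execute(
            "UPDATE tasks SET status = 'in_progress' WHERE id = ?", (row[0],)
        )
        con.execute("COMMIT")        # no other worker could claim this row
        return row[0]
    except Exception:
        con.execute("ROLLBACK")
        raise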
ReportGenerator (report.py)
- Aggregates agent results
- Generates executive summary
- Writes streaming Markdown report
- Deduplicates topics and summaries
Sequential Execution Timeline
Time (seconds) →
0──────────10─────────20─────────30─────────40─────────50
Agent 1 [Fetch][LLM][LLM][LLM]...[LLM]
Agent 2 [Fetch][LLM]...
Agent 3...
Note: Only ONE operation (fetch or LLM) active at any moment.
Agent 2 waits for Agent 1 to complete all its operations.
Parallel Execution Timeline
Time (seconds) →
0──────────10─────────20
Agent 1 [Fetch][LLM][LLM][LLM]...[LLM]
Agent 2 [Fetch][LLM][LLM][LLM]...[LLM]
Agent 3 [Fetch][LLM][LLM][LLM]...[LLM]
Agent 4 [Fetch][LLM][LLM][LLM]...[LLM]
Agent 5...
Note: Up to MAX_CONCURRENT_TASKS operations can happen simultaneously.
Agents 5+ wait for a slot to open up.
Memory Usage Comparison
Traditional (No Limits):
RAM = N_concurrent_agents × RAM_per_agent × (1 + N_sources)
Example: 100 agents × 10MB × 51 = 51GB → CRASH!
Sequential (MAX_CONCURRENT_TASKS=1):
RAM = 1 × 10MB × 51 = 510MB → Constant!
Parallel (MAX_CONCURRENT_TASKS=4):
RAM = 4 × 10MB × 51 = 2GB → Manageable!
Key Insights
- Semaphore gates ALL I/O operations (fetch, LLM calls)
- ThreadPoolExecutor prevents blocking event loop
- max_workers=1 → sequential via thread serialization
- Semaphore(1) → only 1 async op passes at a time
- Together: complete serialization of all operations
- Memory stays constant regardless of tree depth
- Only time scales with agent count
- Can safely set depth=100, children=1000 without crash
Advanced Topics
Budget Controls
Add to .env to limit costs:
MAX_TOKENS=1000000 # Stop after N tokens
MAX_CALLS=500 # Stop after N API calls
MAX_TIME=3600 # Stop after N seconds
MAX_COST=10.00 # Stop after $N USD (if supported)
When a budget is exceeded, the job stops gracefully and generates a report with completed agents.
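A minimal sketch of what that stop condition amounts to; the usage-dictionary keys here are hypothetical, not the project's actual field names:

import time

def budget_exceeded(usage: dict, limits: dict) -> bool:
    """Return True once any configured budget is exhausted (illustrative keys)."""
    checks = [
        ("MAX_TOKENS", usage.get("tokens", 0)),
        ("MAX_CALLS", usage.get("calls", 0)),
        ("MAX_TIME", time.time() - usage.get("started_at", time.time())),
        ("MAX_COST", usage.get("cost_usd", 0.0)),
    ]
    return any(
        limits.get(name) is not None and value >= float(limits[name])
        for name, value in checks
    )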
Custom Connectors
Create your own connector by implementing:
import asyncio

class CustomConnector:
    async def fetch(self, topic: str, n: int = 6):
        """Fetch n documents for topic.

        Returns: [{"title": str, "url": str, "text": str}, ...]
        """
        # Route the blocking fetch through the global concurrency gate
        from rabbithole.executor_limiter import get_executor, get_semaphore
        loop = asyncio.get_event_loop()
        executor = get_executor()
        semaphore = get_semaphore()
        async with semaphore:
            return await loop.run_in_executor(executor, self._sync_fetch, topic, n)

    def _sync_fetch(self, topic: str, n: int):
        # Your sync implementation here
        return [{"title": "...", "url": "...", "text": "..."}]
Embeddings (Optional)
Enable vector embeddings for local retrieval:
from rabbithole.embeddings import Embeddings
embeddings = Embeddings()
orch = Orchestrator(..., embeddings=embeddings)
Embeddings are stored in the database for similarity search.
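A hedged sketch of how a lookup over those stored vectors might work, assuming the BLOB column holds raw float32 values (the actual serialization may differ) and using numpy, which may not be among the project's dependencies:

import numpy as np

def cosine_top_k(query_vec, rows, k=5):
    """Rank (doc_id, blob) pairs by cosine similarity to query_vec."""
    q = np.asarray(query_vec, dtype=np.float32)
    q = q / np.linalg.norm(q)
    scored = []
    for doc_id, blob in rows:
        v = np.frombuffer(blob, dtype=np.float32)   # assumed float32 layout
        scored.append((float(v @ q) / float(np.linalg.norm(v)), doc_id))
    return sorted(scored, reverse=True)[:k]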
Database Schema
-- Jobs
CREATE TABLE jobs (
job_id TEXT PRIMARY KEY,
topic TEXT,
config TEXT, -- JSON
status TEXT,
created_at TEXT,
updated_at TEXT
);
-- Tasks (pending work items)
CREATE TABLE tasks (
id TEXT PRIMARY KEY,
job_id TEXT,
parent_id TEXT,
topic TEXT,
depth INTEGER,
max_depth INTEGER,
max_children INTEGER,
status TEXT, -- pending, in_progress, done
created_at TEXT
);
-- Agents (completed results)
CREATE TABLE agents (
agent_id TEXT PRIMARY KEY,
job_id TEXT,
parent_id TEXT,
topic TEXT,
depth INTEGER,
result TEXT, -- JSON
created_at TEXT
);
-- Raw artifacts (cached sources)
CREATE TABLE artifacts (
path TEXT PRIMARY KEY,
content TEXT,
created_at TEXT
);
-- Embeddings (optional)
CREATE TABLE embeddings (
id TEXT PRIMARY KEY,
job_id TEXT,
doc_id TEXT,
embedding BLOB,
metadata TEXT, -- JSON
created_at TEXT
);
-- Usage tracking
CREATE TABLE job_usage (
job_id TEXT PRIMARY KEY,
usage TEXT, -- JSON with calls, tokens, etc.
created_at TEXT,
updated_at TEXT
);
Extending the System
Add new LLM provider:
Edit llm.py to add provider detection and API call logic.
Add new report format:
Create new generator class similar to ReportGenerator in report.py.
Add new storage backend:
Implement interface from datastore.py with your storage system.
Add custom agent logic:
Subclass Agent in agent.py and override run() or _derive_subtopics().
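For example, a hedged sketch of that last option; the method name _derive_subtopics() comes from the Architecture section above, but its exact signature is an assumption:

from rabbithole.agent import Agent

class QuestionFocusedAgent(Agent):
    """Illustrative subclass that biases exploration toward open questions."""

    def _derive_subtopics(self, summaries):
        # Reuse the default derivation, then prefer question-shaped subtopics
        subtopics = super()._derive_subtopics(summaries)
        questions = [t for t in subtopics if "?" in t]
        return questions or subtopics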
Tips and Best Practices
- Start small: Always test with depth=2, children=2 first
- Monitor logs: Watch pending/in_progress/done counts
- Be patient: Deep research takes time in sequential mode
- Use free models: Test with free models before using paid ones
- Sequential is safest: Default MAX_CONCURRENT_TASKS=1 prevents crashes
- Read the output: Check research_report.md quality before scaling up
- Iterate: Adjust depth/children based on initial results
- Save .env: Back up configuration before experimenting
- Budget wisely: Set MAX_TOKENS or MAX_CALLS to prevent runaway costs
- Review topics: Ensure your research topic is specific enough
Support and Resources
- OpenRouter Documentation: https://openrouter.ai/docs
- OpenRouter Models: https://openrouter.ai/models
- OpenRouter API Keys: https://openrouter.ai/keys
- Account Settings: https://openrouter.ai/settings/privacy
- GitHub Issues: (your repo URL here)
Acknowledgments
Built with:
- OpenRouter API for LLM access
- SQLite for persistence
- asyncio for concurrency control
- requests for HTTP
- BeautifulSoup for HTML parsing
Ready to start? Edit .env with your API key and run:
python -m rabbithole.cli "Your fascinating research topic"
Happy researching! 🚀