Python SDK for Open Deep Research
# 🧠 Deep Research
An AI-powered research assistant that performs comprehensive, autonomous research on any topic
Features • Installation • Quick Start • Examples • Contribute
## 🌟 Overview
DeepResearch SDK empowers developers with AI-driven research capabilities, enabling applications to conduct deep, iterative research autonomously. Inspired by products like Perplexity AI and Claude's web browsing, DeepResearch combines web search, content extraction, and AI analysis into a unified, easy-to-use API.
Originally a TypeScript implementation, this Python SDK adds parallel web scraping, multiple search providers, and a more developer-friendly interface.
```mermaid
graph TD
    subgraph "DeepResearch SDK"
        A[Deep Research] --> B[DoclingClient]
        A --> C[LLM Integration]
        A --> D[Research Callbacks]
        B --> E[Brave Search]
        B --> F[DuckDuckGo Search]
        B --> G[Document Extraction]
        C --> H[GPT Models]
        C --> I[Claude Models]
        C --> J[Other LLMs]
        D --> K[Progress Tracking]
        D --> L[Source Monitoring]
        D --> M[Activity Logging]
    end

    classDef primary fill:#4285F4,stroke:#333,stroke-width:2px,color:white
    classDef secondary fill:#34A853,stroke:#333,stroke-width:2px,color:white
    classDef tertiary fill:#FBBC05,stroke:#333,stroke-width:2px,color:white
    classDef quaternary fill:#EA4335,stroke:#333,stroke-width:2px,color:white

    class A primary
    class B,C,D secondary
    class E,F,G,H,I,J tertiary
    class K,L,M quaternary
```
## 📋 Table of Contents

- 🌟 Overview
- 🚀 Key Features
- 📦 Installation
- 🏁 Quick Start
- 🔍 Quick Demo
- 🔬 How It Works
- 📊 Usage Examples
- 🔄 Custom Callbacks
- 🤝 Contributing
- 📄 License
## 🚀 Key Features

- **📊 Multiple Search Providers**
  - Brave Search API for high-quality results
  - DuckDuckGo Search as an automatic, API-key-free fallback
  - Fault-tolerant fallback system, so searches still return results when a provider fails
- **⚡ Parallel Processing**
  - Extract content from multiple sources simultaneously
  - Control concurrency to balance speed and resource usage
- **🔄 Adaptive Research**
  - Automatic identification of gaps in the research
  - Self-guided exploration of topics
  - Depth-first approach with backtracking
- **🧩 Modular Architecture**
  - Easily extensible with custom search providers
  - Plug in different LLM backends through LiteLLM
  - Event-driven callback system for monitoring progress
- **🛠️ Developer-Friendly**
  - Async/await interface for integration with modern Python applications
  - Type hints throughout for IDE autocompletion
  - Comprehensive Pydantic models for structured data
  - Rich metadata for search results (provider, publication date, relevance)
  - Optional caching system for search and extraction results
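The provider-fallback behaviour described above can be sketched in a few lines. The provider functions below are illustrative stand-ins, not the SDK's actual API:

```python
import asyncio


async def brave_search(query):
    # Stand-in for the Brave Search API call; pretend it is unavailable
    raise ConnectionError("Brave Search API key missing")


async def duckduckgo_search(query):
    # Stand-in for the DuckDuckGo fallback (no API key required)
    return [{"title": f"Result for {query}", "provider": "duckduckgo"}]


async def search_with_fallback(query, providers):
    """Try each provider in order; return the first non-empty result set."""
    errors = []
    for provider in providers:
        try:
            results = await provider(query)
            if results:
                return results
        except Exception as exc:
            errors.append(exc)  # Remember the failure and try the next provider
    raise RuntimeError(f"All providers failed: {errors}")


results = asyncio.run(
    search_with_fallback("quantum computing", [brave_search, duckduckgo_search])
)
print(results[0]["provider"])  # duckduckgo: Brave failed, so the fallback answered
```

Because errors are collected rather than raised immediately, a single misconfigured provider never aborts the search.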
## 📦 Installation

### Using pip

```bash
# Install from PyPI
pip install deep-research-sdk
```

### Using Poetry

```bash
# Add to your Poetry project
poetry add deep-research-sdk
```

### From Source

```bash
# Clone the repository and install dependencies
git clone https://github.com/Rishang/deep-research-sdk.git
cd deep-research-sdk
poetry install
```
## 🏁 Quick Start

```python
import asyncio
import logging
import os

from deep_research import DeepResearch
from deep_research.utils.cache import CacheConfig
from deep_research.utils.docling_client import DoclingClient

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")


async def main():
    # Get API keys from environment variables
    openai_api_key = os.environ.get("OPENAI_API_KEY")
    brave_api_key = os.environ.get("BRAVE_SEARCH_API_KEY")  # Optional

    if not openai_api_key:
        print("Error: OPENAI_API_KEY environment variable not set")
        return

    # Optional: configure caching for search and extraction results.
    # Pass `cache_config=None` to disable caching.
    cache_config = CacheConfig(
        enabled=True,                              # Enable caching
        ttl_seconds=3600,                          # Cache entries expire after 1 hour
        db_url="sqlite:///docling_cache.sqlite3",  # SQLite database for the cache
    )

    # Initialize the DeepResearch instance
    researcher = DeepResearch(
        docling_client=DoclingClient(
            cache_config=cache_config,
            brave_api_key=brave_api_key,  # Optional; falls back to DuckDuckGo if None
        ),
        llm_api_key=openai_api_key,
        research_model="gpt-4o-mini",  # Advanced research model
        reasoning_model="o3-mini",     # More efficient model for reasoning
        max_depth=3,                   # Maximum research depth
        time_limit_minutes=2,          # Time limit in minutes
    )

    # Perform research
    result = await researcher.research("The impact of quantum computing on cryptography")

    # Process results
    if result.success:
        print("\n==== RESEARCH SUCCESSFUL ====")
        print(f"Found {len(result.data['findings'])} pieces of information")
        print(f"Analysis:\n{result.data['analysis']}")
    else:
        print(f"Research failed: {result.error}")


if __name__ == "__main__":
    asyncio.run(main())
```
## 🔍 Quick Demo

Want to see the DeepResearch SDK in action? Follow these steps:

1. Set up your environment variables:

```bash
# Optional: Brave Search API key; if not provided, DuckDuckGo is used (no API key required)
export BRAVE_SEARCH_API_KEY="your-brave-api-key"

# Set your LLM API key
export OPENAI_API_KEY="your-openai-api-key"
```

2. Clone the repository:

```bash
git clone https://github.com/Rishang/deep-research-sdk.git
cd deep-research-sdk
```

3. Run the included demo script:

```bash
python example.py
```

You'll see research progress in real time and, within about a minute, get an AI-generated analysis of the benefits of regular exercise.
## 🔬 How It Works

DeepResearch implements an iterative research process:

1. **Initialization**: Configure models, search providers, and parameters
2. **Search**: Find relevant sources on the topic from multiple providers
3. **Extraction**: Process and extract content from top sources in parallel
4. **Analysis**: Analyze findings, identify knowledge gaps, and plan next steps
5. **Iteration**: Continue research with refined focus based on identified gaps
6. **Synthesis**: Generate a comprehensive analysis with citations
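The loop above can be sketched in miniature. This is an illustrative reduction with toy stand-ins for the search, extraction, analysis, and synthesis stages, not the SDK's internal code:

```python
# Toy stand-ins for the real search/extract/analyze/synthesize stages
def search(focus):
    return [f"source-about-{focus}"]


def extract(source):
    return f"content({source})"


def analyze(findings):
    # Pretend one gap remains until two findings have been gathered
    return [] if len(findings) >= 2 else ["storage costs"]


def synthesize(findings):
    return " | ".join(findings)


def run_research(topic, max_depth=3):
    """Illustrative reduction of the search -> extract -> analyze -> refine loop."""
    findings, focus = [], topic
    for depth in range(max_depth):
        sources = search(focus)                    # Search: find relevant sources
        findings += [extract(s) for s in sources]  # Extraction (parallel in the real SDK)
        gaps = analyze(findings)                   # Analysis: identify knowledge gaps
        if not gaps:
            break                                  # No gaps left: stop iterating
        focus = gaps[0]                            # Iteration: refine focus on the top gap
    return synthesize(findings)                    # Synthesis: combine all findings


report = run_research("renewable energy")
print(report)  # content(source-about-renewable energy) | content(source-about-storage costs)
```

The key design point is that each pass narrows the focus based on what is still missing, rather than re-searching the original topic.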
```mermaid
flowchart LR
    A[Initialization] --> B[Search]
    B --> C[Extraction]
    C --> D[Analysis]
    D --> E{Continue?}
    E -->|Yes| F[Refine Focus]
    F --> B
    E -->|No| G[Final Synthesis]

    style A fill:#4285F4,stroke:#333,stroke-width:2px,color:white
    style B fill:#34A853,stroke:#333,stroke-width:2px,color:white
    style C fill:#FBBC05,stroke:#333,stroke-width:2px,color:white
    style D fill:#EA4335,stroke:#333,stroke-width:2px,color:white
    style E fill:#673AB7,stroke:#333,stroke-width:2px,color:white
    style F fill:#FF9800,stroke:#333,stroke-width:2px,color:white
    style G fill:#2196F3,stroke:#333,stroke-width:2px,color:white
```
## 📊 Usage Examples

### Search Results with Metadata

The DeepResearch SDK returns search results with rich metadata:

```python
from deep_research.utils.docling_client import DoclingClient

client = DoclingClient()

# Get search results (inside an async function)
search_results = await client.search("artificial intelligence")

# Access the metadata in each search result
for result in search_results.data:
    print(f"Title: {result.title}")
    print(f"URL: {result.url}")
    print(f"Description: {result.description}")
    print(f"Provider: {result.provider}")  # Which search engine provided this result
    print(f"Date: {result.date}")          # Publication date, when available
    print(f"Relevance: {result.relevance}")
```
### Using the Cache System

```python
from deep_research.utils.cache import CacheConfig
from deep_research.utils.docling_client import DoclingClient

# Configure the cache with SQLite (the default backend)
cache_config = CacheConfig(
    enabled=True,                         # Enable caching
    ttl_seconds=3600,                     # Cache entries expire after 1 hour
    db_url="sqlite:///docling_cache.db",  # Use SQLite
    create_tables=True,                   # Create tables if they don't exist
)

# Initialize the client with caching
client = DoclingClient(cache_config=cache_config)

# Search results are cached and reused for identical queries
search_result = await client.search("quantum computing")

# The second call uses the cached data
search_result = await client.search("quantum computing")

# To disable caching completely, simply don't provide a cache_config:
client_no_cache = DoclingClient()  # No caching

# For a MySQL/MariaDB backend instead of SQLite,
# first install pymysql: pip install pymysql
mysql_config = CacheConfig(
    enabled=True,
    db_url="mysql+pymysql://username:password@localhost/docling_cache",
)
```
### Advanced: Using Structured Models for Cache Keys

```python
from pydantic import BaseModel

from deep_research.utils.cache import cache


# Define which parameters should be used for caching
class SearchParams(BaseModel):
    query: str
    max_results: int = 10


@cache(structure=SearchParams)
async def search_function(self, query: str, max_results: int = 10):
    # Only query and max_results contribute to the cache key;
    # other parameters are ignored for caching purposes
    return results
```
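Under the hood, a structure-based key is presumably derived from just those named fields. A standard-library sketch of that idea follows; the real `cache` decorator's internals may differ:

```python
import hashlib
import json


def cache_key(structure_fields: dict) -> str:
    """Derive a deterministic key from only the fields named in the structure."""
    # Sorting the keys makes the digest independent of argument order
    canonical = json.dumps(structure_fields, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()


# The same named parameters yield the same key, regardless of ordering
k1 = cache_key({"query": "quantum computing", "max_results": 10})
k2 = cache_key({"max_results": 10, "query": "quantum computing"})
print(k1 == k2)  # True
```

Any parameter excluded from the structure never reaches the digest, which is what lets two calls differing only in ignored arguments share a cache entry.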
### Custom Research Parameters

```python
# Configure research with custom parameters
result = await researcher.research(
    topic="Emerging trends in renewable energy storage",
    max_tokens=3000,  # Control output length
    temperature=0.7,  # Add more creativity to the analysis
)
```
### Using Different LLM Providers

DeepResearch uses LiteLLM, which supports multiple LLM providers:

```python
# Use Anthropic Claude models
researcher = DeepResearch(
    # ... other parameters
    research_model="anthropic/claude-3-opus-20240229",
    reasoning_model="anthropic/claude-3-haiku-20240307",
)
```
### Scheduling Research Tasks

```python
import asyncio


# Run multiple research tasks concurrently
async def research_multiple_topics():
    topics = ["AI safety", "Climate adaptation", "Future of work"]
    tasks = [researcher.research(topic, max_depth=2) for topic in topics]
    results = await asyncio.gather(*tasks)

    for topic, result in zip(topics, results):
        print(f"Research on {topic}: {'Success' if result.success else 'Failed'}")
```
## 🔄 Custom Callbacks

Monitor and track research progress by implementing custom callbacks:

```python
from deep_research.core.callbacks import ResearchCallback
from deep_research.models import ActivityItem, SourceItem


class MyCallback(ResearchCallback):
    async def on_activity(self, activity: ActivityItem) -> None:
        # Handle activity updates (search, extract, analyze)
        print(f"Activity: {activity.type} - {activity.message}")

    async def on_source(self, source: SourceItem) -> None:
        # Handle discovered sources
        print(f"Source: {source.title} ({source.url})")

    async def on_depth_change(self, current, maximum, completed_steps, total_steps) -> None:
        # Track research depth progress
        progress = int(completed_steps / total_steps * 100) if total_steps > 0 else 0
        print(f"Depth: {current}/{maximum} - Progress: {progress}%")

    async def on_progress_init(self, max_depth: int, total_steps: int) -> None:
        # Handle research initialization
        print(f"Initialized with max depth {max_depth} and {total_steps} steps")

    async def on_finish(self, content: str) -> None:
        # Handle research completion
        print(f"Research complete! Result length: {len(content)} characters")
```
## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.