Python SDK for Open Deep Research

🧠 Deep Research

An AI-powered research assistant that performs comprehensive, autonomous research on any topic

🌟 Overview

DeepResearch SDK empowers developers with AI-driven research capabilities, enabling applications to conduct deep, iterative research autonomously. Inspired by products like Perplexity AI and Claude's web browsing, DeepResearch combines web search, content extraction, and AI analysis into a unified, easy-to-use API.

Originally a TypeScript implementation, this Python SDK adds parallel web scraping, multiple search providers, and a more developer-friendly interface.

graph TD
    subgraph "DeepResearch SDK"
        A[Deep Research] --> B[DoclingClient]
        A --> C[LLM Integration]
        A --> D[Research Callbacks]

        B --> E[Brave Search]
        B --> F[DuckDuckGo Search]
        B --> G[Document Extraction]

        C --> H[GPT Models]
        C --> I[Claude Models]
        C --> J[Other LLMs]

        D --> K[Progress Tracking]
        D --> L[Source Monitoring]
        D --> M[Activity Logging]
    end

    classDef primary fill:#4285F4,stroke:#333,stroke-width:2px,color:white
    classDef secondary fill:#34A853,stroke:#333,stroke-width:2px,color:white
    classDef tertiary fill:#FBBC05,stroke:#333,stroke-width:2px,color:white
    classDef quaternary fill:#EA4335,stroke:#333,stroke-width:2px,color:white

    class A primary
    class B,C,D secondary
    class E,F,G,H,I,J tertiary
    class K,L,M quaternary

🚀 Key Features

  • 📊 Multiple Search Providers

    • Brave Search API for high-quality results
    • DuckDuckGo Search as an automatic, API-key-free fallback
    • Fault-tolerant fallback so searches still return results when a provider fails
  • ⚡ Parallel Processing

    • Extract content from multiple sources simultaneously
    • Control concurrency to balance speed and resource usage
  • 🔄 Adaptive Research

    • Automatic gap identification in research
    • Self-guided exploration of topics
    • Depth-first approach with backtracking
  • 🧩 Modular Architecture

    • Easily extensible for custom search providers
    • Plug in different LLM backends through LiteLLM
    • Event-driven callback system for monitoring progress
  • 🛠️ Developer-Friendly

    • Async/await interface for integration with modern Python applications
    • Type hints throughout for IDE autocompletion
    • Comprehensive Pydantic models for structured data
    • Rich metadata for search results (provider, publication date, relevance)
    • Optional caching system for search and extraction results
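The provider fallback described above can be sketched as a generic pattern. This is a minimal illustration of the idea, assuming nothing about the SDK's internals; the provider functions here are illustrative stand-ins:

```python
import asyncio

async def search_with_fallback(query, providers):
    """Try each (name, provider) pair in order; skip failures and empty results."""
    for name, provider in providers:
        try:
            results = await provider(query)
        except Exception:
            continue  # provider errored; fall back to the next one
        if results:   # first non-empty result set wins
            return name, results
    return None, []

# Illustrative providers: the first always fails, the second succeeds.
async def flaky_brave(query):
    raise RuntimeError("rate limited")

async def duckduckgo(query):
    return [{"title": f"Results for {query}", "url": "https://example.com"}]

provider, results = asyncio.run(
    search_with_fallback(
        "quantum computing",
        [("brave", flaky_brave), ("duckduckgo", duckduckgo)],
    )
)
```

Because each provider is tried inside its own try/except, a rate limit or outage in one backend degrades gracefully instead of failing the whole search.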

📦 Installation

Using pip

# Install from PyPI
pip install deep-research-sdk

Using Poetry

# Add to your Poetry project
poetry add deep-research-sdk

From Source

# Clone the repository
git clone https://github.com/Rishang/deep-research-sdk.git
cd deep-research-sdk

poetry install

🏁 Quick Start

import asyncio
import os
import logging
from deep_research import DeepResearch
from deep_research.utils.docling_client import DoclingClient

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

async def main():
    # Get API keys from environment variables
    openai_api_key = os.environ.get("OPENAI_API_KEY")
    brave_api_key = os.environ.get("BRAVE_SEARCH_API_KEY")  # Optional

    if not openai_api_key:
        print("Error: OPENAI_API_KEY environment variable not set")
        return

    # Initialize the DeepResearch instance
    # Optional: Configure caching for search and extraction results
    # If you don't want to use caching, you can simply set `cache_config=None`
    from deep_research.utils.cache import CacheConfig
    
    cache_config = CacheConfig(
        enabled=True,                  # Enable caching
        ttl_seconds=3600,              # Cache entries expire after 1 hour
        db_url="sqlite:///docling_cache.sqlite3"  # SQLite database for cache
    )

    researcher = DeepResearch(
        docling_client=DoclingClient(
            cache_config=cache_config,
            brave_api_key=brave_api_key,  # Optional; falls back to DuckDuckGo when None
        ),
        llm_api_key=openai_api_key,
        research_model="gpt-4o-mini",      # Model used for the main research steps
        reasoning_model="o3-mini",         # Lighter model used for reasoning/planning
        max_depth=3,                       # Maximum research depth
        time_limit_minutes=2,              # Time limit in minutes
    )

    # Perform research
    result = await researcher.research("The impact of quantum computing on cryptography")

    # Process results
    if result.success:
        print("\n==== RESEARCH SUCCESSFUL ====")
        print(f"Found {len(result.data['findings'])} pieces of information")
        print(f"Analysis:\n{result.data['analysis']}")
    else:
        print(f"Research failed: {result.error}")

if __name__ == "__main__":
    asyncio.run(main())

🔍 Quick Demo

Want to see the DeepResearch SDK in action quickly? Follow these steps:

  1. Set up your environment variables:
# Optional: Brave Search API key; if not set, DuckDuckGo is used (no API key required)
export BRAVE_SEARCH_API_KEY="your-brave-api-key"

# Set your LLM API key in your environment
export OPENAI_API_KEY="your-openai-api-key"
  2. Clone the repository:
git clone https://github.com/Rishang/deep-research-sdk.git
cd deep-research-sdk
  3. Run the included demo script:
python example.py

You'll see research progress in real time and, within about a minute, get an AI-generated analysis of the benefits of regular exercise.

🔬 How It Works

DeepResearch implements an iterative research process:

  1. Initialization: Configure models, search providers and parameters
  2. Search: Find relevant sources on the topic from multiple providers
  3. Extraction: Process and extract content from top sources in parallel
  4. Analysis: Analyze findings, identify knowledge gaps and plan next steps
  5. Iteration: Continue research with refined focus based on identified gaps
  6. Synthesis: Generate comprehensive analysis with citations
flowchart LR
    A[Initialization] --> B[Search]
    B --> C[Extraction]
    C --> D[Analysis]
    D --> E{Continue?}
    E -->|Yes| F[Refine Focus]
    F --> B
    E -->|No| G[Final Synthesis]

    style A fill:#4285F4,stroke:#333,stroke-width:2px,color:white
    style B fill:#34A853,stroke:#333,stroke-width:2px,color:white
    style C fill:#FBBC05,stroke:#333,stroke-width:2px,color:white
    style D fill:#EA4335,stroke:#333,stroke-width:2px,color:white
    style E fill:#673AB7,stroke:#333,stroke-width:2px,color:white
    style F fill:#FF9800,stroke:#333,stroke-width:2px,color:white
    style G fill:#2196F3,stroke:#333,stroke-width:2px,color:white
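The six steps above can be sketched as a simplified loop. This is a sketch of the control flow only, under the assumption that the search/extract/analyze/synthesize stages are pluggable functions; the names are illustrative stand-ins, not the SDK's API:

```python
import time

def run_research(topic, search, extract, analyze, synthesize,
                 max_depth=3, time_limit_seconds=120):
    """Simplified sketch of the iterative search -> extract -> analyze loop."""
    findings, focus = [], topic
    deadline = time.monotonic() + time_limit_seconds
    for depth in range(max_depth):
        sources = search(focus)                       # 2. Search
        findings.extend(extract(s) for s in sources)  # 3. Extraction
        gaps = analyze(findings)                      # 4. Analysis: knowledge gaps
        if not gaps or time.monotonic() > deadline:   # 5. Iterate or stop
            break
        focus = gaps[0]                               # refine focus for next pass
    return synthesize(findings)                       # 6. Synthesis

# Toy stand-ins to exercise the loop:
report = run_research(
    "quantum computing",
    search=lambda q: [f"source about {q}"],
    extract=lambda s: f"notes from {s}",
    analyze=lambda f: [] if len(f) >= 2 else ["post-quantum cryptography"],
    synthesize=lambda f: f"{len(f)} findings",
)
```

The loop terminates either when the analysis stage reports no remaining gaps, when the depth limit is reached, or when the time budget runs out, mirroring the `max_depth` and `time_limit_minutes` parameters in the Quick Start.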

📊 Usage Examples

Search Results with Metadata

The DeepResearch SDK returns search results with rich metadata, including:

from deep_research.utils.docling_client import DoclingClient

client = DoclingClient()

# Get search results
search_results = await client.search("artificial intelligence")

# Access metadata in search results
for result in search_results.data:
    print(f"Title: {result.title}")
    print(f"URL: {result.url}")
    print(f"Description: {result.description}")
    print(f"Provider: {result.provider}")  # Which search engine provided this result
    print(f"Date: {result.date}")          # Publication date when available
    print(f"Relevance: {result.relevance}")

Using the Cache System

from deep_research.utils.cache import CacheConfig
from deep_research.utils.docling_client import DoclingClient

# Configure the cache with SQLite (default)
cache_config = CacheConfig(
    enabled=True,                           # Enable caching
    ttl_seconds=3600,                       # Cache for 1 hour
    db_url="sqlite:///docling_cache.db",    # Use SQLite
    create_tables=True                      # Create tables if they don't exist
)

# Initialize client with caching
client = DoclingClient(
    cache_config=cache_config
)

# Search results will be cached and reused for identical queries
search_result = await client.search("quantum computing")
# Second call uses cached data
search_result = await client.search("quantum computing")

# To disable caching completely, either:
# 1. Don't provide a cache_config:
client_no_cache = DoclingClient()  # No caching

# 2. Or pass cache_config=None explicitly:
client_no_cache = DoclingClient(cache_config=None)


# For MySQL/MariaDB backend instead of SQLite
# First install pymysql: pip install pymysql
mysql_config = CacheConfig(
    enabled=True,
    db_url="mysql+pymysql://username:password@localhost/docling_cache"
)

Advanced: Using Structured Models for Cache Keys

from pydantic import BaseModel
from deep_research.utils.cache import cache

# Define which parameters should be used for caching
class SearchParams(BaseModel):
    query: str
    max_results: int = 10

@cache(structure=SearchParams)
async def search_function(self, query: str, max_results: int = 10):
    # Only query and max_results will be used for the cache key
    # Other parameters will be ignored for caching purposes
    return results
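One way such a decorator can derive its cache key is sketched below, with a stdlib dataclass standing in for the Pydantic model and an in-memory dict standing in for the SQL backend. This is an assumption about the mechanism for illustration, not the SDK's actual implementation:

```python
import asyncio
import functools
import hashlib
import json
from dataclasses import dataclass, fields

def simple_cache(structure):
    """Cache async results, keyed only by the fields declared on `structure`."""
    store = {}
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(**kwargs):
            # Keep only the parameters the structure declares; ignore the rest.
            relevant = {f.name: kwargs[f.name] for f in fields(structure) if f.name in kwargs}
            key = hashlib.sha256(json.dumps(relevant, sort_keys=True).encode()).hexdigest()
            if key not in store:
                store[key] = await func(**kwargs)
            return store[key]
        return wrapper
    return decorator

@dataclass
class SearchParams:
    query: str
    max_results: int = 10

call_count = 0

@simple_cache(SearchParams)
async def search(query, max_results=10, trace_id=None):
    global call_count
    call_count += 1
    return f"results for {query}"

# `trace_id` differs between calls, but it is not a SearchParams field,
# so the second call is served from the cache:
first = asyncio.run(search(query="ai", trace_id="a"))
second = asyncio.run(search(query="ai", trace_id="b"))
```

Hashing only the declared fields is what makes incidental parameters (session objects, trace IDs) invisible to the cache key.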

Custom Research Parameters

# Configure research with custom parameters
result = await researcher.research(
    topic="Emerging trends in renewable energy storage",
    max_tokens=3000,           # Control output length
    temperature=0.7            # Add more creativity to analysis
)

Using with Different LLM Providers

DeepResearch uses LiteLLM, which supports multiple LLM providers:

# Use Anthropic Claude models
researcher = DeepResearch(
    # ... other parameters
    research_model="anthropic/claude-3-opus-20240229",
    reasoning_model="anthropic/claude-3-haiku-20240307",
)

Scheduling Research Tasks

# Run multiple research tasks concurrently
async def research_multiple_topics():
    topics = ["AI safety", "Climate adaptation", "Future of work"]
    tasks = [researcher.research(topic, max_depth=2) for topic in topics]
    results = await asyncio.gather(*tasks)

    for topic, result in zip(topics, results):
        print(f"Research on {topic}: {'Success' if result.success else 'Failed'}")
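When fanning out over many topics, you may want to bound how many research calls run at once. The snippet below shows a generic asyncio semaphore pattern, not an SDK feature; `fake_research` is an illustrative stand-in for `researcher.research`:

```python
import asyncio

async def bounded_gather(coroutine_factories, limit=2):
    """Run coroutines concurrently, at most `limit` at a time, preserving order."""
    semaphore = asyncio.Semaphore(limit)

    async def run(factory):
        async with semaphore:   # blocks while `limit` tasks are in flight
            return await factory()

    return await asyncio.gather(*(run(f) for f in coroutine_factories))

# Illustrative stand-in for researcher.research:
async def fake_research(topic):
    await asyncio.sleep(0)
    return f"report on {topic}"

topics = ["AI safety", "Climate adaptation", "Future of work"]
reports = asyncio.run(
    bounded_gather([lambda t=t: fake_research(t) for t in topics], limit=2)
)
```

Passing factories (here, lambdas with a default-argument capture) rather than already-created coroutines ensures each coroutine is created only once the semaphore admits it.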

🔄 Custom Callbacks

Monitor and track research progress by implementing custom callbacks:

from deep_research.core.callbacks import ResearchCallback
from deep_research.models import ActivityItem, SourceItem

class MyCallback(ResearchCallback):
    async def on_activity(self, activity: ActivityItem) -> None:
        # Handle activity updates (search, extract, analyze)
        print(f"Activity: {activity.type} - {activity.message}")

    async def on_source(self, source: SourceItem) -> None:
        # Handle discovered sources
        print(f"Source: {source.title} ({source.url})")

    async def on_depth_change(self, current: int, maximum: int, completed_steps: int, total_steps: int) -> None:
        # Track research depth progress
        progress = int(completed_steps / total_steps * 100) if total_steps > 0 else 0
        print(f"Depth: {current}/{maximum} - Progress: {progress}%")

    async def on_progress_init(self, max_depth: int, total_steps: int) -> None:
        # Handle research initialization
        print(f"Initialized with max depth {max_depth} and {total_steps} steps")

    async def on_finish(self, content: str) -> None:
        # Handle research completion
        print(f"Research complete! Result length: {len(content)} characters")

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


If you find DeepResearch SDK useful, please consider giving it a star on GitHub!

Download files

Download the file for your platform.

Source Distribution

deep_research_sdk-0.1.1.tar.gz (30.7 kB)

Uploaded Source

Built Distribution

deep_research_sdk-0.1.1-py3-none-any.whl (27.8 kB)

Uploaded Python 3

File details

Details for the file deep_research_sdk-0.1.1.tar.gz.

File metadata

  • Download URL: deep_research_sdk-0.1.1.tar.gz
  • Size: 30.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.12

File hashes

Hashes for deep_research_sdk-0.1.1.tar.gz
Algorithm Hash digest
SHA256 407a1c33a56707148590a3bf42d5b5755572ded891444da1d432bd4e36aea1f4
MD5 75ff758f56161cfe55abd7677ac6af31
BLAKE2b-256 3cb128e864f27ae0d2a0ce86ca4edb4b1bf182af0cc23cb4d14491532d548902

File details

Details for the file deep_research_sdk-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for deep_research_sdk-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 16912b12639b008a99f7e24b897a753d53dbfcbdcf6fe49546c96f27d8c50d9d
MD5 704afce86bc70970d8da828f33eb8a40
BLAKE2b-256 f5fa8a04a41c5cb0560c13c91acbac7f4785fd2b5e411e899b241ec2e741eadb
