Python SDK for Open Deep Research

🧠 Deep Research

An AI-powered research assistant that performs comprehensive, autonomous research on any topic

🌟 Overview

DeepResearch SDK empowers developers with AI-driven research capabilities, enabling applications to conduct deep, iterative research autonomously. Inspired by products like Perplexity AI and Claude's web browsing, DeepResearch combines web search, content extraction, and AI analysis into a unified, easy-to-use API.

Originally a TypeScript implementation, this Python SDK adds parallel web scraping, multiple search providers, and a more developer-friendly interface.

graph TD
    subgraph "DeepResearch SDK"
        A[Deep Research] --> B[DoclingClient]
        A --> C[LLM Integration]
        A --> D[Research Callbacks]

        B --> E[Brave Search]
        B --> F[DuckDuckGo Search]
        B --> G[Document Extraction]

        C --> H[GPT Models]
        C --> I[Claude Models]
        C --> J[Other LLMs]

        D --> K[Progress Tracking]
        D --> L[Source Monitoring]
        D --> M[Activity Logging]
    end

    classDef primary fill:#4285F4,stroke:#333,stroke-width:2px,color:white
    classDef secondary fill:#34A853,stroke:#333,stroke-width:2px,color:white
    classDef tertiary fill:#FBBC05,stroke:#333,stroke-width:2px,color:white
    classDef quaternary fill:#EA4335,stroke:#333,stroke-width:2px,color:white

    class A primary
    class B,C,D secondary
    class E,F,G,H,I,J tertiary
    class K,L,M quaternary

🚀 Key Features

  • 📊 Multiple Search Providers

    • Brave Search API for high-quality results
    • DuckDuckGo Search as an automatic, API-key-free fallback
    • Fault-tolerant fallback so searches still return results when a provider fails
  • ⚡ Parallel Processing

    • Extract content from multiple sources simultaneously
    • Control concurrency to balance speed and resource usage
  • 🔄 Adaptive Research

    • Automatic gap identification in research
    • Self-guided exploration of topics
    • Depth-first approach with backtracking
  • 🧩 Modular Architecture

    • Easily extensible for custom search providers
    • Plug in different LLM backends through LiteLLM
    • Event-driven callback system for monitoring progress
  • 🛠️ Developer-Friendly

    • Async/await interface for integration with modern Python applications
    • Type hints throughout for IDE autocompletion
    • Comprehensive Pydantic models for structured data
    • Rich metadata for search results (provider, publication date, relevance)
    • Optional caching system for search and extraction results
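The provider fallback described above can be sketched as a generic pattern. This is a minimal illustration of the idea, assuming nothing about the SDK's internals; the provider functions here are illustrative stand-ins:

```python
import asyncio

async def search_with_fallback(query, providers):
    """Try each (name, provider) pair in order; skip failures and empty results."""
    for name, provider in providers:
        try:
            results = await provider(query)
        except Exception:
            continue  # provider errored; fall back to the next one
        if results:   # first non-empty result set wins
            return name, results
    return None, []

# Illustrative providers: the first always fails, the second succeeds.
async def flaky_brave(query):
    raise RuntimeError("rate limited")

async def duckduckgo(query):
    return [{"title": f"Results for {query}", "url": "https://example.com"}]

provider, results = asyncio.run(
    search_with_fallback(
        "quantum computing",
        [("brave", flaky_brave), ("duckduckgo", duckduckgo)],
    )
)
```

Because each provider is tried inside its own try/except, a rate limit or outage in one backend degrades gracefully instead of failing the whole search.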

📦 Installation

Using pip

# Install from PyPI
pip install deep-research-sdk

Using Poetry

# Add to your Poetry project
poetry add deep-research-sdk

From Source

# Clone the repository
git clone https://github.com/Rishang/deep-research-sdk.git
cd deep-research-sdk

poetry install

🏁 Quick Start

import asyncio
import os
import logging
from deep_research import DeepResearch
from deep_research.utils.docling_client import DoclingClient

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

async def main():
    # Get API keys from environment variables
    openai_api_key = os.environ.get("OPENAI_API_KEY")
    brave_api_key = os.environ.get("BRAVE_SEARCH_API_KEY")  # Optional

    if not openai_api_key:
        print("Error: OPENAI_API_KEY environment variable not set")
        return

    # Initialize the DeepResearch instance
    # Optional: Configure caching for search and extraction results
    # If you don't want to use caching, you can simply set `cache_config=None`
    from deep_research.utils.cache import CacheConfig
    
    cache_config = CacheConfig(
        enabled=True,                  # Enable caching
        ttl_seconds=3600,              # Cache entries expire after 1 hour
        db_url="sqlite:///docling_cache.sqlite3"  # SQLite database for cache
    )

    researcher = DeepResearch(
        docling_client=DoclingClient(
            cache_config=cache_config,
            brave_api_key=brave_api_key,  # Optional; falls back to DuckDuckGo when None
        ),
        llm_api_key=openai_api_key,
        research_model="gpt-4o-mini",      # Model used for the main research steps
        reasoning_model="o3-mini",         # Lighter model used for reasoning/planning
        max_depth=3,                       # Maximum research depth
        time_limit_minutes=2,              # Time limit in minutes
    )

    # Perform research
    result = await researcher.research("The impact of quantum computing on cryptography")

    # Process results
    if result.success:
        print("\n==== RESEARCH SUCCESSFUL ====")
        print(f"Found {len(result.data['findings'])} pieces of information")
        print(f"Analysis:\n{result.data['analysis']}")
    else:
        print(f"Research failed: {result.error}")

if __name__ == "__main__":
    asyncio.run(main())

🔍 Quick Demo

Want to see the DeepResearch SDK in action quickly? Follow these steps:

  1. Set up your environment variables:
# Optional: Brave Search API key; if not set, DuckDuckGo is used (no API key required)
export BRAVE_SEARCH_API_KEY="your-brave-api-key"

# Set your LLM API key in your environment
export OPENAI_API_KEY="your-openai-api-key"
  2. Clone the repository:
git clone https://github.com/Rishang/deep-research-sdk.git
cd deep-research-sdk
  3. Run the included demo script:
python example.py

You'll see research progress in real time and, within about a minute, get an AI-generated analysis of the benefits of regular exercise.

🔬 How It Works

DeepResearch implements an iterative research process:

  1. Initialization: Configure models, search providers and parameters
  2. Search: Find relevant sources on the topic from multiple providers
  3. Extraction: Process and extract content from top sources in parallel
  4. Analysis: Analyze findings, identify knowledge gaps and plan next steps
  5. Iteration: Continue research with refined focus based on identified gaps
  6. Synthesis: Generate comprehensive analysis with citations
flowchart LR
    A[Initialization] --> B[Search]
    B --> C[Extraction]
    C --> D[Analysis]
    D --> E{Continue?}
    E -->|Yes| F[Refine Focus]
    F --> B
    E -->|No| G[Final Synthesis]

    style A fill:#4285F4,stroke:#333,stroke-width:2px,color:white
    style B fill:#34A853,stroke:#333,stroke-width:2px,color:white
    style C fill:#FBBC05,stroke:#333,stroke-width:2px,color:white
    style D fill:#EA4335,stroke:#333,stroke-width:2px,color:white
    style E fill:#673AB7,stroke:#333,stroke-width:2px,color:white
    style F fill:#FF9800,stroke:#333,stroke-width:2px,color:white
    style G fill:#2196F3,stroke:#333,stroke-width:2px,color:white
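The six steps above can be sketched as a simplified loop. This is a sketch of the control flow only, under the assumption that the search/extract/analyze/synthesize stages are pluggable functions; the names are illustrative stand-ins, not the SDK's API:

```python
import time

def run_research(topic, search, extract, analyze, synthesize,
                 max_depth=3, time_limit_seconds=120):
    """Simplified sketch of the iterative search -> extract -> analyze loop."""
    findings, focus = [], topic
    deadline = time.monotonic() + time_limit_seconds
    for depth in range(max_depth):
        sources = search(focus)                       # 2. Search
        findings.extend(extract(s) for s in sources)  # 3. Extraction
        gaps = analyze(findings)                      # 4. Analysis: knowledge gaps
        if not gaps or time.monotonic() > deadline:   # 5. Iterate or stop
            break
        focus = gaps[0]                               # refine focus for next pass
    return synthesize(findings)                       # 6. Synthesis

# Toy stand-ins to exercise the loop:
report = run_research(
    "quantum computing",
    search=lambda q: [f"source about {q}"],
    extract=lambda s: f"notes from {s}",
    analyze=lambda f: [] if len(f) >= 2 else ["post-quantum cryptography"],
    synthesize=lambda f: f"{len(f)} findings",
)
```

The loop terminates either when the analysis stage reports no remaining gaps, when the depth limit is reached, or when the time budget runs out, mirroring the `max_depth` and `time_limit_minutes` parameters in the Quick Start.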

📊 Usage Examples

Search Results with Metadata

The DeepResearch SDK returns search results with rich metadata, including:

from deep_research.utils.docling_client import DoclingClient

client = DoclingClient()

# Get search results
search_results = await client.search("artificial intelligence")

# Access metadata in search results
for result in search_results.data:
    print(f"Title: {result.title}")
    print(f"URL: {result.url}")
    print(f"Description: {result.description}")
    print(f"Provider: {result.provider}")  # Which search engine provided this result
    print(f"Date: {result.date}")          # Publication date when available
    print(f"Relevance: {result.relevance}")

Using the Cache System

from deep_research.utils.cache import CacheConfig
from deep_research.utils.docling_client import DoclingClient

# Configure the cache with SQLite (default)
cache_config = CacheConfig(
    enabled=True,                           # Enable caching
    ttl_seconds=3600,                       # Cache for 1 hour
    db_url="sqlite:///docling_cache.db",    # Use SQLite
    create_tables=True                      # Create tables if they don't exist
)

# Initialize client with caching
client = DoclingClient(
    cache_config=cache_config
)

# Search results will be cached and reused for identical queries
search_result = await client.search("quantum computing")
# Second call uses cached data
search_result = await client.search("quantum computing")

# To disable caching completely, either:
# 1. Don't provide a cache_config:
client_no_cache = DoclingClient()  # No caching

# 2. Or pass cache_config=None explicitly:
client_no_cache = DoclingClient(cache_config=None)


# For MySQL/MariaDB backend instead of SQLite
# First install pymysql: pip install pymysql
mysql_config = CacheConfig(
    enabled=True,
    db_url="mysql+pymysql://username:password@localhost/docling_cache"
)

Advanced: Using Structured Models for Cache Keys

from pydantic import BaseModel
from deep_research.utils.cache import cache

# Define which parameters should be used for caching
class SearchParams(BaseModel):
    query: str
    max_results: int = 10

@cache(structure=SearchParams)
async def search_function(self, query: str, max_results: int = 10):
    # Only query and max_results will be used for the cache key
    # Other parameters will be ignored for caching purposes
    return results
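One way such a decorator can derive its cache key is sketched below, with a stdlib dataclass standing in for the Pydantic model and an in-memory dict standing in for the SQL backend. This is an assumption about the mechanism for illustration, not the SDK's actual implementation:

```python
import asyncio
import functools
import hashlib
import json
from dataclasses import dataclass, fields

def simple_cache(structure):
    """Cache async results, keyed only by the fields declared on `structure`."""
    store = {}
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(**kwargs):
            # Keep only the parameters the structure declares; ignore the rest.
            relevant = {f.name: kwargs[f.name] for f in fields(structure) if f.name in kwargs}
            key = hashlib.sha256(json.dumps(relevant, sort_keys=True).encode()).hexdigest()
            if key not in store:
                store[key] = await func(**kwargs)
            return store[key]
        return wrapper
    return decorator

@dataclass
class SearchParams:
    query: str
    max_results: int = 10

call_count = 0

@simple_cache(SearchParams)
async def search(query, max_results=10, trace_id=None):
    global call_count
    call_count += 1
    return f"results for {query}"

# `trace_id` differs between calls, but it is not a SearchParams field,
# so the second call is served from the cache:
first = asyncio.run(search(query="ai", trace_id="a"))
second = asyncio.run(search(query="ai", trace_id="b"))
```

Hashing only the declared fields is what makes incidental parameters (session objects, trace IDs) invisible to the cache key.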

Custom Research Parameters

# Configure research with custom parameters
result = await researcher.research(
    topic="Emerging trends in renewable energy storage",
    max_tokens=3000,           # Control output length
    temperature=0.7            # Add more creativity to analysis
)

Using with Different LLM Providers

DeepResearch uses LiteLLM, which supports multiple LLM providers:

# Use Anthropic Claude models
researcher = DeepResearch(
    # ... other parameters
    research_model="anthropic/claude-3-opus-20240229",
    reasoning_model="anthropic/claude-3-haiku-20240307",
)

Scheduling Research Tasks

# Run multiple research tasks concurrently
async def research_multiple_topics():
    topics = ["AI safety", "Climate adaptation", "Future of work"]
    tasks = [researcher.research(topic, max_depth=2) for topic in topics]
    results = await asyncio.gather(*tasks)

    for topic, result in zip(topics, results):
        print(f"Research on {topic}: {'Success' if result.success else 'Failed'}")
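When fanning out over many topics, you may want to bound how many research calls run at once. The snippet below shows a generic asyncio semaphore pattern, not an SDK feature; `fake_research` is an illustrative stand-in for `researcher.research`:

```python
import asyncio

async def bounded_gather(coroutine_factories, limit=2):
    """Run coroutines concurrently, at most `limit` at a time, preserving order."""
    semaphore = asyncio.Semaphore(limit)

    async def run(factory):
        async with semaphore:   # blocks while `limit` tasks are in flight
            return await factory()

    return await asyncio.gather(*(run(f) for f in coroutine_factories))

# Illustrative stand-in for researcher.research:
async def fake_research(topic):
    await asyncio.sleep(0)
    return f"report on {topic}"

topics = ["AI safety", "Climate adaptation", "Future of work"]
reports = asyncio.run(
    bounded_gather([lambda t=t: fake_research(t) for t in topics], limit=2)
)
```

Passing factories (here, lambdas with a default-argument capture) rather than already-created coroutines ensures each coroutine is created only once the semaphore admits it.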

🔄 Custom Callbacks

Monitor and track research progress by implementing custom callbacks:

from deep_research.core.callbacks import ResearchCallback
from deep_research.models import ActivityItem, SourceItem

class MyCallback(ResearchCallback):
    async def on_activity(self, activity: ActivityItem) -> None:
        # Handle activity updates (search, extract, analyze)
        print(f"Activity: {activity.type} - {activity.message}")

    async def on_source(self, source: SourceItem) -> None:
        # Handle discovered sources
        print(f"Source: {source.title} ({source.url})")

    async def on_depth_change(self, current: int, maximum: int, completed_steps: int, total_steps: int) -> None:
        # Track research depth progress
        progress = int(completed_steps / total_steps * 100) if total_steps > 0 else 0
        print(f"Depth: {current}/{maximum} - Progress: {progress}%")

    async def on_progress_init(self, max_depth: int, total_steps: int) -> None:
        # Handle research initialization
        print(f"Initialized with max depth {max_depth} and {total_steps} steps")

    async def on_finish(self, content: str) -> None:
        # Handle research completion
        print(f"Research complete! Result length: {len(content)} characters")

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


If you find DeepResearch SDK useful, please consider giving it a star on GitHub!

Download files

Download the file for your platform.

Source Distribution

deep_research_sdk-0.1.1.tar.gz (30.7 kB)

Uploaded Source

Built Distribution

deep_research_sdk-0.1.1-py3-none-any.whl (27.8 kB)

Uploaded Python 3

File details

Details for the file deep_research_sdk-0.1.1.tar.gz.

File metadata

  • Download URL: deep_research_sdk-0.1.1.tar.gz
  • Size: 30.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.12

File hashes

Hashes for deep_research_sdk-0.1.1.tar.gz
Algorithm Hash digest
SHA256 407a1c33a56707148590a3bf42d5b5755572ded891444da1d432bd4e36aea1f4
MD5 75ff758f56161cfe55abd7677ac6af31
BLAKE2b-256 3cb128e864f27ae0d2a0ce86ca4edb4b1bf182af0cc23cb4d14491532d548902

File details

Details for the file deep_research_sdk-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for deep_research_sdk-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 16912b12639b008a99f7e24b897a753d53dbfcbdcf6fe49546c96f27d8c50d9d
MD5 704afce86bc70970d8da828f33eb8a40
BLAKE2b-256 f5fa8a04a41c5cb0560c13c91acbac7f4785fd2b5e411e899b241ec2e741eadb
