
Autonomous research capabilities for LangChain models


LangChain DeepResearch


A LangChain integration that provides autonomous, recursive research capabilities powered by any LangChain-compatible LLM.

🔍 Features

  • Model Agnostic: Works with any LangChain-compatible LLM (OpenAI, Anthropic, Llama, etc.)
  • Time-Aware Research: Keeps each run within configurable minimum and maximum time limits (3 and 40 minutes by default)
  • Recursive Exploration: Follows leads discovered during research to explore topics in depth
  • Adaptive Strategies: Adjusts research breadth and depth based on time and findings
  • Comprehensive Reporting: Generates structured research reports with citations
  • Customizable System Prompts: Control the personality, focus, and output style at each stage of research

📦 Installation

pip install langchain-deepresearch

Or install from source:

git clone https://github.com/doganarif/langchain-deepresearch.git
cd langchain-deepresearch
pip install -e .

🚀 Quick Start

import asyncio
from langchain_openai import ChatOpenAI
from langchain_deepresearch import DeepResearcher

async def main():
    # Initialize any LangChain model
    llm = ChatOpenAI(model="gpt-3.5-turbo")
    
    # Create the researcher with your LangChain model
    researcher = DeepResearcher(
        llm=llm,
        google_api_key="your-google-api-key",  # Or use environment variable
        google_cx="your-google-cx-id"          # Or use environment variable
    )
    
    # Run the research
    result = await researcher.research(
        query="Latest advancements in fusion energy", 
        breadth=3,    # Number of parallel searches
        depth=2       # Depth of recursive exploration
    )
    
    # Access results
    if result["success"]:
        print(result["report"])  # Markdown report
        print(f"Sources consulted: {len(result['visited_urls'])}")
        print(f"Insights gathered: {len(result['learnings'])}")

if __name__ == "__main__":
    asyncio.run(main())
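
The Quick Start reads four keys from the result dictionary: `success`, `report`, `visited_urls`, and `learnings`. A minimal sketch of persisting such a result, assuming only those keys; the `save_report` helper and the appended source list are illustrative and not part of the package:

```python
from pathlib import Path


def save_report(result: dict, path: str) -> bool:
    """Write the Markdown report followed by the visited URLs to `path`.

    Hypothetical helper: relies only on the keys shown in the Quick Start.
    Returns False without writing anything if the run did not succeed.
    """
    if not result.get("success"):
        return False
    lines = [result["report"].rstrip(), "", "## Sources consulted", ""]
    lines += [f"- {url}" for url in result.get("visited_urls", [])]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return True
```

Inside `main()`, this could be called as `save_report(result, "fusion.md")` right after the research call returns.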

🔄 Works with Any LangChain LLM

Use it with any LangChain-compatible model:

# With OpenAI
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4")
researcher = DeepResearcher(llm=llm)

# With Anthropic
from langchain_anthropic import ChatAnthropic
llm = ChatAnthropic(model="claude-3-opus-20240229")
researcher = DeepResearcher(llm=llm)

# With Hugging Face models
from langchain_huggingface import HuggingFaceEndpoint
llm = HuggingFaceEndpoint(repo_id="mistralai/Mistral-7B-Instruct-v0.2")
researcher = DeepResearcher(llm=llm)

⚙️ Configuration

Web searches go through Google Programmable Search Engine, so you'll need a Google API key and a Custom Search engine ID (CX):

# Configure with explicit parameters
researcher = DeepResearcher(
    llm=llm,
    google_api_key="your-google-api-key",
    google_cx="your-google-cx-id",
    max_time_seconds=2400,  # 40 minutes (default)
    min_learnings_required=8  # Minimum insights before completion
)

# Or use environment variables
# GOOGLE_API_KEY
# GOOGLE_CX
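
How the constructor resolves these two environment variables is not shown here. A minimal sketch of the usual pattern, assuming explicit arguments take precedence over `GOOGLE_API_KEY` and `GOOGLE_CX`; the `resolve_google_credentials` helper is hypothetical and simply fails early when a value is missing:

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_google_credentials(
    api_key: Optional[str] = None,
    cx: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, str]:
    """Return (api_key, cx), preferring explicit arguments over the environment."""
    api_key = api_key or env.get("GOOGLE_API_KEY")
    cx = cx or env.get("GOOGLE_CX")
    # Collect the names of any values that are still unset.
    missing = [
        name
        for name, value in (("GOOGLE_API_KEY", api_key), ("GOOGLE_CX", cx))
        if not value
    ]
    if missing:
        raise ValueError(f"Missing search credentials: {', '.join(missing)}")
    return api_key, cx
```

The returned pair can then be passed as `google_api_key` and `google_cx` when constructing `DeepResearcher`.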

🎭 Customizing System Prompts

You can customize the system prompts used at different stages of the research process:

Global Customization (applied to all research)

# Define custom system prompts
custom_prompts = {
    # Prompt for generating search queries
    "query_generation": """You are a venture capital analyst researching a market opportunity.
    Create specific search queries to gather competitive intelligence, market size data, 
    growth trends, and regulatory concerns...""",
    
    # Prompt for analyzing search results
    "result_analysis": """You are a venture capital analyst evaluating a potential investment.
    Extract key financial data, competitive advantages, and market positioning...""",
    
    # Prompt for generating the final report
    "report_generation": """You are a senior investment analyst at a top venture capital firm.
    Create a comprehensive investment analysis report with clear investment recommendation..."""
}

# Initialize with custom prompts
researcher = DeepResearcher(
    llm=llm,
    google_api_key="your-key",
    google_cx="your-cx-id",
    system_prompts=custom_prompts  # Apply to all research
)

Per-Query Customization

# These prompts apply only to this specific research query
academic_prompts = {
    "query_generation": """You are a scientific researcher preparing a literature review...""",
    "report_generation": """You are writing a scientific literature review for an academic journal..."""
}

result = await researcher.research(
    query="Recent advances in quantum computing",
    system_prompts=academic_prompts  # Override just for this query
)
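
Whether per-query prompts are merged with the global ones stage by stage, or replace them wholesale, is not stated above. A sketch of the stage-by-stage reading, assuming any stage missing from the per-query dictionary falls back to the global prompt; `merge_prompts` is a hypothetical helper, not the package's API:

```python
from typing import Dict, Optional


def merge_prompts(
    global_prompts: Optional[Dict[str, str]],
    query_prompts: Optional[Dict[str, str]],
) -> Dict[str, str]:
    """Combine prompt dictionaries; per-query entries win on key collisions."""
    merged = dict(global_prompts or {})   # copy so the global dict is never mutated
    merged.update(query_prompts or {})    # per-query stages override global ones
    return merged
```

Under this assumption, a per-query dictionary that only sets `query_generation` and `report_generation` would leave the global `result_analysis` prompt in effect.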

🔍 How It Works

  1. Query Analysis: The agent breaks down the research topic into specific search queries
  2. Multiple Search Paths: Executes multiple parallel search paths to gather diverse information
  3. Content Extraction: Analyzes search results to extract key learnings and insights
  4. Recursive Exploration: Uses discovered information to generate follow-up searches for deeper exploration
  5. Report Generation: Synthesizes all findings into a cohesive research report
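
The interaction between `breadth` and `depth` in steps 2 to 4 can be sketched as a bounded recursion. This is an illustration under stated assumptions, not the package's implementation: the `search` callable is a hypothetical stand-in for "run one search and analyse the results", and the sketch runs sequentially whereas the steps above describe parallel search paths.

```python
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical signature: one search step maps a query to
# (learnings extracted, follow-up queries suggested).
SearchFn = Callable[[str], Tuple[List[str], List[str]]]


def explore(
    query: str,
    search: SearchFn,
    breadth: int,
    depth: int,
    seen: Optional[Set[str]] = None,
) -> List[str]:
    """Collect learnings, following up to `breadth` leads per level for `depth` levels."""
    seen = set() if seen is None else seen
    if depth <= 0 or query in seen:
        return []                      # depth budget exhausted, or query already explored
    seen.add(query)
    learnings, follow_ups = search(query)
    collected = list(learnings)
    for lead in follow_ups[:breadth]:  # only the first `breadth` leads are followed
        collected.extend(explore(lead, search, breadth, depth - 1, seen))
    return collected
```

In this sketch the number of searches grows roughly as `breadth ** depth`, which is one reason a time limit and a search cap are useful alongside the two parameters.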

📋 Advanced Usage

Quick Research

For faster, less in-depth research:

result = await researcher.quick_research(
    query="Carbon capture technologies",  
    time_limit=300  # 5 minutes (default)
)

Advanced Research Parameters

result = await researcher.research(
    query="Advances in protein folding algorithms",
    breadth=5,                    # More parallel searches
    depth=3,                      # Deeper exploration
    time_limit=3600,              # Longer time limit (1 hour)
    min_learnings_required=12,    # Require more learnings
    max_searches=250,             # Allow more searches
    system_prompts=custom_prompts # Custom system prompts
)
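
Because `research()` is a coroutine, several queries can in principle be run concurrently with standard `asyncio` tools. A sketch, assuming a single researcher instance can safely be shared between concurrent calls (not confirmed by this document); `research_many` and `max_concurrent` are hypothetical names:

```python
import asyncio
from typing import Any, Dict, List


async def research_many(
    researcher: Any,
    queries: List[str],
    max_concurrent: int = 2,
    **research_kwargs: Any,
) -> Dict[str, Any]:
    """Run `researcher.research` for each query, at most `max_concurrent` at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(query: str) -> Any:
        async with semaphore:  # caps the number of in-flight research calls
            return await researcher.research(query=query, **research_kwargs)

    # gather preserves input order, so results line up with `queries`.
    results = await asyncio.gather(*(run_one(q) for q in queries))
    return dict(zip(queries, results))
```

Keeping `max_concurrent` small is a cautious default, since each research call can issue many searches against the same API quota.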

Using with LangChain Chains and Agents

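import asyncio
from langchain.agents import initialize_agent, Tool
from langchain_deepresearch import DeepResearcher

# Initialize a DeepResearcher instance
researcher = DeepResearcher(llm=llm)

# research() is a coroutine, so it must be awaited; return the report text to the agent
async def run_research(query: str) -> str:
    result = await researcher.research(query, breadth=3, depth=2)
    return result["report"] if result["success"] else "Research failed."

# Create a Tool for agents
research_tool = Tool(
    name="DeepResearch",
    description="Thoroughly researches a topic and generates a comprehensive report",
    # Sync entry point; asyncio.run() cannot be used inside an already running event loop
    func=lambda query: asyncio.run(run_research(query)),
    # Async entry point, used when the agent itself is run asynchronously
    coroutine=run_research
)

# Use it in an agent
agent = initialize_agent([research_tool, ...], llm, agent="zero-shot-react-description")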

📚 Example Use Cases

Venture Capital Analysis

vc_prompts = {
    "report_generation": """You are a senior investment analyst at a top venture capital firm.
    Create a comprehensive investment analysis report with clear investment recommendation..."""
}

result = await researcher.research(
    query="Market opportunity for carbon capture startups",
    system_prompts=vc_prompts
)

Academic Literature Review

academic_prompts = {
    "query_generation": """You are a scientific researcher preparing a literature review.
    Generate precise academic search queries that will find peer-reviewed papers...""",
    "report_generation": """You are writing a scientific literature review for an academic journal..."""
}

result = await researcher.research(
    query="Recent advances in quantum error correction",
    system_prompts=academic_prompts
)

Legal Research

legal_prompts = {
    "query_generation": """You are a legal researcher at a top law firm.
    Generate precise legal search queries designed to find relevant case law and statutes...""",
    "report_generation": """You are a senior legal associate preparing a legal memorandum..."""
}

result = await researcher.research(
    query="Legal implications of AI-generated content",
    system_prompts=legal_prompts
)

Technical Documentation

technical_prompts = {
    "query_generation": """You are a senior software engineer researching a technical topic...""",
    "report_generation": """You are writing technical documentation for a development team..."""
}

result = await researcher.research(
    query="Implementing distributed tracing in microservices",
    system_prompts=technical_prompts
)

📝 API Reference

DeepResearcher Class

DeepResearcher(
    llm,                           # LangChain model (required)
    google_api_key=None,           # Google Search API key
    google_cx=None,                # Google Custom Search CX ID
    firecrawl_api_key=None,        # Optional Firecrawl API key
    firecrawl_url=None,            # Optional Firecrawl URL
    max_time_seconds=2400,         # Max research time (40 min)
    min_research_time_seconds=180, # Min research time (3 min)
    min_learnings_required=8,      # Min insights needed
    max_searches=200,              # Max searches to perform
    verbose=False,                 # Verbose logging
    system_prompts=None            # Custom system prompts
)
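
How the four limits in the constructor interact is not spelled out here. One plausible reading, offered only as an assumption: the maximum time and maximum search count are hard caps, while the minimum time and minimum learnings must both be satisfied before a run may finish early. The `should_stop` function below is a hypothetical sketch of that reading, with defaults copied from the signature above:

```python
def should_stop(
    elapsed_seconds: float,
    learnings_count: int,
    searches_done: int,
    max_time_seconds: float = 2400,
    min_research_time_seconds: float = 180,
    min_learnings_required: int = 8,
    max_searches: int = 200,
) -> bool:
    """Decide whether a research run should end (assumed policy, not the library's)."""
    # Hard caps: always stop once the time or search budget is used up.
    if elapsed_seconds >= max_time_seconds or searches_done >= max_searches:
        return True
    # Early finish: only after the minimum time AND the minimum learnings are reached.
    return (
        elapsed_seconds >= min_research_time_seconds
        and learnings_count >= min_learnings_required
    )
```

Under this reading, a run that gathers 20 learnings in the first minute would still continue until the 3-minute minimum has passed.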

Research Method

await researcher.research(
    query,                         # Research query (required)
    breadth=3,                     # Parallel searches
    depth=2,                       # Recursive depth
    time_limit=None,               # Time limit for this call, in seconds
    report_model=None,             # Alternative model for report
    min_learnings_required=None,   # Override min learnings
    max_searches=None,             # Override max searches
    system_prompts=None            # Query-specific prompts
)

Quick Research Method

await researcher.quick_research(
    query,                         # Research query (required)
    time_limit=300,                # 5 minutes (default)
    system_prompts=None            # Custom system prompts
)

🙋 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • This project uses LangChain for LLM integration
  • Powered by Google Programmable Search Engine for web searches
