Autonomous research capabilities for LangChain models
LangChain DeepResearch
A LangChain integration that provides autonomous, recursive research capabilities powered by any LangChain-compatible LLM.
🔍 Features
- Model Agnostic: Works with any LangChain-compatible LLM (OpenAI, Anthropic, Llama, etc.)
- Time-Aware Research: Automatically manages research time constraints
- Recursive Exploration: Follows leads discovered during research to explore topics in depth
- Adaptive Strategies: Adjusts research breadth and depth based on time and findings
- Comprehensive Reporting: Generates structured research reports with citations
- Customizable System Prompts: Control the personality, focus, and output style at each stage of research
📦 Installation
pip install langchain-deepresearch
Or install from source:
git clone https://github.com/doganarif/langchain-deepresearch.git
cd langchain-deepresearch
pip install -e .
🚀 Quick Start
import asyncio

from langchain_openai import ChatOpenAI
from langchain_deepresearch import DeepResearcher


async def main():
    # Initialize any LangChain model
    llm = ChatOpenAI(model="gpt-3.5-turbo")

    # Create the researcher with your LangChain model
    researcher = DeepResearcher(
        llm=llm,
        google_api_key="your-google-api-key",  # Or use environment variable
        google_cx="your-google-cx-id"  # Or use environment variable
    )

    # Run the research
    result = await researcher.research(
        query="Latest advancements in fusion energy",
        breadth=3,  # Number of parallel searches
        depth=2  # Depth of recursive exploration
    )

    # Access results
    if result["success"]:
        print(result["report"])  # Markdown report
        print(f"Sources consulted: {len(result['visited_urls'])}")
        print(f"Insights gathered: {len(result['learnings'])}")


if __name__ == "__main__":
    asyncio.run(main())
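The Quick Start reads four keys from the returned dictionary: `success`, `report`, `visited_urls`, and `learnings`. As a minimal sketch of what one might do with that dictionary afterwards, the snippet below writes the Markdown report to disk and builds a one-line summary. `save_report` and `sample_result` are hypothetical names introduced for illustration, and the assumption that `visited_urls` and `learnings` are lists of strings is not confirmed by the package documentation.

```python
import tempfile
from pathlib import Path


def save_report(result: dict, path: Path) -> str:
    """Write the Markdown report to disk and return a one-line summary.

    Hypothetical helper: only the keys `success`, `report`, `visited_urls`
    and `learnings` are taken from the Quick Start example.
    """
    if not result.get("success"):
        return "Research did not succeed; nothing was written."
    path.write_text(result["report"], encoding="utf-8")
    return (
        f"Saved {path.name}: {len(result['visited_urls'])} sources, "
        f"{len(result['learnings'])} insights"
    )


# Stand-in for the dictionary returned by `researcher.research(...)`.
sample_result = {
    "success": True,
    "report": "# Fusion energy\n\nExample report body.",
    "visited_urls": ["https://example.com/a", "https://example.com/b"],
    "learnings": ["insight one", "insight two", "insight three"],
}

with tempfile.TemporaryDirectory() as tmp:
    report_path = Path(tmp) / "report.md"
    summary = save_report(sample_result, report_path)
    saved_text = report_path.read_text(encoding="utf-8")
    print(summary)
```

In real use, `sample_result` would be replaced by the value returned from `await researcher.research(...)`.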
🔄 Works with Any LangChain LLM
Use it with any LangChain-compatible model:
# With OpenAI
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4")
researcher = DeepResearcher(llm=llm)
# With Anthropic
from langchain_anthropic import ChatAnthropic
llm = ChatAnthropic(model="claude-3-opus-20240229")
researcher = DeepResearcher(llm=llm)
# With Hugging Face models
from langchain_huggingface import HuggingFaceEndpoint
llm = HuggingFaceEndpoint(repo_id="mistralai/Mistral-7B-Instruct-v0.2")
researcher = DeepResearcher(llm=llm)
⚙️ Configuration
Web searches run through Google Programmable Search Engine, so you need a Google API key and a Custom Search engine ID (CX):
# Configure with explicit parameters
researcher = DeepResearcher(
    llm=llm,
    google_api_key="your-google-api-key",
    google_cx="your-google-cx-id",
    max_time_seconds=2400,  # 40 minutes (default)
    min_learnings_required=8  # Minimum insights before completion
)
# Or use environment variables
# GOOGLE_API_KEY
# GOOGLE_CX
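The two configuration routes above (explicit parameters and the `GOOGLE_API_KEY` / `GOOGLE_CX` environment variables) can be combined in your own code before constructing the researcher. The sketch below shows one plausible way to do that. `resolve_google_credentials` is a hypothetical helper, and the rule that explicit arguments win over the environment is an assumption, not documented behavior of the package.

```python
import os
from typing import Optional, Tuple


def resolve_google_credentials(
    google_api_key: Optional[str] = None,
    google_cx: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (api_key, cx), preferring explicit arguments over the environment.

    Hypothetical helper; the precedence order is an assumption.
    """
    api_key = google_api_key or os.environ.get("GOOGLE_API_KEY")
    cx = google_cx or os.environ.get("GOOGLE_CX")
    missing = [
        name
        for name, value in (("GOOGLE_API_KEY", api_key), ("GOOGLE_CX", cx))
        if not value
    ]
    if missing:
        # Fail early with a clear message instead of at the first search call.
        raise ValueError(f"Missing search configuration: {', '.join(missing)}")
    return api_key, cx


# Placeholder values for demonstration only.
os.environ["GOOGLE_API_KEY"] = "env-key"
os.environ["GOOGLE_CX"] = "env-cx"

print(resolve_google_credentials())
print(resolve_google_credentials(google_cx="explicit-cx"))
```

The returned pair can then be passed as `google_api_key` and `google_cx` when creating `DeepResearcher`.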
🎭 Customizing System Prompts
You can customize the system prompts used at different stages of the research process:
Global Customization (applied to all research)
# Define custom system prompts
custom_prompts = {
    # Prompt for generating search queries
    "query_generation": """You are a venture capital analyst researching a market opportunity.
Create specific search queries to gather competitive intelligence, market size data,
growth trends, and regulatory concerns...""",
    # Prompt for analyzing search results
    "result_analysis": """You are a venture capital analyst evaluating a potential investment.
Extract key financial data, competitive advantages, and market positioning...""",
    # Prompt for generating the final report
    "report_generation": """You are a senior investment analyst at a top venture capital firm.
Create a comprehensive investment analysis report with clear investment recommendation..."""
}
# Initialize with custom prompts
researcher = DeepResearcher(
    llm=llm,
    google_api_key="your-key",
    google_cx="your-cx-id",
    system_prompts=custom_prompts  # Apply to all research
)
Per-Query Customization
# These prompts apply only to this specific research query
academic_prompts = {
    "query_generation": """You are a scientific researcher preparing a literature review...""",
    "report_generation": """You are writing a scientific literature review for an academic journal..."""
}
result = await researcher.research(
    query="Recent advances in quantum computing",
    system_prompts=academic_prompts  # Override just for this query
)
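The per-query example overrides only two of the three stages, which raises the question of what happens to the stage it leaves out. The sketch below illustrates one plausible resolution order: built-in defaults, then global prompts, then per-query prompts. This layering is an assumption made for illustration; `DEFAULT_PROMPTS` and `effective_prompts` are hypothetical names, and only the three stage keys come from the examples above.

```python
from typing import Dict, Optional

# Stage names come from the examples above; the default texts are placeholders.
DEFAULT_PROMPTS: Dict[str, str] = {
    "query_generation": "default query prompt",
    "result_analysis": "default analysis prompt",
    "report_generation": "default report prompt",
}


def effective_prompts(
    global_prompts: Optional[Dict[str, str]] = None,
    query_prompts: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Layer prompts so that later, more specific sources win (assumed order)."""
    merged = dict(DEFAULT_PROMPTS)        # start from the defaults
    merged.update(global_prompts or {})   # constructor-level prompts
    merged.update(query_prompts or {})    # per-query prompts override both
    return merged


global_prompts = {"result_analysis": "VC analyst analysis prompt"}
query_prompts = {"report_generation": "academic report prompt"}
resolved = effective_prompts(global_prompts, query_prompts)

for stage, prompt in resolved.items():
    print(f"{stage}: {prompt}")
```

Under this assumed order, a stage that neither dictionary mentions keeps its default prompt.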
🔍 How It Works
- Query Analysis: The agent breaks down the research topic into specific search queries
- Multiple Search Paths: Executes multiple parallel search paths to gather diverse information
- Content Extraction: Analyzes search results to extract key learnings and insights
- Recursive Exploration: Uses discovered information to generate follow-up searches for deeper exploration
- Report Generation: Synthesizes all findings into a cohesive research report
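The five steps above can be sketched as a small recursive loop. This is an illustrative outline, not the package's implementation: `fake_search` and `make_queries` are stand-ins for the real search and LLM calls, and halving the breadth at each level is an assumed strategy chosen to keep the example small.

```python
import asyncio
from typing import List, Set, Tuple


async def fake_search(query: str) -> Tuple[List[str], List[str]]:
    """Stand-in for a web search plus LLM analysis; returns (urls, learnings)."""
    await asyncio.sleep(0)  # yield control, as real network I/O would
    url = f"https://example.com/{query.replace(' ', '-')}"
    return [url], [f"learning about {query}"]


def make_queries(topic: str, breadth: int) -> List[str]:
    """Stand-in for LLM query generation: produce `breadth` sub-queries."""
    return [f"{topic} aspect {i + 1}" for i in range(breadth)]


async def explore(topic: str, breadth: int, depth: int,
                  learnings: List[str], visited: Set[str]) -> None:
    """Run `breadth` searches in parallel, then recurse on each query."""
    if depth == 0:
        return
    queries = make_queries(topic, breadth)
    results = await asyncio.gather(*(fake_search(q) for q in queries))
    for urls, found in results:
        visited.update(urls)
        learnings.extend(found)
    # Follow each lead one level deeper with a narrower breadth (assumption).
    next_breadth = max(1, breadth // 2)
    for query in queries:
        await explore(query, next_breadth, depth - 1, learnings, visited)


async def run(topic: str, breadth: int, depth: int) -> dict:
    """Collect learnings recursively and join them into a simple report."""
    learnings: List[str] = []
    visited: Set[str] = set()
    await explore(topic, breadth, depth, learnings, visited)
    report = "\n".join(f"- {item}" for item in learnings)
    return {"report": report, "learnings": learnings,
            "visited_urls": sorted(visited)}


result = asyncio.run(run("fusion energy", breadth=2, depth=2))
print(result["report"])
```

With `breadth=2` and `depth=2`, the first level issues two searches and each of them spawns one follow-up, which shows how breadth and depth together determine the total number of searches.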
📋 Advanced Usage
Quick Research
For faster, less in-depth research:
result = await researcher.quick_research(
    query="Carbon capture technologies",
    time_limit=300  # 5 minutes (default)
)
Advanced Research Parameters
result = await researcher.research(
    query="Advances in protein folding algorithms",
    breadth=5,  # More parallel searches
    depth=3,  # Deeper exploration
    time_limit=3600,  # Longer time limit (1 hour)
    min_learnings_required=12,  # Require more learnings
    max_searches=250,  # Allow more searches
    system_prompts=custom_prompts  # Custom system prompts
)
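The time, learning, and search limits interact, so it can help to reason about them as a single stopping rule. The sketch below is one plausible reading, not the package's actual logic: the time limit and `max_searches` are treated as hard caps, while `min_learnings_required` allows an early stop only after a minimum research time has passed. `ResearchLimits` and `should_stop` are hypothetical names; the default values mirror those listed in the API Reference.

```python
from dataclasses import dataclass


@dataclass
class ResearchLimits:
    """Limits with the defaults listed in the API Reference."""
    max_time_seconds: float = 2400
    min_research_time_seconds: float = 180
    min_learnings_required: int = 8
    max_searches: int = 200


def should_stop(limits: ResearchLimits, elapsed: float,
                learnings_count: int, searches_done: int) -> bool:
    """Decide whether research should end (assumed interpretation)."""
    # Hard caps: out of time or out of search budget.
    if elapsed >= limits.max_time_seconds:
        return True
    if searches_done >= limits.max_searches:
        return True
    # Soft stop: enough insights, but only after the minimum research time.
    return (elapsed >= limits.min_research_time_seconds
            and learnings_count >= limits.min_learnings_required)


limits = ResearchLimits()
print(should_stop(limits, elapsed=60, learnings_count=20, searches_done=10))
print(should_stop(limits, elapsed=200, learnings_count=8, searches_done=10))
```

Raising `min_learnings_required` or `time_limit`, as in the call above, would under this reading make the early stop harder to reach and let the research run longer.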
Using with LangChain Chains and Agents
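import asyncio
from langchain.agents import initialize_agent, Tool
from langchain_deepresearch import DeepResearcher
# Initialize a DeepResearcher instance
researcher = DeepResearcher(llm=llm)
# research() is a coroutine, so run it to completion and return the report text.
# Note: asyncio.run() cannot be called from inside an already running event loop.
def run_research(query: str) -> str:
    result = asyncio.run(researcher.research(query, breadth=3, depth=2))
    return result["report"] if result["success"] else "Research did not complete successfully."
# Create a Tool for agents
research_tool = Tool(
    name="DeepResearch",
    description="Thoroughly researches a topic and generates a comprehensive report",
    func=run_research
)
# Use it in an agent
agent = initialize_agent([research_tool, ...], llm, agent="zero-shot-react-description")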
📚 Example Use Cases
Venture Capital Analysis
vc_prompts = {
    "report_generation": """You are a senior investment analyst at a top venture capital firm.
Create a comprehensive investment analysis report with clear investment recommendation..."""
}
result = await researcher.research(
    query="Market opportunity for carbon capture startups",
    system_prompts=vc_prompts
)
Academic Literature Review
academic_prompts = {
    "query_generation": """You are a scientific researcher preparing a literature review.
Generate precise academic search queries that will find peer-reviewed papers...""",
    "report_generation": """You are writing a scientific literature review for an academic journal..."""
}
result = await researcher.research(
    query="Recent advances in quantum error correction",
    system_prompts=academic_prompts
)
Legal Research
legal_prompts = {
    "query_generation": """You are a legal researcher at a top law firm.
Generate precise legal search queries designed to find relevant case law and statutes...""",
    "report_generation": """You are a senior legal associate preparing a legal memorandum..."""
}
result = await researcher.research(
    query="Legal implications of AI-generated content",
    system_prompts=legal_prompts
)
Technical Documentation
technical_prompts = {
    "query_generation": """You are a senior software engineer researching a technical topic...""",
    "report_generation": """You are writing technical documentation for a development team..."""
}
result = await researcher.research(
    query="Implementing distributed tracing in microservices",
    system_prompts=technical_prompts
)
📝 API Reference
DeepResearcher Class
DeepResearcher(
    llm,                            # LangChain model (required)
    google_api_key=None,            # Google Search API key
    google_cx=None,                 # Google Custom Search CX ID
    firecrawl_api_key=None,         # Optional Firecrawl API key
    firecrawl_url=None,             # Optional Firecrawl URL
    max_time_seconds=2400,          # Max research time (40 min)
    min_research_time_seconds=180,  # Min research time (3 min)
    min_learnings_required=8,       # Min insights needed
    max_searches=200,               # Max searches to perform
    verbose=False,                  # Verbose logging
    system_prompts=None             # Custom system prompts
)
Research Method
await researcher.research(
    query,                        # Research query (required)
    breadth=3,                    # Parallel searches
    depth=2,                      # Recursive depth
    time_limit=None,              # Specific time limit
    report_model=None,            # Alternative model for report
    min_learnings_required=None,  # Override min learnings
    max_searches=None,            # Override max searches
    system_prompts=None           # Query-specific prompts
)
Quick Research Method
await researcher.quick_research(
    query,               # Research query (required)
    time_limit=300,      # 5 minutes (default)
    system_prompts=None  # Custom system prompts
)
🙋 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- This project uses LangChain for LLM integration
- Powered by Google Programmable Search Engine for web searches