Deep research system with LangChain and LangGraph
Shandu: Advanced Research System Architecture
This directory contains the core architecture of the Shandu deep research system. Our modular design separates concerns and enables future extensibility while maintaining clean, testable code.
System Architecture
Shandu builds its research system as a state-based workflow on top of LangGraph and LangChain. The package is laid out as follows:
shandu/
├── __init__.py                  # Package initialization
├── cli.py                       # Command-line interface
├── config.py                    # Configuration management
├── prompts.py                   # Centralized prompt templates
├── agents/                      # Research agent implementations
│   ├── __init__.py
│   ├── agent.py                 # LangChain-based agent
│   ├── langgraph_agent.py       # LangGraph state-based agent
│   ├── graph/                   # Graph workflow components
│   │   ├── __init__.py
│   │   ├── builder.py           # Graph construction
│   │   └── wrapper.py           # Async function wrappers
│   ├── nodes/                   # Graph node implementations
│   │   ├── __init__.py
│   │   ├── initialize.py        # Research initialization
│   │   ├── reflect.py           # Research reflection
│   │   ├── search.py            # Content search and analysis
│   │   └── ...                  # Other node implementations
│   ├── processors/              # Content processing
│   │   ├── __init__.py
│   │   ├── content_processor.py # Content extraction
│   │   └── report_generator.py  # Report generation
│   └── utils/                   # Agent utilities
│       ├── __init__.py
│       └── agent_utils.py       # Helper functions
├── research/                    # Research orchestration
│   ├── __init__.py
│   └── researcher.py            # Result management
├── scraper/                     # Web scraping functionality
│   ├── __init__.py
│   └── scraper.py               # Ethical web scraper
└── search/                      # Search functionality
    ├── __init__.py
    ├── ai_search.py             # AI-powered search
    └── search.py                # Multi-engine search
LangGraph Research Workflow
Shandu's research process moves through the following stages:
- Initialize: Define research query, parameters, and create a research plan
- Reflect: Analyze current findings and identify knowledge gaps
- Generate Queries: Create targeted search queries based on analysis
- Search: Execute search queries and collect results
- Smart Source Selection: Filter and prioritize the most valuable sources
- Format Citations: Prepare properly formatted citations for all sources
- Generate Initial Report: Create a first draft of the research report
- Enhance Report: Add depth, detail, and proper structure
- Expand Key Sections: Further develop important sections through multi-step synthesis
- Finalize Report: Apply final formatting and quality checks
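The stages above can be pictured as functions that each take and return a shared state. The following is a minimal plain-Python sketch, not Shandu's actual LangGraph graph: the state keys, the placeholder node bodies, and the assumption that the loop runs Reflect, Generate Queries, and Search once per depth level are all illustrative.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]


def initialize(state: State) -> State:
    state.update(current_depth=0, findings=[], trace=[])
    return state


def reflect(state: State) -> State:
    state["gaps"] = f"open questions after round {state['current_depth']}"
    return state


def generate_queries(state: State) -> State:
    # One placeholder query per unit of breadth.
    state["queries"] = [
        f"{state['query']} (angle {i + 1})" for i in range(state["breadth"])
    ]
    return state


def search(state: State) -> State:
    # A real node would call search engines; each query yields one placeholder finding.
    state["findings"].extend(f"result for {q}" for q in state["queries"])
    state["current_depth"] += 1
    return state


def make_stage(name: str) -> Callable[[State], State]:
    """Placeholder for the later stages (source selection through finalization)."""
    def stage(state: State) -> State:
        return state
    stage.__name__ = name
    return stage


RESEARCH_LOOP = [reflect, generate_queries, search]
REPORT_STAGES = [make_stage(name) for name in (
    "smart_source_selection", "format_citations", "generate_initial_report",
    "enhance_report", "expand_key_sections", "finalize_report",
)]


def run(query: str, depth: int = 2, breadth: int = 4) -> State:
    state = initialize({"query": query, "depth": depth, "breadth": breadth})
    state["trace"].append("initialize")
    # Conditional step: keep looping until the requested depth is reached.
    while state["current_depth"] < state["depth"]:
        for node in RESEARCH_LOOP:
            state = node(state)
            state["trace"].append(node.__name__)
    # The report stages run once, in order, after the research loop ends.
    for node in REPORT_STAGES:
        state = node(state)
        state["trace"].append(node.__name__)
    return state
```

Calling `run("solid-state batteries", depth=2, breadth=3)` walks the research loop twice and then the six report stages; `state["trace"]` records the order in which the nodes ran.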
Advanced Technical Features
State-Based Research With LangGraph
Our LangGraph implementation provides several key advantages:
- Clear State Transitions: Each research phase has well-defined inputs and outputs
- Conditional Logic: Dynamically determines next steps based on current state
- Circular Flow: Supports recursive exploration until depth conditions are met
- Parallel Processing: Handles concurrent operations for efficiency
- Error Resilience: Continues functioning even if individual steps encounter issues
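Two of these properties, conditional routing and error resilience, can be illustrated without LangGraph itself. The sketch below is an assumption about how such behavior could be structured, not Shandu's code: `resilient`, `next_node`, and the `errors`, `current_depth`, and `depth` state keys are hypothetical names.

```python
import functools
from typing import Any, Callable, Dict

State = Dict[str, Any]


def resilient(node: Callable[[State], State]) -> Callable[[State], State]:
    """Wrap a node so an exception is recorded in the state instead of propagating."""
    @functools.wraps(node)
    def wrapper(state: State) -> State:
        try:
            return node(state)
        except Exception as exc:
            state.setdefault("errors", []).append(f"{node.__name__}: {exc}")
            return state
    return wrapper


@resilient
def flaky_search(state: State) -> State:
    # Simulates a step that fails, for example because a search engine timed out.
    raise TimeoutError("search engine did not respond")


@resilient
def reflect(state: State) -> State:
    state["reflected"] = True
    return state


def next_node(state: State) -> str:
    """Conditional routing: loop back until the requested depth is reached."""
    return "reflect" if state["current_depth"] < state["depth"] else "report"
```

With this wrapper a failed search leaves a message in `state["errors"]` and the following node still runs, while `next_node` decides whether the workflow loops back or moves on to reporting.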
Enhanced Content Processing
Shandu implements sophisticated content processing:
- Content Relevance Filtering: Uses AI to determine if content is relevant to the research query
- Source Reliability Assessment: Evaluates sources for credibility and authority
- Main Content Extraction: Identifies and extracts the primary content from web pages
- Content Analysis Pipeline: Multi-step analysis for key information extraction
- Theme Identification: Automatically discovers and organizes thematic elements
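As a rough illustration of main content extraction and relevance filtering, the sketch below uses only the standard library. It is not Shandu's processor: the list of skipped tags is an assumption, and `is_relevant` is a simple keyword-overlap stand-in for the AI-based relevance check described above.

```python
from html.parser import HTMLParser
from typing import List

# Tags whose contents are treated as page chrome rather than main content (assumed list).
SKIPPED_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class MainTextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def is_relevant(text: str, query: str, min_overlap: float = 0.5) -> bool:
    """Keyword-overlap stand-in for an LLM relevance judgment."""
    terms = {term for term in query.lower().split() if len(term) > 2}
    if not terms:
        return False
    lowered = text.lower()
    hits = sum(1 for term in terms if term in lowered)
    return hits / len(terms) >= min_overlap
```

A production pipeline would replace `is_relevant` with a model call and would likely use a dedicated HTML library; the point here is only the order of operations: extract first, then filter.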
Advanced Report Generation
Our multi-step report generation process ensures high-quality output:
- Theme Extraction: Identifies key themes across all research
- Initial Report Generation: Creates a structured first draft
- Report Enhancement: Adds depth, citations, and improved organization
- Key Section Expansion: Further develops the most important sections
- Citation Management: Ensures proper attribution of all sources
- Final Cleanup: Removes artifacts and ensures consistent formatting
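Citation management mostly comes down to giving each source one stable number and reusing it. The `CitationRegistry` class below is a hypothetical sketch of that idea, assuming sources are identified by URL and that a trailing slash should not create a second entry; Shandu's own citation handling may work differently.

```python
from typing import Dict, List


class CitationRegistry:
    """Assigns each distinct source URL a stable citation number."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._titles: Dict[str, str] = {}

    def cite(self, url: str, title: str) -> str:
        key = url.rstrip("/")  # treat ".../a" and ".../a/" as the same source
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1
            self._titles[key] = title
        return f"[{self._ids[key]}]"

    def references(self) -> List[str]:
        ordered = sorted(self._ids.items(), key=lambda item: item[1])
        return [f"[{number}] {self._titles[key]}. {key}" for key, number in ordered]
```

Each call to `cite` returns the marker to place in the report text, and `references()` produces the numbered list for the end of the report.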
API Details
ResearchGraph Class
class ResearchGraph:
    """
    State-based research workflow using LangGraph.
    Provides a structured approach to deep research with multiple stages.
    """
    def __init__(
        self,
        llm: Optional[ChatOpenAI] = None,
        searcher: Optional[UnifiedSearcher] = None,
        scraper: Optional[WebScraper] = None,
        temperature: float = 0.5,
        date: Optional[str] = None
    )

    async def research(
        self,
        query: str,
        depth: int = 2,
        breadth: int = 4,
        progress_callback: Optional[Callable] = None,
        include_objective: bool = False,
        detail_level: str = "high"
    ) -> ResearchResult

    def research_sync(
        self,
        query: str,
        depth: int = 2,
        breadth: int = 4,
        progress_callback: Optional[Callable] = None,
        include_objective: bool = False,
        detail_level: str = "high"
    ) -> ResearchResult
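The listing above pairs an async `research` method with a synchronous `research_sync`. One common way to relate the two is for the sync method to run the coroutine on a fresh event loop. The class below is a hypothetical stand-in that shows this pattern and a progress callback; only the method names and the `query`, `depth`, `breadth`, and `progress_callback` parameters come from the listing. The returned dict and the payload passed to the callback are assumptions.

```python
import asyncio
from typing import Callable, Optional


class SketchResearchGraph:
    """Illustrative stand-in, not the real ResearchGraph."""

    async def research(
        self,
        query: str,
        depth: int = 2,
        breadth: int = 4,
        progress_callback: Optional[Callable] = None,
    ) -> dict:
        for round_number in range(1, depth + 1):
            await asyncio.sleep(0)  # placeholder for awaited search and LLM calls
            if progress_callback is not None:
                progress_callback({"round": round_number, "of": depth})
        return {"query": query, "rounds": depth, "queries_per_round": breadth}

    def research_sync(
        self,
        query: str,
        depth: int = 2,
        breadth: int = 4,
        progress_callback: Optional[Callable] = None,
    ) -> dict:
        # asyncio.run starts a new event loop, so call this only from synchronous code.
        return asyncio.run(self.research(query, depth, breadth, progress_callback))
```

Inside an already running event loop (for example in a notebook or an async web handler), await `research` directly instead of calling the sync wrapper.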
AISearcher Class
class AISearcher:
    """
    AI-powered search with content scraping for deeper insights.
    """
    def __init__(
        self,
        llm: Optional[ChatOpenAI] = None,
        searcher: Optional[UnifiedSearcher] = None,
        scraper: Optional[WebScraper] = None,
        max_results: int = 10,
        max_pages_to_scrape: int = 3
    )

    async def search(
        self,
        query: str,
        engines: Optional[List[str]] = None,
        detailed: bool = False,
        enable_scraping: bool = True
    ) -> AISearchResult
Integration Points
Shandu is designed for easy integration:
- CLI Interface: Command-line tools for direct usage
- Python API: Clean, well-documented API for integration into other applications
- Extensible Components: Easy to add new search engines, scrapers, or processing steps
- Custom LLM Support: Works with any LangChain-compatible LLM
- Callback System: Progress tracking and event hooks
Implementation Details
Prompt Engineering
Shandu uses carefully crafted prompts for:
- Query clarification
- Research planning
- Content analysis
- Source evaluation
- Report generation
- Citation formatting
Async Processing
Extensive use of async/await patterns for:
- Parallel search execution
- Concurrent web scraping
- Efficient content processing
- Responsive UI updates
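Concurrent scraping with a cap on simultaneous requests is a standard `asyncio` pattern. The sketch below assumes a placeholder `fetch` coroutine in place of a real HTTP client, and the `max_concurrent` limit is an illustrative parameter rather than a documented Shandu setting.

```python
import asyncio
from typing import Dict, List


async def fetch(url: str) -> str:
    """Placeholder for a real HTTP request."""
    await asyncio.sleep(0.01)
    if "broken" in url:
        raise ConnectionError(f"could not reach {url}")
    return f"<html>content of {url}</html>"


async def scrape_all(urls: List[str], max_concurrent: int = 3) -> Dict[str, str]:
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(url: str) -> str:
        # At most max_concurrent fetches are in flight at any time.
        async with semaphore:
            return await fetch(url)

    results = await asyncio.gather(
        *(guarded(url) for url in urls), return_exceptions=True
    )
    # Keep successful pages; failed fetches are dropped rather than aborting the batch.
    return {
        url: result
        for url, result in zip(urls, results)
        if not isinstance(result, BaseException)
    }
```

Passing `return_exceptions=True` to `asyncio.gather` lets one unreachable page fail on its own while the remaining pages are still returned.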
Caching System
Multi-level caching for:
- Search results
- Scraped content
- Content analysis
- LLM responses
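A multi-level cache usually checks a fast in-memory layer first and falls back to a slower persistent layer before recomputing. The `TwoLevelCache` class below is a hypothetical sketch with a dict in front of JSON files on disk. The key scheme and storage format are assumptions, and values must be JSON-serializable.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable, Dict


class TwoLevelCache:
    """In-memory dict backed by one JSON file per key."""

    def __init__(self, directory: Path) -> None:
        self._memory: Dict[str, Any] = {}
        self._directory = directory
        self._directory.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Hash the key so arbitrary strings (queries, URLs) map to safe file names.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self._directory / f"{digest}.json"

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        if key in self._memory:  # level 1: memory
            return self._memory[key]
        path = self._path(key)
        if path.exists():  # level 2: disk
            value = json.loads(path.read_text(encoding="utf-8"))
        else:  # miss on both levels: compute and persist
            value = compute()
            path.write_text(json.dumps(value), encoding="utf-8")
        self._memory[key] = value
        return value
```

The same wrapper could sit in front of search calls, scraped pages, or LLM responses by choosing a key such as `"search:<query>"`. This sketch has no expiry, so a real cache would also need an invalidation policy.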
Research Algorithm
Our research algorithm optimizes for:
- Breadth: Exploring multiple relevant sub-topics
- Depth: Drilling down into important details
- Convergence: Focusing on the most relevant information
- Coverage: Ensuring comprehensive topic exploration
- Source Quality: Prioritizing reliable, authoritative sources
- Synthesis: Creating coherent, well-structured reports
For more information on using Shandu, see the main README.md file.