Deep research system with LangChain and LangGraph

Shandu: Advanced Research System Architecture

This directory contains the core architecture of the Shandu deep research system. Our modular design separates concerns and enables future extensibility while maintaining clean, testable code.

📊 System Architecture

Shandu implements a state-based research workflow with LangGraph and LangChain. The package is laid out as follows:

shandu/
├── __init__.py           # Package initialization
├── cli.py                # Command-line interface
├── config.py             # Configuration management
├── prompts.py            # Centralized prompt templates
├── agents/               # Research agent implementations
│   ├── __init__.py
│   ├── agent.py          # LangChain-based agent
│   ├── langgraph_agent.py # LangGraph state-based agent
│   ├── graph/            # Graph workflow components
│   │   ├── __init__.py
│   │   ├── builder.py    # Graph construction
│   │   └── wrapper.py    # Async function wrappers
│   ├── nodes/            # Graph node implementations
│   │   ├── __init__.py
│   │   ├── initialize.py # Research initialization
│   │   ├── reflect.py    # Research reflection
│   │   ├── search.py     # Content search and analysis
│   │   └── ...           # Other node implementations
│   ├── processors/       # Content processing
│   │   ├── __init__.py
│   │   ├── content_processor.py # Content extraction
│   │   └── report_generator.py  # Report generation
│   └── utils/            # Agent utilities
│       ├── __init__.py
│       └── agent_utils.py # Helper functions
├── research/             # Research orchestration
│   ├── __init__.py
│   └── researcher.py     # Result management
├── scraper/              # Web scraping functionality
│   ├── __init__.py
│   └── scraper.py        # Ethical web scraper
└── search/               # Search functionality
    ├── __init__.py
    ├── ai_search.py      # AI-powered search
    └── search.py         # Multi-engine search

🔄 LangGraph Research Workflow

Shandu's research process runs as a ten-step, state-based workflow:

  1. Initialize: Define research query, parameters, and create a research plan
  2. Reflect: Analyze current findings and identify knowledge gaps
  3. Generate Queries: Create targeted search queries based on analysis
  4. Search: Execute search queries and collect results
  5. Smart Source Selection: Filter and prioritize the most valuable sources
  6. Format Citations: Prepare properly formatted citations for all sources
  7. Generate Initial Report: Create a first draft of the research report
  8. Enhance Report: Add depth, detail, and proper structure
  9. Expand Key Sections: Further develop important sections through multi-step synthesis
  10. Finalize Report: Apply final formatting and quality checks
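
The steps above can be read as a small state machine: a gathering loop (reflect, generate queries, search) that repeats until a depth limit is reached, followed by a reporting tail. The following is a minimal, stdlib-only sketch of that control flow. It does not use LangGraph and is not Shandu's implementation; `ResearchState`, the node functions, and `run_workflow` are hypothetical names introduced for illustration, the search step is faked, and steps 5 to 10 are collapsed into a single `report` node.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ResearchState:
    """Hypothetical state object handed from node to node."""
    query: str
    depth: int                       # number of reflect/search rounds to run
    current_depth: int = 0
    findings: List[str] = field(default_factory=list)
    visited: List[str] = field(default_factory=list)  # node names, for tracing
    report: str = ""


# Each node mutates the state and returns the name of the next node.
def initialize(state: ResearchState) -> str:
    state.visited.append("initialize")
    return "reflect"


def reflect(state: ResearchState) -> str:
    state.visited.append("reflect")
    return "generate_queries"


def generate_queries(state: ResearchState) -> str:
    state.visited.append("generate_queries")
    return "search"


def search(state: ResearchState) -> str:
    state.visited.append("search")
    state.current_depth += 1
    # Placeholder result; a real node would call a search engine here.
    state.findings.append(f"finding {state.current_depth} for {state.query!r}")
    # Conditional edge: loop back until the depth limit is reached.
    return "reflect" if state.current_depth < state.depth else "report"


def report(state: ResearchState) -> str:
    # Stands in for source selection, citations, drafting, and finalizing.
    state.visited.append("report")
    state.report = "\n".join(state.findings)
    return "end"


NODES: Dict[str, Callable[[ResearchState], str]] = {
    "initialize": initialize,
    "reflect": reflect,
    "generate_queries": generate_queries,
    "search": search,
    "report": report,
}


def run_workflow(query: str, depth: int = 2) -> ResearchState:
    """Drive the state machine from 'initialize' until a node returns 'end'."""
    state = ResearchState(query=query, depth=depth)
    node = "initialize"
    while node != "end":
        node = NODES[node](state)
    return state
```

Calling `run_workflow("solid-state batteries", depth=2)` passes through the reflect, generate-queries, and search nodes twice before reaching `report`. In a graph library, the return value of `search` would typically be expressed as a conditional edge rather than a string.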

🧠 Advanced Technical Features

State-Based Research With LangGraph

Our LangGraph implementation provides several key advantages:

  • Clear State Transitions: Each research phase has well-defined inputs and outputs
  • Conditional Logic: Dynamically determines next steps based on current state
  • Circular Flow: Supports recursive exploration until depth conditions are met
  • Parallel Processing: Handles concurrent operations for efficiency
  • Error Resilience: Continues functioning even if individual steps encounter issues
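
Error resilience in a node-based workflow is often achieved by wrapping each node so that an exception is recorded in the state instead of aborting the run. The sketch below shows that pattern in plain Python under that assumption; the `resilient` decorator and the `errors` key are hypothetical and are not taken from Shandu's `wrapper.py`.

```python
import functools
from typing import Any, Callable, Dict

State = Dict[str, Any]


def resilient(node: Callable[[State], State]) -> Callable[[State], State]:
    """Wrap a node so that a failure is logged in the state rather than raised."""
    @functools.wraps(node)
    def wrapper(state: State) -> State:
        try:
            return node(state)
        except Exception as exc:  # deliberately broad: any node failure is survivable
            errors = list(state.get("errors", []))
            errors.append(f"{node.__name__}: {exc}")
            # Pass the incoming state through, with the error appended.
            return {**state, "errors": errors}
    return wrapper


@resilient
def flaky_search(state: State) -> State:
    # Simulated failure of an external dependency.
    raise TimeoutError("search engine did not respond")


@resilient
def reflect(state: State) -> State:
    return {**state, "gaps": ["needs more sources"]}
```

With this wrapper, `reflect(flaky_search({"query": "x"}))` still runs the reflection step, and the resulting state carries both the `gaps` entry and a record of the failed search.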

Enhanced Content Processing

Shandu's content processing covers:

  • Content Relevance Filtering: Uses AI to determine if content is relevant to the research query
  • Source Reliability Assessment: Evaluates sources for credibility and authority
  • Main Content Extraction: Identifies and extracts the primary content from web pages
  • Content Analysis Pipeline: Multi-step analysis for key information extraction
  • Theme Identification: Automatically discovers and organizes thematic elements
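
Main content extraction usually means discarding page chrome (navigation, scripts, footers) and keeping the readable text. As a rough illustration only, the sketch below does this with the standard library's `html.parser`. The set of skipped tags and the `extract_main_text` helper are assumptions, and Shandu's own `content_processor.py` may work quite differently.

```python
from html.parser import HTMLParser
from typing import List

# Tags whose text is treated as page chrome rather than content (an assumption).
SKIPPED_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class _TextCollector(HTMLParser):
    """Collects text nodes that are not nested inside a skipped tag."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0           # > 0 while inside a skipped tag
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html: str) -> str:
    """Return the visible text of a page, joined with single spaces."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

A tag blocklist like this is crude; it is meant only to show where an extraction step sits before relevance filtering and analysis.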

Advanced Report Generation

Our multi-step report generation process ensures high-quality output:

  1. Theme Extraction: Identifies key themes across all research
  2. Initial Report Generation: Creates a structured first draft
  3. Report Enhancement: Adds depth, citations, and improved organization
  4. Key Section Expansion: Further develops the most important sections
  5. Citation Management: Ensures proper attribution of all sources
  6. Final Cleanup: Removes artifacts and ensures consistent formatting
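
Citation management in the pipeline above amounts to giving each distinct source a stable number and emitting a matching reference list. A minimal sketch, assuming sources are identified by URL; `CitationRegistry` and its output format are hypothetical and do not come from Shandu's `report_generator.py`.

```python
from typing import Dict, List, Tuple


class CitationRegistry:
    """Assigns each distinct URL a stable, 1-based citation number."""

    def __init__(self) -> None:
        self._numbers: Dict[str, int] = {}
        self._entries: List[Tuple[str, str]] = []  # (title, url) in citation order

    def cite(self, title: str, url: str) -> str:
        """Return an inline marker such as "[1]", reusing numbers for repeat URLs."""
        if url not in self._numbers:
            self._entries.append((title, url))
            self._numbers[url] = len(self._entries)
        return f"[{self._numbers[url]}]"

    def references(self) -> str:
        """Render the reference list, one numbered entry per line."""
        return "\n".join(
            f"[{number}] {title}. {url}"
            for number, (title, url) in enumerate(self._entries, start=1)
        )
```

Because numbers are keyed by URL, citing the same page from two report sections yields the same marker, which keeps the reference list free of duplicates.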

💻 API Details

ResearchGraph Class

from typing import Callable, Optional

# Signature summary: method bodies are elided. ChatOpenAI comes from LangChain's
# OpenAI integration; the other non-builtin types are Shandu's own.
class ResearchGraph:
    """
    State-based research workflow using LangGraph.
    Provides a structured approach to deep research with multiple stages.
    """
    def __init__(
        self,
        llm: Optional[ChatOpenAI] = None,
        searcher: Optional[UnifiedSearcher] = None,
        scraper: Optional[WebScraper] = None,
        temperature: float = 0.5,
        date: Optional[str] = None
    ) -> None: ...

    async def research(
        self,
        query: str,
        depth: int = 2,
        breadth: int = 4,
        progress_callback: Optional[Callable] = None,
        include_objective: bool = False,
        detail_level: str = "high"
    ) -> ResearchResult: ...

    def research_sync(
        self,
        query: str,
        depth: int = 2,
        breadth: int = 4,
        progress_callback: Optional[Callable] = None,
        include_objective: bool = False,
        detail_level: str = "high"
    ) -> ResearchResult: ...
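
`research` and `research_sync` take the same arguments, which suggests that the synchronous method drives the coroutine to completion. The sketch below illustrates that relationship with a stand-in class. `DemoResearchGraph` and its dictionary return value are hypothetical; the real `ResearchGraph` returns a `ResearchResult` and performs actual searches. The single-argument callback shape is likewise an assumption, since the source only types it as `Callable`.

```python
import asyncio
from typing import Any, Callable, Dict, Optional


class DemoResearchGraph:
    """Hypothetical stand-in that mirrors the documented method signatures."""

    async def research(
        self,
        query: str,
        depth: int = 2,
        breadth: int = 4,
        progress_callback: Optional[Callable] = None,
        include_objective: bool = False,
        detail_level: str = "high",
    ) -> Dict[str, Any]:
        for level in range(1, depth + 1):
            await asyncio.sleep(0)  # stand-in for awaiting search and LLM calls
            if progress_callback is not None:
                # Assumed payload: one progress event per completed depth level.
                progress_callback({"depth": level, "queries": breadth})
        return {
            "query": query,
            "depth": depth,
            "breadth": breadth,
            "detail_level": detail_level,
        }

    def research_sync(self, query: str, **kwargs: Any) -> Dict[str, Any]:
        # asyncio.run starts a fresh event loop, so this must not be called
        # from code that is already running inside one.
        return asyncio.run(self.research(query, **kwargs))
```

A caller without an event loop would use `research_sync`, while async code would `await` the `research` coroutine directly. Passing `progress_callback=print` is a quick way to watch the events in this sketch.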

AISearcher Class

from typing import List, Optional

# Signature summary: method bodies are elided.
class AISearcher:
    """
    AI-powered search with content scraping for deeper insights.
    """
    def __init__(
        self,
        llm: Optional[ChatOpenAI] = None,
        searcher: Optional[UnifiedSearcher] = None,
        scraper: Optional[WebScraper] = None,
        max_results: int = 10,
        max_pages_to_scrape: int = 3
    ) -> None: ...

    async def search(
        self,
        query: str,
        engines: Optional[List[str]] = None,
        detailed: bool = False,
        enable_scraping: bool = True
    ) -> AISearchResult: ...

🔌 Integration Points

Shandu is designed for easy integration:

  • CLI Interface: Command-line tools for direct usage
  • Python API: Clean, well-documented API for integration into other applications
  • Extensible Components: Easy to add new search engines, scrapers, or processing steps
  • Custom LLM Support: Works with any LangChain-compatible LLM
  • Callback System: Progress tracking and event hooks

🔍 Implementation Details

Prompt Engineering

Shandu uses carefully crafted prompts for:

  • Query clarification
  • Research planning
  • Content analysis
  • Source evaluation
  • Report generation
  • Citation formatting

Async Processing

Shandu makes extensive use of async/await patterns for:

  • Parallel search execution
  • Concurrent web scraping
  • Efficient content processing
  • Responsive UI updates
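
Concurrent scraping with async/await typically combines `asyncio.gather` with a semaphore, so that many pages are fetched in parallel without exceeding a fixed number of in-flight requests. A minimal sketch under that assumption; `fetch_page` is a placeholder that sleeps instead of doing network I/O, and `scrape_all` and `max_concurrency` are hypothetical names, not Shandu's API.

```python
import asyncio
from typing import Dict, List


async def fetch_page(url: str) -> str:
    """Placeholder fetch: sleeps briefly and fails for URLs containing 'bad'."""
    await asyncio.sleep(0.01)
    if "bad" in url:
        raise ConnectionError(f"could not fetch {url}")
    return f"<html>content of {url}</html>"


async def scrape_all(urls: List[str], max_concurrency: int = 3) -> Dict[str, str]:
    """Fetch all URLs concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url: str) -> str:
        async with semaphore:
            return await fetch_page(url)

    # return_exceptions=True keeps one failed page from cancelling the rest.
    results = await asyncio.gather(
        *(bounded_fetch(url) for url in urls), return_exceptions=True
    )
    # Keep successful pages only; a real system would likely log the failures.
    return {
        url: page
        for url, page in zip(urls, results)
        if not isinstance(page, BaseException)
    }
```

`asyncio.gather` returns results in the same order as its inputs, which is what makes the `zip` with `urls` safe here.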

Caching System

Shandu applies multi-level caching to:

  • Search results
  • Scraped content
  • Content analysis
  • LLM responses
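
Multi-level caching generally means checking a fast in-memory layer first and falling back to a slower persistent layer before recomputing. The sketch below shows a two-level version built from a dictionary and JSON files on disk. `TwoLevelCache`, the key hashing scheme, and the file layout are assumptions made for illustration; the source does not describe how Shandu's cache is actually built.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable, Dict


class TwoLevelCache:
    """In-memory dictionary backed by one JSON file per key on disk."""

    def __init__(self, directory: Path) -> None:
        self._memory: Dict[str, Any] = {}
        self._directory = directory
        self._directory.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Hash the key so arbitrary strings (queries, URLs) become safe file names.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self._directory / f"{digest}.json"

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        """Return the cached value for key, computing and storing it on a miss."""
        if key in self._memory:                 # level 1: memory
            return self._memory[key]
        path = self._path(key)
        if path.exists():                       # level 2: disk
            value = json.loads(path.read_text(encoding="utf-8"))
        else:                                   # miss: compute and persist
            value = compute()
            path.write_text(json.dumps(value), encoding="utf-8")
        self._memory[key] = value
        return value
```

Values must be JSON-serializable in this sketch. Prefixing keys by kind (for example `search:` or `scrape:`) is one simple way to keep the four cached categories listed above from colliding.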

🔬 Research Algorithm

Our research algorithm optimizes for:

  1. Breadth: Exploring multiple relevant sub-topics
  2. Depth: Drilling down into important details
  3. Convergence: Focusing on the most relevant information
  4. Coverage: Ensuring comprehensive topic exploration
  5. Source Quality: Prioritizing reliable, authoritative sources
  6. Synthesis: Creating coherent, well-structured reports
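
Breadth and depth interact multiplicatively: if every query spawns `breadth` follow-ups at each of `depth` levels, the number of queries grows geometrically. The sketch below makes that concrete with a recursive explorer. `explore` and the fake `expand` function are hypothetical, and whether Shandu branches per query or per round is not stated in the source.

```python
from typing import Callable, List


def expand(query: str, breadth: int) -> List[str]:
    """Placeholder for LLM-generated follow-up queries."""
    return [f"{query} / subtopic {i}" for i in range(1, breadth + 1)]


def explore(
    query: str,
    depth: int,
    breadth: int,
    expand_fn: Callable[[str, int], List[str]] = expand,
) -> List[str]:
    """Return every query issued, in depth-first order."""
    issued = [query]
    if depth > 0:
        for sub_query in expand_fn(query, breadth):
            issued.extend(explore(sub_query, depth - 1, breadth, expand_fn))
    return issued
```

Under this branching assumption, the documented defaults of `depth=2` and `breadth=4` issue 1 + 4 + 16 = 21 queries, which is why the convergence and source-quality goals matter for keeping cost in check.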

For more information on using Shandu, see the main README.md file.
