Deep research system with LangChain and LangGraph

Shandu: Advanced Research System Architecture

This directory contains the core architecture of the Shandu deep research system. Our modular design separates concerns, which keeps each component testable on its own and leaves room for future extensions.

📊 System Architecture

Shandu implements a state-based research workflow on top of LangGraph and LangChain. The package is laid out as follows:

shandu/
├── __init__.py           # Package initialization
├── cli.py                # Command-line interface
├── config.py             # Configuration management
├── prompts.py            # Centralized prompt templates
├── agents/               # Research agent implementations
│   ├── __init__.py
│   ├── agent.py          # LangChain-based agent
│   ├── langgraph_agent.py # LangGraph state-based agent
│   ├── graph/            # Graph workflow components
│   │   ├── __init__.py
│   │   ├── builder.py    # Graph construction
│   │   └── wrapper.py    # Async function wrappers
│   ├── nodes/            # Graph node implementations
│   │   ├── __init__.py
│   │   ├── initialize.py # Research initialization
│   │   ├── reflect.py    # Research reflection
│   │   ├── search.py     # Content search and analysis
│   │   └── ...           # Other node implementations
│   ├── processors/       # Content processing
│   │   ├── __init__.py
│   │   ├── content_processor.py # Content extraction
│   │   └── report_generator.py  # Report generation
│   └── utils/            # Agent utilities
│       ├── __init__.py
│       └── agent_utils.py # Helper functions
├── research/             # Research orchestration
│   ├── __init__.py
│   └── researcher.py     # Result management
├── scraper/              # Web scraping functionality
│   ├── __init__.py
│   └── scraper.py        # Ethical web scraper
└── search/               # Search functionality
    ├── __init__.py
    ├── ai_search.py      # AI-powered search
    └── search.py         # Multi-engine search

🔄 LangGraph Research Workflow

Shandu's research process runs as a ten-step, state-based workflow:

  1. Initialize: Define research query, parameters, and create a research plan
  2. Reflect: Analyze current findings and identify knowledge gaps
  3. Generate Queries: Create targeted search queries based on analysis
  4. Search: Execute search queries and collect results
  5. Smart Source Selection: Filter and prioritize the most valuable sources
  6. Format Citations: Prepare properly formatted citations for all sources
  7. Generate Initial Report: Create a first draft of the research report
  8. Enhance Report: Add depth, detail, and proper structure
  9. Expand Key Sections: Further develop important sections through multi-step synthesis
  10. Finalize Report: Apply final formatting and quality checks
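
The ten steps above can be read as nodes that each take the shared research state and return an updated copy. The sketch below is a dependency-free illustration in plain Python, not Shandu's actual graph: the `ResearchState` fields, the placeholder node bodies, and `run_workflow` are all hypothetical, and a real implementation would register such nodes with LangGraph rather than loop by hand. It also assumes that steps 2 to 4 are the ones that repeat until the depth budget is spent.

```python
from typing import Callable, List, TypedDict


class ResearchState(TypedDict):
    """Hypothetical shared state passed between nodes."""
    query: str
    depth: int            # how many reflect/search rounds to run
    current_depth: int
    findings: List[str]
    report: str
    trace: List[str]      # names of the nodes that have run


Node = Callable[[ResearchState], ResearchState]


def _step(name: str) -> Node:
    """Build a placeholder node that only records that it ran."""
    def node(state: ResearchState) -> ResearchState:
        return {**state, "trace": state["trace"] + [name]}
    return node


def search(state: ResearchState) -> ResearchState:
    """Placeholder search node: adds one fake finding per round."""
    round_no = state["current_depth"] + 1
    return {
        **state,
        "current_depth": round_no,
        "findings": state["findings"] + [f"finding from round {round_no}"],
        "trace": state["trace"] + ["search"],
    }


# Steps 2-4 repeat until the depth budget is spent (an assumption).
RESEARCH_LOOP: List[Node] = [_step("reflect"), _step("generate_queries"), search]
# Steps 5-10 run once, in order.
REPORT_PHASE: List[Node] = [
    _step(n) for n in (
        "smart_source_selection", "format_citations", "generate_initial_report",
        "enhance_report", "expand_key_sections", "finalize_report",
    )
]


def run_workflow(query: str, depth: int = 2) -> ResearchState:
    state: ResearchState = {
        "query": query, "depth": depth, "current_depth": 0,
        "findings": [], "report": "", "trace": [],
    }
    state = _step("initialize")(state)
    # Conditional edge: loop back to "reflect" while depth remains.
    while state["current_depth"] < state["depth"]:
        for node in RESEARCH_LOOP:
            state = node(state)
    for node in REPORT_PHASE:
        state = node(state)
    return state
```

Because every node returns a new state instead of mutating the old one, the `trace` field doubles as a simple audit log of which phases ran and in what order.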

🧠 Advanced Technical Features

State-Based Research With LangGraph

Our LangGraph implementation provides several key advantages:

  • Clear State Transitions: Each research phase has well-defined inputs and outputs
  • Conditional Logic: Dynamically determines next steps based on current state
  • Circular Flow: Supports recursive exploration until depth conditions are met
  • Parallel Processing: Handles concurrent operations for efficiency
  • Error Resilience: Continues functioning even if individual steps encounter issues
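
As an illustration of the error-resilience point, a node can be wrapped so that a failure is logged in the state instead of aborting the whole run. This is a minimal sketch under stated assumptions: the `resilient` decorator, the `errors` and `notes` keys, and both example nodes are hypothetical and are not taken from Shandu's code.

```python
import functools
from typing import Any, Callable, Dict

State = Dict[str, Any]


def resilient(node: Callable[[State], State]) -> Callable[[State], State]:
    """Wrap a node so an exception is recorded instead of aborting the run."""
    @functools.wraps(node)
    def wrapper(state: State) -> State:
        try:
            return node(state)
        except Exception as exc:  # deliberately broad: any node failure is survivable
            errors = state.get("errors", []) + [f"{node.__name__}: {exc}"]
            # Return the input state unchanged apart from the error log.
            return {**state, "errors": errors}
    return wrapper


@resilient
def flaky_search(state: State) -> State:
    """Hypothetical node that always fails."""
    raise TimeoutError("search engine did not respond")


@resilient
def reflect(state: State) -> State:
    """Hypothetical node that succeeds."""
    return {**state, "notes": state.get("notes", []) + ["reflected"]}


state: State = {"query": "example"}
for node in (flaky_search, reflect):
    state = node(state)
```

After the loop, `state` carries both the recorded error from `flaky_search` and the note written by `reflect`, so later nodes can decide whether the missing results matter.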

Enhanced Content Processing

Shandu's content processing includes:

  • Content Relevance Filtering: Uses AI to determine if content is relevant to the research query
  • Source Reliability Assessment: Evaluates sources for credibility and authority
  • Main Content Extraction: Identifies and extracts the primary content from web pages
  • Content Analysis Pipeline: Multi-step analysis for key information extraction
  • Theme Identification: Automatically discovers and organizes thematic elements

Advanced Report Generation

Our multi-step report generation process ensures high-quality output:

  1. Theme Extraction: Identifies key themes across all research
  2. Initial Report Generation: Creates a structured first draft
  3. Report Enhancement: Adds depth, citations, and improved organization
  4. Key Section Expansion: Further develops the most important sections
  5. Citation Management: Ensures proper attribution of all sources
  6. Final Cleanup: Removes artifacts and ensures consistent formatting
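
The citation management step can be pictured as a registry that hands out stable numeric markers and later renders the reference list. The sketch below is a hypothetical illustration: `CitationRegistry`, its de-duplication by URL, and the `[n] Title. URL` output format are assumptions, not Shandu's actual citation format.

```python
from typing import Dict, List


class CitationRegistry:
    """Assign stable [n] markers to sources, de-duplicated by URL."""

    def __init__(self) -> None:
        self._index: Dict[str, int] = {}    # url -> citation number
        self._titles: List[str] = []
        self._urls: List[str] = []

    def cite(self, url: str, title: str) -> str:
        """Return the marker for a source, registering it on first use."""
        if url not in self._index:
            self._index[url] = len(self._urls) + 1
            self._urls.append(url)
            self._titles.append(title)
        return f"[{self._index[url]}]"

    def references(self) -> str:
        """Render the reference list in citation order."""
        lines = [
            f"[{n}] {title}. {url}"
            for n, (title, url) in enumerate(zip(self._titles, self._urls), start=1)
        ]
        return "\n".join(lines)
```

Keying on the URL means a source cited in several sections keeps one number, which is what lets the enhancement and expansion steps add text without renumbering earlier citations.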

💻 API Details

ResearchGraph Class

from typing import Callable, Optional

# Signatures only; method bodies are elided. ChatOpenAI comes from LangChain;
# UnifiedSearcher, WebScraper, and ResearchResult are Shandu types.
class ResearchGraph:
    """
    State-based research workflow using LangGraph.
    Provides a structured approach to deep research with multiple stages.
    """
    def __init__(
        self,
        llm: Optional[ChatOpenAI] = None,
        searcher: Optional[UnifiedSearcher] = None,
        scraper: Optional[WebScraper] = None,
        temperature: float = 0.5,
        date: Optional[str] = None
    ) -> None: ...

    async def research(
        self,
        query: str,
        depth: int = 2,
        breadth: int = 4,
        progress_callback: Optional[Callable] = None,
        include_objective: bool = False,
        detail_level: str = "high"
    ) -> ResearchResult: ...

    # Synchronous counterpart of research() with the same parameters.
    def research_sync(
        self,
        query: str,
        depth: int = 2,
        breadth: int = 4,
        progress_callback: Optional[Callable] = None,
        include_objective: bool = False,
        detail_level: str = "high"
    ) -> ResearchResult: ...
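
The pairing of `research` and `research_sync` suggests a synchronous wrapper over the coroutine. The sketch below shows one common way to build such a pair with `asyncio.run`; `DemoResearcher` and its string result are hypothetical stand-ins, and Shandu's own wrapper may work differently. What gets passed to `progress_callback` is also an assumption here (a status string), since the signature above only says `Callable`.

```python
import asyncio
from typing import Callable, List, Optional


class DemoResearcher:
    """Hypothetical stand-in mirroring the research/research_sync pair."""

    async def research(
        self,
        query: str,
        depth: int = 2,
        progress_callback: Optional[Callable[[str], None]] = None,
    ) -> str:
        for round_no in range(1, depth + 1):
            await asyncio.sleep(0)  # placeholder for real async work
            if progress_callback is not None:
                progress_callback(f"round {round_no}/{depth}")
        return f"report on {query!r}"

    def research_sync(
        self,
        query: str,
        depth: int = 2,
        progress_callback: Optional[Callable[[str], None]] = None,
    ) -> str:
        # asyncio.run starts a fresh event loop, so this must not be
        # called from code that is already running inside one.
        return asyncio.run(self.research(query, depth, progress_callback))


updates: List[str] = []
report = DemoResearcher().research_sync("solar storage", progress_callback=updates.append)
```

With this design the async method stays the single source of truth, and callers in scripts or a CLI can use the sync variant without managing an event loop themselves.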

AISearcher Class

from typing import List, Optional

# Signatures only; method bodies are elided. ChatOpenAI comes from LangChain;
# UnifiedSearcher, WebScraper, and AISearchResult are Shandu types.
class AISearcher:
    """
    AI-powered search with content scraping for deeper insights.
    """
    def __init__(
        self,
        llm: Optional[ChatOpenAI] = None,
        searcher: Optional[UnifiedSearcher] = None,
        scraper: Optional[WebScraper] = None,
        max_results: int = 10,
        max_pages_to_scrape: int = 3
    ) -> None: ...

    async def search(
        self,
        query: str,
        engines: Optional[List[str]] = None,
        detailed: bool = False,
        enable_scraping: bool = True
    ) -> AISearchResult: ...

🔌 Integration Points

Shandu is designed for easy integration:

  • CLI Interface: Command-line tools for direct usage
  • Python API: Clean, well-documented API for integration into other applications
  • Extensible Components: Easy to add new search engines, scrapers, or processing steps
  • Custom LLM Support: Works with any LangChain-compatible LLM
  • Callback System: Progress tracking and event hooks

🔍 Implementation Details

Prompt Engineering

Shandu uses carefully crafted prompts for:

  • Query clarification
  • Research planning
  • Content analysis
  • Source evaluation
  • Report generation
  • Citation formatting

Async Processing

Extensive use of async/await patterns for:

  • Parallel search execution
  • Concurrent web scraping
  • Efficient content processing
  • Responsive UI updates
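
Concurrent scraping of this kind is commonly written with `asyncio.gather` plus a semaphore that caps the number of requests in flight. The following is a sketch under stated assumptions: `fetch_page` is a hypothetical stand-in that sleeps instead of doing network I/O, and `scrape_all` with its `max_concurrent` parameter is illustrative rather than part of Shandu's API.

```python
import asyncio
from typing import Dict, List


async def fetch_page(url: str) -> str:
    """Hypothetical scraper call; sleeps instead of doing network I/O."""
    await asyncio.sleep(0.01)
    if "broken" in url:
        raise ConnectionError(f"could not fetch {url}")
    return f"content of {url}"


async def scrape_all(urls: List[str], max_concurrent: int = 3) -> Dict[str, str]:
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(url: str) -> str:
        async with semaphore:  # at most max_concurrent fetches in flight
            return await fetch_page(url)

    # return_exceptions=True keeps one failed page from cancelling the rest.
    results = await asyncio.gather(*(bounded(u) for u in urls), return_exceptions=True)
    return {
        url: result
        for url, result in zip(urls, results)
        if not isinstance(result, BaseException)
    }


pages = asyncio.run(
    scrape_all(["https://a.example", "https://broken.example", "https://b.example"])
)
```

The failed URL is simply absent from `pages`, which keeps the caller's logic simple at the cost of hiding the error; a fuller version might return the exceptions alongside the successes.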

Caching System

Multi-level caching for:

  • Search results
  • Scraped content
  • Content analysis
  • LLM responses
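
One way to picture multi-level caching is an in-memory dictionary in front of an on-disk store, with a namespace per kind of cached item. The sketch below assumes exactly two levels, JSON-serializable values, and SHA-256 file names; `TwoLevelCache` and `expensive_search` are hypothetical and do not describe Shandu's actual cache layout.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, List


class TwoLevelCache:
    """In-memory dict backed by JSON files on disk."""

    def __init__(self, directory: Path) -> None:
        self._memory: Dict[str, Any] = {}
        self._dir = directory
        self._dir.mkdir(parents=True, exist_ok=True)

    def _path(self, namespace: str, key: str) -> Path:
        digest = hashlib.sha256(f"{namespace}:{key}".encode("utf-8")).hexdigest()
        return self._dir / f"{digest}.json"

    def get_or_compute(self, namespace: str, key: str, compute: Callable[[], Any]) -> Any:
        mem_key = f"{namespace}:{key}"
        if mem_key in self._memory:            # level 1: memory
            return self._memory[mem_key]
        path = self._path(namespace, key)
        if path.exists():                      # level 2: disk
            value = json.loads(path.read_text(encoding="utf-8"))
        else:                                  # miss: compute and persist
            value = compute()
            path.write_text(json.dumps(value), encoding="utf-8")
        self._memory[mem_key] = value
        return value


calls: List[int] = []


def expensive_search() -> List[str]:
    """Hypothetical slow operation; counts how often it really runs."""
    calls.append(1)
    return ["result A", "result B"]


cache_dir = Path(tempfile.mkdtemp())
cache = TwoLevelCache(cache_dir)
first = cache.get_or_compute("search", "solar storage", expensive_search)
second = cache.get_or_compute("search", "solar storage", expensive_search)
# A new instance has an empty memory level but still finds the file on disk.
third = TwoLevelCache(cache_dir).get_or_compute("search", "solar storage", expensive_search)
```

The namespace argument is what would let search results, scraped content, analysis output, and LLM responses share one store without key collisions.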

🔬 Research Algorithm

Our research algorithm optimizes for:

  1. Breadth: Exploring multiple relevant sub-topics
  2. Depth: Drilling down into important details
  3. Convergence: Focusing on the most relevant information
  4. Coverage: Ensuring comprehensive topic exploration
  5. Source Quality: Prioritizing reliable, authoritative sources
  6. Synthesis: Creating coherent, well-structured reports
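
The interplay of breadth and depth can be sketched as a bounded breadth-first exploration of sub-topics. The defaults `depth=2` and `breadth=4` mirror the `research` signature above; everything else is an assumption, including the reading of breadth as "follow-ups per topic", the `explore` function, and `fake_expand`, which stands in for LLM-generated follow-up queries.

```python
from collections import deque
from typing import Callable, List, Set


def explore(
    root: str,
    expand: Callable[[str], List[str]],
    depth: int = 2,
    breadth: int = 4,
) -> List[str]:
    """Breadth-first topic exploration bounded by depth and breadth."""
    visited: Set[str] = {root}
    order: List[str] = [root]
    queue = deque([(root, 0)])
    while queue:
        topic, level = queue.popleft()
        if level >= depth:
            continue  # depth budget spent on this branch
        for sub in expand(topic)[:breadth]:  # breadth caps follow-ups per topic
            if sub not in visited:           # skip topics already covered
                visited.add(sub)
                order.append(sub)
                queue.append((sub, level + 1))
    return order


def fake_expand(topic: str) -> List[str]:
    """Hypothetical stand-in for LLM-generated follow-up queries."""
    return [f"{topic}/{i}" for i in range(3)]


topics = explore("batteries", fake_expand, depth=2, breadth=2)
```

Under this reading the number of explored topics grows roughly as breadth raised to the power of depth, which is why small defaults already produce a sizeable set of queries.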

For more information on using Shandu, see the main README.md file.
