Deep research system with LangChain and LangGraph

Shandu: Advanced Research System Architecture

This directory contains the core architecture of the Shandu deep research system. The modular design separates concerns so that individual components can be tested and extended independently.

📊 System Architecture

Shandu implements a state-based research workflow using LangGraph and LangChain. The package is laid out as follows:

shandu/
├── __init__.py           # Package initialization
├── cli.py                # Command-line interface
├── config.py             # Configuration management
├── prompts.py            # Centralized prompt templates
├── agents/               # Research agent implementations
│   ├── __init__.py
│   ├── agent.py          # LangChain-based agent
│   ├── langgraph_agent.py # LangGraph state-based agent
│   ├── graph/            # Graph workflow components
│   │   ├── __init__.py
│   │   ├── builder.py    # Graph construction
│   │   └── wrapper.py    # Async function wrappers
│   ├── nodes/            # Graph node implementations
│   │   ├── __init__.py
│   │   ├── initialize.py # Research initialization
│   │   ├── reflect.py    # Research reflection
│   │   ├── search.py     # Content search and analysis
│   │   └── ...           # Other node implementations
│   ├── processors/       # Content processing
│   │   ├── __init__.py
│   │   ├── content_processor.py # Content extraction
│   │   └── report_generator.py  # Report generation
│   └── utils/            # Agent utilities
│       ├── __init__.py
│       └── agent_utils.py # Helper functions
├── research/             # Research orchestration
│   ├── __init__.py
│   └── researcher.py     # Result management
├── scraper/              # Web scraping functionality
│   ├── __init__.py
│   └── scraper.py        # Ethical web scraper
└── search/               # Search functionality
    ├── __init__.py
    ├── ai_search.py      # AI-powered search
    └── search.py         # Multi-engine search

🔄 LangGraph Research Workflow

Shandu's research process follows a state-based workflow with ten stages:

Initialize: Define research query, parameters, and create a research plan
Reflect: Analyze current findings and identify knowledge gaps
Generate Queries: Create targeted search queries based on analysis
Search: Execute search queries and collect results
Smart Source Selection: Filter and prioritize the most valuable sources
Format Citations: Prepare properly formatted citations for all sources
Generate Initial Report: Create a first draft of the research report
Enhance Report: Add depth, detail, and proper structure
Expand Key Sections: Further develop important sections through multi-step synthesis
Finalize Report: Apply final formatting and quality checks
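
The stages above can be read as an ordered pipeline over a shared state object. The following is a minimal sketch in plain Python, not Shandu's LangGraph code: `ResearchState`, `make_stage`, and the stage bodies are hypothetical placeholders that only record the order in which stages run.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ResearchState:
    """Hypothetical shared state handed from one stage to the next."""
    query: str
    history: List[str] = field(default_factory=list)


def make_stage(name: str) -> Callable[[ResearchState], ResearchState]:
    """Build a placeholder stage that only records that it ran."""
    def stage(state: ResearchState) -> ResearchState:
        state.history.append(name)
        return state
    return stage


# Stage names mirror the workflow list; the real stages would call an LLM,
# a search engine, or a scraper instead of appending to a list.
STAGE_NAMES = [
    "initialize", "reflect", "generate_queries", "search",
    "smart_source_selection", "format_citations", "generate_initial_report",
    "enhance_report", "expand_key_sections", "finalize_report",
]
PIPELINE = [make_stage(name) for name in STAGE_NAMES]


def run_pipeline(query: str) -> ResearchState:
    """Run every stage in order, threading the state through."""
    state = ResearchState(query=query)
    for stage in PIPELINE:
        state = stage(state)
    return state


final_state = run_pipeline("history of the transistor")
print(final_state.history)
```

A real graph would also loop between reflection and search; this sketch keeps the stages strictly linear for readability.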

🧠 Advanced Technical Features

State-Based Research With LangGraph

Our LangGraph implementation provides several key advantages:

Clear State Transitions: Each research phase has well-defined inputs and outputs
Conditional Logic: Dynamically determines next steps based on current state
Circular Flow: Supports recursive exploration until depth conditions are met
Parallel Processing: Handles concurrent operations for efficiency
Error Resilience: Continues functioning even if individual steps encounter issues
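
To illustrate conditional logic, circular flow, and error resilience together, here is a small hand-rolled state machine. It is a sketch under assumptions, not Shandu's graph: the node names, the `route` function, and the `current_depth`/`max_depth` keys are hypothetical, and the search failure is simulated.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]


def reflect(state: State) -> State:
    state["trace"].append("reflect")
    return state


def search(state: State) -> State:
    state["trace"].append("search")
    state["current_depth"] += 1
    if state["current_depth"] == 1:
        # Simulated failure on the first pass to show error handling.
        raise RuntimeError("simulated search failure")
    return state


def report(state: State) -> State:
    state["trace"].append("report")
    return state


NODES: Dict[str, Callable[[State], State]] = {
    "reflect": reflect, "search": search, "report": report,
}


def route(current: str, state: State) -> str:
    """Conditional edge: choose the next node from the current state."""
    if current == "reflect":
        return "search"
    if current == "search":
        # Loop back until the configured depth is reached.
        return "reflect" if state["current_depth"] < state["max_depth"] else "report"
    return "END"


def run(max_depth: int) -> State:
    state: State = {"current_depth": 0, "max_depth": max_depth, "trace": [], "errors": []}
    node = "reflect"
    while node != "END":
        try:
            state = NODES[node](state)
        except Exception as exc:
            # Record the error and keep going instead of aborting the run.
            state["errors"].append(f"{node}: {exc}")
        node = route(node, state)
    return state


result = run(max_depth=2)
print(result["trace"])
print(result["errors"])
```

The run visits reflect and search twice before reporting, and the simulated failure ends up in `errors` rather than stopping the workflow.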

Enhanced Content Processing

Shandu processes retrieved content in several steps:

Content Relevance Filtering: Uses AI to determine if content is relevant to the research query
Source Reliability Assessment: Evaluates sources for credibility and authority
Main Content Extraction: Identifies and extracts the primary content from web pages
Content Analysis Pipeline: Multi-step analysis for key information extraction
Theme Identification: Automatically discovers and organizes thematic elements

Advanced Report Generation

Reports are produced in several passes, each refining the output of the previous one:

Theme Extraction: Identifies key themes across all research
Initial Report Generation: Creates a structured first draft
Report Enhancement: Adds depth, citations, and improved organization
Key Section Expansion: Further develops the most important sections
Citation Management: Ensures proper attribution of all sources
Final Cleanup: Removes artifacts and ensures consistent formatting
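
Citation management can be pictured as assigning each distinct source a stable number. The sketch below deduplicates by URL and builds a numbered reference list; the `build_citations` helper, the dictionary keys, and the output format are hypothetical and not taken from Shandu.

```python
from typing import Dict, List, Tuple


def build_citations(sources: List[Dict[str, str]]) -> Tuple[Dict[str, int], List[str]]:
    """Return a URL-to-number index and a formatted reference list."""
    index: Dict[str, int] = {}
    references: List[str] = []
    for source in sources:
        url = source["url"]
        if url in index:
            continue  # the same source cited twice keeps its first number
        index[url] = len(index) + 1
        references.append(f"[{index[url]}] {source['title']}. {url}")
    return index, references


sources = [
    {"title": "Battery basics", "url": "https://example.com/a"},
    {"title": "Electrolyte review", "url": "https://example.com/b"},
    {"title": "Battery basics", "url": "https://example.com/a"},
]
index, references = build_citations(sources)
for line in references:
    print(line)
```

Report text can then cite `index[url]` wherever a source is used, so in-text markers and the reference list stay in sync.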

💻 API Details

ResearchGraph Class

from typing import Callable, Optional

# Signatures only; method bodies are elided with "...".
# ChatOpenAI, UnifiedSearcher, WebScraper, and ResearchResult are defined elsewhere.
class ResearchGraph:
    """
    State-based research workflow using LangGraph.
    Provides a structured approach to deep research with multiple stages.
    """
    def __init__(
        self,
        llm: Optional[ChatOpenAI] = None,
        searcher: Optional[UnifiedSearcher] = None,
        scraper: Optional[WebScraper] = None,
        temperature: float = 0.5,
        date: Optional[str] = None
    ) -> None: ...

    async def research(
        self,
        query: str,
        depth: int = 2,
        breadth: int = 4,
        progress_callback: Optional[Callable] = None,
        include_objective: bool = False,
        detail_level: str = "high"
    ) -> ResearchResult: ...

    # Synchronous counterpart of research() with the same parameters.
    def research_sync(
        self,
        query: str,
        depth: int = 2,
        breadth: int = 4,
        progress_callback: Optional[Callable] = None,
        include_objective: bool = False,
        detail_level: str = "high"
    ) -> ResearchResult: ...
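
The class pairs an async `research` method with a `research_sync` counterpart. One common way to build such a pair is to have the synchronous method drive the coroutine with `asyncio.run`. The stub below shows that pattern only; `StubResearchGraph` and `StubResult` are hypothetical, and whether Shandu implements `research_sync` this way is an assumption.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StubResult:
    """Hypothetical stand-in for ResearchResult."""
    query: str
    depth: int
    breadth: int


class StubResearchGraph:
    """Stub that mirrors the async/sync method pair, with no real research."""

    async def research(self, query: str, depth: int = 2, breadth: int = 4) -> StubResult:
        await asyncio.sleep(0)  # stands in for LLM, search, and scraping calls
        return StubResult(query=query, depth=depth, breadth=breadth)

    def research_sync(self, query: str, depth: int = 2, breadth: int = 4) -> StubResult:
        # asyncio.run starts a fresh event loop, so this must not be
        # called from code that is already running inside one.
        return asyncio.run(self.research(query, depth=depth, breadth=breadth))


result = StubResearchGraph().research_sync("fusion reactor designs", depth=3)
print(result)
```

Callers already inside an event loop, such as notebook or web-framework code, would `await` the async method directly instead.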

AISearcher Class

from typing import List, Optional

# Signatures only; method bodies are elided with "...".
# ChatOpenAI, UnifiedSearcher, WebScraper, and AISearchResult are defined elsewhere.
class AISearcher:
    """
    AI-powered search with content scraping for deeper insights.
    """
    def __init__(
        self,
        llm: Optional[ChatOpenAI] = None,
        searcher: Optional[UnifiedSearcher] = None,
        scraper: Optional[WebScraper] = None,
        max_results: int = 10,
        max_pages_to_scrape: int = 3
    ) -> None: ...

    async def search(
        self,
        query: str,
        engines: Optional[List[str]] = None,
        detailed: bool = False,
        enable_scraping: bool = True
    ) -> AISearchResult: ...

🔌 Integration Points

Shandu exposes several integration points:

CLI Interface: Command-line tools for direct usage
Python API: Clean, well-documented API for integration into other applications
Extensible Components: Easy to add new search engines, scrapers, or processing steps
Custom LLM Support: Works with any LangChain-compatible LLM
Callback System: Progress tracking and event hooks
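
The API signatures accept a `progress_callback`, but the arguments it receives are not documented here. The sketch below assumes a callback that takes a stage name and a completion fraction; that shape, and the `run_stages` helper, are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Assumed callback shape: (stage_name, fraction_complete) -> None
ProgressCallback = Callable[[str, float], None]


def run_stages(stages: List[str], progress_callback: Optional[ProgressCallback] = None) -> None:
    """Run each stage in turn and report progress after it finishes."""
    total = len(stages)
    for position, stage in enumerate(stages, start=1):
        # Real work for the stage would happen here.
        if progress_callback is not None:
            progress_callback(stage, position / total)


events: List[Tuple[str, float]] = []
run_stages(
    ["search", "analyze", "report"],
    progress_callback=lambda stage, fraction: events.append((stage, round(fraction, 2))),
)
print(events)
```

Making the callback optional keeps the core loop usable from scripts that do not need progress output.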

🔍 Implementation Details

Prompt Engineering

Shandu uses carefully crafted prompts for:

Query clarification
Research planning
Content analysis
Source evaluation
Report generation
Citation formatting

Async Processing

Shandu uses async/await throughout for:

Parallel search execution
Concurrent web scraping
Efficient content processing
Responsive UI updates
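
Concurrent scraping with a cap on simultaneous requests is typically written with `asyncio.gather` and a semaphore. The sketch below simulates network I/O with `asyncio.sleep`; `fetch_page`, `scrape_all`, and the concurrency limit are illustrative assumptions rather than Shandu's scraper.

```python
import asyncio
from typing import List


async def fetch_page(url: str, limiter: asyncio.Semaphore) -> str:
    """Pretend to download a page while holding one semaphore slot."""
    async with limiter:
        await asyncio.sleep(0.01)  # stands in for network I/O
        return f"content of {url}"


async def scrape_all(urls: List[str], max_concurrency: int = 2) -> List[str]:
    """Fetch all URLs concurrently, at most max_concurrency at a time."""
    limiter = asyncio.Semaphore(max_concurrency)
    results = await asyncio.gather(
        *(fetch_page(url, limiter) for url in urls),
        return_exceptions=True,  # one failed page does not cancel the rest
    )
    # Drop failures; gather returns results in the same order as the input.
    return [item for item in results if isinstance(item, str)]


urls = [f"https://example.com/page{n}" for n in range(5)]
pages = asyncio.run(scrape_all(urls))
print(pages)
```

The semaphore bounds load on remote sites, while `return_exceptions=True` lets the remaining pages complete when one request fails.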

Caching System

Shandu caches results at multiple levels:

Search results
Scraped content
Content analysis
LLM responses
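
One way to picture multi-level caching is a fast in-memory layer in front of a slower on-disk layer. The sketch below is an assumption about the general technique, not Shandu's cache: `TwoLevelCache`, its key scheme, and the JSON file layout are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


class TwoLevelCache:
    """In-memory dict backed by one JSON file per key on disk."""

    def __init__(self, directory: Path) -> None:
        self.memory: Dict[str, str] = {}
        self.directory = directory
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Hash the key so any string maps to a safe file name.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.json"

    def get_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        if key in self.memory:  # level 1: memory
            return self.memory[key]
        path = self._path(key)
        if path.exists():  # level 2: disk
            value = json.loads(path.read_text(encoding="utf-8"))["value"]
        else:  # miss on both levels: do the work and persist it
            value = compute()
            path.write_text(json.dumps({"value": value}), encoding="utf-8")
        self.memory[key] = value
        return value


calls: List[int] = []


def expensive_search() -> str:
    calls.append(1)
    return "results for quantum error correction"


cache_dir = Path(tempfile.mkdtemp())
key = "search:quantum error correction"
first = TwoLevelCache(cache_dir)
first.get_or_compute(key, expensive_search)   # computed and stored
first.get_or_compute(key, expensive_search)   # served from memory
second = TwoLevelCache(cache_dir)             # simulates a new process
value = second.get_or_compute(key, expensive_search)  # served from disk
print(value, len(calls))
```

The same structure applies to scraped pages or LLM responses by changing the key prefix; expiry and invalidation are left out of this sketch.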

🔬 Research Algorithm

Our research algorithm optimizes for:

Breadth: Exploring multiple relevant sub-topics
Depth: Drilling down into important details
Convergence: Focusing on the most relevant information
Coverage: Ensuring comprehensive topic exploration
Source Quality: Prioritizing reliable, authoritative sources
Synthesis: Creating coherent, well-structured reports

For more information on using Shandu, see the main README.md file.

Release history

3.0.9

May 20, 2026

3.0.8

May 2, 2026

3.0.7

May 2, 2026

3.0.6

Mar 2, 2026

3.0.5

Mar 2, 2026

3.0.4

Mar 1, 2026

3.0.3

Feb 21, 2026

3.0.2

Feb 21, 2026

3.0.1

Feb 21, 2026

3.0.0

Feb 21, 2026

This version

2.0.0

May 1, 2025

1.5.2

Mar 15, 2025

1.0.0

Mar 3, 2025

0.1.1

Feb 26, 2025

0.1.0

Feb 25, 2025

Download files

Source Distribution

shandu-2.0.0.tar.gz (138.7 kB view details)

Uploaded May 1, 2025 Source

Built Distribution

shandu-2.0.0-py3-none-any.whl (98.7 kB view details)

Uploaded May 1, 2025 Python 3

File details

Details for the file shandu-2.0.0.tar.gz.

File metadata

Download URL: shandu-2.0.0.tar.gz
Upload date: May 1, 2025
Size: 138.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for shandu-2.0.0.tar.gz
Algorithm	Hash digest
SHA256	`704ddf14a598e2c9c53daeacf7d4be0a5b450df0e60c3279e91bd6f86283639e`
MD5	`0afd04176d51b9eb303197599e464253`
BLAKE2b-256	`8f37c2c181c7faf3088e060fb42da918b912447dbc5d4cba141eb32e11dd5bc7`

File details

Details for the file shandu-2.0.0-py3-none-any.whl.

File metadata

Download URL: shandu-2.0.0-py3-none-any.whl
Upload date: May 1, 2025
Size: 98.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for shandu-2.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e257e933ba0bf0330c39db10c782f58009c448a354d4ea7eac32d00707e65400`
MD5	`97399e9c7f5c4c9e42240a84b29ae231`
BLAKE2b-256	`d80ad5103834c858eb268dad665074830b550ca35f1032b24ae87782c6da9121`
