Skip to main content

Web Information Retrieval Assistant - aggregate, analyze, and brief on web information

Project description

ShouChao (手抄) - Web Information Retrieval Assistant

Aggregates news from 100+ major media sources across 10 languages, converts articles to structured markdown, indexes them into a ChromaDB knowledge base, and provides AI-powered briefings and analysis for investment, immigration, and study abroad scenarios.

Features

  • 10-Language Coverage: Chinese, English, Japanese, French, Russian, German, Italian, Spanish, Portuguese, Korean
  • 100+ News Sources: Reuters, BBC, NHK, Le Monde, TASS, DW, ANSA, El Pais, Folha, Yonhap, and many more
  • Multiple Reader Backends: requests, curl_cffi, DrissionPage, Playwright with human-like browsing behavior
  • RSS + Web Reading: RSS feeds for efficient discovery, web reading for full articles
  • Markdown Storage: Articles saved as {lang}/{site}/{date}/{title}.md with YAML front matter
  • ChromaDB Knowledge Base: GangDan-compatible vector database for semantic search
  • AI Analysis: Investment, immigration, study abroad, and general news analysis via Ollama
  • News Briefings: Daily, weekly, and domain-specific briefings with LLM summarization
  • Three Interfaces: CLI, GUI (tkinter), and Web (Flask) dashboard
  • i18n: Full 10-language UI support
  • Stock Market Treemap: Real-time global market heatmap (A-Share, HK, US) with sector visualization

Requirements

  • Python >= 3.10
  • Ollama (for AI features: analysis, briefings, semantic search)

Installation

pip install shouchao

Or install from source:

git clone https://github.com/cycleuser/ShouChao.git
cd ShouChao
pip install -e .

Optional dependencies

pip install shouchao[all]        # All optional modules + readability
pip install shouchao[curl]       # curl_cffi for better browser simulation
pip install shouchao[browser]    # DrissionPage (system Chrome)
pip install shouchao[readability] # Better content extraction

Quick Start

# List available news sources
shouchao sources --language en

# Fetch news
shouchao fetch --language en --max-articles 50

# Generate daily briefing
shouchao briefing --language zh --output daily_briefing.md

# Start web dashboard
shouchao web

Stock Market Heatmap

# Open stock market treemap visualization
# Access at http://localhost:5000/market
shouchao web
# Then navigate to /market or click "股市热力图" in sidebar

The heatmap shows:

  • A-Share (Shanghai/Shenzhen): 30+ sectors, real-time data from East Money API
  • HK Stocks: Major sectors and companies
  • US Stocks: NASDAQ/NYSE tech giants and major indices
  • Global View: Combined view of all markets

Features:

  • Color-coded performance (Red=Up, Green=Down - China convention)
  • Market-cap weighted sizing
  • Sector grouping
  • Click for stock details
  • Real-time refresh

Usage

CLI Options

Command Description
shouchao fetch Fetch news from sources
shouchao search "query" Search indexed news
shouchao briefing Generate news briefings
shouchao analyze "query" Analyze news for scenarios
shouchao index Index articles into ChromaDB
shouchao sources List/manage news sources
shouchao config View/update configuration
shouchao web Start Flask web server
shouchao gui Launch tkinter GUI

Global Flags

Flag Description
-V, --version Show version
-v, --verbose Verbose output
--json JSON output
-q, --quiet Suppress non-essential output
--data-dir PATH Custom data directory

Fetch Examples

shouchao fetch --language zh --max 20              # Chinese news
shouchao fetch --language en --source "Reuters"     # Specific source
shouchao fetch --fetcher curl                       # Use curl_cffi backend
shouchao fetch --language ja,ko --max 5             # Multiple languages

Analysis Scenarios

shouchao analyze "Impact of new EU AI Act" --scenario investment
shouchao analyze "Canada immigration policy 2026" --scenario immigration
shouchao analyze "UK university tuition changes" --scenario study_abroad
shouchao analyze "Global semiconductor trends" --scenario general

Python API

from shouchao import fetch_news, search_news, analyze_news, list_sources

# List sources
result = list_sources(language="en")
print(result.data["count"])  # Number of English sources

# Fetch news
result = fetch_news(language="en", max_articles=10)
print(result.data["fetched"])  # Articles fetched

# Search
result = search_news(query="climate change", top_k=5)
for r in result.data["results"]:
    print(r["metadata"]["title"])

# Analyze
result = analyze_news(query="market trends", scenario="investment")
print(result.data["content"])

Agent Integration (OpenAI Function Calling)

ShouChao exposes OpenAI-compatible tools for LLM agents:

from shouchao.tools import TOOLS, dispatch

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=TOOLS,
)

result = dispatch(
    tool_call.function.name,
    tool_call.function.arguments,
)

CLI Help

CLI Help

Project Structure

shouchao/
├── core/
│   ├── config.py        # Configuration management
│   ├── sources.py       # 100+ news source registry
│   ├── fetcher.py       # HTTP fetcher backends
│   ├── rss.py           # RSS/Atom feed parser
│   ├── converter.py     # HTML-to-Markdown pipeline
│   ├── storage.py       # Article file storage
│   ├── indexer.py       # ChromaDB indexer
│   ├── ollama_client.py # Ollama API client
│   ├── analyzer.py      # LLM analysis engine
│   ├── briefing.py      # Briefing generator
│   └── market_map.py    # Stock market treemap data
├── cli.py               # CLI interface
├── gui.py               # Tkinter GUI
├── app.py               # Flask web server
├── api.py               # Python API
├── tools.py             # OpenAI tools
└── i18n.py              # 10-language translations

Development

git clone https://github.com/cycleuser/ShouChao.git
cd ShouChao
pip install -e ".[dev]"
python -m pytest tests/test_unified_api.py -v

License

GPL-3.0-or-later

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

shouchao-0.2.1.tar.gz (157.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

shouchao-0.2.1-py3-none-any.whl (165.8 kB view details)

Uploaded Python 3

File details

Details for the file shouchao-0.2.1.tar.gz.

File metadata

  • Download URL: shouchao-0.2.1.tar.gz
  • Upload date:
  • Size: 157.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for shouchao-0.2.1.tar.gz
Algorithm Hash digest
SHA256 289bd1e6fcbafa7bc4abb029cae8254887929684d075298f1fb616dc840b7c69
MD5 e0037f350ca6c91c92875b544a0df8fe
BLAKE2b-256 085d1cd80aa5bfa4aad26b9887e9c7d239a7c47ce91fc3f60eeb38ab0b37a597

See more details on using hashes here.

File details

Details for the file shouchao-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: shouchao-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 165.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for shouchao-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 46c3771cbf4b821aaa9bf9b1b7db7c86fa0b0e7cbf8f70a477f70667606be30f
MD5 a311bfec7854a31f76d89ad540420966
BLAKE2b-256 a355b636920e54059997a72fa0239ae69fe5064bff4989b394bea6dc688c0ef6

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page