Web Information Retrieval Assistant - aggregate, analyze, and brief on web information
Project description
ShouChao (手抄) - Web Information Retrieval Assistant
Aggregates news from 100+ major media sources across 10 languages, converts articles to structured markdown, indexes them into a ChromaDB knowledge base, and provides AI-powered briefings and analysis for investment, immigration, and study abroad scenarios.
Features
- 10-Language Coverage: Chinese, English, Japanese, French, Russian, German, Italian, Spanish, Portuguese, Korean
- 100+ News Sources: Reuters, BBC, NHK, Le Monde, TASS, DW, ANSA, El Pais, Folha, Yonhap, and many more
- Multiple Reader Backends: requests, curl_cffi, DrissionPage, Playwright with human-like browsing behavior
- RSS + Web Reading: RSS feeds for efficient discovery, web reading for full articles
- Markdown Storage: Articles saved as
{lang}/{site}/{date}/{title}.mdwith YAML front matter - ChromaDB Knowledge Base: GangDan-compatible vector database for semantic search
- AI Analysis: Investment, immigration, study abroad, and general news analysis via Ollama
- News Briefings: Daily, weekly, and domain-specific briefings with LLM summarization
- Three Interfaces: CLI, GUI (tkinter), and Web (Flask) dashboard
- i18n: Full 10-language UI support
- Stock Market Treemap: Real-time global market heatmap (A-Share, HK, US) with sector visualization
Requirements
- Python >= 3.10
- Ollama (for AI features: analysis, briefings, semantic search)
Installation
pip install shouchao
Or install from source:
git clone https://github.com/cycleuser/ShouChao.git
cd ShouChao
pip install -e .
Optional dependencies
pip install shouchao[all] # All optional modules + readability
pip install shouchao[curl] # curl_cffi for better browser simulation
pip install shouchao[browser] # DrissionPage (system Chrome)
pip install shouchao[readability] # Better content extraction
Quick Start
# List available news sources
shouchao sources --language en
# Fetch news
shouchao fetch --language en --max-articles 50
# Generate daily briefing
shouchao briefing --language zh --output daily_briefing.md
# Start web dashboard
shouchao web
Stock Market Heatmap
# Open stock market treemap visualization
# Access at http://localhost:5000/market
shouchao web
# Then navigate to /market or click "股市热力图" in sidebar
The heatmap shows:
- A-Share (Shanghai/Shenzhen): 30+ sectors, real-time data from East Money API
- HK Stocks: Major sectors and companies
- US Stocks: NASDAQ/NYSE tech giants and major indices
- Global View: Combined view of all markets
Features:
- Color-coded performance (Red=Up, Green=Down - China convention)
- Market-cap weighted sizing
- Sector grouping
- Click for stock details
- Real-time refresh
Usage
CLI Options
| Command | Description |
|---|---|
shouchao fetch |
Fetch news from sources |
shouchao search "query" |
Search indexed news |
shouchao briefing |
Generate news briefings |
shouchao analyze "query" |
Analyze news for scenarios |
shouchao index |
Index articles into ChromaDB |
shouchao sources |
List/manage news sources |
shouchao config |
View/update configuration |
shouchao web |
Start Flask web server |
shouchao gui |
Launch tkinter GUI |
Global Flags
| Flag | Description |
|---|---|
-V, --version |
Show version |
-v, --verbose |
Verbose output |
--json |
JSON output |
-q, --quiet |
Suppress non-essential output |
--data-dir PATH |
Custom data directory |
Fetch Examples
shouchao fetch --language zh --max 20 # Chinese news
shouchao fetch --language en --source "Reuters" # Specific source
shouchao fetch --fetcher curl # Use curl_cffi backend
shouchao fetch --language ja,ko --max 5 # Multiple languages
Analysis Scenarios
shouchao analyze "Impact of new EU AI Act" --scenario investment
shouchao analyze "Canada immigration policy 2026" --scenario immigration
shouchao analyze "UK university tuition changes" --scenario study_abroad
shouchao analyze "Global semiconductor trends" --scenario general
Python API
from shouchao import fetch_news, search_news, analyze_news, list_sources
# List sources
result = list_sources(language="en")
print(result.data["count"]) # Number of English sources
# Fetch news
result = fetch_news(language="en", max_articles=10)
print(result.data["fetched"]) # Articles fetched
# Search
result = search_news(query="climate change", top_k=5)
for r in result.data["results"]:
print(r["metadata"]["title"])
# Analyze
result = analyze_news(query="market trends", scenario="investment")
print(result.data["content"])
Agent Integration (OpenAI Function Calling)
ShouChao exposes OpenAI-compatible tools for LLM agents:
from shouchao.tools import TOOLS, dispatch
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
tools=TOOLS,
)
result = dispatch(
tool_call.function.name,
tool_call.function.arguments,
)
CLI Help
Project Structure
shouchao/
├── core/
│ ├── config.py # Configuration management
│ ├── sources.py # 100+ news source registry
│ ├── fetcher.py # HTTP fetcher backends
│ ├── rss.py # RSS/Atom feed parser
│ ├── converter.py # HTML-to-Markdown pipeline
│ ├── storage.py # Article file storage
│ ├── indexer.py # ChromaDB indexer
│ ├── ollama_client.py # Ollama API client
│ ├── analyzer.py # LLM analysis engine
│ ├── briefing.py # Briefing generator
│ └── market_map.py # Stock market treemap data
├── cli.py # CLI interface
├── gui.py # Tkinter GUI
├── app.py # Flask web server
├── api.py # Python API
├── tools.py # OpenAI tools
└── i18n.py # 10-language translations
Development
git clone https://github.com/cycleuser/ShouChao.git
cd ShouChao
pip install -e ".[dev]"
python -m pytest tests/test_unified_api.py -v
License
GPL-3.0-or-later
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file shouchao-0.2.1.tar.gz.
File metadata
- Download URL: shouchao-0.2.1.tar.gz
- Upload date:
- Size: 157.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
289bd1e6fcbafa7bc4abb029cae8254887929684d075298f1fb616dc840b7c69
|
|
| MD5 |
e0037f350ca6c91c92875b544a0df8fe
|
|
| BLAKE2b-256 |
085d1cd80aa5bfa4aad26b9887e9c7d239a7c47ce91fc3f60eeb38ab0b37a597
|
File details
Details for the file shouchao-0.2.1-py3-none-any.whl.
File metadata
- Download URL: shouchao-0.2.1-py3-none-any.whl
- Upload date:
- Size: 165.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
46c3771cbf4b821aaa9bf9b1b7db7c86fa0b0e7cbf8f70a477f70667606be30f
|
|
| MD5 |
a311bfec7854a31f76d89ad540420966
|
|
| BLAKE2b-256 |
a355b636920e54059997a72fa0239ae69fe5064bff4989b394bea6dc688c0ef6
|