Global News Intelligence Platform - aggregate, analyze, and brief on world news
Project description
ShouChao (守巢) - Global News Intelligence Platform
Aggregates news from 100+ major media sources across 10 languages, converts articles to structured markdown, indexes them into a ChromaDB knowledge base, and provides AI-powered briefings and analysis for investment, immigration, and study abroad scenarios.
Features
- 10-Language Coverage: Chinese, English, Japanese, French, Russian, German, Italian, Spanish, Portuguese, Korean
- 100+ News Sources: Reuters, BBC, NHK, Le Monde, TASS, DW, ANSA, El Pais, Folha, Yonhap, and many more
- Multiple Fetcher Backends: requests, curl_cffi, DrissionPage, Playwright with human-like browsing behavior
- RSS + Web Scraping: RSS feeds for efficient discovery, web scraping for full articles
- Markdown Storage: Articles saved as
{lang}/{site}/{date}/{title}.mdwith YAML front matter - ChromaDB Knowledge Base: GangDan-compatible vector database for semantic search
- AI Analysis: Investment, immigration, study abroad, and general news analysis via Ollama
- News Briefings: Daily, weekly, and domain-specific briefings with LLM summarization
- Three Interfaces: CLI, GUI (tkinter), and Web (Flask) dashboard
- i18n: Full 10-language UI support
Requirements
- Python >= 3.10
- Ollama (for AI features: analysis, briefings, semantic search)
Installation
pip install shouchao
Or install from source:
git clone https://github.com/cycleuser/ShouChao.git
cd ShouChao
pip install -e .
Optional dependencies
pip install shouchao[all] # All optional fetchers + readability
pip install shouchao[curl] # curl_cffi for better bot evasion
pip install shouchao[browser] # DrissionPage (system Chrome)
pip install shouchao[readability] # Better content extraction
Quick Start
# List available news sources
shouchao sources --language en
# Fetch news articles
shouchao fetch --language en --max 10
# Search indexed news
shouchao search "AI regulation"
# Generate a daily briefing (requires Ollama)
shouchao briefing --type daily
# Analyze news for investment impact (requires Ollama)
shouchao analyze "EU policy changes" --scenario investment
# Start web dashboard
shouchao web --port 5001
# Launch GUI
shouchao gui
Usage
CLI Options
| Command | Description |
|---|---|
shouchao fetch |
Fetch news from sources |
shouchao search "query" |
Search indexed news |
shouchao briefing |
Generate news briefings |
shouchao analyze "query" |
Analyze news for scenarios |
shouchao index |
Index articles into ChromaDB |
shouchao sources |
List/manage news sources |
shouchao config |
View/update configuration |
shouchao web |
Start Flask web server |
shouchao gui |
Launch tkinter GUI |
Global Flags
| Flag | Description |
|---|---|
-V, --version |
Show version |
-v, --verbose |
Verbose output |
--json |
JSON output |
-q, --quiet |
Suppress non-essential output |
--data-dir PATH |
Custom data directory |
Fetch Examples
shouchao fetch --language zh --max 20 # Chinese news
shouchao fetch --language en --source "Reuters" # Specific source
shouchao fetch --fetcher curl # Use curl_cffi backend
shouchao fetch --language ja,ko --max 5 # Multiple languages
Analysis Scenarios
shouchao analyze "Impact of new EU AI Act" --scenario investment
shouchao analyze "Canada immigration policy 2026" --scenario immigration
shouchao analyze "UK university tuition changes" --scenario study_abroad
shouchao analyze "Global semiconductor trends" --scenario general
Python API
from shouchao import fetch_news, search_news, analyze_news, list_sources
# List sources
result = list_sources(language="en")
print(result.data["count"]) # Number of English sources
# Fetch news
result = fetch_news(language="en", max_articles=10)
print(result.data["fetched"]) # Articles fetched
# Search
result = search_news(query="climate change", top_k=5)
for r in result.data["results"]:
print(r["metadata"]["title"])
# Analyze
result = analyze_news(query="market trends", scenario="investment")
print(result.data["content"])
Agent Integration (OpenAI Function Calling)
ShouChao exposes OpenAI-compatible tools for LLM agents:
from shouchao.tools import TOOLS, dispatch
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
tools=TOOLS,
)
result = dispatch(
tool_call.function.name,
tool_call.function.arguments,
)
CLI Help
Project Structure
shouchao/
├── core/
│ ├── config.py # Configuration management
│ ├── sources.py # 100+ news source registry
│ ├── fetcher.py # HTTP fetcher backends
│ ├── rss.py # RSS/Atom feed parser
│ ├── converter.py # HTML-to-Markdown pipeline
│ ├── storage.py # Article file storage
│ ├── indexer.py # ChromaDB indexer
│ ├── ollama_client.py # Ollama API client
│ ├── analyzer.py # LLM analysis engine
│ └── briefing.py # Briefing generator
├── cli.py # CLI interface
├── gui.py # Tkinter GUI
├── app.py # Flask web server
├── api.py # Python API
├── tools.py # OpenAI tools
└── i18n.py # 10-language translations
Development
git clone https://github.com/cycleuser/ShouChao.git
cd ShouChao
pip install -e ".[dev]"
python -m pytest tests/test_unified_api.py -v
License
GPL-3.0-or-later
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file shouchao-0.1.0.tar.gz.
File metadata
- Download URL: shouchao-0.1.0.tar.gz
- Upload date:
- Size: 70.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
039f8e517f7269bf2715a0c9765f28cf22c4d858384f0778035583a4d3928ee5
|
|
| MD5 |
cc09f296927a817f1a1ef4873be038b3
|
|
| BLAKE2b-256 |
47404be23b4ddd33f867a3872a21db5616cede81becaf1b6c694b0065ed060a4
|
File details
Details for the file shouchao-0.1.0-py3-none-any.whl.
File metadata
- Download URL: shouchao-0.1.0-py3-none-any.whl
- Upload date:
- Size: 75.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9fa789e5e1842a4f5ffc0e22668d2acdc4b1894da8e5057d38e049cb1c403e07
|
|
| MD5 |
7a9a9f62fcfc420f7e76ba09a0753f57
|
|
| BLAKE2b-256 |
1717dcb0bd88e97d7a86956cba0ca903daee32f17cad209b93d6b4195f7c1dcd
|