Autonomous news ingestion and verification pipeline with AI agents
Project description
๐๏ธ news48
Autonomous news ingestion & verification pipeline with self-learning AI agents
Collect โ Download โ Parse โ Fact-check โ on repeat, with agents that learn.
๐ Table of Contents
- What Is It?
- Web Interface
- Pipeline
- Agents
- Autonomous Operation
- CLI Reference
- Quick Start
- Docker
- MCP Integration
- Development
- License
๐ What Is It?
news48 is a self-hosted news pipeline that:
- Ingests RSS/Atom feeds from sources you choose
- Downloads full article content (with anti-bot bypass)
- Parses unstructured HTML into structured data via LLM
- Fact-checks claims against external evidence
- Purges stale data on a 48-hour retention window
All of this runs autonomously through four AI agents that schedule themselves via Dramatiq + Periodiq. The agents also learn from mistakes โ saving lessons that carry across runs so they get smarter over time.
๐ Web Interface
news48 ships a FastAPI web interface that displays the last 48 hours of verified news. In Docker it's available at http://localhost:8765 (dev) or http://localhost:8000 (prod).
Pages
| Route | Description |
|---|---|
/ |
Homepage โ top 10 stories, trending topics, expiring articles |
/all |
All stories with optional tone filter (?sentiment=positive|neutral|negative) |
/category/{slug} |
Category view with tone filter (e.g. /category/politics?sentiment=negative) |
/article/{id}/{slug} |
Article detail with fact-check breakdown and related coverage |
/cluster/{slug} |
Topic cluster โ all stories sharing a tag |
/docs |
Architecture overview & MCP setup guide (loads autonomous operation score) |
/monitor |
Internal monitor dashboard โ server-rendered pipeline stats (noindex) |
/api/stats |
JSON stats API (requires Authorization: Bearer <mcp-api-key>) |
/sitemap.xml |
Auto-generated XML sitemap |
/robots.txt |
Robots file with sitemap reference |
/llms.txt |
Machine-readable system description for AI assistants |
/health |
Health check endpoint ({"status": "ok"}) |
Features
- AI-rewritten summaries โ clear, plain-English summaries for every parsed story
- Fact-check breakdown โ per-claim verdicts (verified, disputed, mixed, unverifiable) with evidence
- Tone filter โ filter stories by sentiment (positive, neutral, negative) across all pages
- Trending topics โ auto-generated topic clusters from article tags
- Expiring stories โ catch time-sensitive reporting before it leaves the 48-hour window
- Deduplication โ same story from multiple sources is shown once per category
- Category normalization โ consistent category names (e.g.
artificial-intelligenceandartificial intelligenceare merged) - SEO-friendly โ canonical URLs, Open Graph tags, JSON-LD structured data, XML sitemap
- Rate limiting โ 120 req/min general, 20 req/min for search
- Security headers โ X-Content-Type-Options, X-Frame-Options, CSP, Referrer-Policy
๐ Pipeline
seed.txt โโโบ seed โโโบ fetch โโโบ download โโโบ parse โโโบ fact-check
โ โ โ โ โ
โผ โผ โผ โผ โผ
DB feeds DB articles HTML โ MD structured verdicts
data
| Stage | Command | What it does |
|---|---|---|
| ๐ฑ Seed | news48 seed seed.txt |
Load feed URLs into the database |
| ๐ก Fetch | news48 fetch |
Pull RSS/Atom entries โ store as articles |
| โฌ๏ธ Download | news48 download |
Fetch full article HTML (with bypass) |
| ๐งฉ Parse | news48 parse |
Extract title, summary, categories, sentiment via LLM |
| ๐ฌ Fact-check | news48 fact-check |
Verify claims against evidence, record verdicts |
| ๐งน Cleanup | news48 cleanup purge |
Remove articles older than 48 hours |
| ๐งน Summaries | news48 cleanup summaries |
Clean truncation markers from summaries |
Most commands support --json for machine-readable output and --limit to control batch size.
๐ค Agents
Four agents run on schedules through Periodiq โ Redis โ Dramatiq:
| Agent | Cron | What it does |
|---|---|---|
| Sentinel | */5 * * * * |
Monitors health, creates fix plans, deletes bad feeds |
| Executor | * * * * * |
Claims a plan, runs its steps, verifies outcomes |
| Parser | * * * * * |
Claims articles, runs LLM parsing autonomously |
| Fact-checker | */10 * * * * |
Verifies claims, searches evidence, records verdicts |
๐ง Self-Learning
Agents save lessons when they discover something useful. On the next run, all accumulated lessons are injected into every agent's prompt:
Run 1: Executor fails with wrong timeout โ discovers 600s works โ saves lesson
Run 2: Executor starts with "timeout for fact-check should be 600s" already loaded
Lessons are stored in data/lessons.db (SQLite), cross-pollinated across agents, and human-auditable.
news48 lessons list # view all
news48 lessons list --agent executor --json # filter by agent
news48 lessons add -a executor -c "Timing" -l "Use 600s timeout for fact-checks"
๐ฏ Autonomous Operation
news48 is designed to run without human intervention. Every capability is measured by a rigorous Autonomous Operation Score โ a weighted assessment across six dimensions that evaluates how well the system starts, monitors, heals, scales, optimizes, and contains errors on its own.
Current Score: 4.9 / 5.0 โ Autonomous
System runs unattended for extended periods. Human involvement only for strategic decisions.
| Dimension | Weight | Score | What it measures |
|---|---|---|---|
| Self-starting | 15% | 5.0 | Zero-to-running without human help โ auto-seeding, startup recovery, Periodiq scheduling |
| Self-monitoring | 20% | 4.7 | Health observation, structured reports, email alerts, canonical thresholds |
| Self-healing | 25% | 5.0 | Background download/parse/feed loops, plan deadlock healing, retry logic |
| Self-scaling | 10% | 5.0 | Parallel wave execution, concurrent actor instances, batch-repeat patterns |
| Self-optimizing | 10% | 5.0 | Cross-run lesson persistence, feed curation, plan deduplication |
| Error containment | 20% | 5.0 | Quality gates, retry limits, failure isolation, runtime timeouts, error taxonomy |
How It Works
Every checkpoint is enforced at one of two layers:
[code]โ Deterministic Python code paths (e.g.,StartupRecoveryMiddlewareruns recovery on every worker boot)[instruction]โ Agent skill files that define rules the LLM follows (e.g.,fail-safely.mdmandates breaking loops after 5 repeated errors)[both]โ Defence in depth with code + instructions working together
The full methodology โ all 40 checkpoints, verification procedures, and scoring rubric โ is documented in docs/autonomous-operation-score.md. The latest assessment is loaded dynamically on the /docs page.
Score Levels
| Range | Level | Meaning |
|---|---|---|
| 4.5 โ 5.0 | Autonomous | Runs unattended. Human only for strategic decisions. |
| 3.5 โ 4.4 | Supervised | Autonomous under normal conditions. Human needed for edge cases. |
| 2.5 โ 3.4 | Assisted | Handles happy path. Human needed for failures and recovery. |
| 1.5 โ 2.4 | Manual | Frequent human input required. |
| 1.0 โ 1.4 | Prototype | Only runs with direct supervision. |
๐ CLI Reference
Pipeline Commands
news48 seed <file> # Load feed URLs from file
news48 fetch # Pull RSS/Atom feeds
news48 download # Download article content
news48 parse # Parse articles with LLM
news48 fact-check # Fact-check parsed articles
news48 briefing # Structured news briefing (top stories, trending, stats)
news48 doctor # Health check of all external services
news48 stats # Show system statistics
Resource Management
# Feeds
news48 feeds list # List all feeds
news48 feeds add <url> # Add a feed
news48 feeds info <url-or-id> # Feed details
news48 feeds update <url-or-id> -t "Title" # Update metadata
news48 feeds delete <url-or-id> # Delete feed + articles
news48 feeds rss --hours 48 --output feed.xml # Generate RSS
# Articles
news48 articles list --status parsed # List by status
news48 articles info <id-or-url> # Article details
news48 articles content <id-or-url> # Show content
news48 articles update <id> --content-file <path> # Update fields
news48 articles delete <id-or-url> # Delete article
news48 articles reset <id> --all # Reset failure flags
news48 articles feature <id> # Mark as featured
news48 articles breaking <id> # Mark as breaking
news48 articles check <id> -s verified # Set fact-check result
news48 articles claims <id> # Show per-claim verdicts
# Fetches
news48 fetches list # View fetch history
Search
news48 search articles "climate change" # Full-text search
news48 search articles "election" --sentiment negative -l 5 # Filtered
Agents & Plans
news48 agents status # Queue depths + cron schedules
news48 agents run -a parser # Run one agent (enqueue to Dramatiq)
news48 agents run -a parser --inline # Run inline (debug, no Redis needed)
news48 plans list # List all plans
news48 plans list -s pending # Filter by status
news48 plans show <plan-id> # Show plan details
news48 plans cancel <plan-id> # Cancel a plan
news48 plans remediate --apply # Repair plan corruption
Observability
news48 lessons list # View agent lessons
Retention & Health
news48 cleanup status # Retention policy stats
news48 cleanup purge # Purge old articles (default: 48h)
news48 cleanup purge --dry-run # Preview without deleting
news48 cleanup health # Database connectivity check
news48 cleanup summaries # Clean truncation markers from summaries
news48 cleanup summaries --dry-run # Preview summary cleaning
Web & MCP
news48 mcp serve # Start MCP server (stdio)
news48 mcp create-key --label "Dev" # Create API key
news48 mcp list-keys # List active keys
news48 mcp revoke-key <key> # Revoke a key
๐ก Tip: Append
--jsonto any command for machine-readable output.
๐ Quick Start
Prerequisites
Install
Option A โ Docker (recommended):
One-liner install:
curl -fsSL https://raw.githubusercontent.com/malvavisc0/news48/master/scripts/install.sh | bash
Or clone and run manually:
git clone https://github.com/malvavisc0/news48.git && cd news48
./scripts/install.sh
The interactive installer clones the repository, checks prerequisites, prompts for deployment mode (GPU or external LLM), generates secure passwords, and launches all services.
Option B โ Local (uv):
git clone https://github.com/malvavisc0/news48.git && cd news48
uv sync --extra all
cp .env.example .env
# Edit .env with your API keys (see table below)
uv run news48 --help
Extras:
uv sync --extra cli # CLI + agents only
uv sync --extra web # Web server only
uv sync --extra all # Everything
Environment Variables
| Variable | Required | Description |
|---|---|---|
DATABASE_URL |
โ | SQLAlchemy database URL (MySQL) |
BYPARR_API_URL |
โ | Byparr service URL |
API_BASE |
โ | LLM API base URL |
API_KEY |
โ | LLM API key |
MODEL |
โ | Model identifier |
REDIS_URL |
Redis URL for Dramatiq (required for agents) | |
SEARXNG_URL |
SearXNG for fact-checker evidence search | |
CONTEXT_WINDOW |
Context window size (default: 1048576) | |
WEB_HOST |
Web server host (default: 0.0.0.0) |
|
WEB_PORT |
Web server port (default: 8000) |
|
PARSER_CONCURRENCY |
Parser agent concurrency (default: 8) |
|
ARTICLE_EXTRACTION |
Body extraction mode: trafilatura or none (default: trafilatura) |
|
SMTP_HOST |
SMTP host for sentinel email alerts | |
SMTP_PORT |
SMTP port (default: 587) | |
SMTP_USER |
SMTP username | |
SMTP_PASS |
SMTP password | |
SMTP_FROM |
Sender email address | |
MONITOR_EMAIL_TO |
Recipient for sentinel alerts | |
HF_TOKEN |
HuggingFace token for gated models (Docker GPU deployments) |
Run It
# 1. Seed feeds
uv run news48 seed seed.txt
# 2. Fetch articles
uv run news48 fetch
# 3. Download content
uv run news48 download --limit 10
# 4. Parse with LLM
uv run news48 parse --limit 10
# 5. Check stats
uv run news48 stats
๐ณ Docker
news48 runs entirely in Docker with separate containers for each service.
Services
| Service | Port | Role |
|---|---|---|
web |
8000 | FastAPI web interface |
mysql |
3306 | Primary database |
redis |
6379 | Dramatiq broker + RedisInsight (8001) |
dramatiq-worker |
โ | Executes agents and pipeline actors |
periodiq-scheduler |
โ | Enqueues scheduled work |
searxng |
8080โ | Meta-search engine |
byparr |
8191โ | Anti-bot bypass |
dozzle |
9999 | Container log viewer (dev) |
โ internal only
Development
# Start with live reload
docker compose up
# Web UI โ http://localhost:8765
# RedisInsight โ http://localhost:8001
# Dozzle โ http://localhost:9999
# Run CLI inside container
docker compose exec dramatiq-worker news48 stats
docker compose exec dramatiq-worker news48 feeds list
# Logs
docker compose logs -f dramatiq-worker
docker compose logs -f web
# Stop
docker compose down # keep data
docker compose down -v # fresh start
Production
# Start
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
# Web UI โ http://localhost:8000
# Backup
docker compose exec mysql mysqldump -unews48 -pnews48 news48 > backup.sql
# Update
docker compose -f docker-compose.yml -f docker-compose.prod.yml build --no-cache
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
# Stop
docker compose -f docker-compose.yml -f docker-compose.prod.yml down
External LLM (no local GPU)
Use docker-compose.external-llm.yml to skip the built-in llama.cpp server and point at any OpenAI-compatible endpoint instead:
# Set API_BASE in .env (examples):
# API_BASE=http://host:7070/v1 # host llama.cpp
# API_BASE=https://api.openai.com/v1 # OpenAI
# API_BASE=https://api.groq.com/openai/v1 # Groq
# API_BASE=http://host:11434/v1 # Ollama
docker compose -f docker-compose.yml -f docker-compose.external-llm.yml up -d
This disables the Docker llama.cpp container and model download โ ideal for hosted LLM APIs or when running llama.cpp on the host machine.
Seeding in Docker
The sentinel agent auto-detects an empty database and creates a seed plan โ so if seed.txt is in the image, seeding happens automatically.
# Manual seed
docker compose exec dramatiq-worker news48 seed /app/seed.txt
# Verify
docker compose exec dramatiq-worker news48 feeds list
Worker Observability
- RedisInsight โ
http://localhost:8001โ inspect queues and broker state - Dozzle โ
http://localhost:9999โ container log viewer - CLI โ
news48 agents status --jsonโ queue depths and cron schedules
๐ MCP Integration
news48 exposes tools via the Model Context Protocol so AI assistants can interact with your pipeline.
Local Server (stdio)
No auth required โ ideal for Claude Desktop, Cursor, etc.
uv run news48 mcp serve
{
"mcpServers": {
"news48": {
"command": "news48",
"args": ["mcp", "serve"]
}
}
}
Tools: get_briefing, search_news, get_article, browse_category, list_categories, list_countries
Remote Endpoint (HTTP)
The web app exposes an authenticated endpoint at /mcp/. Keys are stored in Redis.
# Create a key
uv run news48 mcp create-key --label "Claude Desktop"
# โ Created MCP API key: n48-aBcDeFgHiJkLmNoPqRsTuVwXyZ...
# โ ๏ธ Copy it now โ it can't be retrieved later
# List keys (masked)
uv run news48 mcp list-keys
# Revoke a key
uv run news48 mcp revoke-key n48-...
{
"mcpServers": {
"news48-remote": {
"url": "https://your-domain.com/mcp/",
"headers": {
"Authorization": "Bearer n48-your-api-key-here"
}
}
}
}
Tools: get_briefing, search_news, get_article, browse_category, list_categories, list_countries
๐ All keys are prefixed
n48-for secret scanner detection. If Redis is unreachable, all MCP requests are denied (fail-closed).
๐งฌ Development
# Run tests
uv run pytest
# Format
uv run black .
uv run isort .
๐ License
MIT โ see LICENSE for details.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file news48-0.3.2.tar.gz.
File metadata
- Download URL: news48-0.3.2.tar.gz
- Upload date:
- Size: 246.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1279b3f2f1505769af7a9e95c29f19fe8a84ca64daab36a2ccc13a96556ab398
|
|
| MD5 |
e656ef4db07a916502ce842b7b360525
|
|
| BLAKE2b-256 |
f0245698aa9410aa4c5472c0b5f2a7af47a72c0a282c2b2baa37fb1c5ae20c93
|
Provenance
The following attestation bundles were made for news48-0.3.2.tar.gz:
Publisher:
ci.yml on malvavisc0/news48
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
news48-0.3.2.tar.gz -
Subject digest:
1279b3f2f1505769af7a9e95c29f19fe8a84ca64daab36a2ccc13a96556ab398 - Sigstore transparency entry: 1396937559
- Sigstore integration time:
-
Permalink:
malvavisc0/news48@f730a525d1d3415abcb993da299434766d88587b -
Branch / Tag:
refs/tags/v0.3.2 - Owner: https://github.com/malvavisc0
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
ci.yml@f730a525d1d3415abcb993da299434766d88587b -
Trigger Event:
push
-
Statement type:
File details
Details for the file news48-0.3.2-py3-none-any.whl.
File metadata
- Download URL: news48-0.3.2-py3-none-any.whl
- Upload date:
- Size: 269.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c39e78d5f2fd35348c6a6092b10a8502856e127390701c4865ef4c9f0bbefa87
|
|
| MD5 |
74413928b1268dbc036638ec8abbf9e5
|
|
| BLAKE2b-256 |
370c42d71d10b34de9f1823c499602400fe64cf1982cb9104b90bdbad0da1ddf
|
Provenance
The following attestation bundles were made for news48-0.3.2-py3-none-any.whl:
Publisher:
ci.yml on malvavisc0/news48
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
news48-0.3.2-py3-none-any.whl -
Subject digest:
c39e78d5f2fd35348c6a6092b10a8502856e127390701c4865ef4c9f0bbefa87 - Sigstore transparency entry: 1396937573
- Sigstore integration time:
-
Permalink:
malvavisc0/news48@f730a525d1d3415abcb993da299434766d88587b -
Branch / Tag:
refs/tags/v0.3.2 - Owner: https://github.com/malvavisc0
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
ci.yml@f730a525d1d3415abcb993da299434766d88587b -
Trigger Event:
push
-
Statement type: