# news48

Autonomous news ingestion and verification pipeline with AI agents.

news48 collects feed entries, downloads article pages, parses structured content with an LLM, applies retention policy, and coordinates recurring work through scheduled agents that learn from their mistakes and get smarter over time.
## Table of Contents
- Features
- Architecture
- Self-Learning Agents
- Quick Start
- Usage
- Docker
- MCP Integration
- Development
- License
## Features

| Feature | Description |
|---|---|
| Feed Ingestion | RSS and Atom sources with automatic deduplication |
| Article Pipeline | End-to-end lifecycle: fetch → download → parse |
| Fact-Checking | Integrated verification workflow with verdict storage |
| Retention & Health | Automated cleanup and database health tooling |
| Autonomous Agents | Sentinel, executor, parser, and fact-checker run on schedules |
| Self-Learning | Agents persist lessons across runs and improve over time |
## Architecture

Four agents run as Periodiq-scheduled Dramatiq actors backed by Redis:

```
              ┌─────────────┐
              │  Periodiq   │
              │ cron enqueue│
              └──────┬──────┘
        ┌───────┬────┴───┬───────┐
        ▼       ▼        ▼       ▼
┌────────┬────────┬────────┬────────────┐
│Sentinel│Executor│ Parser │Fact-checker│
│observes│  runs  │ parses │  verifies  │
└───┬────┴───┬────┴───┬────┴──────┬─────┘
    └────────┴────┬───┘           │
                  ▼               ▼
            Redis queues     .lessons.md
                  │         (shared memory)
                  ▼
  Dramatiq workers + news48 CLI & tools
```
| Agent | Role |
|---|---|
| Sentinel | Observes system health, evaluates thresholds, creates fix plans |
| Executor | Claims a plan, executes steps, verifies outcomes |
| Parser | Claims downloaded articles and parses them autonomously |
| Fact-checker | Verifies claims by searching evidence and recording verdicts |
Source: Dramatiq actors and Periodiq cron schedules in `news48/core/agents/actors.py`; CLI entry points in `news48/cli/commands/agents.py`.
## Self-Learning Agents

news48 agents learn from their mistakes and accumulate knowledge across runs. When an agent discovers something worth keeping (correct command syntax, a process insight, a feed-specific quirk, an error-recovery technique), it saves the lesson to `.lessons.md`. On every subsequent run, all accumulated lessons are loaded into every agent's prompt.

```
Run 1: Executor fails with wrong timeout → discovers 600s works → saves lesson
Run 2: Executor starts with "timeout for fact-check should be 600s" already loaded
```

How it works:

- Save: agents call `save_lesson` whenever they discover something worth remembering
- Load: `compose_agent_instructions()` reads `.lessons.md` and injects lessons into the system prompt
- Cross-pollination: all agents see all lessons (executor learns from sentinel, fact-checker learns from parser)
- Idempotent: duplicate lessons are automatically skipped
- Human-auditable: plain markdown, easy to read and prune
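The save/load cycle can be sketched in a few lines. This is an illustrative stand-in, not news48's real implementation: `save_lesson` and `compose_agent_instructions` do exist in the project, but the file layout and duplicate-detection rule shown here are assumptions:

```python
from pathlib import Path

LESSONS_FILE = Path(".lessons.md")

def save_lesson(agent: str, category: str, lesson: str) -> bool:
    """Append a lesson; skip exact duplicates (idempotent). Returns True if saved."""
    entry = f"- [{agent}/{category}] {lesson}"
    existing = LESSONS_FILE.read_text() if LESSONS_FILE.exists() else ""
    if entry in existing.splitlines():
        return False  # duplicate: this lesson was already learned
    with LESSONS_FILE.open("a") as f:
        f.write(entry + "\n")
    return True

def compose_agent_instructions(base_prompt: str) -> str:
    """Inject every accumulated lesson into the system prompt (cross-pollination)."""
    lessons = LESSONS_FILE.read_text() if LESSONS_FILE.exists() else ""
    return f"{base_prompt}\n\n## Lessons learned\n{lessons}" if lessons else base_prompt

save_lesson("executor", "Command Syntax", "Use timeout=600 for fact-check")
save_lesson("executor", "Command Syntax", "Use timeout=600 for fact-check")  # skipped
print(compose_agent_instructions("You are the executor agent."))
```

Because every agent's prompt is composed from the same file, a lesson saved by one agent immediately benefits all of them.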
What agents learn:
| Category | Examples |
|---|---|
| Command Syntax | Correct flags, arguments, timeout values |
| Process Insights | How workflows actually behave in practice |
| Feed Quirks | Non-standard date formats, rate limits, HTML structures |
| Error Recovery | What fixes specific error conditions |
| Best Practices | Patterns that lead to better outcomes |
| Timing & Thresholds | Correct batch sizes, intervals, limits |
The lessons file is gitignored (instance-specific). See `news48/core/agents/skills/shared/lessons-learned.md` for the agent-facing skill documentation.
## Quick Start

### Prerequisites

- [uv](https://github.com/astral-sh/uv) for dependency management
- MySQL and Redis instances (or use Docker, below)
- An OpenAI-compatible LLM endpoint

### Installation
```shell
# 1. Install dependencies (CLI + web)
uv sync --extra all

# 2. Configure environment
cp .env.example .env
```
Install extras:

```shell
uv sync --extra cli   # CLI + agents only
uv sync --extra web   # Web server only
uv sync --extra all   # Everything
```
Edit `.env` and set the required variables:

| Variable | Required | Description |
|---|---|---|
| `DATABASE_URL` | ✅ | SQLAlchemy database URL for MySQL |
| `REDIS_URL` | | Redis broker URL for Dramatiq + Periodiq |
| `BYPARR_API_URL` | ✅ | Byparr service URL |
| `API_BASE` | ✅ | LLM API base URL |
| `API_KEY` | ✅ | LLM API key |
| `MODEL` | ✅ | Model identifier |
| `CONTEXT_WINDOW` | | Context window size (default: 1048576) |
| `SEARXNG_URL` | | SearXNG instance for search |
| `SMTP_HOST` | | SMTP server for sentinel email alerts |
| `SMTP_PORT` | | SMTP port (default: 587) |
| `SMTP_USER` | | SMTP username |
| `SMTP_PASS` | | SMTP password |
| `SMTP_FROM` | | Sender email address |
| `MONITOR_EMAIL_TO` | | Recipient for sentinel alerts |
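As a starting point, a local-development `.env` might look like the sketch below. Every value is a placeholder: the SQLAlchemy driver, credentials, ports, and model name are assumptions to adapt to your setup, not defaults shipped with news48.

```shell
DATABASE_URL=mysql+pymysql://news48:news48@localhost:3306/news48
REDIS_URL=redis://localhost:6379/0
BYPARR_API_URL=http://localhost:8191
API_BASE=https://api.openai.com/v1
API_KEY=sk-your-key-here
MODEL=gpt-4o-mini
```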
```shell
# 3. Verify installation
uv run news48 --help
```
## Usage

### Manual Pipeline

```shell
# Seed feeds from a file
uv run news48 seed seed.txt

# Run pipeline stages
uv run news48 fetch
uv run news48 download --limit 10
uv run news48 parse --limit 10

# Inspect system state
uv run news48 stats --json
uv run news48 cleanup status --json
uv run news48 cleanup health --json

# Manage lessons learned
uv run news48 lessons list                          # view all lessons
uv run news48 lessons list --agent executor --json  # filter by agent
uv run news48 lessons add --agent executor \
  --category "Command Syntax" \
  --lesson "Use timeout=600 for fact-check"         # add manually
```
### Agent Operations

One-shot runs:

```shell
uv run news48 agents run --agent sentinel
uv run news48 agents run --agent executor
uv run news48 agents run --agent parser
uv run news48 agents run --agent fact_checker
```

Continuous autonomous mode:

```shell
dramatiq news48.core.agents.actors --processes 1 --threads 8  # run workers
periodiq news48.core.agents.actors                            # enqueue cron tasks
uv run news48 agents status --json                            # inspect queues and schedules
```

`agents start`, `agents stop`, and `agents dashboard` are no longer part of the operational model. Docker manages the worker lifecycle, while Redis stores queue state.
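Continuous mode relies on Dramatiq actors that Periodiq enqueues on cron schedules. A sketch of how such an actor is typically declared, assuming the standard Dramatiq/Periodiq APIs; the broker URL, schedule, and actor body are illustrative, not news48's actual values:

```python
import dramatiq
from dramatiq.brokers.redis import RedisBroker
from periodiq import PeriodiqMiddleware, cron

# Redis is both the message broker and the queue store.
broker = RedisBroker(url="redis://localhost:6379/0")
broker.add_middleware(PeriodiqMiddleware(skip_delay=30))
dramatiq.set_broker(broker)

@dramatiq.actor(periodic=cron("*/15 * * * *"))  # every 15 minutes
def run_sentinel():
    """Observe system health and create fix plans (placeholder body)."""
    print("sentinel run")
```

The `periodiq` process scans the module for actors carrying a `periodic` schedule and enqueues them at the right times; the `dramatiq` workers then pick the messages off the Redis queues, which is why both commands above point at the same module.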
## Docker

news48 can run entirely in Docker, with separate containers for the web interface, MySQL, Redis, Dramatiq workers, the Periodiq scheduler, SearXNG, and Byparr.

### Prerequisites

- Docker and Docker Compose
- An OpenAI-compatible LLM endpoint
- API keys configured in `.env`

### Setup

```shell
# One-time setup
cp .env.example .env
# Edit .env with your API keys
```
### Seeding the Database

Before the pipeline can fetch articles, the database needs feed URLs. Create a `seed.txt` file in the project root with one RSS/Atom URL per line:

```
# seed.txt
https://feeds.arstechnica.com/arstechnica/index
https://feeds.bbci.co.uk/news/rss.xml
https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml
```

When the worker stack starts, the sentinel agent detects an empty database and creates a seed plan for the executor. The executor then runs `news48 seed seed.txt`, so seeding happens automatically as long as `seed.txt` is accessible inside the worker container.

- Development: the project root is mounted at `/app`, so `seed.txt` is automatically available at `/app/seed.txt`.
- Production: the project root is not mounted. Either build the image with `seed.txt` present in the project root (it is not excluded by `.dockerignore` and gets copied via `COPY . .`), or mount it at runtime:

```shell
docker compose -f docker-compose.yml -f docker-compose.prod.yml run --rm \
  -v ./seed.txt:/app/seed.txt:ro \
  dramatiq-worker news48 seed /app/seed.txt
```
You can also seed manually at any time:

```shell
# Development
docker compose exec dramatiq-worker news48 seed /app/seed.txt

# Production
docker compose -f docker-compose.yml -f docker-compose.prod.yml exec \
  dramatiq-worker news48 seed /app/seed.txt
```

Verify feeds were added:

```shell
docker compose exec dramatiq-worker news48 feeds list
```
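The seed-file format above is simple enough to sketch: blank lines and `#` comments are skipped, and duplicate URLs collapse to a single feed. This parser is illustrative only; news48's actual `seed` command may behave differently:

```python
def parse_seed_file(text: str) -> list[str]:
    """Extract unique feed URLs from seed-file text, preserving order."""
    seen: set[str] = set()
    urls: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if line not in seen:
            seen.add(line)
            urls.append(line)
    return urls

sample = """\
# seed.txt
https://feeds.arstechnica.com/arstechnica/index

https://feeds.bbci.co.uk/news/rss.xml
https://feeds.bbci.co.uk/news/rss.xml
"""
print(parse_seed_file(sample))
# → ['https://feeds.arstechnica.com/arstechnica/index', 'https://feeds.bbci.co.uk/news/rss.xml']
```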
### Docker Development

```shell
# Start all services with live reload
docker compose up
# Web UI available at http://localhost:8765
# Code changes auto-reload via volume mount
# RedisInsight available at http://localhost:8001
# Dozzle logs UI available at http://localhost:9999

# Run CLI commands
docker compose exec dramatiq-worker news48 stats
docker compose exec dramatiq-worker news48 feeds list

# Run one-off commands
docker compose run --rm dramatiq-worker news48 seed /app/seed.txt

# View logs
docker compose logs -f dramatiq-worker
docker compose logs -f periodiq-scheduler
docker compose logs -f web
docker compose logs -f redis

# Stop everything
docker compose down

# Fresh start (removes data)
docker compose down -v
```
### Docker Production

```shell
# Start production stack
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
# Web UI available at http://localhost:8000

# Check status
docker compose ps
docker compose logs -f web

# Backup MySQL database
docker compose exec mysql mysqldump -unews48 -pnews48 news48 > backup.sql

# Update to a new version
docker compose -f docker-compose.yml -f docker-compose.prod.yml build --no-cache
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d

# Stop production
docker compose -f docker-compose.yml -f docker-compose.prod.yml down
```
### Worker Observability

There is no dedicated Dramatiq admin UI in the current stack. Use the included Docker and Redis tooling instead:

- RedisInsight at `http://localhost:8001`: inspect Redis keys, queues, and broker state
- Dozzle at `http://localhost:9999` in development: inspect container logs for `dramatiq-worker` and `periodiq-scheduler`
- `uv run news48 agents status --json`: inspect queue and schedule state from the CLI

In other words, Dramatiq execution is observed through Redis, logs, and the CLI rather than through a standalone Dramatiq dashboard.
### Architecture

| Service | Image | Port | Role |
|---|---|---|---|
| `web` | `news48-web` (built) | 8000 | FastAPI web interface |
| `mysql` | `mysql:8.0` | 3306 | Primary relational database |
| `redis` | `redis/redis-stack` | 6379 / 8001 | Dramatiq broker + RedisInsight |
| `dramatiq-worker` | `news48-worker` (built) | none | Executes agents and pipeline actors |
| `periodiq-scheduler` | `news48-worker` (built) | none | Enqueues scheduled agent and pipeline work |
| `searxng` | `searxng/searxng:latest` | 8080 (internal) | Meta-search engine |
| `dozzle` | `amir20/dozzle:latest` | 8080 | Container log viewer |
| `byparr` | `ghcr.io/thephaseless/byparr:main` | 8191 (internal) | Anti-bot bypass |
## MCP Integration

news48 exposes operations via the Model Context Protocol (MCP) so AI assistants can interact with your news pipeline. There are two MCP interfaces: a local server (stdio, no auth) for development, and a remote endpoint (Streamable HTTP, auth required) for production.
### Local MCP Server (stdio)

The local MCP server communicates over stdio and requires no authentication, making it ideal for local AI assistants like Claude Desktop or Cursor:

```shell
uv run news48 mcp serve
```

Configure your AI assistant to use it:

```json
{
  "mcpServers": {
    "news48": {
      "command": "news48",
      "args": ["mcp", "serve"]
    }
  }
}
```

Available tools: `fetch_feeds`, `list_feeds`, `search_articles`, `get_article_detail`, `get_stats`, `parse_article`
### Remote MCP Endpoint (Streamable HTTP)

The web app exposes an authenticated MCP endpoint at `/mcp/`, protected by API keys stored in Redis. Before connecting, you need to create a key.

#### Creating an API Key

```shell
# Local development
uv run news48 mcp create-key --label "Claude Desktop"
# Output: Created MCP API key: n48-aBcDeFgHiJkLmNoPqRsTuVwXyZ...
# Store this key securely; it cannot be retrieved later.

# Inside Docker
docker compose exec dramatiq-worker news48 mcp create-key --label "My Assistant"
```

Important: the full key is only displayed once, at creation time. Copy it immediately and store it securely. The `list-keys` command only shows masked keys (e.g., `n48-aBcD...vXyZ`).
#### Managing API Keys

```shell
# List all active keys (masked)
uv run news48 mcp list-keys

# Revoke a key (takes effect immediately, no restart needed)
uv run news48 mcp revoke-key n48-aBcDeFgHiJkLmNoPqRsTuVwXyZ...

# In Docker
docker compose exec dramatiq-worker news48 mcp list-keys
docker compose exec dramatiq-worker news48 mcp revoke-key n48-...
```

Keys are stored in a Redis SET (`mcp:keys`) for O(1) lookup. Revoking a key removes it from Redis instantly, with no application restart required. Keys persist across Redis restarts (Redis is configured with AOF + periodic snapshots).
#### Connecting a Remote MCP Client

Once you have a key, configure your AI assistant:

```json
{
  "mcpServers": {
    "news48-remote": {
      "url": "https://your-domain.com/mcp/",
      "headers": {
        "Authorization": "Bearer n48-your-api-key-here"
      }
    }
  }
}
```

Available tools: `browse_articles`, `get_topic_clusters`, `article_detail`, `web_stats`
#### Testing the Endpoint

```shell
# Without auth: should return 401
curl -s -o /dev/null -w "%{http_code}" -X POST http://localhost:8000/mcp/ \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"0.1"}}}'

# With valid auth: should return 200
curl -X POST http://localhost:8000/mcp/ \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer n48-your-key-here" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"0.1"}}}'
```

Security: all keys are prefixed with `n48-` for easy detection by secret scanners. The web service requires `REDIS_URL` to verify keys; if Redis is unreachable, all MCP requests are denied (fail-closed).
## Development

```shell
# Run test suite
uv run pytest

# Format code
uv run black .
uv run isort .
```

Key test suites cover agent behavior, planner tools, lessons learned, streaming, and database claim paths.
## License

MIT. See LICENSE for details.