
🗞️ news48

Autonomous news ingestion and verification pipeline with self-learning AI agents

Python 3.12+ | License: MIT | uv


news48 collects feed entries, downloads article pages, parses structured content with an LLM, applies retention policy, and continuously coordinates recurring work through scheduled agents that learn from their mistakes and get smarter over time.
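The fetch → download → parse lifecycle can be pictured as a tiny state machine. A minimal sketch (the names here are hypothetical, not the actual news48 models):

```python
from enum import Enum

class ArticleState(Enum):
    FETCHED = "fetched"        # feed entry recorded
    DOWNLOADED = "downloaded"  # raw article page retrieved
    PARSED = "parsed"          # structured content extracted by the LLM

# Allowed transitions: each stage feeds exactly the next one
TRANSITIONS = {
    ArticleState.FETCHED: ArticleState.DOWNLOADED,
    ArticleState.DOWNLOADED: ArticleState.PARSED,
}

def advance(state: ArticleState) -> ArticleState:
    """Move an article to the next pipeline stage, refusing to skip steps."""
    if state not in TRANSITIONS:
        raise ValueError(f"{state.value} is a terminal state")
    return TRANSITIONS[state]
```

Each pipeline command (`fetch`, `download`, `parse`) then only needs to claim articles in the state it consumes.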

✨ Features

  • 📡 Feed Ingestion: RSS and Atom sources with automatic deduplication
  • 🔄 Article Pipeline: end-to-end lifecycle (fetch → download → parse)
  • 🧪 Fact-Checking: integrated verification workflow with verdict storage
  • 🧹 Retention & Health: automated cleanup and database health tooling
  • 🤖 Autonomous Agents: sentinel, executor, parser, and fact-checker run on schedules
  • 🧠 Self-Learning: agents persist lessons across runs and improve over time

๐Ÿ—๏ธ Architecture

Four agents run as Periodiq-scheduled Dramatiq actors backed by Redis:

                        ┌─────────────┐
                        │  Periodiq   │
                        │ cron enqueue│
                        └──────┬──────┘
            ┌─────────┬───────┴──────────┐
            ▼         ▼        ▼         ▼
        ┌────────┐┌────────┐┌──────┐┌────────────┐
        │Sentinel││Executor││Parser││Fact-checker│
        │observes││  runs  ││parses││  verifies  │
        └───┬────┘└───┬────┘└──┬───┘└─────┬──────┘
            └────┬────┘                   │
                 ▼                        ▼
           Redis queues              .lessons.md
                 │                 (shared memory)
                 ▼
        Dramatiq workers + news48 CLI & tools
| Agent | Role |
| --- | --- |
| Sentinel | Observes system health, evaluates thresholds, creates fix plans |
| Executor | Claims a plan, executes steps, verifies outcomes |
| Parser | Claims downloaded articles and parses them autonomously |
| Fact-checker | Verifies claims by searching evidence and recording verdicts |

Source: Dramatiq actors and Periodiq cron schedules in news48/core/agents/actors.py; CLI entry points in news48/cli/commands/agents.py.

🧠 Self-Learning Agents

news48 agents learn from their mistakes and accumulate knowledge across runs. When an agent discovers something worth keeping (correct command syntax, a process insight, a feed-specific quirk, or an error-recovery technique), it saves the lesson to .lessons.md. On every subsequent run, all accumulated lessons are loaded into every agent's prompt.

Run 1:  Executor fails with wrong timeout → discovers 600s works → saves lesson
Run 2:  Executor starts with "timeout for fact-check should be 600s" already loaded

How it works:

  • Save: agents call save_lesson whenever they discover something worth remembering
  • Load: compose_agent_instructions() reads .lessons.md and injects lessons into the system prompt
  • Cross-pollination: all agents see all lessons (executor learns from sentinel, fact-checker learns from parser)
  • Idempotent: duplicate lessons are automatically skipped
  • Human-auditable: plain markdown, easy to read and prune
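The save/load cycle can be sketched with nothing but the standard library. This mirrors the names mentioned above (save_lesson, .lessons.md) but is not the project's actual implementation; the bullet format is assumed:

```python
from pathlib import Path

LESSONS_FILE = Path(".lessons.md")  # gitignored, instance-specific

def save_lesson(agent: str, category: str, lesson: str) -> bool:
    """Append a lesson as a markdown bullet; skip exact duplicates (idempotent)."""
    entry = f"- **[{agent} / {category}]** {lesson}"
    existing = LESSONS_FILE.read_text() if LESSONS_FILE.exists() else ""
    if entry in existing:
        return False  # already learned, nothing to write
    with LESSONS_FILE.open("a") as f:
        f.write(entry + "\n")
    return True

def load_lessons() -> str:
    """Return all accumulated lessons for injection into every agent's prompt."""
    if not LESSONS_FILE.exists():
        return ""
    return "## Lessons learned\n" + LESSONS_FILE.read_text()
```

Because the file is plain markdown, pruning a stale lesson is just deleting a line.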

What agents learn:

| Category | Examples |
| --- | --- |
| Command Syntax | Correct flags, arguments, timeout values |
| Process Insights | How workflows actually behave in practice |
| Feed Quirks | Non-standard date formats, rate limits, HTML structures |
| Error Recovery | What fixes specific error conditions |
| Best Practices | Patterns that lead to better outcomes |
| Timing & Thresholds | Correct batch sizes, intervals, limits |

The lessons file is gitignored (instance-specific). See news48/core/agents/skills/shared/lessons-learned.md for the agent-facing skill documentation.

🚀 Quick Start

Prerequisites

  • Python 3.12+
  • uv package manager
  • An OpenAI-compatible LLM endpoint
  • A Byparr instance for downloading

Installation

# 1. Install dependencies (CLI + web)
uv sync --extra all

# 2. Configure environment
cp .env.example .env

Install extras:

uv sync --extra cli    # CLI + agents only
uv sync --extra web    # Web server only
uv sync --extra all    # Everything

Edit .env and set the required variables:

| Variable | Required | Description |
| --- | --- | --- |
| DATABASE_URL | ✅ | SQLAlchemy database URL for MySQL |
| REDIS_URL | | Redis broker URL for Dramatiq + Periodiq |
| BYPARR_API_URL | ✅ | Byparr service URL |
| API_BASE | ✅ | LLM API base URL |
| API_KEY | ✅ | LLM API key |
| MODEL | ✅ | Model identifier |
| CONTEXT_WINDOW | | Context window size (default: 1048576) |
| SEARXNG_URL | | SearXNG instance for search |
| SMTP_HOST | | SMTP server for sentinel email alerts |
| SMTP_PORT | | SMTP port (default: 587) |
| SMTP_USER | | SMTP username |
| SMTP_PASS | | SMTP password |
| SMTP_FROM | | Sender email address |
| MONITOR_EMAIL_TO | | Recipient for sentinel alerts |
# 3. Verify installation
uv run news48 --help

📖 Usage

Manual Pipeline

# Seed feeds from a file
uv run news48 seed seed.txt

# Run pipeline stages
uv run news48 fetch
uv run news48 download --limit 10
uv run news48 parse --limit 10

# Inspect system state
uv run news48 stats --json
uv run news48 cleanup status --json
uv run news48 cleanup health --json

# Manage lessons learned
uv run news48 lessons list                          # view all lessons
uv run news48 lessons list --agent executor --json  # filter by agent
uv run news48 lessons add --agent executor \
  --category "Command Syntax" \
  --lesson "Use timeout=600 for fact-check"         # add manually

Agent Operations

One-shot runs:

uv run news48 agents run --agent sentinel
uv run news48 agents run --agent executor
uv run news48 agents run --agent parser
uv run news48 agents run --agent fact_checker

Continuous autonomous mode:

dramatiq news48.core.agents.actors --processes 1 --threads 8  # run workers
periodiq news48.core.agents.actors                            # enqueue cron tasks
uv run news48 agents status --json                            # inspect queues and schedules

The agents start, agents stop, and agents dashboard commands are no longer part of the operational model: Docker manages the worker lifecycle, and Redis stores queue state.

๐Ÿณ Docker

news48 can run entirely in Docker with separate containers for the web interface, MySQL, Redis, Dramatiq workers, Periodiq scheduler, SearXNG, and Byparr.

Setup

# One-time setup
cp .env.example .env
# Edit .env with your API keys

Seeding the Database

Before the pipeline can fetch articles, the database needs feed URLs. Create a seed.txt file in the project root with one RSS/Atom URL per line:

# seed.txt
https://feeds.arstechnica.com/arstechnica/index
https://feeds.bbci.co.uk/news/rss.xml
https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml

When the worker stack starts, the sentinel agent automatically detects an empty database and creates a seed plan for the executor. The executor then runs news48 seed seed.txt, so seeding happens automatically as long as seed.txt is accessible inside the worker container.
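The seed file format is simple enough to parse in a few lines; a sketch of the shape (illustrative only, the real news48 seed command may differ):

```python
def parse_seed_file(text: str) -> list[str]:
    """Extract feed URLs from a seed file: one URL per line,
    '#' comment lines and blank lines skipped."""
    urls: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line not in urls:  # de-duplicate while preserving order
            urls.append(line)
    return urls
```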

Development: the project root is mounted at /app, so seed.txt is automatically available at /app/seed.txt.

Production: the project root is not mounted. Either:

  • Build the image with seed.txt present in the project root (it is not excluded by .dockerignore and gets copied via COPY . .), or
  • Mount it at runtime:
docker compose -f docker-compose.yml -f docker-compose.prod.yml run --rm \
  -v ./seed.txt:/app/seed.txt:ro \
  dramatiq-worker news48 seed /app/seed.txt

You can also seed manually at any time:

# Development
docker compose exec dramatiq-worker news48 seed /app/seed.txt

# Production
docker compose -f docker-compose.yml -f docker-compose.prod.yml exec \
  dramatiq-worker news48 seed /app/seed.txt

Verify feeds were added:

docker compose exec dramatiq-worker news48 feeds list

Docker Development

# Start all services with live reload
docker compose up

# Web UI available at http://localhost:8765
# Code changes auto-reload via volume mount
# RedisInsight available at http://localhost:8001
# Dozzle logs UI available at http://localhost:9999

# Run CLI commands
docker compose exec dramatiq-worker news48 stats
docker compose exec dramatiq-worker news48 feeds list

# Run one-off commands
docker compose run --rm dramatiq-worker news48 seed /app/seed.txt

# View logs
docker compose logs -f dramatiq-worker
docker compose logs -f periodiq-scheduler
docker compose logs -f web
docker compose logs -f redis

# Stop everything
docker compose down

# Fresh start (removes data)
docker compose down -v

Docker Production

# Start production stack
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d

# Web UI available at http://localhost:8000

# Check status
docker compose ps
docker compose logs -f web

# Backup MySQL database
docker compose exec mysql mysqldump -unews48 -pnews48 news48 > backup.sql

# Update to new version
docker compose -f docker-compose.yml -f docker-compose.prod.yml build --no-cache
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d

# Stop production
docker compose -f docker-compose.yml -f docker-compose.prod.yml down

Worker Observability

There is no dedicated Dramatiq admin UI in the current stack. Dramatiq execution is instead observed through Redis (RedisInsight), container logs (Dozzle), and the news48 CLI rather than through a standalone Dramatiq dashboard.

Architecture

| Service | Image | Port | Role |
| --- | --- | --- | --- |
| web | news48-web (built) | 8000 | FastAPI web interface |
| mysql | mysql:8.0 | 3306 | Primary relational database |
| redis | redis/redis-stack | 6379 / 8001 | Dramatiq broker + RedisInsight |
| dramatiq-worker | news48-worker (built) | none | Executes agents and pipeline actors |
| periodiq-scheduler | news48-worker (built) | none | Enqueues scheduled agent and pipeline work |
| searxng | searxng/searxng:latest | 8080 (internal) | Meta-search engine |
| dozzle | amir20/dozzle:latest | 8080 | Container log viewer |
| byparr | ghcr.io/thephaseless/byparr:main | 8191 (internal) | Anti-bot bypass |

🔌 MCP Integration

news48 exposes operations via the Model Context Protocol so AI assistants can interact with your news pipeline. There are two MCP interfaces: a local server (stdio, no auth) for development, and a remote endpoint (Streamable HTTP, auth required) for production.

Local MCP Server (stdio)

The local MCP server communicates over stdio and requires no authentication, which makes it ideal for local AI assistants like Claude Desktop or Cursor:

uv run news48 mcp serve

Configure your AI assistant to use it:

{
  "mcpServers": {
    "news48": {
      "command": "news48",
      "args": ["mcp", "serve"]
    }
  }
}

Available tools: fetch_feeds, list_feeds, search_articles, get_article_detail, get_stats, parse_article

Remote MCP Endpoint (Streamable HTTP)

The web app exposes an authenticated MCP endpoint at /mcp/ protected by API keys stored in Redis. Before connecting, you need to create a key.

Creating an API Key

# Local development
uv run news48 mcp create-key --label "Claude Desktop"
# Output: Created MCP API key: n48-aBcDeFgHiJkLmNoPqRsTuVwXyZ...
# Store this key securely; it cannot be retrieved later.

# Inside Docker
docker compose exec dramatiq-worker news48 mcp create-key --label "My Assistant"

Important: The full key is only displayed once at creation time. Copy it immediately and store it securely. The list-keys command only shows masked keys (e.g., n48-aBcD...vXyZ).

Managing API Keys

# List all active keys (masked)
uv run news48 mcp list-keys

# Revoke a key (takes effect immediately, no restart needed)
uv run news48 mcp revoke-key n48-aBcDeFgHiJkLmNoPqRsTuVwXyZ...

# In Docker
docker compose exec dramatiq-worker news48 mcp list-keys
docker compose exec dramatiq-worker news48 mcp revoke-key n48-...

Keys are stored in a Redis SET (mcp:keys) for O(1) lookup. Revoking a key removes it from Redis instantly; no application restart is required. Keys persist across Redis restarts (Redis is configured with AOF + periodic snapshots).
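The key lifecycle can be modelled in a few lines. Here a plain Python set stands in for the Redis mcp:keys SET, and the masking format is an assumption based on the example above:

```python
import secrets

KEYS: set[str] = set()  # stand-in for the Redis SET mcp:keys

def create_key() -> str:
    """Mint a new MCP API key with the n48- prefix and register it."""
    key = "n48-" + secrets.token_urlsafe(24)
    KEYS.add(key)
    return key  # shown once; only the masked form is listable later

def mask(key: str) -> str:
    """Display form used when listing keys: leading and trailing characters only."""
    return f"{key[:8]}...{key[-4:]}"

def verify(key: str) -> bool:
    """O(1) membership check, mirroring SISMEMBER on the Redis set."""
    return key in KEYS

def revoke(key: str) -> None:
    KEYS.discard(key)  # takes effect immediately, no restart
```

In the real stack the set lives in Redis, so revocation is visible to the web service on the very next request.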

Connecting a Remote MCP Client

Once you have a key, configure your AI assistant:

{
  "mcpServers": {
    "news48-remote": {
      "url": "https://your-domain.com/mcp/",
      "headers": {
        "Authorization": "Bearer n48-your-api-key-here"
      }
    }
  }
}

Available tools: browse_articles, get_topic_clusters, article_detail, web_stats

Testing the Endpoint

# Without auth: should return 401
curl -s -o /dev/null -w "%{http_code}" -X POST http://localhost:8000/mcp/ \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"0.1"}}}'

# With valid auth: should return 200
curl -X POST http://localhost:8000/mcp/ \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer n48-your-key-here" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"0.1"}}}'

Security: All keys are prefixed n48- for easy detection in secret scanners. The web service requires REDIS_URL to verify keys; if Redis is unreachable, all MCP requests are denied (fail-closed).
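The fail-closed rule can be expressed as a small helper. This is hypothetical (the real middleware lives in the web app, and the exact status codes are assumptions); the point is that a backend error denies the request rather than letting it through:

```python
from typing import Callable

def authorize(headers: dict[str, str], lookup: Callable[[str], bool]) -> int:
    """Return an HTTP status for an MCP request: 200 only on a verified bearer key.

    `lookup(key)` stands in for the Redis membership check; any exception it
    raises is treated as Redis being unreachable, and the request is denied
    (fail-closed) rather than allowed through.
    """
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer n48-"):
        return 401  # missing or malformed credential
    key = auth.removeprefix("Bearer ")
    try:
        return 200 if lookup(key) else 401
    except Exception:
        return 503  # Redis unreachable: deny, never fail open
```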

🧬 Development

# Run test suite
uv run pytest

# Format code
uv run black .
uv run isort .

Key test suites cover agent behavior, planner tools, lessons learned, streaming, and database claim paths.

📄 License

MIT. See LICENSE for details.
