# news48

Autonomous news ingestion and verification pipeline with AI agents.

news48 collects feed entries, downloads article pages, parses structured content with an LLM, applies retention policy, and coordinates recurring work through scheduled agents that learn from their mistakes and get smarter over time.
## Table of Contents
- Features
- Architecture
- Self-Learning Agents
- Quick Start
- Usage
- Docker
- MCP Integration
- Development
- License
## Features

| Feature | Description |
|---|---|
| Feed Ingestion | RSS and Atom sources with automatic deduplication |
| Article Pipeline | End-to-end lifecycle: fetch → download → parse |
| Fact-Checking | Integrated verification workflow with verdict storage |
| Retention & Health | Automated cleanup and database health tooling |
| Autonomous Agents | Sentinel, executor, parser, and fact-checker run on schedules |
| Self-Learning | Agents persist lessons across runs and improve over time |
## Architecture

Four agents run as Periodiq-scheduled Dramatiq actors backed by Redis:

```
              ┌─────────────┐
              │  Periodiq   │
              │ cron enqueue│
              └──────┬──────┘
        ┌───────┬────┴───┬───────┐
        ▼       ▼        ▼       ▼
┌────────┬────────┬────────┬────────────┐
│Sentinel│Executor│ Parser │Fact-checker│
│observes│  runs  │ parses │  verifies  │
└───┬────┴───┬────┴───┬────┴──────┬─────┘
    └────────┴────┬───┘           │
                  ▼               ▼
            Redis queues     .lessons.md
                  │         (shared memory)
                  ▼
  Dramatiq workers + news48 CLI & tools
```
| Agent | Role |
|---|---|
| Sentinel | Observes system health, evaluates thresholds, creates fix plans |
| Executor | Claims a plan, executes steps, verifies outcomes |
| Parser | Claims downloaded articles and parses them autonomously |
| Fact-checker | Verifies claims by searching evidence and recording verdicts |
Source: Dramatiq actors and Periodiq cron schedules in `news48/core/agents/actors.py`; CLI entry points in `news48/cli/commands/agents.py`.
## Self-Learning Agents

news48 agents learn from their mistakes and accumulate knowledge across runs. When an agent discovers something worth keeping (correct command syntax, a process insight, a feed-specific quirk, an error-recovery technique), it saves the lesson to `.lessons.md`. On every subsequent run, all accumulated lessons are loaded into every agent's prompt.

```
Run 1: Executor fails with wrong timeout → discovers 600s works → saves lesson
Run 2: Executor starts with "timeout for fact-check should be 600s" already loaded
```

How it works:

- Save: agents call `save_lesson` whenever they discover something worth remembering
- Load: `compose_agent_instructions()` reads `.lessons.md` and injects lessons into the system prompt
- Cross-pollination: all agents see all lessons (executor learns from sentinel, fact-checker learns from parser)
- Idempotent: duplicate lessons are automatically skipped
- Human-auditable: plain markdown, easy to read and prune
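The save/load cycle can be sketched in a few lines. This is an illustrative stand-in, not news48's real implementation: `save_lesson` and `compose_agent_instructions` do exist in the project, but the file layout and duplicate-detection rule shown here are assumptions:

```python
from pathlib import Path

LESSONS_FILE = Path(".lessons.md")

def save_lesson(agent: str, category: str, lesson: str) -> bool:
    """Append a lesson; skip exact duplicates (idempotent). Returns True if saved."""
    entry = f"- [{agent}/{category}] {lesson}"
    existing = LESSONS_FILE.read_text() if LESSONS_FILE.exists() else ""
    if entry in existing.splitlines():
        return False  # duplicate: this lesson was already learned
    with LESSONS_FILE.open("a") as f:
        f.write(entry + "\n")
    return True

def compose_agent_instructions(base_prompt: str) -> str:
    """Inject every accumulated lesson into the system prompt (cross-pollination)."""
    lessons = LESSONS_FILE.read_text() if LESSONS_FILE.exists() else ""
    return f"{base_prompt}\n\n## Lessons learned\n{lessons}" if lessons else base_prompt

save_lesson("executor", "Command Syntax", "Use timeout=600 for fact-check")
save_lesson("executor", "Command Syntax", "Use timeout=600 for fact-check")  # skipped
print(compose_agent_instructions("You are the executor agent."))
```

Because every agent's prompt is composed from the same file, a lesson saved by one agent immediately benefits all of them.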
What agents learn:
| Category | Examples |
|---|---|
| Command Syntax | Correct flags, arguments, timeout values |
| Process Insights | How workflows actually behave in practice |
| Feed Quirks | Non-standard date formats, rate limits, HTML structures |
| Error Recovery | What fixes specific error conditions |
| Best Practices | Patterns that lead to better outcomes |
| Timing & Thresholds | Correct batch sizes, intervals, limits |
The lessons file is gitignored (instance-specific). See `news48/core/agents/skills/shared/lessons-learned.md` for the agent-facing skill documentation.
## Quick Start

### Prerequisites

- [uv](https://github.com/astral-sh/uv) for dependency management
- MySQL and Redis instances (or use Docker, below)
- An OpenAI-compatible LLM endpoint

### Installation
```shell
# 1. Install dependencies (CLI + web)
uv sync --extra all

# 2. Configure environment
cp .env.example .env
```
Install extras:

```shell
uv sync --extra cli   # CLI + agents only
uv sync --extra web   # Web server only
uv sync --extra all   # Everything
```
Edit `.env` and set the required variables:

| Variable | Required | Description |
|---|---|---|
| `DATABASE_URL` | ✅ | SQLAlchemy database URL for MySQL |
| `REDIS_URL` | | Redis broker URL for Dramatiq + Periodiq |
| `BYPARR_API_URL` | ✅ | Byparr service URL |
| `API_BASE` | ✅ | LLM API base URL |
| `API_KEY` | ✅ | LLM API key |
| `MODEL` | ✅ | Model identifier |
| `CONTEXT_WINDOW` | | Context window size (default: 1048576) |
| `SEARXNG_URL` | | SearXNG instance for search |
| `SMTP_HOST` | | SMTP server for sentinel email alerts |
| `SMTP_PORT` | | SMTP port (default: 587) |
| `SMTP_USER` | | SMTP username |
| `SMTP_PASS` | | SMTP password |
| `SMTP_FROM` | | Sender email address |
| `MONITOR_EMAIL_TO` | | Recipient for sentinel alerts |
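As a starting point, a local-development `.env` might look like the sketch below. Every value is a placeholder: the SQLAlchemy driver, credentials, ports, and model name are assumptions to adapt to your setup, not defaults shipped with news48.

```shell
DATABASE_URL=mysql+pymysql://news48:news48@localhost:3306/news48
REDIS_URL=redis://localhost:6379/0
BYPARR_API_URL=http://localhost:8191
API_BASE=https://api.openai.com/v1
API_KEY=sk-your-key-here
MODEL=gpt-4o-mini
```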
```shell
# 3. Verify installation
uv run news48 --help
```
## Usage

### Manual Pipeline

```shell
# Seed feeds from a file
uv run news48 seed seed.txt

# Run pipeline stages
uv run news48 fetch
uv run news48 download --limit 10
uv run news48 parse --limit 10

# Inspect system state
uv run news48 stats --json
uv run news48 cleanup status --json
uv run news48 cleanup health --json

# Manage lessons learned
uv run news48 lessons list                          # view all lessons
uv run news48 lessons list --agent executor --json  # filter by agent
uv run news48 lessons add --agent executor \
  --category "Command Syntax" \
  --lesson "Use timeout=600 for fact-check"         # add manually
```
### Agent Operations

One-shot runs:

```shell
uv run news48 agents run --agent sentinel
uv run news48 agents run --agent executor
uv run news48 agents run --agent parser
uv run news48 agents run --agent fact_checker
```

Continuous autonomous mode:

```shell
dramatiq news48.core.agents.actors --processes 1 --threads 8  # run workers
periodiq news48.core.agents.actors                            # enqueue cron tasks
uv run news48 agents status --json                            # inspect queues and schedules
```

`agents start`, `agents stop`, and `agents dashboard` are no longer part of the operational model. Docker manages the worker lifecycle, while Redis stores queue state.
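Continuous mode relies on Dramatiq actors that Periodiq enqueues on cron schedules. A sketch of how such an actor is typically declared, assuming the standard Dramatiq/Periodiq APIs; the broker URL, schedule, and actor body are illustrative, not news48's actual values:

```python
import dramatiq
from dramatiq.brokers.redis import RedisBroker
from periodiq import PeriodiqMiddleware, cron

# Redis is both the message broker and the queue store.
broker = RedisBroker(url="redis://localhost:6379/0")
broker.add_middleware(PeriodiqMiddleware(skip_delay=30))
dramatiq.set_broker(broker)

@dramatiq.actor(periodic=cron("*/15 * * * *"))  # every 15 minutes
def run_sentinel():
    """Observe system health and create fix plans (placeholder body)."""
    print("sentinel run")
```

The `periodiq` process scans the module for actors carrying a `periodic` schedule and enqueues them at the right times; the `dramatiq` workers then pick the messages off the Redis queues, which is why both commands above point at the same module.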
## Docker

news48 can run entirely in Docker, with separate containers for the web interface, MySQL, Redis, Dramatiq workers, the Periodiq scheduler, SearXNG, and Byparr.

### Prerequisites

- Docker and Docker Compose
- An OpenAI-compatible LLM endpoint
- API keys configured in `.env`

### Setup

```shell
# One-time setup
cp .env.example .env
# Edit .env with your API keys
```
### Seeding the Database

Before the pipeline can fetch articles, the database needs feed URLs. Create a `seed.txt` file in the project root with one RSS/Atom URL per line:

```
# seed.txt
https://feeds.arstechnica.com/arstechnica/index
https://feeds.bbci.co.uk/news/rss.xml
https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml
```

When the worker stack starts, the sentinel agent detects an empty database and creates a seed plan for the executor. The executor then runs `news48 seed seed.txt`, so seeding happens automatically as long as `seed.txt` is accessible inside the worker container.

- Development: the project root is mounted at `/app`, so `seed.txt` is automatically available at `/app/seed.txt`.
- Production: the project root is not mounted. Either build the image with `seed.txt` present in the project root (it is not excluded by `.dockerignore` and gets copied via `COPY . .`), or mount it at runtime:

```shell
docker compose -f docker-compose.yml -f docker-compose.prod.yml run --rm \
  -v ./seed.txt:/app/seed.txt:ro \
  dramatiq-worker news48 seed /app/seed.txt
```
You can also seed manually at any time:

```shell
# Development
docker compose exec dramatiq-worker news48 seed /app/seed.txt

# Production
docker compose -f docker-compose.yml -f docker-compose.prod.yml exec \
  dramatiq-worker news48 seed /app/seed.txt
```

Verify feeds were added:

```shell
docker compose exec dramatiq-worker news48 feeds list
```
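The seed-file format above is simple enough to sketch: blank lines and `#` comments are skipped, and duplicate URLs collapse to a single feed. This parser is illustrative only; news48's actual `seed` command may behave differently:

```python
def parse_seed_file(text: str) -> list[str]:
    """Extract unique feed URLs from seed-file text, preserving order."""
    seen: set[str] = set()
    urls: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if line not in seen:
            seen.add(line)
            urls.append(line)
    return urls

sample = """\
# seed.txt
https://feeds.arstechnica.com/arstechnica/index

https://feeds.bbci.co.uk/news/rss.xml
https://feeds.bbci.co.uk/news/rss.xml
"""
print(parse_seed_file(sample))
# → ['https://feeds.arstechnica.com/arstechnica/index', 'https://feeds.bbci.co.uk/news/rss.xml']
```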
### Docker Development

```shell
# Start all services with live reload
docker compose up
# Web UI available at http://localhost:8765
# Code changes auto-reload via volume mount
# RedisInsight available at http://localhost:8001
# Dozzle logs UI available at http://localhost:9999

# Run CLI commands
docker compose exec dramatiq-worker news48 stats
docker compose exec dramatiq-worker news48 feeds list

# Run one-off commands
docker compose run --rm dramatiq-worker news48 seed /app/seed.txt

# View logs
docker compose logs -f dramatiq-worker
docker compose logs -f periodiq-scheduler
docker compose logs -f web
docker compose logs -f redis

# Stop everything
docker compose down

# Fresh start (removes data)
docker compose down -v
```
### Docker Production

```shell
# Start production stack
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
# Web UI available at http://localhost:8000

# Check status
docker compose ps
docker compose logs -f web

# Backup MySQL database
docker compose exec mysql mysqldump -unews48 -pnews48 news48 > backup.sql

# Update to a new version
docker compose -f docker-compose.yml -f docker-compose.prod.yml build --no-cache
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d

# Stop production
docker compose -f docker-compose.yml -f docker-compose.prod.yml down
```
### Worker Observability

There is no dedicated Dramatiq admin UI in the current stack. Use the included Docker and Redis tooling instead:

- RedisInsight at `http://localhost:8001`: inspect Redis keys, queues, and broker state
- Dozzle at `http://localhost:9999` in development: inspect container logs for `dramatiq-worker` and `periodiq-scheduler`
- `uv run news48 agents status --json`: inspect queue and schedule state from the CLI

In other words, Dramatiq execution is observed through Redis, logs, and the CLI rather than through a standalone Dramatiq dashboard.
### Architecture

| Service | Image | Port | Role |
|---|---|---|---|
| `web` | `news48-web` (built) | 8000 | FastAPI web interface |
| `mysql` | `mysql:8.0` | 3306 | Primary relational database |
| `redis` | `redis/redis-stack` | 6379 / 8001 | Dramatiq broker + RedisInsight |
| `dramatiq-worker` | `news48-worker` (built) | none | Executes agents and pipeline actors |
| `periodiq-scheduler` | `news48-worker` (built) | none | Enqueues scheduled agent and pipeline work |
| `searxng` | `searxng/searxng:latest` | 8080 (internal) | Meta-search engine |
| `dozzle` | `amir20/dozzle:latest` | 8080 | Container log viewer |
| `byparr` | `ghcr.io/thephaseless/byparr:main` | 8191 (internal) | Anti-bot bypass |
## MCP Integration

news48 exposes operations via the Model Context Protocol (MCP) so AI assistants can interact with your news pipeline. There are two MCP interfaces: a local server (stdio, no auth) for development, and a remote endpoint (Streamable HTTP, auth required) for production.
### Local MCP Server (stdio)

The local MCP server communicates over stdio and requires no authentication, making it ideal for local AI assistants like Claude Desktop or Cursor:

```shell
uv run news48 mcp serve
```

Configure your AI assistant to use it:

```json
{
  "mcpServers": {
    "news48": {
      "command": "news48",
      "args": ["mcp", "serve"]
    }
  }
}
```

Available tools: `fetch_feeds`, `list_feeds`, `search_articles`, `get_article_detail`, `get_stats`, `parse_article`
### Remote MCP Endpoint (Streamable HTTP)

The web app exposes an authenticated MCP endpoint at `/mcp/`, protected by API keys stored in Redis. Before connecting, you need to create a key.

#### Creating an API Key

```shell
# Local development
uv run news48 mcp create-key --label "Claude Desktop"
# Output: Created MCP API key: n48-aBcDeFgHiJkLmNoPqRsTuVwXyZ...
# Store this key securely; it cannot be retrieved later.

# Inside Docker
docker compose exec dramatiq-worker news48 mcp create-key --label "My Assistant"
```

Important: the full key is only displayed once, at creation time. Copy it immediately and store it securely. The `list-keys` command only shows masked keys (e.g., `n48-aBcD...vXyZ`).
#### Managing API Keys

```shell
# List all active keys (masked)
uv run news48 mcp list-keys

# Revoke a key (takes effect immediately, no restart needed)
uv run news48 mcp revoke-key n48-aBcDeFgHiJkLmNoPqRsTuVwXyZ...

# In Docker
docker compose exec dramatiq-worker news48 mcp list-keys
docker compose exec dramatiq-worker news48 mcp revoke-key n48-...
```

Keys are stored in a Redis SET (`mcp:keys`) for O(1) lookup. Revoking a key removes it from Redis instantly, with no application restart required. Keys persist across Redis restarts (Redis is configured with AOF + periodic snapshots).
#### Connecting a Remote MCP Client

Once you have a key, configure your AI assistant:

```json
{
  "mcpServers": {
    "news48-remote": {
      "url": "https://your-domain.com/mcp/",
      "headers": {
        "Authorization": "Bearer n48-your-api-key-here"
      }
    }
  }
}
```

Available tools: `browse_articles`, `get_topic_clusters`, `article_detail`, `web_stats`
#### Testing the Endpoint

```shell
# Without auth: should return 401
curl -s -o /dev/null -w "%{http_code}" -X POST http://localhost:8000/mcp/ \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"0.1"}}}'

# With valid auth: should return 200
curl -X POST http://localhost:8000/mcp/ \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer n48-your-key-here" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"0.1"}}}'
```

Security: all keys are prefixed with `n48-` for easy detection by secret scanners. The web service requires `REDIS_URL` to verify keys; if Redis is unreachable, all MCP requests are denied (fail-closed).
## Development

```shell
# Run test suite
uv run pytest

# Format code
uv run black .
uv run isort .
```

Key test suites cover agent behavior, planner tools, lessons learned, streaming, and database claim paths.
## License

MIT. See LICENSE for details.