PipePost
Open-source AI content curation pipeline -- scout, translate, and publish articles from any domain automatically.
HackerNews ─┐                                                ┌─ Blog (webhook)
Reddit     ─┤   ┌─────────┐   ┌───────────┐   ┌───────────┐  ├─ Telegram channel
RSS/Atom   ─┼──>│ Scout   ├──>│ Translate ├──>│ Publish   ├─>├─ Markdown files
DuckDuckGo ─┤   │ + Score │   │ + Adapt   │   │ + Fanout  │  ├─ OpenClaw (23+ channels)
Custom     ─┘   └─────────┘   └───────────┘   └───────────┘  └─ Custom destination
                 AI ranks        AI translates  Publishes to
                 best articles   & adapts style multiple targets
PipePost discovers articles from sources like HackerNews, Reddit, RSS feeds, and search engines, translates them to your target language using AI, and publishes to your blog or CMS. Works for any niche -- tech, business, health, lifestyle, and more.
Table of Contents
- Features
- Quick Start
- Architecture
- Use Cases
- Sources
- Destinations
- Steps
- Configuration
- Telegram Bot
- OpenClaw Integration
- Adding a Custom Source
- Adding a Custom Destination
- Supported LLM Models
- Docker
- Development
- Contributing
- License
Features
- 📡 Multiple Sources — HackerNews, Reddit, RSS/Atom, DuckDuckGo search
- 🌍 AI Translation — Full paragraph-by-paragraph translation via any LLM (DeepSeek, Claude, GPT, Qwen, etc.)
- 📝 Multiple Destinations — Webhook, Markdown, Telegram, OpenClaw (23+ channels)
- 🤖 Telegram Bot — Interactive curation: scout candidates, approve/reject via inline buttons
- 🎯 Smart Scoring — LLM-based candidate ranking by relevance, originality, and engagement
- ✍️ Style Adaptation — Adapt content for blog, Telegram, newsletter, or Twitter thread
- 📢 Fanout Publish — Publish to multiple destinations simultaneously
- 📦 Batch Mode — Process multiple articles in one run (`--batch -n 5`)
- 🔄 Composable Flows — Chain steps: dedup → scout → score → fetch → translate → adapt → publish
- 💾 Deduplication — SQLite-backed persistence prevents re-publishing across runs
- 📊 Prometheus Metrics — Pipeline runs, step durations, error counters (optional)
- ⚙️ Config-Driven Flows — Define entire pipelines in YAML without writing Python
- 🧩 Plugin Architecture — Add sources and destinations with a single file
- 🐳 Docker Ready — `docker compose up` and go
Quick Start
# Install from PyPI
pip install pipepost
# Or from source
git clone https://github.com/DenSul/pipepost && cd pipepost
pip install -e .
# Configure
export PIPEPOST_MODEL=deepseek/deepseek-chat
export DEEPSEEK_API_KEY=your-key # or OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.
# List available components
pipepost sources
pipepost destinations
pipepost flows
# Run a pipeline flow
pipepost run default --source hackernews --dest webhook --lang ru
# Preview without publishing (dry run)
pipepost run default --source hackernews --dry-run
# Batch mode — process multiple articles
pipepost run default --source hackernews --batch -n 5
# Use a config file
pipepost run --config pipepost.yaml --source hackernews
# Run interactive Telegram bot
export TELEGRAM_BOT_TOKEN=your-bot-token
pipepost bot --source hackernews --lang ru
# Check health
pipepost health
Example batch output:
$ pipepost run default --source hackernews --batch -n 3 --lang ru
Batch: processed 3 article(s)
[1] Восемь лет желания, три месяца работы с ИИ | 2026-04-05-vosem-let-zhelaniya | ok
[2] Финская сауна усиливает иммунный ответ | 2026-04-05-finskaya-sauna | ok
[3] Утечка email-адресов в BrowserStack | 2026-04-05-utechka-email | ok
Architecture
graph LR
subgraph Sources
HN[HackerNews]
RD[Reddit]
RSS[RSS/Atom]
DDG[DuckDuckGo]
end
subgraph Pipeline
Dedup[Dedup<br><i>SQLite</i>]
Scout[Scout<br><i>fetch candidates</i>]
Score[Score<br><i>LLM ranking</i>]
Fetch[Fetch<br><i>download article</i>]
Translate[Translate<br><i>LLM translation</i>]
Adapt[Adapt<br><i>style: blog/tg/thread</i>]
Validate[Validate<br><i>quality check</i>]
end
subgraph Destinations
WH[Webhook / CMS]
MD[Markdown]
TG[Telegram]
OC[OpenClaw<br><i>23+ channels</i>]
end
HN & RD & RSS & DDG --> Dedup --> Scout --> Score --> Fetch --> Translate --> Adapt --> Validate
Validate --> WH & MD & TG & OC
style Pipeline fill:#1a1a2e,stroke:#16213e,color:#e0e0e0
style Sources fill:#0f3460,stroke:#16213e,color:#e0e0e0
style Destinations fill:#533483,stroke:#16213e,color:#e0e0e0
Every step is independent and composable. The default flow runs end-to-end: loads published URLs from SQLite, scouts candidates, fetches content, translates via LLM, validates quality, publishes, and persists the URL to avoid duplicates.
Create custom flows by chaining steps:
from pipepost.core import Flow
from pipepost.steps import (
    AdaptStep, DeduplicationStep, FanoutPublishStep, FetchStep,
    PostPublishStep, ScoutStep, ScoringStep, TranslateStep, ValidateStep,
)
from pipepost.storage import SQLiteStorage

storage = SQLiteStorage(db_path="my_project.db")

my_flow = Flow(
    name="my-pipeline",
    steps=[
        DeduplicationStep(storage=storage),
        ScoutStep(max_candidates=20),
        ScoringStep(niche="tech", max_score_candidates=5),
        FetchStep(max_chars=15000),
        TranslateStep(model="deepseek/deepseek-chat", target_lang="ru"),
        AdaptStep(style="telegram"),
        ValidateStep(min_content_len=500),
        FanoutPublishStep(destination_names=["webhook", "telegram", "markdown"]),
        PostPublishStep(storage=storage),
    ],
)
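The step-chaining pattern above can be sketched generically. The runner below is a minimal, hypothetical illustration of how a sequential flow passes a shared context through each step — it is not PipePost's actual internals; the toy step classes and the context-dict contract are assumptions for the sketch:

```python
import asyncio

class UppercaseStep:
    """Toy step: uppercases the title (stands in for Translate/Adapt)."""
    async def run(self, ctx: dict) -> dict:
        ctx["title"] = ctx["title"].upper()
        return ctx

class SlugStep:
    """Toy step: derives a slug from the title (stands in for Publish)."""
    async def run(self, ctx: dict) -> dict:
        ctx["slug"] = ctx["title"].lower().replace(" ", "-")
        return ctx

class MiniFlow:
    """Sketch of a sequential pipeline: each step transforms the context in order."""
    def __init__(self, steps):
        self.steps = steps

    async def run(self, ctx: dict) -> dict:
        for step in self.steps:
            ctx = await step.run(ctx)
        return ctx

ctx = asyncio.run(MiniFlow([UppercaseStep(), SlugStep()]).run({"title": "Hello World"}))
print(ctx)  # {'title': 'HELLO WORLD', 'slug': 'hello-world'}
```

Because each step only sees the shared context, steps can be reordered or dropped without touching their neighbors — the same property that makes PipePost's flows composable.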
Use Cases
Cooking & Food
sources:
  - name: food-news
    type: reddit
    subreddits: [cooking, recipes, AskCulinary]
  - name: food-search
    type: search
    queries:
      - "new restaurant trends 2026"
      - "seasonal recipes spring"
Travel & Adventure
sources:
  - name: travel-news
    type: search
    queries:
      - "best travel destinations 2026"
      - "budget travel tips Europe"
      - "digital nomad guides"
Finance & Investing
sources:
  - name: finance-news
    type: reddit
    subreddits: [personalfinance, investing]
  - name: finance-search
    type: search
    queries:
      - "stock market analysis today"
      - "personal finance strategies"
Health & Science
sources:
  - name: health-news
    type: search
    queries:
      - "health research breakthroughs"
      - "nutrition science news"
      - "mental health studies"
Tech & Programming
sources:
  - name: tech-news
    type: search
    queries:
      - "latest AI research papers"
      - "open source projects trending"
Sports & Fitness
sources:
  - name: sports-news
    type: reddit
    subreddits: [sports, fitness, running]
  - name: sports-search
    type: search
    queries:
      - "sports highlights this week"
      - "fitness training programs"
Sources
| Source | Type | Description |
|---|---|---|
| `hackernews` | API | Top stories from Hacker News (Firebase API) |
| `reddit` | API | Top posts from configurable subreddits |
| `rss` | RSS/Atom | Any RSS or Atom feed URL |
| `search` | DuckDuckGo | Keyword-based article search |
Destinations
| Destination | Description |
|---|---|
| `webhook` | POST to any URL (WordPress REST API, Ghost, custom) |
| `markdown` | Save as `.md` files with YAML frontmatter |
| `telegram` | Post to Telegram channels/chats via Bot API |
| `openclaw` | Route through OpenClaw to 23+ messaging platforms |
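As an illustration of the shape of the `markdown` destination's output, the helper below builds a file with YAML frontmatter. The specific frontmatter keys (`title`, `slug`, `date`) are assumptions for the sketch, not PipePost's exact schema:

```python
def render_markdown(title: str, slug: str, date: str, body: str) -> str:
    """Render an article as markdown with a YAML frontmatter header."""
    # The frontmatter block is delimited by `---` lines; keys are illustrative.
    frontmatter = "\n".join([
        "---",
        f'title: "{title}"',
        f"slug: {slug}",
        f"date: {date}",
        "---",
    ])
    return f"{frontmatter}\n\n{body}\n"

doc = render_markdown("My Article", "my-article", "2026-04-05", "First paragraph.")
print(doc)
```

Static site generators such as Hugo or Jekyll consume exactly this frontmatter-plus-body layout, which is why it is a convenient publish target.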
Steps
| Step | Description |
|---|---|
| `dedup` | Load published URLs from SQLite to prevent re-processing |
| `scout` | Fetch candidates from a source (HN, Reddit, RSS, search) |
| `score` | LLM-based candidate ranking by relevance, originality, engagement |
| `fetch` | Download article, extract content as markdown, get og:image |
| `translate` | Translate via LLM (LiteLLM — supports 100+ models) |
| `adapt` | Adapt content style: blog, telegram, newsletter, or thread |
| `validate` | Check translation quality (length, ratio, required fields) |
| `publish` | Send to a single configured destination |
| `fanout_publish` | Publish to multiple destinations concurrently |
| `post_publish` | Persist published URL to SQLite for future deduplication |
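The `dedup` / `post_publish` pair can be sketched with the standard-library `sqlite3` module. The table name and schema below are assumptions for the illustration, not PipePost's actual storage layout:

```python
import sqlite3

class UrlStore:
    """Minimal SQLite-backed dedup store: remembers published URLs across runs."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS published (url TEXT PRIMARY KEY)"
        )

    def seen(self, url: str) -> bool:
        row = self.conn.execute(
            "SELECT 1 FROM published WHERE url = ?", (url,)
        ).fetchone()
        return row is not None

    def mark(self, url: str) -> None:
        # INSERT OR IGNORE makes re-marking an already-published URL a no-op.
        self.conn.execute("INSERT OR IGNORE INTO published (url) VALUES (?)", (url,))
        self.conn.commit()

store = UrlStore()
store.mark("https://a.example")          # post_publish: persist after success
candidates = ["https://a.example", "https://b.example"]
fresh = [u for u in candidates if not store.seen(u)]  # dedup: filter on next run
print(fresh)  # ['https://b.example']
```

With a file path instead of `:memory:`, the store survives process restarts, which is what makes cross-run deduplication work.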
Configuration
# pipepost.yaml
sources:
  - name: hackernews
    min_score: 100
  - name: my-blog
    type: rss
    url: https://example.com/feed.xml
  - name: daily-search
    type: search
    queries:
      - "latest news in your niche"
      - "trending articles today"

destination:
  type: webhook
  url: https://myblog.com/api/posts/auto-publish
  headers:
    Authorization: "Bearer your-token"

translate:
  model: deepseek/deepseek-chat
  target_lang: ru
  min_ratio: 0.8
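The `min_ratio: 0.8` knob suggests a length-ratio sanity check on translations: if the translated text is much shorter than the source, the LLM likely truncated or skipped content. A hedged sketch of such a check (PipePost's actual validation may differ):

```python
def looks_complete(source: str, translated: str, min_ratio: float = 0.8) -> bool:
    """Flag translations that are suspiciously short relative to the source text."""
    if not source:
        return False  # nothing to compare against; treat as invalid
    return len(translated) / len(source) >= min_ratio

print(looks_complete("a" * 1000, "b" * 900))  # True  (ratio 0.9)
print(looks_complete("a" * 1000, "b" * 500))  # False (ratio 0.5)
```

Character-count ratio is a crude but cheap proxy; languages with different script densities may need a per-language-pair threshold, which is presumably why the ratio is configurable.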
Config-Driven Flows
Define your entire pipeline in YAML -- no Python needed:
flow:
  steps: [dedup, scout, score, fetch, translate, adapt, validate, publish, post_publish]
  on_error: stop

score:
  niche: tech

adapt:
  style: telegram

publish:
  destination_name: webhook

storage:
  db_path: my_project.db
Run with: `pipepost run --config pipepost.yaml --source hackernews`
See `examples/pipepost.yaml` for a complete configuration example.
Adding a Custom Source
Create a single file — PipePost auto-discovers it:
# pipepost/sources/my_source.py
from pipepost.sources.base import Source
from pipepost.core.context import Candidate
from pipepost.core.registry import register_source

class MySource(Source):
    name = "my-source"
    source_type = "api"

    async def fetch_candidates(self, limit: int = 10) -> list[Candidate]:
        # Your logic here
        return [Candidate(url="https://...", title="...", source_name=self.name)]

register_source("my-source", MySource())
Adding a Custom Destination
# pipepost/destinations/my_cms.py
from pipepost.destinations.base import Destination
from pipepost.core.context import PublishResult, TranslatedArticle
from pipepost.core.registry import register_destination

class MyCMSDestination(Destination):
    name = "my-cms"

    async def publish(self, article: TranslatedArticle) -> PublishResult:
        # Your CMS API logic here
        return PublishResult(success=True, slug="article-slug")

register_destination("my-cms", MyCMSDestination())
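Both registration calls rely on a name-to-instance registry. A minimal, hypothetical version of that pattern is sketched below (PipePost's real registry also auto-discovers modules on disk, which is omitted here; the function names `register`/`get` are the sketch's, not PipePost's):

```python
_registry: dict[str, object] = {}

def register(name: str, instance: object) -> None:
    """Register a component under a unique name; duplicates are rejected early."""
    if name in _registry:
        raise ValueError(f"duplicate component name: {name}")
    _registry[name] = instance

def get(name: str) -> object:
    """Look up a component, with a helpful error listing what IS registered."""
    try:
        return _registry[name]
    except KeyError:
        raise KeyError(f"unknown component {name!r}; registered: {sorted(_registry)}")

class EchoDestination:
    name = "echo"

register("echo", EchoDestination())
print(type(get("echo")).__name__)  # EchoDestination
```

Failing fast on duplicate names catches the common plugin mistake — two files claiming the same component name — at import time rather than at publish time.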
Telegram Bot
PipePost includes an interactive Telegram bot for human-in-the-loop content curation:
export TELEGRAM_BOT_TOKEN=your-bot-token
pipepost bot --source hackernews --lang ru
sequenceDiagram
participant U as You (Telegram)
participant B as PipePost Bot
participant S as Source (HN/Reddit)
participant L as LLM (DeepSeek/GPT)
participant D as Destination
U->>B: /scout
B->>S: fetch_candidates(limit=5)
S-->>B: 5 articles
B->>U: Article 1: "..." [Publish] [Skip]
B->>U: Article 2: "..." [Publish] [Skip]
U->>B: tap [Publish] on Article 1
B->>B: fetch full content
B->>L: translate to Russian
L-->>B: translated article
B->>D: publish
D-->>B: slug: my-article
B->>U: Published: my-article
How it works:
- Send `/scout` to the bot
- Bot fetches candidates and shows them with inline buttons
- Tap Publish — bot runs the full pipeline (fetch → translate → validate → publish)
- Tap Skip — bot moves to the next candidate
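Under the hood, the approve/skip loop is just a queue over scouted candidates. The class below is a toy, framework-free sketch of that session state — the real buttons go through the Telegram Bot API, which is omitted here, and the class name is hypothetical:

```python
from collections import deque

class CurationSession:
    """Tracks which scouted candidates were approved or skipped."""

    def __init__(self, candidates: list[str]):
        self.pending = deque(candidates)
        self.approved: list[str] = []

    def current(self):
        """Candidate the user is currently looking at, or None when done."""
        return self.pending[0] if self.pending else None

    def publish(self) -> None:
        # "Publish" button: approve the current candidate and advance.
        self.approved.append(self.pending.popleft())

    def skip(self) -> None:
        # "Skip" button: discard the current candidate and advance.
        self.pending.popleft()

s = CurationSession(["Article 1", "Article 2", "Article 3"])
s.publish()   # approve Article 1
s.skip()      # skip Article 2
print(s.approved, s.current())  # ['Article 1'] Article 3
```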
Telegram as a destination (automated, no approval needed):
destination:
  type: telegram
  bot_token: "your-bot-token"
  chat_id: "@your_channel"
OpenClaw Integration
PipePost integrates with OpenClaw -- a self-hosted AI assistant platform with 23+ messaging channels.
graph LR
PP[PipePost] -->|publish| OC[OpenClaw Gateway]
OC --> TG[Telegram]
OC --> SL[Slack]
OC --> DC[Discord]
OC --> WA[WhatsApp]
OC --> SG[Signal]
OC --> MS[Teams]
OC --> ETC[...20+ more]
style PP fill:#1a1a2e,stroke:#16213e,color:#e0e0e0
style OC fill:#533483,stroke:#16213e,color:#e0e0e0
As a destination -- publish through OpenClaw to all connected channels:
destination:
  type: openclaw
  gateway_url: "ws://127.0.0.1:18789"
  session_id: "my-session"
  channels: ["telegram", "slack", "discord"]
As an OpenClaw skill — see examples/openclaw-skill/SKILL.md for a ready-to-use skill that lets OpenClaw agents curate content via PipePost.
Supported LLM Models
PipePost uses LiteLLM for translation, supporting 100+ models:
- DeepSeek — `deepseek/deepseek-chat`, `deepseek/deepseek-reasoner`
- OpenAI — `gpt-4o`, `gpt-4o-mini`
- Anthropic — `claude-sonnet-4-20250514`, `claude-haiku-4-20250414`
- Google — `gemini/gemini-2.0-flash`
- Local — `ollama/llama3.1`, any Ollama model
Set via the `PIPEPOST_MODEL` env var or in the YAML config.
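The precedence between the env var and the YAML config can be sketched as follows; the default model value and the `translate.model` key path are assumptions taken from the configuration examples above, and the function name is hypothetical:

```python
import os

def resolve_model(yaml_config: dict, default: str = "deepseek/deepseek-chat") -> str:
    """Env var wins over the YAML translate.model key, which wins over the default."""
    return (
        os.environ.get("PIPEPOST_MODEL")
        or yaml_config.get("translate", {}).get("model")
        or default
    )

os.environ.pop("PIPEPOST_MODEL", None)  # clean environment for the demo
print(resolve_model({"translate": {"model": "gpt-4o-mini"}}))  # gpt-4o-mini
os.environ["PIPEPOST_MODEL"] = "ollama/llama3.1"
print(resolve_model({"translate": {"model": "gpt-4o-mini"}}))  # ollama/llama3.1
```

Env-over-file precedence is the conventional twelve-factor layering: it lets a deployment override a checked-in config without editing it.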
Docker
# Build and run
docker compose up -d
# Or build manually
docker build -t pipepost .
docker run -v ./pipepost.yaml:/app/config/pipepost.yaml pipepost run default
Development
git clone https://github.com/DenSul/pipepost
cd pipepost
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,metrics]"
# Lint
ruff check pipepost/
# Type check
mypy --strict pipepost/
# Test
pytest tests/
Contributing
Contributions are welcome! Please read CONTRIBUTING.md for guidelines on how to get started.
In short: fork, branch, make your changes, run ruff check, mypy --strict, and pytest, then open a PR.
License
AGPL-3.0 -- Free to use, modify, and self-host. If you offer PipePost as a hosted service, you must open-source your modifications.
Built by Denis Sultanov