PipePost
Open-source AI content curation pipeline -- scout, translate, and publish articles from any domain automatically.
HackerNews ─┐                                                ┌─ Blog (webhook)
Reddit     ─┤   ┌─────────┐   ┌───────────┐   ┌───────────┐  ├─ Telegram channel
RSS/Atom   ─┼──>│ Scout   ├──>│ Translate ├──>│ Publish   ├─>├─ Markdown files
DuckDuckGo ─┤   │ + Score │   │ + Adapt   │   │ + Fanout  │  ├─ OpenClaw (23+ channels)
Custom     ─┘   └─────────┘   └───────────┘   └───────────┘  └─ Custom destination
                 AI ranks        AI translates  Publishes to
                 best articles   & adapts style multiple targets
PipePost discovers articles from sources like HackerNews, Reddit, RSS feeds, and search engines, translates them to your target language using AI, and publishes to your blog or CMS. Works for any niche -- tech, business, health, lifestyle, and more.
Table of Contents
- Features
- Quick Start
- Architecture
- Use Cases
- Sources
- Destinations
- Steps
- Configuration
- Telegram Bot
- OpenClaw Integration
- Adding a Custom Source
- Adding a Custom Destination
- Supported LLM Models
- Docker
- Development
- Contributing
- License
Features
- 📡 Multiple Sources — HackerNews, Reddit, RSS/Atom, DuckDuckGo search
- 🌍 AI Translation — Full paragraph-by-paragraph translation via any LLM (DeepSeek, Claude, GPT, Qwen, etc.)
- 📝 Multiple Destinations — Webhook, Markdown, Telegram, OpenClaw (23+ channels)
- 🤖 Telegram Bot — Interactive curation: scout candidates, approve/reject via inline buttons
- 🎯 Smart Scoring — LLM-based candidate ranking by relevance, originality, and engagement
- ✍️ Style Adaptation — Adapt content for blog, Telegram, newsletter, or Twitter thread
- 📢 Fanout Publish — Publish to multiple destinations simultaneously
- 📦 Batch Mode — Process multiple articles in one run (`--batch -n 5`)
- 🔄 Composable Flows — Chain steps: dedup → scout → score → fetch → translate → adapt → publish
- 💾 Deduplication — SQLite-backed persistence prevents re-publishing across runs
- 📊 Prometheus Metrics — Pipeline runs, step durations, error counters (optional)
- ⚙️ Config-Driven Flows — Define entire pipelines in YAML without writing Python
- 🧩 Plugin Architecture — Add sources and destinations with a single file
- 🐳 Docker Ready — `docker compose up` and go
Quick Start
# Install from PyPI
pip install pipepost
# Or from source
git clone https://github.com/DenSul/pipepost && cd pipepost
pip install -e .
# Configure
export PIPEPOST_MODEL=deepseek/deepseek-chat
export DEEPSEEK_API_KEY=your-key # or OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.
# List available components
pipepost sources
pipepost destinations
pipepost flows
# Run a pipeline flow
pipepost run default --source hackernews --dest webhook --lang ru
# Preview without publishing (dry run)
pipepost run default --source hackernews --dry-run
# Batch mode — process multiple articles
pipepost run default --source hackernews --batch -n 5
# Use a config file
pipepost run --config pipepost.yaml --source hackernews
# Run interactive Telegram bot
export TELEGRAM_BOT_TOKEN=your-bot-token
pipepost bot --source hackernews --lang ru
# Check health
pipepost health
Example batch output:
$ pipepost run default --source hackernews --batch -n 3 --lang ru
Batch: processed 3 article(s)
[1] Восемь лет желания, три месяца работы с ИИ | 2026-04-05-vosem-let-zhelaniya | ok
[2] Финская сауна усиливает иммунный ответ | 2026-04-05-finskaya-sauna | ok
[3] Утечка email-адресов в BrowserStack | 2026-04-05-utechka-email | ok
Architecture
graph LR
subgraph Sources
HN[HackerNews]
RD[Reddit]
RSS[RSS/Atom]
DDG[DuckDuckGo]
end
subgraph Pipeline
Dedup[Dedup<br><i>SQLite</i>]
Scout[Scout<br><i>fetch candidates</i>]
Score[Score<br><i>LLM ranking</i>]
Fetch[Fetch<br><i>download article</i>]
Translate[Translate<br><i>LLM translation</i>]
Adapt[Adapt<br><i>style: blog/tg/thread</i>]
Validate[Validate<br><i>quality check</i>]
end
subgraph Destinations
WH[Webhook / CMS]
MD[Markdown]
TG[Telegram]
OC[OpenClaw<br><i>23+ channels</i>]
end
HN & RD & RSS & DDG --> Dedup --> Scout --> Score --> Fetch --> Translate --> Adapt --> Validate
Validate --> WH & MD & TG & OC
style Pipeline fill:#1a1a2e,stroke:#16213e,color:#e0e0e0
style Sources fill:#0f3460,stroke:#16213e,color:#e0e0e0
style Destinations fill:#533483,stroke:#16213e,color:#e0e0e0
Every step is independent and composable. The default flow runs end-to-end: loads published URLs from SQLite, scouts candidates, fetches content, translates via LLM, validates quality, publishes, and persists the URL to avoid duplicates.
Create custom flows by chaining steps:
from pipepost.core import Flow
from pipepost.steps import (
    AdaptStep, DeduplicationStep, FanoutPublishStep, FetchStep,
    PostPublishStep, ScoutStep, ScoringStep, TranslateStep, ValidateStep,
)
from pipepost.storage import SQLiteStorage

storage = SQLiteStorage(db_path="my_project.db")

my_flow = Flow(
    name="my-pipeline",
    steps=[
        DeduplicationStep(storage=storage),
        ScoutStep(max_candidates=20),
        ScoringStep(niche="tech", max_score_candidates=5),
        FetchStep(max_chars=15000),
        TranslateStep(model="deepseek/deepseek-chat", target_lang="ru"),
        AdaptStep(style="telegram"),
        ValidateStep(min_content_len=500),
        FanoutPublishStep(destination_names=["webhook", "telegram", "markdown"]),
        PostPublishStep(storage=storage),
    ],
)
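The step-chaining pattern above can be sketched generically. The runner below is a minimal, hypothetical illustration of how a sequential flow passes a shared context through each step — it is not PipePost's actual internals; the toy step classes and the context-dict contract are assumptions for the sketch:

```python
import asyncio

class UppercaseStep:
    """Toy step: uppercases the title (stands in for Translate/Adapt)."""
    async def run(self, ctx: dict) -> dict:
        ctx["title"] = ctx["title"].upper()
        return ctx

class SlugStep:
    """Toy step: derives a slug from the title (stands in for Publish)."""
    async def run(self, ctx: dict) -> dict:
        ctx["slug"] = ctx["title"].lower().replace(" ", "-")
        return ctx

class MiniFlow:
    """Sketch of a sequential pipeline: each step transforms the context in order."""
    def __init__(self, steps):
        self.steps = steps

    async def run(self, ctx: dict) -> dict:
        for step in self.steps:
            ctx = await step.run(ctx)
        return ctx

ctx = asyncio.run(MiniFlow([UppercaseStep(), SlugStep()]).run({"title": "Hello World"}))
print(ctx)  # {'title': 'HELLO WORLD', 'slug': 'hello-world'}
```

Because each step only sees the shared context, steps can be reordered or dropped without touching their neighbors — the same property that makes PipePost's flows composable.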
Use Cases
Cooking & Food
sources:
  - name: food-news
    type: reddit
    subreddits: [cooking, recipes, AskCulinary]
  - name: food-search
    type: search
    queries:
      - "new restaurant trends 2026"
      - "seasonal recipes spring"
Travel & Adventure
sources:
  - name: travel-news
    type: search
    queries:
      - "best travel destinations 2026"
      - "budget travel tips Europe"
      - "digital nomad guides"
Finance & Investing
sources:
  - name: finance-news
    type: reddit
    subreddits: [personalfinance, investing]
  - name: finance-search
    type: search
    queries:
      - "stock market analysis today"
      - "personal finance strategies"
Health & Science
sources:
  - name: health-news
    type: search
    queries:
      - "health research breakthroughs"
      - "nutrition science news"
      - "mental health studies"
Tech & Programming
sources:
  - name: tech-news
    type: search
    queries:
      - "latest AI research papers"
      - "open source projects trending"
Sports & Fitness
sources:
  - name: sports-news
    type: reddit
    subreddits: [sports, fitness, running]
  - name: sports-search
    type: search
    queries:
      - "sports highlights this week"
      - "fitness training programs"
Sources
| Source | Type | Description |
|---|---|---|
| `hackernews` | API | Top stories from Hacker News (Firebase API) |
| `reddit` | API | Top posts from configurable subreddits |
| `rss` | RSS/Atom | Any RSS or Atom feed URL |
| `search` | DuckDuckGo | Keyword-based article search |
Destinations
| Destination | Description |
|---|---|
| `webhook` | POST to any URL (WordPress REST API, Ghost, custom) |
| `markdown` | Save as `.md` files with YAML frontmatter |
| `telegram` | Post to Telegram channels/chats via Bot API |
| `openclaw` | Route through OpenClaw to 23+ messaging platforms |
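As an illustration of the shape of the `markdown` destination's output, the helper below builds a file with YAML frontmatter. The specific frontmatter keys (`title`, `slug`, `date`) are assumptions for the sketch, not PipePost's exact schema:

```python
def render_markdown(title: str, slug: str, date: str, body: str) -> str:
    """Render an article as markdown with a YAML frontmatter header."""
    # The frontmatter block is delimited by `---` lines; keys are illustrative.
    frontmatter = "\n".join([
        "---",
        f'title: "{title}"',
        f"slug: {slug}",
        f"date: {date}",
        "---",
    ])
    return f"{frontmatter}\n\n{body}\n"

doc = render_markdown("My Article", "my-article", "2026-04-05", "First paragraph.")
print(doc)
```

Static site generators such as Hugo or Jekyll consume exactly this frontmatter-plus-body layout, which is why it is a convenient publish target.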
Steps
| Step | Description |
|---|---|
| `dedup` | Load published URLs from SQLite to prevent re-processing |
| `scout` | Fetch candidates from a source (HN, Reddit, RSS, search) |
| `score` | LLM-based candidate ranking by relevance, originality, engagement |
| `fetch` | Download article, extract content as markdown, get og:image |
| `translate` | Translate via LLM (LiteLLM — supports 100+ models) |
| `adapt` | Adapt content style: blog, telegram, newsletter, or thread |
| `validate` | Check translation quality (length, ratio, required fields) |
| `publish` | Send to a single configured destination |
| `fanout_publish` | Publish to multiple destinations concurrently |
| `post_publish` | Persist published URL to SQLite for future deduplication |
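The `dedup` / `post_publish` pair can be sketched with the standard-library `sqlite3` module. The table name and schema below are assumptions for the illustration, not PipePost's actual storage layout:

```python
import sqlite3

class UrlStore:
    """Minimal SQLite-backed dedup store: remembers published URLs across runs."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS published (url TEXT PRIMARY KEY)"
        )

    def seen(self, url: str) -> bool:
        row = self.conn.execute(
            "SELECT 1 FROM published WHERE url = ?", (url,)
        ).fetchone()
        return row is not None

    def mark(self, url: str) -> None:
        # INSERT OR IGNORE makes re-marking an already-published URL a no-op.
        self.conn.execute("INSERT OR IGNORE INTO published (url) VALUES (?)", (url,))
        self.conn.commit()

store = UrlStore()
store.mark("https://a.example")          # post_publish: persist after success
candidates = ["https://a.example", "https://b.example"]
fresh = [u for u in candidates if not store.seen(u)]  # dedup: filter on next run
print(fresh)  # ['https://b.example']
```

With a file path instead of `:memory:`, the store survives process restarts, which is what makes cross-run deduplication work.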
Configuration
# pipepost.yaml
sources:
  - name: hackernews
    min_score: 100
  - name: my-blog
    type: rss
    url: https://example.com/feed.xml
  - name: daily-search
    type: search
    queries:
      - "latest news in your niche"
      - "trending articles today"

destination:
  type: webhook
  url: https://myblog.com/api/posts/auto-publish
  headers:
    Authorization: "Bearer your-token"

translate:
  model: deepseek/deepseek-chat
  target_lang: ru
  min_ratio: 0.8
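The `min_ratio: 0.8` knob suggests a length-ratio sanity check on translations: if the translated text is much shorter than the source, the LLM likely truncated or skipped content. A hedged sketch of such a check (PipePost's actual validation may differ):

```python
def looks_complete(source: str, translated: str, min_ratio: float = 0.8) -> bool:
    """Flag translations that are suspiciously short relative to the source text."""
    if not source:
        return False  # nothing to compare against; treat as invalid
    return len(translated) / len(source) >= min_ratio

print(looks_complete("a" * 1000, "b" * 900))  # True  (ratio 0.9)
print(looks_complete("a" * 1000, "b" * 500))  # False (ratio 0.5)
```

Character-count ratio is a crude but cheap proxy; languages with different script densities may need a per-language-pair threshold, which is presumably why the ratio is configurable.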
Config-Driven Flows
Define your entire pipeline in YAML -- no Python needed:
flow:
  steps: [dedup, scout, score, fetch, translate, adapt, validate, publish, post_publish]
  on_error: stop

score:
  niche: tech

adapt:
  style: telegram

publish:
  destination_name: webhook

storage:
  db_path: my_project.db
Run with: `pipepost run --config pipepost.yaml --source hackernews`
See `examples/pipepost.yaml` for a complete configuration example.
Adding a Custom Source
Create a single file — PipePost auto-discovers it:
# pipepost/sources/my_source.py
from pipepost.sources.base import Source
from pipepost.core.context import Candidate
from pipepost.core.registry import register_source

class MySource(Source):
    name = "my-source"
    source_type = "api"

    async def fetch_candidates(self, limit: int = 10) -> list[Candidate]:
        # Your logic here
        return [Candidate(url="https://...", title="...", source_name=self.name)]

register_source("my-source", MySource())
Adding a Custom Destination
# pipepost/destinations/my_cms.py
from pipepost.destinations.base import Destination
from pipepost.core.context import PublishResult, TranslatedArticle
from pipepost.core.registry import register_destination

class MyCMSDestination(Destination):
    name = "my-cms"

    async def publish(self, article: TranslatedArticle) -> PublishResult:
        # Your CMS API logic here
        return PublishResult(success=True, slug="article-slug")

register_destination("my-cms", MyCMSDestination())
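Both registration calls rely on a name-to-instance registry. A minimal, hypothetical version of that pattern is sketched below (PipePost's real registry also auto-discovers modules on disk, which is omitted here; the function names `register`/`get` are the sketch's, not PipePost's):

```python
_registry: dict[str, object] = {}

def register(name: str, instance: object) -> None:
    """Register a component under a unique name; duplicates are rejected early."""
    if name in _registry:
        raise ValueError(f"duplicate component name: {name}")
    _registry[name] = instance

def get(name: str) -> object:
    """Look up a component, with a helpful error listing what IS registered."""
    try:
        return _registry[name]
    except KeyError:
        raise KeyError(f"unknown component {name!r}; registered: {sorted(_registry)}")

class EchoDestination:
    name = "echo"

register("echo", EchoDestination())
print(type(get("echo")).__name__)  # EchoDestination
```

Failing fast on duplicate names catches the common plugin mistake — two files claiming the same component name — at import time rather than at publish time.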
Telegram Bot
PipePost includes an interactive Telegram bot for human-in-the-loop content curation:
export TELEGRAM_BOT_TOKEN=your-bot-token
pipepost bot --source hackernews --lang ru
sequenceDiagram
participant U as You (Telegram)
participant B as PipePost Bot
participant S as Source (HN/Reddit)
participant L as LLM (DeepSeek/GPT)
participant D as Destination
U->>B: /scout
B->>S: fetch_candidates(limit=5)
S-->>B: 5 articles
B->>U: Article 1: "..." [Publish] [Skip]
B->>U: Article 2: "..." [Publish] [Skip]
U->>B: tap [Publish] on Article 1
B->>B: fetch full content
B->>L: translate to Russian
L-->>B: translated article
B->>D: publish
D-->>B: slug: my-article
B->>U: Published: my-article
How it works:
- Send `/scout` to the bot
- Bot fetches candidates and shows them with inline buttons
- Tap Publish — bot runs the full pipeline (fetch → translate → validate → publish)
- Tap Skip — bot moves to the next candidate
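Under the hood, the approve/skip loop is just a queue over scouted candidates. The class below is a toy, framework-free sketch of that session state — the real buttons go through the Telegram Bot API, which is omitted here, and the class name is hypothetical:

```python
from collections import deque

class CurationSession:
    """Tracks which scouted candidates were approved or skipped."""

    def __init__(self, candidates: list[str]):
        self.pending = deque(candidates)
        self.approved: list[str] = []

    def current(self):
        """Candidate the user is currently looking at, or None when done."""
        return self.pending[0] if self.pending else None

    def publish(self) -> None:
        # "Publish" button: approve the current candidate and advance.
        self.approved.append(self.pending.popleft())

    def skip(self) -> None:
        # "Skip" button: discard the current candidate and advance.
        self.pending.popleft()

s = CurationSession(["Article 1", "Article 2", "Article 3"])
s.publish()   # approve Article 1
s.skip()      # skip Article 2
print(s.approved, s.current())  # ['Article 1'] Article 3
```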
Telegram as a destination (automated, no approval needed):
destination:
  type: telegram
  bot_token: "your-bot-token"
  chat_id: "@your_channel"
OpenClaw Integration
PipePost integrates with OpenClaw -- a self-hosted AI assistant platform with 23+ messaging channels.
graph LR
PP[PipePost] -->|publish| OC[OpenClaw Gateway]
OC --> TG[Telegram]
OC --> SL[Slack]
OC --> DC[Discord]
OC --> WA[WhatsApp]
OC --> SG[Signal]
OC --> MS[Teams]
OC --> ETC[...20+ more]
style PP fill:#1a1a2e,stroke:#16213e,color:#e0e0e0
style OC fill:#533483,stroke:#16213e,color:#e0e0e0
As a destination -- publish through OpenClaw to all connected channels:
destination:
  type: openclaw
  gateway_url: "ws://127.0.0.1:18789"
  session_id: "my-session"
  channels: ["telegram", "slack", "discord"]
As an OpenClaw skill — see examples/openclaw-skill/SKILL.md for a ready-to-use skill that lets OpenClaw agents curate content via PipePost.
Supported LLM Models
PipePost uses LiteLLM for translation, supporting 100+ models:
- DeepSeek — `deepseek/deepseek-chat`, `deepseek/deepseek-reasoner`
- OpenAI — `gpt-4o`, `gpt-4o-mini`
- Anthropic — `claude-sonnet-4-20250514`, `claude-haiku-4-20250414`
- Google — `gemini/gemini-2.0-flash`
- Local — `ollama/llama3.1`, any Ollama model
Set via the `PIPEPOST_MODEL` env var or in the YAML config.
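The precedence between the env var and the YAML config can be sketched as follows; the default model value and the `translate.model` key path are assumptions taken from the configuration examples above, and the function name is hypothetical:

```python
import os

def resolve_model(yaml_config: dict, default: str = "deepseek/deepseek-chat") -> str:
    """Env var wins over the YAML translate.model key, which wins over the default."""
    return (
        os.environ.get("PIPEPOST_MODEL")
        or yaml_config.get("translate", {}).get("model")
        or default
    )

os.environ.pop("PIPEPOST_MODEL", None)  # clean environment for the demo
print(resolve_model({"translate": {"model": "gpt-4o-mini"}}))  # gpt-4o-mini
os.environ["PIPEPOST_MODEL"] = "ollama/llama3.1"
print(resolve_model({"translate": {"model": "gpt-4o-mini"}}))  # ollama/llama3.1
```

Env-over-file precedence is the conventional twelve-factor layering: it lets a deployment override a checked-in config without editing it.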
Docker
# Build and run
docker compose up -d
# Or build manually
docker build -t pipepost .
docker run -v ./pipepost.yaml:/app/config/pipepost.yaml pipepost run default
Development
git clone https://github.com/DenSul/pipepost
cd pipepost
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,metrics]"
# Lint
ruff check pipepost/
# Type check
mypy --strict pipepost/
# Test
pytest tests/
Contributing
Contributions are welcome! Please read CONTRIBUTING.md for guidelines on how to get started.
In short: fork, branch, make your changes, run ruff check, mypy --strict, and pytest, then open a PR.
License
AGPL-3.0 -- Free to use, modify, and self-host. If you offer PipePost as a hosted service, you must open-source your modifications.
Built by Denis Sultanov