
blog-pipeline

AI blog generator that doesn't sound like AI.

A 7-pass pipeline with multi-LLM support (Anthropic, OpenAI, LiteLLM), pluggable storage backends (filesystem, Supabase, PostgreSQL, WordPress, Notion, Contentful), a configurable humanizer that strips AI writing tells, SEO analysis, AI content detection scoring, and a quality audit gate.


Install

pip install blog-pipeline

With optional providers/backends:

pip install "blog-pipeline[openai]"         # OpenAI support
pip install "blog-pipeline[litellm]"        # LiteLLM (any provider)
pip install "blog-pipeline[postgres]"       # PostgreSQL backend
pip install "blog-pipeline[all]"            # everything

From source:

git clone https://github.com/nometria/blog-pipeline
cd blog-pipeline
pip install -e ".[dev]"

Quick Start

# Set your API key
export ANTHROPIC_API_KEY=sk-ant-...

# Generate 5 blog posts (writes to ./blogs/)
blog-generate --count 5 --niche "developer tooling and SaaS"

# Re-humanize existing drafts only
blog-generate --passes 4

# Full pipeline with audit gate
blog-generate --passes 1-7 --audit --audit-threshold 60

# Run tests
pytest tests/ -v

Pipeline Passes

| Pass | What it does |
|------|--------------|
| 0 | Fetch existing titles from backend (prevents duplicates) |
| 1 | Identify new topics (skips anything already written) |
| 2 | Plan structure per topic (comparison / deep-dive / case-study / how-to / opinion) |
| 3 | Generate full markdown content |
| 4 | Humanizer pass with AI detection scoring (before/after) |
| 5 | Add internal links across all posts |
| 6 | Push to configured backend + update local registry |
| 7 | Audit gate: score posts, reject weak ones (optional, --audit) |

LLM Providers

Set LLM_PROVIDER and (optionally) LLM_MODEL:

| Provider | Env var | Default model | Package |
|----------|---------|---------------|---------|
| anthropic (default) | ANTHROPIC_API_KEY | claude-opus-4-5 | included |
| openai | OPENAI_API_KEY | gpt-4o | pip install "blog-pipeline[openai]" |
| litellm | varies by model | claude-opus-4-5 | pip install "blog-pipeline[litellm]" |

# Use OpenAI instead of Anthropic
export LLM_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export LLM_MODEL=gpt-4o
blog-generate --count 3

Use the LLM abstraction in your own code:

from blog_pipeline import ask_llm
response = ask_llm("Explain Docker in 3 sentences", system="Be concise")

Storage Backends

Set BLOG_BACKEND to choose where posts are stored:

| Backend | Env vars | Extra deps | Description |
|---------|----------|------------|-------------|
| filesystem (default) | BLOGS_DIR | none | Markdown files + _metadata.json |
| supabase | SUPABASE_URL, SUPABASE_SERVICE_KEY | none | PostgREST API via urllib |
| postgres | POSTGRES_DSN | psycopg2 | Direct PostgreSQL connection |
| wordpress | WP_URL, WP_USER, WP_APP_PASSWORD | none | WP REST API via urllib |
| notion | NOTION_API_KEY, NOTION_DATABASE_ID | none | Notion API via urllib |
| contentful | CONTENTFUL_SPACE_ID, CONTENTFUL_MGMT_TOKEN | none | Contentful Management API |

# Push to WordPress
export BLOG_BACKEND=wordpress
export WP_URL=https://myblog.com
export WP_USER=admin
export WP_APP_PASSWORD=xxxx-xxxx-xxxx-xxxx
blog-generate --passes 1-6 --count 3

Use backends programmatically:

from blog_pipeline import get_backend
backend = get_backend("filesystem")   # or "supabase", "wordpress", etc.
backend.push_post({"title": "Hello", "content": "# Hello\n\nWorld.", "published": True})
titles = backend.fetch_titles()
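
The same object exposes the rest of the BlogBackend interface (see the class in the API reference below):

backend.unpublish("Hello")                         # hide a weak post
drafts = backend.list_posts(published_only=False)  # include unpublished drafts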

The Humanizer

The humanizer enforces strict rules to remove AI writing tells. Rules are configurable via YAML.

Default rules include

  • 50+ banned words (leverage, seamless, robust, delve, paradigm, etc.)
  • 17+ banned phrases ("in conclusion", "it's worth noting", "dive deep into")
  • 12+ flagged sentence starters (Furthermore, Moreover, Additionally)
  • No em-dashes, no semicolons connecting sentences, no emojis
  • Contractions required (it's, we're, don't)
  • Active voice only
  • Max 1 exclamation mark per post
  • Paragraph opening variety enforcement

Customize rules

Create a humanizer_rules.yml in your project root or set HUMANIZER_RULES:

banned_words:
  - "leverage"
  - "synergy"
  - "my-custom-banned-word"
max_exclamations: 2
require_contractions: true
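
The exported rule helpers can load the same configuration in code (a sketch using the load_rules and build_system_prompt names from the API reference; the exact signatures are an assumption):

from blog_pipeline import load_rules, build_system_prompt

rules = load_rules()                  # assumed: picks up humanizer_rules.yml / HUMANIZER_RULES
prompt = build_system_prompt(rules)   # assumed: rules-driven system prompt for the LLM
print(prompt[:200])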

Standalone usage

from blog_pipeline import humanize_post, check_banned_words

clean = humanize_post(my_ai_draft)
issues = check_banned_words(clean)
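
check_ai_tells (also exported) gives a more detailed report than check_banned_words; a sketch, assuming it takes the content string and returns an iterable of findings:

from blog_pipeline import check_ai_tells

for tell in check_ai_tells(clean):   # assumed return shape: iterable of findings
    print(tell)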

With AI detection scoring

from blog_pipeline.humanizer import humanize_post_scored

result = humanize_post_scored(my_draft)
print(f"AI score: {result['ai_score_before']:.2f} -> {result['ai_score_after']:.2f}")
print(f"Improvement: {result['improvement']:.2f}")
print(result["content"])

AI Detection

Heuristic-based AI content detector. Pure Python, no external API calls.

| Heuristic | Weight |
|-----------|--------|
| Banned word density | 25% |
| Sentence uniformity | 20% |
| Paragraph opening variety | 15% |
| Passive voice ratio | 15% |
| Sentence length variance | 10% |
| Em-dash density | 10% |
| Exclamation density | 5% |

from blog_pipeline import score_ai

result = score_ai(content)
print(f"AI score: {result['ai_score']:.2f}")  # 0.0 = human, 1.0 = AI
for flag in result["flags"]:
    print(f"  - {flag}")

SEO Analysis

Built-in SEO scoring with Flesch-Kincaid readability (pure Python syllable counting).

from blog_pipeline import score_seo, calculate_readability

seo = score_seo(content, primary_keyword="deploy")
print(f"SEO score: {seo['seo_score']}/100")

readability = calculate_readability(content)
print(f"Grade level: {readability['flesch_kincaid_grade']}")

SEO factors scored: word count (20 pts), heading structure (15 pts), keyword density (20 pts), readability (15 pts), internal links (10 pts), meta description quality (10 pts), keyword in headings (10 pts). The point budget sums to 100, so seo_score reads directly as a score out of 100.


Audit

Score existing blog posts and optionally unpublish weak ones.

# Score all blogs
blog-audit --dir blogs

# Include SEO scoring
blog-audit --seo

# Unpublish posts below threshold via backend
blog-audit --min-score 60 --unpublish

# Re-humanize weak posts
blog-audit --fix

# JSON output
blog-audit --json

Composite scoring: quality 60% + AI detection 20% + SEO 20%.

from blog_pipeline.audit import score_post, run_audit
from pathlib import Path

result = score_post(content, seo=True)
print(f"Score: {result['score']}, Grade: {result['grade']}")

results = run_audit(Path("blogs"), min_score=60, seo=True)

CLI Reference

blog-generate

blog-generate [OPTIONS]

Options:
  --passes RANGE       Pipeline passes to run (default: 1-6)
  --count N            Number of blogs to generate (default: 5)
  --niche TEXT         Topic niche (default: "developer tooling and infrastructure")
  --audit              Enable Pass 7 audit gate
  --audit-threshold N  Minimum audit score to keep a post (default: 50)

blog-audit

blog-audit [OPTIONS]

Options:
  --dir PATH           Blog directory (default: blogs)
  --min-score N        Minimum score threshold (default: 50)
  --seo                Include SEO scoring
  --unpublish          Unpublish posts below threshold via backend
  --fix                Re-humanize posts below threshold
  --json               Output as JSON

blog-humanize

blog-humanize [FILE] [OPTIONS]

Arguments:
  FILE                 Markdown file (default: stdin)

Options:
  --check-only         Only report AI tells, don't rewrite
  --in-place           Overwrite input file
  --score              Show AI detection scores

GitHub Action

Use blog-pipeline as a GitHub Action for scheduled blog generation:

- uses: nometria/blog-pipeline@v0.2
  with:
    passes: "1-7"
    count: "3"
    niche: "developer tooling"
    audit: "true"
    audit-threshold: "60"
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
    BLOG_BACKEND: supabase
    SUPABASE_URL: ${{ secrets.SUPABASE_URL }}
    SUPABASE_SERVICE_KEY: ${{ secrets.SUPABASE_SERVICE_KEY }}

See examples/blog-pipeline-action.yml for a complete workflow with weekly schedule and manual trigger.


Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| LLM_PROVIDER | no | anthropic | LLM provider: anthropic, openai, litellm |
| LLM_MODEL | no | per-provider | Model override |
| ANTHROPIC_API_KEY | if anthropic | - | Anthropic API key |
| OPENAI_API_KEY | if openai | - | OpenAI API key |
| BLOG_BACKEND | no | filesystem | Storage backend |
| BLOGS_DIR | no | ./blogs | Local blog directory |
| BLOG_AUTHOR | no | Your Team | Default author name |
| BLOG_AUTHOR_TITLE | no | Engineering & Product | Default author title |
| BLOG_AUTHOR_IMAGE | no | - | Author image URL |
| HUMANIZER_RULES | no | - | Path to custom rules YAML |
| SUPABASE_URL | if supabase | - | Supabase project URL |
| SUPABASE_SERVICE_KEY | if supabase | - | Supabase service key |
| SUPABASE_BLOGS_TABLE | no | blogs | Supabase table name |
| POSTGRES_DSN | if postgres | - | PostgreSQL connection string |
| WP_URL | if wordpress | - | WordPress site URL |
| WP_USER | if wordpress | - | WordPress username |
| WP_APP_PASSWORD | if wordpress | - | WordPress application password |
| NOTION_API_KEY | if notion | - | Notion integration token |
| NOTION_DATABASE_ID | if notion | - | Notion database ID |
| CONTENTFUL_SPACE_ID | if contentful | - | Contentful space ID |
| CONTENTFUL_MGMT_TOKEN | if contentful | - | Contentful management token |
| CONTENTFUL_ENVIRONMENT | no | master | Contentful environment |

API Reference

Core

from blog_pipeline import (
    ask_llm,                # LLM abstraction (anthropic/openai/litellm)
    get_backend,            # Backend factory
    humanize_post,          # Humanize content
    check_banned_words,     # Check for AI tells
    check_ai_tells,         # Detailed AI tell analysis
    humanize_post_scored,   # Humanize with before/after AI scores
    score_ai,               # AI detection scoring
    score_seo,              # SEO scoring
    calculate_readability,  # Flesch-Kincaid readability
    check_keyword_density,  # Keyword density check
    load_rules,             # Load humanizer rules
    build_system_prompt,    # Build dynamic system prompt
    HumanizerRules,         # Rules dataclass
)

Backends

All backends implement the BlogBackend interface:

class BlogBackend:
    def fetch_titles(self, limit=500) -> list[str]: ...
    def push_post(self, post: dict) -> bool: ...
    def unpublish(self, title: str) -> bool: ...
    def list_posts(self, published_only=False) -> list[dict]: ...
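
A minimal in-memory backend as a sketch of implementing that interface yourself (the blog_pipeline.backends import path, and whether get_backend can discover custom classes, are assumptions):

from blog_pipeline.backends import BlogBackend  # assumed import path

class MemoryBackend(BlogBackend):
    """Toy backend that keeps posts in a Python list."""

    def __init__(self):
        self.posts: list[dict] = []

    def fetch_titles(self, limit=500) -> list[str]:
        return [p["title"] for p in self.posts[:limit]]

    def push_post(self, post: dict) -> bool:
        self.posts.append(post)
        return True

    def unpublish(self, title: str) -> bool:
        for p in self.posts:
            if p["title"] == title:
                p["published"] = False
                return True
        return False

    def list_posts(self, published_only=False) -> list[dict]:
        return [p for p in self.posts
                if p.get("published") or not published_only]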

Post dict shape:

{
    "title":        str,
    "content":      str,       # markdown
    "author":       str,
    "author_title": str,
    "author_image": str,
    "category":     str,
    "tags":         list[str],
    "seo_keywords": list[str],
    "cover_image":  str,
    "published":    bool,
    "created_at":   str,       # ISO-8601
}
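
A minimal post that matches the shape above (the earlier push_post example only sets title, content, and published; whether the remaining fields are optional is an assumption):

from datetime import datetime, timezone

post = {
    "title": "Hello",
    "content": "# Hello\n\nWorld.",
    "published": True,
    "created_at": datetime.now(timezone.utc).isoformat(),  # ISO-8601
}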

Output Files

| File | Description |
|------|-------------|
| blogs/<slug>.md | Humanized markdown blog posts |
| blogs/_metadata.json | Filesystem backend metadata sidecar |
| blogs/_topics.json | Topic cache (pass 1) |
| blogs/_plans.json | Structure plans (pass 2) |
| blogs/_registry.json | Push tracking registry (pass 6) |

Development

git clone https://github.com/nometria/blog-pipeline
cd blog-pipeline
pip install -e ".[dev]"
pytest tests/ -v

License

MIT
