
blog-pipeline

AI blog generator that doesn't sound like AI.

A 7-pass pipeline with multi-LLM support (Anthropic, OpenAI, LiteLLM), pluggable storage backends (filesystem, Supabase, PostgreSQL, WordPress, Notion, Contentful), a configurable humanizer that strips AI writing tells, SEO analysis, AI content detection scoring, and a quality audit gate.


Install

pip install blog-pipeline

With optional providers/backends:

pip install "blog-pipeline[openai]"         # OpenAI support
pip install "blog-pipeline[litellm]"        # LiteLLM (any provider)
pip install "blog-pipeline[postgres]"       # PostgreSQL backend
pip install "blog-pipeline[all]"            # everything

From source:

git clone https://github.com/nometria/blog-pipeline
cd blog-pipeline
pip install -e ".[dev]"

Quick Start

# Set your API key
export ANTHROPIC_API_KEY=sk-ant-...

# Generate 5 blog posts (writes to ./blogs/)
blog-generate --count 5 --niche "developer tooling and SaaS"

# Re-humanize existing drafts only
blog-generate --passes 4

# Full pipeline with audit gate
blog-generate --passes 1-7 --audit --audit-threshold 60

# Run tests
pytest tests/ -v

Pipeline Passes

| Pass | What it does |
|------|--------------|
| 0 | Fetch existing titles from backend (prevents duplicates) |
| 1 | Identify new topics (skips anything already written) |
| 2 | Plan structure per topic (comparison / deep-dive / case-study / how-to / opinion) |
| 3 | Generate full markdown content |
| 4 | Humanizer pass with AI detection scoring (before/after) |
| 5 | Add internal links across all posts |
| 6 | Push to configured backend + update local registry |
| 7 | Audit gate: score posts, reject weak ones (optional, --audit) |

LLM Providers

Set LLM_PROVIDER and (optionally) LLM_MODEL:

| Provider | Env var | Default model | Package |
|----------|---------|---------------|---------|
| anthropic (default) | ANTHROPIC_API_KEY | claude-opus-4-5 | included |
| openai | OPENAI_API_KEY | gpt-4o | pip install "blog-pipeline[openai]" |
| litellm | varies by model | claude-opus-4-5 | pip install "blog-pipeline[litellm]" |

# Use OpenAI instead of Anthropic
export LLM_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export LLM_MODEL=gpt-4o
blog-generate --count 3

Use the LLM abstraction in your own code:

from blog_pipeline import ask_llm
response = ask_llm("Explain Docker in 3 sentences", system="Be concise")

Storage Backends

Set BLOG_BACKEND to choose where posts are stored:

| Backend | Env vars | Extra deps | Description |
|---------|----------|------------|-------------|
| filesystem (default) | BLOGS_DIR | none | Markdown files + _metadata.json |
| supabase | SUPABASE_URL, SUPABASE_SERVICE_KEY | none | PostgREST API via urllib |
| postgres | POSTGRES_DSN | psycopg2 | Direct PostgreSQL connection |
| wordpress | WP_URL, WP_USER, WP_APP_PASSWORD | none | WP REST API via urllib |
| notion | NOTION_API_KEY, NOTION_DATABASE_ID | none | Notion API via urllib |
| contentful | CONTENTFUL_SPACE_ID, CONTENTFUL_MGMT_TOKEN | none | Contentful Management API |

# Push to WordPress
export BLOG_BACKEND=wordpress
export WP_URL=https://myblog.com
export WP_USER=admin
export WP_APP_PASSWORD=xxxx-xxxx-xxxx-xxxx
blog-generate --passes 1-6 --count 3

Use backends programmatically:

from blog_pipeline import get_backend
backend = get_backend("filesystem")   # or "supabase", "wordpress", etc.
backend.push_post({"title": "Hello", "content": "# Hello\n\nWorld.", "published": True})
titles = backend.fetch_titles()
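
The same object exposes the rest of the BlogBackend interface (see the class in the API reference below):

backend.unpublish("Hello")                         # hide a weak post
drafts = backend.list_posts(published_only=False)  # include unpublished drafts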

The Humanizer

The humanizer enforces strict rules to remove AI writing tells. Rules are configurable via YAML.

Default rules include

  • 50+ banned words (leverage, seamless, robust, delve, paradigm, etc.)
  • 17+ banned phrases ("in conclusion", "it's worth noting", "dive deep into")
  • 12+ flagged sentence starters (Furthermore, Moreover, Additionally)
  • No em-dashes, no semicolons connecting sentences, no emojis
  • Contractions required (it's, we're, don't)
  • Active voice only
  • Max 1 exclamation mark per post
  • Paragraph opening variety enforcement

Customize rules

Create a humanizer_rules.yml in your project root or set HUMANIZER_RULES:

banned_words:
  - "leverage"
  - "synergy"
  - "my-custom-banned-word"
max_exclamations: 2
require_contractions: true
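
The exported rule helpers can load the same configuration in code (a sketch using the load_rules and build_system_prompt names from the API reference; the exact signatures are an assumption):

from blog_pipeline import load_rules, build_system_prompt

rules = load_rules()                  # assumed: picks up humanizer_rules.yml / HUMANIZER_RULES
prompt = build_system_prompt(rules)   # assumed: rules-driven system prompt for the LLM
print(prompt[:200])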

Standalone usage

from blog_pipeline import humanize_post, check_banned_words

clean = humanize_post(my_ai_draft)
issues = check_banned_words(clean)
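
check_ai_tells (also exported) gives a more detailed report than check_banned_words; a sketch, assuming it takes the content string and returns an iterable of findings:

from blog_pipeline import check_ai_tells

for tell in check_ai_tells(clean):   # assumed return shape: iterable of findings
    print(tell)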

With AI detection scoring

from blog_pipeline.humanizer import humanize_post_scored

result = humanize_post_scored(my_draft)
print(f"AI score: {result['ai_score_before']:.2f} -> {result['ai_score_after']:.2f}")
print(f"Improvement: {result['improvement']:.2f}")
print(result["content"])

AI Detection

Heuristic-based AI content detector. Pure Python, no external API calls.

| Heuristic | Weight |
|-----------|--------|
| Banned word density | 25% |
| Sentence uniformity | 20% |
| Paragraph opening variety | 15% |
| Passive voice ratio | 15% |
| Sentence length variance | 10% |
| Em-dash density | 10% |
| Exclamation density | 5% |

from blog_pipeline import score_ai

result = score_ai(content)
print(f"AI score: {result['ai_score']:.2f}")  # 0.0 = human, 1.0 = AI
for flag in result["flags"]:
    print(f"  - {flag}")

SEO Analysis

Built-in SEO scoring with Flesch-Kincaid readability (pure Python syllable counting).

from blog_pipeline import score_seo, calculate_readability

seo = score_seo(content, primary_keyword="deploy")
print(f"SEO score: {seo['seo_score']}/100")

readability = calculate_readability(content)
print(f"Grade level: {readability['flesch_kincaid_grade']}")

SEO factors scored: word count (20 pts), heading structure (15 pts), keyword density (20 pts), readability (15 pts), internal links (10 pts), meta description quality (10 pts), keyword in headings (10 pts). The point budget sums to 100, so seo_score reads directly as a score out of 100.


Audit

Score existing blog posts and optionally unpublish weak ones.

# Score all blogs
blog-audit --dir blogs

# Include SEO scoring
blog-audit --seo

# Unpublish posts below threshold via backend
blog-audit --min-score 60 --unpublish

# Re-humanize weak posts
blog-audit --fix

# JSON output
blog-audit --json

Composite scoring: quality 60% + AI detection 20% + SEO 20%.

from blog_pipeline.audit import score_post, run_audit
from pathlib import Path

result = score_post(content, seo=True)
print(f"Score: {result['score']}, Grade: {result['grade']}")

results = run_audit(Path("blogs"), min_score=60, seo=True)

CLI Reference

blog-generate

blog-generate [OPTIONS]

Options:
  --passes RANGE       Pipeline passes to run (default: 1-6)
  --count N            Number of blogs to generate (default: 5)
  --niche TEXT         Topic niche (default: "developer tooling and infrastructure")
  --audit              Enable Pass 7 audit gate
  --audit-threshold N  Minimum audit score to keep a post (default: 50)

blog-audit

blog-audit [OPTIONS]

Options:
  --dir PATH           Blog directory (default: blogs)
  --min-score N        Minimum score threshold (default: 50)
  --seo                Include SEO scoring
  --unpublish          Unpublish posts below threshold via backend
  --fix                Re-humanize posts below threshold
  --json               Output as JSON

blog-humanize

blog-humanize [FILE] [OPTIONS]

Arguments:
  FILE                 Markdown file (default: stdin)

Options:
  --check-only         Only report AI tells, don't rewrite
  --in-place           Overwrite input file
  --score              Show AI detection scores

GitHub Action

Use blog-pipeline as a GitHub Action for scheduled blog generation:

- uses: nometria/blog-pipeline@v0.2
  with:
    passes: "1-7"
    count: "3"
    niche: "developer tooling"
    audit: "true"
    audit-threshold: "60"
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
    BLOG_BACKEND: supabase
    SUPABASE_URL: ${{ secrets.SUPABASE_URL }}
    SUPABASE_SERVICE_KEY: ${{ secrets.SUPABASE_SERVICE_KEY }}

See examples/blog-pipeline-action.yml for a complete workflow with weekly schedule and manual trigger.


Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| LLM_PROVIDER | no | anthropic | LLM provider: anthropic, openai, litellm |
| LLM_MODEL | no | per-provider | Model override |
| ANTHROPIC_API_KEY | if anthropic | - | Anthropic API key |
| OPENAI_API_KEY | if openai | - | OpenAI API key |
| BLOG_BACKEND | no | filesystem | Storage backend |
| BLOGS_DIR | no | ./blogs | Local blog directory |
| BLOG_AUTHOR | no | Your Team | Default author name |
| BLOG_AUTHOR_TITLE | no | Engineering & Product | Default author title |
| BLOG_AUTHOR_IMAGE | no | - | Author image URL |
| HUMANIZER_RULES | no | - | Path to custom rules YAML |
| SUPABASE_URL | if supabase | - | Supabase project URL |
| SUPABASE_SERVICE_KEY | if supabase | - | Supabase service key |
| SUPABASE_BLOGS_TABLE | no | blogs | Supabase table name |
| POSTGRES_DSN | if postgres | - | PostgreSQL connection string |
| WP_URL | if wordpress | - | WordPress site URL |
| WP_USER | if wordpress | - | WordPress username |
| WP_APP_PASSWORD | if wordpress | - | WordPress application password |
| NOTION_API_KEY | if notion | - | Notion integration token |
| NOTION_DATABASE_ID | if notion | - | Notion database ID |
| CONTENTFUL_SPACE_ID | if contentful | - | Contentful space ID |
| CONTENTFUL_MGMT_TOKEN | if contentful | - | Contentful management token |
| CONTENTFUL_ENVIRONMENT | no | master | Contentful environment |

API Reference

Core

from blog_pipeline import (
    ask_llm,                # LLM abstraction (anthropic/openai/litellm)
    get_backend,            # Backend factory
    humanize_post,          # Humanize content
    check_banned_words,     # Check for AI tells
    check_ai_tells,         # Detailed AI tell analysis
    humanize_post_scored,   # Humanize with before/after AI scores
    score_ai,               # AI detection scoring
    score_seo,              # SEO scoring
    calculate_readability,  # Flesch-Kincaid readability
    check_keyword_density,  # Keyword density check
    load_rules,             # Load humanizer rules
    build_system_prompt,    # Build dynamic system prompt
    HumanizerRules,         # Rules dataclass
)

Backends

All backends implement the BlogBackend interface:

class BlogBackend:
    def fetch_titles(self, limit=500) -> list[str]: ...
    def push_post(self, post: dict) -> bool: ...
    def unpublish(self, title: str) -> bool: ...
    def list_posts(self, published_only=False) -> list[dict]: ...
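
A minimal in-memory backend as a sketch of implementing that interface yourself (the blog_pipeline.backends import path, and whether get_backend can discover custom classes, are assumptions):

from blog_pipeline.backends import BlogBackend  # assumed import path

class MemoryBackend(BlogBackend):
    """Toy backend that keeps posts in a Python list."""

    def __init__(self):
        self.posts: list[dict] = []

    def fetch_titles(self, limit=500) -> list[str]:
        return [p["title"] for p in self.posts[:limit]]

    def push_post(self, post: dict) -> bool:
        self.posts.append(post)
        return True

    def unpublish(self, title: str) -> bool:
        for p in self.posts:
            if p["title"] == title:
                p["published"] = False
                return True
        return False

    def list_posts(self, published_only=False) -> list[dict]:
        return [p for p in self.posts
                if p.get("published") or not published_only]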

Post dict shape:

{
    "title":        str,
    "content":      str,       # markdown
    "author":       str,
    "author_title": str,
    "author_image": str,
    "category":     str,
    "tags":         list[str],
    "seo_keywords": list[str],
    "cover_image":  str,
    "published":    bool,
    "created_at":   str,       # ISO-8601
}
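
A minimal post that matches the shape above (the earlier push_post example only sets title, content, and published; whether the remaining fields are optional is an assumption):

from datetime import datetime, timezone

post = {
    "title": "Hello",
    "content": "# Hello\n\nWorld.",
    "published": True,
    "created_at": datetime.now(timezone.utc).isoformat(),  # ISO-8601
}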

Output Files

| File | Description |
|------|-------------|
| blogs/<slug>.md | Humanized markdown blog posts |
| blogs/_metadata.json | Filesystem backend metadata sidecar |
| blogs/_topics.json | Topic cache (pass 1) |
| blogs/_plans.json | Structure plans (pass 2) |
| blogs/_registry.json | Push tracking registry (pass 6) |

Development

git clone https://github.com/nometria/blog-pipeline
cd blog-pipeline
pip install -e ".[dev]"
pytest tests/ -v

License

MIT
