# blog-pipeline

AI blog generator that doesn't sound like AI.

7-pass pipeline with multi-LLM support (Anthropic, OpenAI, LiteLLM), pluggable storage backends (filesystem, Supabase, PostgreSQL, WordPress, Notion, Contentful), a configurable humanizer that strips AI writing tells, SEO analysis, AI content detection scoring, and a quality audit gate.
## Install

```bash
pip install blog-pipeline
```

With optional providers/backends:

```bash
pip install "blog-pipeline[openai]"    # OpenAI support
pip install "blog-pipeline[litellm]"   # LiteLLM (any provider)
pip install "blog-pipeline[postgres]"  # PostgreSQL backend
pip install "blog-pipeline[all]"       # everything
```

From source:

```bash
git clone https://github.com/nometria/blog-pipeline
cd blog-pipeline
pip install -e ".[dev]"
```
## Quick Start

```bash
# Set your API key
export ANTHROPIC_API_KEY=sk-ant-...

# Generate 5 blog posts (writes to ./blogs/)
blog-generate --count 5 --niche "developer tooling and SaaS"

# Re-humanize existing drafts only
blog-generate --passes 4

# Full pipeline with audit gate
blog-generate --passes 1-7 --audit --audit-threshold 60

# Run tests
pytest tests/ -v
```
## Pipeline Passes
| Pass | What it does |
|---|---|
| 0 | Fetch existing titles from backend (prevents duplicates) |
| 1 | Identify new topics (skips anything already written) |
| 2 | Plan structure per topic (comparison / deep-dive / case-study / how-to / opinion) |
| 3 | Generate full markdown content |
| 4 | Humanizer pass with AI detection scoring (before/after) |
| 5 | Add internal links across all posts |
| 6 | Push to configured backend + update local registry |
| 7 | Audit gate: score posts, reject weak ones (optional, --audit) |
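The `--passes` flag accepts either a single pass number or a range like `1-7`. The package's own parser isn't shown in these docs, but a minimal sketch of that convention might look like this (illustrative only, not the library's implementation):

```python
def parse_passes(spec: str) -> list[int]:
    """Parse a pass spec like "4" or "1-7" into a list of pass numbers."""
    if "-" in spec:
        start, end = spec.split("-", 1)
        return list(range(int(start), int(end) + 1))
    return [int(spec)]
```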
## LLM Providers

Set `LLM_PROVIDER` and (optionally) `LLM_MODEL`:
| Provider | Env var | Default model | Package |
|---|---|---|---|
| `anthropic` (default) | `ANTHROPIC_API_KEY` | `claude-opus-4-5` | included |
| `openai` | `OPENAI_API_KEY` | `gpt-4o` | `pip install "blog-pipeline[openai]"` |
| `litellm` | varies by model | `claude-opus-4-5` | `pip install "blog-pipeline[litellm]"` |
```bash
# Use OpenAI instead of Anthropic
export LLM_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export LLM_MODEL=gpt-4o
blog-generate --count 3
```
Use the LLM abstraction in your own code:

```python
from blog_pipeline import ask_llm

response = ask_llm("Explain Docker in 3 sentences", system="Be concise")
```
## Storage Backends

Set `BLOG_BACKEND` to choose where posts are stored:
| Backend | Env var | Extra deps | Description |
|---|---|---|---|
| `filesystem` (default) | `BLOGS_DIR` | none | Markdown files + `_metadata.json` |
| `supabase` | `SUPABASE_URL`, `SUPABASE_SERVICE_KEY` | none | PostgREST API via urllib |
| `postgres` | `POSTGRES_DSN` | psycopg2 | Direct PostgreSQL connection |
| `wordpress` | `WP_URL`, `WP_USER`, `WP_APP_PASSWORD` | none | WP REST API via urllib |
| `notion` | `NOTION_API_KEY`, `NOTION_DATABASE_ID` | none | Notion API via urllib |
| `contentful` | `CONTENTFUL_SPACE_ID`, `CONTENTFUL_MGMT_TOKEN` | none | Contentful Management API |
```bash
# Push to WordPress
export BLOG_BACKEND=wordpress
export WP_URL=https://myblog.com
export WP_USER=admin
export WP_APP_PASSWORD=xxxx-xxxx-xxxx-xxxx
blog-generate --passes 1-6 --count 3
```
Use backends programmatically:

```python
from blog_pipeline import get_backend

backend = get_backend("filesystem")  # or "supabase", "wordpress", etc.
backend.push_post({"title": "Hello", "content": "# Hello\n\nWorld.", "published": True})
titles = backend.fetch_titles()
```
## The Humanizer

The humanizer enforces strict rules to remove AI writing tells. Rules are configurable via YAML.
### Default rules
- 50+ banned words (leverage, seamless, robust, delve, paradigm, etc.)
- 17+ banned phrases ("in conclusion", "it's worth noting", "dive deep into")
- 12+ flagged sentence starters (Furthermore, Moreover, Additionally)
- No em-dashes, no semicolons connecting sentences, no emojis
- Contractions required (it's, we're, don't)
- Active voice only
- Max 1 exclamation mark per post
- Paragraph opening variety enforcement
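The enforcement mechanics aren't documented here, but a minimal sketch of how a few of the rules above can be checked (illustrative only, not the library's implementation; the rule values are a small sample):

```python
import re

# Small sample of the rule sets described above.
BANNED_WORDS = {"leverage", "seamless", "robust", "delve", "paradigm"}
BANNED_STARTERS = {"Furthermore", "Moreover", "Additionally"}

def find_tells(text: str) -> list[str]:
    """Flag a few of the AI writing tells described above."""
    issues = []
    words = set(re.findall(r"[a-z']+", text.lower()))
    for banned in sorted(BANNED_WORDS & words):
        issues.append(f"banned word: {banned}")
    if "\u2014" in text:  # em-dash
        issues.append("em-dash found")
    if text.count("!") > 1:
        issues.append("more than 1 exclamation mark")
    for starter in sorted(BANNED_STARTERS):
        if re.search(rf"(?:^|[.!?]\s+){starter}\b", text):
            issues.append(f"flagged sentence starter: {starter}")
    return issues
```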
### Customize rules

Create a `humanizer_rules.yml` in your project root or set `HUMANIZER_RULES`:

```yaml
banned_words:
  - "leverage"
  - "synergy"
  - "my-custom-banned-word"
max_exclamations: 2
require_contractions: true
```
### Standalone usage

```python
from blog_pipeline import humanize_post, check_banned_words

clean = humanize_post(my_ai_draft)
issues = check_banned_words(clean)
```
### With AI detection scoring

```python
from blog_pipeline.humanizer import humanize_post_scored

result = humanize_post_scored(my_draft)
print(f"AI score: {result['ai_score_before']:.2f} -> {result['ai_score_after']:.2f}")
print(f"Improvement: {result['improvement']:.2f}")
print(result["content"])
```
## AI Detection

Heuristic-based AI content detector. Pure Python, no external API calls.
| Heuristic | Weight |
|---|---|
| Banned word density | 25% |
| Sentence uniformity | 20% |
| Paragraph opening variety | 15% |
| Passive voice ratio | 15% |
| Sentence length variance | 10% |
| Em-dash density | 10% |
| Exclamation density | 5% |
```python
from blog_pipeline import score_ai

result = score_ai(content)
print(f"AI score: {result['ai_score']:.2f}")  # 0.0 = human, 1.0 = AI
for flag in result["flags"]:
    print(f"  - {flag}")
```
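The detector's exact feature extraction isn't shown in these docs, but combining per-heuristic scores with the table's weights amounts to a weighted sum. A sketch under the assumption that each sub-score is normalized to the 0..1 range:

```python
# Weights taken from the heuristics table above.
WEIGHTS = {
    "banned_word_density": 0.25,
    "sentence_uniformity": 0.20,
    "paragraph_opening_variety": 0.15,
    "passive_voice_ratio": 0.15,
    "sentence_length_variance": 0.10,
    "em_dash_density": 0.10,
    "exclamation_density": 0.05,
}

def combine(scores: dict[str, float]) -> float:
    """Weighted sum of per-heuristic scores, each assumed in [0, 1]."""
    return sum(WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS)
```

Because the weights sum to 1.0, the combined score stays in the same 0..1 range as the inputs.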
## SEO Analysis

Built-in SEO scoring with Flesch-Kincaid readability (pure Python syllable counting).

```python
from blog_pipeline import score_seo, calculate_readability

seo = score_seo(content, primary_keyword="deploy")
print(f"SEO score: {seo['seo_score']}/100")

readability = calculate_readability(content)
print(f"Grade level: {readability['flesch_kincaid_grade']}")
```
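The library's syllable counter isn't shown, but the standard Flesch-Kincaid grade formula is 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59. A naive pure-Python sketch (vowel-group syllable counting is an approximation, not the library's exact method):

```python
import re

def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (minimum 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level from raw text."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    n = max(1, len(words))
    return 0.39 * (n / sentences) + 11.8 * (syllables / n) - 15.59
```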
SEO factors scored: word count (20pts), heading structure (15pts), keyword density (20pts), readability (15pts), internal links (10pts), meta description quality (10pts), keyword in headings (10pts).
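Keyword density, one of the factors above, is conventionally the keyword's share of total words expressed as a percentage. A sketch of that check (illustrative, not the library's scoring code):

```python
import re

def keyword_density(text: str, keyword: str) -> float:
    """Return keyword occurrences as a percentage of total words."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w == keyword.lower())
    return 100.0 * hits / len(words)
```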
## Audit

Score existing blog posts and optionally unpublish weak ones.
```bash
# Score all blogs
blog-audit --dir blogs

# Include SEO scoring
blog-audit --seo

# Unpublish posts below threshold via backend
blog-audit --min-score 60 --unpublish

# Re-humanize weak posts
blog-audit --fix

# JSON output
blog-audit --json
```
Composite scoring: quality 60% + AI detection 20% + SEO 20%.
```python
from pathlib import Path

from blog_pipeline.audit import score_post, run_audit

result = score_post(content, seo=True)
print(f"Score: {result['score']}, Grade: {result['grade']}")

results = run_audit(Path("blogs"), min_score=60, seo=True)
```
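The composite weighting above can be sketched as a plain weighted sum. Note the inversion of the AI term (lower AI score is better) and the 0-100 scale for the other two components are assumptions about how the library combines them, not documented behavior:

```python
def composite_score(quality: float, ai_score: float, seo: float) -> float:
    """Composite: quality 60% + AI detection 20% + SEO 20%.

    Assumes quality and seo are on a 0-100 scale, and ai_score is in
    [0, 1] where lower means more human (inverted before weighting).
    """
    return 0.6 * quality + 0.2 * (100.0 * (1.0 - ai_score)) + 0.2 * seo
```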
## CLI Reference

### blog-generate

```
blog-generate [OPTIONS]

Options:
  --passes RANGE        Pipeline passes to run (default: 1-6)
  --count N             Number of blogs to generate (default: 5)
  --niche TEXT          Topic niche (default: "developer tooling and infrastructure")
  --audit               Enable Pass 7 audit gate
  --audit-threshold N   Minimum audit score to keep a post (default: 50)
```
### blog-audit

```
blog-audit [OPTIONS]

Options:
  --dir PATH       Blog directory (default: blogs)
  --min-score N    Minimum score threshold (default: 50)
  --seo            Include SEO scoring
  --unpublish      Unpublish posts below threshold via backend
  --fix            Re-humanize posts below threshold
  --json           Output as JSON
```
### blog-humanize

```
blog-humanize [FILE] [OPTIONS]

Arguments:
  FILE             Markdown file (default: stdin)

Options:
  --check-only     Only report AI tells, don't rewrite
  --in-place       Overwrite input file
  --score          Show AI detection scores
```
## GitHub Action

Use blog-pipeline as a GitHub Action for scheduled blog generation:
```yaml
- uses: nometria/blog-pipeline@v0.2
  with:
    passes: "1-7"
    count: "3"
    niche: "developer tooling"
    audit: "true"
    audit-threshold: "60"
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
    BLOG_BACKEND: supabase
    SUPABASE_URL: ${{ secrets.SUPABASE_URL }}
    SUPABASE_SERVICE_KEY: ${{ secrets.SUPABASE_SERVICE_KEY }}
```
See `examples/blog-pipeline-action.yml` for a complete workflow with a weekly schedule and manual trigger.
## Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| `LLM_PROVIDER` | no | `anthropic` | LLM provider: `anthropic`, `openai`, `litellm` |
| `LLM_MODEL` | no | per-provider | Model override |
| `ANTHROPIC_API_KEY` | if anthropic | | Anthropic API key |
| `OPENAI_API_KEY` | if openai | | OpenAI API key |
| `BLOG_BACKEND` | no | `filesystem` | Storage backend |
| `BLOGS_DIR` | no | `./blogs` | Local blog directory |
| `BLOG_AUTHOR` | no | `Your Team` | Default author name |
| `BLOG_AUTHOR_TITLE` | no | `Engineering & Product` | Default author title |
| `BLOG_AUTHOR_IMAGE` | no | | Author image URL |
| `HUMANIZER_RULES` | no | | Path to custom rules YAML |
| `SUPABASE_URL` | if supabase | | Supabase project URL |
| `SUPABASE_SERVICE_KEY` | if supabase | | Supabase service key |
| `SUPABASE_BLOGS_TABLE` | no | `blogs` | Supabase table name |
| `POSTGRES_DSN` | if postgres | | PostgreSQL connection string |
| `WP_URL` | if wordpress | | WordPress site URL |
| `WP_USER` | if wordpress | | WordPress username |
| `WP_APP_PASSWORD` | if wordpress | | WordPress application password |
| `NOTION_API_KEY` | if notion | | Notion integration token |
| `NOTION_DATABASE_ID` | if notion | | Notion database ID |
| `CONTENTFUL_SPACE_ID` | if contentful | | Contentful space ID |
| `CONTENTFUL_MGMT_TOKEN` | if contentful | | Contentful management token |
| `CONTENTFUL_ENVIRONMENT` | no | `master` | Contentful environment |
## API Reference

### Core
```python
from blog_pipeline import (
    ask_llm,                # LLM abstraction (anthropic/openai/litellm)
    get_backend,            # Backend factory
    humanize_post,          # Humanize content
    check_banned_words,     # Check for AI tells
    check_ai_tells,         # Detailed AI tell analysis
    humanize_post_scored,   # Humanize with before/after AI scores
    score_ai,               # AI detection scoring
    score_seo,              # SEO scoring
    calculate_readability,  # Flesch-Kincaid readability
    check_keyword_density,  # Keyword density check
    load_rules,             # Load humanizer rules
    build_system_prompt,    # Build dynamic system prompt
    HumanizerRules,         # Rules dataclass
)
```
### Backends

All backends implement the `BlogBackend` interface:
```python
class BlogBackend:
    def fetch_titles(self, limit=500) -> list[str]: ...
    def push_post(self, post: dict) -> bool: ...
    def unpublish(self, title: str) -> bool: ...
    def list_posts(self, published_only=False) -> list[dict]: ...
```
Post dict shape:

```python
{
    "title": str,
    "content": str,        # markdown
    "author": str,
    "author_title": str,
    "author_image": str,
    "category": str,
    "tags": list[str],
    "seo_keywords": list[str],
    "cover_image": str,
    "published": bool,
    "created_at": str,     # ISO-8601
}
```
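Because backends are pluggable, you can target a store that isn't built in by implementing the same interface. A minimal in-memory example (the base class is reproduced here so the sketch is self-contained; in real use you'd subclass the one from `blog_pipeline`):

```python
class BlogBackend:
    """Interface reproduced from the docs above."""
    def fetch_titles(self, limit=500) -> list[str]: ...
    def push_post(self, post: dict) -> bool: ...
    def unpublish(self, title: str) -> bool: ...
    def list_posts(self, published_only=False) -> list[dict]: ...

class InMemoryBackend(BlogBackend):
    """Toy backend that keeps posts in a plain list."""
    def __init__(self):
        self.posts: list[dict] = []

    def fetch_titles(self, limit=500) -> list[str]:
        return [p["title"] for p in self.posts[:limit]]

    def push_post(self, post: dict) -> bool:
        self.posts.append(post)
        return True

    def unpublish(self, title: str) -> bool:
        for p in self.posts:
            if p["title"] == title:
                p["published"] = False
                return True
        return False

    def list_posts(self, published_only=False) -> list[dict]:
        if published_only:
            return [p for p in self.posts if p.get("published")]
        return list(self.posts)
```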
## Output Files
| File | Description |
|---|---|
| `blogs/<slug>.md` | Humanized markdown blog posts |
| `blogs/_metadata.json` | Filesystem backend metadata sidecar |
| `blogs/_topics.json` | Topic cache (pass 1) |
| `blogs/_plans.json` | Structure plans (pass 2) |
| `blogs/_registry.json` | Push tracking registry (pass 6) |
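Filenames like `blogs/<slug>.md` imply a title-to-slug step. The library's slugifier isn't documented, but a common approach looks like this (an illustrative sketch, not necessarily the exact rules blog-pipeline applies):

```python
import re

def slugify(title: str) -> str:
    """Lowercase, collapse non-alphanumerics to hyphens, trim hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
```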
## Development

```bash
git clone https://github.com/nometria/blog-pipeline
cd blog-pipeline
pip install -e ".[dev]"
pytest tests/ -v
```
## License

MIT