A toolkit for collecting, storing, and analyzing multiple blogs with RSS feed and web crawler support

Blog Toolkit

A comprehensive Python toolkit for collecting, storing, and analyzing multiple blogs with RSS feed and web crawler support.

Features

Multiple Collection Methods: Automatically detect and use RSS feeds, or fall back to web crawling
Comprehensive Analysis: Temporal patterns, content metrics, topic analysis, and sentiment analysis
Cross-Blog Comparison: Compare metrics across blogs by the same author or different authors
CLI: Full-featured command-line interface for all operations
Web Dashboard: Interactive web interface with charts and visualizations
SQLite Storage: Local database for storing all blog data and metadata
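
The "detect RSS, else crawl" behavior in the first feature can be illustrated with standard feed autodiscovery, where a page advertises its feed through a `<link rel="alternate">` tag. This is a minimal sketch using only the Python standard library, not the toolkit's actual implementation; the names `FeedLinkParser` and `choose_method`, and the `"crawler"` method label, are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly advertised for RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects href values of <link rel="alternate"> tags that advertise a feed."""

    def __init__(self):
        super().__init__()
        self.feed_hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute names arrive lowercased; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        rel = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rel and is_feed and attr.get("href"):
            self.feed_hrefs.append(attr["href"])


def choose_method(page_url, html):
    """Return ("rss", feed_url) if the page advertises a feed, else ("crawler", None).

    Hypothetical helper: the method labels are assumptions, not the toolkit's API.
    """
    parser = FeedLinkParser()
    parser.feed(html)
    if parser.feed_hrefs:
        # Resolve relative feed links against the page URL.
        return "rss", urljoin(page_url, parser.feed_hrefs[0])
    return "crawler", None
```

Fetching the page is left out on purpose; the caller passes in HTML it has already downloaded, which keeps the detection step easy to test.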

Installation

This project uses UV for package management.

# Install UV if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# After cloning the repository, set up the project from its root
cd blog-toolkit
uv sync

Quick Pull (One-Off, No Install)

Pull blog posts from any URL directly to a file, with no database or setup:

uvx blog-toolkit pull https://example.substack.com -o ./posts.json

Requires uv. Output formats: --format json (default) or --format csv. Use -o to specify the output file or directory.

Quick Start

Using the CLI

# Add a blog (auto-detects RSS or uses crawler)
uv run blog-toolkit add https://example.com/blog

# Add a blog with specific method
uv run blog-toolkit add https://example.com/blog --method rss

# List all blogs
uv run blog-toolkit list

# Update a blog (collect new posts)
uv run blog-toolkit update --blog-id 1

# Update all blogs
uv run blog-toolkit update --all

# Analyze a blog
uv run blog-toolkit analyze --blog-id 1

# Analyze all blogs by an author
uv run blog-toolkit analyze --author "John Doe"

# Compare two blogs
uv run blog-toolkit compare 1 2

# Export data
uv run blog-toolkit export --format json --output data.json

Using the Web Dashboard

# Start the web server
uv run python -m blog_toolkit.web.app

# Or use the Flask CLI
uv run flask --app blog_toolkit.web.app run

Then open your browser to http://127.0.0.1:5000

Project Structure

blog-toolkit/
├── src/
│   └── blog_toolkit/
│       ├── config.py       # Configuration management
│       ├── database.py     # SQLite database models
│       ├── feeds.py        # RSS/Atom feed parser
│       ├── crawler.py      # Web crawler
│       ├── collector.py    # Unified collection interface
│       ├── analyzer.py     # Analysis engine
│       ├── cli.py          # CLI interface
│       └── web/            # Web dashboard
│           ├── app.py      # Flask application
│           └── templates/  # HTML templates
├── tests/                  # Test files
├── data/                   # Database storage (gitignored)
└── pyproject.toml          # Project configuration

Configuration

Copy .env.example to .env and customize settings:

cp .env.example .env

Key settings:

BLOG_TOOLKIT_DB: Database file path (default: data/blogs.db)
CRAWLER_MAX_DEPTH: Maximum crawl depth (default: 10)
REQUEST_TIMEOUT: HTTP request timeout in seconds (default: 30)
WEB_PORT: Web dashboard port (default: 5000)
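
As a rough illustration of how these settings and their defaults might be read in Python, the sketch below builds a settings object from environment variables. The variable names and defaults come from the list above; the `Settings` class and `load_settings` function are hypothetical and may not match the toolkit's config.py. It only reads variables already present in the environment and does not parse the .env file itself.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the documented settings."""

    db_path: str
    crawler_max_depth: int
    request_timeout: int
    web_port: int


def load_settings(env=None):
    """Build Settings from a mapping of environment variables.

    Falls back to the documented defaults when a variable is missing.
    """
    env = os.environ if env is None else env
    return Settings(
        db_path=env.get("BLOG_TOOLKIT_DB", "data/blogs.db"),
        crawler_max_depth=int(env.get("CRAWLER_MAX_DEPTH", "10")),
        request_timeout=int(env.get("REQUEST_TIMEOUT", "30")),
        web_port=int(env.get("WEB_PORT", "5000")),
    )
```

Passing an explicit mapping, for example `load_settings({"WEB_PORT": "8080"})`, makes the defaults easy to check without touching the real environment.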

Analysis Features

Temporal Analysis

Posting frequency (daily/weekly/monthly)
Posting patterns (time of day, day of week)
Gaps between posts
Date range analysis
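
The temporal metrics above can be sketched with the standard library alone. This is an illustrative example, not the toolkit's analyzer: the function name `temporal_summary`, the ISO 8601 string input, and the keys of the returned dictionary are all assumptions.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def temporal_summary(published):
    """Summarize posting patterns from a list of ISO 8601 timestamp strings."""
    dates = sorted(datetime.fromisoformat(value) for value in published)
    # Whole days between consecutive posts, in chronological order.
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    return {
        "posts_per_month": dict(Counter(d.strftime("%Y-%m") for d in dates)),
        "posts_per_weekday": dict(Counter(WEEKDAYS[d.weekday()] for d in dates)),
        "posts_per_hour": dict(Counter(d.hour for d in dates)),
        "gaps_in_days": gaps,
        "date_range": (
            (dates[0].date().isoformat(), dates[-1].date().isoformat()) if dates else None
        ),
    }
```

Daily or weekly frequency would follow the same pattern with a different grouping key, such as `d.date()` or `d.isocalendar()[:2]`.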

Content Analysis

Word count distribution and trends
Reading time calculations
Content length over time
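
A minimal sketch of the content metrics, assuming whitespace-separated word counting and a reading speed of 200 words per minute. Both are assumptions; the toolkit's own tokenization and reading-speed constant are not documented here, and the function names are hypothetical.

```python
import math
from statistics import mean, median

WORDS_PER_MINUTE = 200  # assumed average reading speed


def word_count(text):
    """Count whitespace-separated tokens."""
    return len(text.split())


def reading_time_minutes(text, wpm=WORDS_PER_MINUTE):
    """Estimated reading time, rounded up and never less than one minute."""
    return max(1, math.ceil(word_count(text) / wpm))


def content_summary(texts):
    """Distribution of word counts across a list of post bodies."""
    counts = [word_count(t) for t in texts]
    if not counts:
        return {"posts": 0}
    return {
        "posts": len(counts),
        "total_words": sum(counts),
        "mean_words": mean(counts),
        "median_words": median(counts),
        "min_words": min(counts),
        "max_words": max(counts),
    }
```

Tracking content length over time would pair each count with the post's publication date and reuse the same helpers.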

Topic Analysis

Keyword extraction
Tag and category distribution
Top keywords identification
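
Keyword extraction can be approximated by counting word frequencies after removing stopwords. The sketch below shows that idea only; the stopword list, the minimum word length, and the name `top_keywords` are assumptions, and the toolkit may use a different technique.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {
    "a", "an", "and", "are", "as", "be", "for", "in", "is", "it",
    "of", "on", "that", "the", "this", "to", "with",
}


def top_keywords(text, n=5, min_length=3):
    """Return the n most frequent non-stopword tokens as (word, count) pairs."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(
        token for token in tokens
        if token not in STOPWORDS and len(token) >= min_length
    )
    return counts.most_common(n)
```

Tag and category distribution follows the same `Counter` pattern, applied to each post's tag list instead of its body text.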

Sentiment Analysis

Overall sentiment (positive/neutral/negative)
Per-post sentiment scores
Sentiment trends over time
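
To make the positive/neutral/negative labels concrete, here is a deliberately tiny lexicon-based scorer. It is an illustration only: the word lists, the threshold, and the function names are assumptions, and the source does not say which sentiment method or library the toolkit uses.

```python
import re

# Toy lexicons for illustration; real sentiment lexicons are far larger.
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "sad"}


def sentiment_score(text):
    """Score in [-1, 1]: (positive hits - negative hits) / total hits, or 0.0 if none."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positive = sum(token in POSITIVE for token in tokens)
    negative = sum(token in NEGATIVE for token in tokens)
    total = positive + negative
    return 0.0 if total == 0 else (positive - negative) / total


def sentiment_label(score, threshold=0.1):
    """Map a score to positive, negative, or neutral using a symmetric threshold."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

Per-post scores come from calling `sentiment_score` on each post, and a trend over time is those scores ordered by publication date.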

Database Schema

blogs: Blog metadata (name, URL, feed URL, author, collection method)
posts: Individual blog posts (title, content, metadata, word count, etc.)
analyses: Cached analysis results for performance
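
One possible shape for these three tables, sketched with Python's built-in sqlite3 module. The table names come from the list above, but every column name and type here is an assumption based on the short descriptions; the real schema in database.py may differ.

```python
import sqlite3

# Hypothetical schema: column names are guesses based on the descriptions above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS blogs (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    url TEXT NOT NULL UNIQUE,
    feed_url TEXT,
    author TEXT,
    collection_method TEXT
);
CREATE TABLE IF NOT EXISTS posts (
    id INTEGER PRIMARY KEY,
    blog_id INTEGER NOT NULL REFERENCES blogs(id),
    title TEXT,
    content TEXT,
    published_at TEXT,
    word_count INTEGER
);
CREATE TABLE IF NOT EXISTS analyses (
    id INTEGER PRIMARY KEY,
    blog_id INTEGER NOT NULL REFERENCES blogs(id),
    analysis_type TEXT NOT NULL,
    result_json TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def init_db(path=":memory:"):
    """Open a SQLite database and create the tables if they do not exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Storing cached analysis results as a JSON string keeps the analyses table generic across analysis types, at the cost of not being able to query individual metrics in SQL.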

CLI Commands

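add <url> [--method <method>] - Add a new blog (auto-detects RSS or falls back to the crawler unless a method such as rss is given)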
update [--blog-id <id>] [--all] - Update blog(s)
analyze [--blog-id <id>] [--author <name>] - Run analysis
list - List all blogs
show <blog-id> - Show blog details
compare <blog-id1> <blog-id2> - Compare two blogs
export [--format json|csv] [--output <file>] - Export data
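
For readers curious what a JSON or CSV export involves, the sketch below serializes a list of records to either format with the standard library. It is not the toolkit's export code; the function name `export_rows` and the record fields in the usage note are made up for illustration.

```python
import csv
import io
import json


def export_rows(rows, fmt="json"):
    """Serialize a list of dicts to a JSON or CSV string."""
    if fmt == "json":
        return json.dumps(rows, indent=2)
    if fmt == "csv":
        if not rows:
            return ""
        buffer = io.StringIO()
        # Column order follows the keys of the first record.
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")
```

For example, `export_rows([{"title": "Hello", "word_count": 120}], "csv")` produces a header row followed by one data row.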

Web Dashboard Features

Dashboard: Overview of all blogs, recent posts, statistics
Blog Detail: Individual blog view with posts, metrics, and charts
Author View: Aggregate view of all blogs by an author
Comparison View: Side-by-side comparison of blogs
Interactive Charts: Plotly charts for trends and metrics

Documentation

Feed Extraction Workarounds: Mechanisms for pulling RSS feed data from Substack and other platforms, covering platform limits, JS rendering, feed discovery, and content parsing. Written as a shareable guide for developers building similar tools.

Development

# Install development dependencies
uv sync --dev

# Run tests
uv run pytest

# Format code
uv run black src/

# Type checking
uv run mypy src/

License

MIT

Release history

0.1.3: Feb 1, 2026
0.1.1: Feb 1, 2026
0.1.0: Feb 1, 2026 (this version)

Download files


Source Distribution

blog_toolkit-0.1.0.tar.gz (119.4 kB)

Uploaded Feb 1, 2026 Source

Built Distribution


blog_toolkit-0.1.0-py3-none-any.whl (47.2 kB)

Uploaded Feb 1, 2026 Python 3

File details

Details for the file blog_toolkit-0.1.0.tar.gz.

File metadata

Download URL: blog_toolkit-0.1.0.tar.gz
Upload date: Feb 1, 2026
Size: 119.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.1

File hashes

Hashes for blog_toolkit-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`7bfb4fdad886061f70da27832b6eafd18fc6d805f3b8148ac23883b371e9af5c`
MD5	`3a4cf2e50107286f38273260f6877d7b`
BLAKE2b-256	`e52ab05bde2acc0e688b172ac3205790b8004eb5cd29b86960193e7637219bc7`


File details

Details for the file blog_toolkit-0.1.0-py3-none-any.whl.

File metadata

Download URL: blog_toolkit-0.1.0-py3-none-any.whl
Upload date: Feb 1, 2026
Size: 47.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.1

File hashes

Hashes for blog_toolkit-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2c769091ea84a46cedda9c54b428b42c04e5832e7debb84d2ddf39620ea605d3`
MD5	`be5176d8dc4fb7150bd15b76a73837c8`
BLAKE2b-256	`700edf8d36ccc544569cd3fcf7467d03dc95de14c911ab58bd5b605272a6da54`
