Shopify Developer Forum Analyzer
Discourse Forum Analyzer
A Python tool for collecting and analyzing discussions from Discourse-based forums using LLM-powered analysis.
Overview
This tool automates the collection of forum data from Discourse forums (which provide JSON representations of pages) and uses Claude AI to analyze discussions, identify common problems, and extract insights. While initially built to analyze Shopify's webhook forum, it works with any publicly accessible Discourse installation.
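As a rough illustration of the JSON representations mentioned above: Discourse generally serves a JSON version of a category or topic page when `.json` is appended to its path. The sketch below builds such URLs and pulls topic IDs out of a category listing. The helper names, the `page` query parameter usage, and the example host are assumptions for illustration, not the tool's own code.

```python
from urllib.parse import urljoin


def category_url(base_url: str, slug: str, category_id: int, page: int = 0) -> str:
    """Build the JSON URL for one page of a category's topic list."""
    root = base_url.rstrip("/") + "/"
    return urljoin(root, f"c/{slug}/{category_id}.json?page={page}")


def topic_url(base_url: str, topic_id: int) -> str:
    """Build the JSON URL for a single topic and its posts."""
    root = base_url.rstrip("/") + "/"
    return urljoin(root, f"t/{topic_id}.json")


def topic_ids(category_payload: dict) -> list[int]:
    """Extract topic IDs from a decoded category listing payload."""
    topics = category_payload.get("topic_list", {}).get("topics", [])
    return [topic["id"] for topic in topics]
```

The functions are pure, so they can be checked without network access before being wired into an HTTP client.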
New to this tool? Read the Glossary first to learn the key terminology.
Features
Data Collection
- Automated scraping via Discourse JSON endpoints
- Rate-limited HTTP client with retry logic
- Checkpoint-based recovery for interrupted operations
- Incremental updates (collect only new content)
- SQLite storage with SQLAlchemy ORM
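The rate limiting and retry behavior listed above could look roughly like the following. This is a minimal sketch using only the standard library, not the tool's httpx-based client. The class and function names, the exponential backoff policy, and the choice to retry on `OSError` are all assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimiter:
    """Allow at most `rate` calls per second by sleeping between calls."""

    def __init__(self, rate: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> None:
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


def fetch_with_retry(fetch: Callable[[], T], limiter: RateLimiter,
                     max_attempts: int = 3, backoff: float = 1.0,
                     sleep=time.sleep) -> T:
    """Call `fetch` under the rate limit, retrying with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        limiter.wait()
        try:
            return fetch()
        except OSError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(backoff * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Passing `clock` and `sleep` as parameters keeps the timing logic testable without real delays.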
LLM Analysis
- Problem extraction from discussion threads
- Automatic categorization by topic type
- Severity assessment (critical, high, medium, low)
- Theme identification across multiple discussions
- Natural language query interface
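To make the analysis output more concrete, here is one way a per-topic result with the four severity levels listed above might be represented and validated. The `TopicAnalysis` fields, the `parse_analysis` helper, and the shape of the model response are hypothetical. The tool's actual schema may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(str, Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass(frozen=True)
class TopicAnalysis:
    topic_id: int
    classification: str  # e.g. "webhook_delivery" or "authentication"
    severity: Severity
    problem_summary: str


def parse_analysis(topic_id: int, raw: dict) -> TopicAnalysis:
    """Validate a model response that has already been decoded from JSON.

    Raises KeyError for a missing field and ValueError for an unknown severity.
    """
    return TopicAnalysis(
        topic_id=topic_id,
        classification=str(raw["classification"]).strip().lower(),
        severity=Severity(str(raw["severity"]).strip().lower()),
        problem_summary=str(raw.get("problem_summary", "")).strip(),
    )
```

Rejecting unknown severity values at parse time keeps malformed model output out of later statistics.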
Reporting
- Markdown reports with statistics
- Problem theme grouping
- JSON and CSV export options
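The JSON and CSV export options can be pictured with a short standard-library sketch. The function names and the row fields are illustrative assumptions. The tool's real export format is not specified here.

```python
import csv
import io
import json


def export_json(rows: list[dict]) -> str:
    """Serialize analysis rows as pretty-printed JSON."""
    return json.dumps(rows, indent=2, sort_keys=True)


def export_csv(rows: list[dict], fieldnames: list[str]) -> str:
    """Serialize selected columns of the rows as CSV text."""
    buffer = io.StringIO(newline="")  # leave the csv module's line endings untouched
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Using `extrasaction="ignore"` lets the same rows feed both exports even when the CSV only needs a subset of the fields.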
Requirements
- Python 3.10 or higher
- Anthropic API key (for LLM analysis features)
Installation
git clone https://github.com/your-repo/discourse-forum-analyzer.git
cd discourse-forum-analyzer
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -e .
Quick Start
1. Initialize a New Project
Create a new directory for your analysis project and initialize it:
mkdir my-forum-analysis
cd my-forum-analysis
forum-analyzer init
The init command will interactively prompt you for:
- Discourse forum URL
- Category path (e.g., 't' or 'c')
- Category ID (with helpful hints; slug fetched automatically)
- Anthropic API key (optional, can be added later)
This creates a project structure:
my-forum-analysis/
├── config.yaml # Your configuration
├── forum.db # SQLite database (created on first collect)
├── checkpoints/ # Recovery checkpoints
├── exports/ # Analysis reports
└── logs/ # Application logs
2. Recommended Workflow
The recommended workflow discovers themes from your own data first, so the per-topic analysis in step 3 uses categories that actually occur in your forum.
# 1. Collect forum data (initializes database automatically)
forum-analyzer collect
# 2. Discover natural categories from the data
forum-analyzer themes discover --min-topics 3
# 3. Analyze all topics using the discovered categories
forum-analyzer llm-analyze
# 4. Ask questions about your analysis
forum-analyzer ask "What are the main authentication issues?"
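If you want to script the steps above, a small wrapper can run them in order and stop at the first failure. This is only a sketch: the commands are the ones shown above, while `WORKFLOW` and `run_workflow` are hypothetical names, and it assumes `forum-analyzer` is on your `PATH`.

```python
import subprocess

# The four workflow steps, in order, as argument lists.
WORKFLOW = [
    ["forum-analyzer", "collect"],
    ["forum-analyzer", "themes", "discover", "--min-topics", "3"],
    ["forum-analyzer", "llm-analyze"],
    ["forum-analyzer", "ask", "What are the main authentication issues?"],
]


def run_workflow(runner=subprocess.run) -> None:
    """Run each step in order; check=True raises on a non-zero exit code."""
    for command in WORKFLOW:
        runner(command, check=True)
```

Calling `run_workflow()` with no arguments executes the real commands. Passing a stand-in `runner` is useful for dry runs.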
Working with Multiple Projects
You can work with multiple forum analysis projects by using the --dir flag:
# Initialize a new project in a specific directory
mkdir shopify-webhooks
forum-analyzer --dir shopify-webhooks init
# Collect data for that project
forum-analyzer --dir shopify-webhooks collect
# Or use environment variable
export FORUM_ANALYZER_DIR=./shopify-webhooks
forum-analyzer collect
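The two ways of selecting a project shown above suggest a lookup order like the one sketched below. The precedence (flag first, then `FORUM_ANALYZER_DIR`, then the current directory) is an assumption, and `resolve_project_dir` is a hypothetical helper rather than the tool's own function.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_project_dir(dir_flag: Optional[str],
                        env: Mapping[str, str] = os.environ) -> Path:
    """Pick the project directory: --dir value, then env variable, then cwd."""
    chosen = dir_flag or env.get("FORUM_ANALYZER_DIR") or "."
    return Path(chosen).expanduser().resolve()
```

For example, `resolve_project_dir(None, {"FORUM_ANALYZER_DIR": "./shopify-webhooks"})` returns the absolute path of `./shopify-webhooks`.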
Usage
All Commands
A full list of commands and their options is available below.
Project Initialization
# Initialize a new project in the current directory
forum-analyzer init
# Initialize in a specific directory
forum-analyzer --dir ./my-project init
# Overwrite existing configuration
forum-analyzer init --force
Data Collection
# Collect from the category in your config
forum-analyzer collect
# Collect from a specific category
forum-analyzer collect --category-id 25
# Collect with a page limit (for testing)
forum-analyzer collect --page-limit 2
# Collect from a different project directory
forum-analyzer --dir ./my-project collect
Incremental Updates
# Fetch only new/updated content
forum-analyzer update
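One plausible way to decide what counts as new or updated content is to compare each topic's last-activity timestamp with the value stored from the previous run. The sketch below assumes the `bumped_at` field that Discourse topic listings typically include. The function names and the stored-timestamp mapping are hypothetical and may not match how `update` works internally.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2025-01-05T10:00:00.000Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def topics_to_refresh(remote_topics: list[dict], stored: dict[int, str]) -> list[int]:
    """Return IDs of topics that are new or have activity newer than the stored copy."""
    stale = []
    for topic in remote_topics:
        seen = stored.get(topic["id"])
        if seen is None or parse_ts(topic["bumped_at"]) > parse_ts(seen):
            stale.append(topic["id"])
    return stale
```

Only the returned topic IDs would then need their posts fetched again.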
Status
# View collection status and statistics
forum-analyzer status
Theme Management
# Discover common themes (minimum 3 topics per theme)
forum-analyzer themes discover
# Analyze more topics for better pattern discovery
forum-analyzer themes discover --context-limit 100
# List themes already discovered
forum-analyzer themes list
# Delete all themes (prompts for confirmation)
forum-analyzer themes clean
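The minimum-topics threshold mentioned above can be illustrated with a plain grouping filter. In the tool the themes themselves are proposed by the LLM. This sketch only shows the thresholding idea, and the function name and the input mapping of topic IDs to candidate themes are assumptions.

```python
from collections import defaultdict


def themes_meeting_threshold(assignments: dict[int, str],
                             min_topics: int = 3) -> dict[str, list[int]]:
    """Group topic IDs by candidate theme and keep themes with enough topics."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for topic_id, theme in sorted(assignments.items()):
        grouped[theme].append(topic_id)
    return {theme: ids for theme, ids in grouped.items() if len(ids) >= min_topics}
```

With the default of 3, a candidate theme supported by only two topics is dropped.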
Topic Analysis
# Analyze all unanalyzed topics
forum-analyzer llm-analyze
# Re-analyze topics that have already been analyzed
forum-analyzer llm-analyze --force
# Analyze a specific topic by its ID
forum-analyzer llm-analyze --topic-id 66
Querying
# Ask questions about the analyzed data
forum-analyzer ask "What are the most common authentication issues?"
Maintenance
# Clear all collection checkpoints
forum-analyzer clear-checkpoints
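Checkpoint-based recovery can be sketched as a small file that records the last fully collected page, so an interrupted run resumes from the next one. The `PageCheckpoint` class, the JSON layout, and the file name used in the usage note are hypothetical. The tool's real checkpoint format is not documented here.

```python
import json
from pathlib import Path
from typing import Optional


class PageCheckpoint:
    """Persist the last fully collected page so a run can resume after it."""

    def __init__(self, path: Path):
        self.path = path

    def load(self) -> Optional[int]:
        """Return the last completed page, or None if no checkpoint exists."""
        if not self.path.exists():
            return None
        return json.loads(self.path.read_text(encoding="utf-8"))["last_page"]

    def save(self, page: int) -> None:
        """Write the checkpoint via a temporary file to avoid partial writes."""
        self.path.parent.mkdir(parents=True, exist_ok=True)
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps({"last_page": page}), encoding="utf-8")
        tmp.replace(self.path)

    def clear(self) -> None:
        """Remove the checkpoint so the next run starts from the first page."""
        self.path.unlink(missing_ok=True)


def pages_to_collect(checkpoint: PageCheckpoint, total_pages: int) -> range:
    """Pages still to fetch, starting after the last completed one."""
    last = checkpoint.load()
    start = 0 if last is None else last + 1
    return range(start, total_pages)
```

For instance, after `PageCheckpoint(Path("checkpoints/category_25.json")).save(1)`, a three-page collection would resume at page 2.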
Technical Details
Architecture
┌─────────────────────┐
│ Discourse Forum │
│ (JSON endpoints) │
└──────────┬──────────┘
│
▼
┌─────────────────────┐
│ Rate-Limited │
│ HTTP Client │
└──────────┬──────────┘
│
▼
┌─────────────────────┐
│ Checkpoint │
│ Manager │
└──────────┬──────────┘
│
▼
┌─────────────────────┐
│ SQLite Database │
│ (SQLAlchemy) │
└──────────┬──────────┘
│
▼
┌─────────────────────┐ ┌──────────────┐
│ LLM Analyzer │────▶│ Claude API │
└──────────┬──────────┘ └──────────────┘
│
▼
┌─────────────────────┐
│ Reports & Themes │
└─────────────────────┘
Technology Stack
- Language: Python 3.10+
- Database: SQLite with SQLAlchemy
- HTTP: httpx (async)
- LLM: Claude API (Anthropic)
- CLI: Click
- Config: Pydantic + YAML
Project Structure
discourse-forum-analyzer/
├── src/forum_analyzer/
│ ├── analyzer/ # LLM analysis
│ ├── collector/ # Data collection
│ ├── config/
│ └── cli.py
├── config/
│ └── cli.py
├── examples/
│ └── shopify-webhooks/
└── tests/
Database Schema
The schema is managed by SQLAlchemy models and is split into three categories:
- Forum Data Tables: categories, topics, posts, users
- Analysis Tables: llm_analysis, problem_themes
- Operational Tables: checkpoints, fetch_history
The schema auto-migrates when using LLM analysis features.
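To give a feel for how the analysis tables relate to the forum data tables, here is a dependency-free sketch using the standard `sqlite3` module instead of SQLAlchemy. Only the table names `topics` and `llm_analysis` come from the list above. Every column and both helper functions are assumptions made for illustration.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create two of the tables named above, with illustrative columns only."""
    conn.executescript(
        """
        CREATE TABLE topics (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL
        );
        CREATE TABLE llm_analysis (
            topic_id INTEGER PRIMARY KEY REFERENCES topics(id),
            classification TEXT NOT NULL,
            severity TEXT NOT NULL
        );
        """
    )


def severity_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Count analyzed topics per severity level."""
    rows = conn.execute(
        "SELECT severity, COUNT(*) FROM llm_analysis GROUP BY severity"
    ).fetchall()
    return dict(rows)
```

A query like `severity_counts` is the kind of aggregate a statistics report could be built from.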
Example Application: Shopify Developer Forum
This tool was demonstrated by analyzing Shopify's webhook discussions.
- Topics: 271
- Posts: 1,201
- Users: 324
- Date Range: September 2024 - October 2025
Example analysis results:
- 15 distinct problem themes identified
- 18 critical issues found
- Top issue: Configuration challenges (25.1% of topics)
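As a quick sanity check on the figures above: if the 25.1% share is computed over all 271 collected topics (an assumption, since the denominator could also be the number of analyzed topics), it corresponds to about 68 topics.

```python
def topics_for_share(total_topics: int, percent: float) -> int:
    """Estimate how many topics a percentage share corresponds to."""
    return round(total_topics * percent / 100)


estimated = topics_for_share(271, 25.1)
# Convert the estimate back to a percentage to confirm it rounds to 25.1.
share_check = round(100 * estimated / 271, 1)
print(estimated, share_check)  # prints: 68 25.1
```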
See the complete example analysis: examples/shopify-webhooks/LLM_ANALYSIS_REPORT.md
Development
Running Tests
pytest
Code Quality
black src/ tests/
isort src/ tests/
flake8 src/ tests/
mypy src/
Troubleshooting
Rate Limiting
- Adjust rate_limit in config.yaml (default: 1 req/sec).
Database Locked
- Only one instance can run at a time.
- Clear stale checkpoints: forum-analyzer clear-checkpoints.
LLM Analysis Errors
- Verify your Anthropic API key is valid and has credit.
- Use the --limit flag for testing with smaller datasets.
Contributing
- Fork the repository
- Create a feature branch
- Make changes with tests
- Submit a pull request
License
MIT License - See LICENSE file for details.
Appendix: Glossary
Understanding the terminology used in this tool:
Discourse Forum Terms
Category
A top-level organizational unit in Discourse forums (e.g., "Webhooks & Events").
Topic
A discussion thread within a category.
Post
An individual message within a topic. The first post is the topic starter; subsequent posts are replies.
Analysis Terms
Classification
The LLM-assigned type of problem or discussion in a topic (e.g., "webhook_delivery", "authentication").
Theme
A higher-level pattern grouping multiple related topics (e.g., "Webhook Delivery Failures").
Severity
The urgency/impact level assigned to a topic (critical, high, medium, low).
Workflow Terms
Collection
The process of downloading forum data (forum-analyzer collect).
Analysis
The process of using the LLM to extract insights from topics (forum-analyzer llm-analyze).
Theme Identification
The process of grouping topics into common patterns (forum-analyzer themes discover).