Shopify Developer Forum Analyzer

Discourse Forum Analyzer

A Python tool for collecting and analyzing discussions from Discourse-based forums using LLM-powered analysis.

Overview

This tool automates the collection of forum data from Discourse forums (which provide JSON representations of pages) and uses Claude AI to analyze discussions, identify common problems, and extract insights. While initially built to analyze Shopify's webhook forum, it works with any publicly accessible Discourse installation.
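As a rough illustration of the JSON representations mentioned above, the sketch below builds URLs using the common Discourse convention of appending .json to category and topic paths. Whether the tool builds its URLs exactly this way is an assumption; the host, slug, and helper names are placeholders.

```python
from urllib.parse import urlencode, urljoin


def category_page_url(base_url: str, slug: str, category_id: int, page: int = 0) -> str:
    """Build the JSON URL for one page of a category's topic list."""
    path = f"/c/{slug}/{category_id}.json"
    query = urlencode({"page": page})
    return urljoin(base_url, path) + "?" + query


def topic_url(base_url: str, topic_id: int) -> str:
    """Build the JSON URL for a single topic and its posts."""
    return urljoin(base_url, f"/t/{topic_id}.json")


# forum.example.com is a placeholder host, not a real forum.
print(category_page_url("https://forum.example.com", "webhooks", 25, page=2))
print(topic_url("https://forum.example.com", 66))
```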

New to this tool? Read the Glossary first to learn the key terminology.

Features

Data Collection

  • Automated scraping via Discourse JSON endpoints
  • Rate-limited HTTP client with retry logic
  • Checkpoint-based recovery for interrupted operations
  • Incremental updates (collect only new content)
  • SQLite storage with SQLAlchemy ORM
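The rate limiting and retry behaviour listed above could look roughly like the sketch below. It is not the tool's actual client: the class and function names are hypothetical, and exponential backoff is an assumed retry strategy. The clock and sleep functions are injectable so the logic can be exercised without real delays.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimiter:
    """Allow at most `rate` calls per second by sleeping between calls."""

    def __init__(self, rate: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self) -> None:
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


def with_retries(func: Callable[[], T], attempts: int = 3,
                 base_delay: float = 1.0, sleep=time.sleep) -> T:
    """Call func, retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("attempts must be at least 1")
```

A caller would invoke limiter.wait() before each request and wrap the request itself in with_retries.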

LLM Analysis

  • Problem extraction from discussion threads
  • Automatic categorization by topic type
  • Severity assessment (critical, high, medium, low)
  • Theme identification across multiple discussions
  • Natural language query interface
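To make the analysis output more concrete, here is a minimal sketch of turning a model reply into a validated record. The JSON field names (category, severity, problem) and the TopicAnalysis class are hypothetical; only the four severity levels come from the list above.

```python
import json
from dataclasses import dataclass

SEVERITIES = ("critical", "high", "medium", "low")


@dataclass
class TopicAnalysis:
    topic_id: int
    category: str
    severity: str
    problem: str


def parse_analysis(topic_id: int, raw_reply: str) -> TopicAnalysis:
    """Parse a JSON reply from the model and validate the severity label."""
    data = json.loads(raw_reply)
    severity = str(data.get("severity", "")).lower()
    if severity not in SEVERITIES:
        raise ValueError(f"unexpected severity: {severity!r}")
    return TopicAnalysis(
        topic_id=topic_id,
        category=data.get("category", "uncategorized"),
        severity=severity,
        problem=data.get("problem", ""),
    )
```

Rejecting unknown severity labels early keeps later statistics from silently counting misspelled or invented levels.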

Reporting

  • Markdown reports with statistics
  • Problem theme grouping
  • JSON and CSV export options
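The export formats can be sketched with the standard library alone. The row shape below (theme, topics) is an illustrative assumption, not the tool's actual export schema, and the sample row is placeholder data.

```python
import csv
import io
import json


def export_csv(rows: list[dict]) -> str:
    """Render rows as CSV text with a header taken from the first row."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def export_json(rows: list[dict]) -> str:
    """Render rows as indented JSON text."""
    return json.dumps(rows, indent=2)


# Placeholder data for illustration only.
sample = [{"theme": "Webhook Delivery Failures", "topics": 12}]
print(export_csv(sample))
print(export_json(sample))
```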

Requirements

  • Python 3.10 or higher
  • Anthropic API key (for LLM analysis features)

Installation

git clone https://github.com/your-repo/discourse-forum-analyzer.git
cd discourse-forum-analyzer

python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

pip install -e .

Quick Start

1. Initialize a New Project

Create a new directory for your analysis project and initialize it:

mkdir my-forum-analysis
cd my-forum-analysis
forum-analyzer init

The init command will interactively prompt you for:

  • Discourse forum URL
  • Category path (e.g., 't' or 'c')
  • Category ID (with helpful hints; slug fetched automatically)
  • Anthropic API key (optional, can be added later)

This creates a project structure:

my-forum-analysis/
├── config.yaml          # Your configuration
├── forum.db            # SQLite database (created on first collect)
├── checkpoints/        # Recovery checkpoints
├── exports/            # Analysis reports
└── logs/               # Application logs

2. Recommended Workflow

The recommended workflow discovers themes from your own data first, so that the later analysis uses categories that fit your forum instead of generic ones.

# 1. Collect forum data (initializes database automatically)
forum-analyzer collect

# 2. Discover natural categories from the data
forum-analyzer themes discover --min-topics 3

# 3. Analyze all topics using the discovered categories
forum-analyzer llm-analyze

# 4. Ask questions about your analysis
forum-analyzer ask "What are the main authentication issues?"
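If you prefer to script the non-interactive steps, a small wrapper like the one below is one option. It only shells out to the commands shown above; the run_workflow helper is hypothetical and not part of the package.

```python
import subprocess

# The first three workflow steps, exactly as documented above.
WORKFLOW = [
    ["forum-analyzer", "collect"],
    ["forum-analyzer", "themes", "discover", "--min-topics", "3"],
    ["forum-analyzer", "llm-analyze"],
]


def run_workflow(runner=subprocess.run) -> None:
    """Run each step in order and stop at the first failure."""
    for command in WORKFLOW:
        # check=True raises CalledProcessError on a non-zero exit code.
        runner(command, check=True)
```

Passing a custom runner makes it possible to dry-run the sequence without invoking the CLI.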

Working with Multiple Projects

You can work with multiple forum analysis projects by using the --dir flag:

# Initialize a new project in a specific directory
mkdir shopify-webhooks
forum-analyzer --dir shopify-webhooks init

# Collect data for that project
forum-analyzer --dir shopify-webhooks collect

# Or use environment variable
export FORUM_ANALYZER_DIR=./shopify-webhooks
forum-analyzer collect
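The README does not state which setting wins when both --dir and FORUM_ANALYZER_DIR are present. The sketch below assumes the common convention that an explicit flag overrides the environment variable, which in turn overrides the current directory; resolve_project_dir is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_project_dir(cli_dir: Optional[str],
                        env: Mapping[str, str] = os.environ) -> Path:
    """Pick the project directory: --dir first, then the env var, then cwd."""
    if cli_dir:
        return Path(cli_dir)
    env_dir = env.get("FORUM_ANALYZER_DIR")
    if env_dir:
        return Path(env_dir)
    return Path.cwd()
```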

Usage

All Commands

A full list of commands and their options is available below.

Project Initialization

# Initialize a new project in the current directory
forum-analyzer init

# Initialize in a specific directory
forum-analyzer --dir ./my-project init

# Overwrite existing configuration
forum-analyzer init --force

Data Collection

# Collect from the category in your config
forum-analyzer collect

# Collect from a specific category
forum-analyzer collect --category-id 25

# Collect with a page limit (for testing)
forum-analyzer collect --page-limit 2

# Collect from a different project directory
forum-analyzer --dir ./my-project collect

Incremental Updates

# Fetch only new/updated content
forum-analyzer update
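One plausible way to decide what counts as new or updated is to compare each topic's latest activity timestamp with the copy already stored. The sketch below assumes the topic list exposes a last_posted_at field in ISO 8601 form; the function name and the shape of the stored mapping are hypothetical.

```python
from datetime import datetime


def topics_needing_refresh(remote: list[dict], stored: dict[int, datetime]) -> list[int]:
    """Return IDs of topics that are new or have activity newer than our copy."""
    stale = []
    for topic in remote:
        # Convert a trailing "Z" (UTC) into an explicit offset for fromisoformat.
        last_activity = datetime.fromisoformat(
            topic["last_posted_at"].replace("Z", "+00:00")
        )
        known = stored.get(topic["id"])
        if known is None or last_activity > known:
            stale.append(topic["id"])
    return stale
```

The stored timestamps must be timezone-aware, otherwise the comparison raises a TypeError.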

Status

# View collection status and statistics
forum-analyzer status

Theme Management

# Discover common themes (minimum 3 topics per theme)
forum-analyzer themes discover

# Analyze more topics for better pattern discovery
forum-analyzer themes discover --context-limit 100

# List themes already discovered
forum-analyzer themes list

# Delete all themes (prompts for confirmation)
forum-analyzer themes clean
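The minimum-topics threshold can be pictured as a simple filter over theme assignments. This is only a sketch of the thresholding idea; how the tool actually discovers themes with the LLM is not shown here, and the function and sample data are hypothetical.

```python
from collections import Counter


def themes_meeting_minimum(topic_themes: dict[int, str], min_topics: int = 3) -> dict[str, int]:
    """Keep only themes that cover at least min_topics topics."""
    counts = Counter(topic_themes.values())
    return {theme: n for theme, n in counts.most_common() if n >= min_topics}


# Placeholder assignments: topic ID -> theme name.
assignments = {1: "delivery", 2: "delivery", 3: "delivery", 4: "auth"}
print(themes_meeting_minimum(assignments))  # {'delivery': 3}
```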

Topic Analysis

# Analyze all unanalyzed topics
forum-analyzer llm-analyze

# Re-analyze topics that have already been analyzed
forum-analyzer llm-analyze --force

# Analyze a specific topic by its ID
forum-analyzer llm-analyze --topic-id 66

Querying

# Ask questions about the analyzed data
forum-analyzer ask "What are the most common authentication issues?"

Maintenance

# Clear all collection checkpoints
forum-analyzer clear-checkpoints

Technical Details

Architecture

┌─────────────────────┐
│  Discourse Forum    │
│  (JSON endpoints)   │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   Rate-Limited      │
│   HTTP Client       │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   Checkpoint        │
│   Manager           │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   SQLite Database   │
│   (SQLAlchemy)      │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐     ┌──────────────┐
│   LLM Analyzer      │────▶│  Claude API  │
└──────────┬──────────┘     └──────────────┘
           │
           ▼
┌─────────────────────┐
│  Reports & Themes   │
└─────────────────────┘
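The checkpoint step in the diagram can be illustrated with a small file-based sketch. The tool's real checkpoint format is not documented here, so the JSON layout, file name, and function names are assumptions.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(path: Path, category_id: int, last_page: int) -> None:
    """Write the checkpoint atomically so a crash never leaves a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump({"category_id": category_id, "last_page": last_page}, handle)
    # os.replace swaps the finished temp file into place in one step.
    os.replace(tmp_name, path)


def next_page(path: Path, category_id: int) -> int:
    """Return the page to resume from, or 0 when no usable checkpoint exists."""
    if not path.exists():
        return 0
    state = json.loads(path.read_text(encoding="utf-8"))
    if state.get("category_id") != category_id:
        return 0  # checkpoint belongs to a different category
    return state["last_page"] + 1
```

Writing to a temporary file and renaming it means an interrupted run leaves either the old checkpoint or the new one, never a half-written file.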

Technology Stack

  • Language: Python 3.10+
  • Database: SQLite with SQLAlchemy
  • HTTP: httpx (async)
  • LLM: Claude API (Anthropic)
  • CLI: Click
  • Config: Pydantic + YAML

Project Structure

discourse-forum-analyzer/
├── src/forum_analyzer/
│   ├── analyzer/              # LLM analysis
│   ├── collector/             # Data collection
│   ├── config/
│   └── cli.py
├── config/
│   └── cli.py
├── examples/
│   └── shopify-webhooks/
└── tests/

Database Schema

The schema is managed by SQLAlchemy models and is split into three categories:

  • Forum Data Tables: categories, topics, posts, users
  • Analysis Tables: llm_analysis, problem_themes
  • Operational Tables: checkpoints, fetch_history

The schema is migrated automatically when you use the LLM analysis features.
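Because the data lives in SQLite, it can be queried directly. The sketch below uses an in-memory database with a simplified, assumed column layout; only the table names topics and llm_analysis come from the list above, and the rows are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE topics (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE llm_analysis (
        topic_id INTEGER REFERENCES topics(id),
        category TEXT,
        severity TEXT
    );
    """
)
# Placeholder rows for illustration only.
conn.executemany(
    "INSERT INTO topics VALUES (?, ?)",
    [(1, "Placeholder A"), (2, "Placeholder B"), (3, "Placeholder C")],
)
conn.executemany(
    "INSERT INTO llm_analysis VALUES (?, ?, ?)",
    [(1, "authentication", "critical"), (2, "webhook_delivery", "high"), (3, "authentication", "high")],
)


def severity_counts(connection: sqlite3.Connection) -> dict:
    """Count analyzed topics per severity level."""
    rows = connection.execute(
        "SELECT severity, COUNT(*) FROM llm_analysis GROUP BY severity ORDER BY severity"
    )
    return dict(rows.fetchall())


print(severity_counts(conn))  # {'critical': 1, 'high': 2}
```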

Example Application: Shopify Developer Forum

This tool was demonstrated by analyzing Shopify's webhook discussions.

  • Topics: 271
  • Posts: 1,201
  • Users: 324
  • Date Range: September 2024 - October 2025

Example analysis results:

  • 15 distinct problem themes identified
  • 18 critical issues found
  • Top issue: Configuration challenges (25.1% of topics)
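For scale, the figures above can be converted into absolute numbers. This assumes the 25.1% share is measured against all 271 collected topics, which the summary does not state explicitly.

```python
topics, posts = 271, 1201
top_issue_share = 0.251  # "Configuration challenges" share from the summary

top_issue_topics = round(topics * top_issue_share)
posts_per_topic = posts / topics

print(top_issue_topics)           # 68
print(round(posts_per_topic, 2))  # 4.43
```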

See the complete example analysis: examples/shopify-webhooks/LLM_ANALYSIS_REPORT.md

Development

Running Tests

pytest

Code Quality

black src/ tests/
isort src/ tests/
flake8 src/ tests/
mypy src/

Troubleshooting

Rate Limiting

  • Adjust rate_limit in config.yaml (default: 1 req/sec).

Database Locked

  • Only one instance can run at a time.
  • Clear stale checkpoints: forum-analyzer clear-checkpoints.

LLM Analysis Errors

  • Verify your Anthropic API key is valid and has credit.
  • Use the --limit flag for testing with smaller datasets.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make changes with tests
  4. Submit a pull request

License

MIT License - See LICENSE file for details.


Appendix: Glossary

Understanding the terminology used in this tool:

Discourse Forum Terms

Category
A top-level organizational unit in Discourse forums (e.g., "Webhooks & Events").

Topic
A discussion thread within a category.

Post
An individual message within a topic. The first post is the topic starter; subsequent posts are replies.

Analysis Terms

Classification
The LLM-assigned type of problem or discussion in a topic (e.g., "webhook_delivery", "authentication").

Theme
A higher-level pattern grouping multiple related topics (e.g., "Webhook Delivery Failures").

Severity
The urgency/impact level assigned to a topic (critical, high, medium, low).

Workflow Terms

Collection
The process of downloading forum data (forum-analyzer collect).

Analysis
The process of using the LLM to extract insights from topics (forum-analyzer llm-analyze).

Theme Identification
The process of grouping topics into common patterns (forum-analyzer themes discover).
