Scrape and index API docs as Claude skills with gusto

Skill Jangler

Transform any documentation into AI-powered Claude skills

Automatically convert any documentation website into a Claude AI skill in minutes.

Attribution

Skill Jangler is a fork of Skill_Seekers by Yusuf Karaaslan. This project builds upon that excellent foundation with enhanced features, improved MCP integration, and ongoing maintenance.

What is Skill Jangler?

Skill Jangler is an automated tool that transforms any documentation website into a production-ready Claude AI skill. Instead of manually reading and summarizing documentation, Skill Jangler:

Scrapes documentation websites automatically
Organizes content into categorized reference files
Enhances with AI to extract best examples and key concepts
Packages everything into an uploadable .zip file for Claude

Result: Get comprehensive Claude skills for any framework, API, or tool in 20-40 minutes instead of hours of manual work.

Why Use This?

🎯 For Developers: Quickly create Claude skills for your favorite frameworks (React, Vue, Django, etc.)
🎮 For Game Devs: Generate skills for game engines (Godot, Unity documentation, etc.)
🔧 For Teams: Create internal documentation skills for your company's APIs
📚 For Learners: Build comprehensive reference skills for technologies you're learning

Key Features

✅ Universal Scraper - Works with ANY documentation website
✅ AI-Powered Enhancement - Transforms basic templates into comprehensive guides
✅ MCP Server for Claude Code - Use directly from Claude Code with natural language
✅ Large Documentation Support - Handle 10K-40K+ page docs with intelligent splitting
✅ Router/Hub Skills - Intelligent routing to specialized sub-skills
✅ 8 Ready-to-Use Presets - Godot, React, Vue, Django, FastAPI, and more
✅ Smart Categorization - Automatically organizes content by topic
✅ Code Language Detection - Recognizes Python, JavaScript, C++, GDScript, etc.
✅ No API Costs - FREE local enhancement using Claude Code Max
✅ Checkpoint/Resume - Never lose progress on long scrapes
✅ Parallel Scraping - Process multiple skills simultaneously
✅ Caching System - Scrape once, rebuild instantly
✅ Fully Tested - 96 tests with 100% pass rate

Quick Example

Option 1: Use from Claude Code (Recommended)

# One-time setup (5 minutes)
./setup_mcp.sh

# Then in Claude Code, just ask:
"Generate a React skill from https://react.dev/"

Time: Automated | Quality: Production-ready | Cost: Free

Option 2: Use CLI Directly

# Install dependencies (2 pip packages)
pip3 install requests beautifulsoup4

# Generate a React skill in one command
python3 cli/doc_scraper.py --config configs/react.json --enhance-local

# Upload output/react.zip to Claude - Done!

Time: ~25 minutes | Quality: Production-ready | Cost: Free

How It Works

graph LR
    A[Documentation Website] --> B[Skill Jangler]
    B --> C[Scraper]
    B --> D[AI Enhancement]
    B --> E[Packager]
    C --> F[Organized References]
    D --> F
    F --> E
    E --> G[Claude Skill .zip]
    G --> H[Upload to Claude AI]

Scrape: Extracts all pages from documentation
Categorize: Organizes content into topics (API, guides, tutorials, etc.)
Enhance: AI analyzes docs and creates comprehensive SKILL.md with examples
Package: Bundles everything into a Claude-ready .zip file

🚀 Quick Start

Method 1: MCP Server for Claude Code (Easiest)

Use Skill Jangler directly from Claude Code with natural language!

# One-time setup (5 minutes)
./setup_mcp.sh

# Restart Claude Code, then just ask:

In Claude Code:

List all available configs
Generate config for Tailwind at https://tailwindcss.com/docs
Scrape docs using configs/react.json
Package skill at output/react/

Benefits:

✅ No manual CLI commands
✅ Natural language interface
✅ Integrated with your workflow
✅ 9 tools available instantly (includes automatic upload!)
✅ Tested and working in production

Full guides:

📘 MCP Setup Guide - Complete installation instructions
🧪 MCP Testing Guide - Test all 9 tools
📦 Large Documentation Guide - Handle 10K-40K+ pages
📤 Upload Guide - How to upload skills to Claude

Method 2: CLI (Traditional)

Easiest: Use a Preset

# Install dependencies (macOS)
pip3 install requests beautifulsoup4

# Optional: Estimate pages first (fast, 1-2 minutes)
python3 cli/estimate_pages.py configs/godot.json

# Use Godot preset
python3 cli/doc_scraper.py --config configs/godot.json

# Use React preset
python3 cli/doc_scraper.py --config configs/react.json

# See all presets
ls configs/

Interactive Mode

python3 cli/doc_scraper.py --interactive

Quick Mode

python3 cli/doc_scraper.py \
  --name react \
  --url https://react.dev/ \
  --description "React framework for UIs"

📤 Uploading Skills to Claude

Once your skill is packaged, you need to upload it to Claude:

Option 1: Automatic Upload (API-based)

# Set your API key (one-time)
export ANTHROPIC_API_KEY=sk-ant-...

# Package and upload automatically
python3 cli/package_skill.py output/react/ --upload

# OR upload existing .zip
python3 cli/upload_skill.py output/react.zip

Benefits:

✅ Fully automatic
✅ No manual steps
✅ Works from command line

Requirements:

Anthropic API key (get from https://console.anthropic.com/)

Option 2: Manual Upload (No API Key)

# Package skill
python3 cli/package_skill.py output/react/

# This will:
# 1. Create output/react.zip
# 2. Open the output/ folder automatically
# 3. Show upload instructions

# Then manually upload:
# - Go to https://claude.ai/skills
# - Click "Upload Skill"
# - Select output/react.zip
# - Done!

Benefits:

✅ No API key needed
✅ Works for everyone
✅ Folder opens automatically

Option 3: Claude Code (MCP) - Smart & Automatic

In Claude Code, just ask:
"Package and upload the React skill"

# With API key set:
# - Packages the skill
# - Uploads to Claude automatically
# - Done! ✅

# Without API key:
# - Packages the skill
# - Shows where to find the .zip
# - Provides manual upload instructions

Benefits:

✅ Natural language
✅ Smart auto-detection (uploads if API key available)
✅ Works with or without API key
✅ Falls back to manual upload instructions instead of failing when no API key is set
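
The auto-detection described above (upload when an API key is present, otherwise fall back to manual instructions) could be sketched roughly as follows. This is a minimal illustration, not the actual MCP tool code: `choose_upload_mode` is a hypothetical helper, and only the `ANTHROPIC_API_KEY` variable name comes from this README.

```python
import os


def choose_upload_mode(environ=None):
    """Return "api" when an API key is configured, otherwise "manual".

    Hypothetical helper: the real tool may apply additional checks.
    """
    environ = os.environ if environ is None else environ
    # Treat a missing or whitespace-only key as "not configured".
    key = environ.get("ANTHROPIC_API_KEY", "").strip()
    return "api" if key else "manual"


if __name__ == "__main__":
    mode = choose_upload_mode()
    if mode == "api":
        print("API key found: package, then upload automatically")
    else:
        print("No API key: package, then show manual upload instructions")
```

Passing the environment in as a parameter keeps the decision easy to test without touching the real process environment.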

📁 Simple Structure

doc-to-skill/
├── cli/
│   ├── doc_scraper.py      # Main scraping tool
│   ├── package_skill.py    # Package to .zip
│   ├── upload_skill.py     # Auto-upload (API)
│   └── enhance_skill.py    # AI enhancement
├── mcp/                    # MCP server for Claude Code
│   └── server.py           # 9 MCP tools
├── configs/                # Preset configurations
│   ├── godot.json         # Godot Engine
│   ├── react.json         # React
│   ├── vue.json           # Vue.js
│   ├── django.json        # Django
│   └── fastapi.json       # FastAPI
└── output/                 # All output (auto-created)
    ├── godot_data/        # Scraped data
    ├── godot/             # Built skill
    └── godot.zip          # Packaged skill

✨ Features

1. Fast Page Estimation (NEW!)

python3 cli/estimate_pages.py configs/react.json

# Output:
📊 ESTIMATION RESULTS
✅ Pages Discovered: 180
📈 Estimated Total: 230
⏱️  Time Elapsed: 1.2 minutes
💡 Recommended max_pages: 280

Benefits:

Know page count BEFORE scraping (saves time)
Validates URL patterns work correctly
Estimates total scraping time
Recommends optimal max_pages setting
Fast (1-2 minutes vs 20-40 minutes full scrape)

2. Auto-Detect Existing Data

python3 cli/doc_scraper.py --config configs/godot.json

# If data exists:
✓ Found existing data: 245 pages
Use existing data? (y/n): y
⏭️  Skipping scrape, using existing data
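
One plausible way to implement this check is to count the per-page JSON files left by an earlier run. The sketch below assumes the `output/<name>_data/pages/` layout shown in the "What Gets Created" section; `count_existing_pages` is a hypothetical name, not a function from the tool.

```python
from pathlib import Path


def count_existing_pages(name, output_dir="output"):
    """Count scraped pages saved by a previous run, or 0 if none exist.

    Assumes one JSON file per page under <output_dir>/<name>_data/pages/.
    """
    pages_dir = Path(output_dir) / f"{name}_data" / "pages"
    if not pages_dir.is_dir():
        return 0
    return sum(1 for _ in pages_dir.glob("*.json"))


if __name__ == "__main__":
    found = count_existing_pages("godot")
    if found:
        print(f"Found existing data: {found} pages")
    else:
        print("No existing data, a full scrape is needed")
```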

3. Knowledge Generation

Automatic pattern extraction:

Extracts common code patterns from docs
Detects programming language
Creates quick reference with real examples
Smarter categorization with scoring

Enhanced SKILL.md:

Real code examples from documentation
Language-annotated code blocks
Common patterns section
Quick reference from actual usage examples

4. Smart Categorization

Automatically infers categories from:

URL structure
Page titles
Content keywords
Each candidate category is scored, which improves accuracy
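
As a rough illustration of score-based categorization, the sketch below checks each category's keywords against the URL path, the page title, and the page content, then picks the highest-scoring category. The weights and the `categorize` function are assumptions made for this example; the README does not document the tool's actual scoring rule.

```python
from urllib.parse import urlparse

# Hypothetical weights: a URL match counts most, then the title, then body text.
URL_WEIGHT, TITLE_WEIGHT, CONTENT_WEIGHT = 3, 2, 1


def categorize(url, title, content, categories, default="other"):
    """Return the best-scoring category name for one page."""
    path = urlparse(url).path.lower()
    title = title.lower()
    content = content.lower()

    scores = {}
    for name, keywords in categories.items():
        score = 0
        for keyword in keywords:
            keyword = keyword.lower()
            if keyword in path:
                score += URL_WEIGHT
            if keyword in title:
                score += TITLE_WEIGHT
            if keyword in content:
                score += CONTENT_WEIGHT
        scores[name] = score

    # Fall back to the default when nothing matched at all.
    best = max(scores, key=scores.get, default=default)
    return best if scores.get(best, 0) > 0 else default


categories = {
    "getting_started": ["intro", "quickstart"],
    "api": ["api", "reference"],
}
print(categorize(
    "https://docs.myframework.com/api/widgets",
    "Widget reference",
    "Full API listing.",
    categories,
))  # prints: api
```

The `categories` mapping has the same shape as the `categories` section of a config file, so better keywords there translate directly into better scores.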

5. Code Language Detection

# Automatically detects:
- Python (def, import, from)
- JavaScript (const, let, =>)
- GDScript (func, var, extends)
- C++ (#include, int main)
- And more...
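
A keyword-based detector along these lines might look like the sketch below. The marker tokens come from the list above; the regular expressions, the one-point-per-marker scoring, and the `detect_language` name are assumptions for illustration, not the tool's actual logic.

```python
import re

# Marker patterns based on the list above; the scoring rule is an assumption.
LANGUAGE_MARKERS = {
    "python": [r"\bdef\s+\w+\s*\(", r"^\s*import\s+\w+", r"^\s*from\s+\w+\s+import\b"],
    "javascript": [r"\bconst\s+\w+", r"\blet\s+\w+", r"=>"],
    "gdscript": [r"\bfunc\s+\w+\s*\(", r"^\s*var\s+\w+", r"^\s*extends\s+\w+"],
    "cpp": [r"#include\s*[<\"]", r"\bint\s+main\s*\("],
}


def detect_language(code, default="unknown"):
    """Return the language whose markers match most often in the snippet."""
    scores = {
        lang: sum(1 for pattern in patterns if re.search(pattern, code, re.MULTILINE))
        for lang, patterns in LANGUAGE_MARKERS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


print(detect_language("extends Node2D\n\nfunc _ready():\n    pass\n"))  # prints: gdscript
```

Heuristics like this can misfire on short snippets, so a fallback value such as `"unknown"` is safer than guessing.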

5. Skip Scraping

# Scrape once
python3 cli/doc_scraper.py --config configs/react.json

# Later, just rebuild (instant)
python3 cli/doc_scraper.py --config configs/react.json --skip-scrape

6. AI-Powered SKILL.md Enhancement

# Option 1: During scraping (API-based, requires API key)
pip3 install anthropic
export ANTHROPIC_API_KEY=sk-ant-...
python3 cli/doc_scraper.py --config configs/react.json --enhance

# Option 2: During scraping (LOCAL, no API key - uses Claude Code Max)
python3 cli/doc_scraper.py --config configs/react.json --enhance-local

# Option 3: After scraping (API-based, standalone)
python3 cli/enhance_skill.py output/react/

# Option 4: After scraping (LOCAL, no API key, standalone)
python3 cli/enhance_skill_local.py output/react/

What it does:

Reads your reference documentation
Uses Claude to generate an excellent SKILL.md
Extracts best code examples (5-10 practical examples)
Creates comprehensive quick reference
Adds domain-specific key concepts
Provides navigation guidance for different skill levels
Automatically backs up original
Quality: Transforms 75-line templates into 500+ line comprehensive guides

LOCAL Enhancement (Recommended):

Uses your Claude Code Max plan (no API costs)
Opens new terminal with Claude Code
Analyzes reference files automatically
Takes 30-60 seconds
Quality: 9/10 (comparable to API version)

7. Large Documentation Support (10K-40K+ Pages)

For massive documentation sites like Godot (40K pages), AWS, or Microsoft Docs:

# 1. Estimate first (discover page count)
python3 cli/estimate_pages.py configs/godot.json

# 2. Auto-split into focused sub-skills
python3 cli/split_config.py configs/godot.json --strategy router

# Creates:
# - godot-scripting.json (5K pages)
# - godot-2d.json (8K pages)
# - godot-3d.json (10K pages)
# - godot-physics.json (6K pages)
# - godot-shaders.json (11K pages)

# 3. Scrape all in parallel (4-8 hours instead of 20-40!)
for config in configs/godot-*.json; do
  python3 cli/doc_scraper.py --config "$config" &
done
wait

# 4. Generate intelligent router/hub skill
python3 cli/generate_router.py configs/godot-*.json

# 5. Package all skills
python3 cli/package_multi.py output/godot*/

# 6. Upload all .zip files to Claude
# Users just ask questions naturally!
# Router automatically directs to the right sub-skill!
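
The shell loop in step 3 starts every scrape at once and discards exit codes. If you prefer to cap concurrency and see which scrapes failed, a Python wrapper is one option. The sketch below is an assumption-laden alternative, not part of the tool: `build_command` and `scrape_all` are hypothetical helpers that simply invoke the same `cli/doc_scraper.py --config` command shown above.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def build_command(config_path):
    """Build the same command the shell loop runs for one config."""
    return [sys.executable, "cli/doc_scraper.py", "--config", config_path]


def scrape_all(config_paths, max_workers=4, run=subprocess.run):
    """Run one scraper process per config, at most max_workers at a time.

    Returns a dict mapping each config path to its process exit code.
    """
    def job(path):
        result = run(build_command(path))
        return path, result.returncode

    # Threads are enough here: each one just waits on a child process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(job, config_paths))


if __name__ == "__main__":
    import glob

    results = scrape_all(sorted(glob.glob("configs/godot-*.json")))
    for path, code in results.items():
        print(f"{path}: {'ok' if code == 0 else f'failed ({code})'}")
```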

Split Strategies:

auto - Intelligently detects best strategy based on page count
category - Split by documentation categories (scripting, 2d, 3d, etc.)
router - Create hub skill + specialized sub-skills (RECOMMENDED)
size - Split every N pages (for docs without clear categories)
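
To make the `size` and `category` strategies concrete, here is a minimal sketch of how a list of page URLs could be partitioned. Both functions are hypothetical, and matching a category against URL path segments is an assumption; the actual `split_config.py` logic may differ.

```python
from urllib.parse import urlparse


def split_by_size(urls, pages_per_skill):
    """"size" strategy sketch: consecutive chunks of at most N pages."""
    if pages_per_skill <= 0:
        raise ValueError("pages_per_skill must be positive")
    return [urls[i:i + pages_per_skill] for i in range(0, len(urls), pages_per_skill)]


def split_by_category(urls, categories, fallback="misc"):
    """"category" strategy sketch: group URLs by a matching path segment."""
    groups = {name: [] for name in categories}
    for url in urls:
        segments = [part.lower() for part in urlparse(url).path.split("/") if part]
        match = next((name for name in categories if name.lower() in segments), fallback)
        groups.setdefault(match, []).append(url)
    # Drop categories that received no pages.
    return {name: pages for name, pages in groups.items() if pages}


urls = [
    "https://docs.example.com/scripting/signals",
    "https://docs.example.com/2d/sprites",
    "https://docs.example.com/about",
]
print(split_by_category(urls, ["scripting", "2d", "3d"]))
print(split_by_size(urls, 2))
```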

Benefits:

✅ Faster scraping (parallel execution)
✅ More focused skills (better Claude performance)
✅ Easier maintenance (update one topic at a time)
✅ Natural user experience (router handles routing)
✅ Avoids context window limits

Configuration:

{
  "name": "godot",
  "max_pages": 40000,
  "split_strategy": "router",
  "split_config": {
    "target_pages_per_skill": 5000,
    "create_router": true,
    "split_by_categories": ["scripting", "2d", "3d", "physics"]
  }
}

Full Guide: Large Documentation Guide

8. Checkpoint/Resume for Long Scrapes

Never lose progress on long-running scrapes:

# Enable in config ("interval" is the number of pages between saves)
{
  "checkpoint": {
    "enabled": true,
    "interval": 1000
  }
}

# If scrape is interrupted (Ctrl+C or crash)
python3 cli/doc_scraper.py --config configs/godot.json --resume

# Resume from last checkpoint
✅ Resuming from checkpoint (12,450 pages scraped)
⏭️  Skipping 12,450 already-scraped pages
🔄 Continuing from where we left off...

# Start fresh (clear checkpoint)
python3 cli/doc_scraper.py --config configs/godot.json --fresh

Benefits:

✅ Auto-saves every 1000 pages (configurable)
✅ Saves on interruption (Ctrl+C)
✅ Resume with --resume flag
✅ Never lose hours of scraping progress
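
The checkpoint behaviour listed above (periodic saves, a save on interruption, and resuming from the last saved state) can be sketched in a few dozen lines. Everything here is illustrative: the file format, the function names, and the `fetch` callback are assumptions, not the tool's real implementation.

```python
import json
import os
import tempfile


def save_checkpoint(path, visited, pending):
    """Write crawl state to disk, replacing the old file atomically."""
    state = {"visited": sorted(visited), "pending": list(pending)}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # a crash never leaves a half-written file


def load_checkpoint(path):
    """Return (visited, pending); both are empty when no checkpoint exists."""
    if not os.path.exists(path):
        return set(), []
    with open(path, encoding="utf-8") as handle:
        state = json.load(handle)
    return set(state["visited"]), list(state["pending"])


def crawl(urls, fetch, checkpoint_path, interval=1000):
    """Fetch each URL once, saving progress every `interval` pages."""
    visited, pending = load_checkpoint(checkpoint_path)
    if not visited and not pending:
        pending = list(urls)  # fresh start, nothing to resume
    try:
        while pending:
            url = pending[0]
            if url not in visited:
                fetch(url)
                visited.add(url)
                if len(visited) % interval == 0:
                    save_checkpoint(checkpoint_path, visited, pending[1:])
            pending.pop(0)
    finally:
        # Runs on normal exit, on Ctrl+C, and on errors alike.
        save_checkpoint(checkpoint_path, visited, pending)
    return visited
```

Because a URL is only removed from the pending queue after `fetch` succeeds, an interrupted page is retried on the next run rather than silently skipped.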

🎯 Complete Workflows

First Time (With Scraping + Enhancement)

# 1. Scrape + Build + AI Enhancement (LOCAL, no API key)
python3 cli/doc_scraper.py --config configs/godot.json --enhance-local

# 2. Wait for new terminal to close (enhancement completes)
# Check the enhanced SKILL.md:
cat output/godot/SKILL.md

# 3. Package
python3 cli/package_skill.py output/godot/

# 4. Done! You have godot.zip with excellent SKILL.md

Time: 20-40 minutes (scraping) + 60 seconds (enhancement) = ~21-41 minutes

Using Existing Data (Fast!)

# 1. Use cached data + Local Enhancement
python3 cli/doc_scraper.py --config configs/godot.json --skip-scrape
python3 cli/enhance_skill_local.py output/godot/

# 2. Package
python3 cli/package_skill.py output/godot/

# 3. Done!

Time: 1-3 minutes (build) + 60 seconds (enhancement) = ~2-4 minutes total

Without Enhancement (Basic)

# 1. Scrape + Build (no enhancement)
python3 cli/doc_scraper.py --config configs/godot.json

# 2. Package
python3 cli/package_skill.py output/godot/

# 3. Done! (SKILL.md will be basic template)

Time: 20-40 minutes

Note: SKILL.md will be generic - enhancement strongly recommended!

📋 Available Presets

Config	Framework	Description
`godot.json`	Godot Engine	Game development
`react.json`	React	UI framework
`vue.json`	Vue.js	Progressive framework
`django.json`	Django	Python web framework
`fastapi.json`	FastAPI	Modern Python API

Using Presets

# Godot
python3 cli/doc_scraper.py --config configs/godot.json

# React
python3 cli/doc_scraper.py --config configs/react.json

# Vue
python3 cli/doc_scraper.py --config configs/vue.json

# Django
python3 cli/doc_scraper.py --config configs/django.json

# FastAPI
python3 cli/doc_scraper.py --config configs/fastapi.json

🎨 Creating Your Own Config

Option 1: Interactive

python3 cli/doc_scraper.py --interactive
# Follow prompts, it will create the config for you

Option 2: Copy and Edit

# Copy a preset
cp configs/react.json configs/myframework.json

# Edit it
nano configs/myframework.json

# Use it
python3 cli/doc_scraper.py --config configs/myframework.json

Config Structure

{
  "name": "myframework",
  "description": "When to use this skill",
  "base_url": "https://docs.myframework.com/",
  "selectors": {
    "main_content": "article",
    "title": "h1",
    "code_blocks": "pre code"
  },
  "url_patterns": {
    "include": ["/docs", "/guide"],
    "exclude": ["/blog", "/about"]
  },
  "categories": {
    "getting_started": ["intro", "quickstart"],
    "api": ["api", "reference"]
  },
  "rate_limit": 0.5,
  "max_pages": 500
}
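
To show how `base_url` and `url_patterns` might interact, here is a small sketch of a URL filter. It assumes plain substring matching on the URL path and that `exclude` takes precedence over `include`; the README does not spell out the real matching rules, and `should_scrape` is a hypothetical helper.

```python
import json
from urllib.parse import urlparse


def should_scrape(url, config):
    """Decide whether a discovered link belongs in the crawl.

    Assumptions: same host as base_url, substring matching on the path,
    and exclude patterns win over include patterns.
    """
    base = urlparse(config["base_url"])
    target = urlparse(url)
    if target.netloc != base.netloc:
        return False

    patterns = config.get("url_patterns", {})
    include = patterns.get("include", [])
    exclude = patterns.get("exclude", [])

    if any(pattern in target.path for pattern in exclude):
        return False
    # An empty include list means "everything not excluded".
    return not include or any(pattern in target.path for pattern in include)


config = json.loads("""
{
  "name": "myframework",
  "base_url": "https://docs.myframework.com/",
  "url_patterns": {
    "include": ["/docs", "/guide"],
    "exclude": ["/blog", "/about"]
  }
}
""")

print(should_scrape("https://docs.myframework.com/docs/intro", config))   # prints: True
print(should_scrape("https://docs.myframework.com/blog/release", config))  # prints: False
```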

📊 What Gets Created

output/
├── godot_data/              # Scraped raw data
│   ├── pages/              # JSON files (one per page)
│   └── summary.json        # Overview
│
└── godot/                   # The skill
    ├── SKILL.md            # Enhanced with real examples
    ├── references/         # Categorized docs
    │   ├── index.md
    │   ├── getting_started.md
    │   ├── scripting.md
    │   └── ...
    ├── scripts/            # Empty (add your own)
    └── assets/             # Empty (add your own)
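
For orientation, bundling a built skill directory like the one above into a `.zip` can be done with the standard library alone. The sketch below is not `package_skill.py`; the `package_skill` function is hypothetical, and storing files under a top-level folder named after the skill is an assumption about the expected archive layout.

```python
import zipfile
from pathlib import Path


def package_skill(skill_dir):
    """Zip a skill directory into <parent>/<name>.zip and return the path."""
    skill_dir = Path(skill_dir)
    if not (skill_dir / "SKILL.md").is_file():
        raise FileNotFoundError(f"{skill_dir} has no SKILL.md; build the skill first")

    zip_path = skill_dir.parent / f"{skill_dir.name}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(skill_dir.rglob("*")):
            if path.is_file():
                # Assumed layout: entries look like "godot/SKILL.md".
                archive.write(path, path.relative_to(skill_dir.parent).as_posix())
    return zip_path


if __name__ == "__main__":
    print(package_skill("output/godot"))
```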

🎯 Command Line Options

# Interactive mode
python3 cli/doc_scraper.py --interactive

# Use config file
python3 cli/doc_scraper.py --config configs/godot.json

# Quick mode
python3 cli/doc_scraper.py --name react --url https://react.dev/

# Skip scraping (use existing data)
python3 cli/doc_scraper.py --config configs/godot.json --skip-scrape

# With description
python3 cli/doc_scraper.py \
  --name react \
  --url https://react.dev/ \
  --description "React framework for building UIs"

💡 Tips

1. Test Small First

Set max_pages to a small value in your config (here, 20 pages) for a trial run:

{
  "max_pages": 20
}

2. Reuse Scraped Data

# Scrape once
python3 cli/doc_scraper.py --config configs/react.json

# Rebuild multiple times (instant)
python3 cli/doc_scraper.py --config configs/react.json --skip-scrape
python3 cli/doc_scraper.py --config configs/react.json --skip-scrape

3. Finding Selectors

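# Test in Python
from bs4 import BeautifulSoup
import requests

url = "https://docs.example.com/page"
response = requests.get(url, timeout=10)
response.raise_for_status()  # Stop early on 4xx/5xx responses
soup = BeautifulSoup(response.content, 'html.parser')

# Try different selectors; use the one that returns the page body as main_content
print(soup.select_one('article'))
print(soup.select_one('main'))
print(soup.select_one('div[role="main"]'))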

4. Check Output Quality

# After building, check:
cat output/godot/SKILL.md  # Should have real examples
cat output/godot/references/index.md  # Categories

🐛 Troubleshooting

No Content Extracted?

Check your main_content selector
Try: article, main, div[role="main"]

Data Exists But Won't Use It?

# Force re-scrape
rm -rf output/myframework_data/
python3 cli/doc_scraper.py --config configs/myframework.json

Categories Not Good?

Edit the config categories section with better keywords.

Want to Update Docs?

# Delete old data
rm -rf output/godot_data/

# Re-scrape
python3 cli/doc_scraper.py --config configs/godot.json

📈 Performance

Task	Time	Notes
Scraping	15-45 min	First time only
Building	1-3 min	Fast!
Re-building	<1 min	With --skip-scrape
Packaging	5-10 sec	Final zip

✅ Summary

One tool does everything:

✅ Scrapes documentation
✅ Auto-detects existing data
✅ Extracts code patterns and quick references from the docs
✅ Creates enhanced skills
✅ Works with presets or custom configs
✅ Supports skip-scraping for fast iteration

Simple structure:

doc_scraper.py - The tool
configs/ - Presets
output/ - Everything else

Better output:

Real code examples with language detection
Common patterns extracted from docs
Smart categorization
Enhanced SKILL.md with actual examples

📚 Documentation

QUICKSTART.md - Get started in 3 steps
docs/LARGE_DOCUMENTATION.md - Handle 10K-40K+ page docs
docs/ENHANCEMENT.md - AI enhancement guide
docs/UPLOAD_GUIDE.md - How to upload skills to Claude
docs/MCP_SETUP.md - MCP integration setup
docs/CLAUDE.md - Technical architecture
STRUCTURE.md - Repository structure

🎮 Ready?

# Try Godot
python3 cli/doc_scraper.py --config configs/godot.json

# Try React
python3 cli/doc_scraper.py --config configs/react.json

# Or go interactive
python3 cli/doc_scraper.py --interactive

📝 License

MIT License - see LICENSE file for details

Happy skill building! 🚀

Project details

This version

0.1.0

Oct 20, 2025

Download files

Source Distribution

skill_jangler-0.1.0.tar.gz (140.4 kB)

Uploaded Oct 20, 2025 Source

Built Distribution

skill_jangler-0.1.0-py3-none-any.whl (18.7 kB)

Uploaded Oct 20, 2025 Python 3

File details

Details for the file skill_jangler-0.1.0.tar.gz.

File metadata

Download URL: skill_jangler-0.1.0.tar.gz
Upload date: Oct 20, 2025
Size: 140.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for skill_jangler-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`6765cbb24faf5732982999bbde7404a66a2e153bec0b142f650bd86846f1a85b`
MD5	`f67dfba0b099556ad27cc95b53300d41`
BLAKE2b-256	`9a81cbda4351038fca0410b3de006f7df546e822bde7588f4889c8c1c8aa1785`

File details

Details for the file skill_jangler-0.1.0-py3-none-any.whl.

File metadata

Download URL: skill_jangler-0.1.0-py3-none-any.whl
Upload date: Oct 20, 2025
Size: 18.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for skill_jangler-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`896be681c821e7a1c52a20d7f50ba73fda2c844a47b691a4a490109724d7c625`
MD5	`d7016f7c000e1a1e60ef3783a1f72a9e`
BLAKE2b-256	`0a7bfa7d12927a30838fd6641fadb1aeb36e547ab388c4f2d174bea07c566906`
