
crawldiff

git log for any website.

Track what changed on any website. Get git-style diffs with AI-powered summaries.
Powered by Cloudflare's /crawl endpoint.




pip install crawldiff
# Snapshot a site
crawldiff crawl https://stripe.com/pricing

# Come back later. See what changed.
crawldiff diff https://stripe.com/pricing --since 7d

Why

Most website monitoring tools are SaaS dashboards built for marketing teams.

crawldiff is for developers. It's a CLI. It diffs like git. It summarizes with AI. It stores everything locally. And it's powered by Cloudflare's new /crawl endpoint, running on the same edge network that already fronts a large share of the web.

No accounts. No subscriptions. No GUI. Just crawldiff diff.

Setup (30 seconds)

You need a free Cloudflare account. That's it.

# Install
pip install crawldiff

# Set your Cloudflare credentials (free tier: 5 jobs/day, 100 pages/job)
export CLOUDFLARE_ACCOUNT_ID="your-account-id"
export CLOUDFLARE_API_TOKEN="your-api-token"

# Or save to config
crawldiff config set cloudflare.account_id your-id
crawldiff config set cloudflare.api_token your-token
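
Before the first crawl, you can sanity-check the token against Cloudflare's standard token-verification endpoint (this is Cloudflare's own API, not a crawldiff command):

# Verify the API token; a valid token returns "success": true
curl -s "https://api.cloudflare.com/client/v4/user/tokens/verify" \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN"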

Usage

Track changes on any website

# Take a snapshot
crawldiff crawl https://competitor.com

# Later, see what changed
crawldiff diff https://competitor.com --since 7d

# Output as JSON (pipe to jq, Slack, wherever)
crawldiff diff https://competitor.com --since 7d --format json

# Save a markdown report
crawldiff diff https://competitor.com --since 30d --output report.md
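
The JSON output is what makes piping practical. A sketch for Slack (the webhook URL is your own; jq -Rs wraps the raw payload as message text, so nothing about crawldiff's JSON schema is assumed):

# Post the diff to a Slack incoming webhook
crawldiff diff https://competitor.com --since 7d --format json \
  | jq -Rs '{text: .}' \
  | curl -s -X POST -H 'Content-Type: application/json' -d @- "$SLACK_WEBHOOK_URL"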

Watch a site continuously

# Check every hour, get notified when something changes
crawldiff watch https://stripe.com/pricing --every 1h

# Check every 6 hours, skip AI summary
crawldiff watch https://competitor.com --every 6h --no-summary
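
If you'd rather not keep a long-lived watch process running, a cron entry that runs diff on the same cadence is a reasonable substitute (a sketch using only the flags shown above; use the full path to crawldiff if cron's PATH doesn't include it):

# Crontab sketch: every 6 hours, write a report of the last day's changes
0 */6 * * * crawldiff diff https://competitor.com --since 1d --output /tmp/crawldiff-report.md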

View history

crawldiff history https://stripe.com/pricing
       Crawl History — https://stripe.com/pricing
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Job ID         ┃ Date                ┃ Pages ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ cf-job-abc-123 │ 2026-03-13 09:00:00 │    12 │
│ cf-job-def-456 │ 2026-03-06 09:00:00 │    11 │
│ cf-job-ghi-789 │ 2026-02-27 09:00:00 │    11 │
└────────────────┴─────────────────────┴───────┘

Crawl options

# Deeper crawl
crawldiff crawl https://docs.react.dev --depth 3 --max-pages 100

# Static sites (faster, no browser rendering)
crawldiff crawl https://blog.example.com --no-render

# Ignore whitespace noise
crawldiff diff https://example.com --since 7d --ignore-whitespace
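
These flags presumably compose like any CLI's; for instance, a shallow, render-free crawl of a static blog:

# Shallow crawl of a static site, skipping browser rendering
crawldiff crawl https://blog.example.com --depth 2 --no-render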

AI Summaries (optional)

Raw diffs are useful. AI summaries make them actionable. crawldiff supports three providers:

# Cloudflare Workers AI (free, uses your existing CF account)
crawldiff config set ai.provider cloudflare

# Anthropic Claude
pip install crawldiff[ai]
crawldiff config set ai.provider anthropic
export ANTHROPIC_API_KEY="sk-..."

# OpenAI
pip install crawldiff[ai]
crawldiff config set ai.provider openai
export OPENAI_API_KEY="sk-..."

Don't want AI? Just use --no-summary. Diffs work perfectly without it.
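
For a one-off diff with the summary skipped (assuming --no-summary applies to diff as it does to watch):

# Same diff, no AI call
crawldiff diff https://competitor.com --since 7d --no-summary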

How it works

1. crawldiff crawl <url>
   └─→ Cloudflare /crawl API (headless browser, respects robots.txt)
   └─→ Store Markdown snapshots in local SQLite (~/.crawldiff/)

2. crawldiff diff <url> --since 7d
   └─→ Cloudflare /crawl with modifiedSince (only fetches changed pages)
   └─→ Diff against stored snapshot (unified diff via difflib)
   └─→ AI summary (optional)
   └─→ Beautiful terminal output via rich

The key insight: Cloudflare's modifiedSince parameter means incremental crawling is built-in. On repeat diffs, only changed pages are fetched. Fast and cheap.
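
The diff step itself is ordinary unified-diff territory. Done by hand on two exported snapshots it would look like this (file names are illustrative; crawldiff actually stores snapshots in SQLite, not loose files):

# Hand-rolled equivalent of step 2's diff: unified diff of two Markdown snapshots
diff -u pricing-2026-03-06.md pricing-2026-03-13.md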

Why Cloudflare /crawl?

                      crawldiff (Cloudflare)       Firecrawl      Crawl4AI
Free tier             5 jobs/day, 100 pages/job    500 credits    Self-host
Incremental crawling  Built-in (modifiedSince)     No             No
Browser rendering     Headless Chrome at the edge  Yes            Yes
Respects robots.txt   By default                   Opt-in         No
Pricing               $5/mo (Workers Paid)         $47/mo         Free (self-host)
Infrastructure        Cloudflare's global network  Their servers  Your servers

vs. other monitoring tools

Feature               crawldiff  Visualping  changedetection.io  Firecrawl
Open source           Yes        No          Yes                 Yes
CLI-native            Yes        No          No                  No
AI summaries          Yes        No          No                  No
Incremental crawling  Yes        No          No                  No
Local storage         Yes        No          No                  No
JSON/pipe output      Yes        No          Yes                 Yes
Free                  Yes        Limited     Yes                 Limited

All commands

crawldiff crawl <url>      Snapshot a website
crawldiff diff <url>       Show what changed (the main command)
crawldiff watch <url>      Monitor continuously
crawldiff history <url>    View past snapshots
crawldiff config           Manage settings
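
A typical week, end to end:

# Monday: take a baseline
crawldiff crawl https://competitor.com

# Friday: review the week's changes and keep a report
crawldiff diff https://competitor.com --since 4d --output weekly-report.md
crawldiff history https://competitor.com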

Contributing

Contributions welcome! See CONTRIBUTING.md for setup and guidelines.

License

MIT
