
Project description

crawldiff

git log for any website.

Track what changed on any website. Git-style diffs with optional AI summaries.
Powered by Cloudflare's /crawl endpoint.



crawldiff demo

pip install crawldiff
# Snapshot a site
crawldiff crawl https://stripe.com/pricing

# Come back later. See what changed.
crawldiff diff https://stripe.com/pricing --since 7d

What is this?

A CLI tool for tracking website changes over time. It crawls pages via Cloudflare's /crawl endpoint, stores markdown snapshots locally in SQLite, and produces unified diffs between crawls. Optionally summarizes changes with AI.
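
The diff step is ordinary difflib, as noted under "How it works". A minimal sketch of what a unified diff between two markdown snapshots looks like (the function name here is illustrative, not crawldiff's API):

```python
import difflib

def unified_markdown_diff(old_md: str, new_md: str, url: str) -> str:
    """Produce a git-style unified diff between two markdown snapshots."""
    diff = difflib.unified_diff(
        old_md.splitlines(keepends=True),
        new_md.splitlines(keepends=True),
        fromfile=f"a/{url}",
        tofile=f"b/{url}",
    )
    return "".join(diff)

old = "# Pricing\n\nStarter: $10/mo\n"
new = "# Pricing\n\nStarter: $12/mo\n"
print(unified_markdown_diff(old, new, "stripe.com/pricing"))
```

Changed lines come out prefixed with `-` and `+`, exactly as `git diff` would show them.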

No SaaS subscriptions. No proprietary dashboards. Just crawldiff diff.

Setup (30 seconds)

You need a free Cloudflare account. That's it.

# Install
pip install crawldiff

# Set your Cloudflare credentials (free tier: 5 jobs/day, 100 pages/job)
export CLOUDFLARE_ACCOUNT_ID="your-account-id"
export CLOUDFLARE_API_TOKEN="your-api-token"

# Or save to config (env vars take precedence over config file)
crawldiff config set cloudflare.account_id your-id
crawldiff config set cloudflare.api_token your-token

Usage

Track changes on any website

# Take a snapshot
crawldiff crawl https://competitor.com

# Later, see what changed
crawldiff diff https://competitor.com --since 7d

# Output as JSON (pipe to jq, Slack, wherever)
crawldiff diff https://competitor.com --since 7d --format json

# Save a markdown report
crawldiff diff https://competitor.com --since 30d --output report.md
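
The JSON format exists to be piped into other tools. A sketch of consuming it in Python, using a hypothetical payload shape (`url`, `since`, `changed_pages` with per-page counts are assumptions here, not crawldiff's documented schema):

```python
import json

def summarize(payload: dict) -> list[str]:
    """One line per changed page: path plus added/removed line counts."""
    return [
        f"{p['path']}: +{p['additions']} -{p['deletions']}"
        for p in payload["changed_pages"]
    ]

# Hypothetical payload shaped like --format json output; field names are assumptions.
raw = """
{
  "url": "https://competitor.com",
  "since": "7d",
  "changed_pages": [
    {"path": "/pricing", "additions": 4, "deletions": 2},
    {"path": "/blog", "additions": 12, "deletions": 0}
  ]
}
"""
for line in summarize(json.loads(raw)):
    print(line)
```

The same payload pipes equally well into `jq '.changed_pages[].path'` or a Slack webhook.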

Watch a site continuously

# Check every hour, get notified when something changes
crawldiff watch https://stripe.com/pricing --every 1h

# Check every 6 hours, skip AI summary
crawldiff watch https://competitor.com --every 6h --no-summary
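
The `--every` values use the same shorthand as `--since`. Parsing them is a one-liner; this sketch supports minutes, hours, and days (crawldiff's actual parser may accept more units):

```python
import re

UNITS = {"m": 60, "h": 3600, "d": 86400}

def parse_interval(spec: str) -> int:
    """Turn a shorthand like '30m', '1h', or '7d' into seconds."""
    match = re.fullmatch(r"(\d+)([mhd])", spec)
    if not match:
        raise ValueError(f"bad interval: {spec!r}")
    return int(match.group(1)) * UNITS[match.group(2)]

# A watch loop is then just: crawl, diff, sleep(parse_interval("1h")), repeat.
print(parse_interval("1h"), parse_interval("6h"))  # → 3600 21600
```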

View history

crawldiff history https://stripe.com/pricing
       Crawl History — https://stripe.com/pricing
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Job ID         ┃ Date                ┃ Pages ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ cf-job-abc-123 │ 2026-03-13 09:00:00 │    12 │
│ cf-job-def-456 │ 2026-03-06 09:00:00 │    11 │
│ cf-job-ghi-789 │ 2026-02-27 09:00:00 │    11 │
└────────────────┴─────────────────────┴───────┘

More options

# Deeper crawl
crawldiff crawl https://docs.react.dev --depth 3 --max-pages 100

# Static sites (faster, no browser rendering)
crawldiff crawl https://blog.example.com --no-render

# Ignore whitespace noise in diffs
crawldiff diff https://example.com --since 7d --ignore-whitespace
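
Whitespace-insensitive diffing amounts to normalizing lines before comparing them. A sketch of the idea (the normalization rule here is an assumption; crawldiff's `--ignore-whitespace` may differ in detail):

```python
import difflib
import re

def normalize(line: str) -> str:
    """Collapse runs of whitespace so reflow-only edits produce no diff."""
    return re.sub(r"\s+", " ", line).strip()

def diff_ignoring_whitespace(old: str, new: str) -> list[str]:
    a = [normalize(line) for line in old.splitlines()]
    b = [normalize(line) for line in new.splitlines()]
    return list(difflib.unified_diff(a, b, lineterm=""))

# Only spacing changed, so the diff is empty:
print(diff_ignoring_whitespace("Plans   start at $10\n", "Plans start at  $10\n"))  # → []
```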

AI Summaries (optional)

crawldiff can optionally summarize diffs using an LLM. Three providers are supported:

# Cloudflare Workers AI (free, uses your existing CF account)
crawldiff config set ai.provider cloudflare

# Anthropic Claude
pip install crawldiff[ai]
crawldiff config set ai.provider anthropic
export ANTHROPIC_API_KEY="sk-..."

# OpenAI
pip install crawldiff[ai]
crawldiff config set ai.provider openai
export OPENAI_API_KEY="sk-..."

Don't want AI? Just use --no-summary. Diffs work fine without it.

How it works

1. crawldiff crawl <url>
   └─→ Cloudflare /crawl API (headless browser, respects robots.txt)
   └─→ Store Markdown snapshots in local SQLite (~/.crawldiff/)

2. crawldiff diff <url> --since 7d
   └─→ Cloudflare /crawl with modifiedSince (only fetches changed pages)
   └─→ Diff against stored snapshot (unified diff via difflib)
   └─→ AI summary (optional)
   └─→ Syntax-highlighted diffs in the terminal (via rich)
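
The local store is plain SQLite, so snapshots stay inspectable with any SQLite client. A sketch of what a snapshot table might look like (the real schema in ~/.crawldiff/ is not documented here and may differ):

```python
import sqlite3

# Hypothetical schema sketch, in-memory for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE snapshots (
        job_id     TEXT,
        url        TEXT,
        page_path  TEXT,
        markdown   TEXT,
        crawled_at TEXT,
        PRIMARY KEY (job_id, page_path)
    )
""")
conn.execute(
    "INSERT INTO snapshots VALUES (?, ?, ?, ?, ?)",
    ("cf-job-abc-123", "https://stripe.com/pricing", "/pricing",
     "# Pricing\n", "2026-03-13T09:00:00Z"),
)

# Diffing then means: fetch the newest stored markdown for a URL and compare.
row = conn.execute(
    "SELECT markdown FROM snapshots WHERE url = ? ORDER BY crawled_at DESC LIMIT 1",
    ("https://stripe.com/pricing",),
).fetchone()
print(row[0])
```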

Cloudflare's modifiedSince parameter means repeat diffs only fetch changed pages, not the entire site.
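
Resolving `--since 7d` into a concrete `modifiedSince` value is simple date arithmetic. A sketch (the exact timestamp format Cloudflare's endpoint expects is an assumption here):

```python
from datetime import datetime, timedelta, timezone

def since_to_timestamp(spec: str, now: datetime) -> str:
    """Turn a relative spec like '7d' into an RFC 3339 timestamp for modifiedSince."""
    days = int(spec.rstrip("d"))
    return (now - timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%SZ")

now = datetime(2026, 3, 13, 9, 0, 0, tzinfo=timezone.utc)
print(since_to_timestamp("7d", now))  # → 2026-03-06T09:00:00Z
```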

Comparison

                       crawldiff              Visualping  changedetection.io  Firecrawl
Open source            Yes                    No          Yes                 Yes
CLI-native             Yes                    No          API                 API
AI summaries           Built-in               No          Via plugins         Extraction
Incremental crawling   Yes (modifiedSince)    No          No                  No
Local-first storage    SQLite                 Cloud       Self-host or cloud  Cloud
JSON/pipe output       Yes                    No          Yes                 Yes
Free tier              5 jobs/day, 100 pages  Limited     Yes (self-host)     500 credits

All commands

crawldiff crawl <url>      Snapshot a website
crawldiff diff <url>       Show what changed (the main command)
crawldiff watch <url>      Monitor continuously
crawldiff history <url>    View past snapshots
crawldiff config set|get|show   Manage settings

Contributing

Pull requests and bug reports are welcome. See CONTRIBUTING.md to get started.

License

MIT



Download files

Download the file for your platform.

Source Distribution

crawldiff-0.1.4.tar.gz (33.8 kB)

Uploaded Source

Built Distribution


crawldiff-0.1.4-py3-none-any.whl (28.1 kB)

Uploaded Python 3

File details

Details for the file crawldiff-0.1.4.tar.gz.

File metadata

  • Download URL: crawldiff-0.1.4.tar.gz
  • Upload date:
  • Size: 33.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for crawldiff-0.1.4.tar.gz
Algorithm Hash digest
SHA256 d53d960b3899423d6ac07ed03ee8858474bad592113305bf5f2cd0bf78094e2a
MD5 b62dd55020cb5a38f9595a026a32536e
BLAKE2b-256 c51d40021f74630efa8f23400005661a07a0a9bf85589e8310f8bb22683d2631


Provenance

The following attestation bundles were made for crawldiff-0.1.4.tar.gz:

Publisher: publish.yml on GeoRouv/crawldiff

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file crawldiff-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: crawldiff-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 28.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for crawldiff-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 3bd3d5f11a62ebfb1dd9112f97e155f3395c3432abc0717ce6fa8fca33ca54ef
MD5 fb3e3d2bca2b7c3dfde5e5970a2a0c6e
BLAKE2b-256 0d2874409e4d1cc1ff7cd2aa84d6729b67916a6022da957b1d2a94e16c0659e0


Provenance

The following attestation bundles were made for crawldiff-0.1.4-py3-none-any.whl:

Publisher: publish.yml on GeoRouv/crawldiff

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
