Lightweight web change monitoring library - clean diffs, structured alerts, no AI required.

WatchDiff

Lightweight web change monitoring - clean diffs, structured alerts, no AI required.

WatchDiff watches web pages and tells you exactly what changed, in plain language.
No noisy HTML diffs. No external services. No AI black boxes.

At a glance

What you want	How
Monitor a URL for changes	`.watch(url, target=".price", interval=300)` + `.start()`
Target a specific element	`target=".price"` (CSS) or `target="//span[@class='p']"` (XPath)
Get notified on change	`on_change=lambda r: print(r.summary())` or `webhooks=["https://discord.com/..."]`
Render JS-heavy pages	`browser=True` (requires `pip install "watchdiff-core[browser]"`)
Avoid notification spam	`cooldown=3600` (min seconds between alerts per URL)
Rotate proxies / UAs	`proxies=[...]`, `user_agents=[...]`
Diff at paragraph level	`diff_mode="semantic"`
Persist to SQLite	`WatchDiff(store=SqliteStore(".watchdiff.db"))`
Export history	`.export_reports_csv(url)` / `.export_reports_xlsx(url)`
CLI one-liner	`watchdiff run https://example.com --target .price --interval 60`
Multi-URL config file	`watchdiff init` then edit `watchdiff.config.json`

Why WatchDiff?

Most change detection tools compare raw HTML — which means every minor script reload or ad rotation triggers a false positive. WatchDiff strips the noise first, then diffs only the content that matters.

Deterministic — same input always produces the same output
Human-readable diffs — "Price changed: $19 → $24", not a wall of HTML
Zero external services — snapshots stored locally (JSON or SQLite)
Async-ready — sync and async schedulers included
Python 3.9+ — works on Debian Bullseye, Bookworm, and Trixie
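The "strip the noise first, then diff" idea can be sketched with the standard library alone. This is not WatchDiff's implementation. The helper names (`clean_lines`, `content_diff`), the set of tags treated as noise, and the use of regexes as a stand-in for `ignore_patterns` are all assumptions made for illustration.

```python
import difflib
import re
from html.parser import HTMLParser
from typing import List, Sequence

# Tags whose text is treated as noise in this sketch (an assumption, not WatchDiff's list).
NOISE_TAGS = {"script", "style", "noscript"}


class _VisibleText(HTMLParser):
    """Collects text nodes, skipping anything nested inside NOISE_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self._noise_depth = 0
        self.lines: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._noise_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self._noise_depth:
            self._noise_depth -= 1

    def handle_data(self, data):
        if not self._noise_depth and data.strip():
            self.lines.append(data)


def clean_lines(html: str, ignore_patterns: Sequence[str] = ()) -> List[str]:
    """Visible text lines with regex noise removed and whitespace collapsed."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    lines = []
    for line in parser.lines:
        for pattern in ignore_patterns:
            line = re.sub(pattern, "", line)
        line = " ".join(line.split())
        if line:  # a line that was pure noise disappears entirely
            lines.append(line)
    return lines


def content_diff(old_html: str, new_html: str,
                 ignore_patterns: Sequence[str] = ()) -> List[str]:
    """Only the removed ('- ') and added ('+ ') lines between the cleaned texts."""
    diff = difflib.ndiff(clean_lines(old_html, ignore_patterns),
                         clean_lines(new_html, ignore_patterns))
    # ndiff also emits '  ' (unchanged) and '? ' (hint) lines; drop both.
    return [line for line in diff if line[:2] in ("- ", "+ ")]
```

With `old = "<p>Price: $19</p><script>track(1)</script><span>120 views</span>"` and the same page carrying `$24`, `track(2)` and `121 views`, `content_diff(old, new, [r"\d+ views"])` reports only the price: the script body is never read and the view counter is erased before diffing.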

Install

pip install watchdiff-core

Or with uv:

uv add watchdiff-core

Optional extras

# JavaScript / SPA pages (Playwright headless browser)
pip install "watchdiff-core[browser]"
playwright install chromium

# XLSX export
pip install "watchdiff-core[xlsx]"

# Everything at once
pip install "watchdiff-core[all]"

Quick start

Python API

from watchdiff import WatchDiff

wd = WatchDiff()

wd.watch(
    "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
    target=".price_color",
    interval=60,
    label="Book price",
    on_change=lambda r: print(r.summary()),
)

wd.start()

CLI

# Generate a config file
watchdiff init

# Run from config file
watchdiff run --config watchdiff.config.json

# One-shot check
watchdiff check https://example.com --target .price

# Continuous monitoring (Ctrl+C to stop)
watchdiff run https://example.com --target .price --interval 60

# Snapshot history and reports
watchdiff history https://example.com
watchdiff reports https://example.com

# Clear stored data
watchdiff clear https://example.com

Features

JavaScript pages with Playwright

For pages that render content via JavaScript (SPAs, React, Vue, etc.), use the headless browser mode:

pip install "watchdiff-core[browser]"
playwright install chromium

from watchdiff import WatchDiff
from watchdiff.models import BrowserOptions

wd = WatchDiff()
wd.watch(
    "https://spa.example.com/pricing",
    target=".price",
    browser=True,
    browser_options=BrowserOptions(
        wait_for="networkidle",       # wait until network is quiet
        wait_for_selector=".price",   # also wait for this element to appear
        timeout=30000,                # ms - max wait time
    ),
)
wd.start()

wait_for accepts:

"load" — default, waits for the load event
"domcontentloaded" — faster, waits for DOM only
"networkidle" — waits until no network requests for 500ms

Proxy rotation and User-Agent rotation

Avoid blocks with automatic rotation on every request:

wd.watch(
    "https://example.com",
    proxies=[
        "http://proxy1.example.com:8080",
        "http://proxy2.example.com:8080",
        "socks5://proxy3.example.com:1080",
    ],
    user_agents=[
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ...",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 ...",
    ],
)

If user_agents is empty, WatchDiff rotates automatically among 4 built-in modern UA strings (Chrome, Safari, Firefox, and Chrome on Linux). No configuration required.

Proxies also work in browser mode — Playwright passes the selected proxy to Chromium.
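The per-request selection described above can be sketched as a small picker. It follows the parameter table below (a random proxy per request, user agents rotated in order) but is only an illustration: `IdentityPicker` and the placeholder UA strings are hypothetical and do not reproduce WatchDiff's built-in list.

```python
import itertools
import random
from typing import Iterator, Optional, Sequence, Tuple

# Hypothetical placeholders; WatchDiff's real built-in UA strings are not reproduced here.
FALLBACK_USER_AGENTS = ["ua-chrome-placeholder", "ua-firefox-placeholder"]


class IdentityPicker:
    """Chooses a (proxy, user_agent) pair for each outgoing request."""

    def __init__(self, proxies: Sequence[str] = (),
                 user_agents: Sequence[str] = (),
                 seed: Optional[int] = None) -> None:
        self._proxies = list(proxies)
        self._rng = random.Random(seed)  # seedable for reproducible tests
        # Rotate through UAs in order; fall back to the placeholders when none are given.
        self._user_agents: Iterator[str] = itertools.cycle(
            list(user_agents) or FALLBACK_USER_AGENTS
        )

    def pick(self) -> Tuple[Optional[str], str]:
        proxy = self._rng.choice(self._proxies) if self._proxies else None
        return proxy, next(self._user_agents)


picker = IdentityPicker(
    proxies=["http://proxy1.example.com:8080", "socks5://proxy3.example.com:1080"],
    user_agents=["UA-1", "UA-2"],
)
proxy, ua = picker.pick()
# e.g. requests.get(url, headers={"User-Agent": ua}, proxies={"http": proxy, "https": proxy})
```

Random choice spreads load without shared state between watchers; cycling the UAs guarantees every string is used in turn.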

Semantic diff mode

By default, WatchDiff diffs line by line. In semantic mode, it extracts meaningful HTML blocks (<p>, <h1>-<h6>, <li>, <td>, <th>, <blockquote>) and diffs those instead. This gives cleaner results on content-heavy pages, where editing a single paragraph would otherwise show up as dozens of changed lines.

wd.watch(
    "https://blog.example.com/article",
    diff_mode="semantic",   # "line" (default) or "semantic"
)

If no semantic blocks are found, the engine falls back to line mode automatically.

In the CLI:

watchdiff check https://blog.example.com/article --diff-mode semantic
watchdiff run   https://blog.example.com/article --diff-mode semantic --interval 3600
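Block-level diffing can be sketched by treating each outermost block element as one unit and comparing the two unit lists with `difflib.SequenceMatcher`. This is an illustration, not WatchDiff's engine: `extract_blocks` and `semantic_diff` are hypothetical names, the sketch assumes explicitly closed tags, and it signals "fall back to line mode" by returning `None` when neither page contains any block.

```python
import difflib
from html.parser import HTMLParser
from typing import List, Optional, Tuple

BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6",
              "li", "td", "th", "blockquote"}


class _BlockCollector(HTMLParser):
    """Collects the text of each outermost block element as one string."""

    def __init__(self) -> None:
        super().__init__()
        self._depth = 0  # how many block tags are currently open
        self._buffer: List[str] = []
        self.blocks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self._depth:
            self._depth -= 1
            if self._depth == 0:  # closed the outermost block: emit it
                text = " ".join(" ".join(self._buffer).split())
                if text:
                    self.blocks.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_blocks(html: str) -> List[str]:
    parser = _BlockCollector()
    parser.feed(html)
    parser.close()
    return parser.blocks


def semantic_diff(old_html: str, new_html: str
                  ) -> Optional[List[Tuple[str, List[str], List[str]]]]:
    """(opcode, old_blocks, new_blocks) for each change, or None for line-mode fallback."""
    old, new = extract_blocks(old_html), extract_blocks(new_html)
    if not old and not new:
        return None
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(op, old[i1:i2], new[j1:j2])
            for op, i1, i2, j1, j2 in matcher.get_opcodes() if op != "equal"]
```

Editing one paragraph yields a single `("replace", [old_paragraph], [new_paragraph])` entry, however long the paragraph is.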

XPath selectors

target accepts both CSS selectors and XPath expressions. An expression is treated as XPath automatically when it starts with `/` or `(`:

# CSS selector (default)
wd.watch("https://example.com", target=".price")
wd.watch("https://example.com", target="#main > h1")

# XPath expressions
wd.watch("https://example.com", target="//div[@class='price']")
wd.watch("https://example.com", target="//table//tr[td[1]='Revenue']/td[2]")
wd.watch("https://example.com", target="(//h2)[1]")         # first <h2> only
wd.watch("https://example.com", target="//p[contains(@class,'intro')]")

XPath is implemented via lxml (already a dependency — no extra install needed).
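The detection rule above fits in one function. This sketch mirrors the stated rule only; `target_kind` is a hypothetical name, and stripping leading whitespace first is an assumption about how WatchDiff treats padded input.

```python
def target_kind(target: str) -> str:
    """Classify a target: a leading '/' or '(' means XPath, anything else is CSS."""
    return "xpath" if target.lstrip().startswith(("/", "(")) else "css"


for t in [".price", "#main > h1", "//div[@class='price']", "(//h2)[1]"]:
    print(f"{t!r:28} -> {target_kind(t)}")
```

Because CSS selectors never begin with `/` or `(`, this prefix check is enough to route a target to the right engine without trying to parse it both ways.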

SQLite storage backend

By default, WatchDiff stores data as JSON files. For larger datasets or concurrent access, switch to the built-in SQLite backend — no extra dependencies required:

from watchdiff import WatchDiff
from watchdiff.store import SqliteStore

wd = WatchDiff(store=SqliteStore(".watchdiff.db"))
wd.watch("https://example.com").start()

SqliteStore is a drop-in replacement for the default Store — same interface, same behaviour. It runs in WAL mode for concurrent-read safety.
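For readers curious what a WAL-mode snapshot store involves, here is a minimal `sqlite3` sketch. The table layout and function names are hypothetical and are not SqliteStore's actual schema; the sketch only shows enabling WAL and reading the newest snapshots first.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List, Optional


def open_snapshot_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # WAL lets readers keep reading while a single writer appends.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS snapshots (
            id       INTEGER PRIMARY KEY AUTOINCREMENT,
            url      TEXT NOT NULL,
            target   TEXT,
            content  TEXT NOT NULL,
            taken_at TEXT NOT NULL
        )""")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_snapshots_url ON snapshots (url, id)")
    conn.commit()
    return conn


def save_snapshot(conn: sqlite3.Connection, url: str, content: str,
                  target: Optional[str] = None) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO snapshots (url, target, content, taken_at) VALUES (?, ?, ?, ?)",
            (url, target, content, datetime.now(timezone.utc).isoformat()),
        )


def latest_snapshots(conn: sqlite3.Connection, url: str,
                     target: Optional[str] = None, limit: int = 20) -> List[str]:
    # "IS" (unlike "=") also matches NULL, so target=None selects full-page snapshots.
    rows = conn.execute(
        "SELECT content FROM snapshots WHERE url = ? AND target IS ? "
        "ORDER BY id DESC LIMIT ?",
        (url, target, limit),
    )
    return [row[0] for row in rows]
```

WAL mode needs a file-backed database; an in-memory connection reports `memory` instead of `wal`.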

CSV and XLSX export

Export your snapshot history and diff reports to CSV (no dependencies) or XLSX (requires openpyxl):

from watchdiff import WatchDiff

wd = WatchDiff()
wd.watch("https://example.com", target=".price")

# CSV - always available, returns the CSV string
csv_text = wd.export_reports_csv("https://example.com", dest="reports.csv")
csv_text = wd.export_snapshots_csv("https://example.com", dest="snapshots.csv")

# XLSX - requires: pip install "watchdiff-core[xlsx]"
path = wd.export_reports_xlsx("https://example.com", dest="reports.xlsx")
path = wd.export_snapshots_xlsx("https://example.com", dest="snapshots.xlsx")

All export methods accept:

url — the watched URL
target — CSS/XPath filter (optional, None = full page)
limit — max rows to include (default 500)
dest — file path to write (optional for CSV, required for XLSX)
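The CSV behaviour described above (return the text, optionally write `dest`, cap rows with `limit`) can be sketched with the `csv` module. `rows_to_csv` and the column names in the example are hypothetical; WatchDiff's actual CSV columns are not documented here.

```python
import csv
import io
from typing import Iterable, List, Mapping, Optional


def rows_to_csv(rows: Iterable[Mapping[str, object]],
                dest: Optional[str] = None, limit: int = 500) -> str:
    """Serialise up to `limit` dict rows to CSV text, optionally writing it to `dest`."""
    selected = list(rows)[:limit]
    # Columns follow first appearance, so the header order is deterministic.
    fieldnames: List[str] = []
    for row in selected:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(selected)
    text = buffer.getvalue()
    if dest is not None:
        # newline="" keeps the csv module's \r\n line endings unchanged on disk.
        with open(dest, "w", newline="", encoding="utf-8") as fh:
            fh.write(text)
    return text


print(rows_to_csv([
    {"kind": "modified", "before": "$19", "after": "$24"},
    {"kind": "added", "after": "$5"},  # missing "before" becomes an empty cell
]))
```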

Cooldown anti-spam

Use cooldown to set a minimum delay in seconds between two alerts for the same URL. Useful when a page changes frequently but you don't want to be notified on every single check.

wd.watch(
    "https://news.example.com/live",
    target=".headline",
    interval=30,         # check every 30 seconds
    cooldown=600,        # but alert at most every 10 minutes
    on_change=lambda r: print(r.summary()),
)

Important: changes are still detected and stored during the cooldown period. Only the alerts (callbacks, webhooks) are suppressed. The full history remains available via .history() and .reports().

cooldown=0 (default) disables the feature — every change triggers an alert immediately.

In the CLI:

watchdiff run https://news.example.com --interval 30 --cooldown 600

In watchdiff.config.json, as a field of a watcher entry:

{
  "url": "https://news.example.com/live",
  "interval": 30,
  "cooldown": 600
}
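The cooldown rule (changes are always stored, alerts are rate-limited per URL) can be expressed as a small gate. This is a sketch under assumptions: `CooldownGate` and `on_detected` are hypothetical names, and the sketch measures the delay from the last alert that actually fired, which is how the description above reads.

```python
import time
from typing import Callable, Dict, List


class CooldownGate:
    """Allows at most one alert per URL every `cooldown` seconds (0 disables it)."""

    def __init__(self, cooldown: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown = cooldown
        self._clock = clock  # injectable so tests need not sleep
        self._last_alert: Dict[str, float] = {}

    def allow(self, url: str) -> bool:
        if self.cooldown <= 0:
            return True  # cooldown=0: every change alerts immediately
        now = self._clock()
        last = self._last_alert.get(url)
        if last is not None and now - last < self.cooldown:
            return False  # still cooling down: suppress this alert
        self._last_alert[url] = now
        return True


def on_detected(url: str, report: object, history: List[object],
                gate: CooldownGate, callbacks: List[Callable[[object], None]]) -> None:
    history.append(report)  # always stored, even during the cooldown
    if gate.allow(url):
        for callback in callbacks:
            callback(report)
```

Using a monotonic clock keeps the cooldown correct even if the system time is adjusted while the watcher runs.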

Config file workflow (`watchdiff init`)

Generate a ready-to-edit config file, then run all your watchers in one command:

watchdiff init
# Created watchdiff.config.json

Edit watchdiff.config.json:

{
  "storage": ".watchdiff",
  "watchers": [
    {
      "url": "https://store.example.com/product/42",
      "target": ".price",
      "interval": 300,
      "label": "Product 42 price",
      "diff_mode": "line",
      "browser": false,
      "cooldown": 0,
      "webhooks": ["https://discord.com/api/webhooks/YOUR_ID/YOUR_TOKEN"],
      "proxies": [],
      "user_agents": [],
      "ignore_selectors": [".cookie-banner", "#ad-container"],
      "ignore_patterns": ["\\d+ views"],
      "timeout": 15,
      "headers": {}
    },
    {
      "url": "https://blog.example.com/changelog",
      "target": "//article//p",
      "interval": 3600,
      "label": "Changelog",
      "diff_mode": "semantic",
      "browser": false,
      "webhooks": []
    }
  ]
}

Run:

# Explicit path
watchdiff run --config watchdiff.config.json

# Auto-discovery: if watchdiff.config.json exists in CWD, this also works
watchdiff run
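To make the optional keys explicit, here is a sketch of loading the config and filling in the defaults from the `.watch()` parameter table below. It is not WatchDiff's loader: `load_watchers` is a hypothetical name, and rejecting a missing `url` or an unknown `diff_mode` is an assumption about validation.

```python
import copy
import json
from typing import Any, Dict, List

# Defaults copied from the .watch() parameter table in this README.
WATCHER_DEFAULTS: Dict[str, Any] = {
    "target": None, "interval": 300, "label": None, "headers": {},
    "timeout": 15, "ignore_selectors": [], "ignore_patterns": [],
    "webhooks": [], "min_changes": 1, "diff_mode": "line",
    "browser": False, "proxies": [], "user_agents": [], "cooldown": 0,
}


def load_watchers(config_text: str) -> List[Dict[str, Any]]:
    """Parse watchdiff.config.json text into fully populated watcher dicts."""
    config = json.loads(config_text)
    watchers = []
    for index, entry in enumerate(config.get("watchers", [])):
        if "url" not in entry:
            raise ValueError(f"watcher #{index} is missing 'url'")
        watcher = copy.deepcopy(WATCHER_DEFAULTS)  # fresh lists/dicts per watcher
        watcher.update(entry)
        if watcher["diff_mode"] not in ("line", "semantic"):
            raise ValueError(f"watcher #{index}: unknown diff_mode {watcher['diff_mode']!r}")
        if watcher["label"] is None:
            watcher["label"] = watcher["url"]  # the table's default label is the URL
        watchers.append(watcher)
    return watchers
```

Deep-copying the defaults matters: a shallow merge would make every watcher share the same `proxies` and `webhooks` lists.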

API reference

`WatchDiff`

from watchdiff import WatchDiff
from watchdiff.store import SqliteStore

wd = WatchDiff()                              # JSON store in .watchdiff/
wd = WatchDiff(storage_dir="/data/watchdiff") # custom JSON store path
wd = WatchDiff(store=SqliteStore("db.sqlite"))  # SQLite store

`.watch(url, *, ...)`

Parameter	Type	Default	Description
`url`	`str`	-	URL to watch
`target`	`str | None`	`None`	CSS selector or XPath. `None` = full page
`interval`	`int`	`300`	Seconds between checks
`label`	`str \| None`	URL	Human-readable name shown in logs
`headers`	`dict`	`{}`	Extra HTTP headers
`timeout`	`int`	`15`	Request timeout in seconds
`ignore_selectors`	`list[str]`	`[]`	CSS selectors to strip before diffing
`ignore_patterns`	`list[str]`	`[]`	Regex patterns to strip from text
`on_change`	`Callable | list`	`None`	Callback(s) fired on each change
`webhooks`	`list[str]`	`[]`	Webhook URLs to POST on change
`min_changes`	`int`	`1`	Minimum number of changes to trigger alert
`diff_mode`	`str`	`"line"`	`"line"` or `"semantic"`
`browser`	`bool`	`False`	Use Playwright headless browser
`browser_options`	`BrowserOptions | None`	`None`	Fine-tune Playwright behaviour
`proxies`	`list[str]`	`[]`	Proxy URLs - one picked randomly per request
`user_agents`	`list[str]`	`[]`	UA strings - rotated per request (built-ins used if empty)
`cooldown`	`int`	`0`	Min seconds between two alerts for this URL (0 = disabled)

# Chainable
wd.watch("https://site.com/product", target=".price", interval=300) \
  .watch("https://site.com/stock",   target=".availability") \
  .on_change(lambda r: print(r.summary())) \
  .start()

`.on_change(callback)`

def handle(report):
    print(report.summary())
    for change in report.changes:
        print(change.human())

wd.on_change(handle)

`.start(block=True)`

Start the synchronous scheduler. Blocks until Ctrl+C by default.
Pass block=False to run in the background (daemon threads).

`await .start_async()`

Async variant — use inside an existing event loop (FastAPI, aiohttp, etc.):

import asyncio
from watchdiff import WatchDiff

async def main():
    wd = WatchDiff()
    wd.watch("https://example.com", target="h1", interval=30)
    wd.on_change(lambda r: print(r.summary()))
    await wd.start_async()

asyncio.run(main())
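How several interval loops can share one event loop is easiest to see in a generic sketch. This is not `start_async()`'s internals: `run_every`, `schedule`, and `max_runs` are hypothetical, and `check` stands in for whatever fetches and diffs a page.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

Check = Callable[[str], Awaitable[None]]


async def run_every(name: str, interval: float, check: Check,
                    max_runs: Optional[int] = None) -> None:
    """Await check(name) every `interval` seconds; stop after max_runs if given."""
    runs = 0
    while True:
        try:
            await check(name)
        except Exception as exc:  # one failing site must not stop the others
            print(f"[{name}] check failed: {exc!r}")
        runs += 1
        if max_runs is not None and runs >= max_runs:
            return
        await asyncio.sleep(interval)  # yields to other tasks while waiting


async def schedule(intervals: Dict[str, float], check: Check,
                   max_runs: Optional[int] = None) -> None:
    """One task per watcher, all running concurrently on the current loop."""
    await asyncio.gather(*(run_every(name, every, check, max_runs)
                           for name, every in intervals.items()))


async def demo_check(name: str) -> None:
    print("checking", name)

asyncio.run(schedule({"https://example.com": 0.1}, demo_check, max_runs=2))
```

Because each loop spends its idle time in `asyncio.sleep`, the same event loop stays free to serve a web framework's requests between checks.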

`.check_once(url)`

Run a single immediate check without starting the scheduler loop:

report = wd.check_once("https://example.com")
if report:
    print(report.summary())

`.history(url, limit=20)` / `.reports(url, limit=20)` / `.clear(url)`

Access stored data programmatically:

snaps   = wd.history("https://example.com", limit=10)
reports = wd.reports("https://example.com", limit=10)
wd.clear("https://example.com")

`DiffReport`

report.url           # str
report.target        # str | None
report.label         # str
report.has_changes   # bool
report.added         # list[Change]
report.removed       # list[Change]
report.modified      # list[Change]
report.changes       # list[Change]  - all changes
report.compared_at   # datetime

report.summary()     # "[Book price] 1 modified - 2024-01-15 10:30:00 UTC"
report.as_dict()     # JSON-serialisable dict

`Change`

change.kind     # ChangeType.ADDED | REMOVED | MODIFIED | UNCHANGED
change.before   # str | None  - previous value
change.after    # str | None  - new value
change.context  # str | None  - surrounding text hint

change.human()  # "[~] Changed: '$19.00' - '$24.00'"
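Callbacks only rely on the attributes listed above, so they can be unit-tested offline with stand-in objects. In this sketch `FakeChange`, `FakeReport`, and the local `ChangeType` enum are hypothetical stand-ins mirroring the documented fields (they are not WatchDiff's classes), and `price_increases` is an example handler that assumes prices are plain numbers with optional thousands commas.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class ChangeType(Enum):  # stand-in for WatchDiff's ChangeType
    ADDED = "added"
    REMOVED = "removed"
    MODIFIED = "modified"
    UNCHANGED = "unchanged"


@dataclass
class FakeChange:  # mirrors Change.kind / before / after / context
    kind: ChangeType
    before: Optional[str]
    after: Optional[str]
    context: Optional[str] = None


@dataclass
class FakeReport:  # mirrors DiffReport.label / changes / modified
    label: str
    changes: List[FakeChange] = field(default_factory=list)

    @property
    def modified(self) -> List[FakeChange]:
        return [c for c in self.changes if c.kind is ChangeType.MODIFIED]


def parse_amount(text: Optional[str]) -> Optional[float]:
    """First number in the text, ignoring thousands commas ("$1,299.00" -> 1299.0)."""
    match = re.search(r"\d+(?:\.\d+)?", (text or "").replace(",", ""))
    return float(match.group()) if match else None


def price_increases(report) -> List[Tuple[float, float]]:
    """(old, new) pairs for modified values whose number went up."""
    pairs = []
    for change in report.modified:
        old, new = parse_amount(change.before), parse_amount(change.after)
        if old is not None and new is not None and new > old:
            pairs.append((old, new))
    return pairs
```

In production the same function can be registered directly, for example `wd.on_change(lambda r: print(price_increases(r)))`, since it only touches `report.modified`, `before`, and `after`.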

Webhooks

WatchDiff auto-detects the target service and adapts the payload:

Service	Detection	Payload
Discord	`discord.com` in URL	`{"content": "..."}` (2000-char limit)
Slack	`hooks.slack.com` in URL	`{"text": "..."}`
Custom	anything else	full `report.as_dict()`

wd.watch(
    "https://example.com",
    webhooks=[
        "https://discord.com/api/webhooks/YOUR_ID/YOUR_TOKEN",
        "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK",
        "https://your-api.com/watchdiff-hook",
    ],
)
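The payload rules in the table above can be sketched as follows. Two choices here are assumptions rather than WatchDiff behaviour: the sketch matches on the parsed hostname instead of a plain substring search, and it truncates Discord messages with an ellipsis. `build_payload` and `post_json` are hypothetical names.

```python
import json
import urllib.request
from typing import Any, Dict
from urllib.parse import urlparse

DISCORD_LIMIT = 2000  # Discord's maximum message length


def build_payload(webhook_url: str, summary: str,
                  report_dict: Dict[str, Any]) -> Dict[str, Any]:
    host = urlparse(webhook_url).hostname or ""
    if host == "discord.com" or host.endswith(".discord.com"):
        if len(summary) > DISCORD_LIMIT:
            summary = summary[: DISCORD_LIMIT - 1] + "…"  # stay within the limit
        return {"content": summary}
    if host == "hooks.slack.com":
        return {"text": summary}
    return report_dict  # custom endpoints receive the full report


def post_json(url: str, payload: Dict[str, Any], timeout: float = 10) -> int:
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Matching on the hostname avoids misrouting a custom endpoint whose path merely contains `discord.com`.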

CLI reference

Usage: watchdiff [COMMAND] [OPTIONS]

Commands:
  init      Generate a watchdiff.config.json template
  run       Start continuous monitoring (URL or config file)
  check     Run a single check and print the result
  history   Show snapshot history for a URL
  reports   Show diff reports for a URL
  clear     Delete all stored data for a URL

Options for run / check:
  --target      -t   CSS selector or XPath to watch
  --storage     -s   Storage directory (default: .watchdiff)
  --interval    -i   Seconds between checks (run only)
  --config      -c   Path to a watchdiff.config.json file
  --diff-mode        Diff strategy: line (default) | semantic
  --browser          Use headless browser (requires playwright)
  --cooldown         Min seconds between alerts (0 = disabled)
  --verbose     -v   Enable debug logging

Options for history / reports:
  --limit       -n   Number of entries to show (default 20)

Options for clear:
  --yes         -y   Skip confirmation prompt

Options for check:
  --json             Output raw JSON instead of formatted output

Use cases

E-commerce — track product prices and stock availability
News monitoring — detect article updates or new publications
Documentation — alert when API docs or changelogs change
Public APIs — watch JSON endpoints for schema or value changes
SPA / React apps — monitor JS-rendered content with browser=True
Compliance — audit changes on public-facing pages over time
Research — collect snapshots for longitudinal content analysis

Contributing

Missing a feature? Found a bug? Pull requests are welcome on GitHub.

If you want a feature that is not yet in the project, open an issue or submit a PR directly; contributions of any size are appreciated.

License

This project is licensed under the GNU General Public License v3.0.

You are free to use, study, modify, and distribute this software under the terms of the GPL v3.
Any derivative work must also be distributed under the same license.

Project details

Release history

0.1.4 (May 29, 2026)

0.1.3 (May 5, 2026, this version)

0.1.2 (May 3, 2026)

Download files

Source Distribution

watchdiff_core-0.1.3.tar.gz (84.6 kB)

Uploaded May 5, 2026 Source

Built Distribution

watchdiff_core-0.1.3-py3-none-any.whl (47.9 kB)

Uploaded May 5, 2026 Python 3

File details

Details for the file watchdiff_core-0.1.3.tar.gz.

File metadata

Download URL: watchdiff_core-0.1.3.tar.gz
Upload date: May 5, 2026
Size: 84.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for watchdiff_core-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`38ea4956bc349376ca33750a01865e37b5108e724fa1984cedd04c4e593a2453`
MD5	`467dcd2bbe388be389299ae067b2afae`
BLAKE2b-256	`70057d88d46fc561734808e2d9ddf403168d3cb4cc8ac39159d220956054a97c`

Provenance

The following attestation bundles were made for watchdiff_core-0.1.3.tar.gz:

Publisher: release.yml on r-seize/watchdiff-py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: watchdiff_core-0.1.3.tar.gz
- Subject digest: 38ea4956bc349376ca33750a01865e37b5108e724fa1984cedd04c4e593a2453
- Sigstore transparency entry: 1442462864
- Sigstore integration time: May 5, 2026
Source repository:
- Permalink: r-seize/watchdiff-py@f00dbb528f23e3d16f6ff263573929708f124494
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/r-seize
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@f00dbb528f23e3d16f6ff263573929708f124494
- Trigger Event: push

File details

Details for the file watchdiff_core-0.1.3-py3-none-any.whl.

File metadata

Download URL: watchdiff_core-0.1.3-py3-none-any.whl
Upload date: May 5, 2026
Size: 47.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for watchdiff_core-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d17c3e044345c96067b98a1015818784dbdbf105debca06c462b188a49f341dc`
MD5	`7c12fcbf5da1f300f09ce236225408d3`
BLAKE2b-256	`7ac7e541ebb8a184b6b13738239956e3471f7c0f0d8b1c083a065b07dc820b43`

Provenance

The following attestation bundles were made for watchdiff_core-0.1.3-py3-none-any.whl:

Publisher: release.yml on r-seize/watchdiff-py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: watchdiff_core-0.1.3-py3-none-any.whl
- Subject digest: d17c3e044345c96067b98a1015818784dbdbf105debca06c462b188a49f341dc
- Sigstore transparency entry: 1442462932
- Sigstore integration time: May 5, 2026
Source repository:
- Permalink: r-seize/watchdiff-py@f00dbb528f23e3d16f6ff263573929708f124494
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/r-seize
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@f00dbb528f23e3d16f6ff263573929708f124494
- Trigger Event: push
