Lightweight web change monitoring library - clean diffs, structured alerts, no AI required.
WatchDiff
WatchDiff watches web pages and tells you exactly what changed, in plain language.
No noisy HTML diffs. No external services. No AI black boxes.
Why WatchDiff?
Most change detection tools compare raw HTML - which means every minor script reload or ad rotation triggers a false positive. WatchDiff strips the noise first, then diffs only the content that matters.
- Deterministic - same input always produces the same output
- Human-readable diffs - "Price changed: $19 → $24", not a wall of HTML
- Zero external services - snapshots stored locally as JSON
- Async-ready - sync and async schedulers included
- Configurable - target any CSS selector, ignore patterns, set webhooks
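To make the "strip the noise first" idea concrete, here is a minimal standard-library sketch. It is not WatchDiff's actual cleaner: the `VisibleText` class and `visible_text` helper are hypothetical names, and the sketch only skips `<script>`, `<style>`, and `<noscript>` content. It shows two pages whose raw HTML differs while their visible text does not.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping anything inside script-like tags."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def visible_text(html: str) -> list[str]:
    """Return the visible text chunks of an HTML fragment (hypothetical helper)."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Only the inline script changes between the two fetches.
old = '<p class="price">$19</p><script>var t = 1700000000;</script>'
new = '<p class="price">$19</p><script>var t = 1700000060;</script>'

raw_changed = old != new                                   # raw HTML differs
content_changed = visible_text(old) != visible_text(new)   # visible text does not
print(raw_changed, content_changed)
```

A raw HTML comparison would report a change here, whereas comparing the cleaned text would not.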
Install
pip install watchdiff-core
Or with uv:
uv add watchdiff-core
Quick start
Python API
from watchdiff import WatchDiff
wd = WatchDiff()
wd.watch(
    "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
    target=".price_color",
    interval=60,
    label="Book price",
    on_change=lambda r: print(r.summary()),
)
wd.start()
CLI
# One-shot check
watchdiff check https://example.com --target .price
# Continuous monitoring (Ctrl+C to stop)
watchdiff run https://example.com --target .price --interval 60
# Snapshot history
watchdiff history https://example.com
# Diff reports
watchdiff reports https://example.com
# Clear stored data
watchdiff clear https://example.com
How it works
Every check runs through a fixed pipeline:
Fetcher → Cleaner → Parser → DiffEngine → Store → Notifier
- Fetcher - downloads the page via httpx (sync or async)
- Cleaner - strips scripts, styles, ads, and tracking noise
- Parser - extracts the target CSS selector (or full body)
- DiffEngine - compares content using Python's difflib.SequenceMatcher
- Store - persists snapshots and reports as local JSON files
- Notifier - fires callbacks and/or webhooks on detected changes
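The DiffEngine step can be illustrated with `difflib.SequenceMatcher` directly. This is a sketch under assumptions, not the library's implementation: the `diff_lines` function and the mapping of opcodes to "added", "removed", and "modified" are illustrative choices, and the input is assumed to be a list of already-cleaned text lines.

```python
from difflib import SequenceMatcher

# Assumed mapping from SequenceMatcher opcode tags to change kinds.
KINDS = {"replace": "modified", "delete": "removed", "insert": "added"}


def diff_lines(before: list[str], after: list[str]) -> list[tuple]:
    """Return (kind, before_text, after_text) for every non-equal opcode."""
    changes = []
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        old = " ".join(before[i1:i2]) or None  # None when nothing was removed
        new = " ".join(after[j1:j2]) or None   # None when nothing was added
        changes.append((KINDS[tag], old, new))
    return changes


before = ["A Light in the Attic", "$19.00", "In stock"]
after = ["A Light in the Attic", "$24.00", "In stock", "Free shipping"]

changes = diff_lines(before, after)
for kind, old, new in changes:
    print(kind, old, new)
```

Because `SequenceMatcher` is deterministic for a given pair of sequences, the same two snapshots always yield the same list of changes.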
API reference
WatchDiff
wd = WatchDiff(storage_dir=".watchdiff") # default storage directory
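Since snapshots are stored locally as JSON under the storage directory, a rough sketch of that kind of store may help when reasoning about what ends up on disk. The file naming (a truncated SHA-256 of the URL), the record fields, and the helper names below are all assumptions for illustration; WatchDiff's real on-disk layout may differ.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def snapshot_path(storage_dir: Path, url: str) -> Path:
    """Derive a stable file name from the URL (assumed naming scheme)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return storage_dir / f"{digest}.json"


def save_snapshot(storage_dir: Path, url: str, content: str) -> Path:
    """Write the latest content for a URL as a small JSON record."""
    storage_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "url": url,
        "content": content,
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }
    path = snapshot_path(storage_dir, url)
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path


def load_snapshot(storage_dir: Path, url: str):
    """Return the stored record, or None if the URL was never seen."""
    path = snapshot_path(storage_dir, url)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / ".watchdiff"
    save_snapshot(store, "https://example.com", "$19.00")
    previous = load_snapshot(store, "https://example.com")
    missing = load_snapshot(store, "https://example.org")
```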
.watch(url, *, ...)
Register a URL to monitor. All keyword arguments are optional.
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | `str` | - | URL to watch |
| target | `str \| None` | `None` | CSS selector (e.g. `.price`). `None` = full page |
| interval | `int` | `300` | Seconds between checks |
| label | `str \| None` | URL | Human-readable name shown in logs |
| headers | `dict` | `{}` | Extra HTTP headers |
| timeout | `int` | `15` | Request timeout in seconds |
| ignore_selectors | `list[str]` | `[]` | CSS selectors to strip before diffing |
| ignore_patterns | `list[str]` | `[]` | Regex patterns to strip from text |
| on_change | `Callable \| list` | `None` | Callback(s) fired on each change |
| webhooks | `list[str]` | `[]` | Webhook URLs to POST on change |
| min_changes | `int` | `1` | Minimum number of changes to trigger alert |
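The effect of `ignore_patterns` and `min_changes` can be sketched with the standard library. The helpers `strip_patterns` and `should_alert` are hypothetical, and the "at least `min_changes`" comparison is an assumption about how the threshold is applied; the sketch only shows why stripping volatile text before diffing avoids false positives.

```python
import re


def strip_patterns(text: str, patterns: list[str]) -> str:
    """Remove every regex match, then collapse leftover whitespace."""
    for pattern in patterns:
        text = re.sub(pattern, "", text)
    return " ".join(text.split())


def should_alert(changes: list, min_changes: int = 1) -> bool:
    """Assumed rule: alert once the number of changes reaches the threshold."""
    return len(changes) >= min_changes


# Volatile fragments that would otherwise differ on every fetch.
patterns = [r"Updated at \d{2}:\d{2}", r"\d+ people are viewing this"]

old = "Price: $19 Updated at 10:30 12 people are viewing this"
new = "Price: $19 Updated at 10:31 48 people are viewing this"

cleaned_old = strip_patterns(old, patterns)
cleaned_new = strip_patterns(new, patterns)
print(cleaned_old == cleaned_new)  # the remaining text is identical
```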
Calls to .watch() and .on_change() can be chained, ending with .start():
wd.watch("https://site.com/product", target=".price", interval=300) \
    .watch("https://site.com/stock", target=".availability") \
    .on_change(lambda r: print(r.summary())) \
    .start()
.on_change(callback)
Register a global callback called whenever any watched URL changes.
def handle(report):
    print(report.summary())
    for change in report.changes:
        print(change.human())
wd.on_change(handle)
.start(block=True)
Start the synchronous scheduler. Blocks until Ctrl+C by default.
Pass block=False to run in the background (daemon threads).
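For readers unfamiliar with daemon threads, the following sketch shows the general pattern behind a non-blocking scheduler: a periodic task runs on a daemon thread while the main thread stays free. `run_every` is a hypothetical helper written for this illustration and says nothing about how WatchDiff schedules its checks internally.

```python
import threading
import time


def run_every(interval: float, task, stop: threading.Event) -> threading.Thread:
    """Call task() every `interval` seconds on a daemon thread until stop is set."""

    def loop():
        # Event.wait returns False on timeout (keep looping) and True once stop is set.
        while not stop.wait(interval):
            task()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


stop = threading.Event()
ticks = []
worker = run_every(0.01, lambda: ticks.append(time.monotonic()), stop)

time.sleep(0.1)          # the main thread is free to do other work here
stop.set()               # ask the background loop to finish
worker.join(timeout=1)
print(f"ran {len(ticks)} checks in the background")
```

Daemon threads do not keep the interpreter alive, so a program using this pattern exits as soon as its main thread finishes.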
await .start_async()
Async variant. Await it from code that is already running in an event loop (FastAPI, aiohttp, etc.), or launch it from synchronous code with asyncio.run():
import asyncio
asyncio.run(wd.start_async())
.check_once(url)
Run a single immediate check without starting the scheduler loop.
report = wd.check_once("https://example.com")
if report:
    print(report.summary())
DiffReport
report.url # str
report.target # str | None
report.label # str
report.has_changes # bool
report.added # list[Change]
report.removed # list[Change]
report.modified # list[Change]
report.changes # list[Change] (all changes)
report.compared_at # datetime
report.summary() # "[Book price] 1 modified - 2024-01-15 10:30:00 UTC"
report.as_dict() # JSON-serialisable dict
Change
change.kind # ChangeType.ADDED | REMOVED | MODIFIED | UNCHANGED
change.before # str | None - previous value
change.after # str | None - new value
change.context # str | None - surrounding text hint
change.human() # "[~] Changed: '$19.00' → '$24.00'"
Webhooks
WatchDiff auto-detects the target service and adapts the payload format:
| Service | Detection | Payload |
|---|---|---|
| Discord | discord.com in URL | {"content": "..."} (2000-char limit) |
| Slack | hooks.slack.com in URL | {"text": "..."} |
| Custom | anything else | full report.as_dict() |
wd.watch(
    "https://example.com",
    webhooks=[
        "https://discord.com/api/webhooks/YOUR_ID/YOUR_TOKEN",
        "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK",
        "https://your-api.com/watchdiff-hook",
    ],
)
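The detection rules in the table can be expressed as a small function. This is a sketch of the documented behaviour rather than the library's code: `build_payload` is a hypothetical name, and the report is passed as a plain dict standing in for `report.as_dict()`.

```python
DISCORD_LIMIT = 2000  # Discord's maximum message content length


def build_payload(webhook_url: str, report: dict, summary: str) -> dict:
    """Pick a payload shape based on the webhook URL (hypothetical helper)."""
    if "discord.com" in webhook_url:
        return {"content": summary[:DISCORD_LIMIT]}
    if "hooks.slack.com" in webhook_url:
        return {"text": summary}
    # Anything else receives the full report dictionary.
    return report


report = {"url": "https://example.com", "has_changes": True}
summary = "[Book price] 1 modified"

discord = build_payload("https://discord.com/api/webhooks/YOUR_ID/YOUR_TOKEN", report, summary)
slack = build_payload("https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK", report, summary)
custom = build_payload("https://your-api.com/watchdiff-hook", report, summary)
```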
Async usage
import asyncio
from watchdiff import WatchDiff
async def main():
    wd = WatchDiff()
    wd.watch("https://example.com", target="h1", interval=30)
    wd.on_change(lambda r: print(r.summary()))
    await wd.start_async()
asyncio.run(main())
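To show what sharing an event loop means in practice, the sketch below runs two stand-in monitoring coroutines concurrently with `asyncio.gather`. The `monitor` coroutine is hypothetical and only simulates "check, then sleep"; it does not call WatchDiff. A long-running coroutine such as `wd.start_async()` could presumably be scheduled alongside other tasks in the same way.

```python
import asyncio


async def monitor(name: str, interval: float, checks: int, log: list[str]) -> None:
    """Stand-in for a monitoring loop: record a check, then sleep, a fixed number of times."""
    for n in range(checks):
        log.append(f"{name} check {n + 1}")
        await asyncio.sleep(interval)  # yields control to other tasks


async def run_monitors() -> list[str]:
    log: list[str] = []
    # Both monitors share one event loop; neither blocks the other.
    await asyncio.gather(
        monitor("price", 0.01, 3, log),
        monitor("stock", 0.015, 2, log),
    )
    return log


log = asyncio.run(run_monitors())
print(log)
```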
CLI reference
Usage: watchdiff [COMMAND] [OPTIONS]

Commands:
  run       Start continuous monitoring
  check     Run a single check and print the result
  history   Show snapshot history for a URL
  reports   Show diff reports for a URL
  clear     Delete all stored data for a URL

Options (shared):
  --target    -t   CSS selector to watch
  --storage   -s   Storage directory (default: .watchdiff)
  --interval  -i   Seconds between checks (run only)
  --limit     -n   Number of entries to show (history/reports)
  --verbose   -v   Enable debug logging
  --json           Output raw JSON (check only)
  --yes       -y   Skip confirmation (clear only)
Use cases
- E-commerce - track product prices and stock availability
- News monitoring - detect article updates or new publications
- Documentation - alert when API docs change
- Public APIs - watch JSON endpoints for schema or value changes
- Compliance - audit changes on public-facing pages over time
License
This project is licensed under the GNU General Public License v3.0.
You are free to use, study, modify, and distribute this software under the terms of the GPL v3. Any derivative work must also be distributed under the same license.
Download files
File details
Details for the file watchdiff_core-0.1.2.tar.gz.
File metadata
- Download URL: watchdiff_core-0.1.2.tar.gz
- Upload date:
- Size: 45.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5c91db132a50dbc026842d99963de443072c6ec93dc2a096be8d3486212b5f33 |
| MD5 | 252152f135965ad7c5d10299a20862d1 |
| BLAKE2b-256 | a9523c591875b0c862a2366a826d05e6c02a8bfa77be07bbfeea4c8ab5e1d66a |
Provenance
The following attestation bundles were made for watchdiff_core-0.1.2.tar.gz:
Publisher: release.yml on r-seize/watchdiff-py
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: watchdiff_core-0.1.2.tar.gz
- Subject digest: 5c91db132a50dbc026842d99963de443072c6ec93dc2a096be8d3486212b5f33
- Sigstore transparency entry: 1435286318
- Sigstore integration time:
- Permalink: r-seize/watchdiff-py@47d645a46ae32824a77e0f438df7489cecd711c7
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/r-seize
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@47d645a46ae32824a77e0f438df7489cecd711c7
- Trigger Event: push
File details
Details for the file watchdiff_core-0.1.2-py3-none-any.whl.
File metadata
- Download URL: watchdiff_core-0.1.2-py3-none-any.whl
- Upload date:
- Size: 34.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2e90c2e2614c44c29e72e1ff3fa74c9f2fa037c71ab7923ee88d82ffd92fa529 |
| MD5 | 42ecc8b6ac0bd893a6688cd9479b8184 |
| BLAKE2b-256 | 81cdc97fd072367ddb0910bf7484da27e5d0684e7981dab84340756d0becba06 |
Provenance
The following attestation bundles were made for watchdiff_core-0.1.2-py3-none-any.whl:
Publisher: release.yml on r-seize/watchdiff-py
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: watchdiff_core-0.1.2-py3-none-any.whl
- Subject digest: 2e90c2e2614c44c29e72e1ff3fa74c9f2fa037c71ab7923ee88d82ffd92fa529
- Sigstore transparency entry: 1435286341
- Sigstore integration time:
- Permalink: r-seize/watchdiff-py@47d645a46ae32824a77e0f438df7489cecd711c7
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/r-seize
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@47d645a46ae32824a77e0f438df7489cecd711c7
- Trigger Event: push